* [RFC PATCH 00/23] Implement an HPET-based hardlockup detector
@ 2018-06-13  0:57 ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri

Hi,

This patchset demonstrates the implementation of a hardlockup detector
driven by the High-Precision Event Timer.

== Introduction ==

In CPU architectures that do not have an NMI watchdog, one can be
constructed using a counter of the Performance Monitoring Unit (PMU).
PMU counters offer high granularity and deep visibility into CPU
activity. These capabilities, along with their limited number, make the
counters precious resources. Unfortunately, the perf-based hardlockup
detector permanently consumes one of these counters per CPU.

These counters could be freed for profiling purposes if the hardlockup
detector were driven by another timer.

The hardlockup detector runs relatively infrequently and does not require
visibility into CPU activity beyond detecting locked-up CPUs. Thus, a
timer that is external to the CPU (e.g., in the chipset) can be used to
drive the detector.

A key requirement is that the timer needs to be capable of issuing a
non-maskable interrupt to the CPU. In most cases, this can be achieved
by tweaking the delivery mode of the interrupt in the interrupt controller
chip (the exception is the IO APIC).
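As an illustration of tweaking the delivery mode in an MSI-capable
interrupt controller, the sketch below composes the MSI data word with an
NMI delivery mode. The bit layout follows the Intel SDM; the macro names
mirror those added later in this series, while msi_compose_data() is a
hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delivery mode occupies bits 10:8 of the MSI data register
 * (Intel SDM): 000b = Fixed (maskable), 100b = NMI.
 */
#define MSI_DATA_DELIVERY_MODE_SHIFT	8
#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
					 MSI_DATA_DELIVERY_MODE_MASK)

#define APIC_DELIVERY_MODE_FIXED	0x0	/* maskable, the default */
#define APIC_DELIVERY_MODE_NMI		0x4	/* non-maskable */

/* Hypothetical helper: compose the MSI data word for a vector and mode. */
static uint32_t msi_compose_data(uint8_t vector, unsigned int dm)
{
	return MSI_DATA_DELIVERY_MODE(dm) | vector;
}
```

With delivery mode NMI and vector 0x20, the data word becomes 0x420; with
the Fixed mode, only the vector bits remain set.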

== Parts of this series ==

Several parts of Linux need to be updated to operate the aforementioned
detector.

   1) Update the interrupt subsystem to accept requests of interrupts as
      non-maskable. Likewise, handle irqchips that have this capability.
      Patches 1-5
   
   2) Rework the x86 HPET platform code to reserve, configure a timer
      and its interrupt, and expose the needed interfaces and definitions.
      Patches 6-11

   3) Rework the hardlockup detector to decouple its generic part from
      perf. This adds definitions to be implemented using other sources
      of non-maskable interrupts. Patches 12-14

   4) Add an HPET-based hardlockup detector. This includes probing the
      hardware resources, configuring the interrupt, and rotating the
      destination of the interrupts among all monitored CPUs.
      Patches 15-23

== Details on the HPET-based hardlockup detector ==

Unlike the perf-based hardlockup detector, this implementation is driven
by a single timer. The timer targets one CPU at a time in a round-robin
manner. This means that if each CPU must be monitored every watch_thresh
seconds, in a system with N monitored CPUs the timer must expire every
watch_thresh/N seconds. A per-CPU timer-expiration attribute is
maintained.
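The expiration arithmetic above can be sketched as follows.
watchdog_timer_period() and its parameter names are illustrative, not the
series' actual identifiers:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative computation of the HPET timer period when a single timer
 * monitors nr_monitored_cpus CPUs in round-robin: each CPU must be
 * checked every watch_thresh seconds, so the timer must fire N times as
 * often, i.e. every watch_thresh/N seconds, expressed here in HPET ticks.
 */
static uint64_t watchdog_timer_period(uint64_t ticks_per_second,
				      unsigned int watch_thresh,
				      unsigned int nr_monitored_cpus)
{
	return (ticks_per_second * watch_thresh) / nr_monitored_cpus;
}
```

For example, at 1 MHz with a 10-second threshold and 4 monitored CPUs,
the timer would be programmed to expire every 2,500,000 ticks.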

The timer expiration time per CPU is updated every time CPUs are put
online or offline (a CPU hotplug thread enables and disables the watchdog
in these events).

Also, given that a single timer drives the detector, a cpumask is needed
to keep track of which online CPUs are allowed to be monitored. This mask
is updated every time a CPU is put online or offline as well as when the
user modifies the mask in /proc/sys/kernel/watchdog_cpumask. This mask
is needed to keep the current behavior of the lockup detector.
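A minimal user-space sketch of the round-robin selection over the allowed
mask is shown below. The kernel works with struct cpumask and helpers such
as cpumask_next(); this flat-bitmask version and the function name are
only illustrative stand-ins:

```c
#include <assert.h>

/*
 * Illustrative round-robin selection of the next CPU to monitor from a
 * bitmask of CPUs allowed to be monitored (a stand-in for the kernel's
 * cpumask helpers). Starting after the current CPU, wrap around the
 * mask and return the first set bit.
 */
static int next_monitored_cpu(unsigned long allowed, int cur, int nr_cpus)
{
	int i;

	for (i = 1; i <= nr_cpus; i++) {
		int cpu = (cur + i) % nr_cpus;

		if (allowed & (1UL << cpu))
			return cpu;
	}
	return -1;	/* no CPU is allowed to be monitored */
}
```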


Thanks and BR,
Ricardo

Ricardo Neri (23):
  x86/apic: Add a parameter for the APIC delivery mode
  genirq: Introduce IRQD_DELIVER_AS_NMI
  genirq: Introduce IRQF_DELIVER_AS_NMI
  iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt
    remapping
  x86/hpet: Expose more functions to read and write registers
  x86/hpet: Calculate ticks-per-second in a separate function
  x86/hpet: Reserve timer for the HPET hardlockup detector
  x86/hpet: Relocate flag definitions to a header file
  x86/hpet: Configure the timer used by the hardlockup detector
  kernel/watchdog: Introduce a struct for NMI watchdog operations
  watchdog/hardlockup: Define a generic function to detect hardlockups
  watchdog/hardlockup: Decouple the hardlockup detector from perf
  kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  watchdog/hardlockup: Add an HPET-based hardlockup detector
  watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  watchdog/hardlockup/hpet: Add the NMI watchdog operations
  watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based
    implementation
  watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  watchdog/hardlockup/hpet: Adjust timer expiration on the number of
    monitored CPUs
  watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot
    parameter
  watchdog/hardlockup: Activate the HPET-based lockup detector

 Documentation/admin-guide/kernel-parameters.txt |   5 +-
 arch/x86/include/asm/hpet.h                     |  38 ++
 arch/x86/include/asm/hw_irq.h                   |   5 +-
 arch/x86/include/asm/msidef.h                   |   3 +
 arch/x86/kernel/apic/io_apic.c                  |   5 +-
 arch/x86/kernel/apic/msi.c                      |   7 +-
 arch/x86/kernel/apic/vector.c                   |   8 +
 arch/x86/kernel/hpet.c                          | 149 ++++++-
 arch/x86/platform/uv/uv_irq.c                   |   2 +-
 drivers/char/hpet.c                             |  31 +-
 drivers/iommu/intel_irq_remapping.c             |  18 +-
 include/linux/hpet.h                            |   1 +
 include/linux/interrupt.h                       |   3 +
 include/linux/irq.h                             |  15 +
 include/linux/nmi.h                             |  56 ++-
 kernel/Makefile                                 |   3 +-
 kernel/irq/manage.c                             |  22 +-
 kernel/watchdog.c                               |  78 +++-
 kernel/watchdog_hld.c                           | 152 +------
 kernel/watchdog_hld_hpet.c                      | 557 ++++++++++++++++++++++++
 kernel/watchdog_hld_perf.c                      | 182 ++++++++
 lib/Kconfig.debug                               |  10 +
 22 files changed, 1145 insertions(+), 205 deletions(-)
 create mode 100644 kernel/watchdog_hld_hpet.c
 create mode 100644 kernel/watchdog_hld_perf.c

-- 
2.7.4




* [RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Joerg Roedel, Juergen Gross,
	Bjorn Helgaas, Wincy Van, Kate Stewart, Philippe Ombredanne,
	Eric W. Biederman, Baoquan He, Dou Liyang, Jan Kiszka, iommu

Until now, the delivery mode of APIC interrupts has been set to the
default mode configured in the APIC driver. However, the hardware does
not restrict configuring each interrupt with a different delivery mode.
Specifying the delivery mode per interrupt is useful when one is
interested in changing the delivery mode of a particular interrupt. For
instance, this can be used to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. Also, update the
configuration of the delivery mode in the IO APIC, the MSI APIC and the
Intel interrupt remapping driver to use this new per-interrupt member to
configure their respective interrupt tables.

In order to keep the current behavior, initialize the delivery mode of
each interrupt with the delivery mode of the APIC driver in use when the
interrupt data is allocated.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h       |  5 +++--
 arch/x86/include/asm/msidef.h       |  3 +++
 arch/x86/kernel/apic/io_apic.c      |  2 +-
 arch/x86/kernel/apic/msi.c          |  2 +-
 arch/x86/kernel/apic/vector.c       |  8 ++++++++
 arch/x86/platform/uv/uv_irq.c       |  2 +-
 drivers/iommu/intel_irq_remapping.c | 10 +++++-----
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e..c024e59 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-	unsigned int		dest_apicid;
-	unsigned int		vector;
+	unsigned int				dest_apicid;
+	unsigned int				vector;
+	enum ioapic_irq_destination_types	delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8cc..6aef434 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 					 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
+#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
+#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+					 MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7553819..10a20f8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2887,8 +2887,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
+	entry->delivery_mode = cfg->delivery_mode;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index ce503c9..12202ac 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 	msg->data =
 		MSI_DATA_TRIGGER_EDGE |
 		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2..dfe0a2a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -547,6 +547,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
+
+		/*
+		 * Initialize the delivery mode of this irq to match
+		 * the default delivery mode of the APIC. This could be
+		 * changed later when the interrupt is activated.
+		 */
+		 apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
 		/*
 		 * Legacy vectors are already assigned when the IOAPIC
 		 * takes them over. They stay on the same vector. This is
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index e4cb9f4..c88508b 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -35,7 +35,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->delivery_mode	= cfg->delivery_mode;
 	entry->dest_mode	= apic->irq_dest_mode;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 3062a15..9f3a04d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1045,7 +1045,7 @@ static int reenable_irq_remapping(int eim)
 	return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
 	memset(irte, 0, sizeof(*irte));
 
@@ -1059,9 +1059,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->irq_delivery_mode;
-	irte->vector = vector;
-	irte->dest_id = IRTE_DEST(dest);
+	irte->dlvry_mode = irq_cfg->delivery_mode;
+	irte->vector = irq_cfg->vector;
+	irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
 	irte->redir_hint = 1;
 }
 
@@ -1238,7 +1238,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
-	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+	prepare_irte(irte, irq_cfg);
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
-- 
2.7.4




* [RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Joerg Roedel, Juergen Gross,
	Bjorn Helgaas, Wincy Van, Kate Stewart, Philippe Ombredanne,
	Eric W. Biederman, Baoquan He, Dou Liyang, Jan Kiszka, iommu

Until now, the delivery mode of APIC interrupts is set to the default
mode set in the APIC driver. However, there are no restrictions in hardware
to configure each interrupt with a different delivery mode. Specifying the
delivery mode per interrupt is useful when one is interested in changing
the delivery mode of a particular interrupt. For instance, this can be used
to deliver an interrupt as non-maskable.

Add a new member, delivery_mode, to struct irq_cfg. Also, update the
configuration of the delivery mode in the IO APIC, the MSI APIC and the
Intel interrupt remapping driver to use this new per-interrupt member to
configure their respective interrupt tables.

In order to keep the current behavior, initialize the delivery mode of
each interrupt with the with the delivery mode of the APIC driver in use
when the interrupt data is allocated.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Wincy Van <fanwenyi0529@gmail.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h       |  5 +++--
 arch/x86/include/asm/msidef.h       |  3 +++
 arch/x86/kernel/apic/io_apic.c      |  2 +-
 arch/x86/kernel/apic/msi.c          |  2 +-
 arch/x86/kernel/apic/vector.c       |  8 ++++++++
 arch/x86/platform/uv/uv_irq.c       |  2 +-
 drivers/iommu/intel_irq_remapping.c | 10 +++++-----
 7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 32e666e..c024e59 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -117,8 +117,9 @@ struct irq_alloc_info {
 };
 
 struct irq_cfg {
-	unsigned int		dest_apicid;
-	unsigned int		vector;
+	unsigned int				dest_apicid;
+	unsigned int				vector;
+	enum ioapic_irq_destination_types	delivery_mode;
 };
 
 extern struct irq_cfg *irq_cfg(unsigned int irq);
diff --git a/arch/x86/include/asm/msidef.h b/arch/x86/include/asm/msidef.h
index ee2f8cc..6aef434 100644
--- a/arch/x86/include/asm/msidef.h
+++ b/arch/x86/include/asm/msidef.h
@@ -16,6 +16,9 @@
 					 MSI_DATA_VECTOR_MASK)
 
 #define MSI_DATA_DELIVERY_MODE_SHIFT	8
+#define MSI_DATA_DELIVERY_MODE_MASK	0x00000700
+#define MSI_DATA_DELIVERY_MODE(dm)	(((dm) << MSI_DATA_DELIVERY_MODE_SHIFT) & \
+					 MSI_DATA_DELIVERY_MODE_MASK)
 #define  MSI_DATA_DELIVERY_FIXED	(0 << MSI_DATA_DELIVERY_MODE_SHIFT)
 #define  MSI_DATA_DELIVERY_LOWPRI	(1 << MSI_DATA_DELIVERY_MODE_SHIFT)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7553819..10a20f8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2887,8 +2887,8 @@ static void mp_setup_entry(struct irq_cfg *cfg, struct mp_chip_data *data,
 			   struct IO_APIC_route_entry *entry)
 {
 	memset(entry, 0, sizeof(*entry));
-	entry->delivery_mode = apic->irq_delivery_mode;
 	entry->dest_mode     = apic->irq_dest_mode;
+	entry->delivery_mode = cfg->delivery_mode;
 	entry->dest	     = cfg->dest_apicid;
 	entry->vector	     = cfg->vector;
 	entry->trigger	     = data->trigger;
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index ce503c9..12202ac 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -45,7 +45,7 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 	msg->data =
 		MSI_DATA_TRIGGER_EDGE |
 		MSI_DATA_LEVEL_ASSERT |
-		MSI_DATA_DELIVERY_FIXED |
+		MSI_DATA_DELIVERY_MODE(cfg->delivery_mode) |
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index bb6f7a2..dfe0a2a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -547,6 +547,14 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
 		irqd->chip_data = apicd;
 		irqd->hwirq = virq + i;
 		irqd_set_single_target(irqd);
+
+		/*
+		 * Initialize the delivery mode of this irq to match
+		 * the default delivery mode of the APIC. This could be
+		 * changed later when the interrupt is activated.
+		 */
+		apicd->hw_irq_cfg.delivery_mode = apic->irq_delivery_mode;
+
 		/*
 		 * Legacy vectors are already assigned when the IOAPIC
 		 * takes them over. They stay on the same vector. This is
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index e4cb9f4..c88508b 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -35,7 +35,7 @@ static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->delivery_mode	= cfg->delivery_mode;
 	entry->dest_mode	= apic->irq_dest_mode;
 	entry->polarity		= 0;
 	entry->trigger		= 0;
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 3062a15..9f3a04d 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1045,7 +1045,7 @@ static int reenable_irq_remapping(int eim)
 	return -1;
 }
 
-static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+static void prepare_irte(struct irte *irte, struct irq_cfg *irq_cfg)
 {
 	memset(irte, 0, sizeof(*irte));
 
@@ -1059,9 +1059,9 @@ static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 	 * irq migration in the presence of interrupt-remapping.
 	*/
 	irte->trigger_mode = 0;
-	irte->dlvry_mode = apic->irq_delivery_mode;
-	irte->vector = vector;
-	irte->dest_id = IRTE_DEST(dest);
+	irte->dlvry_mode = irq_cfg->delivery_mode;
+	irte->vector = irq_cfg->vector;
+	irte->dest_id = IRTE_DEST(irq_cfg->dest_apicid);
 	irte->redir_hint = 1;
 }
 
@@ -1238,7 +1238,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 	struct irte *irte = &data->irte_entry;
 	struct msi_msg *msg = &data->msi_entry;
 
-	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+	prepare_irte(irte, irq_cfg);
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
 		/* Set source-id of interrupt request */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 02/23] genirq: Introduce IRQD_DELIVER_AS_NMI
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Marc Zyngier, Bartosz Golaszewski,
	Doug Berger, Palmer Dabbelt, Randy Dunlap, iommu

Certain interrupt controllers (e.g., the APIC) can deliver interrupts to
the CPU as non-maskable. Add a new interrupt state flag,
IRQD_DELIVER_AS_NMI, to communicate to the underlying irqchip whether an
interrupt must be delivered in this manner.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/irq.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 65916a3..7271a2c 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -208,6 +208,7 @@ struct irq_data {
  * IRQD_SINGLE_TARGET		- IRQ allows only a single affinity target
  * IRQD_DEFAULT_TRIGGER_SET	- Expected trigger already been set
  * IRQD_CAN_RESERVE		- Can use reservation mode
+ * IRQD_DELIVER_AS_NMI		- Deliver this interrupt as non-maskable
  */
 enum {
 	IRQD_TRIGGER_MASK		= 0xf,
@@ -230,6 +231,7 @@ enum {
 	IRQD_SINGLE_TARGET		= (1 << 24),
 	IRQD_DEFAULT_TRIGGER_SET	= (1 << 25),
 	IRQD_CAN_RESERVE		= (1 << 26),
+	IRQD_DELIVER_AS_NMI		= (1 << 27),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -389,6 +391,16 @@ static inline bool irqd_can_reserve(struct irq_data *d)
 	return __irqd_to_state(d) & IRQD_CAN_RESERVE;
 }
 
+static inline void irqd_set_deliver_as_nmi(struct irq_data *d)
+{
+	__irqd_to_state(d) |= IRQD_DELIVER_AS_NMI;
+}
+
+static inline bool irqd_deliver_as_nmi(struct irq_data *d)
+{
+	return __irqd_to_state(d) & IRQD_DELIVER_AS_NMI;
+}
+
 #undef __irqd_to_state
 
 static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu

Certain interrupt controllers (such as the APIC) are capable of delivering
interrupts as non-maskable. Correspondingly, drivers or subsystems (e.g.,
the hardlockup detector) might need to request a non-maskable interrupt.
The new flag IRQF_DELIVER_AS_NMI serves this purpose.

When setting up an interrupt, non-maskable delivery is set in the
interrupt state data only if the underlying interrupt controller chip
supports it.

Interrupt controller chips can declare that they support non-maskable
delivery by using the new flag IRQCHIP_CAN_DELIVER_AS_NMI.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/interrupt.h |  3 +++
 include/linux/irq.h       |  3 +++
 kernel/irq/manage.c       | 22 +++++++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5426627..dbc5e02 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,8 @@
  *                interrupt handler after suspending interrupts. For system
  *                wakeup devices users need to implement wakeup detection in
  *                their interrupt handlers.
+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
+ *                supported by the chip.
  */
 #define IRQF_SHARED		0x00000080
 #define IRQF_PROBE_SHARED	0x00000100
@@ -74,6 +76,7 @@
 #define IRQF_NO_THREAD		0x00010000
 #define IRQF_EARLY_RESUME	0x00020000
 #define IRQF_COND_SUSPEND	0x00040000
+#define IRQF_DELIVER_AS_NMI	0x00080000
 
 #define IRQF_TIMER		(__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 7271a2c..d2520ae 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -515,6 +515,8 @@ struct irq_chip {
  * IRQCHIP_SKIP_SET_WAKE:	Skip chip.irq_set_wake(), for this irq chip
  * IRQCHIP_ONESHOT_SAFE:	One shot does not require mask/unmask
  * IRQCHIP_EOI_THREADED:	Chip requires eoi() on unmask in threaded mode
+ * IRQCHIP_CAN_DELIVER_AS_NMI	Chip can deliver interrupts it receives as non-
+ *				maskable.
  */
 enum {
 	IRQCHIP_SET_TYPE_MASKED		= (1 <<  0),
@@ -524,6 +526,7 @@ enum {
 	IRQCHIP_SKIP_SET_WAKE		= (1 <<  4),
 	IRQCHIP_ONESHOT_SAFE		= (1 <<  5),
 	IRQCHIP_EOI_THREADED		= (1 <<  6),
+	IRQCHIP_CAN_DELIVER_AS_NMI	= (1 <<  7),
 };
 
 #include <linux/irqdesc.h>
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e3336d9..d058aa8 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1137,7 +1137,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 {
 	struct irqaction *old, **old_ptr;
 	unsigned long flags, thread_mask = 0;
-	int ret, nested, shared = 0;
+	int ret, nested, shared = 0, deliver_as_nmi = 0;
 
 	if (!desc)
 		return -EINVAL;
@@ -1156,6 +1156,16 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	if (!(new->flags & IRQF_TRIGGER_MASK))
 		new->flags |= irqd_get_trigger_type(&desc->irq_data);
 
+	/* Only deliver as non-maskable interrupt if supported by chip. */
+	if (new->flags & IRQF_DELIVER_AS_NMI) {
+		if (desc->irq_data.chip->flags & IRQCHIP_CAN_DELIVER_AS_NMI) {
+			irqd_set_deliver_as_nmi(&desc->irq_data);
+			deliver_as_nmi = 1;
+		} else {
+			return -EINVAL;
+		}
+	}
+
 	/*
 	 * Check whether the interrupt nests into another interrupt
 	 * thread.
@@ -1166,6 +1176,13 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 			ret = -EINVAL;
 			goto out_mput;
 		}
+
+		/* Don't allow nesting if interrupt will be delivered as NMI. */
+		if (deliver_as_nmi) {
+			ret = -EINVAL;
+			goto out_mput;
+		}
+
 		/*
 		 * Replace the primary handler which was provided from
 		 * the driver for non nested interrupt handling by the
@@ -1186,6 +1203,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
 	 * thread.
 	 */
 	if (new->thread_fn && !nested) {
+		if (deliver_as_nmi)
+			goto out_mput;
+
 		ret = setup_irq_thread(new, irq, false);
 		if (ret)
 			goto out_mput;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 04/23] iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Joerg Roedel, iommu

The Intel IOMMU is capable of delivering remapped interrupts as non-
maskable. Add the IRQCHIP_CAN_DELIVER_AS_NMI flag to its irq_chip
structure to declare this capability. The delivery mode of each interrupt
can be set separately.

By default, the delivery mode is taken from the configuration field of the
interrupt data. If non-maskable delivery is requested in the interrupt
state flags, the respective entry in the remapping table is updated.

When remapping an interrupt from an IO APIC, modify the delivery mode
field in the interrupt remapping table entry. When remapping an MSI
interrupt, simply update the delivery mode when composing the message.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 9f3a04d..b6cf7c4 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1128,10 +1128,14 @@ static void intel_ir_reconfigure_irte(struct irq_data *irqd, bool force)
 	struct irte *irte = &ir_data->irte_entry;
 	struct irq_cfg *cfg = irqd_cfg(irqd);
 
+	if (irqd_deliver_as_nmi(irqd))
+		cfg->delivery_mode = dest_NMI;
+
 	/*
 	 * Atomically updates the IRTE with the new destination, vector
 	 * and flushes the interrupt entry cache.
 	 */
+	irte->dlvry_mode = cfg->delivery_mode;
 	irte->vector = cfg->vector;
 	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
 
@@ -1182,6 +1186,9 @@ static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
 {
 	struct intel_ir_data *ir_data = irq_data->chip_data;
 
+	if (irqd_deliver_as_nmi(irq_data))
+		ir_data->irte_entry.dlvry_mode = dest_NMI;
+
 	*msg = ir_data->msi_entry;
 }
 
@@ -1227,6 +1234,7 @@ static struct irq_chip intel_ir_chip = {
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,
+	.flags			= IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 05/23] x86/msi: Add support for IRQCHIP_CAN_DELIVER_AS_NMI
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Dou Liyang, Juergen Gross, iommu

As per the Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3 Section 10.11.2, the delivery mode field of the interrupt message
data can be set to deliver the interrupt as non-maskable. Declare support
for non-maskable delivery by adding the IRQCHIP_CAN_DELIVER_AS_NMI flag.

When composing the interrupt message, the delivery mode is obtained from
the configuration of the interrupt data.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 12202ac..68b6a04 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -29,6 +29,9 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
 
+	if (irqd_deliver_as_nmi(data))
+		cfg->delivery_mode = dest_NMI;
+
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
@@ -297,7 +300,7 @@ static struct irq_chip hpet_msi_controller __ro_after_init = {
 	.irq_retrigger = irq_chip_retrigger_hierarchy,
 	.irq_compose_msi_msg = irq_msi_compose_msg,
 	.irq_write_msi_msg = hpet_msi_write_msg,
-	.flags = IRQCHIP_SKIP_SET_WAKE,
+	.flags = IRQCHIP_SKIP_SET_WAKE | IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 06/23] x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt remapping
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Juergen Gross, Baoquan He,
	Eric W. Biederman, Dou Liyang, Jan Kiszka, iommu

Even though the entries of an IO APIC's redirection table have a delivery
mode field, the documentation of the majority of IO APICs explicitly
states that interrupt delivery as non-maskable is not supported. Thus,
the plain IO APIC irq_chip does not declare this capability.

However, when using an IO APIC in combination with the Intel VT-d
interrupt remapping functionality, the delivery of the interrupt to the
CPU is handled by the remapping hardware. In such a case, the interrupt
can be delivered as non-maskable.

Thus, add the IRQCHIP_CAN_DELIVER_AS_NMI flag only to the irq_chip used
in combination with interrupt remapping.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 10a20f8..39de91b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1911,7 +1911,8 @@ static struct irq_chip ioapic_ir_chip __read_mostly = {
 	.irq_eoi		= ioapic_ir_ack_level,
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.flags			= IRQCHIP_SKIP_SET_WAKE,
+	.flags			= IRQCHIP_SKIP_SET_WAKE |
+				  IRQCHIP_CAN_DELIVER_AS_NMI,
 };
 
 static inline void init_IO_APIC_traps(void)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 07/23] x86/hpet: Expose more functions to read and write registers
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Philippe Ombredanne, Kate Stewart,
	Rafael J. Wysocki, iommu

Some of the registers in the HPET hardware are 64 bits wide. 64-bit
access functions are needed mostly to read the counter and write the
comparator in a single access. Also, 64-bit accesses can be used to read
parameters located in the higher bits of some registers (such as the
timer period and the IO APIC pins that can be asserted by the timer)
without masking and shifting the register values.

Add 64-bit read and write functions. These functions, along with the
existing hpet_writel(), are exposed via the HPET header for use by other
kernel subsystems.

Thus far, the only consumer of these functions will be the HPET-based
hardlockup detector, which will only be available in 64-bit builds.
Thus, the 64-bit access functions are wrapped in CONFIG_X86_64.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 10 ++++++++++
 arch/x86/kernel/hpet.c      | 12 +++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 67385d5..9e0afde 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -72,6 +72,11 @@ extern int is_hpet_enabled(void);
 extern int hpet_enable(void);
 extern void hpet_disable(void);
 extern unsigned int hpet_readl(unsigned int a);
+extern void hpet_writel(unsigned int d, unsigned int a);
+#ifdef CONFIG_X86_64
+extern unsigned long hpet_readq(unsigned int a);
+extern void hpet_writeq(unsigned long d, unsigned int a);
+#endif
 extern void force_hpet_resume(void);
 
 struct irq_data;
@@ -109,6 +114,11 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 static inline int hpet_enable(void) { return 0; }
 static inline int is_hpet_enabled(void) { return 0; }
 #define hpet_readl(a) 0
+#define hpet_writel(d, a)
+#ifdef CONFIG_X86_64
+#define hpet_readq(a) 0
+#define hpet_writeq(d, a)
+#endif
 #define default_setup_hpet_msi	NULL
 
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 8ce4212..3fa1d3f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -64,12 +64,22 @@ inline unsigned int hpet_readl(unsigned int a)
 	return readl(hpet_virt_address + a);
 }
 
-static inline void hpet_writel(unsigned int d, unsigned int a)
+inline void hpet_writel(unsigned int d, unsigned int a)
 {
 	writel(d, hpet_virt_address + a);
 }
 
 #ifdef CONFIG_X86_64
+inline unsigned long hpet_readq(unsigned int a)
+{
+	return readq(hpet_virt_address + a);
+}
+
+inline void hpet_writeq(unsigned long d, unsigned int a)
+{
+	writeq(d, hpet_virt_address + a);
+}
+
 #include <asm/pgtable.h>
 #endif
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 08/23] x86/hpet: Calculate ticks-per-second in a separate function
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, iommu

It is easier to compute the expiration times of an HPET timer by using
its frequency (i.e., the number of times it ticks in a second) than its
period, as given in the capabilities register.

In addition to the HPET char driver, the HPET-based hardlockup detector
will also need to know the timer's frequency. Thus, create a common
function that both can use.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/char/hpet.c  | 31 +++++++++++++++++++++++++------
 include/linux/hpet.h |  1 +
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index be426eb..1c9584a 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -838,6 +838,29 @@ static unsigned long hpet_calibrate(struct hpets *hpetp)
 	return ret;
 }
 
+u64 hpet_get_ticks_per_sec(u64 hpet_caps)
+{
+	u64 ticks_per_sec, period;
+
+	period = (hpet_caps & HPET_COUNTER_CLK_PERIOD_MASK) >>
+		 HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
+
+	/*
+	 * The frequency is the reciprocal of the period. The period is
+	 * given in femtoseconds. Thus, use a dividend of 10^15 femtoseconds
+	 * (one second) to obtain the frequency in ticks per second.
+	 */
+
+	/* 10^15 femtoseconds per second */
+	ticks_per_sec = 1000000000000000uLL;
+	ticks_per_sec += period >> 1; /* round */
+
+	/* The quotient is put in the dividend. We drop the remainder. */
+	do_div(ticks_per_sec, period);
+
+	return ticks_per_sec;
+}
+
 int hpet_alloc(struct hpet_data *hdp)
 {
 	u64 cap, mcfg;
@@ -847,7 +870,6 @@ int hpet_alloc(struct hpet_data *hdp)
 	size_t siz;
 	struct hpet __iomem *hpet;
 	static struct hpets *last;
-	unsigned long period;
 	unsigned long long temp;
 	u32 remainder;
 
@@ -883,6 +905,8 @@ int hpet_alloc(struct hpet_data *hdp)
 
 	cap = readq(&hpet->hpet_cap);
 
+	temp = hpet_get_ticks_per_sec(cap);
+
 	ntimer = ((cap & HPET_NUM_TIM_CAP_MASK) >> HPET_NUM_TIM_CAP_SHIFT) + 1;
 
 	if (hpetp->hp_ntimer != ntimer) {
@@ -899,11 +923,6 @@ int hpet_alloc(struct hpet_data *hdp)
 
 	last = hpetp;
 
-	period = (cap & HPET_COUNTER_CLK_PERIOD_MASK) >>
-		HPET_COUNTER_CLK_PERIOD_SHIFT; /* fs, 10^-15 */
-	temp = 1000000000000000uLL; /* 10^15 femtoseconds per second */
-	temp += period >> 1; /* round */
-	do_div(temp, period);
 	hpetp->hp_tick_freq = temp; /* ticks per second */
 
 	printk(KERN_INFO "hpet%d: at MMIO 0x%lx, IRQ%s",
diff --git a/include/linux/hpet.h b/include/linux/hpet.h
index 8604564..e7b36bcf4 100644
--- a/include/linux/hpet.h
+++ b/include/linux/hpet.h
@@ -107,5 +107,6 @@ static inline void hpet_reserve_timer(struct hpet_data *hd, int timer)
 }
 
 int hpet_alloc(struct hpet_data *);
+u64 hpet_get_ticks_per_sec(u64 hpet_caps);
 
 #endif				/* !__HPET__ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 09/23] x86/hpet: Reserve timer for the HPET hardlockup detector
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, iommu

HPET timer 2 will be used to drive the HPET-based hardlockup detector.
Reserve such timer to ensure it cannot be used by user space programs or
clock events.

When looking for MSI-capable timers for clock events, skip timer 2 if
the HPET hardlockup detector is selected.

Also, do not assign an IO APIC pin to timer 2 of the HPET. A subsequent
changeset will handle the interrupt setup of the timer used for the
hardlockup detector.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  3 +++
 arch/x86/kernel/hpet.c      | 19 ++++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9e0afde..3266796 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -61,6 +61,9 @@
  */
 #define HPET_MIN_PERIOD		100000UL
 
+/* Timer used for the hardlockup detector */
+#define HPET_WD_TIMER_NR 2
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 3fa1d3f..b03faee 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -185,7 +185,8 @@ do {								\
 
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
- * timer 0 and timer 1 in case of RTC emulation.
+ * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
+ * the HPET-based hardlockup detector is used.
  */
 #ifdef CONFIG_HPET
 
@@ -195,7 +196,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 {
 	struct hpet __iomem *hpet = hpet_virt_address;
 	struct hpet_timer __iomem *timer = &hpet->hpet_timers[2];
-	unsigned int nrtimers, i;
+	unsigned int nrtimers, i, start_timer;
 	struct hpet_data hd;
 
 	nrtimers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT) + 1;
@@ -210,6 +211,13 @@ static void hpet_reserve_platform_timers(unsigned int id)
 	hpet_reserve_timer(&hd, 1);
 #endif
 
+	if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET)) {
+		hpet_reserve_timer(&hd, HPET_WD_TIMER_NR);
+		start_timer = HPET_WD_TIMER_NR + 1;
+	} else {
+		start_timer = HPET_WD_TIMER_NR;
+	}
+
 	/*
 	 * NOTE that hd_irq[] reflects IOAPIC input pins (LEGACY_8254
 	 * is wrong for i8259!) not the output IRQ.  Many BIOS writers
@@ -218,7 +226,7 @@ static void hpet_reserve_platform_timers(unsigned int id)
 	hd.hd_irq[0] = HPET_LEGACY_8254;
 	hd.hd_irq[1] = HPET_LEGACY_RTC;
 
-	for (i = 2; i < nrtimers; timer++, i++) {
+	for (i = start_timer; i < nrtimers; timer++, i++) {
 		hd.hd_irq[i] = (readl(&timer->hpet_config) &
 			Tn_INT_ROUTE_CNF_MASK) >> Tn_INT_ROUTE_CNF_SHIFT;
 	}
@@ -630,6 +638,11 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 		struct hpet_dev *hdev = &hpet_devs[num_timers_used];
 		unsigned int cfg = hpet_readl(HPET_Tn_CFG(i));
 
+		/* Do not use timer reserved for the HPET watchdog. */
+		if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET) &&
+		    i == HPET_WD_TIMER_NR)
+			continue;
+
 		/* Only consider HPET timer with MSI support */
 		if (!(cfg & HPET_TN_FSB_CAP))
 			continue;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 10/23] x86/hpet: Relocate flag definitions to a header file
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, iommu

Users of HPET timers (such as the hardlockup detector) need the definitions
of these flags to interpret the configuration of a timer as passed by
platform code.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h | 6 ++++++
 arch/x86/kernel/hpet.c      | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 3266796..9fd112a 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -64,6 +64,12 @@
 /* Timer used for the hardlockup detector */
 #define HPET_WD_TIMER_NR 2
 
+#define HPET_DEV_USED_BIT		2
+#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
+#define HPET_DEV_VALID			0x8
+#define HPET_DEV_FSB_CAP		0x1000
+#define HPET_DEV_PERI_CAP		0x2000
+
 /* hpet memory map physical address */
 extern unsigned long hpet_address;
 extern unsigned long force_hpet_address;
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index b03faee..99d4972 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -24,12 +24,6 @@
    NSEC = 10^-9 */
 #define FSEC_PER_NSEC			1000000L
 
-#define HPET_DEV_USED_BIT		2
-#define HPET_DEV_USED			(1 << HPET_DEV_USED_BIT)
-#define HPET_DEV_VALID			0x8
-#define HPET_DEV_FSB_CAP		0x1000
-#define HPET_DEV_PERI_CAP		0x2000
-
 #define HPET_MIN_CYCLES			128
 #define HPET_MIN_PROG_DELTA		(HPET_MIN_CYCLES + (HPET_MIN_CYCLES >> 1))
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 11/23] x86/hpet: Configure the timer used by the hardlockup detector
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Clemens Ladisch, Arnd Bergmann,
	Philippe Ombredanne, Kate Stewart, Rafael J. Wysocki, iommu

Implement the initial configuration of the timer to be used by the
hardlockup detector. The main focus of this configuration is to provide an
interrupt for the timer.

Two types of interrupt can be assigned to the timer. First, attempt to
assign a message-signaled interrupt. This implies creating the HPET MSI
domain, but only if it was not already created when HPET timers were set
up as event timers. The data structures needed to allocate the MSI
interrupt in the domain are also created.

If message-signaled interrupts cannot be used, assign a legacy IO APIC
interrupt via the ACPI Global System Interrupts.

The resulting interrupt configuration, along with the timer instance and
frequency, is then made available to the hardlockup detector in a struct
via the new function hpet_hardlockup_detector_assign_timer().

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  16 +++++++
 arch/x86/kernel/hpet.c      | 112 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 9fd112a..33309b7 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -118,6 +118,22 @@ extern void hpet_unregister_irq_handler(rtc_irq_handler handler);
 
 #endif /* CONFIG_HPET_EMULATE_RTC */
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+struct hpet_hld_data {
+	u32		num;
+	u32		irq;
+	u32		flags;
+	u64		ticks_per_second;
+};
+
+extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
+#else
+static inline struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+	return NULL;
+}
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
+
 #else /* CONFIG_HPET_TIMER */
 
 static inline int hpet_enable(void) { return 0; }
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 99d4972..fda6e19 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -5,6 +5,7 @@
 #include <linux/delay.h>
 #include <linux/errno.h>
 #include <linux/i8253.h>
+#include <linux/acpi.h>
 #include <linux/slab.h>
 #include <linux/hpet.h>
 #include <linux/init.h>
@@ -36,6 +37,7 @@ bool					hpet_msi_disable;
 
 #ifdef CONFIG_PCI_MSI
 static unsigned int			hpet_num_timers;
+static struct irq_domain		*hpet_domain;
 #endif
 static void __iomem			*hpet_virt_address;
 
@@ -177,6 +179,115 @@ do {								\
 		_hpet_print_config(__func__, __LINE__);	\
 } while (0)
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_HPET
+static
+int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+	int gsi, hwirq;
+
+	/* Obtain interrupt pins that can be used by this timer. */
+	v = hpet_readq(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+	v = (v & Tn_INT_ROUTE_CAP_MASK) >> Tn_INT_ROUTE_CAP_SHIFT;
+
+	/*
+	 * In PIC mode, skip IRQ0-4, IRQ6-9 and IRQ12-15, which are always
+	 * used by legacy devices. In IO APIC mode, skip all the legacy IRQs.
+	 */
+	if (acpi_irq_model == ACPI_IRQ_MODEL_PIC)
+		v &= ~0xf3df;
+	else
+		v &= ~0xffff;
+
+	for_each_set_bit(hwirq, &v, HPET_MAX_IRQ) {
+		if (hwirq >= NR_IRQS) {
+			hwirq = HPET_MAX_IRQ;
+			break;
+		}
+
+		gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+					ACPI_ACTIVE_LOW);
+		if (gsi > 0)
+			break;
+	}
+
+	if (hwirq >= HPET_MAX_IRQ)
+		return -ENODEV;
+
+	hdata->irq = hwirq;
+	return 0;
+}
+
+static int hpet_hardlockup_detector_assign_msi_irq(struct hpet_hld_data *hdata)
+{
+	struct hpet_dev *hdev;
+	int hwirq;
+
+	if (hpet_msi_disable)
+		return -ENODEV;
+
+	hdev = kzalloc(sizeof(*hdev), GFP_KERNEL);
+	if (!hdev)
+		return -ENOMEM;
+
+	hdev->flags |= HPET_DEV_FSB_CAP;
+	hdev->num = hdata->num;
+	sprintf(hdev->name, "hpet_hld");
+
+	/* Domain may exist if CPU does not have Always-Running APIC Timers. */
+	if (!hpet_domain) {
+		hpet_domain = hpet_create_irq_domain(hpet_blockid);
+		if (!hpet_domain) {
+			kfree(hdev);
+			return -EPERM;
+		}
+	}
+
+	hwirq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
+	if (hwirq <= 0) {
+		kfree(hdev);
+		return -ENODEV;
+	}
+
+	hdata->irq = hwirq;
+	hdata->flags |= HPET_DEV_FSB_CAP;
+
+	hdev->irq = hwirq;
+
+	return 0;
+}
+
+struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void)
+{
+	struct hpet_hld_data *hdata;
+	int ret = -ENODEV;
+	unsigned int cfg;
+
+	hdata = kzalloc(sizeof(*hdata), GFP_KERNEL);
+	if (!hdata)
+		return NULL;
+
+	hdata->num = HPET_WD_TIMER_NR;
+
+	cfg = hpet_readl(HPET_Tn_CFG(HPET_WD_TIMER_NR));
+
+	hdata->ticks_per_second = hpet_get_ticks_per_sec(hpet_readq(HPET_ID));
+
+	/* Try an MSI interrupt first; fall back to the IO APIC. */
+	if (cfg & HPET_TN_FSB_CAP)
+		ret = hpet_hardlockup_detector_assign_msi_irq(hdata);
+
+	if (!ret)
+		return hdata;
+
+	ret = hpet_hardlockup_detector_assign_legacy_irq(hdata);
+	if (ret) {
+		kfree(hdata);
+		return NULL;
+	}
+
+	return hdata;
+}
+#endif /* CONFIG_HARDLOCKUP_DETECTOR_HPET */
+
 /*
  * When the hpet driver (/dev/hpet) is enabled, we need to reserve
  * timer 0 and timer 1 in case of RTC emulation. Timer 2 is reserved in case
@@ -450,7 +561,6 @@ static struct clock_event_device hpet_clockevent = {
 
 static DEFINE_PER_CPU(struct hpet_dev *, cpu_hpet_dev);
 static struct hpet_dev	*hpet_devs;
-static struct irq_domain *hpet_domain;
 
 void hpet_msi_unmask(struct irq_data *data)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-13  0:57 ` Ricardo Neri
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

Instead of exposing individual functions for the operations of the NMI
watchdog, define a common interface that can be used across multiple
implementations.

The struct nmi_watchdog_ops is defined for such operations. These initial
definitions include the enable, disable, start, stop, and cleanup
operations.

Only a single NMI watchdog can be used in the system. The operations of
this NMI watchdog are accessed via the new variable nmi_wd_ops. This
variable is set to point to the operations of the first NMI watchdog that
initializes successfully. At the moment, the only available NMI watchdog
is the perf-based hardlockup detector, but more implementations can be
added in the future.

While introducing this new struct for the NMI watchdog operations, convert
the perf-based NMI watchdog to use these operations.

The functions hardlockup_detector_perf_restart() and
hardlockup_detector_perf_stop() are special. They are not regular watchdog
operations; they are used to work around hardware bugs. Thus, they are not
used for the start and stop operations. Furthermore, the perf-based NMI
watchdog does not need to implement such operations. They are intended to
globally start and stop the NMI watchdog; the perf-based NMI
watchdog is implemented on a per-CPU basis.

Currently, when the perf-based hardlockup detector is not selected at
build time, a dummy hardlockup_detector_perf_init() is used. The return
value of this function depends on CONFIG_HAVE_NMI_WATCHDOG. This behavior
is preserved by the new set of NMI watchdog operations,
hardlockup_detector_noop. These dummy operations are used when no
hardlockup detector is selected or when it fails to initialize.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h   | 39 +++++++++++++++++++++++++++----------
 kernel/watchdog.c     | 54 +++++++++++++++++++++++++++++++++++++++++++++------
 kernel/watchdog_hld.c | 16 +++++++++++----
 3 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b8d868d..d3f5d55f 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -92,24 +92,43 @@ static inline void hardlockup_detector_disable(void) {}
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
-extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
-static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
+/**
+ * struct nmi_watchdog_ops - Operations performed by NMI watchdogs
+ * @init:		Initialize and configure the hardware resources of the
+ *			NMI watchdog.
+ * @enable:		Enable (i.e., monitor for hardlockups) the NMI watchdog
+ *			on the CPU on which the function is executed.
+ * @disable:		Disable (i.e., do not monitor for hardlockups) the NMI
+ *			watchdog on the CPU on which the function is executed.
+ * @start:		Start the NMI watchdog on all CPUs. Used after the
+ *			watchdog parameters are updated. Optional if such
+ *			updates do not impact the operation of the NMI watchdog.
+ * @stop:		Stop the NMI watchdog on all CPUs. Used before the
+ *			watchdog parameters are updated. Optional if such
+ *			updates do not impact the NMI watchdog.
+ * @cleanup:		Cleanup unneeded data structures of the NMI watchdog.
+ *			Used after updating the parameters of the watchdog.
+ *			Optional if no cleanup is needed.
+ */
+struct nmi_watchdog_ops {
+	int	(*init)(void);
+	void	(*enable)(void);
+	void	(*disable)(void);
+	void	(*start)(void);
+	void	(*stop)(void);
+	void	(*cleanup)(void);
+};
+
+extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 int watchdog_nmi_probe(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 576d180..5057376 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -48,6 +48,8 @@ int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
 int __read_mostly nmi_watchdog_available;
 
+static struct nmi_watchdog_ops *nmi_wd_ops;
+
 struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
@@ -99,6 +101,23 @@ __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
+ * Define a non-existent hardlockup detector. It will be used only if
+ * no actual hardlockup detector was selected at build time.
+ */
+static inline int noop_hardlockup_detector_init(void)
+{
+	/* If arch has an NMI watchdog, pretend to initialize it. */
+	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
+		return 0;
+	else
+		return -ENODEV;
+}
+
+static struct nmi_watchdog_ops hardlockup_detector_noop = {
+	.init = noop_hardlockup_detector_init,
+};
+
+/*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
  *
@@ -108,19 +127,33 @@ __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
  */
 int __weak watchdog_nmi_enable(unsigned int cpu)
 {
-	hardlockup_detector_perf_enable();
+	if (nmi_wd_ops && nmi_wd_ops->enable)
+		nmi_wd_ops->enable();
+
 	return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
 {
-	hardlockup_detector_perf_disable();
+	if (nmi_wd_ops && nmi_wd_ops->disable)
+		nmi_wd_ops->disable();
 }
 
 /* Return 0, if a NMI watchdog is available. Error code otherwise */
 int __weak __init watchdog_nmi_probe(void)
 {
-	return hardlockup_detector_perf_init();
+	int ret = -ENODEV;
+
+	if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_PERF))
+		ret = hardlockup_detector_perf_ops.init();
+
+	if (!ret) {
+		nmi_wd_ops = &hardlockup_detector_perf_ops;
+		return ret;
+	}
+
+	nmi_wd_ops = &hardlockup_detector_noop;
+	return nmi_wd_ops->init();
 }
 
 /**
@@ -131,7 +164,11 @@ int __weak __init watchdog_nmi_probe(void)
  * update_variables();
  * watchdog_nmi_start();
  */
-void __weak watchdog_nmi_stop(void) { }
+void __weak watchdog_nmi_stop(void)
+{
+	if (nmi_wd_ops && nmi_wd_ops->stop)
+		nmi_wd_ops->stop();
+}
 
 /**
  * watchdog_nmi_start - Start the watchdog after reconfiguration
@@ -144,7 +181,11 @@ void __weak watchdog_nmi_stop(void) { }
  * - watchdog_thresh
  * - watchdog_cpumask
  */
-void __weak watchdog_nmi_start(void) { }
+void __weak watchdog_nmi_start(void)
+{
+	if (nmi_wd_ops && nmi_wd_ops->start)
+		nmi_wd_ops->start();
+}
 
 /**
  * lockup_detector_update_enable - Update the sysctl enable bit
@@ -627,7 +668,8 @@ static inline void lockup_detector_setup(void)
 static void __lockup_detector_cleanup(void)
 {
 	lockdep_assert_held(&watchdog_mutex);
-	hardlockup_detector_perf_cleanup();
+	if (nmi_wd_ops && nmi_wd_ops->cleanup)
+		nmi_wd_ops->cleanup();
 }
 
 /**
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index e449a23..036cb0a 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -186,7 +186,7 @@ static int hardlockup_detector_event_create(void)
 /**
  * hardlockup_detector_perf_enable - Enable the local event
  */
-void hardlockup_detector_perf_enable(void)
+static void hardlockup_detector_perf_enable(void)
 {
 	if (hardlockup_detector_event_create())
 		return;
@@ -201,7 +201,7 @@ void hardlockup_detector_perf_enable(void)
 /**
  * hardlockup_detector_perf_disable - Disable the local event
  */
-void hardlockup_detector_perf_disable(void)
+static void hardlockup_detector_perf_disable(void)
 {
 	struct perf_event *event = this_cpu_read(watchdog_ev);
 
@@ -219,7 +219,7 @@ void hardlockup_detector_perf_disable(void)
  *
  * Called from lockup_detector_cleanup(). Serialized by the caller.
  */
-void hardlockup_detector_perf_cleanup(void)
+static void hardlockup_detector_perf_cleanup(void)
 {
 	int cpu;
 
@@ -281,7 +281,7 @@ void __init hardlockup_detector_perf_restart(void)
 /**
  * hardlockup_detector_perf_init - Probe whether NMI event is available at all
  */
-int __init hardlockup_detector_perf_init(void)
+static int __init hardlockup_detector_perf_init(void)
 {
 	int ret = hardlockup_detector_event_create();
 
@@ -291,5 +291,13 @@ int __init hardlockup_detector_perf_init(void)
 		perf_event_release_kernel(this_cpu_read(watchdog_ev));
 		this_cpu_write(watchdog_ev, NULL);
 	}
+
 	return ret;
 }
+
+struct nmi_watchdog_ops hardlockup_detector_perf_ops = {
+	.init		= hardlockup_detector_perf_init,
+	.enable		= hardlockup_detector_perf_enable,
+	.disable	= hardlockup_detector_perf_disable,
+	.cleanup	= hardlockup_detector_perf_cleanup,
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu

Instead of exposing individual functions for the operations of the NMI
watchdog, define a common interface that can be used across multiple
implementations.

The struct nmi_watchdog_ops is defined for such operations. These initial
definitions include the enable, disable, start, stop, and cleanup
operations.

Only a single NMI watchdog can be used in the system. The operations of
this NMI watchdog are accessed via the new variable nmi_wd_ops. This
variable is set to point the operations of the first NMI watchdog that
initializes successfully. Even though at this moment, the only available
NMI watchdog is the perf-based hardlockup detector. More implementations
can be added in the future.

While introducing this new struct for the NMI watchdog operations, convert
the perf-based NMI watchdog to use these operations.

The functions hardlockup_detector_perf_restart() and
hardlockup_detector_perf_stop() are special. They are not regular watchdog
operations; they are used to work around hardware bugs. Thus, they are not
used for the start and stop operations. Furthermore, the perf-based NMI
watchdog does not need to implement such operations. They are intended to
globally start and stop the NMI watchdog; the perf-based NMI
watchdog is implemented on a per-CPU basis.

Currently, when perf-based hardlockup detector is not selected at build
time, a dummy hardlockup_detector_perf_init() is used. The return value
of this function depends on CONFIG_HAVE_NMI_WATCHDOG. This behavior is
conserved by defining using the set of NMI watchdog operations structure
hardlockup_detector_noop. These dummy operations are used when no hard-
lockup detector is used or fails to initialize.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h   | 39 +++++++++++++++++++++++++++----------
 kernel/watchdog.c     | 54 +++++++++++++++++++++++++++++++++++++++++++++------
 kernel/watchdog_hld.c | 16 +++++++++++----
 3 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index b8d868d..d3f5d55f 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -92,24 +92,43 @@ static inline void hardlockup_detector_disable(void) {}
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
-extern void hardlockup_detector_perf_disable(void);
-extern void hardlockup_detector_perf_enable(void);
-extern void hardlockup_detector_perf_cleanup(void);
-extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-static inline void hardlockup_detector_perf_disable(void) { }
-static inline void hardlockup_detector_perf_enable(void) { }
-static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-# else
-static inline int hardlockup_detector_perf_init(void) { return 0; }
 # endif
 #endif
 
+/**
+ * struct nmi_watchdog_ops - Operations performed by NMI watchdogs
+ * @init:		Initialize and configure the hardware resources of the
+ *			NMI watchdog.
+ * @enable:		Enable (i.e., monitor for hardlockups) the NMI watchdog
+ *			on the CPU on which the function is executed.
+ * @disable:		Disable (i.e., do not monitor for hardlockups) the NMI
+ *			watchdog on the CPU on which the function is executed.
+ * @start:		Start the NMI watchdog on all CPUs. Used after the
+ *			parameters of the watchdog are updated. Optional if
+ *			such updates do not impact the NMI watchdog operation.
+ * @stop:		Stop the NMI watchdog on all CPUs. Used before the
+ *			parameters of the watchdog are updated. Optional if
+ *			such updates do not impact the NMI watchdog.
+ * @cleanup:		Cleanup unneeded data structures of the NMI watchdog.
+ *			Used after updating the parameters of the watchdog.
+ *			Optional if no cleanup is needed.
+ */
+struct nmi_watchdog_ops {
+	int	(*init)(void);
+	void	(*enable)(void);
+	void	(*disable)(void);
+	void	(*start)(void);
+	void	(*stop)(void);
+	void	(*cleanup)(void);
+};
+
+extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
 int watchdog_nmi_probe(void);
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 576d180..5057376 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -48,6 +48,8 @@ int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
 int __read_mostly nmi_watchdog_available;
 
+static struct nmi_watchdog_ops *nmi_wd_ops;
+
 struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
@@ -99,6 +101,23 @@ __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
+ * Define a non-existent hardlockup detector. It will be used only if
+ * no actual hardlockup detector was selected at build time.
+ */
+static inline int noop_hardlockup_detector_init(void)
+{
+	/* If arch has an NMI watchdog, pretend to initialize it. */
+	if (IS_ENABLED(CONFIG_HAVE_NMI_WATCHDOG))
+		return 0;
+	else
+		return -ENODEV;
+}
+
+static struct nmi_watchdog_ops hardlockup_detector_noop = {
+	.init = noop_hardlockup_detector_init,
+};
+
+/*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
  *
@@ -108,19 +127,33 @@ __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
  */
 int __weak watchdog_nmi_enable(unsigned int cpu)
 {
-	hardlockup_detector_perf_enable();
+	if (nmi_wd_ops && nmi_wd_ops->enable)
+		nmi_wd_ops->enable();
+
 	return 0;
 }
 
 void __weak watchdog_nmi_disable(unsigned int cpu)
 {
-	hardlockup_detector_perf_disable();
+	if (nmi_wd_ops && nmi_wd_ops->disable)
+		nmi_wd_ops->disable();
 }
 
 /* Return 0, if a NMI watchdog is available. Error code otherwise */
 int __weak __init watchdog_nmi_probe(void)
 {
-	return hardlockup_detector_perf_init();
+	int ret = -ENODEV;
+
+	if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_PERF))
+		ret = hardlockup_detector_perf_ops.init();
+
+	if (!ret) {
+		nmi_wd_ops = &hardlockup_detector_perf_ops;
+		return ret;
+	}
+
+	nmi_wd_ops = &hardlockup_detector_noop;
+	return nmi_wd_ops->init();
 }
 
 /**
@@ -131,7 +164,11 @@ int __weak __init watchdog_nmi_probe(void)
  * update_variables();
  * watchdog_nmi_start();
  */
-void __weak watchdog_nmi_stop(void) { }
+void __weak watchdog_nmi_stop(void)
+{
+	if (nmi_wd_ops && nmi_wd_ops->stop)
+		nmi_wd_ops->stop();
+}
 
 /**
  * watchdog_nmi_start - Start the watchdog after reconfiguration
@@ -144,7 +181,11 @@ void __weak watchdog_nmi_stop(void) { }
  * - watchdog_thresh
  * - watchdog_cpumask
  */
-void __weak watchdog_nmi_start(void) { }
+void __weak watchdog_nmi_start(void)
+{
+	if (nmi_wd_ops && nmi_wd_ops->start)
+		nmi_wd_ops->start();
+}
 
 /**
  * lockup_detector_update_enable - Update the sysctl enable bit
@@ -627,7 +668,8 @@ static inline void lockup_detector_setup(void)
 static void __lockup_detector_cleanup(void)
 {
 	lockdep_assert_held(&watchdog_mutex);
-	hardlockup_detector_perf_cleanup();
+	if (nmi_wd_ops && nmi_wd_ops->cleanup)
+		nmi_wd_ops->cleanup();
 }
 
 /**
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index e449a23..036cb0a 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -186,7 +186,7 @@ static int hardlockup_detector_event_create(void)
 /**
  * hardlockup_detector_perf_enable - Enable the local event
  */
-void hardlockup_detector_perf_enable(void)
+static void hardlockup_detector_perf_enable(void)
 {
 	if (hardlockup_detector_event_create())
 		return;
@@ -201,7 +201,7 @@ void hardlockup_detector_perf_enable(void)
 /**
  * hardlockup_detector_perf_disable - Disable the local event
  */
-void hardlockup_detector_perf_disable(void)
+static void hardlockup_detector_perf_disable(void)
 {
 	struct perf_event *event = this_cpu_read(watchdog_ev);
 
@@ -219,7 +219,7 @@ void hardlockup_detector_perf_disable(void)
  *
  * Called from lockup_detector_cleanup(). Serialized by the caller.
  */
-void hardlockup_detector_perf_cleanup(void)
+static void hardlockup_detector_perf_cleanup(void)
 {
 	int cpu;
 
@@ -281,7 +281,7 @@ void __init hardlockup_detector_perf_restart(void)
 /**
  * hardlockup_detector_perf_init - Probe whether NMI event is available at all
  */
-int __init hardlockup_detector_perf_init(void)
+static int __init hardlockup_detector_perf_init(void)
 {
 	int ret = hardlockup_detector_event_create();
 
@@ -291,5 +291,13 @@ int __init hardlockup_detector_perf_init(void)
 		perf_event_release_kernel(this_cpu_read(watchdog_ev));
 		this_cpu_write(watchdog_ev, NULL);
 	}
+
 	return ret;
 }
+
+struct nmi_watchdog_ops hardlockup_detector_perf_ops = {
+	.init		= hardlockup_detector_perf_init,
+	.enable		= hardlockup_detector_perf_enable,
+	.disable	= hardlockup_detector_perf_disable,
+	.cleanup	= hardlockup_detector_perf_cleanup,
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 13/23] watchdog/hardlockup: Define a generic function to detect hardlockups
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

The procedure to detect hardlockups is independent of the underlying
mechanism that generated the non-maskable interrupt used to drive the
detector. Thus, it can be put in a separate, generic function. In this
manner, it can be invoked by various implementations of the NMI watchdog.

For this purpose, move the bulk of watchdog_overflow_callback() to the
new function inspect_for_hardlockups(). This function can then be called
from the applicable NMI handlers.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h   |  1 +
 kernel/watchdog_hld.c | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index d3f5d55f..e61b441 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -223,6 +223,7 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
+void inspect_for_hardlockups(struct pt_regs *regs);
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include <asm/nmi.h>
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 036cb0a..28a00c3 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,14 +106,8 @@ static struct perf_event_attr wd_hw_attr = {
 	.disabled	= 1,
 };
 
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-				       struct perf_sample_data *data,
-				       struct pt_regs *regs)
+void inspect_for_hardlockups(struct pt_regs *regs)
 {
-	/* Ensure the watchdog never gets throttled */
-	event->hw.interrupts = 0;
-
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;
@@ -162,6 +156,16 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	return;
 }
 
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+	inspect_for_hardlockups(regs);
+}
+
 static int hardlockup_detector_event_create(void)
 {
 	unsigned int cpu = smp_processor_id();
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread


* [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

The current default implementation of the hardlockup detector assumes that
it is implemented using perf events. However, the hardlockup detector can
be driven by other sources of non-maskable interrupts (e.g., a properly
configured timer).

Move all the code that is specific to perf into a separate file: creating
and managing events, and stopping and starting the detector. This
perf-specific code now lives in the new file watchdog_hld_perf.c.

The generic code used to monitor the timers' thresholds, check
timestamps and detect hardlockups remains in watchdog_hld.c.

Functions and variables are simply relocated to a new file. No functional
changes were made.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/Makefile            |   2 +-
 kernel/watchdog_hld.c      | 162 ----------------------------------------
 kernel/watchdog_hld_perf.c | 182 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 163 deletions(-)
 create mode 100644 kernel/watchdog_hld_perf.c

diff --git a/kernel/Makefile b/kernel/Makefile
index f85ae5d..0a0d86d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,7 +85,7 @@ obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 28a00c3..96615a2 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,12 +22,8 @@
 
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
-static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
-static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -98,14 +94,6 @@ static inline bool watchdog_check_timestamp(void)
 }
 #endif
 
-static struct perf_event_attr wd_hw_attr = {
-	.type		= PERF_TYPE_HARDWARE,
-	.config		= PERF_COUNT_HW_CPU_CYCLES,
-	.size		= sizeof(struct perf_event_attr),
-	.pinned		= 1,
-	.disabled	= 1,
-};
-
 void inspect_for_hardlockups(struct pt_regs *regs)
 {
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
@@ -155,153 +143,3 @@ void inspect_for_hardlockups(struct pt_regs *regs)
 	__this_cpu_write(hard_watchdog_warn, false);
 	return;
 }
-
-/* Callback function for perf event subsystem */
-static void watchdog_overflow_callback(struct perf_event *event,
-				       struct perf_sample_data *data,
-				       struct pt_regs *regs)
-{
-	/* Ensure the watchdog never gets throttled */
-	event->hw.interrupts = 0;
-	inspect_for_hardlockups(regs);
-}
-
-static int hardlockup_detector_event_create(void)
-{
-	unsigned int cpu = smp_processor_id();
-	struct perf_event_attr *wd_attr;
-	struct perf_event *evt;
-
-	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-
-	/* Try to register using hardware perf events */
-	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
-					       watchdog_overflow_callback, NULL);
-	if (IS_ERR(evt)) {
-		pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
-			PTR_ERR(evt));
-		return PTR_ERR(evt);
-	}
-	this_cpu_write(watchdog_ev, evt);
-	return 0;
-}
-
-/**
- * hardlockup_detector_perf_enable - Enable the local event
- */
-static void hardlockup_detector_perf_enable(void)
-{
-	if (hardlockup_detector_event_create())
-		return;
-
-	/* use original value for check */
-	if (!atomic_fetch_inc(&watchdog_cpus))
-		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
-
-	perf_event_enable(this_cpu_read(watchdog_ev));
-}
-
-/**
- * hardlockup_detector_perf_disable - Disable the local event
- */
-static void hardlockup_detector_perf_disable(void)
-{
-	struct perf_event *event = this_cpu_read(watchdog_ev);
-
-	if (event) {
-		perf_event_disable(event);
-		this_cpu_write(watchdog_ev, NULL);
-		this_cpu_write(dead_event, event);
-		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-		atomic_dec(&watchdog_cpus);
-	}
-}
-
-/**
- * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
- *
- * Called from lockup_detector_cleanup(). Serialized by the caller.
- */
-static void hardlockup_detector_perf_cleanup(void)
-{
-	int cpu;
-
-	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(dead_event, cpu);
-
-		/*
-		 * Required because for_each_cpu() reports  unconditionally
-		 * CPU0 as set on UP kernels. Sigh.
-		 */
-		if (event)
-			perf_event_release_kernel(event);
-		per_cpu(dead_event, cpu) = NULL;
-	}
-	cpumask_clear(&dead_events_mask);
-}
-
-/**
- * hardlockup_detector_perf_stop - Globally stop watchdog events
- *
- * Special interface for x86 to handle the perf HT bug.
- */
-void __init hardlockup_detector_perf_stop(void)
-{
-	int cpu;
-
-	lockdep_assert_cpus_held();
-
-	for_each_online_cpu(cpu) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
-
-		if (event)
-			perf_event_disable(event);
-	}
-}
-
-/**
- * hardlockup_detector_perf_restart - Globally restart watchdog events
- *
- * Special interface for x86 to handle the perf HT bug.
- */
-void __init hardlockup_detector_perf_restart(void)
-{
-	int cpu;
-
-	lockdep_assert_cpus_held();
-
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-		return;
-
-	for_each_online_cpu(cpu) {
-		struct perf_event *event = per_cpu(watchdog_ev, cpu);
-
-		if (event)
-			perf_event_enable(event);
-	}
-}
-
-/**
- * hardlockup_detector_perf_init - Probe whether NMI event is available at all
- */
-static int __init hardlockup_detector_perf_init(void)
-{
-	int ret = hardlockup_detector_event_create();
-
-	if (ret) {
-		pr_info("Perf NMI watchdog permanently disabled\n");
-	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
-	}
-
-	return ret;
-}
-
-struct nmi_watchdog_ops hardlockup_detector_perf_ops = {
-	.init		= hardlockup_detector_perf_init,
-	.enable		= hardlockup_detector_perf_enable,
-	.disable	= hardlockup_detector_perf_disable,
-	.cleanup	= hardlockup_detector_perf_cleanup,
-};
diff --git a/kernel/watchdog_hld_perf.c b/kernel/watchdog_hld_perf.c
new file mode 100644
index 0000000..abc8edc
--- /dev/null
+++ b/kernel/watchdog_hld_perf.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Detect hard lockups on a system
+ *
+ * started by Ricardo Neri, Copyright (C) 2018 Intel Corporation.
+ *
+ * Note: All of this code comes from the previous perf-specific hardlockup
+ * detector.
+ */
+
+#define pr_fmt(fmt) "NMI perf watchdog: " fmt
+
+#include <linux/nmi.h>
+#include <linux/atomic.h>
+#include <linux/module.h>
+#include <linux/sched/debug.h>
+#include <linux/perf_event.h>
+#include <asm/irq_regs.h>
+
+static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
+
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
+
+static struct perf_event_attr wd_hw_attr = {
+	.type		= PERF_TYPE_HARDWARE,
+	.config		= PERF_COUNT_HW_CPU_CYCLES,
+	.size		= sizeof(struct perf_event_attr),
+	.pinned		= 1,
+	.disabled	= 1,
+};
+
+/* Callback function for perf event subsystem */
+static void watchdog_overflow_callback(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	/* Ensure the watchdog never gets throttled */
+	event->hw.interrupts = 0;
+	inspect_for_hardlockups(regs);
+}
+
+static int hardlockup_detector_event_create(void)
+{
+	unsigned int cpu = smp_processor_id();
+	struct perf_event_attr *wd_attr;
+	struct perf_event *evt;
+
+	wd_attr = &wd_hw_attr;
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+
+	/* Try to register using hardware perf events */
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+					       watchdog_overflow_callback, NULL);
+	if (IS_ERR(evt)) {
+		pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
+			PTR_ERR(evt));
+		return PTR_ERR(evt);
+	}
+	this_cpu_write(watchdog_ev, evt);
+	return 0;
+}
+
+/**
+ * hardlockup_detector_perf_enable - Enable the local event
+ */
+static void hardlockup_detector_perf_enable(void)
+{
+	if (hardlockup_detector_event_create())
+		return;
+
+	/* use original value for check */
+	if (!atomic_fetch_inc(&watchdog_cpus))
+		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+
+	perf_event_enable(this_cpu_read(watchdog_ev));
+}
+
+/**
+ * hardlockup_detector_perf_disable - Disable the local event
+ */
+static void hardlockup_detector_perf_disable(void)
+{
+	struct perf_event *event = this_cpu_read(watchdog_ev);
+
+	if (event) {
+		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
+		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
+		atomic_dec(&watchdog_cpus);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
+ *
+ * Called from lockup_detector_cleanup(). Serialized by the caller.
+ */
+static void hardlockup_detector_perf_cleanup(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, &dead_events_mask) {
+		struct perf_event *event = per_cpu(dead_event, cpu);
+
+		/*
+		 * Required because for_each_cpu() reports  unconditionally
+		 * CPU0 as set on UP kernels. Sigh.
+		 */
+		if (event)
+			perf_event_release_kernel(event);
+		per_cpu(dead_event, cpu) = NULL;
+	}
+	cpumask_clear(&dead_events_mask);
+}
+
+/**
+ * hardlockup_detector_perf_stop - Globally stop watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_stop(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_disable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_restart - Globally restart watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_restart(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		return;
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_enable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_init - Probe whether NMI event is available at all
+ */
+static int __init hardlockup_detector_perf_init(void)
+{
+	int ret = hardlockup_detector_event_create();
+
+	if (ret) {
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	} else {
+		perf_event_release_kernel(this_cpu_read(watchdog_ev));
+		this_cpu_write(watchdog_ev, NULL);
+	}
+
+	return ret;
+}
+
+struct nmi_watchdog_ops hardlockup_detector_perf_ops = {
+	.init		= hardlockup_detector_perf_init,
+	.enable		= hardlockup_detector_perf_enable,
+	.disable	= hardlockup_detector_perf_disable,
+	.cleanup	= hardlockup_detector_perf_cleanup,
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

+	if (!atomic_fetch_inc(&watchdog_cpus))
+		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+
+	perf_event_enable(this_cpu_read(watchdog_ev));
+}
+
+/**
+ * hardlockup_detector_perf_disable - Disable the local event
+ */
+static void hardlockup_detector_perf_disable(void)
+{
+	struct perf_event *event = this_cpu_read(watchdog_ev);
+
+	if (event) {
+		perf_event_disable(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
+		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
+		atomic_dec(&watchdog_cpus);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
+ *
+ * Called from lockup_detector_cleanup(). Serialized by the caller.
+ */
+static void hardlockup_detector_perf_cleanup(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, &dead_events_mask) {
+		struct perf_event *event = per_cpu(dead_event, cpu);
+
+		/*
+		 * Required because for_each_cpu() unconditionally reports
+		 * CPU0 as set on UP kernels. Sigh.
+		 */
+		if (event)
+			perf_event_release_kernel(event);
+		per_cpu(dead_event, cpu) = NULL;
+	}
+	cpumask_clear(&dead_events_mask);
+}
+
+/**
+ * hardlockup_detector_perf_stop - Globally stop watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_stop(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_disable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_restart - Globally restart watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_restart(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		return;
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_enable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_init - Probe whether NMI event is available at all
+ */
+static int __init hardlockup_detector_perf_init(void)
+{
+	int ret = hardlockup_detector_event_create();
+
+	if (ret) {
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	} else {
+		perf_event_release_kernel(this_cpu_read(watchdog_ev));
+		this_cpu_write(watchdog_ev, NULL);
+	}
+
+	return ret;
+}
+
+struct nmi_watchdog_ops hardlockup_detector_perf_ops = {
+	.init		= hardlockup_detector_perf_init,
+	.enable		= hardlockup_detector_perf_enable,
+	.disable	= hardlockup_detector_perf_disable,
+	.cleanup	= hardlockup_detector_perf_cleanup,
+};
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 15/23] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, David S. Miller, Benjamin Herrenschmidt, iommu

Implementations of NMI watchdogs that use a single piece of hardware to
monitor all the CPUs in the system (as opposed to per-CPU implementations
such as perf) need to know which CPUs the watchdog is allowed to monitor.
In this manner, non-maskable interrupts are directed only to the monitored
CPUs.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: iommu@lists.linux-foundation.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h | 1 +
 kernel/watchdog.c   | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e61b441..e608762 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -77,6 +77,7 @@ static inline void reset_hung_task_detector(void) { }
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern struct cpumask *watchdog_get_allowed_cpumask(void);
 extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5057376..b94bbe3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -50,7 +50,7 @@ int __read_mostly nmi_watchdog_available;
 
 static struct nmi_watchdog_ops *nmi_wd_ops;
 
-struct cpumask watchdog_allowed_mask __read_mostly;
+static struct cpumask watchdog_allowed_mask __read_mostly;
 
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
@@ -98,6 +98,11 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+struct cpumask *watchdog_get_allowed_cpumask(void)
+{
+	return &watchdog_allowed_mask;
+}
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

This is the initial implementation of a hardlockup detector driven by an
HPET timer. It includes functions to control the timer via its registers,
requests such a timer, installs a minimal interrupt handler and performs
the initial configuration of the timer.

The detector is not functional at this stage. Subsequent changesets will
populate the NMI watchdog operations and register it with the lockup
detector.

This detector depends on HPET_TIMER since platform code performs the
initialization of the timer and maps its registers to memory. It depends
on HPET to compute the ticks per second of the timer.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/Makefile            |   1 +
 kernel/watchdog_hld_hpet.c | 334 +++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug          |  10 ++
 3 files changed, 345 insertions(+)
 create mode 100644 kernel/watchdog_hld_hpet.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 0a0d86d..73c79b2 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -86,6 +86,7 @@ obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o watchdog_hld_perf.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_HPET) += watchdog_hld.o watchdog_hld_hpet.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
new file mode 100644
index 0000000..8fa4e55
--- /dev/null
+++ b/kernel/watchdog_hld_hpet.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A hardlockup detector driven by an HPET timer.
+ *
+ * Copyright (C) Intel Corporation 2018
+ */
+
+#define pr_fmt(fmt) "NMI hpet watchdog: " fmt
+
+#include <linux/nmi.h>
+#include <linux/hpet.h>
+#include <asm/hpet.h>
+
+static struct hpet_hld_data *hld_data;
+
+/**
+ * get_count() - Get the current count of the HPET timer
+ *
+ * Returns:
+ *
+ * Value of the main counter of the HPET timer
+ */
+static inline unsigned long get_count(void)
+{
+	return hpet_readq(HPET_COUNTER);
+}
+
+/**
+ * set_comparator() - Update the comparator in an HPET timer instance
+ * @hdata:	A data structure with the timer instance to update
+ * @cmp:	The value to write in the comparator register
+ *
+ * Returns:
+ *
+ * None
+ */
+static inline void set_comparator(struct hpet_hld_data *hdata,
+				  unsigned long cmp)
+{
+	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
+}
+
+/**
+ * kick_timer() - Reprogram timer to expire in the future
+ * @hdata:	A data structure with the timer instance to update
+ *
+ * Reprogram the timer to expire within watchdog_thresh seconds in the future.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void kick_timer(struct hpet_hld_data *hdata)
+{
+	unsigned long new_compare, count;
+
+	/*
+	 * Update the comparator in increments of watchdog_thresh seconds
+	 * relative to the current count. Since watchdog_thresh is given in
+	 * seconds, we can update the comparator before the counter reaches
+	 * the new value.
+	 *
+	 * Let it wrap around if needed.
+	 */
+	count = get_count();
+
+	new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+
+	set_comparator(hdata, new_compare);
+}
+
+/**
+ * disable() - Disable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to disable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void disable(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v &= ~HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * enable() - Enable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void enable(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	/* Clear any previously active interrupt. */
+	hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v |= HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * set_periodic() - Set an HPET timer instance in periodic mode
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * If the timer supports periodic mode, configure it in such mode.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void set_periodic(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	if (v & HPET_TN_PERIODIC_CAP) {
+		v |= HPET_TN_PERIODIC;
+		hpet_writel(v, HPET_Tn_CFG(hdata->num));
+		hdata->flags |= HPET_DEV_PERI_CAP;
+	}
+}
+
+/**
+ * is_hpet_wdt_interrupt() - Determine if an HPET timer caused interrupt
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * To be used when the timer is programmed in level-triggered mode, determine
+ * if an instance of an HPET timer indicates that it asserted an interrupt by
+ * checking the status register.
+ *
+ * Returns:
+ *
+ * True if a level-triggered timer asserted an interrupt. False otherwise.
+ */
+static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
+{
+	unsigned long this_isr;
+	unsigned int lvl_trig;
+
+	this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
+
+	lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
+
+	if (lvl_trig && this_isr)
+		return true;
+
+	return false;
+}
+
+/**
+ * hardlockup_detector_irq_handler() - Interrupt handler
+ * @irq:	Interrupt number
+ * @data:	Data associated with the interrupt
+ *
+ * A simple interrupt handler: kick the timer and acknowledge the
+ * interrupt.
+ *
+ * Returns:
+ *
+ * IRQ_NONE if the HPET timer did not cause the interrupt. IRQ_HANDLED
+ * otherwise.
+ */
+static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
+{
+	struct hpet_hld_data *hdata = data;
+	unsigned int use_fsb;
+
+	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+		return IRQ_NONE;
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		kick_timer(hdata);
+
+	/* Acknowledge interrupt if in level-triggered mode */
+	if (!use_fsb)
+		hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure an instance of the HPET timer to deliver interrupts via the Front-
+ * Side Bus.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+
+	/*
+	 * If FSB interrupt delivery is used, configure as edge-triggered
+	 * interrupt. We are certain the interrupt comes from the HPET timer as
+	 * we receive the MSI message.
+	 *
+	 * Also, the FSB delivery mode and the FSB route are configured when the
+	 * interrupt is unmasked.
+	 */
+	v &= ~HPET_TN_LEVEL;
+
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	return 0;
+}
+
+/**
+ * setup_irq_legacy_mode() - Configure the timer to deliver an pin interrupt
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure an instance of the HPET timer to deliver interrupts via a pin of
+ * the IO APIC.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_irq_legacy_mode(struct hpet_hld_data *hdata)
+{
+	int hwirq = hdata->irq;
+	unsigned long v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+
+	v |= hwirq << HPET_TN_ROUTE_SHIFT;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	/*
+	 * If IO APIC interrupt delivery is used, configure as level-triggered.
+	 * In this way, the ISR register can be used to determine if this HPET
+	 * timer caused the interrupt at the IO APIC pin.
+	 */
+	v |= HPET_TN_LEVEL;
+
+	/* Disable Front-Side Bus delivery. */
+	v &= ~HPET_TN_FSB;
+
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	return 0;
+}
+
+/**
+ * setup_hpet_irq() - Configure the interrupt delivery of an HPET timer
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure the interrupt parameters of an HPET timer. If supported, configure
+ * interrupts to be delivered via the Front-Side Bus. Also, install an interrupt
+ * handler.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_hpet_irq(struct hpet_hld_data *hdata)
+{
+	int hwirq = hdata->irq, ret;
+
+	if (hdata->flags & HPET_DEV_FSB_CAP)
+		ret = setup_irq_msi_mode(hdata);
+	else
+		ret = setup_irq_legacy_mode(hdata);
+
+	if (ret)
+		return ret;
+
+	/*
+	 * Request an interrupt to activate the irq in all the needed domains.
+	 */
+	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
+			  IRQF_TIMER, "hpet_hld", hdata);
+
+	return ret;
+}
+
+/**
+ * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
+ *
+ * Only initialize and configure the detector if an HPET is available on the
+ * system.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if initialization was unsuccessful.
+ */
+static int __init hardlockup_detector_hpet_init(void)
+{
+	int ret;
+
+	if (!is_hpet_enabled())
+		return -ENODEV;
+
+	hld_data = hpet_hardlockup_detector_assign_timer();
+	if (!hld_data)
+		return -ENODEV;
+
+	/* Disable before configuring. */
+	disable(hld_data);
+
+	set_periodic(hld_data);
+
+	/* Set timer for the first time relative to the current count. */
+	kick_timer(hld_data);
+
+	ret = setup_hpet_irq(hld_data);
+	if (ret)
+		return -ENODEV;
+
+	/*
+	 * Timer might have been enabled when the interrupt was unmasked.
+	 * This should be done via the .enable operation.
+	 */
+	disable(hld_data);
+
+	return 0;
+}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b7..6e79833 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
 	bool
 	select SOFTLOCKUP_DETECTOR
 
+config HARDLOCKUP_DETECTOR_HPET
+	bool "Use HPET Timer for Hard Lockup Detection"
+	select SOFTLOCKUP_DETECTOR
+	select HARDLOCKUP_DETECTOR
+	depends on HPET_TIMER && HPET
+	help
+	  Say y to enable a hardlockup detector that is driven by a High-Precision
+	  Event Timer. In addition to selecting this option, the nmi_watchdog
+	  command-line parameter must be used. See
+	  Documentation/admin-guide/kernel-parameters.rst for details.
+
 #
 # Enables a timestamp based low pass filter to compensate for perf based
 # hard lockup detection which runs too fast due to turbo modes.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

+
+static struct hpet_hld_data *hld_data;
+
+/**
+ * get_count() - Get the current count of the HPET timer
+ *
+ * Returns:
+ *
+ * Value of the main counter of the HPET timer
+ */
+static inline unsigned long get_count(void)
+{
+	return hpet_readq(HPET_COUNTER);
+}
+
+/**
+ * set_comparator() - Update the comparator in an HPET timer instance
+ * @hdata:	A data structure with the timer instance to update
+ * @cmp:	The value to write in the comparator register
+ *
+ * Returns:
+ *
+ * None
+ */
+static inline void set_comparator(struct hpet_hld_data *hdata,
+				  unsigned long cmp)
+{
+	hpet_writeq(cmp, HPET_Tn_CMP(hdata->num));
+}
+
+/**
+ * kick_timer() - Reprogram timer to expire in the future
+ * @hdata:	A data structure with the timer instance to update
+ *
+ * Reprogram the timer to expire within watchdog_thresh seconds in the future.
+ *
+ * Returns:
+ *
+ * None
+ */
+static void kick_timer(struct hpet_hld_data *hdata)
+{
+	unsigned long new_compare, count;
+
+	/*
+	 * Update the comparator in increments of watchdog_thresh seconds
+	 * relative to the current count. Since watchdog_thresh is given in
+	 * seconds, we are able to update the comparator before the counter
+	 * reaches the new value.
+	 *
+	 * Let it wrap around if needed.
+	 */
+	count = get_count();
+
+	new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+
+	set_comparator(hdata, new_compare);
+}
+
+/**
+ * disable() - Disable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to disable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void disable(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v &= ~HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * enable() - Enable an HPET timer instance
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * Returns:
+ *
+ * None
+ */
+static void enable(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	/* Clear any previously active interrupt. */
+	hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	v |= HPET_TN_ENABLE;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+}
+
+/**
+ * set_periodic() - Set an HPET timer instance in periodic mode
+ * @hdata:	A data structure with the timer instance to enable
+ *
+ * If the timer supports periodic mode, configure it in that mode.
+ * Returns:
+ *
+ * None
+ */
+static void set_periodic(struct hpet_hld_data *hdata)
+{
+	unsigned long v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+	if (v & HPET_TN_PERIODIC_CAP) {
+		v |= HPET_TN_PERIODIC;
+		hpet_writel(v, HPET_Tn_CFG(hdata->num));
+		hdata->flags |= HPET_DEV_PERI_CAP;
+	}
+}
+
+/**
+ * is_hpet_wdt_interrupt() - Determine if an HPET timer caused the interrupt
+ * @hdata:	A data structure with the timer instance to check
+ *
+ * To be used when the timer is programmed in level-triggered mode, determine
+ * if an instance of an HPET timer indicates that it asserted an interrupt by
+ * checking the status register.
+ *
+ * Returns:
+ *
+ * True if a level-triggered timer asserted an interrupt. False otherwise.
+ */
+static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
+{
+	unsigned long this_isr;
+	unsigned int lvl_trig;
+
+	this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
+
+	lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
+
+	if (lvl_trig && this_isr)
+		return true;
+
+	return false;
+}
+
+/**
+ * hardlockup_detector_irq_handler() - Interrupt handler
+ * @irq:	Interrupt number
+ * @data:	Data associated with the interrupt
+ *
+ * A simple interrupt handler: kick the timer and acknowledge the
+ * interrupt.
+ *
+ * Returns:
+ *
+ * IRQ_NONE if the HPET timer did not cause the interrupt. IRQ_HANDLED
+ * otherwise.
+ */
+static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
+{
+	struct hpet_hld_data *hdata = data;
+	unsigned int use_fsb;
+
+	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+		return IRQ_NONE;
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		kick_timer(hdata);
+
+	/* Acknowledge interrupt if in level-triggered mode */
+	if (!use_fsb)
+		hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure an instance of the HPET timer to deliver interrupts via the Front-
+ * Side Bus.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_irq_msi_mode(struct hpet_hld_data *hdata)
+{
+	unsigned int v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+
+	/*
+	 * If FSB interrupt delivery is used, configure as edge-triggered
+	 * interrupt. We are certain the interrupt comes from the HPET timer as
+	 * we receive the MSI message.
+	 *
+	 * Also, the FSB delivery mode and the FSB route are configured when the
+	 * interrupt is unmasked.
+	 */
+	v &= ~HPET_TN_LEVEL;
+
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	return 0;
+}
+
+/**
+ * setup_irq_legacy_mode() - Configure the timer to deliver a pin interrupt
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure an instance of the HPET timer to deliver interrupts via a pin of
+ * the IO APIC.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_irq_legacy_mode(struct hpet_hld_data *hdata)
+{
+	int hwirq = hdata->irq;
+	unsigned long v;
+
+	v = hpet_readl(HPET_Tn_CFG(hdata->num));
+
+	v |= hwirq << HPET_TN_ROUTE_SHIFT;
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	/*
+	 * If IO APIC interrupt delivery is used, configure as level-triggered.
+	 * In this way, the ISR register can be used to determine if this HPET
+	 * timer caused the interrupt at the IO APIC pin.
+	 */
+	v |= HPET_TN_LEVEL;
+
+	/* Disable Front-Side Bus delivery. */
+	v &= ~HPET_TN_FSB;
+
+	hpet_writel(v, HPET_Tn_CFG(hdata->num));
+
+	return 0;
+}
+
+/**
+ * setup_hpet_irq() - Configure the interrupt delivery of an HPET timer
+ * @hdata:	Data associated with the instance of the HPET timer to configure
+ *
+ * Configure the interrupt parameters of an HPET timer. If supported, configure
+ * interrupts to be delivered via the Front-Side Bus. Also, install an interrupt
+ * handler.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if configuration was unsuccessful.
+ */
+static int setup_hpet_irq(struct hpet_hld_data *hdata)
+{
+	int hwirq = hdata->irq, ret;
+
+	if (hdata->flags & HPET_DEV_FSB_CAP)
+		ret = setup_irq_msi_mode(hdata);
+	else
+		ret = setup_irq_legacy_mode(hdata);
+
+	if (ret)
+		return ret;
+
+	/*
+	 * Request an interrupt to activate the irq in all the needed domains.
+	 */
+	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
+			  IRQF_TIMER, "hpet_hld", hdata);
+
+	return ret;
+}
+
+/**
+ * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
+ *
+ * Only initialize and configure the detector if an HPET is available on the
+ * system.
+ *
+ * Returns:
+ *
+ * 0 on success. An error code if initialization was unsuccessful.
+ */
+static int __init hardlockup_detector_hpet_init(void)
+{
+	int ret;
+
+	if (!is_hpet_enabled())
+		return -ENODEV;
+
+	hld_data = hpet_hardlockup_detector_assign_timer();
+	if (!hld_data)
+		return -ENODEV;
+
+	/* Disable before configuring. */
+	disable(hld_data);
+
+	set_periodic(hld_data);
+
+	/* Set timer for the first time relative to the current count. */
+	kick_timer(hld_data);
+
+	ret = setup_hpet_irq(hld_data);
+	if (ret)
+		return -ENODEV;
+
+	/*
+	 * Timer might have been enabled when the interrupt was unmasked.
+	 * This should be done via the .enable operation.
+	 */
+	disable(hld_data);
+
+	return 0;
+}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c40c7b7..6e79833 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
 	bool
 	select SOFTLOCKUP_DETECTOR
 
+config HARDLOCKUP_DETECTOR_HPET
+	bool "Use HPET Timer for Hard Lockup Detection"
+	select SOFTLOCKUP_DETECTOR
+	select HARDLOCKUP_DETECTOR
+	depends on HPET_TIMER && HPET
+	help
+	  Say y to enable a hardlockup detector that is driven by a High-Precision
+	  Event Timer. In addition to selecting this option, the detector must be
+	  enabled using the nmi_watchdog command-line parameter. See Documentation/admin-guide/kernel-parameters.rst.
+
 #
 # Enables a timestamp based low pass filter to compensate for perf based
 # hard lockup detection which runs too fast due to turbo modes.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

In order to detect hardlockups, it is necessary to have the ability to
receive interrupts even when they are disabled: a non-maskable interrupt
is required. Add the flag IRQF_DELIVER_AS_NMI to the arguments of
request_irq() for this purpose.

Note that when the timer is programmed to deliver interrupts via the IO
APIC, it is configured as level-triggered. This provides an indication that
the NMI comes from the HPET timer, as reported in the General Interrupt
Status Register. However, NMIs are always edge-triggered, thus a GSI edge-
triggered interrupt is now requested.

An NMI handler is also implemented. The handler looks for hardlockups and
kicks the timer.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/hpet.c     |  2 +-
 kernel/watchdog_hld_hpet.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index fda6e19..5ca1953 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -205,7 +205,7 @@ int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
 			break;
 		}
 
-		gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+		gsi = acpi_register_gsi(NULL, hwirq, ACPI_EDGE_SENSITIVE,
 					ACPI_ACTIVE_LOW);
 		if (gsi > 0)
 			break;
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 8fa4e55..3bedffa 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include <linux/nmi.h>
 #include <linux/hpet.h>
 #include <asm/hpet.h>
+#include <asm/irq_remapping.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
@@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 	if (!(hdata->flags & HPET_DEV_PERI_CAP))
 		kick_timer(hdata);
 
+	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
+
 	/* Acknowledge interrupt if in level-triggered mode */
 	if (!use_fsb)
 		hpet_writel(BIT(hdata->num), HPET_STATUS);
@@ -191,6 +194,47 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 }
 
 /**
+ * hardlockup_detector_nmi_handler() - NMI Interrupt handler
+ * @val:	Attribute associated with the NMI. Not used.
+ * @regs:	Register values as seen when the NMI was asserted
+ *
+ * When an NMI is issued, look for hardlockups. If the timer is not periodic,
+ * kick it. The interrupt is always handled if it is delivered via the
+ * Front-Side Bus.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+					   struct pt_regs *regs)
+{
+	struct hpet_hld_data *hdata = hld_data;
+	unsigned int use_fsb;
+
+	/*
+	 * If FSB delivery mode is used, the timer interrupt is programmed as
+	 * edge-triggered and there is no need to check the ISR register.
+	 */
+	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+		return NMI_DONE;
+
+	inspect_for_hardlockups(regs);
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		kick_timer(hdata);
+
+	/* Acknowledge interrupt if in level-triggered mode */
+	if (!use_fsb)
+		hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	return NMI_HANDLED;
+}
+
+/**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:	Data associated with the instance of the HPET timer to configure
  *
@@ -282,11 +326,20 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 	if (ret)
 		return ret;
 
+	/* Register the NMI handler, which will be the actual handler we use. */
+	ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+				   0, "hpet_hld");
+	if (ret)
+		return ret;
+
 	/*
 	 * Request an interrupt to activate the irq in all the needed domains.
 	 */
 	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
-			  IRQF_TIMER, "hpet_hld", hdata);
+			  IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+			  "hpet_hld", hdata);
+	if (ret)
+		unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
 
 	return ret;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

+ * @regs:	Register values as seen when the NMI was asserted
+ *
+ * When an NMI is issued, look for hardlockups. If the timer is not periodic,
+ * kick it. The interrupt is always handled if delivered via the
+ * Front-Side Bus.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+					   struct pt_regs *regs)
+{
+	struct hpet_hld_data *hdata = hld_data;
+	unsigned int use_fsb;
+
+	/*
+	 * If FSB delivery mode is used, the timer interrupt is programmed as
+	 * edge-triggered and there is no need to check the ISR register.
+	 */
+	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+		return NMI_DONE;
+
+	inspect_for_hardlockups(regs);
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		kick_timer(hdata);
+
+	/* Acknowledge interrupt if in level-triggered mode */
+	if (!use_fsb)
+		hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	return NMI_HANDLED;
+}
+
+/**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:	Data associated with the instance of the HPET timer to configure
  *
@@ -282,11 +326,20 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 	if (ret)
 		return ret;
 
+	/* Register the NMI handler, which will be the actual handler we use. */
+	ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+				   0, "hpet_hld");
+	if (ret)
+		return ret;
+
 	/*
 	 * Request an interrupt to activate the irq in all the needed domains.
 	 */
 	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
-			  IRQF_TIMER, "hpet_hld", hdata);
+			  IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+			  "hpet_hld", hdata);
+	if (ret)
+		unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
 
 	return ret;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra

In order to detect hardlockups, the detector must be able to receive an
interrupt even when interrupts are disabled: a non-maskable interrupt is
required. Add the flag IRQF_DELIVER_AS_NMI to the arguments of
request_irq() for this purpose.

Note that when the timer is programmed to deliver interrupts via the IO
APIC, it is configured as level-triggered so that the General Interrupt
Status Register indicates that the NMI came from the HPET timer. However,
NMIs are always edge-triggered; thus, an edge-triggered GSI is now
requested.

An NMI handler is also implemented. The handler looks for hardlockups and
kicks the timer.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/hpet.c     |  2 +-
 kernel/watchdog_hld_hpet.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index fda6e19..5ca1953 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -205,7 +205,7 @@ int hpet_hardlockup_detector_assign_legacy_irq(struct hpet_hld_data *hdata)
 			break;
 		}
 
-		gsi = acpi_register_gsi(NULL, hwirq, ACPI_LEVEL_SENSITIVE,
+		gsi = acpi_register_gsi(NULL, hwirq, ACPI_EDGE_SENSITIVE,
 					ACPI_ACTIVE_LOW);
 		if (gsi > 0)
 			break;
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 8fa4e55..3bedffa 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include <linux/nmi.h>
 #include <linux/hpet.h>
 #include <asm/hpet.h>
+#include <asm/irq_remapping.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
@@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 	if (!(hdata->flags & HPET_DEV_PERI_CAP))
 		kick_timer(hdata);
 
+	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
+
 	/* Acknowledge interrupt if in level-triggered mode */
 	if (!use_fsb)
 		hpet_writel(BIT(hdata->num), HPET_STATUS);
@@ -191,6 +194,47 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
 }
 
 /**
+ * hardlockup_detector_nmi_handler() - NMI Interrupt handler
+ * @val:	Attribute associated with the NMI. Not used.
+ * @regs:	Register values as seen when the NMI was asserted
+ *
+ * When an NMI is issued, look for hardlockups. If the timer is not periodic,
+ * kick it. The interrupt is always handled if delivered via the
+ * Front-Side Bus.
+ *
+ * Returns:
+ *
+ * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
+ * otherwise.
+ */
+static int hardlockup_detector_nmi_handler(unsigned int val,
+					   struct pt_regs *regs)
+{
+	struct hpet_hld_data *hdata = hld_data;
+	unsigned int use_fsb;
+
+	/*
+	 * If FSB delivery mode is used, the timer interrupt is programmed as
+	 * edge-triggered and there is no need to check the ISR register.
+	 */
+	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
+
+	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
+		return NMI_DONE;
+
+	inspect_for_hardlockups(regs);
+
+	if (!(hdata->flags & HPET_DEV_PERI_CAP))
+		kick_timer(hdata);
+
+	/* Acknowledge interrupt if in level-triggered mode */
+	if (!use_fsb)
+		hpet_writel(BIT(hdata->num), HPET_STATUS);
+
+	return NMI_HANDLED;
+}
+
+/**
  * setup_irq_msi_mode() - Configure the timer to deliver an MSI interrupt
  * @data:	Data associated with the instance of the HPET timer to configure
  *
@@ -282,11 +326,20 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 	if (ret)
 		return ret;
 
+	/* Register the NMI handler, which will be the actual handler we use. */
+	ret = register_nmi_handler(NMI_LOCAL, hardlockup_detector_nmi_handler,
+				   0, "hpet_hld");
+	if (ret)
+		return ret;
+
 	/*
 	 * Request an interrupt to activate the irq in all the needed domains.
 	 */
 	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
-			  IRQF_TIMER, "hpet_hld", hdata);
+			  IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+			  "hpet_hld", hdata);
+	if (ret)
+		unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
 
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

Implement the start, stop and disable operations of the HPET-based NMI
watchdog. Given that a single timer is used to monitor all the CPUs in
the system, it is necessary to define a cpumask that keeps track of the
CPUs that can be monitored. This cpumask is protected with a spin lock.

As individual CPUs are put online and offline, this cpumask is updated.
CPUs are unconditionally cleared from the mask when going offline. When
going online, the CPU is set in the mask only if it is one of the CPUs allowed
to be monitored by the watchdog.

It is not necessary to implement a start function. The NMI watchdog will
be enabled when there is at least one CPU to monitor.

The disable function clears the CPU mask and disables the timer.
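
The first-CPU/last-CPU semantics described above can be modeled with a
plain bitmask. This is a toy sketch, not the kernel code: the 64-bit word
stands in for the cpumask, timer_enabled for the actual enable()/disable()
calls, and the spin lock is omitted.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t monitored_mask;	/* stand-in for hld_data->monitored_mask */
static int timer_enabled;	/* stand-in for the timer enable state */

/* Stand-in for cpumask_weight(): number of monitored CPUs. */
static int monitored_weight(void)
{
	return __builtin_popcountll(monitored_mask);
}

/* A CPU comes online: arm the timer when it is the first one monitored. */
static void hld_cpu_enable(int cpu)
{
	monitored_mask |= 1ULL << cpu;
	if (monitored_weight() == 1)
		timer_enabled = 1;
}

/* A CPU goes offline: disable the timer when no monitored CPUs remain. */
static void hld_cpu_disable(int cpu)
{
	monitored_mask &= ~(1ULL << cpu);
	if (!monitored_weight())
		timer_enabled = 0;
}
```

The timer thus stays enabled as long as at least one CPU remains in the
mask, matching the behavior of the enable and disable operations below.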

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  2 +
 include/linux/nmi.h         |  1 +
 kernel/watchdog_hld_hpet.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 33309b7..6ace2d1 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,8 @@ struct hpet_hld_data {
 	u32		irq;
 	u32		flags;
 	u64		ticks_per_second;
+	struct cpumask	monitored_mask;
+	spinlock_t	lock; /* serialized access to monitored_mask */
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e608762..23e20d2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -129,6 +129,7 @@ struct nmi_watchdog_ops {
 };
 
 extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+extern struct nmi_watchdog_ops hardlockup_detector_hpet_ops;
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 3bedffa..857e051 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -345,6 +345,91 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 }
 
 /**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * The hardlockup detector is enabled for the CPU that executes the
+ * function. It is only enabled if such CPU is allowed to be monitored
+ * by the lockup detector.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void hardlockup_detector_hpet_enable(void)
+{
+	struct cpumask *allowed = watchdog_get_allowed_cpumask();
+	unsigned int cpu = smp_processor_id();
+
+	if (!hld_data)
+		return;
+
+	if (!cpumask_test_cpu(cpu, allowed))
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+
+	/*
+	 * If this is the first CPU to be monitored, set everything in motion:
+	 * move the interrupt to this CPU, kick and enable the timer.
+	 */
+	if (cpumask_weight(&hld_data->monitored_mask) == 1) {
+		if (irq_set_affinity(hld_data->irq, cpumask_of(cpu))) {
+			spin_unlock(&hld_data->lock);
+			pr_err("Unable to enable on CPU %d!\n", cpu);
+			return;
+		}
+
+		kick_timer(hld_data);
+		enable(hld_data);
+	}
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * Returns: None
+ */
+static void hardlockup_detector_hpet_disable(void)
+{
+	if (!hld_data)
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+
+	/* Only disable the timer if there are no more CPUs to monitor. */
+	if (!cpumask_weight(&hld_data->monitored_mask))
+		disable(hld_data);
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_stop(void)
+{
+	disable(hld_data);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+}
+
+/**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
  * Only initialize and configure the detector if an HPET is available on the
@@ -383,5 +468,18 @@ static int __init hardlockup_detector_hpet_init(void)
 	 */
 	disable(hld_data);
 
+	spin_lock_init(&hld_data->lock);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+
 	return 0;
 }
+
+struct nmi_watchdog_ops hardlockup_detector_hpet_ops = {
+	.init		= hardlockup_detector_hpet_init,
+	.enable		= hardlockup_detector_hpet_enable,
+	.disable	= hardlockup_detector_hpet_disable,
+	.stop		= hardlockup_detector_hpet_stop
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra

Implement the start, stop and disable operations of the HPET-based NMI
watchdog. Given that a single timer is used to monitor all the CPUs in
the system, it is necessary to define a cpumask that keeps track of the
CPUs that can be monitored. This cpumask is protected with a spin lock.

As individual CPUs are put online and offline, this cpumask is updated.
CPUs are unconditionally cleared from the mask when going offline. When
going online, the CPU is set in the mask only if it is one of the CPUs allowed
to be monitored by the watchdog.

It is not necessary to implement a start function. The NMI watchdog will
be enabled when there is at least one CPU to monitor.

The disable function clears the CPU mask and disables the timer.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  2 +
 include/linux/nmi.h         |  1 +
 kernel/watchdog_hld_hpet.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 33309b7..6ace2d1 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,8 @@ struct hpet_hld_data {
 	u32		irq;
 	u32		flags;
 	u64		ticks_per_second;
+	struct cpumask	monitored_mask;
+	spinlock_t	lock; /* serialized access to monitored_mask */
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e608762..23e20d2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -129,6 +129,7 @@ struct nmi_watchdog_ops {
 };
 
 extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+extern struct nmi_watchdog_ops hardlockup_detector_hpet_ops;
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 3bedffa..857e051 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -345,6 +345,91 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 }
 
 /**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * The hardlockup detector is enabled for the CPU that executes the
+ * function. It is only enabled if such CPU is allowed to be monitored
+ * by the lockup detector.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void hardlockup_detector_hpet_enable(void)
+{
+	struct cpumask *allowed = watchdog_get_allowed_cpumask();
+	unsigned int cpu = smp_processor_id();
+
+	if (!hld_data)
+		return;
+
+	if (!cpumask_test_cpu(cpu, allowed))
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+
+	/*
+	 * If this is the first CPU to be monitored, set everything in motion:
+	 * move the interrupt to this CPU, kick and enable the timer.
+	 */
+	if (cpumask_weight(&hld_data->monitored_mask) == 1) {
+		if (irq_set_affinity(hld_data->irq, cpumask_of(cpu))) {
+			spin_unlock(&hld_data->lock);
+			pr_err("Unable to enable on CPU %d!\n", cpu);
+			return;
+		}
+
+		kick_timer(hld_data);
+		enable(hld_data);
+	}
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * Returns: None
+ */
+static void hardlockup_detector_hpet_disable(void)
+{
+	if (!hld_data)
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+
+	/* Only disable the timer if there are no more CPUs to monitor. */
+	if (!cpumask_weight(&hld_data->monitored_mask))
+		disable(hld_data);
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_stop(void)
+{
+	disable(hld_data);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+}
+
+/**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
  * Only initialize and configure the detector if an HPET is available on the
@@ -383,5 +468,18 @@ static int __init hardlockup_detector_hpet_init(void)
 	 */
 	disable(hld_data);
 
+	spin_lock_init(&hld_data->lock);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+
 	return 0;
 }
+
+struct nmi_watchdog_ops hardlockup_detector_hpet_ops = {
+	.init		= hardlockup_detector_hpet_init,
+	.enable		= hardlockup_detector_hpet_enable,
+	.disable	= hardlockup_detector_hpet_disable,
+	.stop		= hardlockup_detector_hpet_stop
+};
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra

Implement the start, stop and disable operations of the HPET-based NMI
watchdog. Given that a single timer is used to monitor all the CPUs in
the system, it is necessary to define a cpumask that keeps track of the
CPUs that can be monitored. This cpumask is protected with a spin lock.

As individual CPUs are put online and offline, this cpumask is updated.
CPUs are unconditionally cleared from the mask when going offline. When
going online, the CPU is set in the mask only if it is one of the CPUs allowed
to be monitored by the watchdog.

It is not necessary to implement a start function. The NMI watchdog will
be enabled when there is at least one CPU to monitor.

The disable function clears the CPU mask and disables the timer.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  2 +
 include/linux/nmi.h         |  1 +
 kernel/watchdog_hld_hpet.c  | 98 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 33309b7..6ace2d1 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,8 @@ struct hpet_hld_data {
 	u32		irq;
 	u32		flags;
 	u64		ticks_per_second;
+	struct cpumask	monitored_mask;
+	spinlock_t	lock; /* serialized access to monitored_mask */
 };
 
 extern struct hpet_hld_data *hpet_hardlockup_detector_assign_timer(void);
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index e608762..23e20d2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -129,6 +129,7 @@ struct nmi_watchdog_ops {
 };
 
 extern struct nmi_watchdog_ops hardlockup_detector_perf_ops;
+extern struct nmi_watchdog_ops hardlockup_detector_hpet_ops;
 
 void watchdog_nmi_stop(void);
 void watchdog_nmi_start(void);
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 3bedffa..857e051 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -345,6 +345,91 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 }
 
 /**
+ * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
+ *
+ * The hardlockup detector is enabled for the CPU that executes the
+ * function. It is only enabled if such CPU is allowed to be monitored
+ * by the lockup detector.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void hardlockup_detector_hpet_enable(void)
+{
+	struct cpumask *allowed = watchdog_get_allowed_cpumask();
+	unsigned int cpu = smp_processor_id();
+
+	if (!hld_data)
+		return;
+
+	if (!cpumask_test_cpu(cpu, allowed))
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+
+	/*
+	 * If this is the first CPU to be monitored, set everything in motion:
+	 * move the interrupt to this CPU, kick and enable the timer.
+	 */
+	if (cpumask_weight(&hld_data->monitored_mask) == 1) {
+		if (irq_set_affinity(hld_data->irq, cpumask_of(cpu))) {
+			spin_unlock(&hld_data->lock);
+			pr_err("Unable to enable on CPU %d!\n", cpu);
+			return;
+		}
+
+		kick_timer(hld_data);
+		enable(hld_data);
+	}
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_disable() - Disable the hardlockup detector
+ *
+ * The hardlockup detector is disabled for the CPU that executes the
+ * function.
+ *
+ * Returns: None
+ */
+static void hardlockup_detector_hpet_disable(void)
+{
+	if (!hld_data)
+		return;
+
+	spin_lock(&hld_data->lock);
+
+	cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+
+	/* Only disable the timer if there are no more CPUs to monitor. */
+	if (!cpumask_weight(&hld_data->monitored_mask))
+		disable(hld_data);
+
+	spin_unlock(&hld_data->lock);
+}
+
+/**
+ * hardlockup_detector_hpet_stop() - Stop the NMI watchdog on all CPUs
+ *
+ * Returns:
+ *
+ * None
+ */
+static void hardlockup_detector_hpet_stop(void)
+{
+	disable(hld_data);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+}
+
+/**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
  * Only initialize and configure the detector if an HPET is available on the
@@ -383,5 +468,18 @@ static int __init hardlockup_detector_hpet_init(void)
 	 */
 	disable(hld_data);
 
+	spin_lock_init(&hld_data->lock);
+
+	spin_lock(&hld_data->lock);
+	cpumask_clear(&hld_data->monitored_mask);
+	spin_unlock(&hld_data->lock);
+
 	return 0;
 }
+
+struct nmi_watchdog_ops hardlockup_detector_hpet_ops = {
+	.init		= hardlockup_detector_hpet_init,
+	.enable		= hardlockup_detector_hpet_enable,
+	.disable	= hardlockup_detector_hpet_disable,
+	.stop		= hardlockup_detector_hpet_stop
+};
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 19/23] watchdog/hardlockup: Make arch_touch_nmi_watchdog() available to the hpet-based implementation
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, David S. Miller,
	Benjamin Herrenschmidt, Paul Mackerras, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

CPU architectures that have an NMI watchdog use arch_touch_nmi_watchdog()
to make the hardlockup detector briefly ignore a CPU. If the architecture
does not have an NMI watchdog, one can be constructed from any source of
non-maskable interrupts. In that case, arch_touch_nmi_watchdog() is common
to whatever hardware resource drives the detector, and it needs to remain
available to other kernel subsystems when hardware other than perf drives
the detector.

There exist perf-based and HPET-based implementations. Make
arch_touch_nmi_watchdog() available to the latter.

For clarity, wrap this function in a separate preprocessor conditional
from functions which are truly specific to the perf-based implementation.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/nmi.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 23e20d2..8b6b814 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -89,16 +89,22 @@ static inline void hardlockup_detector_disable(void) {}
 # define NMI_WATCHDOG_SYSCTL_PERM	0444
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) || \
+    defined(CONFIG_HARDLOCKUP_DETECTOR_HPET)
 extern void arch_touch_nmi_watchdog(void);
+#else
+# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline void arch_touch_nmi_watchdog(void) {}
+# endif
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
-# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
-static inline void arch_touch_nmi_watchdog(void) {}
-# endif
+
 #endif
 
 /**
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

In order to detect hardlockups in all the monitored CPUs, move the
interrupt to the next monitored CPU when handling the NMI interrupt; wrap
around when reaching the highest CPU in the mask. This rotation is achieved
by setting the affinity mask to only contain the next CPU to monitor.

To prevent the interrupt from being reassigned to another CPU, flag
it with IRQF_NOBALANCING.

The cpumask monitored_mask keeps track of the CPUs that the watchdog
should monitor. This structure is updated when the NMI watchdog is
enabled or disabled in a specific CPU. As this mask can change
concurrently as CPUs are put online or offline and the watchdog is
disabled or enabled, a lock is required to protect the monitored_mask.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/watchdog_hld_hpet.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index 857e051..c40acfd 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -10,6 +10,7 @@
 #include <linux/nmi.h>
 #include <linux/hpet.h>
 #include <asm/hpet.h>
+#include <asm/cpumask.h>
 #include <asm/irq_remapping.h>
 
 #undef pr_fmt
@@ -199,8 +200,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
  * @regs:	Register values as seen when the NMI was asserted
  *
  * When an NMI is issued, look for hardlockups. If the timer is not periodic,
- * kick it. The interrupt is always handled when if delivered via the
- * Front-Side Bus.
+ * kick it. Move the interrupt to the next monitored CPU. The interrupt
+ * is always handled if delivered via the Front-Side Bus.
  *
  * Returns:
  *
@@ -211,7 +212,7 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
 					   struct pt_regs *regs)
 {
 	struct hpet_hld_data *hdata = hld_data;
-	unsigned int use_fsb;
+	unsigned int use_fsb, cpu;
 
 	/*
 	 * If FSB delivery mode is used, the timer interrupt is programmed as
@@ -222,8 +223,27 @@ static int hardlockup_detector_nmi_handler(unsigned int val,
 	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
 		return NMI_DONE;
 
+	/* There are no CPUs to monitor. */
+	if (!cpumask_weight(&hdata->monitored_mask))
+		return NMI_HANDLED;
+
 	inspect_for_hardlockups(regs);
 
+	/*
+	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
+	 * are added to and removed from this mask at cpu_up() and cpu_down(),
+	 * respectively. Thus, it should be possible to move the interrupt
+	 * to the next monitored CPU.
+	 */
+	spin_lock(&hld_data->lock);
+	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
+		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
+			break;
+		pr_err("Could not assign interrupt to CPU %d. Trying with next present CPU.\n",
+		       cpu);
+	}
+	spin_unlock(&hld_data->lock);
+
 	if (!(hdata->flags & HPET_DEV_PERI_CAP))
 		kick_timer(hdata);
 
@@ -336,7 +356,7 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
 	 * Request an interrupt to activate the irq in all the needed domains.
 	 */
 	ret = request_irq(hwirq, hardlockup_detector_irq_handler,
-			  IRQF_TIMER | IRQF_DELIVER_AS_NMI,
+			  IRQF_TIMER | IRQF_DELIVER_AS_NMI | IRQF_NOBALANCING,
 			  "hpet_hld", hdata);
 	if (ret)
 		unregister_nmi_handler(NMI_LOCAL, "hpet_hld");
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 21/23] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs
  2018-06-13  0:57 ` Ricardo Neri
  (?)
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds, where N is the number of monitored CPUs.

A new member is added to struct hpet_hld_data to hold the number of
HPET ticks per monitored CPU. This quantity is used to program the
comparator of the timer.

The ticks-per-CPU quantity is updated every time the number of
monitored CPUs changes: when the watchdog is enabled or disabled on
a specific CPU.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  1 +
 kernel/watchdog_hld_hpet.c  | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6ace2d1..e67818d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,7 @@ struct hpet_hld_data {
 	u32		irq;
 	u32		flags;
 	u64		ticks_per_second;
+	u64		ticks_per_cpu;
 	struct cpumask	monitored_mask;
 	spinlock_t	lock; /* serialized access to monitored_mask */
 };
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index c40acfd..ebb820d 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -65,11 +65,21 @@ static void kick_timer(struct hpet_hld_data *hdata)
 	 * are able to update the comparator before the counter reaches such new
 	 * value.
 	 *
+	 * The timer must monitor each CPU every watchdog_thresh seconds. Since
+	 * the interrupt rotates among the monitored CPUs, the timer must
+	 * expire every:
+	 *
+	 *    watchdog_thresh/N
+	 *
+	 * seconds, where N is the number of monitored CPUs, in order to
+	 * monitor all the online CPUs. ticks_per_cpu gives the number of
+	 * ticks needed to meet the condition above.
+	 *
 	 * Let it wrap around if needed.
 	 */
 	count = get_count();
 
-	new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+	new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
 
 	set_comparator(hdata, new_compare);
 }
@@ -160,6 +170,33 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 }
 
 /**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata:	struct with the timer's the ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every watch_thresh
+ * seconds. The ticks-per-cpu quantity is computed using the number of CPUs that
+ * the watchdog currently monitors.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+	unsigned int num_cpus = cpumask_weight(&hdata->monitored_mask);
+	unsigned long long temp = hdata->ticks_per_second;
+
+	/* Only update if there are monitored CPUs. */
+	if (!num_cpus)
+		return;
+
+	do_div(temp, num_cpus);
+	hdata->ticks_per_cpu = temp;
+}
+
+/**
  * hardlockup_detector_irq_handler() - Interrupt handler
  * @irq:	Interrupt number
  * @data:	Data associated with the interrupt
@@ -390,6 +427,7 @@ static void hardlockup_detector_hpet_enable(void)
 	spin_lock(&hld_data->lock);
 
 	cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+	update_ticks_per_cpu(hld_data);
 
 	/*
 	 * If this is the first CPU to be monitored, set everything in motion:
@@ -425,6 +463,7 @@ static void hardlockup_detector_hpet_disable(void)
 	spin_lock(&hld_data->lock);
 
 	cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+	update_ticks_per_cpu(hld_data);
 
 	/* Only disable the timer if there are no more CPUs to monitor. */
 	if (!cpumask_weight(&hld_data->monitored_mask))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [RFC PATCH 21/23] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs
@ 2018-06-13  0:57   ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra

Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds, where N is the number of monitored CPUs.

A new member is added to struct hpet_hld_data to hold the per-CPU number
of ticks. This quantity is used to program the comparator of the timer.

The ticks-per-CPU quantity is updated every time the number of monitored
CPUs changes: when the watchdog is enabled or disabled for a specific CPU.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hpet.h |  1 +
 kernel/watchdog_hld_hpet.c  | 41 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 6ace2d1..e67818d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -124,6 +124,7 @@ struct hpet_hld_data {
 	u32		irq;
 	u32		flags;
 	u64		ticks_per_second;
+	u64		ticks_per_cpu;
 	struct cpumask	monitored_mask;
 	spinlock_t	lock; /* serialized access to monitored_mask */
 };
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index c40acfd..ebb820d 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -65,11 +65,21 @@ static void kick_timer(struct hpet_hld_data *hdata)
 	 * are able to update the comparator before the counter reaches such new
 	 * value.
 	 *
+	 * Each CPU must be monitored every watchdog_thresh seconds. Since the
+	 * timer interrupt is rotated among the monitored CPUs, the timer
+	 * expiration must be:
+	 *
+	 *    watchdog_thresh/N
+	 *
+	 * seconds, where N is the number of monitored CPUs.
+	 *
+	 * ticks_per_cpu gives the number of HPET ticks in such an interval.
+	 *
 	 * Let it wrap around if needed.
 	 */
 	count = get_count();
 
-	new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+	new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
 
 	set_comparator(hdata, new_compare);
 }
@@ -160,6 +170,33 @@ static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
 }
 
 /**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata:	struct with the timer's ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of
+ * ticks after which the timer should expire in order to monitor each CPU
+ * every watchdog_thresh seconds. The quantity is computed from the number
+ * of CPUs that the watchdog currently monitors.
+ *
+ * Returns:
+ *
+ * None
+ *
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+	unsigned int num_cpus = cpumask_weight(&hdata->monitored_mask);
+	unsigned long long temp = hdata->ticks_per_second;
+
+	/* Only update if there are monitored CPUs. */
+	if (!num_cpus)
+		return;
+
+	do_div(temp, num_cpus);
+	hdata->ticks_per_cpu = temp;
+}
+
+/**
  * hardlockup_detector_irq_handler() - Interrupt handler
  * @irq:	Interrupt number
  * @data:	Data associated with the interrupt
@@ -390,6 +427,7 @@ static void hardlockup_detector_hpet_enable(void)
 	spin_lock(&hld_data->lock);
 
 	cpumask_set_cpu(cpu, &hld_data->monitored_mask);
+	update_ticks_per_cpu(hld_data);
 
 	/*
 	 * If this is the first CPU to be monitored, set everything in motion:
@@ -425,6 +463,7 @@ static void hardlockup_detector_hpet_disable(void)
 	spin_lock(&hld_data->lock);
 
 	cpumask_clear_cpu(smp_processor_id(), &hld_data->monitored_mask);
+	update_ticks_per_cpu(hld_data);
 
 	/* Only disable the timer if there are no more CPUs to monitor. */
 	if (!cpumask_weight(&hld_data->monitored_mask))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread


* [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

Keep the HPET-based hardlockup detector disabled unless explicitly enabled
via a command-line argument. If no such parameter is given, the hardlockup
detector falls back to the perf-based implementation.

The function hardlockup_panic_setup() is updated to return 0 in order to
allow the __setup functions of specific hardlockup detectors (in this case
hardlockup_detector_hpet_setup()) to inspect the nmi_watchdog boot
parameter.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
--
checkpatch gives the following warning:

CHECK: __setup appears un-documented -- check Documentation/admin-guide/kernel-parameters.rst
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);

This is a false positive, as the nmi_watchdog option is already
documented. The option is simply re-evaluated in this file.
---
 Documentation/admin-guide/kernel-parameters.txt |  5 ++++-
 kernel/watchdog.c                               |  2 +-
 kernel/watchdog_hld_hpet.c                      | 13 +++++++++++++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2040d4..a8833c7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2577,7 +2577,7 @@
 			Format: [state][,regs][,debounce][,die]
 
 	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
-			Format: [panic,][nopanic,][num]
+			Format: [panic,][nopanic,][num,][hpet]
 			Valid num: 0 or 1
 			0 - turn hardlockup detector in nmi_watchdog off
 			1 - turn hardlockup detector in nmi_watchdog on
@@ -2587,6 +2587,9 @@
 			please see 'nowatchdog'.
 			This is useful when you use a panic=... timeout and
 			need the box quickly up again.
+			When hpet is specified, the NMI watchdog will be driven
+			by an HPET timer, if available in the system. Otherwise,
+			the perf-based implementation will be used.
 
 			These settings can be accessed at runtime via
 			the nmi_watchdog and hardlockup_panic sysctls.
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b94bbe3..b5ce6e4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -84,7 +84,7 @@ static int __init hardlockup_panic_setup(char *str)
 		nmi_watchdog_user_enabled = 0;
 	else if (!strncmp(str, "1", 1))
 		nmi_watchdog_user_enabled = 1;
-	return 1;
+	return 0;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
diff --git a/kernel/watchdog_hld_hpet.c b/kernel/watchdog_hld_hpet.c
index ebb820d..12e5937 100644
--- a/kernel/watchdog_hld_hpet.c
+++ b/kernel/watchdog_hld_hpet.c
@@ -17,6 +17,7 @@
 #define pr_fmt(fmt) "NMI hpet watchdog: " fmt
 
 static struct hpet_hld_data *hld_data;
+static bool hardlockup_use_hpet;
 
 /**
  * get_count() - Get the current count of the HPET timer
@@ -488,6 +489,15 @@ static void hardlockup_detector_hpet_stop(void)
 	spin_unlock(&hld_data->lock);
 }
 
+static int __init hardlockup_detector_hpet_setup(char *str)
+{
+	if (strstr(str, "hpet"))
+		hardlockup_use_hpet = true;
+
+	return 0;
+}
+__setup("nmi_watchdog=", hardlockup_detector_hpet_setup);
+
 /**
  * hardlockup_detector_hpet_init() - Initialize the hardlockup detector
  *
@@ -502,6 +512,9 @@ static int __init hardlockup_detector_hpet_init(void)
 {
 	int ret;
 
+	if (!hardlockup_use_hpet)
+		return -EINVAL;
+
 	if (!is_hpet_enabled())
 		return -ENODEV;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread



* [RFC PATCH 23/23] watchdog/hardlockup: Activate the HPET-based lockup detector
@ 2018-06-13  0:57   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-13  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Ricardo Neri, Jacob Pan, Rafael J. Wysocki, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Alexei Starovoitov, Babu Moger, Mathieu Desnoyers,
	Masami Hiramatsu, Peter Zijlstra, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

Now that the implementation of the HPET-based hardlockup detector is
complete, enable it. It will be used only if it can be initialized
successfully. Otherwise, the perf-based detector will be used.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Christoffer Dall <cdall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: x86@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/watchdog.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b5ce6e4..e2cc6c0 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -149,6 +149,21 @@ int __weak __init watchdog_nmi_probe(void)
 {
 	int ret = -ENODEV;
 
+	/*
+	 * Try first with the HPET hardlockup detector. It will only
+	 * succeed if selected at build time and the nmi_watchdog
+	 * command-line parameter is configured. This ensures that the
+	 * perf-based detector is used by default, if selected at
+	 * build time.
+	 */
+	if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_HPET))
+		ret = hardlockup_detector_hpet_ops.init();
+
+	if (!ret) {
+		nmi_wd_ops = &hardlockup_detector_hpet_ops;
+		return ret;
+	}
+
 	if (IS_ENABLED(CONFIG_HARDLOCKUP_DETECTOR_PERF))
 		ret = hardlockup_detector_perf_ops.init();
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector
@ 2018-06-13  5:23     ` Randy Dunlap
  0 siblings, 0 replies; 200+ messages in thread
From: Randy Dunlap @ 2018-06-13  5:23 UTC (permalink / raw)
  To: Ricardo Neri, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Rafael J. Wysocki, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Alexei Starovoitov,
	Babu Moger, Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

Hi,

On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c40c7b7..6e79833 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
>  	bool
>  	select SOFTLOCKUP_DETECTOR
>  
> +config HARDLOCKUP_DETECTOR_HPET
> +	bool "Use HPET Timer for Hard Lockup Detection"
> +	select SOFTLOCKUP_DETECTOR
> +	select HARDLOCKUP_DETECTOR
> +	depends on HPET_TIMER && HPET
> +	help
> +	  Say y to enable a hardlockup detector that is driven by an High-Precision
> +	  Event Timer. In addition to selecting this option, the command-line
> +	  parameter nmi_watchdog option. See Documentation/admin-guide/kernel-parameters.rst

The "In addition ..." thing is a broken (incomplete) sentence.

> +
>  #
>  # Enables a timestamp based low pass filter to compensate for perf based
>  # hard lockup detection which runs too fast due to turbo modes.


-- 
~Randy

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
@ 2018-06-13  5:26     ` Randy Dunlap
  0 siblings, 0 replies; 200+ messages in thread
From: Randy Dunlap @ 2018-06-13  5:26 UTC (permalink / raw)
  To: Ricardo Neri, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Rafael J. Wysocki, Don Zickus, Nicholas Piggin,
	Michael Ellerman, Frederic Weisbecker, Alexei Starovoitov,
	Babu Moger, Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f2040d4..a8833c7 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2577,7 +2577,7 @@
>  			Format: [state][,regs][,debounce][,die]
>  
>  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
> -			Format: [panic,][nopanic,][num]
> +			Format: [panic,][nopanic,][num,][hpet]
>  			Valid num: 0 or 1
>  			0 - turn hardlockup detector in nmi_watchdog off
>  			1 - turn hardlockup detector in nmi_watchdog on

This says that I can use "nmi_watchdog=hpet" without using 0 or 1.
Is that correct?

> @@ -2587,6 +2587,9 @@
>  			please see 'nowatchdog'.
>  			This is useful when you use a panic=... timeout and
>  			need the box quickly up again.
> +			When hpet is specified, the NMI watchdog will be driven
> +			by an HPET timer, if available in the system. Otherwise,
> +			the perf-based implementation will be used.
>  
>  			These settings can be accessed at runtime via
>  			the nmi_watchdog and hardlockup_panic sysctls.


thanks,
-- 
~Randy

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-13  7:41     ` Nicholas Piggin
  0 siblings, 0 replies; 200+ messages in thread
From: Nicholas Piggin @ 2018-06-13  7:41 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan, Don Zickus,
	Michael Ellerman, Frederic Weisbecker, Babu Moger,
	David S. Miller, Benjamin Herrenschmidt, Paul Mackerras,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Tue, 12 Jun 2018 17:57:32 -0700
Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:

> Instead of exposing individual functions for the operations of the NMI
> watchdog, define a common interface that can be used across multiple
> implementations.
> 
> The struct nmi_watchdog_ops is defined for such operations. These initial
> definitions include the enable, disable, start, stop, and cleanup
> operations.
> 
> Only a single NMI watchdog can be used in the system. The operations of
> this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> variable is set to point to the operations of the first NMI watchdog that
> initializes successfully. At this moment, the only available NMI watchdog
> is the perf-based hardlockup detector; more implementations can be added
> in the future.

Cool, this looks pretty nice at a quick glance. sparc and powerpc at
least have their own NMI watchdogs, it would be good to have those
converted as well.

Is hpet a cross platform thing, or just x86? We should avoid
proliferation of files under kernel/ I think, so with these watchdog
driver structs then maybe implementations could go in drivers/ or
arch/

Thanks,
Nick

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-13  8:34     ` Peter Zijlstra
  0 siblings, 0 replies; 200+ messages in thread
From: Peter Zijlstra @ 2018-06-13  8:34 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Daniel Lezcano, Andrew Morton, Levin, Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu

On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5426627..dbc5e02 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -61,6 +61,8 @@
>   *                interrupt handler after suspending interrupts. For system
>   *                wakeup devices users need to implement wakeup detection in
>   *                their interrupt handlers.
> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
> + *                supported by the chip.
>   */

NAK on the first 6 patches. You really _REALLY_ don't want to expose
NMIs to this level.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-13  8:42       ` Peter Zijlstra
  0 siblings, 0 replies; 200+ messages in thread
From: Peter Zijlstra @ 2018-06-13  8:42 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Ricardo Neri, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:
> On Tue, 12 Jun 2018 17:57:32 -0700
> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> 
> > Instead of exposing individual functions for the operations of the NMI
> > watchdog, define a common interface that can be used across multiple
> > implementations.
> > 
> > The struct nmi_watchdog_ops is defined for such operations. These initial
> > definitions include the enable, disable, start, stop, and cleanup
> > operations.
> > 
> > Only a single NMI watchdog can be used in the system. The operations of
> > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > variable is set to point to the operations of the first NMI watchdog that
> > initializes successfully. At this moment, the only available NMI watchdog
> > is the perf-based hardlockup detector; more implementations can be added
> > in the future.
> 
> Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> least have their own NMI watchdogs, it would be good to have those
> converted as well.

Yeah, agreed, this looks like half a patch.

> Is hpet a cross platform thing, or just x86? We should avoid
> proliferation of files under kernel/ I think, so with these watchdog
> driver structs then maybe implementations could go in drivers/ or
> arch/

HPET is mostly an x86 thing (although it can be found elsewhere), but the
whole thing relies on the x86 NMI mechanism and is thus firmly arch/
material (like the sparc and ppc thing).

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf
@ 2018-06-13  8:43     ` Peter Zijlstra
  0 siblings, 0 replies; 200+ messages in thread
From: Peter Zijlstra @ 2018-06-13  8:43 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:
> The current default implementation of the hardlockup detector assumes that
> it is implemented using perf events.

The sparc and powerpc things are very much not using perf.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  8:34     ` Peter Zijlstra
  (?)
@ 2018-06-13  8:59       ` Julien Thierry
  -1 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-13  8:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Daniel Lezcano, Andrew Morton, Levin, Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu

Hi Peter, Ricardo,

On 13/06/18 09:34, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index 5426627..dbc5e02 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -61,6 +61,8 @@
>>    *                interrupt handler after suspending interrupts. For system
>>    *                wakeup devices users need to implement wakeup detection in
>>    *                their interrupt handlers.
>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
>> + *                supported by the chip.
>>    */
> 
> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> NMIs to this level.
> 

I've been working on something similar on arm64 side, and effectively 
the one thing that might be common to arm64 and intel is the interface 
to set an interrupt as NMI. So I guess it would be nice to agree on the 
right approach for this.

The way I did it was by introducing a new irq_state and let the irqchip 
driver handle most of the work (if it supports that state):

https://lkml.org/lkml/2018/5/25/181

This has not been ACKed nor NAKed. So I am just asking whether this is a 
more suitable approach, and if not, are there any suggestions on how to 
do this?

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-13  9:07     ` Peter Zijlstra
  0 siblings, 0 replies; 200+ messages in thread
From: Peter Zijlstra @ 2018-06-13  9:07 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On Tue, Jun 12, 2018 at 05:57:37PM -0700, Ricardo Neri wrote:

+static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
+{
+       unsigned long this_isr;
+       unsigned int lvl_trig;
+
+       this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
+
+       lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
+
+       if (lvl_trig && this_isr)
+               return true;
+
+       return false;
+}

> +static int hardlockup_detector_nmi_handler(unsigned int val,
> +					   struct pt_regs *regs)
> +{
> +	struct hpet_hld_data *hdata = hld_data;
> +	unsigned int use_fsb;
> +
> +	/*
> +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> +	 * edge-triggered and there is no need to check the ISR register.
> +	 */
> +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;

Please do explain.. That FSB thing basically means MSI. But there's only
a single NMI vector. How do we know this NMI came from the HPET?

> +
> +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))

So you add _2_ HPET reads for every single NMI that gets triggered...
and IIRC HPET reads are _sllooooowwwwww_.

> +		return NMI_DONE;
> +
> +	inspect_for_hardlockups(regs);
> +
> +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> +		kick_timer(hdata);
> +
> +	/* Acknowledge interrupt if in level-triggered mode */
> +	if (!use_fsb)
> +		hpet_writel(BIT(hdata->num), HPET_STATUS);
> +
> +	return NMI_HANDLED;

So if I read this right, when in FSB/MSI mode, we'll basically _always_
claim every single NMI as handled?

That's broken.

> +}

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  8:59       ` Julien Thierry
  (?)
@ 2018-06-13  9:20         ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-13  9:20 UTC (permalink / raw)
  To: Julien Thierry
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu

On Wed, 13 Jun 2018, Julien Thierry wrote:
> On 13/06/18 09:34, Peter Zijlstra wrote:
> > On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > > index 5426627..dbc5e02 100644
> > > --- a/include/linux/interrupt.h
> > > +++ b/include/linux/interrupt.h
> > > @@ -61,6 +61,8 @@
> > >    *                interrupt handler after suspending interrupts. For
> > > system
> > >    *                wakeup devices users need to implement wakeup
> > > detection in
> > >    *                their interrupt handlers.
> > > + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> > > non-maskable, if
> > > + *                supported by the chip.
> > >    */
> > 
> > NAK on the first 6 patches. You really _REALLY_ don't want to expose
> > NMIs to this level.
> > 
> 
> I've been working on something similar on arm64 side, and effectively the one
> thing that might be common to arm64 and intel is the interface to set an
> interrupt as NMI. So I guess it would be nice to agree on the right approach
> for this.
> 
> The way I did it was by introducing a new irq_state and let the irqchip driver
> handle most of the work (if it supports that state):
> 
> https://lkml.org/lkml/2018/5/25/181
>
> This has not been ACKed nor NAKed. So I am just asking whether this is a more
> suitable approach, and if not, is there any suggestions on how to do this?

I really didn't pay attention to that as it's buried in the GIC/ARM series
which is usually Marc's playground.

Adding NMI delivery support at low level architecture irq chip level is
perfectly fine, but the exposure of that needs to be restricted very
much. Adding it to the generic interrupt control interfaces is not going to
happen. That's doomed to begin with and a complete abuse of the interface
as the handler cannot ever be used for that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-13  8:42       ` Peter Zijlstra
  (?)
@ 2018-06-13  9:26         ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-13  9:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicholas Piggin, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:
> > On Tue, 12 Jun 2018 17:57:32 -0700
> > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > 
> > > Instead of exposing individual functions for the operations of the NMI
> > > watchdog, define a common interface that can be used across multiple
> > > implementations.
> > > 
> > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > definitions include the enable, disable, start, stop, and cleanup
> > > operations.
> > > 
> > > Only a single NMI watchdog can be used in the system. The operations of
> > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > variable is set to point to the operations of the first NMI watchdog that
> > > initializes successfully. At this moment, the only available
> > > NMI watchdog is the perf-based hardlockup detector; more implementations
> > > can be added in the future.
> > 
> > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > least have their own NMI watchdogs, it would be good to have those
> > converted as well.
> 
> Yeah, agreed, this looks like half a patch.

Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
low level architecture details so having yet another 'ops' data structure
with a gazillion of callbacks, checks and indirections does not provide
value over the currently available weak stubs.

> > Is hpet a cross platform thing, or just x86? We should avoid
> > proliferation of files under kernel/ I think, so with these watchdog
> > driver structs then maybe implementations could go in drivers/ or
> > arch/
> 
> HPET is mostly an x86 thing (although it can be found elsewhere), but the

On ia64 and I doubt that anyone wants to take on the task of underwater
welding it to Itanic.

> whole thing relies on the x86 NMI mechanism and is thus firmly arch/
> material (like the sparc and ppc thing).

Right. Trying to make this 'generic' is not really solving anything.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  9:20         ` Thomas Gleixner
  (?)
@ 2018-06-13  9:36           ` Julien Thierry
  -1 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-13  9:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu



On 13/06/18 10:20, Thomas Gleixner wrote:
> On Wed, 13 Jun 2018, Julien Thierry wrote:
>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>> index 5426627..dbc5e02 100644
>>>> --- a/include/linux/interrupt.h
>>>> +++ b/include/linux/interrupt.h
>>>> @@ -61,6 +61,8 @@
>>>>     *                interrupt handler after suspending interrupts. For system
>>>>     *                wakeup devices users need to implement wakeup detection in
>>>>     *                their interrupt handlers.
>>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
>>>> + *                supported by the chip.
>>>>     */
>>>
>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>> NMIs to this level.
>>>
>>
>> I've been working on something similar on the arm64 side, and effectively the
>> one thing that might be common to arm64 and Intel is the interface to set an
>> interrupt as NMI. So I guess it would be nice to agree on the right approach
>> for this.
>>
>> The way I did it was by introducing a new irq_state and let the irqchip driver
>> handle most of the work (if it supports that state):
>>
>> https://lkml.org/lkml/2018/5/25/181
>>
>> This has not been ACKed nor NAKed. So I am just asking whether this is a more
>> suitable approach, and if not, are there any suggestions on how to do this?
> 
> I really didn't pay attention to that as it's buried in the GIC/ARM series
> which is usually Marc's playground.
> 
> Adding NMI delivery support at the low-level architecture irq chip level is
> perfectly fine, but the exposure of that needs to be restricted very
> much. Adding it to the generic interrupt control interfaces is not going to
> happen. That's doomed to begin with and a complete abuse of the interface,
> as the handler cannot ever be used for that.
> 

Understood; however, the need would be to provide a way for a driver to
request an interrupt to be delivered as an NMI (if the irqchip supports it).

But from your response this would be out of the question (in the
interrupt/irq/irqchip definitions).

Or should the concerned irqchip inform the arch that it supports NMI
delivery, leaving it up to the interested drivers to query the arch
whether NMI delivery is supported by the system?
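One possible shape of the capability query being discussed can be sketched as follows. The flag name and helper are invented for illustration and are not from the posted series: the irqchip advertises NMI capability through a chip flag, and a driver checks it before requesting NMI delivery.

```c
#include <stdbool.h>

/* Hypothetical capability flag, modelled on how irqchips already
 * advertise features through 'flags'. Not a real kernel definition. */
#define IRQCHIP_SUPPORTS_NMI	0x100

struct irq_chip {
	unsigned int flags;
};

/* A driver would call something like this before asking for its
 * interrupt to be delivered as an NMI. */
static bool irq_chip_can_deliver_nmi(const struct irq_chip *chip)
{
	return chip->flags & IRQCHIP_SUPPORTS_NMI;
}
```

A chip without the flag would make the driver fall back to a maskable interrupt or fail its probe.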

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-13  0:57   ` Ricardo Neri
  (?)
@ 2018-06-13  9:40     ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-13  9:40 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Tue, 12 Jun 2018, Ricardo Neri wrote:
> @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
>  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
>  		kick_timer(hdata);
>  
> +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");

Eeew.

>  /**
> + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> + * @val:	Attribute associated with the NMI. Not used.
> + * @regs:	Register values as seen when the NMI was asserted
> + *
> + * When an NMI is issued, look for hardlockups. If the timer is not periodic,
> + * kick it. The interrupt is always handled if delivered via the
> + * Front-Side Bus.
> + *
> + * Returns:
> + *
> + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> + * otherwise.
> + */
> +static int hardlockup_detector_nmi_handler(unsigned int val,
> +					   struct pt_regs *regs)
> +{
> +	struct hpet_hld_data *hdata = hld_data;
> +	unsigned int use_fsb;
> +
> +	/*
> +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> +	 * edge-triggered and there is no need to check the ISR register.
> +	 */
> +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> +
> +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> +		return NMI_DONE;

So for 'use_fsb == True' every single NMI will fall through into the
watchdog code below.
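The fall-through can be seen in a host-side model of the quoted claim logic (this is a simulation for illustration, not the kernel code itself): with FSB delivery there is no interrupt-status check at all, so the handler claims every NMI in the system.

```c
#include <stdbool.h>

#define HPET_DEV_FSB_CAP	0x1

enum { NMI_DONE = 0, NMI_HANDLED = 1 };

/* Model of the handler's claim decision. 'hpet_isr_pending' stands in
 * for is_hpet_wdt_interrupt(): whether the HPET status register shows
 * this timer as the interrupt source. */
static int nmi_claim(unsigned int flags, bool hpet_isr_pending)
{
	bool use_fsb = flags & HPET_DEV_FSB_CAP;

	/* Only the non-FSB case ever declines an NMI. */
	if (!use_fsb && !hpet_isr_pending)
		return NMI_DONE;
	return NMI_HANDLED;	/* proceeds into the watchdog path */
}
```

With the FSB flag set, even an NMI the HPET did not raise is claimed, which is the problem being pointed out.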

> +	inspect_for_hardlockups(regs);
> +
> +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> +		kick_timer(hdata);

And in case the HPET does not support periodic mode, this reprograms
the timer on every NMI, which means that while perf is running the watchdog
will never ever detect anything.

Aside of that, reading TWO HPET registers for every NMI is insane. HPET
access is horribly slow, so any high frequency perf monitoring will take a
massive performance hit.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  2018-06-13  0:57   ` Ricardo Neri
  (?)
@ 2018-06-13  9:48     ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-13  9:48 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Tue, 12 Jun 2018, Ricardo Neri wrote:
> +	/* There are no CPUs to monitor. */
> +	if (!cpumask_weight(&hdata->monitored_mask))
> +		return NMI_HANDLED;
> +
>  	inspect_for_hardlockups(regs);
>  
> +	/*
> +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> +	 * are added to and removed from this mask at cpu_up() and cpu_down(),
> +	 * respectively. Thus, the interrupt should be able to be moved to
> +	 * the next monitored CPU.
> +	 */
> +	spin_lock(&hld_data->lock);

Yuck. Taking a spinlock from NMI ... 

> +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> +			break;

... and then calling into generic interrupt code which will take even more
locks is completely broken.

Guess what happens when the NMI hits a section where one of those locks is
held? Then you need another watchdog to decode the lockup you just ran into.
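One lock-free way out of this, sketched below, is for the NMI handler to record only that a retarget is wanted and defer the actual irq_set_affinity() call to normal context (in the kernel, irq_work exists for exactly this kind of deferral). The function names are illustrative and not from the posted patch.

```c
#include <stdatomic.h>

/* Pending-request flag; static storage starts at 0. */
static atomic_int retarget_pending;

/* NMI side: a single lock-free store, safe no matter what locks the
 * interrupted code was holding. */
static void nmi_request_retarget(void)
{
	atomic_store_explicit(&retarget_pending, 1, memory_order_release);
}

/* Normal-context side: atomically consume the request. The real
 * irq_set_affinity() call would happen here, where taking the interrupt
 * descriptor locks is legal. Returns 1 if a request was pending. */
static int consume_retarget_request(void)
{
	return atomic_exchange_explicit(&retarget_pending, 0,
					memory_order_acq_rel);
}
```

The NMI path then touches no locks at all; the worst case is that the affinity change lands one tick later.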

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-13  9:49             ` Julien Thierry
  0 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-13  9:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu



On 13/06/18 10:36, Julien Thierry wrote:
> 
> 
> On 13/06/18 10:20, Thomas Gleixner wrote:
>> On Wed, 13 Jun 2018, Julien Thierry wrote:
>>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>>> index 5426627..dbc5e02 100644
>>>>> --- a/include/linux/interrupt.h
>>>>> +++ b/include/linux/interrupt.h
>>>>> @@ -61,6 +61,8 @@
>>>>>     *                interrupt handler after suspending interrupts. For system
>>>>>     *                wakeup devices users need to implement wakeup detection in
>>>>>     *                their interrupt handlers.
>>>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as non-maskable, if
>>>>> + *                supported by the chip.
>>>>>     */
>>>>
>>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>>> NMIs to this level.
>>>>
>>>
>>> I've been working on something similar on the arm64 side, and effectively the
>>> one thing that might be common to arm64 and Intel is the interface to set an
>>> interrupt as NMI. So I guess it would be nice to agree on the right approach
>>> for this.
>>>
>>> The way I did it was by introducing a new irq_state and letting the irqchip
>>> driver handle most of the work (if it supports that state):
>>>
>>> https://lkml.org/lkml/2018/5/25/181
>>>
>>> This has not been ACKed nor NAKed. So I am just asking whether this is a more
>>> suitable approach, and if not, are there any suggestions on how to do this?
>>
>> I really didn't pay attention to that as it's buried in the GIC/ARM series
>> which is usually Marc's playground.
>>
>> Adding NMI delivery support at the low-level architecture irq chip level is
>> perfectly fine, but the exposure of that needs to be restricted very
>> much. Adding it to the generic interrupt control interfaces is not going to
>> happen. That's doomed to begin with and a complete abuse of the interface,
>> as the handler cannot ever be used for that.
>>
> 
> Understood; however, the need would be to provide a way for a driver to
> request an interrupt to be delivered as an NMI (if the irqchip supports it).
> 
> But from your response this would be out of the question (in the
> interrupt/irq/irqchip definitions).
> 
> Or should the concerned irqchip inform the arch that it supports NMI
> delivery, leaving it up to the interested drivers to query the arch
> whether NMI delivery is supported by the system?

Actually scratch that last part, it is also missing a way for the driver 
to actually communicate to the irqchip that its interrupt should be 
treated as an NMI, so it wouldn't work...

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-13  9:49             ` Julien Thierry
  0 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-13  9:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Daniel Lezcano, Peter Zijlstra, Palmer Dabbelt, H. Peter Anvin,
	sparclinux-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar, Doug Berger,
	Ashok Raj, Bartosz Golaszewski, x86-DgEjT+Ai2ygdnm+yROfE0A,
	Andi Kleen, Borislav Petkov, Masami Hiramatsu, Ravi V. Shankar,
	Marc Zyngier, Ricardo Neri, Tony Luck, Randy Dunlap,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Levin,
	Alexander (Sasha Levin),
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	Andrew Morton, linuxppc-dev-uLR06cmDAlaKREJ1Ck/qmQ



On 13/06/18 10:36, Julien Thierry wrote:
> 
> 
> On 13/06/18 10:20, Thomas Gleixner wrote:
>> On Wed, 13 Jun 2018, Julien Thierry wrote:
>>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>>> index 5426627..dbc5e02 100644
>>>>> --- a/include/linux/interrupt.h
>>>>> +++ b/include/linux/interrupt.h
>>>>> @@ -61,6 +61,8 @@
>>>>>     *                interrupt handler after suspending interrupts. 
>>>>> For
>>>>> system
>>>>>     *                wakeup devices users need to implement wakeup
>>>>> detection in
>>>>>     *                their interrupt handlers.
>>>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
>>>>> non-maskable, if
>>>>> + *                supported by the chip.
>>>>>     */
>>>>
>>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>>> NMIs to this level.
>>>>
>>>
>>> I've been working on something similar on arm64 side, and effectively 
>>> the one
>>> thing that might be common to arm64 and intel is the interface to set an
>>> interrupt as NMI. So I guess it would be nice to agree on the right 
>>> approach
>>> for this.
>>>
>>> The way I did it was by introducing a new irq_state and let the 
>>> irqchip driver
>>> handle most of the work (if it supports that state):
>>>
>>> https://lkml.org/lkml/2018/5/25/181
>>>
>>> This has not been ACKed nor NAKed. So I am just asking whether this 
>>> is a more
>>> suitable approach, and if not, is there any suggestions on how to do 
>>> this?
>>
>> I really didn't pay attention to that as it's burried in the GIC/ARM 
>> series
>> which is usually Marc's playground.
>>
>> Adding NMI delivery support at low level architecture irq chip level is
>> perfectly fine, but the exposure of that needs to be restricted very
>> much. Adding it to the generic interrupt control interfaces is not 
>> going to
>> happen. That's doomed to begin with and a complete abuse of the interface
>> as the handler can not ever be used for that.
>>
> 
> Understood, however the need would be to provide a way for a driver to 
> request an interrupt to be delivered as an NMI (if irqchip supports it).
> 
> But from your response this would be out of the question (in the 
> interrupt/irq/irqchip definitions).
> 
> Or somehow the concerned irqchip informs the arch it supports NMI 
> delivery and it is up to the interested drivers to query the arch 
> whether NMI delivery is supported by the system?

Actually scratch that last part, it is also missing a way for the driver 
to actually communicate to the irqchip that its interrupt should be 
treated as an NMI, so it wouldn't work...

-- 
Julien Thierry
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  9:36           ` Julien Thierry
  (?)
@ 2018-06-13  9:57             ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-13  9:57 UTC (permalink / raw)
  To: Julien Thierry
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu

On Wed, 13 Jun 2018, Julien Thierry wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> > 
> 
> Understood, however the need would be to provide a way for a driver to request
> an interrupt to be delivered as an NMI (if irqchip supports it).

s/driver/specialized code written by people who know what they are doing/

> But from your response this would be out of the question (in the
> interrupt/irq/irqchip definitions).

Adding some magic to the irq chip is fine, because that's where the low
level integration needs to be done, but exposing it through the generic
interrupt subsystem is a NONO for obvious reasons.

> Or somehow the concerned irqchip informs the arch it supports NMI delivery and
> it is up to the interested drivers to query the arch whether NMI delivery is
> supported by the system?

Yes, we need some infrastructure for that, but that needs to be separate
and with very limited exposure.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-13 10:06           ` Marc Zyngier
  0 siblings, 0 replies; 200+ messages in thread
From: Marc Zyngier @ 2018-06-13 10:06 UTC (permalink / raw)
  To: Thomas Gleixner, Julien Thierry
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski, Doug Berger,
	Palmer Dabbelt, iommu

On 13/06/18 10:20, Thomas Gleixner wrote:
> On Wed, 13 Jun 2018, Julien Thierry wrote:
>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>> index 5426627..dbc5e02 100644
>>>> --- a/include/linux/interrupt.h
>>>> +++ b/include/linux/interrupt.h
>>>> @@ -61,6 +61,8 @@
>>>>    *                interrupt handler after suspending interrupts. For
>>>> system
>>>>    *                wakeup devices users need to implement wakeup
>>>> detection in
>>>>    *                their interrupt handlers.
>>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
>>>> non-maskable, if
>>>> + *                supported by the chip.
>>>>    */
>>>
>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>> NMIs to this level.
>>>
>>
>> I've been working on something similar on arm64 side, and effectively the one
>> thing that might be common to arm64 and intel is the interface to set an
>> interrupt as NMI. So I guess it would be nice to agree on the right approach
>> for this.
>>
>> The way I did it was by introducing a new irq_state and let the irqchip driver
>> handle most of the work (if it supports that state):
>>
>> https://lkml.org/lkml/2018/5/25/181
>>
>> This has not been ACKed nor NAKed. So I am just asking whether this is a more
>> suitable approach, and if not, is there any suggestions on how to do this?
> 
> I really didn't pay attention to that as it's buried in the GIC/ARM 
> series

I'm working my way through it ATM now that I have some brain cycles back.

> Adding NMI delivery support at low level architecture irq chip level is
> perfectly fine, but the exposure of that needs to be restricted very
> much. Adding it to the generic interrupt control interfaces is not going to
> happen. That's doomed to begin with and a complete abuse of the interface
> as the handler can not ever be used for that.

I can only agree with that. Allowing random drivers to use request_irq()
to make anything an NMI ultimately turns it into a complete mess ("hey,
NMI is *faster*, let's use that"), and a potential source of horrible
deadlocks.

What I'd find more palatable is a way for an irqchip to be able to
prioritize some interrupts based on a set of architecturally-defined
requirements, and a separate NMI requesting/handling framework that is
separate from the IRQ API, as the overall requirements are likely to be
completely different.

It shouldn't have to be nearly as complex as the IRQ API, and it should
impose much stricter requirements on what you can do there (flow
handling should definitely be different).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13  9:57             ` Thomas Gleixner
  (?)
@ 2018-06-13 10:25               ` Julien Thierry
  -1 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-13 10:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Marc Zyngier,
	Bartosz Golaszewski, Doug Berger, Palmer Dabbelt, iommu



On 13/06/18 10:57, Thomas Gleixner wrote:
> On Wed, 13 Jun 2018, Julien Thierry wrote:
>> On 13/06/18 10:20, Thomas Gleixner wrote:
>>> Adding NMI delivery support at low level architecture irq chip level is
>>> perfectly fine, but the exposure of that needs to be restricted very
>>> much. Adding it to the generic interrupt control interfaces is not going to
>>> happen. That's doomed to begin with and a complete abuse of the interface
>>> as the handler can not ever be used for that.
>>>
>>
>> Understood, however the need would be to provide a way for a driver to request
>> an interrupt to be delivered as an NMI (if irqchip supports it).
> 
> s/driver/specialized code written by people who know what they are doing/
> 
>> But from your response this would be out of the question (in the
>> interrupt/irq/irqchip definitions).
> 
> Adding some magic to the irq chip is fine, because that's where the low
> level integration needs to be done, but exposing it through the generic
> interrupt subsystem is a NONO for obvious reasons.
> 
>> Or somehow the concerned irqchip informs the arch it supports NMI delivery and
>> it is up to the interested drivers to query the arch whether NMI delivery is
>> supported by the system?
> 
> Yes, we need some infrastructure for that, but that needs to be separate
> and with very limited exposure.
> 

Right, makes sense. I'll check with Marc how such an infrastructure 
should be introduced.

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-13 11:52           ` Nicholas Piggin
  0 siblings, 0 replies; 200+ messages in thread
From: Nicholas Piggin @ 2018-06-13 11:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ricardo Neri, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:  
> > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > >   
> > > > Instead of exposing individual functions for the operations of the NMI
> > > > watchdog, define a common interface that can be used across multiple
> > > > implementations.
> > > > 
> > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > definitions include the enable, disable, start, stop, and cleanup
> > > > operations.
> > > > 
> > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > variable is set to point to the operations of the first NMI watchdog that
> > > > initializes successfully. At this moment, the only available
> > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > can be added in the future.  
> > > 
> > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > least have their own NMI watchdogs, it would be good to have those
> > > converted as well.  
> > 
> > Yeah, agreed, this looks like half a patch.  
> 
> Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> low level architecture details so having yet another 'ops' data structure
> with a gazillion of callbacks, checks and indirections does not provide
> value over the currently available weak stubs.

The other way to go of course is to librify the perf watchdog and make an
x86 watchdog that selects between perf and hpet... I also probably
prefer that for code such as this, but I wouldn't strongly object to
ops struct if I'm not writing the code. It's not that bad is it?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
@ 2018-06-14  0:58       ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  0:58 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On Tue, Jun 12, 2018 at 10:26:57PM -0700, Randy Dunlap wrote:
> On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index f2040d4..a8833c7 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -2577,7 +2577,7 @@
> >  			Format: [state][,regs][,debounce][,die]
> >  
> >  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
> > -			Format: [panic,][nopanic,][num]
> > +			Format: [panic,][nopanic,][num,][hpet]
> >  			Valid num: 0 or 1
> >  			0 - turn hardlockup detector in nmi_watchdog off
> >  			1 - turn hardlockup detector in nmi_watchdog on
> 
> This says that I can use "nmi_watchdog=hpet" without using 0 or 1.
> Is that correct?

Yes, this is what I meant. In my view, if you set nmi_watchdog=hpet it
implies that you want to activate the NMI watchdog. In this case, perf.

I can see how this will be ambiguous for the case of perf and arch NMI
watchdogs.

Alternatively, a new parameter could be added, such as nmi_watchdog_type. I
didn't want to add it in this patchset as I think that a single parameter
can handle both the enablement and the type of the NMI watchdog.

What do you think?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector
@ 2018-06-14  1:00       ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  1:00 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On Tue, Jun 12, 2018 at 10:23:47PM -0700, Randy Dunlap wrote:
> Hi,

Hi Randy,

> 
> On 06/12/2018 05:57 PM, Ricardo Neri wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index c40c7b7..6e79833 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -828,6 +828,16 @@ config HARDLOCKUP_DETECTOR_PERF
> >  	bool
> >  	select SOFTLOCKUP_DETECTOR
> >  
> > +config HARDLOCKUP_DETECTOR_HPET
> > +	bool "Use HPET Timer for Hard Lockup Detection"
> > +	select SOFTLOCKUP_DETECTOR
> > +	select HARDLOCKUP_DETECTOR
> > +	depends on HPET_TIMER && HPET
> > +	help
> > +	  Say y to enable a hardlockup detector that is driven by an High-Precision
> > +	  Event Timer. In addition to selecting this option, the command-line
> > +	  parameter nmi_watchdog option. See Documentation/admin-guide/kernel-parameters.rst
> 
> The "In addition ..." thing is a broken (incomplete) sentence.

Oops, I apologize. I missed this; I will fix it in my next version.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread
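For reference, the incomplete sentence Randy flagged could be completed along
these lines (a suggested rewording only, not the actual follow-up patch):

```kconfig
config HARDLOCKUP_DETECTOR_HPET
	bool "Use HPET Timer for Hard Lockup Detection"
	select SOFTLOCKUP_DETECTOR
	select HARDLOCKUP_DETECTOR
	depends on HPET_TIMER && HPET
	help
	  Say y to enable a hardlockup detector that is driven by a
	  High-Precision Event Timer. In addition to selecting this option,
	  the hpet option must be passed to the nmi_watchdog command-line
	  parameter. See Documentation/admin-guide/kernel-parameters.rst.
```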

* Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2018-06-13  8:43     ` Peter Zijlstra
  (?)
@ 2018-06-14  1:19       ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  1:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan, Don Zickus,
	Nicholas Piggin, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:
> > The current default implementation of the hardlockup detector assumes that
> > it is implemented using perf events.
> 
> The sparc and powerpc things are very much not using perf.

Isn't it true that the current hardlockup detector
(under kernel/watchdog_hld.c) is based on perf? As far as I understand,
this hardlockup detector is constructed using perf events for architectures
that don't provide an NMI watchdog. Perhaps I can be more specific and say
that this synthesized detector is based on perf.

On a side note, I saw that powerpc might use a perf-based hardlockup
detector if it has perf events [1].

Please let me know if my understanding is not correct.

Thanks and BR,
Ricardo

[1]. https://elixir.bootlin.com/linux/v4.17/source/arch/powerpc/Kconfig#L218


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-13  8:42       ` Peter Zijlstra
  (?)
@ 2018-06-14  1:26         ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  1:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nicholas Piggin, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, Jun 13, 2018 at 10:42:19AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:
> > On Tue, 12 Jun 2018 17:57:32 -0700
> > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > 
> > > Instead of exposing individual functions for the operations of the NMI
> > > watchdog, define a common interface that can be used across multiple
> > > implementations.
> > > 
> > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > definitions include the enable, disable, start, stop, and cleanup
> > > operations.
> > > 
> > > Only a single NMI watchdog can be used in the system. The operations of
> > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > variable is set to point the operations of the first NMI watchdog that
> > > initializes successfully. Even though at this moment, the only available
> > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > can be added in the future.
> > 
> > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > least have their own NMI watchdogs, it would be good to have those
> > converted as well.
> 
> Yeah, agreed, this looks like half a patch.

I planned to look into the conversion of sparc and powerpc. I just wanted
to see the reception to these patches before jumping in and doing potentially
useless work. Comments in this thread lean towards keeping the weak
stubs.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-14  1:31             ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  1:31 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:  
> > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > > >   
> > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > watchdog, define a common interface that can be used across multiple
> > > > > implementations.
> > > > > 
> > > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > operations.
> > > > > 
> > > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > variable is set to point the operations of the first NMI watchdog that
> > > > > initializes successfully. Even though at this moment, the only available
> > > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > > can be added in the future.  
> > > > 
> > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > least have their own NMI watchdogs, it would be good to have those
> > > > converted as well.  
> > > 
> > > Yeah, agreed, this looks like half a patch.  
> > 
> > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > low level architecture details so having yet another 'ops' data structure
> > with a gazillion of callbacks, checks and indirections does not provide
> > value over the currently available weak stubs.
> 
> The other way to go of course is librify the perf watchdog and make an
> x86 watchdog that selects between perf and hpet... I also probably
> prefer that for code such as this, but I wouldn't strongly object to
> ops struct if I'm not writing the code. It's not that bad is it?

My motivation to add the ops was that the hpet and perf watchdogs share
significant portions of code. I could look into creating a library for
the common code and relocating the hpet watchdog into arch/x86 for the
hpet-specific parts.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
@ 2018-06-14  1:31             ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-14  1:31 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Paul Mackerras,
	H. Peter Anvin, sparclinux-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar,
	Ashok Raj, Michael Ellerman, x86-DgEjT+Ai2ygdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Andi Kleen,
	Borislav Petkov, Masami Hiramatsu, Don Zickus, Ravi V. Shankar,
	Frederic Weisbecker, Mathieu Desnoyers, Thomas Gleixner,
	Tony Luck, Babu Moger, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Luis R. Rodriguez, Jacob Pan, Philippe Ombredanne

On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> wrote:
> 
> > On Wed, 13 Jun 2018, Peter Zijlstra wrote:
> > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:  
> > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > Ricardo Neri <ricardo.neri-calderon-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> > > >   
> > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > watchdog, define a common interface that can be used across multiple
> > > > > implementations.
> > > > > 
> > > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > operations.
> > > > > 
> > > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > variable is set to point the operations of the first NMI watchdog that
> > > > > initializes successfully. Even though at this moment, the only available
> > > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > > can be added in the future.  
> > > > 
> > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > least have their own NMI watchdogs, it would be good to have those
> > > > converted as well.  
> > > 
> > > Yeah, agreed, this looks like half a patch.  
> > 
> > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > low level architecture details so having yet another 'ops' data structure
> > with a gazillion of callbacks, checks and indirections does not provide
> > value over the currently available weak stubs.
> 
> The other way to go of course is librify the perf watchdog and make an
> x86 watchdog that selects between perf and hpet... I also probably
> prefer that for code such as this, but I wouldn't strongly object to
> ops struct if I'm not writing the code. It's not that bad is it?

My motivation to add the ops was that the hpet and perf watchdog share
significant portions of code. I could look into creating the library for
common code and relocate the hpet watchdog into arch/x86 for the hpet-
specific parts.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2018-06-14  1:19       ` Ricardo Neri
  (?)
@ 2018-06-14  1:41         ` Nicholas Piggin
  -1 siblings, 0 replies; 200+ messages in thread
From: Nicholas Piggin @ 2018-06-14  1:41 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, 13 Jun 2018 18:19:01 -0700
Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:

> On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:  
> > > The current default implementation of the hardlockup detector assumes that
> > > it is implemented using perf events.  
> > 
> > The sparc and powerpc things are very much not using perf.  
> 
> Isn't it true that the current hardlockup detector
> (under kernel/watchdog_hld.c) is based on perf?

arch/powerpc/kernel/watchdog.c is a powerpc implementation that uses
the kernel/watchdog_hld.c framework.

> As far as I understand,
> this hardlockup detector is constructed using perf events for architectures
> that don't provide an NMI watchdog. Perhaps I can be more specific and say
> that this synthesized detector is based on perf.

The perf detector is like that, but we want NMI watchdogs to share
the watchdog_hld code as much as possible, even for arch-specific NMI
watchdogs, so that kernel and user interfaces and behaviour stay
consistent.

Other arch watchdogs like sparc are a little older so they are not
using HLD. You don't have to change those for your series, but it
would be good to bring them into the fold if possible at some time.
IIRC sparc was slightly non-trivial because it has some differences
in sysctl or cmdline APIs that we don't want to break.

But powerpc at least needs to be updated if you change the HLD APIs.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-14  1:31             ` Ricardo Neri
  (?)
@ 2018-06-14  2:32               ` Nicholas Piggin
  -1 siblings, 0 replies; 200+ messages in thread
From: Nicholas Piggin @ 2018-06-14  2:32 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Wed, 13 Jun 2018 18:31:17 -0700
Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:

> On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> > On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >   
> > > On Wed, 13 Jun 2018, Peter Zijlstra wrote:  
> > > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:    
> > > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > > > >     
> > > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > > watchdog, define a common interface that can be used across multiple
> > > > > > implementations.
> > > > > > 
> > > > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > > operations.
> > > > > > 
> > > > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > > variable is set to point to the operations of the first NMI watchdog that
> > > > > > initializes successfully. At this moment, the only available NMI watchdog
> > > > > > is the perf-based hardlockup detector, but more implementations can be
> > > > > > added in the future.    
> > > > > 
> > > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > > least have their own NMI watchdogs, it would be good to have those
> > > > > converted as well.    
> > > > 
> > > > Yeah, agreed, this looks like half a patch.    
> > > 
> > > Though I'm not seeing the advantage of it. That kind of NMI watchdog is a
> > > low-level architecture detail, so having yet another 'ops' data structure
> > > with a gazillion callbacks, checks and indirections does not provide
> > > value over the currently available weak stubs.  
> > 
> > The other way to go, of course, is to librify the perf watchdog and make an
> > x86 watchdog that selects between perf and hpet... I would also probably
> > prefer that for code such as this, but I wouldn't strongly object to an
> > ops struct if I'm not the one writing the code. It's not that bad, is it?  
> 
> My motivation for adding the ops was that the hpet and perf watchdogs share
> significant portions of code.

Right, a good motivation.

> I could look into creating a library for
> the common code and relocating the hpet watchdog into arch/x86 for the
> hpet-specific parts.

If you can investigate that approach, that would be appreciated. I hope
I did not misunderstand you there, Thomas.

Basically you would have perf infrastructure and hpet infrastructure,
and then the x86 watchdog driver would use one or the other of those. The
generic watchdog driver would be just a simple shim that uses the perf
infrastructure. Then hopefully the powerpc driver would require almost
no change.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter
  2018-06-14  0:58       ` Ricardo Neri
  (?)
@ 2018-06-14  3:30         ` Randy Dunlap
  -1 siblings, 0 replies; 200+ messages in thread
From: Randy Dunlap @ 2018-06-14  3:30 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On 06/13/2018 05:58 PM, Ricardo Neri wrote:
> On Tue, Jun 12, 2018 at 10:26:57PM -0700, Randy Dunlap wrote:
>> On 06/12/2018 05:57 PM, Ricardo Neri wrote:
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index f2040d4..a8833c7 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -2577,7 +2577,7 @@
>>>  			Format: [state][,regs][,debounce][,die]
>>>  
>>>  	nmi_watchdog=	[KNL,BUGS=X86] Debugging features for SMP kernels
>>> -			Format: [panic,][nopanic,][num]
>>> +			Format: [panic,][nopanic,][num,][hpet]
>>>  			Valid num: 0 or 1
>>>  			0 - turn hardlockup detector in nmi_watchdog off
>>>  			1 - turn hardlockup detector in nmi_watchdog on
>>
>> This says that I can use "nmi_watchdog=hpet" without using 0 or 1.
>> Is that correct?
> 
> Yes, this is what I meant. In my view, if you set nmi_watchdog=hpet it
> implies that you want to activate the NMI watchdog. In this case, perf.
> 
> I can see how this will be ambiguous for the case of perf and arch NMI
> watchdogs.
> 
> Alternatively, a new parameter could be added, such as nmi_watchdog_type. I
> didn't want to add it in this patchset, as I think that a single parameter
> can handle both the enablement and the type of the NMI watchdog.
> 
> What do you think?

I think it's OK like it is.

thanks,
-- 
~Randy

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-14  2:32               ` Nicholas Piggin
  (?)
@ 2018-06-14  8:32                 ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-14  8:32 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Ricardo Neri, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Thu, 14 Jun 2018, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.

I'm not against cleanups and consolidation, quite the contrary.

But this stuff just adds new infrastructure w/o showing that it's actually
a cleanup and consolidation.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-13  9:40     ` Thomas Gleixner
  (?)
@ 2018-06-15  2:03       ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> >  		kick_timer(hdata);
> >  
> > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> 
> Eeew.

If you don't mind my asking, what is the problem with this error message?
> 
> >  /**
> > + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> > + * @val:	Attribute associated with the NMI. Not used.
> > + * @regs:	Register values as seen when the NMI was asserted
> > + *
> > + * When an NMI is issued, look for hardlockups. If the timer is not periodic,
> > + * kick it. The interrupt is always handled when if delivered via the
> > + * Front-Side Bus.
> > + *
> > + * Returns:
> > + *
> > + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> > + * otherwise.
> > + */
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int use_fsb;
> > +
> > +	/*
> > +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> > +	 * edge-triggered and there is no need to check the ISR register.
> > +	 */
> > +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> > +
> > +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> > +		return NMI_DONE;
> 
> So for 'use_fsb == True' every single NMI will fall through into the
> watchdog code below.
> 
> > +	inspect_for_hardlockups(regs);
> > +
> > +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +		kick_timer(hdata);
> 
> And in case that the HPET does not support periodic mode this reprogramms
> the timer on every NMI which means that while perf is running the watchdog
> will never ever detect anything.

Yes. I see that this is wrong. With MSI interrupts, as far as I can
see, there is no way to make sure that the HPET timer caused the NMI;
perhaps the only option is to use an IO APIC interrupt and read the
interrupt status register.

> 
> Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> access is horribly slow, so any high frequency perf monitoring will take a
> massive performance hit.

If an IO APIC interrupt is used, only one HPET register (the status
register) would need to be read for every NMI. Would that be more
acceptable? Otherwise, there is no way to determine whether the HPET
caused the NMI.

Alternatively, there could be a counter that skips reading the HPET status
register (and the detection of hardlockups) for all but every Xth NMI. This
would reduce the overall frequency of HPET register reads.

Is that more acceptable?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-15  2:07       ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Andrew Morton,
	Philippe Ombredanne, Colin Ian King, Byungchul Park,
	Paul E. McKenney, Luis R. Rodriguez, Waiman Long, Josh Poimboeuf,
	Randy Dunlap, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On Wed, Jun 13, 2018 at 11:07:20AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 05:57:37PM -0700, Ricardo Neri wrote:
> 
> +static bool is_hpet_wdt_interrupt(struct hpet_hld_data *hdata)
> +{
> +       unsigned long this_isr;
> +       unsigned int lvl_trig;
> +
> +       this_isr = hpet_readl(HPET_STATUS) & BIT(hdata->num);
> +
> +       lvl_trig = hpet_readl(HPET_Tn_CFG(hdata->num)) & HPET_TN_LEVEL;
> +
> +       if (lvl_trig && this_isr)
> +               return true;
> +
> +       return false;
> +}
> 
> > +static int hardlockup_detector_nmi_handler(unsigned int val,
> > +					   struct pt_regs *regs)
> > +{
> > +	struct hpet_hld_data *hdata = hld_data;
> > +	unsigned int use_fsb;
> > +
> > +	/*
> > +	 * If FSB delivery mode is used, the timer interrupt is programmed as
> > +	 * edge-triggered and there is no need to check the ISR register.
> > +	 */
> > +	use_fsb = hdata->flags & HPET_DEV_FSB_CAP;
> 
> Please do explain.. That FSB thing basically means MSI. But there's only
> a single NMI vector. How do we know this NMI came from the HPET?

Indeed, I see now that this is wrong. There is no way to know. The only way
is to use an IO APIC interrupt and read the HPET status register.

> 
> > +
> > +	if (!use_fsb && !is_hpet_wdt_interrupt(hdata))
> 
> So you add _2_ HPET reads for every single NMI that gets triggered...
> and IIRC HPET reads are _sllooooowwwwww_.


Since the trigger mode of the HPET timer is not expected to change,
perhaps is_hpet_wdt_interrupt() needs to read only the interrupt status
register. This would reduce the reads to one. Furthermore, the hardlockup
detector can skip a number of NMIs and further reduce the frequency
of reads. Does this make sense?

> 
> > +		return NMI_DONE;
> > +
> > +	inspect_for_hardlockups(regs);
> > +
> > +	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > +		kick_timer(hdata);
> > +
> > +	/* Acknowledge interrupt if in level-triggered mode */
> > +	if (!use_fsb)
> > +		hpet_writel(BIT(hdata->num), HPET_STATUS);
> > +
> > +	return NMI_HANDLED;
> 
> So if I read this right, when in FSB/MSI mode, we'll basically _always_
> claim every single NMI as handled?
> 
> That's broken.

Yes, this is not correct. I will drop the functionality to use
FSB/MSI mode.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-13 10:06           ` Marc Zyngier
  (?)
@ 2018-06-15  2:12             ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Julien Thierry, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov,
	Tony Luck, Ravi V. Shankar, x86, sparclinux, linuxppc-dev,
	linux-kernel, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski, Doug Berger,
	Palmer Dabbelt, iommu

On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2018, Julien Thierry wrote:
> >> On 13/06/18 09:34, Peter Zijlstra wrote:
> >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>> index 5426627..dbc5e02 100644
> >>>> --- a/include/linux/interrupt.h
> >>>> +++ b/include/linux/interrupt.h
> >>>> @@ -61,6 +61,8 @@
> >>>>    *                interrupt handler after suspending interrupts. For
> >>>> system
> >>>>    *                wakeup devices users need to implement wakeup
> >>>> detection in
> >>>>    *                their interrupt handlers.
> >>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>> non-maskable, if
> >>>> + *                supported by the chip.
> >>>>    */
> >>>
> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>> NMIs to this level.
> >>>
> >>
> >> I've been working on something similar on arm64 side, and effectively the one
> >> thing that might be common to arm64 and intel is the interface to set an
> >> interrupt as NMI. So I guess it would be nice to agree on the right approach
> >> for this.
> >>
> >> The way I did it was by introducing a new irq_state and let the irqchip driver
> >> handle most of the work (if it supports that state):
> >>
> >> https://lkml.org/lkml/2018/5/25/181
> >>
> >> This has not been ACKed nor NAKed. So I am just asking whether this is a more
> >> suitable approach, and if not, is there any suggestions on how to do this?
> > 
> > I really didn't pay attention to that as it's burried in the GIC/ARM series
> > which is usually Marc's playground.
> 
> I'm working my way through it ATM now that I have some brain cycles back.
> 
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> 
> I can only agree with that. Allowing random driver to use request_irq()
> to make anything an NMI ultimately turns it into a complete mess ("hey,
> NMI is *faster*, let's use that"), and a potential source of horrible
> deadlocks.
> 
> What I'd find more palatable is a way for an irqchip to be able to
> prioritize some interrupts based on a set of architecturally-defined
> requirements, and a separate NMI requesting/handling framework that is
> separate from the IRQ API, as the overall requirements are likely to
> completely different.
> 
> It shouldn't have to be nearly as complex as the IRQ API, and require
> much stricter requirements in terms of what you can do there (flow
> handling should definitely be different).

Marc, Julien, do you plan to actively work on this? Would you mind keeping
me in the loop? I also need this work for this watchdog. In the meantime,
I will go through Julien's patches and try to adapt them to my work.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-15  2:12             ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Julien Thierry, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov,
	Tony Luck, Ravi V. Shankar, x86, sparclinux, linuxppc-dev,
	linux-kernel, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski

On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2018, Julien Thierry wrote:
> >> On 13/06/18 09:34, Peter Zijlstra wrote:
> >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>> index 5426627..dbc5e02 100644
> >>>> --- a/include/linux/interrupt.h
> >>>> +++ b/include/linux/interrupt.h
> >>>> @@ -61,6 +61,8 @@
> >>>>    *                interrupt handler after suspending interrupts. For
> >>>> system
> >>>>    *                wakeup devices users need to implement wakeup
> >>>> detection in
> >>>>    *                their interrupt handlers.
> >>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>> non-maskable, if
> >>>> + *                supported by the chip.
> >>>>    */
> >>>
> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>> NMIs to this level.
> >>>
> >>
> >> I've been working on something similar on arm64 side, and effectively the one
> >> thing that might be common to arm64 and intel is the interface to set an
> >> interrupt as NMI. So I guess it would be nice to agree on the right approach
> >> for this.
> >>
> >> The way I did it was by introducing a new irq_state and let the irqchip driver
> >> handle most of the work (if it supports that state):
> >>
> >> https://lkml.org/lkml/2018/5/25/181
> >>
> >> This has not been ACKed nor NAKed. So I am just asking whether this is a more
> >> suitable approach, and if not, is there any suggestions on how to do this?
> > 
> > I really didn't pay attention to that as it's burried in the GIC/ARM series
> > which is usually Marc's playground.
> 
> I'm working my way through it ATM now that I have some brain cycles back.
> 
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> 
> I can only agree with that. Allowing random driver to use request_irq()
> to make anything an NMI ultimately turns it into a complete mess ("hey,
> NMI is *faster*, let's use that"), and a potential source of horrible
> deadlocks.
> 
> What I'd find more palatable is a way for an irqchip to be able to
> prioritize some interrupts based on a set of architecturally-defined
> requirements, and a separate NMI requesting/handling framework that is
> separate from the IRQ API, as the overall requirements are likely to
> completely different.
> 
> It shouldn't have to be nearly as complex as the IRQ API, and require
> much stricter requirements in terms of what you can do there (flow
> handling should definitely be different).

Marc, Julien, do you plan to actively work on this? Would you mind keeping
me in the loop? I also need this work for this watchdog. In the meantime,
I will go through Julien's patches and try to adapt it to my work.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-15  2:12             ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Julien Thierry, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov,
	Tony Luck, Ravi V. Shankar, x86, sparclinux, linuxppc-dev,
	linux-kernel, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski

On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> On 13/06/18 10:20, Thomas Gleixner wrote:
> > On Wed, 13 Jun 2018, Julien Thierry wrote:
> >> On 13/06/18 09:34, Peter Zijlstra wrote:
> >>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>> index 5426627..dbc5e02 100644
> >>>> --- a/include/linux/interrupt.h
> >>>> +++ b/include/linux/interrupt.h
> >>>> @@ -61,6 +61,8 @@
> >>>>    *                interrupt handler after suspending interrupts. For
> >>>> system
> >>>>    *                wakeup devices users need to implement wakeup
> >>>> detection in
> >>>>    *                their interrupt handlers.
> >>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>> non-maskable, if
> >>>> + *                supported by the chip.
> >>>>    */
> >>>
> >>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>> NMIs to this level.
> >>>
> >>
> >> I've been working on something similar on arm64 side, and effectively the one
> >> thing that might be common to arm64 and intel is the interface to set an
> >> interrupt as NMI. So I guess it would be nice to agree on the right approach
> >> for this.
> >>
> >> The way I did it was by introducing a new irq_state and let the irqchip driver
> >> handle most of the work (if it supports that state):
> >>
> >> https://lkml.org/lkml/2018/5/25/181
> >>
> >> This has not been ACKed nor NAKed. So I am just asking whether this is a more
> >> suitable approach, and if not, is there any suggestions on how to do this?
> > 
> > I really didn't pay attention to that as it's burried in the GIC/ARM series
> > which is usually Marc's playground.
> 
> I'm working my way through it ATM now that I have some brain cycles back.
> 
> > Adding NMI delivery support at low level architecture irq chip level is
> > perfectly fine, but the exposure of that needs to be restricted very
> > much. Adding it to the generic interrupt control interfaces is not going to
> > happen. That's doomed to begin with and a complete abuse of the interface
> > as the handler can not ever be used for that.
> 
> I can only agree with that. Allowing random driver to use request_irq()
> to make anything an NMI ultimately turns it into a complete mess ("hey,
> NMI is *faster*, let's use that"), and a potential source of horrible
> deadlocks.
> 
> What I'd find more palatable is a way for an irqchip to be able to
> prioritize some interrupts based on a set of architecturally-defined
> requirements, and a separate NMI requesting/handling framework that is
> separate from the IRQ API, as the overall requirements are likely to
> completely different.
> 
> It shouldn't have to be nearly as complex as the IRQ API, and require
> much stricter requirements in terms of what you can do there (flow
> handling should definitely be different).

Marc, Julien, do you plan to actively work on this? Would you mind keeping
me in the loop? I also need this work for this watchdog. In the meantime,
I will go through Julien's patches and try to adapt them to my work.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
@ 2018-06-15  2:16       ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > +	/* There are no CPUs to monitor. */
> > +	if (!cpumask_weight(&hdata->monitored_mask))
> > +		return NMI_HANDLED;
> > +
> >  	inspect_for_hardlockups(regs);
> >  
> > +	/*
> > +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > +	 * are added to and removed from this mask at cpu_up() and cpu_down(),
> > +	 * respectively. Thus, the interrupt should be able to be moved to
> > +	 * the next monitored CPU.
> > +	 */
> > +	spin_lock(&hld_data->lock);
> 
> Yuck. Taking a spinlock from NMI ...

I am sorry. I will look into other options for locking. Do you think rcu_lock
would help in this case? I need this locking because the set of monitored
CPUs changes as CPUs come online and offline.

> 
> > +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> > +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > +			break;
> 
> ... and then calling into generic interrupt code which will take even more
> locks is completely broken.


I will look into reworking how the destination of the interrupt is set.
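To make the rotation step concrete, here is a minimal userspace model of
the CPU-selection logic quoted above (purely illustrative: a plain bitmask
stands in for hdata->monitored_mask, and next_monitored_cpu() is an
invented name, not the patch's interface):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the rotation in the NMI handler: starting from the
 * CPU after the current one, pick the next monitored CPU, wrapping
 * around. A uint32_t bitmask models hdata->monitored_mask.
 */
static int next_monitored_cpu(uint32_t monitored, int cur_cpu, int nr_cpus)
{
	for (int i = 1; i <= nr_cpus; i++) {
		int cpu = (cur_cpu + i) % nr_cpus;

		if (monitored & (1u << cpu))
			return cpu;	/* next CPU to receive the HPET NMI */
	}
	return -1;	/* mask is empty: no CPUs to monitor */
}
```

Note that this selection itself needs no lock; one lock-free direction
would be to read an RCU-protected snapshot of the mask from NMI context
instead of taking the spinlock objected to above.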

Thanks and BR,

Ricardo


* Re: [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations
  2018-06-14  2:32               ` Nicholas Piggin
  (?)
@ 2018-06-15  2:21                 ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:21 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Thu, Jun 14, 2018 at 12:32:50PM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:31:17 -0700
> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> 
> > On Wed, Jun 13, 2018 at 09:52:25PM +1000, Nicholas Piggin wrote:
> > > On Wed, 13 Jun 2018 11:26:49 +0200 (CEST)
> > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > >   
> > > > On Wed, 13 Jun 2018, Peter Zijlstra wrote:  
> > > > > On Wed, Jun 13, 2018 at 05:41:41PM +1000, Nicholas Piggin wrote:    
> > > > > > On Tue, 12 Jun 2018 17:57:32 -0700
> > > > > > Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > > > > >     
> > > > > > > Instead of exposing individual functions for the operations of the NMI
> > > > > > > watchdog, define a common interface that can be used across multiple
> > > > > > > implementations.
> > > > > > > 
> > > > > > > The struct nmi_watchdog_ops is defined for such operations. These initial
> > > > > > > definitions include the enable, disable, start, stop, and cleanup
> > > > > > > operations.
> > > > > > > 
> > > > > > > Only a single NMI watchdog can be used in the system. The operations of
> > > > > > > this NMI watchdog are accessed via the new variable nmi_wd_ops. This
> > > > > > > variable is set to point the operations of the first NMI watchdog that
> > > > > > > initializes successfully. Even though at this moment, the only available
> > > > > > > NMI watchdog is the perf-based hardlockup detector. More implementations
> > > > > > > can be added in the future.    
> > > > > > 
> > > > > > Cool, this looks pretty nice at a quick glance. sparc and powerpc at
> > > > > > least have their own NMI watchdogs, it would be good to have those
> > > > > > converted as well.    
> > > > > 
> > > > > Yeah, agreed, this looks like half a patch.    
> > > > 
> > > > Though I'm not seeing the advantage of it. That kind of NMI watchdogs are
> > > > low level architecture details so having yet another 'ops' data structure
> > > > with a gazillion of callbacks, checks and indirections does not provide
> > > > value over the currently available weak stubs.  
> > > 
> > > The other way to go of course is librify the perf watchdog and make an
> > > x86 watchdog that selects between perf and hpet... I also probably
> > > prefer that for code such as this, but I wouldn't strongly object to
> > > ops struct if I'm not writing the code. It's not that bad is it?  
> > 
> > My motivation to add the ops was that the hpet and perf watchdog share
> > significant portions of code.
> 
> Right, a good motivation.
> 
> > I could look into creating the library for
> > common code and relocate the hpet watchdog into arch/x86 for the hpet-
> > specific parts.
> 
> If you can investigate that approach, that would be appreciated. I hope
> I did not misunderstand you there, Thomas.
> 
> Basically you would have perf infrastructure and hpet infrastructure,
> and then the x86 watchdog driver will use one or the other of those. The
> generic watchdog driver will be just a simple shim that uses the perf
> infrastructure. Then hopefully the powerpc driver would require almost
> no change.

Sure, I will try to structure the code to minimize the changes to the powerpc
watchdog... without breaking the sparc one.

Thanks and BR,
Ricardo


* Re: [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf
  2018-06-14  1:41         ` Nicholas Piggin
  (?)
@ 2018-06-15  2:23           ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-15  2:23 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Don Zickus, Michael Ellerman, Frederic Weisbecker,
	Babu Moger, David S. Miller, Benjamin Herrenschmidt,
	Paul Mackerras, Mathieu Desnoyers, Masami Hiramatsu,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Luis R. Rodriguez, iommu

On Thu, Jun 14, 2018 at 11:41:44AM +1000, Nicholas Piggin wrote:
> On Wed, 13 Jun 2018 18:19:01 -0700
> Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> 
> > On Wed, Jun 13, 2018 at 10:43:24AM +0200, Peter Zijlstra wrote:
> > > On Tue, Jun 12, 2018 at 05:57:34PM -0700, Ricardo Neri wrote:  
> > > > The current default implementation of the hardlockup detector assumes that
> > > > it is implemented using perf events.  
> > > 
> > > The sparc and powerpc things are very much not using perf.  
> > 
> > Isn't it true that the current hardlockup detector
> > (under kernel/watchdog_hld.c) is based on perf?
> 
> arch/powerpc/kernel/watchdog.c is a powerpc implementation that uses
> the kernel/watchdog_hld.c framework.
> 
> > As far as I understand,
> > this hardlockup detector is constructed using perf events for architectures
> > that don't provide an NMI watchdog. Perhaps I can be more specific and say
> > that this synthesized detector is based on perf.
> 
> The perf detector is like that, but we want NMI watchdogs to share
> the watchdog_hld code as much as possible even for arch specific NMI
> watchdogs, so that kernel and user interfaces and behaviour are
> consistent.
> 
> Other arch watchdogs like sparc are a little older so they are not
> using HLD. You don't have to change those for your series, but it
> would be good to bring them into the fold if possible at some time.
> IIRC sparc was slightly non-trivial because it has some differences
> in sysctl or cmdline APIs that we don't want to break.
> 
> But powerpc at least needs to be updated if you change hld apis.

I will look into updating at least the powerpc implementation as part
of these changes.

Thanks and BR,
Ricardo


* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-15  2:12             ` Ricardo Neri
  (?)
@ 2018-06-15  8:01               ` Julien Thierry
  -1 siblings, 0 replies; 200+ messages in thread
From: Julien Thierry @ 2018-06-15  8:01 UTC (permalink / raw)
  To: Ricardo Neri, Marc Zyngier
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, H. Peter Anvin,
	Andi Kleen, Ashok Raj, Borislav Petkov, Tony Luck,
	Ravi V. Shankar, x86, sparclinux, linuxppc-dev, linux-kernel,
	Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski, Doug Berger,
	Palmer Dabbelt, iommu

Hi Ricardo,

On 15/06/18 03:12, Ricardo Neri wrote:
> On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
>> On 13/06/18 10:20, Thomas Gleixner wrote:
>>> On Wed, 13 Jun 2018, Julien Thierry wrote:
>>>> On 13/06/18 09:34, Peter Zijlstra wrote:
>>>>> On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
>>>>>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>>>>>> index 5426627..dbc5e02 100644
>>>>>> --- a/include/linux/interrupt.h
>>>>>> +++ b/include/linux/interrupt.h
>>>>>> @@ -61,6 +61,8 @@
>>>>>>     *                interrupt handler after suspending interrupts. For
>>>>>> system
>>>>>>     *                wakeup devices users need to implement wakeup
>>>>>> detection in
>>>>>>     *                their interrupt handlers.
>>>>>> + * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
>>>>>> non-maskable, if
>>>>>> + *                supported by the chip.
>>>>>>     */
>>>>>
>>>>> NAK on the first 6 patches. You really _REALLY_ don't want to expose
>>>>> NMIs to this level.
>>>>>
>>>>
>>>> I've been working on something similar on arm64 side, and effectively the one
>>>> thing that might be common to arm64 and intel is the interface to set an
>>>> interrupt as NMI. So I guess it would be nice to agree on the right approach
>>>> for this.
>>>>
>>>> The way I did it was by introducing a new irq_state and let the irqchip driver
>>>> handle most of the work (if it supports that state):
>>>>
>>>> https://lkml.org/lkml/2018/5/25/181
>>>>
>>>> This has not been ACKed nor NAKed. So I am just asking whether this is a more
>>>> suitable approach, and if not, is there any suggestions on how to do this?
>>>
>>> I really didn't pay attention to that as it's burried in the GIC/ARM series
>>> which is usually Marc's playground.
>>
>> I'm working my way through it ATM now that I have some brain cycles back.
>>
>>> Adding NMI delivery support at low level architecture irq chip level is
>>> perfectly fine, but the exposure of that needs to be restricted very
>>> much. Adding it to the generic interrupt control interfaces is not going to
>>> happen. That's doomed to begin with and a complete abuse of the interface
>>> as the handler can not ever be used for that.
>>
>> I can only agree with that. Allowing random driver to use request_irq()
>> to make anything an NMI ultimately turns it into a complete mess ("hey,
>> NMI is *faster*, let's use that"), and a potential source of horrible
>> deadlocks.
>>
>> What I'd find more palatable is a way for an irqchip to be able to
>> prioritize some interrupts based on a set of architecturally-defined
>> requirements, and a separate NMI requesting/handling framework that is
>> separate from the IRQ API, as the overall requirements are likely to
>> completely different.
>>
>> It shouldn't have to be nearly as complex as the IRQ API, and require
>> much stricter requirements in terms of what you can do there (flow
>> handling should definitely be different).
> 
> Marc, Julien, do you plan to actively work on this? Would you mind keeping
> me in the loop? I also need this work for this watchdog. In the meantime,
> I will go through Julien's patches and try to adapt it to my work.

We are going to work on this and of course your input is most welcome to 
make sure we have an interface usable across different architectures.

In my patches, I'm not sure there is much to adapt to your work as most 
of it is arch specific (although I won't say no to another pair of eyes 
looking at them). From what I've seen of your patches, the point where 
we converge is the need for some code to be able to tell the irqchip "I 
want that particular interrupt line to be treated/set up as an NMI".

We'll make sure to keep you in the loop for discussions/suggestions on this.

Thanks,

-- 
Julien Thierry


* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-15  2:03       ` Ricardo Neri
  (?)
@ 2018-06-15  9:19         ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-15  9:19 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Thu, 14 Jun 2018, Ricardo Neri wrote:
> On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> > >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > >  		kick_timer(hdata);
> > >  
> > > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> > 
> > Eeew.
> 
> If you don't mind me asking. What is the problem with this error message?

The problem is not the error message. The problem is the abuse of
request_irq() and the fact that this irq handler function exists in the
first place for something which is NMI based.

> > And in case that the HPET does not support periodic mode this reprogramms
> > the timer on every NMI which means that while perf is running the watchdog
> > will never ever detect anything.
> 
> Yes. I see that this is wrong. With MSI interrupts, as far as I can
> see, there is no way to make sure that the HPET timer caused the NMI;
> perhaps the only option is to use an IO APIC interrupt and read the
> interrupt status register.
> 
> > Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> > access is horribly slow, so any high frequency perf monitoring will take a
> > massive performance hit.
> 
> If an IO APIC interrupt is used, only one HPET register (the status register)
> would need to be read for every NMI. Would that be more acceptable? Otherwise,
> there is no way to determine if the HPET caused the NMI.

You need level trigger for the HPET status register to be useful at all
because in edge mode the interrupt status bits always read 0.

That means you have to fiddle with the IOAPIC acknowledge magic from NMI
context. Brilliant idea. If the NMI hits in the middle of a regular
io_apic_read() then the interrupted code will end up with the wrong index
register. Not to mention the fun which the affinity rotation from NMI
context would bring.

Do not even think about using IOAPIC and level for this.

> Alternatively, there could be a counter that skips reading the HPET status
> register (and the detection of hardlockups) for every X NMIs. This would
> reduce the overall frequency of HPET register reads.

Great plan. So if the watchdog is the only NMI (because perf is off) then
you delay the watchdog detection by that count.

Nor can you do a time-based check, because time might be corrupted and
then you end up in lala land as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  2018-06-15  2:16       ` Ricardo Neri
  (?)
@ 2018-06-15 10:29         ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-15 10:29 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Thu, 14 Jun 2018, Ricardo Neri wrote:
> On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > +	/* There are no CPUs to monitor. */
> > > +	if (!cpumask_weight(&hdata->monitored_mask))
> > > +		return NMI_HANDLED;
> > > +
> > >  	inspect_for_hardlockups(regs);
> > >  
> > > +	/*
> > > +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > > +	 * are addded and removed to this mask at cpu_up() and cpu_down(),
> > > +	 * respectively. Thus, the interrupt should be able to be moved to
> > > +	 * the next monitored CPU.
> > > +	 */
> > > +	spin_lock(&hld_data->lock);
> > 
> > Yuck. Taking a spinlock from NMI ...
> 
> I am sorry. I will look into other options for locking. Do you think rcu_lock
> would help in this case? I need this locking because the set of monitored
> CPUs changes as CPUs come online and offline.

Sure, but you _cannot_ take any locks in NMI context which are also taken
in !NMI context. And RCU will not help either. How so? The NMI can hit
exactly before the CPU bit is cleared and then the CPU goes down. So RCU
_cannot_ protect anything.

All you can do there is make sure that the TIMn_CONF is only ever accessed
in !NMI code. Then you can stop the timer _before_ a CPU goes down and make
sure that any NMI still in flight has finished. After that you can
fiddle with the CPU mask and restart the timer. Be aware that this is going
to be more corner case handling than actual functionality.

> > > +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> > > +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > > +			break;
> > 
> > ... and then calling into generic interrupt code which will take even more
> > locks is completely broken.
> 
> I will look into reworking how the destination of the interrupt is set.

You have to consider two cases:

 1) !remapped mode:

    That's reasonably simple because you just have to deal with the HPET
    TIMERn_PROCMSG_ROUT register. But then you need to do this directly and
    not through any of the existing interrupt facilities.

 2) remapped mode:

    That's way more complex as you _cannot_ ever do anything which touches
    the IOMMU and the related tables.

    So you'd need to reserve an IOMMU remapping entry for each CPU upfront,
    store the resulting value for the HPET TIMERn_PROCMSG_ROUT register in
    per cpu storage and just modify that one from NMI.

    Though there might be subtle side effects involved, which are related to
    the acknowledge part. You need to talk to the IOMMU wizards first.

All in all, the idea itself is interesting, but the envisioned approach of
round robin and no fast accessible NMI reason detection is going to create
more problems than it solves.

This all could have been avoided if Intel hadn't decided to reuse the APIC
timer registers for the TSC deadline timer. If both would be available we'd
have a CPU local fast accessible watchdog timer when TSC deadline is used
for general timer purposes. But why am I complaining? I've resigned to the
fact that timers are designed^Wcobbled together by janitors long ago.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
@ 2018-06-16  0:39                 ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-16  0:39 UTC (permalink / raw)
  To: Julien Thierry
  Cc: Marc Zyngier, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov,
	Tony Luck, Ravi V. Shankar, x86, sparclinux, linuxppc-dev,
	linux-kernel, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski, Doug Berger,
	Palmer Dabbelt, iommu

On Fri, Jun 15, 2018 at 09:01:02AM +0100, Julien Thierry wrote:
> Hi Ricardo,
> 
> On 15/06/18 03:12, Ricardo Neri wrote:
> >On Wed, Jun 13, 2018 at 11:06:25AM +0100, Marc Zyngier wrote:
> >>On 13/06/18 10:20, Thomas Gleixner wrote:
> >>>On Wed, 13 Jun 2018, Julien Thierry wrote:
> >>>>On 13/06/18 09:34, Peter Zijlstra wrote:
> >>>>>On Tue, Jun 12, 2018 at 05:57:23PM -0700, Ricardo Neri wrote:
> >>>>>>diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> >>>>>>index 5426627..dbc5e02 100644
> >>>>>>--- a/include/linux/interrupt.h
> >>>>>>+++ b/include/linux/interrupt.h
> >>>>>>@@ -61,6 +61,8 @@
> >>>>>>    *                interrupt handler after suspending interrupts. For
> >>>>>>system
> >>>>>>    *                wakeup devices users need to implement wakeup
> >>>>>>detection in
> >>>>>>    *                their interrupt handlers.
> >>>>>>+ * IRQF_DELIVER_AS_NMI - Configure interrupt to be delivered as
> >>>>>>non-maskable, if
> >>>>>>+ *                supported by the chip.
> >>>>>>    */
> >>>>>
> >>>>>NAK on the first 6 patches. You really _REALLY_ don't want to expose
> >>>>>NMIs to this level.
> >>>>>
> >>>>
> >>>>I've been working on something similar on arm64 side, and effectively the one
> >>>>thing that might be common to arm64 and intel is the interface to set an
> >>>>interrupt as NMI. So I guess it would be nice to agree on the right approach
> >>>>for this.
> >>>>
> >>>>The way I did it was by introducing a new irq_state and let the irqchip driver
> >>>>handle most of the work (if it supports that state):
> >>>>
> >>>>https://lkml.org/lkml/2018/5/25/181
> >>>>
> >>>>This has not been ACKed nor NAKed. So I am just asking whether this is a more
> >>>>suitable approach, and if not, is there any suggestions on how to do this?
> >>>
> >>>I really didn't pay attention to that as it's burried in the GIC/ARM series
> >>>which is usually Marc's playground.
> >>
> >>I'm working my way through it ATM now that I have some brain cycles back.
> >>
> >>>Adding NMI delivery support at low level architecture irq chip level is
> >>>perfectly fine, but the exposure of that needs to be restricted very
> >>>much. Adding it to the generic interrupt control interfaces is not going to
> >>>happen. That's doomed to begin with and a complete abuse of the interface
> >>>as the handler can not ever be used for that.
> >>
> >>I can only agree with that. Allowing random drivers to use request_irq()
> >>to make anything an NMI ultimately turns it into a complete mess ("hey,
> >>NMI is *faster*, let's use that"), and a potential source of horrible
> >>deadlocks.
> >>
> >>What I'd find more palatable is a way for an irqchip to be able to
> >>prioritize some interrupts based on a set of architecturally-defined
> >>requirements, and a separate NMI requesting/handling framework that is
> >>separate from the IRQ API, as the overall requirements are likely to be
> >>completely different.
> >>
> >>It shouldn't have to be nearly as complex as the IRQ API, and require
> >>much stricter requirements in terms of what you can do there (flow
> >>handling should definitely be different).
> >
> >Marc, Julien, do you plan to actively work on this? Would you mind keeping
> >me in the loop? I also need this work for this watchdog. In the meantime,
> >I will go through Julien's patches and try to adapt it to my work.
> 
> We are going to work on this and of course your input is most welcome to
> make sure we have an interface usable across different architectures.

Great! Thanks! I will keep an eye on future versions of your "arm64: provide
pseudo NMI with GICv3" series.
> 
> In my patches, I'm not sure there is much to adapt to your work as most of
> it is arch specific (although I won't say no to another pair of eyes looking
> at them). From what I've seen of your patches, the point where we converge
> is that need for some code to be able to tell the irqchip "I want that
> particular interrupt line to be treated/setup as an NMI".

Indeed, there has to be a generic way for the irqchip to announce that it
supports configuring an interrupt as NMI... and a way to actually configure
it.
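
The interface being discussed can be pictured roughly as follows. This is a
purely illustrative userspace sketch, not the kernel API: all names
(`IRQCHIP_SUPPORTS_NMI`, `irq_setup_nmi`, the structure layouts) are
assumptions for the sake of the example. The point is the shape of the
contract: the irqchip advertises NMI capability via a flag, and a dedicated
setup call, separate from request_irq(), checks that flag before switching the
line to NMI delivery.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical capability flag: the chip can deliver this line as an NMI. */
#define IRQCHIP_SUPPORTS_NMI  (1U << 0)

struct irq_chip {
	unsigned int flags;
	int (*irq_set_nmi)(int hwirq);  /* chip-level hook to switch delivery */
};

struct irq_desc {
	int hwirq;
	struct irq_chip *chip;
	int is_nmi;                     /* is the line configured as NMI? */
};

/* Ask the chip to treat this line as an NMI; fail if unsupported. */
static int irq_setup_nmi(struct irq_desc *desc)
{
	struct irq_chip *chip = desc->chip;

	if (!chip || !(chip->flags & IRQCHIP_SUPPORTS_NMI) || !chip->irq_set_nmi)
		return -EINVAL;

	if (chip->irq_set_nmi(desc->hwirq))
		return -EINVAL;

	desc->is_nmi = 1;
	return 0;
}

/* Two demo chips: one that supports NMI delivery, one that does not. */
static int dummy_set_nmi(int hwirq) { (void)hwirq; return 0; }

static struct irq_chip nmi_chip   = { IRQCHIP_SUPPORTS_NMI, dummy_set_nmi };
static struct irq_chip plain_chip = { 0, NULL };

static struct irq_desc nmi_line   = { 10, &nmi_chip, 0 };
static struct irq_desc plain_line = { 11, &plain_chip, 0 };
```

With this shape, a caller that asks for NMI delivery on a chip that never
announced the capability gets a hard error instead of a silently maskable
interrupt, which is the property both threads of work seem to need.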

> 
> We'll make sure to keep you in the loop for discussions/suggestions on this.

Thank you!

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
@ 2018-06-16  0:46           ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-16  0:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Fri, Jun 15, 2018 at 12:29:06PM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > On Wed, Jun 13, 2018 at 11:48:09AM +0200, Thomas Gleixner wrote:
> > > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > > +	/* There are no CPUs to monitor. */
> > > > +	if (!cpumask_weight(&hdata->monitored_mask))
> > > > +		return NMI_HANDLED;
> > > > +
> > > >  	inspect_for_hardlockups(regs);
> > > >  
> > > > +	/*
> > > > +	 * Target a new CPU. Keep trying until we find a monitored CPU. CPUs
> > > > +	 * are addded and removed to this mask at cpu_up() and cpu_down(),
> > > > +	 * respectively. Thus, the interrupt should be able to be moved to
> > > > +	 * the next monitored CPU.
> > > > +	 */
> > > > +	spin_lock(&hld_data->lock);
> > > 
> > > Yuck. Taking a spinlock from NMI ...
> > 
> > I am sorry. I will look into other options for locking. Do you think rcu_lock
> > would help in this case? I need this locking because the CPUs being monitored
> > changes as CPUs come online and offline.
> 
> Sure, but you _cannot_ take any locks in NMI context which are also taken
> in !NMI context. And RCU will not help either. How so? The NMI can hit
> exactly before the CPU bit is cleared and then the CPU goes down. So RCU
> _cannot_ protect anything.
> 
> All you can do there is make sure that the TIMn_CONF is only ever accessed
> in !NMI code. Then you can stop the timer _before_ a CPU goes down and make
> sure that the eventually on the fly NMI is finished. After that you can
> fiddle with the CPU mask and restart the timer. Be aware that this is going
> to be more corner case handling than actual functionality.

Thanks for the suggestion. It makes sense to stop the timer when updating the
CPU mask. In this manner the timer will not cause any NMI.
> 
> > > > +	for_each_cpu_wrap(cpu, &hdata->monitored_mask, smp_processor_id() + 1) {
> > > > +		if (!irq_set_affinity(hld_data->irq, cpumask_of(cpu)))
> > > > +			break;
> > > 
> > > ... and then calling into generic interrupt code which will take even more
> > > locks is completely broken.
> > 
> > I will look into reworking how the destination of the interrupt is set.
> 
> You have to consider two cases:
> 
>  1) !remapped mode:
> 
>     That's reasonably simple because you just have to deal with the HPET
>     TIMERn_PROCMSG_ROUT register. But then you need to do this directly and
>     not through any of the existing interrupt facilities.

Indeed, there is no need to use the generic interrupt facilities to set affinity;
I am dealing with an NMI anyways.
> 
>  2) remapped mode:
> 
>     That's way more complex as you _cannot_ ever do anything which touches
>     the IOMMU and the related tables.
> 
>     So you'd need to reserve an IOMMU remapping entry for each CPU upfront,
>     store the resulting value for the HPET TIMERn_PROCMSG_ROUT register in
>     per cpu storage and just modify that one from NMI.
> 
>     Though there might be subtle side effects involved, which are related to
>     the acknowledge part. You need to talk to the IOMMU wizards first.

I see. I will look into the code and prototype something that makes sense for
the IOMMU maintainers.
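
The remapped-mode idea Thomas outlines amounts to the following shape, shown
here as a userspace sketch. The register name, the route values, and the
helper names are all illustrative stand-ins; the real values would come from
the IOMMU driver when the remapping entries are reserved. What matters is the
split: all IOMMU work happens once at setup time in !NMI context, and the NMI
path is reduced to a single write of a precomputed value.

```c
#define NR_CPUS_DEMO 4

static unsigned int hpet_route_for_cpu[NR_CPUS_DEMO]; /* precomputed routes */
static unsigned int timern_procmsg_rout;              /* modeled HPET register */

/* Setup path (!NMI context): reserve one remapping entry per CPU and
 * cache the register value it yields. The 0x1000 + cpu values are
 * stand-ins for whatever the IOMMU hands back. */
static void hld_precompute_routes(void)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		hpet_route_for_cpu[cpu] = 0x1000u + (unsigned int)cpu;
}

/* NMI path: no IOMMU access, no table walks -- one register write. */
static unsigned int hld_retarget_from_nmi(int next_cpu)
{
	timern_procmsg_rout = hpet_route_for_cpu[next_cpu];
	return timern_procmsg_rout;
}

static unsigned int demo_retarget(int cpu)
{
	hld_precompute_routes();
	return hld_retarget_from_nmi(cpu);
}
```

Whether the acknowledge side effects Thomas mentions make even this single
write unsafe is exactly the question for the IOMMU maintainers.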

> 
> All in all, the idea itself is interesting, but the envisioned approach of
> round robin and no fast accessible NMI reason detection is going to create
> more problems than it solves.

I see it more clearly now.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-16  0:51           ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-16  0:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> > > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> > > >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > > >  		kick_timer(hdata);
> > > >  
> > > > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> > > 
> > > Eeew.
> > 
> > If you don't mind me asking. What is the problem with this error message?
> 
> The problem is not the error message. The problem is the abuse of
> request_irq() and the fact that this irq handler function exists in the
> first place for something which is NMI based.

I wanted to add this handler in case the interrupt was not configured correctly
to be delivered as NMI (e.g., not supported by the hardware). I see your point.
Perhaps this is not needed. There is already code in place to complain when an
unexpected interrupt happens.

> 
> > > And in case that the HPET does not support periodic mode this reprogramms
> > > the timer on every NMI which means that while perf is running the watchdog
> > > will never ever detect anything.
> > 
> > Yes. I see that this is wrong. With MSI interrupts, as far as I can
> > see, there is not a way to make sure that the HPET timer caused the NMI
> > perhaps the only option is to use an IO APIC interrupt and read the
> > interrupt status register.
> > 
> > > Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> > > access is horribly slow, so any high frequency perf monitoring will take a
> > > massive performance hit.
> > 
> > If an IO APIC interrupt is used, only one HPET register (the status register)
> > would need to be read for every NMI. Would that be more acceptable? Otherwise,
> > there is no way to determine if the HPET caused the NMI.
> 
> You need level trigger for the HPET status register to be useful at all
> because in edge mode the interrupt status bits read always 0.

Indeed.

> 
> That means you have to fiddle with the IOAPIC acknowledge magic from NMI
> context. Brilliant idea. If the NMI hits in the middle of a regular
> io_apic_read() then the interrupted code will end up with the wrong index
> register. Not to talk about the fun which the affinity rotation from NMI
> context would bring.
> 
> Do not even think about using IOAPIC and level for this.

OK. I will stay away from it and focus on MSI.
> 
> > Alternatively, there could be a counter that skips reading the HPET status
> > register (and the detection of hardlockups) for every X NMIs. This would
> > reduce the overall frequency of HPET register reads.
> 
> Great plan. So if the watchdog is the only NMI (because perf is off) then
> you delay the watchdog detection by that count.

OK. This was a bad idea. Then, is it acceptable to have a read of an HPET
register per NMI just to check in the status register whether the HPET timer
caused the NMI?
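
The single-read check in question would look roughly like this. This is an
illustrative sketch with a modeled status register, not real MMIO: the
constant names and the timer number are assumptions, and, per the earlier part
of the thread, the status bit is only meaningful in level-triggered mode,
which Thomas has already ruled out, so this is strictly a sketch of the
proposal being asked about.

```c
#define NMI_DONE     0   /* not ours; let other NMI handlers run */
#define NMI_HANDLED  1
#define HLD_TIMER_NR 2   /* hypothetical HPET timer used by the detector */

static unsigned int hpet_int_status;  /* modeled HPET interrupt status register */

static int hld_nmi_handler(void)
{
	/* The one register read per NMI: did our timer fire? */
	if (!(hpet_int_status & (1u << HLD_TIMER_NR)))
		return NMI_DONE;

	/* Acknowledge the bit (modeled; real level-trigger ack differs). */
	hpet_int_status &= ~(1u << HLD_TIMER_NR);

	/* inspect_for_hardlockups(regs) would run here */
	return NMI_HANDLED;
}

static int demo_spurious(void)
{
	hpet_int_status = 0;
	return hld_nmi_handler();
}

static int demo_ours(void)
{
	hpet_int_status = 1u << HLD_TIMER_NR;
	return hld_nmi_handler();
}
```

Even this minimal form still pays one slow HPET read on every NMI, which is
the cost being weighed in the question above.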

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-16  0:51           ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-16  0:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Peter Zijlstra, Alexei Starovoitov,
	Kai-Heng Feng, H. Peter Anvin, sparclinux-u79uwXL29TY76Z2rM5mHXA,
	Ingo Molnar, Christoffer Dall, Davidlohr Bueso, Ashok Raj,
	Michael Ellerman, x86-DgEjT+Ai2ygdnm+yROfE0A,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	David Rientjes, Andi Kleen, Waiman Long, Borislav Petkov,
	Masami Hiramatsu, Don Zickus, Ravi V. Shankar,
	Konrad Rzeszutek Wilk, Marc Zyngier, Frederic Weisbecker,
	Nicholas Piggin

On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > On Wed, Jun 13, 2018 at 11:40:00AM +0200, Thomas Gleixner wrote:
> > > On Tue, 12 Jun 2018, Ricardo Neri wrote:
> > > > @@ -183,6 +184,8 @@ static irqreturn_t hardlockup_detector_irq_handler(int irq, void *data)
> > > >  	if (!(hdata->flags & HPET_DEV_PERI_CAP))
> > > >  		kick_timer(hdata);
> > > >  
> > > > +	pr_err("This interrupt should not have happened. Ensure delivery mode is NMI.\n");
> > > 
> > > Eeew.
> > 
> > If you don't mind me asking. What is the problem with this error message?
> 
> The problem is not the error message. The problem is the abuse of
> request_irq() and the fact that this irq handler function exists in the
> first place for something which is NMI based.

I wanted to add this handler in case the interrupt was not configured correctly
to be delivered as NMI (e.g., not supported by the hardware). I see your point.
Perhaps this is not needed. There is code in place to complain when an interrupt
that nobody was expecting happens.

> 
> > > And in case that the HPET does not support periodic mode this reprogramms
> > > the timer on every NMI which means that while perf is running the watchdog
> > > will never ever detect anything.
> > 
> > Yes. I see that this is wrong. With MSI interrupts, as far as I can
> > see, there is not a way to make sure that the HPET timer caused the NMI
> > perhaps the only option is to use an IO APIC interrupt and read the
> > interrupt status register.
> > 
> > > Aside of that, reading TWO HPET registers for every NMI is insane. HPET
> > > access is horribly slow, so any high frequency perf monitoring will take a
> > > massive performance hit.
> > 
> > If an IO APIC interrupt is used, only HPET register (the status register)
> > would need to be read for every NMI. Would that be more acceptable? Otherwise,
> > there is no way to determine if the HPET cause the NMI.
> 
> You need level trigger for the HPET status register to be useful at all
> because in edge mode the interrupt status bits read always 0.

Indeed.

> 
> That means you have to fiddle with the IOAPIC acknowledge magic from NMI
> context. Brilliant idea. If the NMI hits in the middle of a regular
> io_apic_read() then the interrupted code will end up with the wrong index
> register. Not to talk about the fun which the affinity rotation from NMI
> context would bring.
> 
> Do not even think about using IOAPIC and level for this.

OK. I will stay away from it and focus on MSI.
> 
> > Alternatively, there could be a counter that skips reading the HPET status
> > register (and the detection of hardlockups) for every X NMIs. This would
> > reduce the overall frequency of HPET register reads.
> 
> Great plan. So if the watchdog is the only NMI (because perf is off) then
> you delay the watchdog detection by that count.

OK. This was a bad idea. Then, is it acceptable to have one read of an HPET
register per NMI, just to check in the status register whether the HPET timer
caused the NMI?
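As a thought experiment, the per-NMI status check discussed here could look
roughly like the sketch below. The register access is mocked with a plain
variable so the logic can be exercised in user space; the helper names are
illustrative, not the kernel's actual HPET interfaces, and the status bits
are only populated for level-triggered interrupts, as noted elsewhere in
this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the HPET General Interrupt Status register.  In the kernel this
 * would be a (slow) MMIO read; here it is a plain variable so the claim
 * logic can be tested in user space. */
static uint32_t mock_hpet_gis;

static uint32_t hpet_readl_status(void)
{
	return mock_hpet_gis;		/* one HPET read per NMI */
}

/* Hypothetical per-NMI check: claim the NMI only if the status bit of the
 * watchdog's HPET comparator is set.  In edge/MSI mode these bits always
 * read 0, so this only works with level-triggered interrupts. */
static int hpet_wd_check_nmi(unsigned int timer_nr)
{
	uint32_t status = hpet_readl_status();

	if (!(status & (1U << timer_nr)))
		return 0;		/* not ours: let other handlers run */

	/* Acknowledge.  Real hardware is write-1-to-clear in level mode;
	 * the mock simply clears the bit. */
	mock_hpet_gis &= ~(1U << timer_nr);
	return 1;			/* NMI was caused by our timer */
}
```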

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-16  0:51           ` Ricardo Neri
  (?)
@ 2018-06-16 13:24             ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-16 13:24 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Fri, 15 Jun 2018, Ricardo Neri wrote:
> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> > On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > > Alternatively, there could be a counter that skips reading the HPET status
> > > register (and the detection of hardlockups) for every X NMIs. This would
> > > reduce the overall frequency of HPET register reads.
> > 
> > Great plan. So if the watchdog is the only NMI (because perf is off) then
> > you delay the watchdog detection by that count.
> 
> OK. This was a bad idea. Then, is it acceptable to have one read of an HPET
> register per NMI, just to check in the status register whether the HPET timer
> caused the NMI?

The status register is useless in case of MSI. MSI is edge triggered ....

The only register which gives you proper information is the counter
register itself. That adds a massive overhead to each NMI, because the
counter register access is synchronized to the HPET clock with hardware
magic. Plus on larger systems, the HPET access is cross node and even
slower.
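The counter-based alternative could be sketched as follows. The read of the
main counter is mocked with a variable; in real hardware that single read is
the expensive, clock-synchronized access described above. The helper names
are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Mocked HPET main counter; a real read is an MMIO access synchronized to
 * the HPET clock, which is the expensive part. */
static uint64_t mock_hpet_counter;

/* Deadline the watchdog last armed, kept in ordinary memory. */
static uint64_t wd_expected_expiry;

static uint64_t hpet_read_counter(void)
{
	return mock_hpet_counter;	/* slow in reality: cross-node MMIO */
}

/* Heuristic: attribute the NMI to the HPET only if the main counter has
 * reached (or passed) the value the watchdog programmed.  This is the
 * "one expensive read per NMI" variant. */
static int hpet_wd_counter_check(void)
{
	/* Signed difference handles counter wraparound. */
	return (int64_t)(hpet_read_counter() - wd_expected_expiry) >= 0;
}
```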

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs
  2018-06-16  0:46           ` Ricardo Neri
  (?)
@ 2018-06-16 13:27             ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-16 13:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Fri, 15 Jun 2018, Ricardo Neri wrote:
> On Fri, Jun 15, 2018 at 12:29:06PM +0200, Thomas Gleixner wrote:
> > You have to consider two cases:
> > 
> >  1) !remapped mode:
> > 
> >     That's reasonably simple because you just have to deal with the HPET
> >     TIMERn_PROCMSG_ROUT register. But then you need to do this directly and
> >     not through any of the existing interrupt facilities.
> 
> Indeed, there is no need to use the generic interrupt facilities to set affinity;
> I am dealing with an NMI anyway.
> > 
> >  2) remapped mode:
> > 
> >     That's way more complex as you _cannot_ ever do anything which touches
> >     the IOMMU and the related tables.
> > 
> >     So you'd need to reserve an IOMMU remapping entry for each CPU upfront,
> >     store the resulting value for the HPET TIMERn_PROCMSG_ROUT register in
> >     per cpu storage and just modify that one from NMI.
> > 
> >     Though there might be subtle side effects involved, which are related to
> >     the acknowledge part. You need to talk to the IOMMU wizards first.
> 
> I see. I will look into the code and prototype something that makes sense for
> the IOMMU maintainers.

I'd recommend to talk to them _before_ you cobble something together. If we
cannot reliably switch the affinity by directing the HPET NMI to a
different IOMMU remapping entry then the whole scheme does not work at all.
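A minimal sketch of the precompute-at-init scheme, assuming the per-CPU route
values (and, in remapped mode, the pre-reserved IOMMU remapping entries behind
them) can all be computed up front. The route register is mocked, and the
address encoding is illustrative, not the real HPET programming model.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_DEMO 4

/* Per-CPU values for the timer's MSI/FSB route register, computed once at
 * init time.  In remapped mode this is where the pre-reserved per-CPU
 * IOMMU remapping entry would be encoded instead. */
static uint64_t wd_route_for_cpu[NR_CPUS_DEMO];

/* Mock of the timer's route register. */
static uint64_t mock_timer_route;

static void wd_init_routes(void)
{
	/* Illustrative xAPIC-style encoding: destination in bits 19:12. */
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		wd_route_for_cpu[cpu] = 0xfee00000ULL | ((uint64_t)cpu << 12);
}

/* NMI-safe affinity switch: a single register write of a precomputed
 * value; no IOMMU table walks, no irq_chip calls, no locks. */
static void wd_retarget_from_nmi(int next_cpu)
{
	mock_timer_route = wd_route_for_cpu[next_cpu];
}
```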

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI
  2018-06-16  0:39                 ` Ricardo Neri
  (?)
@ 2018-06-16 13:36                   ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-16 13:36 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Julien Thierry, Marc Zyngier, Peter Zijlstra, Ingo Molnar,
	H. Peter Anvin, Andi Kleen, Ashok Raj, Borislav Petkov,
	Tony Luck, Ravi V. Shankar, x86, sparclinux, linuxppc-dev,
	linux-kernel, Jacob Pan, Daniel Lezcano, Andrew Morton, Levin,
	Alexander (Sasha Levin),
	Randy Dunlap, Masami Hiramatsu, Bartosz Golaszewski, Doug Berger,
	Palmer Dabbelt, iommu

On Fri, 15 Jun 2018, Ricardo Neri wrote:
> On Fri, Jun 15, 2018 at 09:01:02AM +0100, Julien Thierry wrote:
> > In my patches, I'm not sure there is much to adapt to your work, as most of
> > it is arch specific (although I won't say no to another pair of eyes looking
> > at them). From what I've seen of your patches, the point where we converge
> > is the need for some code to be able to tell the irqchip "I want that
> > particular interrupt line to be treated/setup as an NMI".
> 
> Indeed, there has to be a generic way for the irqchip to announce that it
> supports configuring an interrupt as NMI... and a way to actually configure
> it.

There has to be nothing. The irqchip infrastructure might be able to
provide certain aspects of NMI support, perhaps for initialization, but
everything else is fundamentally different and the executional parts simply
cannot use any of the irq chip functions at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
@ 2018-06-20  0:15               ` Ricardo Neri
  0 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-20  0:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Ricardo Neri wrote:
> > On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> > > On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > > > Alternatively, there could be a counter that skips reading the HPET status
> > > > register (and the detection of hardlockups) for every X NMIs. This would
> > > > reduce the overall frequency of HPET register reads.
> > > 
> > > Great plan. So if the watchdog is the only NMI (because perf is off) then
> > > you delay the watchdog detection by that count.
> > 
> > OK. This was a bad idea. Then, is it acceptable to have one read of an HPET
> > register per NMI, just to check in the status register whether the HPET timer
> > caused the NMI?
> 
> The status register is useless in case of MSI. MSI is edge triggered ....
> 
> The only register which gives you proper information is the counter
> register itself. That adds a massive overhead to each NMI, because the
> counter register access is synchronized to the HPET clock with hardware
> magic. Plus on larger systems, the HPET access is cross node and even
> slower.

It is starting to sound like the HPET is too slow to drive the hardlockup detector.

Would it be possible to envision a variant of this implementation in which
the HPET only targets a single CPU? The actual hardlockup detector would then
be implemented by this single CPU sending interprocessor interrupts to the
rest of the CPUs.

In this manner only one CPU has to deal with the slowness of the HPET; the
rest of the CPUs don't have to read or write any HPET registers. A sysfs
entry could be added to configure which CPU will have to deal with the HPET
timer. However, profiling could not be done accurately on such a CPU.
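A rough user-space model of this single-handling-CPU variant: only one CPU
ever services the (slow) HPET NMI and fans out NMI IPIs to the other CPUs.
The IPI call is a stand-in for the real APIC interface, and the names are
hypothetical.

```c
#include <assert.h>

#define NR_CPUS_DEMO 4

static int wd_handling_cpu;		/* configurable, e.g. via sysfs */
static int nmi_ipis_sent[NR_CPUS_DEMO];

/* Stand-in for the real APIC call, e.g. sending NMI_VECTOR to one CPU. */
static void send_nmi_ipi(int cpu)
{
	nmi_ipis_sent[cpu]++;
}

/* Runs in the HPET NMI on the handling CPU only: every other CPU is
 * checked via a cheap NMI IPI, so nobody else touches HPET registers. */
static void hpet_wd_nmi_on_handling_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		if (cpu != wd_handling_cpu)
			send_nmi_ipi(cpu);
	/* ...then check the handling CPU itself and re-arm the HPET. */
}
```

Randy Dunlap's follow-up question in this thread points at the obvious gap in
this model: the handling CPU itself must still be covered by something else.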

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-20  0:15               ` Ricardo Neri
  (?)
@ 2018-06-20  0:25                 ` Randy Dunlap
  -1 siblings, 0 replies; 200+ messages in thread
From: Randy Dunlap @ 2018-06-20  0:25 UTC (permalink / raw)
  To: Ricardo Neri, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
>> On Fri, 15 Jun 2018, Ricardo Neri wrote:
>>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
>>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
>>>>> Alternatively, there could be a counter that skips reading the HPET status
>>>>> register (and the detection of hardlockups) for every X NMIs. This would
>>>>> reduce the overall frequency of HPET register reads.
>>>>
>>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
>>>> you delay the watchdog detection by that count.
>>>
>>> OK. This was a bad idea. Then, is it acceptable to have one read of an HPET
>>> register per NMI, just to check in the status register whether the HPET timer
>>> caused the NMI?
>>
>> The status register is useless in case of MSI. MSI is edge triggered ....
>>
>> The only register which gives you proper information is the counter
>> register itself. That adds a massive overhead to each NMI, because the
>> counter register access is synchronized to the HPET clock with hardware
>> magic. Plus on larger systems, the HPET access is cross node and even
>> slower.
> 
> It is starting to sound like the HPET is too slow to drive the hardlockup detector.
> 
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.
> 
> In this manner only one CPU has to deal with the slowness of the HPET; the
> rest of the CPUs don't have to read or write any HPET registers. A sysfs
> entry could be added to configure which CPU will have to deal with the HPET
> timer. However, profiling could not be done accurately on such a CPU.

Please forgive my simple question:

What happens when this one CPU is the one that locks up?

thnx,
-- 
~Randy

^ permalink raw reply	[flat|nested] 200+ messages in thread
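
The "skip N NMIs" scheme quoted above, and the detection delay Thomas objects to, can be sketched in a few lines of user-space C (all names are illustrative, not the patchset's code): if the HPET watchdog is the only NMI source, checking only every Nth NMI multiplies the watchdog's detection latency by N.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the skip-counter idea discussed above: perform
 * the (expensive) HPET register read and hardlockup check only on every
 * Nth NMI. If the watchdog timer is the only NMI source, this multiplies
 * the detection latency by SKIP_COUNT -- which is Thomas's objection.
 */
#define SKIP_COUNT 8

static unsigned int nmi_count;

/* Returns true when this NMI should actually perform the lockup check. */
static bool watchdog_should_check(void)
{
	return (++nmi_count % SKIP_COUNT) == 0;
}

/* How many watchdog NMIs elapse before the first real check happens. */
int nmis_until_first_check(void)
{
	int n = 0;

	nmi_count = 0;
	while (!watchdog_should_check())
		n++;
	return n + 1;
}
```

With SKIP_COUNT = 8 and a one-second watchdog period, the first check happens only on the eighth NMI, so the effective detection window grows from one second to eight.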

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-20  0:15               ` Ricardo Neri
  (?)
@ 2018-06-20  7:47                 ` Thomas Gleixner
  -1 siblings, 0 replies; 200+ messages in thread
From: Thomas Gleixner @ 2018-06-20  7:47 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu

On Tue, 19 Jun 2018, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> > The status register is useless in case of MSI. MSI is edge triggered ....
> > 
> > The only register which gives you proper information is the counter
> > register itself. That adds a massive overhead to each NMI, because the
> > counter register access is synchronized to the HPET clock with hardware
> > magic. Plus on larger systems, the HPET access is cross node and even
> > slower.
> 
> It is starting to sound like the HPET is too slow to drive the hardlockup detector.
> 
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.

And these IPIs must be NMIs which need to have a software-based indicator
that the watchdog needs to be checked, which is going to create yet another
can of race conditions and in the worst case 'unknown NMI' splats. Not
pretty either.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 200+ messages in thread
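
The software-based indicator Thomas mentions might look roughly like the following user-space sketch using C11 atomics (all names are hypothetical). The failure mode he warns about is an NMI that finds the flag already clear: no handler claims it, and the kernel emits an "unknown NMI" splat.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not kernel code: before sending the NMI IPI, the
 * HPET-handling CPU sets a per-CPU flag; the target CPU's NMI handler
 * test-and-clears it to decide whether the watchdog sent this NMI. Any
 * NMI that arrives with the flag clear goes unclaimed ("unknown NMI").
 */
#define NR_CPUS 4

static atomic_bool watchdog_nmi_pending[NR_CPUS];

/* Sender side: mark the target, then (in real code) send the NMI IPI. */
void watchdog_send_ipi(int cpu)
{
	atomic_store(&watchdog_nmi_pending[cpu], true);
	/* apic->send_IPI(cpu, NMI_VECTOR) would go here */
}

/* Receiver side: returns true if the watchdog claims this NMI. */
bool watchdog_nmi_handler(int cpu)
{
	return atomic_exchange(&watchdog_nmi_pending[cpu], false);
}
```

The window between the flag write and the IPI delivery is exactly where the race conditions Thomas refers to would live.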

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
  2018-06-20  0:25                 ` Randy Dunlap
  (?)
@ 2018-06-21  0:25                   ` Ricardo Neri
  -1 siblings, 0 replies; 200+ messages in thread
From: Ricardo Neri @ 2018-06-21  0:25 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andi Kleen,
	Ashok Raj, Borislav Petkov, Tony Luck, Ravi V. Shankar, x86,
	sparclinux, linuxppc-dev, linux-kernel, Jacob Pan,
	Rafael J. Wysocki, Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu

On Tue, Jun 19, 2018 at 05:25:09PM -0700, Randy Dunlap wrote:
> On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> > On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> >> On Fri, 15 Jun 2018, Ricardo Neri wrote:
> >>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> >>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
> >>>>> Alternatively, there could be a counter that skips reading the HPET status
> >>>>> register (and the detection of hardlockups) for every X NMIs. This would
> >>>>> reduce the overall frequency of HPET register reads.
> >>>>
> >>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
> >>>> you delay the watchdog detection by that count.
> >>>
> >>> OK. This was a bad idea. Then, is it acceptable to have a read of an HPET
> >>> register per NMI just to check in the status register if the HPET timer
> >>> caused the NMI?
> >>
> >> The status register is useless in case of MSI. MSI is edge triggered ....
> >>
> >> The only register which gives you proper information is the counter
> >> register itself. That adds a massive overhead to each NMI, because the
> >> counter register access is synchronized to the HPET clock with hardware
> >> magic. Plus on larger systems, the HPET access is cross node and even
> >> slower.
> > 
> > It is starting to sound like the HPET is too slow to drive the hardlockup detector.
> > 
> > Would it be possible to envision a variant of this implementation? In this
> > variant, the HPET only targets a single CPU. The actual hardlockup detector
> > is implemented by this single CPU sending interprocessor interrupts to the
> > rest of the CPUs.
> > 
> > In this manner only one CPU has to deal with the slowness of the HPET; the
> > rest of the CPUs don't have to read or write any HPET registers. A sysfs
> > entry could be added to configure which CPU will have to deal with the HPET
> > timer. However, profiling could not be done accurately on such a CPU.
> 
> Please forgive my simple question:
> 
> What happens when this one CPU is the one that locks up?

I think that in this particular case this one CPU would check for hardlockups
on itself when it receives the NMI from the HPET timer. It would also issue
NMIs to the other monitored processors.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 200+ messages in thread
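
For illustration only, the flow Ricardo describes — the designated CPU checks itself on the HPET NMI, then kicks every other monitored CPU with an NMI IPI — could be sketched as follows (function names are made up, not the actual patches):

```c
#include <stdbool.h>

/*
 * Hypothetical user-space sketch of the single-CPU HPET variant: on each
 * HPET NMI, the designated CPU checks itself for a hardlockup and then
 * issues NMI IPIs to the other monitored CPUs, which perform their own
 * checks in their NMI handlers.
 */
#define NR_CPUS 4

static int ipis_sent;		/* NMI IPIs issued to the other CPUs */
static int lockup_checks;	/* hardlockup checks performed locally */

static void check_hardlockup(void)
{
	lockup_checks++;
}

static void send_nmi_ipi(int cpu)
{
	(void)cpu;		/* real code would target this CPU */
	ipis_sent++;
}

/* HPET NMI handler running on the designated CPU. */
void hpet_watchdog_nmi(int this_cpu)
{
	check_hardlockup();	/* the designated CPU checks itself */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != this_cpu)
			send_nmi_ipi(cpu);
}
```

This answers Randy's question in the narrow sense — the designated CPU is still monitored by the HPET NMI itself — but leaves open what happens if that CPU locks up with NMIs somehow blocked.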

end of thread, other threads:[~2018-06-21  0:29 UTC | newest]

Thread overview: 200+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-13  0:57 [RFC PATCH 00/23] Implement an HPET-based hardlockup detector Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 01/23] x86/apic: Add a parameter for the APIC delivery mode Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 02/23] genirq: Introduce IRQD_DELIVER_AS_NMI Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 03/23] genirq: Introduce IRQF_DELIVER_AS_NMI Ricardo Neri
2018-06-13  8:34   ` Peter Zijlstra
2018-06-13  8:59     ` Julien Thierry
2018-06-13  9:20       ` Thomas Gleixner
2018-06-13  9:36         ` Julien Thierry
2018-06-13  9:49           ` Julien Thierry
2018-06-13  9:57           ` Thomas Gleixner
2018-06-13 10:25             ` Julien Thierry
2018-06-13 10:06         ` Marc Zyngier
2018-06-15  2:12           ` Ricardo Neri
2018-06-15  8:01             ` Julien Thierry
2018-06-16  0:39               ` Ricardo Neri
2018-06-16 13:36                 ` Thomas Gleixner
2018-06-13  0:57 ` [RFC PATCH 04/23] iommu/vt-d/irq_remapping: Add support for IRQCHIP_CAN_DELIVER_AS_NMI Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 05/23] x86/msi: " Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 06/23] x86/ioapic: Add support for IRQCHIP_CAN_DELIVER_AS_NMI with interrupt remapping Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 07/23] x86/hpet: Expose more functions to read and write registers Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 08/23] x86/hpet: Calculate ticks-per-second in a separate function Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 09/23] x86/hpet: Reserve timer for the HPET hardlockup detector Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 10/23] x86/hpet: Relocate flag definitions to a header file Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 11/23] x86/hpet: Configure the timer used by the hardlockup detector Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 12/23] kernel/watchdog: Introduce a struct for NMI watchdog operations Ricardo Neri
2018-06-13  7:41   ` Nicholas Piggin
2018-06-13  8:42     ` Peter Zijlstra
2018-06-13  9:26       ` Thomas Gleixner
2018-06-13 11:52         ` Nicholas Piggin
2018-06-14  1:31           ` Ricardo Neri
2018-06-14  2:32             ` Nicholas Piggin
2018-06-14  8:32               ` Thomas Gleixner
2018-06-15  2:21               ` Ricardo Neri
2018-06-14  1:26       ` Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 13/23] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 14/23] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2018-06-13  8:43   ` Peter Zijlstra
2018-06-14  1:19     ` Ricardo Neri
2018-06-14  1:41       ` Nicholas Piggin
2018-06-15  2:23         ` Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 15/23] kernel/watchdog: Add a function to obtain the watchdog_allowed_mask Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 16/23] watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2018-06-13  5:23   ` Randy Dunlap
2018-06-14  1:00     ` Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI Ricardo Neri
2018-06-13  9:07   ` Peter Zijlstra
2018-06-15  2:07     ` Ricardo Neri
2018-06-13  9:40   ` Thomas Gleixner
2018-06-15  2:03     ` Ricardo Neri
2018-06-15  9:19       ` Thomas Gleixner
2018-06-16  0:51         ` Ricardo Neri
2018-06-16 13:24           ` Thomas Gleixner
2018-06-20  0:15             ` Ricardo Neri
2018-06-20  0:25               ` Randy Dunlap
2018-06-21  0:25                 ` Ricardo Neri
2018-06-20  7:47               ` Thomas Gleixner
2018-06-13  0:57 ` [RFC PATCH 18/23] watchdog/hardlockup/hpet: Add the NMI watchdog operations Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 19/23] watchdog/hardlockup: Make arch_touch_nmi_watchdog() to hpet-based implementation Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 20/23] watchdog/hardlockup/hpet: Rotate interrupt among all monitored CPUs Ricardo Neri
2018-06-13  9:48   ` Thomas Gleixner
2018-06-15  2:16     ` Ricardo Neri
2018-06-15 10:29       ` Thomas Gleixner
2018-06-16  0:46         ` Ricardo Neri
2018-06-16 13:27           ` Thomas Gleixner
2018-06-13  0:57 ` [RFC PATCH 21/23] watchdog/hardlockup/hpet: Adjust timer expiration on the number of " Ricardo Neri
2018-06-13  0:57 ` [RFC PATCH 22/23] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2018-06-13  5:26   ` Randy Dunlap
2018-06-14  0:58     ` Ricardo Neri
2018-06-14  3:30       ` Randy Dunlap
2018-06-13  0:57 ` [RFC PATCH 23/23] watchdog/hardlockup: Activate the HPET-based lockup detector Ricardo Neri
