All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] Extending NMI watchdog during LPM
@ 2022-06-27 13:53 ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must be transferred from the
start node.

It depends on the activity in the partition, but the more CPU the partition
has, the more memory to be transferred is likely to be. This causes latency
when accessing pages that need to be transferred, and often, for large
partitions, it triggers the NMI watchdog.

The NMI watchdog causes the CPU stack to dump where it appears to be
stuck. In this case, it does not bring much information since it can happen
during any memory access of the kernel.

In addition, the NMI interrupt mechanism is not secure and can generate a
dump system in the event that the interruption is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be interesting to extend the
NMI watchdog timer during the LPM.

That's configurable through sysctl with the new introduced variable
(specific to powerpc) nmi_watchdog_factor. This value represents the
percentage added to watchdog_tresh to set the NMI watchdog timeout during a
LPM.

Changes in v3:
 - don't export watchdog_mutex
 - fix a comment in mobilty.c, wait_for_vasi_session_completed()
 - fix a build issue when !CONFIG_PPC_WATCHDOG
 - rework some printk and rename the sysctl variable.

v2:
https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/

Laurent Dufour (4):
  powerpc/mobility: wait for memory transfer to complete
  watchdog: export lockup_detector_reconfigure
  powerpc/watchdog: introduce a NMI watchdog's factor
  pseries/mobility: set NMI watchdog factor during LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h              |  2 +
 arch/powerpc/kernel/watchdog.c              | 21 ++++-
 arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
 include/linux/nmi.h                         |  2 +
 kernel/watchdog.c                           | 21 +++--
 6 files changed, 135 insertions(+), 8 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 0/4] Extending NMI watchdog during LPM
@ 2022-06-27 13:53 ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must be transferred from the
start node.

It depends on the activity in the partition, but the more CPU the partition
has, the more memory to be transferred is likely to be. This causes latency
when accessing pages that need to be transferred, and often, for large
partitions, it triggers the NMI watchdog.

The NMI watchdog causes the CPU stack to dump where it appears to be
stuck. In this case, it does not bring much information since it can happen
during any memory access of the kernel.

In addition, the NMI interrupt mechanism is not secure and can generate a
dump system in the event that the interruption is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be interesting to extend the
NMI watchdog timer during the LPM.

That's configurable through sysctl with the new introduced variable
(specific to powerpc) nmi_watchdog_factor. This value represents the
percentage added to watchdog_tresh to set the NMI watchdog timeout during a
LPM.

Changes in v3:
 - don't export watchdog_mutex
 - fix a comment in mobilty.c, wait_for_vasi_session_completed()
 - fix a build issue when !CONFIG_PPC_WATCHDOG
 - rework some printk and rename the sysctl variable.

v2:
https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/

Laurent Dufour (4):
  powerpc/mobility: wait for memory transfer to complete
  watchdog: export lockup_detector_reconfigure
  powerpc/watchdog: introduce a NMI watchdog's factor
  pseries/mobility: set NMI watchdog factor during LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h              |  2 +
 arch/powerpc/kernel/watchdog.c              | 21 ++++-
 arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
 include/linux/nmi.h                         |  2 +
 kernel/watchdog.c                           | 21 +++--
 6 files changed, 135 insertions(+), 8 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
  2022-06-27 13:53 ` Laurent Dufour
@ 2022-06-27 13:53   ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

In pseries_migration_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit earlier,
allowing callbacks to be run only once the migration is fully completed.

If reading the VASI state is done after the hypervisor has completed the
migration, the HCALL is returning H_PARAMETER. We can safely assume that
the memory transfer is achieved if this happens.

This will also allow to manage the NMI watchdog state in the next commits.

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..907a779074d6 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
 	return ret;
 }
 
+static void wait_for_vasi_session_completed(u64 handle)
+{
+	unsigned long state = 0;
+	int ret;
+
+	pr_info("waiting for memory transfert to complete...\n");
+
+	/*
+	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
+	 */
+	while (true) {
+		ret = poll_vasi_state(handle, &state);
+
+		/*
+		 * If the memory transfer is already complete and the migration
+		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
+		 * which is translate in EINVAL by poll_vasi_state().
+		 */
+		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
+			pr_info("memory transfert completed.\n");
+			break;
+		}
+
+		if (ret) {
+			pr_err("H_VASI_STATE return error (%d)\n", ret);
+			break;
+		}
+
+		if (state != H_VASI_RESUMED) {
+			pr_err("unexpected H_VASI_STATE result %lu\n", state);
+			break;
+		}
+
+		msleep(500);
+	}
+}
+
 static void prod_single(unsigned int target_cpu)
 {
 	long hvrc;
@@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
 	vas_migration_handler(VAS_SUSPEND);
 
 	ret = pseries_suspend(handle);
-	if (ret == 0)
+	if (ret == 0) {
 		post_mobility_fixup();
-	else
+		wait_for_vasi_session_completed(handle);
+	} else
 		pseries_cancel_migration(handle, ret);
 
 	vas_migration_handler(VAS_RESUME);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
@ 2022-06-27 13:53   ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

In pseries_migration_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit earlier,
allowing callbacks to be run only once the migration is fully completed.

If reading the VASI state is done after the hypervisor has completed the
migration, the HCALL is returning H_PARAMETER. We can safely assume that
the memory transfer is achieved if this happens.

This will also allow to manage the NMI watchdog state in the next commits.

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..907a779074d6 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
 	return ret;
 }
 
+static void wait_for_vasi_session_completed(u64 handle)
+{
+	unsigned long state = 0;
+	int ret;
+
+	pr_info("waiting for memory transfert to complete...\n");
+
+	/*
+	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
+	 */
+	while (true) {
+		ret = poll_vasi_state(handle, &state);
+
+		/*
+		 * If the memory transfer is already complete and the migration
+		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
+		 * which is translate in EINVAL by poll_vasi_state().
+		 */
+		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
+			pr_info("memory transfert completed.\n");
+			break;
+		}
+
+		if (ret) {
+			pr_err("H_VASI_STATE return error (%d)\n", ret);
+			break;
+		}
+
+		if (state != H_VASI_RESUMED) {
+			pr_err("unexpected H_VASI_STATE result %lu\n", state);
+			break;
+		}
+
+		msleep(500);
+	}
+}
+
 static void prod_single(unsigned int target_cpu)
 {
 	long hvrc;
@@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
 	vas_migration_handler(VAS_SUSPEND);
 
 	ret = pseries_suspend(handle);
-	if (ret == 0)
+	if (ret == 0) {
 		post_mobility_fixup();
-	else
+		wait_for_vasi_session_completed(handle);
+	} else
 		pseries_cancel_migration(handle, ret);
 
 	vas_migration_handler(VAS_RESUME);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/4] watchdog: export lockup_detector_reconfigure
  2022-06-27 13:53 ` Laurent Dufour
@ 2022-06-27 13:53   ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linux-kernel, linuxppc-dev, linux-watchdog, Christoph Hellwig

In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may helpful before and after a LPAR migration (LPM) is
initiated, because it implies some latencies, watchdog, and especially NMI
watchdog is expected to be triggered during this operation. Reconfiguring
the watchdog with a factor, would prevent it to happen too frequently
during LPM.

Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and
create a new function lockup_detector_reconfigure() calling
__lockup_detector_reconfigure() under the protection of watchdog_mutex.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 21 ++++++++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..f700ff2df074 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -122,6 +122,8 @@ int watchdog_nmi_probe(void);
 int watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
+void lockup_detector_reconfigure(void);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..90e6c41d5e33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void lockup_detector_reconfigure(void)
+static void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
 	watchdog_nmi_stop();
@@ -561,6 +561,13 @@ static void lockup_detector_reconfigure(void)
 	__lockup_detector_cleanup();
 }
 
+void lockup_detector_reconfigure(void)
+{
+	mutex_lock(&watchdog_mutex);
+	__lockup_detector_reconfigure();
+	mutex_unlock(&watchdog_mutex);
+}
+
 /*
  * Create the watchdog infrastructure and configure the detector(s).
  */
@@ -577,13 +584,13 @@ static __init void lockup_detector_setup(void)
 		return;
 
 	mutex_lock(&watchdog_mutex);
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 	softlockup_initialized = true;
 	mutex_unlock(&watchdog_mutex);
 }
 
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
 	watchdog_nmi_stop();
@@ -591,9 +598,13 @@ static void lockup_detector_reconfigure(void)
 	watchdog_nmi_start();
 	cpus_read_unlock();
 }
+static inline void lockup_detector_reconfigure(void)
+{
+	__lockup_detector_reconfigure();
+}
 static inline void lockup_detector_setup(void)
 {
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
@@ -633,7 +644,7 @@ static void proc_watchdog_update(void)
 {
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 }
 
 /*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/4] watchdog: export lockup_detector_reconfigure
@ 2022-06-27 13:53   ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: Christoph Hellwig, linuxppc-dev, linux-kernel, linux-watchdog

In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may helpful before and after a LPAR migration (LPM) is
initiated, because it implies some latencies, watchdog, and especially NMI
watchdog is expected to be triggered during this operation. Reconfiguring
the watchdog with a factor, would prevent it to happen too frequently
during LPM.

Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and
create a new function lockup_detector_reconfigure() calling
__lockup_detector_reconfigure() under the protection of watchdog_mutex.

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 21 ++++++++++++++++-----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..f700ff2df074 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -122,6 +122,8 @@ int watchdog_nmi_probe(void);
 int watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
+void lockup_detector_reconfigure(void);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..90e6c41d5e33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void lockup_detector_reconfigure(void)
+static void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
 	watchdog_nmi_stop();
@@ -561,6 +561,13 @@ static void lockup_detector_reconfigure(void)
 	__lockup_detector_cleanup();
 }
 
+void lockup_detector_reconfigure(void)
+{
+	mutex_lock(&watchdog_mutex);
+	__lockup_detector_reconfigure();
+	mutex_unlock(&watchdog_mutex);
+}
+
 /*
  * Create the watchdog infrastructure and configure the detector(s).
  */
@@ -577,13 +584,13 @@ static __init void lockup_detector_setup(void)
 		return;
 
 	mutex_lock(&watchdog_mutex);
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 	softlockup_initialized = true;
 	mutex_unlock(&watchdog_mutex);
 }
 
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void __lockup_detector_reconfigure(void)
 {
 	cpus_read_lock();
 	watchdog_nmi_stop();
@@ -591,9 +598,13 @@ static void lockup_detector_reconfigure(void)
 	watchdog_nmi_start();
 	cpus_read_unlock();
 }
+static inline void lockup_detector_reconfigure(void)
+{
+	__lockup_detector_reconfigure();
+}
 static inline void lockup_detector_setup(void)
 {
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
@@ -633,7 +644,7 @@ static void proc_watchdog_update(void)
 {
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-	lockup_detector_reconfigure();
+	__lockup_detector_reconfigure();
 }
 
 /*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
  2022-06-27 13:53 ` Laurent Dufour
@ 2022-06-27 13:53   ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Introduce a factor which would apply to the NMI watchdog timeout.

This factor is a percentage added to the watchdog_tresh value. The value is
set under the watchdog_mutex protection and lockup_detector_reconfigure()
is called to recompute wd_panic_timeout_tb.

Once the factor is set, it remains until it is set back to 0, which means
no impact.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/include/asm/nmi.h |  2 ++
 arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index ea0e487f87b1..7d6a8d9b0543 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,8 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
+void watchdog_nmi_set_lpm_factor(u64 factor);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
+static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..80851b228f71 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
 static cpumask_t wd_smp_cpus_stuck;
 static u64 wd_smp_last_reset_tb;
 
+#ifdef CONFIG_PPC_PSERIES
+static u64 wd_factor;
+#endif
+
 /*
  * Try to take the exclusive watchdog action / NMI IPI / printing lock.
  * wd_smp_lock must be held. If this fails, we should return and wait
@@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
 
 static void watchdog_calc_timeouts(void)
 {
-	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
+	u64 threshold = watchdog_thresh;
+
+#ifdef CONFIG_PPC_PSERIES
+	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
+#endif
+
+	wd_panic_timeout_tb = threshold * ppc_tb_freq;
 
 	/* Have the SMP detector trigger a bit later */
 	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
@@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
 	}
 	return 0;
 }
+
+#ifdef CONFIG_PPC_PSERIES
+void watchdog_nmi_set_lpm_factor(u64 factor)
+{
+	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
+	WRITE_ONCE(wd_factor, factor);
+	lockup_detector_reconfigure();
+}
+#endif
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
@ 2022-06-27 13:53   ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Introduce a factor which would apply to the NMI watchdog timeout.

This factor is a percentage added to the watchdog_tresh value. The value is
set under the watchdog_mutex protection and lockup_detector_reconfigure()
is called to recompute wd_panic_timeout_tb.

Once the factor is set, it remains until it is set back to 0, which means
no impact.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/include/asm/nmi.h |  2 ++
 arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index ea0e487f87b1..7d6a8d9b0543 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,8 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
+void watchdog_nmi_set_lpm_factor(u64 factor);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
+static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..80851b228f71 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
 static cpumask_t wd_smp_cpus_stuck;
 static u64 wd_smp_last_reset_tb;
 
+#ifdef CONFIG_PPC_PSERIES
+static u64 wd_factor;
+#endif
+
 /*
  * Try to take the exclusive watchdog action / NMI IPI / printing lock.
  * wd_smp_lock must be held. If this fails, we should return and wait
@@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
 
 static void watchdog_calc_timeouts(void)
 {
-	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
+	u64 threshold = watchdog_thresh;
+
+#ifdef CONFIG_PPC_PSERIES
+	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
+#endif
+
+	wd_panic_timeout_tb = threshold * ppc_tb_freq;
 
 	/* Have the SMP detector trigger a bit later */
 	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
@@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
 	}
 	return 0;
 }
+
+#ifdef CONFIG_PPC_PSERIES
+void watchdog_nmi_set_lpm_factor(u64 factor)
+{
+	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
+	WRITE_ONCE(wd_factor, factor);
+	lockup_detector_reconfigure();
+}
+#endif
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
  2022-06-27 13:53 ` Laurent Dufour
@ 2022-06-27 13:53   ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

During a LPM, while the memory transfer is in progress on the arrival side,
some latencies is generated when accessing not yet transferred pages on the
arrival side. Thus, the NMI watchdog may be triggered too frequently, which
increases the risk to hit a NMI interrupt in a bad place in the kernel,
leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer could be a too
strong work around, some users would want this timeout to be eventually
triggered if the system is hanging even during LPM.

Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
stopped for the switchover sequence, the NMI watchdog timer is set to
 watchdog_tresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the NMI
watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
Once the memory transfer is achieved, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during a LPM.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
 arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ddccd1077462..0bb0b7f27e96 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -592,6 +592,18 @@ to the guest kernel command line (see
 Documentation/admin-guide/kernel-parameters.rst).
 
 
+nmi_watchdog_factor (PPC only)
+==================================
+
+Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during a
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200 meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
+
+
 numa_balancing
 ==============
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 907a779074d6..649155faafc2 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -48,6 +48,39 @@ struct update_props_workarea {
 #define MIGRATION_SCOPE	(1)
 #define PRRN_SCOPE -2
 
+#ifdef CONFIG_PPC_WATCHDOG
+static unsigned int nmi_wd_factor = 200;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table nmi_wd_factor_ctl_table[] = {
+	{
+		.procname	= "nmi_watchdog_factor",
+		.data		= &nmi_wd_factor,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+	},
+	{}
+};
+static struct ctl_table nmi_wd_factor_sysctl_root[] = {
+	{
+		.procname       = "kernel",
+		.mode           = 0555,
+		.child          = nmi_wd_factor_ctl_table,
+	},
+	{}
+};
+
+static int __init register_nmi_wd_factor_sysctl(void)
+{
+	register_sysctl_table(nmi_wd_factor_sysctl_root);
+
+	return 0;
+}
+device_initcall(register_nmi_wd_factor_sysctl);
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_PPC_WATCHDOG */
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
 	int rc;
@@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
 static int pseries_migrate_partition(u64 handle)
 {
 	int ret;
+	unsigned int factor = 0;
 
+#ifdef CONFIG_PPC_WATCHDOG
+	factor = nmi_wd_factor;
+#endif
 	ret = wait_for_vasi_session_suspending(handle);
 	if (ret)
 		return ret;
 
 	vas_migration_handler(VAS_SUSPEND);
 
+	if (factor)
+		watchdog_nmi_set_lpm_factor(factor);
+
 	ret = pseries_suspend(handle);
 	if (ret == 0) {
 		post_mobility_fixup();
@@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
 	} else
 		pseries_cancel_migration(handle, ret);
 
+	if (factor)
+		watchdog_nmi_set_lpm_factor(0);
+
 	vas_migration_handler(VAS_RESUME);
 
 	return ret;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
@ 2022-06-27 13:53   ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-06-27 13:53 UTC (permalink / raw)
  To: wim, linux, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

During a LPM, while the memory transfer is in progress on the arrival side,
some latencies is generated when accessing not yet transferred pages on the
arrival side. Thus, the NMI watchdog may be triggered too frequently, which
increases the risk to hit a NMI interrupt in a bad place in the kernel,
leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer could be a too
strong work around, some users would want this timeout to be eventually
triggered if the system is hanging even during LPM.

Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
stopped for the switchover sequence, the NMI watchdog timer is set to
 watchdog_tresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the NMI
watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
Once the memory transfer is achieved, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during a LPM.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
 arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index ddccd1077462..0bb0b7f27e96 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -592,6 +592,18 @@ to the guest kernel command line (see
 Documentation/admin-guide/kernel-parameters.rst).
 
 
+nmi_watchdog_factor (PPC only)
+==================================
+
+Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during a
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200 meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
+
+
 numa_balancing
 ==============
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 907a779074d6..649155faafc2 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -48,6 +48,39 @@ struct update_props_workarea {
 #define MIGRATION_SCOPE	(1)
 #define PRRN_SCOPE -2
 
+#ifdef CONFIG_PPC_WATCHDOG
+static unsigned int nmi_wd_factor = 200;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table nmi_wd_factor_ctl_table[] = {
+	{
+		.procname	= "nmi_watchdog_factor",
+		.data		= &nmi_wd_factor,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+	},
+	{}
+};
+static struct ctl_table nmi_wd_factor_sysctl_root[] = {
+	{
+		.procname       = "kernel",
+		.mode           = 0555,
+		.child          = nmi_wd_factor_ctl_table,
+	},
+	{}
+};
+
+static int __init register_nmi_wd_factor_sysctl(void)
+{
+	register_sysctl_table(nmi_wd_factor_sysctl_root);
+
+	return 0;
+}
+device_initcall(register_nmi_wd_factor_sysctl);
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_PPC_WATCHDOG */
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
 	int rc;
@@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
 static int pseries_migrate_partition(u64 handle)
 {
 	int ret;
+	unsigned int factor = 0;
 
+#ifdef CONFIG_PPC_WATCHDOG
+	factor = nmi_wd_factor;
+#endif
 	ret = wait_for_vasi_session_suspending(handle);
 	if (ret)
 		return ret;
 
 	vas_migration_handler(VAS_SUSPEND);
 
+	if (factor)
+		watchdog_nmi_set_lpm_factor(factor);
+
 	ret = pseries_suspend(handle);
 	if (ret == 0) {
 		post_mobility_fixup();
@@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
 	} else
 		pseries_cancel_migration(handle, ret);
 
+	if (factor)
+		watchdog_nmi_set_lpm_factor(0);
+
 	vas_migration_handler(VAS_RESUME);
 
 	return ret;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/4] Extending NMI watchdog during LPM
  2022-06-27 13:53 ` Laurent Dufour
@ 2022-07-12  1:21   ` Nicholas Piggin
  -1 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:21 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> When a partition is transferred, once it arrives at the destination node,
> the partition is active but much of its memory must be transferred from the
> start node.
> 
> It depends on the activity in the partition, but the more CPU the partition
> has, the more memory to be transferred is likely to be. This causes latency
> when accessing pages that need to be transferred, and often, for large
> partitions, it triggers the NMI watchdog.

Importantly, it can require page in of code that runs with irqs 
disabled, which is unlike a guest normally runs under PowerVM
(but it can under KVM) which is why we enabled the watchdog under
PowerVM but not KVM. So, okay it makes sense to mak an exception
for this case.

Thanks,
Nick

> The NMI watchdog causes the CPU stack to dump where it appears to be
> stuck. In this case, it does not bring much information since it can happen
> during any memory access of the kernel.
> 
> In addition, the NMI interrupt mechanism is not secure and can generate a
> dump system in the event that the interruption is taken while MSR[RI]=0.
> 
> Depending on the LPAR size and load, it may be interesting to extend the
> NMI watchdog timer during the LPM.
> 
> That's configurable through sysctl with the new introduced variable
> (specific to powerpc) nmi_watchdog_factor. This value represents the
> percentage added to watchdog_tresh to set the NMI watchdog timeout during a
> LPM.
> 
> Changes in v3:
>  - don't export watchdog_mutex
>  - fix a comment in mobilty.c, wait_for_vasi_session_completed()
>  - fix a build issue when !CONFIG_PPC_WATCHDOG
>  - rework some printk and rename the sysctl variable.
> 
> v2:
> https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/
> 
> Laurent Dufour (4):
>   powerpc/mobility: wait for memory transfer to complete
>   watchdog: export lockup_detector_reconfigure
>   powerpc/watchdog: introduce a NMI watchdog's factor
>   pseries/mobility: set NMI watchdog factor during LPM
> 
>  Documentation/admin-guide/sysctl/kernel.rst | 12 +++
>  arch/powerpc/include/asm/nmi.h              |  2 +
>  arch/powerpc/kernel/watchdog.c              | 21 ++++-
>  arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
>  include/linux/nmi.h                         |  2 +
>  kernel/watchdog.c                           | 21 +++--
>  6 files changed, 135 insertions(+), 8 deletions(-)
> 
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/4] Extending NMI watchdog during LPM
@ 2022-07-12  1:21   ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:21 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> When a partition is transferred, once it arrives at the destination node,
> the partition is active but much of its memory must be transferred from the
> start node.
> 
> It depends on the activity in the partition, but the more CPU the partition
> has, the more memory to be transferred is likely to be. This causes latency
> when accessing pages that need to be transferred, and often, for large
> partitions, it triggers the NMI watchdog.

Importantly, it can require page in of code that runs with irqs 
disabled, which is unlike a guest normally runs under PowerVM
(but it can under KVM) which is why we enabled the watchdog under
PowerVM but not KVM. So, okay it makes sense to mak an exception
for this case.

Thanks,
Nick

> The NMI watchdog causes the CPU stack to dump where it appears to be
> stuck. In this case, it does not bring much information since it can happen
> during any memory access of the kernel.
> 
> In addition, the NMI interrupt mechanism is not secure and can generate a
> dump system in the event that the interruption is taken while MSR[RI]=0.
> 
> Depending on the LPAR size and load, it may be interesting to extend the
> NMI watchdog timer during the LPM.
> 
> That's configurable through sysctl with the new introduced variable
> (specific to powerpc) nmi_watchdog_factor. This value represents the
> percentage added to watchdog_tresh to set the NMI watchdog timeout during a
> LPM.
> 
> Changes in v3:
>  - don't export watchdog_mutex
>  - fix a comment in mobilty.c, wait_for_vasi_session_completed()
>  - fix a build issue when !CONFIG_PPC_WATCHDOG
>  - rework some printk and rename the sysctl variable.
> 
> v2:
> https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/
> 
> Laurent Dufour (4):
>   powerpc/mobility: wait for memory transfer to complete
>   watchdog: export lockup_detector_reconfigure
>   powerpc/watchdog: introduce a NMI watchdog's factor
>   pseries/mobility: set NMI watchdog factor during LPM
> 
>  Documentation/admin-guide/sysctl/kernel.rst | 12 +++
>  arch/powerpc/include/asm/nmi.h              |  2 +
>  arch/powerpc/kernel/watchdog.c              | 21 ++++-
>  arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
>  include/linux/nmi.h                         |  2 +
>  kernel/watchdog.c                           | 21 +++--
>  6 files changed, 135 insertions(+), 8 deletions(-)
> 
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
  2022-06-27 13:53   ` Laurent Dufour
@ 2022-07-12  1:33     ` Nicholas Piggin
  -1 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:33 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> In pseries_migration_partition(), loop until the memory transfer is
> complete. This way the calling drmgr process will not exit earlier,
> allowing callbacks to be run only once the migration is fully completed.
> 
> If reading the VASI state is done after the hypervisor has completed the
> migration, the HCALL is returning H_PARAMETER. We can safely assume that
> the memory transfer is achieved if this happens.
> 
> This will also allow to manage the NMI watchdog state in the next commits.
> 
> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 78f3f74c7056..907a779074d6 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
>  	return ret;
>  }
>  
> +static void wait_for_vasi_session_completed(u64 handle)
> +{
> +	unsigned long state = 0;
> +	int ret;
> +
> +	pr_info("waiting for memory transfert to complete...\n");

                                            ^ extra t (also below)
> +
> +	/*
> +	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
> +	 */
> +	while (true) {
> +		ret = poll_vasi_state(handle, &state);
> +
> +		/*
> +		 * If the memory transfer is already complete and the migration
> +		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
> +		 * which is translate in EINVAL by poll_vasi_state().
> +		 */
> +		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
> +			pr_info("memory transfert completed.\n");
> +			break;
> +		}
> +
> +		if (ret) {
> +			pr_err("H_VASI_STATE return error (%d)\n", ret);
> +			break;
> +		}
> +
> +		if (state != H_VASI_RESUMED) {
> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
> +			break;
> +		}
> +
> +		msleep(500);

Is 500 specified anywhere? Another caller uses 1000, and the other one 
uses some backoff interval starting at 1ms...

> +	}
> +}
> +
>  static void prod_single(unsigned int target_cpu)
>  {
>  	long hvrc;
> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
>  	vas_migration_handler(VAS_SUSPEND);
>  
>  	ret = pseries_suspend(handle);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		post_mobility_fixup();
> -	else
> +		wait_for_vasi_session_completed(handle);

If this wasn't required until later patches, maybe a comment about why 
it's here? Could call it wait_for_migration() or similar too.

Looks okay though from my basic reading of PAPR.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> +	} else
>  		pseries_cancel_migration(handle, ret);
>  
>  	vas_migration_handler(VAS_RESUME);
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
@ 2022-07-12  1:33     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:33 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> In pseries_migration_partition(), loop until the memory transfer is
> complete. This way the calling drmgr process will not exit earlier,
> allowing callbacks to be run only once the migration is fully completed.
> 
> If reading the VASI state is done after the hypervisor has completed the
> migration, the HCALL is returning H_PARAMETER. We can safely assume that
> the memory transfer is achieved if this happens.
> 
> This will also allow to manage the NMI watchdog state in the next commits.
> 
> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 78f3f74c7056..907a779074d6 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
>  	return ret;
>  }
>  
> +static void wait_for_vasi_session_completed(u64 handle)
> +{
> +	unsigned long state = 0;
> +	int ret;
> +
> +	pr_info("waiting for memory transfert to complete...\n");

                                            ^ extra t (also below)
> +
> +	/*
> +	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
> +	 */
> +	while (true) {
> +		ret = poll_vasi_state(handle, &state);
> +
> +		/*
> +		 * If the memory transfer is already complete and the migration
> +		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
> +		 * which is translate in EINVAL by poll_vasi_state().
> +		 */
> +		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
> +			pr_info("memory transfert completed.\n");
> +			break;
> +		}
> +
> +		if (ret) {
> +			pr_err("H_VASI_STATE return error (%d)\n", ret);
> +			break;
> +		}
> +
> +		if (state != H_VASI_RESUMED) {
> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
> +			break;
> +		}
> +
> +		msleep(500);

Is 500 specified anywhere? Another caller uses 1000, and the other one 
uses some backoff interval starting at 1ms...

> +	}
> +}
> +
>  static void prod_single(unsigned int target_cpu)
>  {
>  	long hvrc;
> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
>  	vas_migration_handler(VAS_SUSPEND);
>  
>  	ret = pseries_suspend(handle);
> -	if (ret == 0)
> +	if (ret == 0) {
>  		post_mobility_fixup();
> -	else
> +		wait_for_vasi_session_completed(handle);

If this wasn't required until later patches, maybe a comment about why 
it's here? Could call it wait_for_migration() or similar too.

Looks okay though from my basic reading of PAPR.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> +	} else
>  		pseries_cancel_migration(handle, ret);
>  
>  	vas_migration_handler(VAS_RESUME);
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
  2022-06-27 13:53   ` Laurent Dufour
@ 2022-07-12  1:42     ` Nicholas Piggin
  -1 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:42 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> Introduce a factor which would apply to the NMI watchdog timeout.
> 
> This factor is a percentage added to the watchdog_tresh value. The value is
> set under the watchdog_mutex protection and lockup_detector_reconfigure()
> is called to recompute wd_panic_timeout_tb.
> 
> Once the factor is set, it remains until it is set back to 0, which means
> no impact.

Looks okay. We could worry about making it more generic or nicer if
another user came along.

Could you make the naming a bit more self documenting? 
watchdog_nmi_set_timeout_pct(), maybe? Does the wd really care
that it is for LPM in particular?

Variables and parameters could have a _pct suffix too.

Otherwise

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/nmi.h |  2 ++
>  arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
> index ea0e487f87b1..7d6a8d9b0543 100644
> --- a/arch/powerpc/include/asm/nmi.h
> +++ b/arch/powerpc/include/asm/nmi.h
> @@ -5,8 +5,10 @@
>  #ifdef CONFIG_PPC_WATCHDOG
>  extern void arch_touch_nmi_watchdog(void);
>  long soft_nmi_interrupt(struct pt_regs *regs);
> +void watchdog_nmi_set_lpm_factor(u64 factor);
>  #else
>  static inline void arch_touch_nmi_watchdog(void) {}
> +static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
>  #endif
>  
>  #ifdef CONFIG_NMI_IPI
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 7d28b9553654..80851b228f71 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
>  static cpumask_t wd_smp_cpus_stuck;
>  static u64 wd_smp_last_reset_tb;
>  
> +#ifdef CONFIG_PPC_PSERIES
> +static u64 wd_factor;
> +#endif
> +
>  /*
>   * Try to take the exclusive watchdog action / NMI IPI / printing lock.
>   * wd_smp_lock must be held. If this fails, we should return and wait
> @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
>  
>  static void watchdog_calc_timeouts(void)
>  {
> -	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
> +	u64 threshold = watchdog_thresh;
> +
> +#ifdef CONFIG_PPC_PSERIES
> +	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
> +#endif
> +
> +	wd_panic_timeout_tb = threshold * ppc_tb_freq;
>  
>  	/* Have the SMP detector trigger a bit later */
>  	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
> @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
>  	}
>  	return 0;
>  }
> +
> +#ifdef CONFIG_PPC_PSERIES
> +void watchdog_nmi_set_lpm_factor(u64 factor)
> +{
> +	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
> +	WRITE_ONCE(wd_factor, factor);
> +	lockup_detector_reconfigure();
> +}
> +#endif
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
@ 2022-07-12  1:42     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:42 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> Introduce a factor which would apply to the NMI watchdog timeout.
> 
> This factor is a percentage added to the watchdog_tresh value. The value is
> set under the watchdog_mutex protection and lockup_detector_reconfigure()
> is called to recompute wd_panic_timeout_tb.
> 
> Once the factor is set, it remains until it is set back to 0, which means
> no impact.

Looks okay. We could worry about making it more generic or nicer if
another user came along.

Could you make the naming a bit more self documenting? 
watchdog_nmi_set_timeout_pct(), maybe? Does the wd really care
that it is for LPM in particular?

Variables and parameters could have a _pct suffix too.

Otherwise

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/nmi.h |  2 ++
>  arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
> index ea0e487f87b1..7d6a8d9b0543 100644
> --- a/arch/powerpc/include/asm/nmi.h
> +++ b/arch/powerpc/include/asm/nmi.h
> @@ -5,8 +5,10 @@
>  #ifdef CONFIG_PPC_WATCHDOG
>  extern void arch_touch_nmi_watchdog(void);
>  long soft_nmi_interrupt(struct pt_regs *regs);
> +void watchdog_nmi_set_lpm_factor(u64 factor);
>  #else
>  static inline void arch_touch_nmi_watchdog(void) {}
> +static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
>  #endif
>  
>  #ifdef CONFIG_NMI_IPI
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 7d28b9553654..80851b228f71 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
>  static cpumask_t wd_smp_cpus_stuck;
>  static u64 wd_smp_last_reset_tb;
>  
> +#ifdef CONFIG_PPC_PSERIES
> +static u64 wd_factor;
> +#endif
> +
>  /*
>   * Try to take the exclusive watchdog action / NMI IPI / printing lock.
>   * wd_smp_lock must be held. If this fails, we should return and wait
> @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
>  
>  static void watchdog_calc_timeouts(void)
>  {
> -	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
> +	u64 threshold = watchdog_thresh;
> +
> +#ifdef CONFIG_PPC_PSERIES
> +	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
> +#endif
> +
> +	wd_panic_timeout_tb = threshold * ppc_tb_freq;
>  
>  	/* Have the SMP detector trigger a bit later */
>  	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
> @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
>  	}
>  	return 0;
>  }
> +
> +#ifdef CONFIG_PPC_PSERIES
> +void watchdog_nmi_set_lpm_factor(u64 factor)
> +{
> +	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
> +	WRITE_ONCE(wd_factor, factor);
> +	lockup_detector_reconfigure();
> +}
> +#endif
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
  2022-06-27 13:53   ` Laurent Dufour
@ 2022-07-12  1:46     ` Nicholas Piggin
  -1 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:46 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> During a LPM, while the memory transfer is in progress on the arrival side,
> some latencies is generated when accessing not yet transferred pages on the
> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
> increases the risk to hit a NMI interrupt in a bad place in the kernel,
> leading to a kernel panic.
> 
> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
> strong work around, some users would want this timeout to be eventually
> triggered if the system is hanging even during LPM.
> 
> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
> stopped for the switchover sequence, the NMI watchdog timer is set to
>  watchdog_tresh + factor%
> 
> A value of 0 has no effect. The default value is 200, meaning that the NMI
> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
> Once the memory transfer is achieved, the factor is reset to 0.
> 
> Setting this value to a high number is like disabling the NMI watchdog
> during a LPM.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>  2 files changed, 55 insertions(+)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ddccd1077462..0bb0b7f27e96 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>  Documentation/admin-guide/kernel-parameters.rst).
>  
>  
> +nmi_watchdog_factor (PPC only)
> +==================================
> +
> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
> +set to 1). This factor represents the percentage added to
> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
> +LPM. The soft lockup timeout is not impacted.

Could "LPM" or "mobility" be a bit more prominent in the parameter name
and documentation? Something else might want to add a factor as well,
one day.

Otherwise the code looks okay.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> +
> +A value of 0 means no change. The default value is 200 meaning the NMI
> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
> +
> +
>  numa_balancing
>  ==============
>  
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 907a779074d6..649155faafc2 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -48,6 +48,39 @@ struct update_props_workarea {
>  #define MIGRATION_SCOPE	(1)
>  #define PRRN_SCOPE -2
>  
> +#ifdef CONFIG_PPC_WATCHDOG
> +static unsigned int nmi_wd_factor = 200;
> +
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
> +	{
> +		.procname	= "nmi_watchdog_factor",
> +		.data		= &nmi_wd_factor,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +	},
> +	{}
> +};
> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
> +	{
> +		.procname       = "kernel",
> +		.mode           = 0555,
> +		.child          = nmi_wd_factor_ctl_table,
> +	},
> +	{}
> +};
> +
> +static int __init register_nmi_wd_factor_sysctl(void)
> +{
> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
> +
> +	return 0;
> +}
> +device_initcall(register_nmi_wd_factor_sysctl);
> +#endif /* CONFIG_SYSCTL */
> +#endif /* CONFIG_PPC_WATCHDOG */
> +
>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>  {
>  	int rc;
> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>  static int pseries_migrate_partition(u64 handle)
>  {
>  	int ret;
> +	unsigned int factor = 0;
>  
> +#ifdef CONFIG_PPC_WATCHDOG
> +	factor = nmi_wd_factor;
> +#endif
>  	ret = wait_for_vasi_session_suspending(handle);
>  	if (ret)
>  		return ret;
>  
>  	vas_migration_handler(VAS_SUSPEND);
>  
> +	if (factor)
> +		watchdog_nmi_set_lpm_factor(factor);
> +
>  	ret = pseries_suspend(handle);
>  	if (ret == 0) {
>  		post_mobility_fixup();
> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>  	} else
>  		pseries_cancel_migration(handle, ret);
>  
> +	if (factor)
> +		watchdog_nmi_set_lpm_factor(0);
> +
>  	vas_migration_handler(VAS_RESUME);
>  
>  	return ret;
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
@ 2022-07-12  1:46     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2022-07-12  1:46 UTC (permalink / raw)
  To: benh, haren, Laurent Dufour, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
> During a LPM, while the memory transfer is in progress on the arrival side,
> some latencies is generated when accessing not yet transferred pages on the
> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
> increases the risk to hit a NMI interrupt in a bad place in the kernel,
> leading to a kernel panic.
> 
> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
> strong work around, some users would want this timeout to be eventually
> triggered if the system is hanging even during LPM.
> 
> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
> stopped for the switchover sequence, the NMI watchdog timer is set to
>  watchdog_tresh + factor%
> 
> A value of 0 has no effect. The default value is 200, meaning that the NMI
> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
> Once the memory transfer is achieved, the factor is reset to 0.
> 
> Setting this value to a high number is like disabling the NMI watchdog
> during a LPM.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>  2 files changed, 55 insertions(+)
> 
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index ddccd1077462..0bb0b7f27e96 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>  Documentation/admin-guide/kernel-parameters.rst).
>  
>  
> +nmi_watchdog_factor (PPC only)
> +==================================
> +
> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
> +set to 1). This factor represents the percentage added to
> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
> +LPM. The soft lockup timeout is not impacted.

Could "LPM" or "mobility" be a bit more prominent in the parameter name
and documentation? Something else might want to add a factor as well,
one day.

Otherwise the code looks okay.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

> +
> +A value of 0 means no change. The default value is 200 meaning the NMI
> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
> +
> +
>  numa_balancing
>  ==============
>  
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 907a779074d6..649155faafc2 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -48,6 +48,39 @@ struct update_props_workarea {
>  #define MIGRATION_SCOPE	(1)
>  #define PRRN_SCOPE -2
>  
> +#ifdef CONFIG_PPC_WATCHDOG
> +static unsigned int nmi_wd_factor = 200;
> +
> +#ifdef CONFIG_SYSCTL
> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
> +	{
> +		.procname	= "nmi_watchdog_factor",
> +		.data		= &nmi_wd_factor,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +	},
> +	{}
> +};
> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
> +	{
> +		.procname       = "kernel",
> +		.mode           = 0555,
> +		.child          = nmi_wd_factor_ctl_table,
> +	},
> +	{}
> +};
> +
> +static int __init register_nmi_wd_factor_sysctl(void)
> +{
> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
> +
> +	return 0;
> +}
> +device_initcall(register_nmi_wd_factor_sysctl);
> +#endif /* CONFIG_SYSCTL */
> +#endif /* CONFIG_PPC_WATCHDOG */
> +
>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>  {
>  	int rc;
> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>  static int pseries_migrate_partition(u64 handle)
>  {
>  	int ret;
> +	unsigned int factor = 0;
>  
> +#ifdef CONFIG_PPC_WATCHDOG
> +	factor = nmi_wd_factor;
> +#endif
>  	ret = wait_for_vasi_session_suspending(handle);
>  	if (ret)
>  		return ret;
>  
>  	vas_migration_handler(VAS_SUSPEND);
>  
> +	if (factor)
> +		watchdog_nmi_set_lpm_factor(factor);
> +
>  	ret = pseries_suspend(handle);
>  	if (ret == 0) {
>  		post_mobility_fixup();
> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>  	} else
>  		pseries_cancel_migration(handle, ret);
>  
> +	if (factor)
> +		watchdog_nmi_set_lpm_factor(0);
> +
>  	vas_migration_handler(VAS_RESUME);
>  
>  	return ret;
> -- 
> 2.36.1
> 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/4] Extending NMI watchdog during LPM
  2022-07-12  1:21   ` Nicholas Piggin
@ 2022-07-12  9:32     ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:32 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Le 12/07/2022 à 03:21, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> When a partition is transferred, once it arrives at the destination node,
>> the partition is active but much of its memory must be transferred from the
>> start node.
>>
>> It depends on the activity in the partition, but the more CPU the partition
>> has, the more memory to be transferred is likely to be. This causes latency
>> when accessing pages that need to be transferred, and often, for large
>> partitions, it triggers the NMI watchdog.
> 
> Importantly, it can require page in of code that runs with irqs 
> disabled, which is unlike a guest normally runs under PowerVM
> (but it can under KVM) which is why we enabled the watchdog under
> PowerVM but not KVM. So, okay it makes sense to mak an exception
> for this case.

On PowerVM, irqs disabled code may trigger multiple page in of data, which
is generating latencies too. Furthermore, the hypervisor is currently not
prioritizing page in operation based on the faulting CPU state. So the
kernel may have to wait for user page in operations to last.

> Thanks,
> Nick
> 
>> The NMI watchdog causes the CPU stack to dump where it appears to be
>> stuck. In this case, it does not bring much information since it can happen
>> during any memory access of the kernel.
>>
>> In addition, the NMI interrupt mechanism is not secure and can generate a
>> dump system in the event that the interruption is taken while MSR[RI]=0.
>>
>> Depending on the LPAR size and load, it may be interesting to extend the
>> NMI watchdog timer during the LPM.
>>
>> That's configurable through sysctl with the new introduced variable
>> (specific to powerpc) nmi_watchdog_factor. This value represents the
>> percentage added to watchdog_tresh to set the NMI watchdog timeout during a
>> LPM.
>>
>> Changes in v3:
>>  - don't export watchdog_mutex
>>  - fix a comment in mobilty.c, wait_for_vasi_session_completed()
>>  - fix a build issue when !CONFIG_PPC_WATCHDOG
>>  - rework some printk and rename the sysctl variable.
>>
>> v2:
>> https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/
>>
>> Laurent Dufour (4):
>>   powerpc/mobility: wait for memory transfer to complete
>>   watchdog: export lockup_detector_reconfigure
>>   powerpc/watchdog: introduce a NMI watchdog's factor
>>   pseries/mobility: set NMI watchdog factor during LPM
>>
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 +++
>>  arch/powerpc/include/asm/nmi.h              |  2 +
>>  arch/powerpc/kernel/watchdog.c              | 21 ++++-
>>  arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
>>  include/linux/nmi.h                         |  2 +
>>  kernel/watchdog.c                           | 21 +++--
>>  6 files changed, 135 insertions(+), 8 deletions(-)
>>
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/4] Extending NMI watchdog during LPM
@ 2022-07-12  9:32     ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:32 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Le 12/07/2022 à 03:21, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> When a partition is transferred, once it arrives at the destination node,
>> the partition is active but much of its memory must be transferred from the
>> start node.
>>
>> It depends on the activity in the partition, but the more CPU the partition
>> has, the more memory to be transferred is likely to be. This causes latency
>> when accessing pages that need to be transferred, and often, for large
>> partitions, it triggers the NMI watchdog.
> 
> Importantly, it can require page in of code that runs with irqs 
> disabled, which is unlike a guest normally runs under PowerVM
> (but it can under KVM) which is why we enabled the watchdog under
> PowerVM but not KVM. So, okay it makes sense to mak an exception
> for this case.

On PowerVM, irqs disabled code may trigger multiple page in of data, which
is generating latencies too. Furthermore, the hypervisor is currently not
prioritizing page in operation based on the faulting CPU state. So the
kernel may have to wait for user page in operations to last.

> Thanks,
> Nick
> 
>> The NMI watchdog causes the CPU stack to dump where it appears to be
>> stuck. In this case, it does not bring much information since it can happen
>> during any memory access of the kernel.
>>
>> In addition, the NMI interrupt mechanism is not secure and can generate a
>> dump system in the event that the interruption is taken while MSR[RI]=0.
>>
>> Depending on the LPAR size and load, it may be interesting to extend the
>> NMI watchdog timer during the LPM.
>>
>> That's configurable through sysctl with the new introduced variable
>> (specific to powerpc) nmi_watchdog_factor. This value represents the
>> percentage added to watchdog_tresh to set the NMI watchdog timeout during a
>> LPM.
>>
>> Changes in v3:
>>  - don't export watchdog_mutex
>>  - fix a comment in mobilty.c, wait_for_vasi_session_completed()
>>  - fix a build issue when !CONFIG_PPC_WATCHDOG
>>  - rework some printk and rename the sysctl variable.
>>
>> v2:
>> https://lore.kernel.org/all/20220614135414.37746-1-ldufour@linux.ibm.com/
>>
>> Laurent Dufour (4):
>>   powerpc/mobility: wait for memory transfer to complete
>>   watchdog: export lockup_detector_reconfigure
>>   powerpc/watchdog: introduce a NMI watchdog's factor
>>   pseries/mobility: set NMI watchdog factor during LPM
>>
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 +++
>>  arch/powerpc/include/asm/nmi.h              |  2 +
>>  arch/powerpc/kernel/watchdog.c              | 21 ++++-
>>  arch/powerpc/platforms/pseries/mobility.c   | 85 ++++++++++++++++++++-
>>  include/linux/nmi.h                         |  2 +
>>  kernel/watchdog.c                           | 21 +++--
>>  6 files changed, 135 insertions(+), 8 deletions(-)
>>
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
  2022-07-12  1:33     ` Nicholas Piggin
@ 2022-07-12  9:38       ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:38 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Le 12/07/2022 à 03:33, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> In pseries_migration_partition(), loop until the memory transfer is
>> complete. This way the calling drmgr process will not exit earlier,
>> allowing callbacks to be run only once the migration is fully completed.
>>
>> If reading the VASI state is done after the hypervisor has completed the
>> migration, the HCALL is returning H_PARAMETER. We can safely assume that
>> the memory transfer is achieved if this happens.
>>
>> This will also allow to manage the NMI watchdog state in the next commits.
>>
>> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 78f3f74c7056..907a779074d6 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
>>  	return ret;
>>  }
>>  
>> +static void wait_for_vasi_session_completed(u64 handle)
>> +{
>> +	unsigned long state = 0;
>> +	int ret;
>> +
>> +	pr_info("waiting for memory transfert to complete...\n");
> 
>                                             ^ extra t (also below)

I tried to push one French word, but you caught it ;)
Will fix that and the other ones.

>> +
>> +	/*
>> +	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
>> +	 */
>> +	while (true) {
>> +		ret = poll_vasi_state(handle, &state);
>> +
>> +		/*
>> +		 * If the memory transfer is already complete and the migration
>> +		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
>> +		 * which is translate in EINVAL by poll_vasi_state().
>> +		 */
>> +		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
>> +			pr_info("memory transfert completed.\n");
>> +			break;
>> +		}
>> +
>> +		if (ret) {
>> +			pr_err("H_VASI_STATE return error (%d)\n", ret);
>> +			break;
>> +		}
>> +
>> +		if (state != H_VASI_RESUMED) {
>> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
>> +			break;
>> +		}
>> +
>> +		msleep(500);
> 
> Is 500 specified anywhere? Another caller uses 1000, and the other one 
> uses some backoff interval starting at 1ms...

This is a bit empiric, the idea is to wait for the overall memory transfer
to be done. There is no real need to interact immediately after the
operation is terminated, so I pick that value to not make too many Hcalls
just for that. From the test I did, that seems to be a reasonable choice.

> 
>> +	}
>> +}
>> +
>>  static void prod_single(unsigned int target_cpu)
>>  {
>>  	long hvrc;
>> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
>>  	vas_migration_handler(VAS_SUSPEND);
>>  
>>  	ret = pseries_suspend(handle);
>> -	if (ret == 0)
>> +	if (ret == 0) {
>>  		post_mobility_fixup();
>> -	else
>> +		wait_for_vasi_session_completed(handle);
> 
> If this wasn't required until later patches, maybe a comment about why 
> it's here? Could call it wait_for_migration() or similar too.
> 
> Looks okay though from my basic reading of PAPR.
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks Nick for reviewing this series.

> 
>> +	} else
>>  		pseries_cancel_migration(handle, ret);
>>  
>>  	vas_migration_handler(VAS_RESUME);
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete
@ 2022-07-12  9:38       ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:38 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Le 12/07/2022 à 03:33, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> In pseries_migration_partition(), loop until the memory transfer is
>> complete. This way the calling drmgr process will not exit earlier,
>> allowing callbacks to be run only once the migration is fully completed.
>>
>> If reading the VASI state is done after the hypervisor has completed the
>> migration, the HCALL is returning H_PARAMETER. We can safely assume that
>> the memory transfer is achieved if this happens.
>>
>> This will also allow to manage the NMI watchdog state in the next commits.
>>
>> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 42 +++++++++++++++++++++--
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 78f3f74c7056..907a779074d6 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
>>  	return ret;
>>  }
>>  
>> +static void wait_for_vasi_session_completed(u64 handle)
>> +{
>> +	unsigned long state = 0;
>> +	int ret;
>> +
>> +	pr_info("waiting for memory transfert to complete...\n");
> 
>                                             ^ extra t (also below)

I tried to push one French word, but you caught it ;)
Will fix that and the other ones.

>> +
>> +	/*
>> +	 * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
>> +	 */
>> +	while (true) {
>> +		ret = poll_vasi_state(handle, &state);
>> +
>> +		/*
>> +		 * If the memory transfer is already complete and the migration
>> +		 * has been cleaned up by the hypervisor, H_PARAMETER is return,
>> +		 * which is translate in EINVAL by poll_vasi_state().
>> +		 */
>> +		if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
>> +			pr_info("memory transfert completed.\n");
>> +			break;
>> +		}
>> +
>> +		if (ret) {
>> +			pr_err("H_VASI_STATE return error (%d)\n", ret);
>> +			break;
>> +		}
>> +
>> +		if (state != H_VASI_RESUMED) {
>> +			pr_err("unexpected H_VASI_STATE result %lu\n", state);
>> +			break;
>> +		}
>> +
>> +		msleep(500);
> 
> Is 500 specified anywhere? Another caller uses 1000, and the other one 
> uses some backoff interval starting at 1ms...

This is a bit empiric, the idea is to wait for the overall memory transfer
to be done. There is no real need to interact immediately after the
operation is terminated, so I pick that value to not make too many Hcalls
just for that. From the test I did, that seems to be a reasonable choice.

> 
>> +	}
>> +}
>> +
>>  static void prod_single(unsigned int target_cpu)
>>  {
>>  	long hvrc;
>> @@ -673,9 +710,10 @@ static int pseries_migrate_partition(u64 handle)
>>  	vas_migration_handler(VAS_SUSPEND);
>>  
>>  	ret = pseries_suspend(handle);
>> -	if (ret == 0)
>> +	if (ret == 0) {
>>  		post_mobility_fixup();
>> -	else
>> +		wait_for_vasi_session_completed(handle);
> 
> If this wasn't required until later patches, maybe a comment about why 
> it's here? Could call it wait_for_migration() or similar too.
> 
> Looks okay though from my basic reading of PAPR.
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

Thanks Nick for reviewing this series.

> 
>> +	} else
>>  		pseries_cancel_migration(handle, ret);
>>  
>>  	vas_migration_handler(VAS_RESUME);
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
  2022-07-12  1:46     ` Nicholas Piggin
@ 2022-07-12  9:47       ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:47 UTC (permalink / raw)
  To: Nicholas Piggin, nathanl
  Cc: linux-kernel, linux, haren, benh, linuxppc-dev, linux-watchdog,
	mpe, paulus, wim

Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> During a LPM, while the memory transfer is in progress on the arrival side,
>> some latencies is generated when accessing not yet transferred pages on the
>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>> leading to a kernel panic.
>>
>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>> strong work around, some users would want this timeout to be eventually
>> triggered if the system is hanging even during LPM.
>>
>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>  watchdog_tresh + factor%
>>
>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>> Once the memory transfer is achieved, the factor is reset to 0.
>>
>> Setting this value to a high number is like disabling the NMI watchdog
>> during a LPM.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>  2 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index ddccd1077462..0bb0b7f27e96 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>  Documentation/admin-guide/kernel-parameters.rst).
>>  
>>  
>> +nmi_watchdog_factor (PPC only)
>> +==================================
>> +
>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>> +set to 1). This factor represents the percentage added to
>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>> +LPM. The soft lockup timeout is not impacted.
> 
> Could "LPM" or "mobility" be a bit more prominent in the parameter name
> and documentation? Something else might want to add a factor as well,
> one day.

In the V2 version, Nathan suggested "making the user-visible
name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
apply this to other contexts in the future."

So I made the change to a more generic name. I think this is a good option
since the documentation is explicit about the LPM particular case.
If in the future this factor needs to apply during an other operation that
name will be generic enough.

Do you agree ?

> 
> Otherwise the code looks okay.
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
>> +
>> +A value of 0 means no change. The default value is 200 meaning the NMI
>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>> +
>> +
>>  numa_balancing
>>  ==============
>>  
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 907a779074d6..649155faafc2 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>  #define MIGRATION_SCOPE	(1)
>>  #define PRRN_SCOPE -2
>>  
>> +#ifdef CONFIG_PPC_WATCHDOG
>> +static unsigned int nmi_wd_factor = 200;
>> +
>> +#ifdef CONFIG_SYSCTL
>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>> +	{
>> +		.procname	= "nmi_watchdog_factor",
>> +		.data		= &nmi_wd_factor,
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_douintvec_minmax,
>> +	},
>> +	{}
>> +};
>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>> +	{
>> +		.procname       = "kernel",
>> +		.mode           = 0555,
>> +		.child          = nmi_wd_factor_ctl_table,
>> +	},
>> +	{}
>> +};
>> +
>> +static int __init register_nmi_wd_factor_sysctl(void)
>> +{
>> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
>> +
>> +	return 0;
>> +}
>> +device_initcall(register_nmi_wd_factor_sysctl);
>> +#endif /* CONFIG_SYSCTL */
>> +#endif /* CONFIG_PPC_WATCHDOG */
>> +
>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>  {
>>  	int rc;
>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>  static int pseries_migrate_partition(u64 handle)
>>  {
>>  	int ret;
>> +	unsigned int factor = 0;
>>  
>> +#ifdef CONFIG_PPC_WATCHDOG
>> +	factor = nmi_wd_factor;
>> +#endif
>>  	ret = wait_for_vasi_session_suspending(handle);
>>  	if (ret)
>>  		return ret;
>>  
>>  	vas_migration_handler(VAS_SUSPEND);
>>  
>> +	if (factor)
>> +		watchdog_nmi_set_lpm_factor(factor);
>> +
>>  	ret = pseries_suspend(handle);
>>  	if (ret == 0) {
>>  		post_mobility_fixup();
>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>>  	} else
>>  		pseries_cancel_migration(handle, ret);
>>  
>> +	if (factor)
>> +		watchdog_nmi_set_lpm_factor(0);
>> +
>>  	vas_migration_handler(VAS_RESUME);
>>  
>>  	return ret;
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
@ 2022-07-12  9:47       ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:47 UTC (permalink / raw)
  To: Nicholas Piggin, nathanl
  Cc: linux-watchdog, linux-kernel, paulus, linux, wim, linuxppc-dev, haren

Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> During a LPM, while the memory transfer is in progress on the arrival side,
>> some latencies is generated when accessing not yet transferred pages on the
>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>> leading to a kernel panic.
>>
>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>> strong work around, some users would want this timeout to be eventually
>> triggered if the system is hanging even during LPM.
>>
>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>  watchdog_tresh + factor%
>>
>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>> Once the memory transfer is achieved, the factor is reset to 0.
>>
>> Setting this value to a high number is like disabling the NMI watchdog
>> during a LPM.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>  2 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>> index ddccd1077462..0bb0b7f27e96 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>  Documentation/admin-guide/kernel-parameters.rst).
>>  
>>  
>> +nmi_watchdog_factor (PPC only)
>> +==================================
>> +
>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>> +set to 1). This factor represents the percentage added to
>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>> +LPM. The soft lockup timeout is not impacted.
> 
> Could "LPM" or "mobility" be a bit more prominent in the parameter name
> and documentation? Something else might want to add a factor as well,
> one day.

In the V2 version, Nathan suggested "making the user-visible
name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
apply this to other contexts in the future."

So I made the change to a more generic name. I think this is a good option
since the documentation is explicit about the LPM particular case.
If in the future this factor needs to apply during an other operation that
name will be generic enough.

Do you agree ?

> 
> Otherwise the code looks okay.
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
>> +
>> +A value of 0 means no change. The default value is 200 meaning the NMI
>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>> +
>> +
>>  numa_balancing
>>  ==============
>>  
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 907a779074d6..649155faafc2 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>  #define MIGRATION_SCOPE	(1)
>>  #define PRRN_SCOPE -2
>>  
>> +#ifdef CONFIG_PPC_WATCHDOG
>> +static unsigned int nmi_wd_factor = 200;
>> +
>> +#ifdef CONFIG_SYSCTL
>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>> +	{
>> +		.procname	= "nmi_watchdog_factor",
>> +		.data		= &nmi_wd_factor,
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_douintvec_minmax,
>> +	},
>> +	{}
>> +};
>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>> +	{
>> +		.procname       = "kernel",
>> +		.mode           = 0555,
>> +		.child          = nmi_wd_factor_ctl_table,
>> +	},
>> +	{}
>> +};
>> +
>> +static int __init register_nmi_wd_factor_sysctl(void)
>> +{
>> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
>> +
>> +	return 0;
>> +}
>> +device_initcall(register_nmi_wd_factor_sysctl);
>> +#endif /* CONFIG_SYSCTL */
>> +#endif /* CONFIG_PPC_WATCHDOG */
>> +
>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>  {
>>  	int rc;
>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>  static int pseries_migrate_partition(u64 handle)
>>  {
>>  	int ret;
>> +	unsigned int factor = 0;
>>  
>> +#ifdef CONFIG_PPC_WATCHDOG
>> +	factor = nmi_wd_factor;
>> +#endif
>>  	ret = wait_for_vasi_session_suspending(handle);
>>  	if (ret)
>>  		return ret;
>>  
>>  	vas_migration_handler(VAS_SUSPEND);
>>  
>> +	if (factor)
>> +		watchdog_nmi_set_lpm_factor(factor);
>> +
>>  	ret = pseries_suspend(handle);
>>  	if (ret == 0) {
>>  		post_mobility_fixup();
>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>>  	} else
>>  		pseries_cancel_migration(handle, ret);
>>  
>> +	if (factor)
>> +		watchdog_nmi_set_lpm_factor(0);
>> +
>>  	vas_migration_handler(VAS_RESUME);
>>  
>>  	return ret;
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
  2022-07-12  1:42     ` Nicholas Piggin
@ 2022-07-12  9:51       ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:51 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linux-kernel, linuxppc-dev, linux-watchdog

Le 12/07/2022 à 03:42, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> Introduce a factor which would apply to the NMI watchdog timeout.
>>
>> This factor is a percentage added to the watchdog_tresh value. The value is
>> set under the watchdog_mutex protection and lockup_detector_reconfigure()
>> is called to recompute wd_panic_timeout_tb.
>>
>> Once the factor is set, it remains until it is set back to 0, which means
>> no impact.
> 
> Looks okay. We could worry about making it more generic or nicer if
> another user came along.
> 
> Could you make the naming a bit more self documenting? 
> watchdog_nmi_set_timeout_pct(), maybe? Does the wd really care
> that it is for LPM in particular?

You're right, the name should not mention lpm.
For my information, what does "pct" stand for?

> 
> Variables and parameters could have a _pct suffix too.

I'll do that.

> 
> Otherwise
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/nmi.h |  2 ++
>>  arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
>> index ea0e487f87b1..7d6a8d9b0543 100644
>> --- a/arch/powerpc/include/asm/nmi.h
>> +++ b/arch/powerpc/include/asm/nmi.h
>> @@ -5,8 +5,10 @@
>>  #ifdef CONFIG_PPC_WATCHDOG
>>  extern void arch_touch_nmi_watchdog(void);
>>  long soft_nmi_interrupt(struct pt_regs *regs);
>> +void watchdog_nmi_set_lpm_factor(u64 factor);
>>  #else
>>  static inline void arch_touch_nmi_watchdog(void) {}
>> +static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
>>  #endif
>>  
>>  #ifdef CONFIG_NMI_IPI
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index 7d28b9553654..80851b228f71 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
>>  static cpumask_t wd_smp_cpus_stuck;
>>  static u64 wd_smp_last_reset_tb;
>>  
>> +#ifdef CONFIG_PPC_PSERIES
>> +static u64 wd_factor;
>> +#endif
>> +
>>  /*
>>   * Try to take the exclusive watchdog action / NMI IPI / printing lock.
>>   * wd_smp_lock must be held. If this fails, we should return and wait
>> @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
>>  
>>  static void watchdog_calc_timeouts(void)
>>  {
>> -	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
>> +	u64 threshold = watchdog_thresh;
>> +
>> +#ifdef CONFIG_PPC_PSERIES
>> +	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
>> +#endif
>> +
>> +	wd_panic_timeout_tb = threshold * ppc_tb_freq;
>>  
>>  	/* Have the SMP detector trigger a bit later */
>>  	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
>> @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
>>  	}
>>  	return 0;
>>  }
>> +
>> +#ifdef CONFIG_PPC_PSERIES
>> +void watchdog_nmi_set_lpm_factor(u64 factor)
>> +{
>> +	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
>> +	WRITE_ONCE(wd_factor, factor);
>> +	lockup_detector_reconfigure();
>> +}
>> +#endif
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor
@ 2022-07-12  9:51       ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-12  9:51 UTC (permalink / raw)
  To: Nicholas Piggin, benh, haren, linux, mpe, nathanl, paulus, wim
  Cc: linuxppc-dev, linux-kernel, linux-watchdog

Le 12/07/2022 à 03:42, Nicholas Piggin a écrit :
> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>> Introduce a factor which would apply to the NMI watchdog timeout.
>>
>> This factor is a percentage added to the watchdog_tresh value. The value is
>> set under the watchdog_mutex protection and lockup_detector_reconfigure()
>> is called to recompute wd_panic_timeout_tb.
>>
>> Once the factor is set, it remains until it is set back to 0, which means
>> no impact.
> 
> Looks okay. We could worry about making it more generic or nicer if
> another user came along.
> 
> Could you make the naming a bit more self documenting? 
> watchdog_nmi_set_timeout_pct(), maybe? Does the wd really care
> that it is for LPM in particular?

You're right, the name should not mention lpm.
For my information, what does "pct" stand for?

> 
> Variables and parameters could have a _pct suffix too.

I'll do that.

> 
> Otherwise
> 
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
> 
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/nmi.h |  2 ++
>>  arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++-
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
>> index ea0e487f87b1..7d6a8d9b0543 100644
>> --- a/arch/powerpc/include/asm/nmi.h
>> +++ b/arch/powerpc/include/asm/nmi.h
>> @@ -5,8 +5,10 @@
>>  #ifdef CONFIG_PPC_WATCHDOG
>>  extern void arch_touch_nmi_watchdog(void);
>>  long soft_nmi_interrupt(struct pt_regs *regs);
>> +void watchdog_nmi_set_lpm_factor(u64 factor);
>>  #else
>>  static inline void arch_touch_nmi_watchdog(void) {}
>> +static inline void watchdog_nmi_set_lpm_factor(u64 factor) {}
>>  #endif
>>  
>>  #ifdef CONFIG_NMI_IPI
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index 7d28b9553654..80851b228f71 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
>>  static cpumask_t wd_smp_cpus_stuck;
>>  static u64 wd_smp_last_reset_tb;
>>  
>> +#ifdef CONFIG_PPC_PSERIES
>> +static u64 wd_factor;
>> +#endif
>> +
>>  /*
>>   * Try to take the exclusive watchdog action / NMI IPI / printing lock.
>>   * wd_smp_lock must be held. If this fails, we should return and wait
>> @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
>>  
>>  static void watchdog_calc_timeouts(void)
>>  {
>> -	wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
>> +	u64 threshold = watchdog_thresh;
>> +
>> +#ifdef CONFIG_PPC_PSERIES
>> +	threshold += (READ_ONCE(wd_factor) * threshold) / 100;
>> +#endif
>> +
>> +	wd_panic_timeout_tb = threshold * ppc_tb_freq;
>>  
>>  	/* Have the SMP detector trigger a bit later */
>>  	wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
>> @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
>>  	}
>>  	return 0;
>>  }
>> +
>> +#ifdef CONFIG_PPC_PSERIES
>> +void watchdog_nmi_set_lpm_factor(u64 factor)
>> +{
>> +	pr_info("Set the NMI watchdog factor to %llu%%\n", factor);
>> +	WRITE_ONCE(wd_factor, factor);
>> +	lockup_detector_reconfigure();
>> +}
>> +#endif
>> -- 
>> 2.36.1
>>
>>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
  2022-07-12  9:47       ` Laurent Dufour
@ 2022-07-13 14:22         ` Laurent Dufour
  -1 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-13 14:22 UTC (permalink / raw)
  To: Nicholas Piggin, nathanl
  Cc: linux-watchdog, linux-kernel, paulus, linux, wim, linuxppc-dev, haren

Le 12/07/2022 à 11:47, Laurent Dufour a écrit :
> Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
>> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>>> During a LPM, while the memory transfer is in progress on the arrival side,
>>> some latencies is generated when accessing not yet transferred pages on the
>>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>>> leading to a kernel panic.
>>>
>>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>>> strong work around, some users would want this timeout to be eventually
>>> triggered if the system is hanging even during LPM.
>>>
>>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>>  watchdog_tresh + factor%
>>>
>>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>>> Once the memory transfer is achieved, the factor is reset to 0.
>>>
>>> Setting this value to a high number is like disabling the NMI watchdog
>>> during a LPM.
>>>
>>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>>> ---
>>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>>  2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>>  Documentation/admin-guide/kernel-parameters.rst).
>>>  
>>>  
>>> +nmi_watchdog_factor (PPC only)
>>> +==================================
>>> +
>>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>>> +set to 1). This factor represents the percentage added to
>>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>>> +LPM. The soft lockup timeout is not impacted.
>>
>> Could "LPM" or "mobility" be a bit more prominent in the parameter name
>> and documentation? Something else might want to add a factor as well,
>> one day.
> 
> In the V2 version, Nathan suggested "making the user-visible
> name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
> apply this to other contexts in the future."
> 
> So I made the change to a more generic name. I think this is a good option
> since the documentation is explicit about the LPM particular case.
> If in the future this factor needs to apply during an other operation that
> name will be generic enough.
> 
> Do you agree ?

Nick and I discussed that.
Nick prefers to have LPM in the tunable names, and thinks we can add a new
tunable if a separate user came up which required it.

We agree that 'nmi_wd_lpm_factor' is a good name.
I'll send a v5 updating that name.

>>
>> Otherwise the code looks okay.
>>
>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>>
>>> +
>>> +A value of 0 means no change. The default value is 200 meaning the NMI
>>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>>> +
>>> +
>>>  numa_balancing
>>>  ==============
>>>  
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 907a779074d6..649155faafc2 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>>  #define MIGRATION_SCOPE	(1)
>>>  #define PRRN_SCOPE -2
>>>  
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +static unsigned int nmi_wd_factor = 200;
>>> +
>>> +#ifdef CONFIG_SYSCTL
>>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>>> +	{
>>> +		.procname	= "nmi_watchdog_factor",
>>> +		.data		= &nmi_wd_factor,
>>> +		.maxlen		= sizeof(int),
>>> +		.mode		= 0644,
>>> +		.proc_handler	= proc_douintvec_minmax,
>>> +	},
>>> +	{}
>>> +};
>>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>>> +	{
>>> +		.procname       = "kernel",
>>> +		.mode           = 0555,
>>> +		.child          = nmi_wd_factor_ctl_table,
>>> +	},
>>> +	{}
>>> +};
>>> +
>>> +static int __init register_nmi_wd_factor_sysctl(void)
>>> +{
>>> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
>>> +
>>> +	return 0;
>>> +}
>>> +device_initcall(register_nmi_wd_factor_sysctl);
>>> +#endif /* CONFIG_SYSCTL */
>>> +#endif /* CONFIG_PPC_WATCHDOG */
>>> +
>>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>>  {
>>>  	int rc;
>>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>>  static int pseries_migrate_partition(u64 handle)
>>>  {
>>>  	int ret;
>>> +	unsigned int factor = 0;
>>>  
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +	factor = nmi_wd_factor;
>>> +#endif
>>>  	ret = wait_for_vasi_session_suspending(handle);
>>>  	if (ret)
>>>  		return ret;
>>>  
>>>  	vas_migration_handler(VAS_SUSPEND);
>>>  
>>> +	if (factor)
>>> +		watchdog_nmi_set_lpm_factor(factor);
>>> +
>>>  	ret = pseries_suspend(handle);
>>>  	if (ret == 0) {
>>>  		post_mobility_fixup();
>>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>>>  	} else
>>>  		pseries_cancel_migration(handle, ret);
>>>  
>>> +	if (factor)
>>> +		watchdog_nmi_set_lpm_factor(0);
>>> +
>>>  	vas_migration_handler(VAS_RESUME);
>>>  
>>>  	return ret;
>>> -- 
>>> 2.36.1
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM
@ 2022-07-13 14:22         ` Laurent Dufour
  0 siblings, 0 replies; 28+ messages in thread
From: Laurent Dufour @ 2022-07-13 14:22 UTC (permalink / raw)
  To: Nicholas Piggin, nathanl
  Cc: linux-watchdog, linux-kernel, paulus, haren, wim, linuxppc-dev, linux

Le 12/07/2022 à 11:47, Laurent Dufour a écrit :
> Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
>> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>>> During a LPM, while the memory transfer is in progress on the arrival side,
>>> some latencies is generated when accessing not yet transferred pages on the
>>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>>> leading to a kernel panic.
>>>
>>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>>> strong work around, some users would want this timeout to be eventually
>>> triggered if the system is hanging even during LPM.
>>>
>>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>>  watchdog_tresh + factor%
>>>
>>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>>> Once the memory transfer is achieved, the factor is reset to 0.
>>>
>>> Setting this value to a high number is like disabling the NMI watchdog
>>> during a LPM.
>>>
>>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>>> ---
>>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++
>>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +++++++++++++++++++++
>>>  2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>>  Documentation/admin-guide/kernel-parameters.rst).
>>>  
>>>  
>>> +nmi_watchdog_factor (PPC only)
>>> +==================================
>>> +
>>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>>> +set to 1). This factor represents the percentage added to
>>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>>> +LPM. The soft lockup timeout is not impacted.
>>
>> Could "LPM" or "mobility" be a bit more prominent in the parameter name
>> and documentation? Something else might want to add a factor as well,
>> one day.
> 
> In the V2 version, Nathan suggested "making the user-visible
> name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
> apply this to other contexts in the future."
> 
> So I made the change to a more generic name. I think this is a good option
> since the documentation is explicit about the LPM particular case.
> If in the future this factor needs to apply during an other operation that
> name will be generic enough.
> 
> Do you agree ?

Nick and I discussed that.
Nick prefers to have LPM in the tunable names, and thinks we can add a new
tunable if a separate user came up which required it.

We agree that 'nmi_wd_lpm_factor' is a good name.
I'll send a v5 updating that name.

>>
>> Otherwise the code looks okay.
>>
>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>>
>>> +
>>> +A value of 0 means no change. The default value is 200 meaning the NMI
>>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>>> +
>>> +
>>>  numa_balancing
>>>  ==============
>>>  
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 907a779074d6..649155faafc2 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>>  #define MIGRATION_SCOPE	(1)
>>>  #define PRRN_SCOPE -2
>>>  
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +static unsigned int nmi_wd_factor = 200;
>>> +
>>> +#ifdef CONFIG_SYSCTL
>>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>>> +	{
>>> +		.procname	= "nmi_watchdog_factor",
>>> +		.data		= &nmi_wd_factor,
>>> +		.maxlen		= sizeof(int),
>>> +		.mode		= 0644,
>>> +		.proc_handler	= proc_douintvec_minmax,
>>> +	},
>>> +	{}
>>> +};
>>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>>> +	{
>>> +		.procname       = "kernel",
>>> +		.mode           = 0555,
>>> +		.child          = nmi_wd_factor_ctl_table,
>>> +	},
>>> +	{}
>>> +};
>>> +
>>> +static int __init register_nmi_wd_factor_sysctl(void)
>>> +{
>>> +	register_sysctl_table(nmi_wd_factor_sysctl_root);
>>> +
>>> +	return 0;
>>> +}
>>> +device_initcall(register_nmi_wd_factor_sysctl);
>>> +#endif /* CONFIG_SYSCTL */
>>> +#endif /* CONFIG_PPC_WATCHDOG */
>>> +
>>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>>  {
>>>  	int rc;
>>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>>  static int pseries_migrate_partition(u64 handle)
>>>  {
>>>  	int ret;
>>> +	unsigned int factor = 0;
>>>  
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +	factor = nmi_wd_factor;
>>> +#endif
>>>  	ret = wait_for_vasi_session_suspending(handle);
>>>  	if (ret)
>>>  		return ret;
>>>  
>>>  	vas_migration_handler(VAS_SUSPEND);
>>>  
>>> +	if (factor)
>>> +		watchdog_nmi_set_lpm_factor(factor);
>>> +
>>>  	ret = pseries_suspend(handle);
>>>  	if (ret == 0) {
>>>  		post_mobility_fixup();
>>> @@ -716,6 +756,9 @@ static int pseries_migrate_partition(u64 handle)
>>>  	} else
>>>  		pseries_cancel_migration(handle, ret);
>>>  
>>> +	if (factor)
>>> +		watchdog_nmi_set_lpm_factor(0);
>>> +
>>>  	vas_migration_handler(VAS_RESUME);
>>>  
>>>  	return ret;
>>> -- 
>>> 2.36.1
>>>
>>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-07-13 14:23 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27 13:53 [PATCH v3 0/4] Extending NMI watchdog during LPM Laurent Dufour
2022-06-27 13:53 ` Laurent Dufour
2022-06-27 13:53 ` [PATCH v3 1/4] powerpc/mobility: wait for memory transfer to complete Laurent Dufour
2022-06-27 13:53   ` Laurent Dufour
2022-07-12  1:33   ` Nicholas Piggin
2022-07-12  1:33     ` Nicholas Piggin
2022-07-12  9:38     ` Laurent Dufour
2022-07-12  9:38       ` Laurent Dufour
2022-06-27 13:53 ` [PATCH v3 2/4] watchdog: export lockup_detector_reconfigure Laurent Dufour
2022-06-27 13:53   ` Laurent Dufour
2022-06-27 13:53 ` [PATCH v3 3/4] powerpc/watchdog: introduce a NMI watchdog's factor Laurent Dufour
2022-06-27 13:53   ` Laurent Dufour
2022-07-12  1:42   ` Nicholas Piggin
2022-07-12  1:42     ` Nicholas Piggin
2022-07-12  9:51     ` Laurent Dufour
2022-07-12  9:51       ` Laurent Dufour
2022-06-27 13:53 ` [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM Laurent Dufour
2022-06-27 13:53   ` Laurent Dufour
2022-07-12  1:46   ` Nicholas Piggin
2022-07-12  1:46     ` Nicholas Piggin
2022-07-12  9:47     ` Laurent Dufour
2022-07-12  9:47       ` Laurent Dufour
2022-07-13 14:22       ` Laurent Dufour
2022-07-13 14:22         ` Laurent Dufour
2022-07-12  1:21 ` [PATCH v3 0/4] Extending NMI watchdog " Nicholas Piggin
2022-07-12  1:21   ` Nicholas Piggin
2022-07-12  9:32   ` Laurent Dufour
2022-07-12  9:32     ` Laurent Dufour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.