* [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
@ 2022-06-01 15:53 Laurent Dufour
  2022-06-01 15:53 ` [PATCH 1/2] powerpc/mobility: Wait for memory transfer to complete Laurent Dufour
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Laurent Dufour @ 2022-06-01 15:53 UTC
  To: mpe, benh, paulus, nathanl, haren, npiggin; +Cc: linuxppc-dev, linux-kernel

When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must be transferred from the
start node.

It depends on the activity in the partition, but the more CPU the partition
has, the more memory to be transferred is likely to be. This causes latency
when accessing pages that need to be transferred, and often, for large
partitions, it triggers the NMI watchdog.

The NMI watchdog causes the CPU stack to dump where it appears to be
stuck. In this case, it does not bring much information since it can happen
during any memory access of the kernel.

In addition, the NMI interrupt mechanism is not secure and can generate a
dump system in the event that the interruption is taken while
MSR[RI]=0.

Given how often hard lockups are detected when transferring large
partitions, it seems best to disable the watchdog NMI until the memory
transfer from the start node is complete.

The first patch in this series waits for the memory transfer to complete,
the second disables the watchdog NMI just before stopping the CPUs and
reactivates it when the memory transfer is complete.

Laurent Dufour (2):
  powerpc/mobility: Wait for memory transfer to complete
  powerpc/mobility: disabling hard lockup watchdog during LPM

 arch/powerpc/platforms/pseries/mobility.c | 40 +++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

-- 
2.36.1

* [PATCH 1/2] powerpc/mobility: Wait for memory transfer to complete
  2022-06-01 15:53 [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Laurent Dufour
@ 2022-06-01 15:53 ` Laurent Dufour
  2022-06-01 15:53 ` [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM Laurent Dufour
  2022-06-02 17:58 ` [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Nathan Lynch
  2 siblings, 0 replies; 9+ messages in thread
From: Laurent Dufour @ 2022-06-01 15:53 UTC
  To: mpe, benh, paulus, nathanl, haren, npiggin; +Cc: linuxppc-dev, linux-kernel

In pseries_migrate_partition(), loop until the memory transfer is complete.
This way the calling drmgr process will not exit early, allowing callbacks
to be run only once the migration is fully completed.

This will also allow the NMI watchdog state to be managed in the next
commit.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 34 +++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..55612a1b07d6 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,35 @@ static int wait_for_vasi_session_suspending(u64 handle)
 	return ret;
 }
 
+static void wait_for_vasi_session_completed(u64 handle)
+{
+	unsigned long state = 0;
+	int ret;
+
+	pr_info("waiting for memory transfer to complete...\n");
+	/*
+	 * Wait for transition from H_VASI_RESUMED to
+	 * H_VASI_COMPLETED. Treat anything else as an error.
+	 */
+	while (true) {
+		ret = poll_vasi_state(handle, &state);
+
+		if (ret || state == H_VASI_COMPLETED)
+			break;
+
+		if (state != H_VASI_RESUMED) {
+			pr_err("unexpected H_VASI_STATE result %lu\n", state);
+			ret = -EIO;
+			break;
+		}
+
+		msleep(500);
+	}
+
+	pr_info("memory transfer completed (ret:%d state:%lu).\n",
+		ret, state);
+}
+
 static void prod_single(unsigned int target_cpu)
 {
 	long hvrc;
@@ -673,9 +702,10 @@ static int pseries_migrate_partition(u64 handle)
 	vas_migration_handler(VAS_SUSPEND);
 
 	ret = pseries_suspend(handle);
-	if (ret == 0)
+	if (ret == 0) {
 		post_mobility_fixup();
-	else
+		wait_for_vasi_session_completed(handle);
+	} else
 		pseries_cancel_migration(handle, ret);
 
 	vas_migration_handler(VAS_RESUME);
-- 
2.36.1

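The polling loop in the patch above can be exercised outside the kernel. The following is a minimal user-space sketch, not the kernel code: the enum values, the poll callback type, and the scripted poll source are all hypothetical stand-ins for H_VASI_RESUMED, H_VASI_COMPLETED and poll_vasi_state(). Unlike the void function in the patch, which only logs the outcome, this sketch returns the status so a caller can act on it, and it omits the 500 ms sleep between polls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VASI session states named in the patch. */
enum vasi_state { VASI_RESUMED, VASI_COMPLETED, VASI_ABORTED };

/* Hypothetical poll callback: fills *state, returns 0 or a negative errno. */
typedef int (*poll_fn)(void *ctx, enum vasi_state *state);

/*
 * Poll until the session reports VASI_COMPLETED.
 * Returns 0 on completion, the poll error if polling fails, or -EIO if
 * any state other than RESUMED or COMPLETED is seen. *polls counts calls.
 */
int wait_for_completed(poll_fn poll, void *ctx, unsigned int *polls)
{
	enum vasi_state state;
	int ret;

	assert(poll != NULL && polls != NULL);
	*polls = 0;
	for (;;) {
		ret = poll(ctx, &state);
		(*polls)++;
		if (ret)
			return ret;
		if (state == VASI_COMPLETED)
			return 0;
		if (state != VASI_RESUMED)
			return -EIO;
		/* The kernel patch sleeps here: msleep(500). */
	}
}

/* Scripted poll source for demonstration: replays a fixed state sequence. */
struct script {
	const enum vasi_state *states;
	size_t len;
	size_t pos;
};

int scripted_poll(void *ctx, enum vasi_state *state)
{
	struct script *s = ctx;

	if (s->pos >= s->len)
		return -EINVAL;	/* script exhausted */
	*state = s->states[s->pos++];
	return 0;
}

/* Usage: two RESUMED polls followed by COMPLETED succeed on the third poll. */
void example_wait(void)
{
	static const enum vasi_state seq[] = {
		VASI_RESUMED, VASI_RESUMED, VASI_COMPLETED
	};
	struct script s = { seq, 3, 0 };
	unsigned int polls;

	assert(wait_for_completed(scripted_poll, &s, &polls) == 0);
	assert(polls == 3);
}
```

Returning the status is a design choice of this sketch only; whether the caller of the real function should react to an unexpected state is not discussed in the thread.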
* [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM
  2022-06-01 15:53 [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Laurent Dufour
  2022-06-01 15:53 ` [PATCH 1/2] powerpc/mobility: Wait for memory transfer to complete Laurent Dufour
@ 2022-06-01 15:53 ` Laurent Dufour
  2022-06-06  1:41   ` kernel test robot
  2022-06-02 17:58 ` [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Nathan Lynch
  2 siblings, 1 reply; 9+ messages in thread
From: Laurent Dufour @ 2022-06-01 15:53 UTC
  To: mpe, benh, paulus, nathanl, haren, npiggin; +Cc: linuxppc-dev, linux-kernel

Disable the hard lockup watchdog until the memory transfer is complete.

This avoids the hard lockups seen while the memory transfer is still in
progress, when the system is heavily loaded and a lot of pages have not
yet been transferred to the arrival side.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 55612a1b07d6..061d4faefefb 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -701,6 +701,9 @@ static int pseries_migrate_partition(u64 handle)
 
 	vas_migration_handler(VAS_SUSPEND);
 
+	pr_debug("Disabling the NMI watchdog\n");
+	watchdog_nmi_stop();
+
 	ret = pseries_suspend(handle);
 	if (ret == 0) {
 		post_mobility_fixup();
@@ -708,6 +711,9 @@ static int pseries_migrate_partition(u64 handle)
 	} else
 		pseries_cancel_migration(handle, ret);
 
+	pr_debug("Enabling the NMI watchdog again\n");
+	watchdog_nmi_start();
+
 	vas_migration_handler(VAS_RESUME);
 
 	return ret;
-- 
2.36.1

* Re: [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM
  2022-06-01 15:53 ` [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM Laurent Dufour
@ 2022-06-06  1:41   ` kernel test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2022-06-06 1:41 UTC
  To: Laurent Dufour, mpe, benh, paulus, nathanl, haren, npiggin
  Cc: kbuild-all, linuxppc-dev, linux-kernel

Hi Laurent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.18 next-20220603]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Laurent-Dufour/Disabling-NMI-watchdog-during-LPM-s-memory-transfer/20220601-235741
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r002-20220605 (https://download.01.org/0day-ci/archive/20220606/202206060910.rYNTFqdI-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d409d8549db37257e33691523100679a23cfd887
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Laurent-Dufour/Disabling-NMI-watchdog-during-LPM-s-memory-transfer/20220601-235741
        git checkout d409d8549db37257e33691523100679a23cfd887
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc64-linux-ld: warning: discarding dynamic section .glink
   powerpc64-linux-ld: warning: discarding dynamic section .plt
   powerpc64-linux-ld: linkage table error against `watchdog_nmi_stop'
   powerpc64-linux-ld: stubs don't match calculated size
   powerpc64-linux-ld: can not build stubs: bad value
   powerpc64-linux-ld: arch/powerpc/platforms/pseries/mobility.o: in function `.pseries_migrate_partition':
>> arch/powerpc/platforms/pseries/mobility.c:705: undefined reference to `.watchdog_nmi_stop'
>> powerpc64-linux-ld: arch/powerpc/platforms/pseries/mobility.c:715: undefined reference to `.watchdog_nmi_start'

vim +705 arch/powerpc/platforms/pseries/mobility.c

   693	
   694	static int pseries_migrate_partition(u64 handle)
   695	{
   696		int ret;
   697	
   698		ret = wait_for_vasi_session_suspending(handle);
   699		if (ret)
   700			return ret;
   701	
   702		vas_migration_handler(VAS_SUSPEND);
   703	
   704		pr_debug("Disabling the NMI watchdog\n");
 > 705		watchdog_nmi_stop();
   706	
   707		ret = pseries_suspend(handle);
   708		if (ret == 0) {
   709			post_mobility_fixup();
   710			wait_for_vasi_session_completed(handle);
   711		} else
   712			pseries_cancel_migration(handle, ret);
   713	
   714		pr_debug("Enabling the NMI watchdog again\n");
 > 715		watchdog_nmi_start();
   716	
   717		vas_migration_handler(VAS_RESUME);
   718	
   719		return ret;
   720	}
   721	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

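The link errors above say that nothing in this randconfig defines watchdog_nmi_stop() and watchdog_nmi_start(). A common way to handle an optional facility is to provide empty inline stubs when it is not built. The sketch below illustrates that idiom in plain C; HAVE_HARDLOCKUP_WATCHDOG is a hypothetical macro standing in for whichever Kconfig symbol actually controls the watchdog code, and migrate_with_watchdog_paused() is an illustrative caller. It is not taken from the thread and is not necessarily how the series was later fixed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * HAVE_HARDLOCKUP_WATCHDOG is hypothetical: it stands in for the Kconfig
 * symbol that decides whether the NMI watchdog code is built at all.
 */
#ifdef HAVE_HARDLOCKUP_WATCHDOG
void watchdog_nmi_stop(void);
void watchdog_nmi_start(void);
#else
/* Watchdog not built: empty inline stubs keep callers linking. */
static inline void watchdog_nmi_stop(void) { }
static inline void watchdog_nmi_start(void) { }
#endif

/*
 * Illustrative caller with the same shape as the patched
 * pseries_migrate_partition(): the watchdog is paused around the
 * suspend and restarted on both the success and the failure path.
 */
int migrate_with_watchdog_paused(int (*suspend)(void))
{
	int ret;

	assert(suspend != NULL);
	watchdog_nmi_stop();
	ret = suspend();
	watchdog_nmi_start();
	return ret;
}

/* Fake suspend operations used only for the usage example. */
int fake_suspend_ok(void) { return 0; }
int fake_suspend_fail(void) { return -5; }

/* Usage: the suspend result is passed through unchanged. */
void example_migrate(void)
{
	assert(migrate_with_watchdog_paused(fake_suspend_ok) == 0);
	assert(migrate_with_watchdog_paused(fake_suspend_fail) == -5);
}
```

With the macro undefined, as in this sketch, the calls compile to nothing and the file links without any watchdog object.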
* Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
  2022-06-01 15:53 [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Laurent Dufour
  2022-06-01 15:53 ` [PATCH 1/2] powerpc/mobility: Wait for memory transfer to complete Laurent Dufour
  2022-06-01 15:53 ` [PATCH 2/2] powerpc/mobility: disabling hard lockup watchdog during LPM Laurent Dufour
@ 2022-06-02 17:58 ` Nathan Lynch
  2022-06-03  8:59   ` Laurent Dufour
  2 siblings, 1 reply; 9+ messages in thread
From: Nathan Lynch @ 2022-06-02 17:58 UTC
  To: Laurent Dufour
  Cc: linuxppc-dev, linux-kernel, mpe, benh, paulus, haren, npiggin

Laurent Dufour <ldufour@linux.ibm.com> writes:
> When a partition is transferred, once it arrives at the destination node,
> the partition is active but much of its memory must be transferred from the
> start node.
>
> It depends on the activity in the partition, but the more CPU the partition
> has, the more memory to be transferred is likely to be. This causes latency
> when accessing pages that need to be transferred, and often, for large
> partitions, it triggers the NMI watchdog.

It also triggers warnings from other watchdogs and subsystems that
have soft latency requirements - softlockup, RCU, workqueue. The issue
is more general than the NMI watchdog.

> The NMI watchdog causes the CPU stack to dump where it appears to be
> stuck. In this case, it does not bring much information since it can happen
> during any memory access of the kernel.

When the site of a watchdog backtrace shows a thread stuck on a routine
memory access as opposed to something like a lock acquisition, that is
actually useful information that shouldn't be discarded. It tells us the
platform is failing to adequately virtualize partition memory. This
isn't a benign situation and it's likely to unacceptably affect real
workloads. The kernel is ideally situated to detect and warn about this.

> In addition, the NMI interrupt mechanism is not secure and can generate a
> dump system in the event that the interruption is taken while
> MSR[RI]=0.

This sounds like a general problem with that facility that isn't
specific to partition migration? Maybe it should be disabled altogether
until that can be fixed?

> Given how often hard lockups are detected when transferring large
> partitions, it seems best to disable the watchdog NMI until the memory
> transfer from the start node is complete.

At this time, I'm far from convinced. Disabling the watchdog is going to
make the underlying problems in the platform and/or network harder to
understand.

* Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
  2022-06-02 17:58 ` [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer Nathan Lynch
@ 2022-06-03  8:59   ` Laurent Dufour
  2022-06-06 20:00     ` Nathan Lynch
  0 siblings, 1 reply; 9+ messages in thread
From: Laurent Dufour @ 2022-06-03 8:59 UTC
  To: Nathan Lynch
  Cc: linuxppc-dev, linux-kernel, mpe, benh, paulus, haren, npiggin

On 02/06/2022, 19:58:31, Nathan Lynch wrote:
> Laurent Dufour <ldufour@linux.ibm.com> writes:
>> When a partition is transferred, once it arrives at the destination node,
>> the partition is active but much of its memory must be transferred from the
>> start node.
>>
>> It depends on the activity in the partition, but the more CPU the partition
>> has, the more memory to be transferred is likely to be. This causes latency
>> when accessing pages that need to be transferred, and often, for large
>> partitions, it triggers the NMI watchdog.
>
> It also triggers warnings from other watchdogs and subsystems that
> have soft latency requirements - softlockup, RCU, workqueue. The issue
> is more general than the NMI watchdog.

I agree, but, as you can read in the title, this series is focusing on the
NMI watchdog which may have some dangerous side effects.

>> The NMI watchdog causes the CPU stack to dump where it appears to be
>> stuck. In this case, it does not bring much information since it can happen
>> during any memory access of the kernel.
>
> When the site of a watchdog backtrace shows a thread stuck on a routine
> memory access as opposed to something like a lock acquisition, that is
> actually useful information that shouldn't be discarded. It tells us the
> platform is failing to adequately virtualize partition memory. This
> isn't a benign situation and it's likely to unacceptably affect real
> workloads. The kernel is ideally situated to detect and warn about this.
>

I agree, but the information provided is most of the time misleading,
pointing to various parts of the kernel where the last page fault of a
series generated by the kernel happened. There is no real added value,
since it is well known that the memory transfer is introducing latency
that is detected by the kernel. Furthermore, soft lockups are still
triggered and also report this latency without any side effect.

>> In addition, the NMI interrupt mechanism is not secure and can generate a
>> dump system in the event that the interruption is taken while
>> MSR[RI]=0.
>
> This sounds like a general problem with that facility that isn't
> specific to partition migration? Maybe it should be disabled altogether
> until that can be fixed?

We already discussed that with Nick and it sounds like it is not so easy
to fix. Furthermore, the NMI watchdog is considered the last option for
analyzing a potentially dying system. So taking the risk of generating a
crash because of the NMI interrupt looks acceptable. But disabling it
totally because of that is not the right option.

In the LPM's case, the system is dependent on the LPM's latency, it is not
really dying or in a really bad shape, so that risk is too expensive.

Fixing the latency at the source is definitely the best option, and the
PHYP team is already investigating that. But, in the meantime, there is a
way to prevent the system from dying because of that side effect by
disabling the NMI watchdog during the memory transfer.

>
>> Given how often hard lockups are detected when transferring large
>> partitions, it seems best to disable the watchdog NMI until the memory
>> transfer from the start node is complete.
>
> At this time, I'm far from convinced. Disabling the watchdog is going to
> make the underlying problems in the platform and/or network harder to
> understand.

I was also reluctant, and would like the NMI watchdog to remain active
during LPM. But there is currently no other way to work around the LPM's
latency, and its potential risk of system crash.

I've spent a lot of time analyzing many crashes happening during LPM and
all of them are now pointing to the NMI watchdog issue. Furthermore, on a
system with thousands of CPUs, I saw a system crash because a CPU was not
able to respond in time (1s) to the NMI interrupt and thus generate
the panic.

In addition, we now know that an RTAS call, made right after the system is
running again on the arrival side, is taking ages and is most of the time
triggering the NMI watchdog.

There are ongoing investigations to clarify where and how this latency is
happening. I'm not excluding any other issue in the Linux kernel, but right
now, this looks to be the best option to prevent system crash during
LPM.

I'm hoping that the PHYP team will be able to improve that latency. At that
time, this commit can be reverted, but until then, I don't see how we can
do without that workaround.

Laurent.

* Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
  2022-06-03  8:59   ` Laurent Dufour
@ 2022-06-06 20:00     ` Nathan Lynch
  2022-06-09  7:45       ` Michael Ellerman
  0 siblings, 1 reply; 9+ messages in thread
From: Nathan Lynch @ 2022-06-06 20:00 UTC
  To: Laurent Dufour; +Cc: linux-kernel, npiggin, paulus, linuxppc-dev, haren

Laurent Dufour <ldufour@linux.ibm.com> writes:
> On 02/06/2022, 19:58:31, Nathan Lynch wrote:
>> Laurent Dufour <ldufour@linux.ibm.com> writes:
>>> When a partition is transferred, once it arrives at the destination node,
>>> the partition is active but much of its memory must be transferred from the
>>> start node.
>>>
>>> It depends on the activity in the partition, but the more CPU the partition
>>> has, the more memory to be transferred is likely to be. This causes latency
>>> when accessing pages that need to be transferred, and often, for large
>>> partitions, it triggers the NMI watchdog.
>>
>> It also triggers warnings from other watchdogs and subsystems that
>> have soft latency requirements - softlockup, RCU, workqueue. The issue
>> is more general than the NMI watchdog.
>
> I agree, but, as you can read in the title, this series is focusing on the
> NMI watchdog which may have some dangerous side effects.

Sure, I read the subject line. I'm saying that focus may be too narrow.

>
>>> The NMI watchdog causes the CPU stack to dump where it appears to be
>>> stuck. In this case, it does not bring much information since it can happen
>>> during any memory access of the kernel.
>>
>> When the site of a watchdog backtrace shows a thread stuck on a routine
>> memory access as opposed to something like a lock acquisition, that is
>> actually useful information that shouldn't be discarded. It tells us the
>> platform is failing to adequately virtualize partition memory. This
>> isn't a benign situation and it's likely to unacceptably affect real
>> workloads. The kernel is ideally situated to detect and warn about this.
>>
>
> I agree, but the information provided is most of the time misleading,
> pointing to various parts of the kernel where the last page fault of a
> series generated by the kernel happened. There is no real added value,
> since it is well known that the memory transfer is introducing latency
> that is detected by the kernel.

Hmm, I don't understand why it would be considered misleading when the
stack trace shows where the thread has been stuck. And this behavior of
the platform, where resolving post-resume memory accesses takes multiple
seconds under certain conditions has not been well-understood by us
until recently.

> Furthermore, soft lockups are still
> triggered and also report this latency without any side effect.

It's fair to say that the softlockup watchdog does not panic in the
configurations that our internal test environments happen to use. But
real users can (and do) enable these:

/proc/sys/kernel/hardlockup_panic
/proc/sys/kernel/hung_task_panic
/proc/sys/kernel/panic_on_rcu_stall
/proc/sys/kernel/softlockup_panic

And if so, they likely expect that the OS will simply panic and reboot
when a condition arises that causes memory access times to exceed the
corresponding timeout or threshold. Even during a partition migration.

>>> In addition, the NMI interrupt mechanism is not secure and can generate a
>>> dump system in the event that the interruption is taken while
>>> MSR[RI]=0.
>>
>> This sounds like a general problem with that facility that isn't
>> specific to partition migration? Maybe it should be disabled altogether
>> until that can be fixed?
>
> We already discussed that with Nick and it sounds like it is not so easy
> to fix. Furthermore, the NMI watchdog is considered the last option for
> analyzing a potentially dying system. So taking the risk of generating a
> crash because of the NMI interrupt looks acceptable. But disabling it
> totally because of that is not the right option.

OK.

> In the LPM's case, the system is dependent on the LPM's latency, it is not
> really dying or in a really bad shape, so that risk is too expensive.

I would say the partition OS is actually in very bad shape if memory
accesses are taking dozens of seconds or more. Any real workload is
likely to be affected to an unacceptable degree. It depends on the
situation, but some users may prefer a panic+reboot to waiting for the
situation to resolve. And this change would effectively prevent the
kernel from carrying out that policy.

> Fixing the latency at the source is definitely the best option, and the
> PHYP team is already investigating that. But, in the meantime, there is a
> way to prevent the system from dying because of that side effect by
> disabling the NMI watchdog during the memory transfer.
>
>>
>>> Given how often hard lockups are detected when transferring large
>>> partitions, it seems best to disable the watchdog NMI until the memory
>>> transfer from the start node is complete.
>>
>> At this time, I'm far from convinced. Disabling the watchdog is going to
>> make the underlying problems in the platform and/or network harder to
>> understand.
>
> I was also reluctant, and would like the NMI watchdog to remain active
> during LPM. But there is currently no other way to work around the LPM's
> latency, and its potential risk of system crash.
>
> I've spent a lot of time analyzing many crashes happening during LPM and
> all of them are now pointing to the NMI watchdog issue. Furthermore, on a
> system with thousands of CPUs, I saw a system crash because a CPU was not
> able to respond in time (1s) to the NMI interrupt and thus generate
> the panic.
>
> In addition, we now know that an RTAS call, made right after the system is
> running again on the arrival side, is taking ages and is most of the time
> triggering the NMI watchdog.

That's good to know.

> There are ongoing investigations to clarify where and how this latency is
> happening. I'm not excluding any other issue in the Linux kernel, but right
> now, this looks to be the best option to prevent system crash during
> LPM.

It will prevent the likely crash mode for enterprise distros with
default watchdog tunables that our internal test environments happen to
use. But if someone were to run the same scenario with softlockup_panic
enabled, or with the RCU stall timeout lower than the watchdog
threshold, the failure mode would be different.

Basically I'm saying:
* Some users may actually want the OS to panic when it's in this state,
  because their applications can't work correctly.
* But if we're going to inhibit one watchdog, we should inhibit them
  all.

I wonder if we should freeze processes across the suspend, thawing them
on the destination only after the device tree update is complete,
perhaps even waiting until the VASI state transitions to
"Completed". Suspending the workload for some time after resume would
reduce the number of demand faults that have to be serviced. If that
provides better overall behavior then we could avoid disabling
watchdogs.

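The four panic knobs listed in the message above can be inspected from user space before a migration. The following is a minimal sketch, assuming a Linux system with /proc mounted; read_knob() and report_panic_knobs() are illustrative names, not existing tools, and a knob that is missing on a given kernel is simply reported as unavailable.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
int read_knob(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}

/* The paths quoted in the message above. */
static const char *const panic_knobs[] = {
	"/proc/sys/kernel/hardlockup_panic",
	"/proc/sys/kernel/hung_task_panic",
	"/proc/sys/kernel/panic_on_rcu_stall",
	"/proc/sys/kernel/softlockup_panic",
};

/* Print each knob and return how many are set to a non-zero value. */
int report_panic_knobs(void)
{
	size_t i;
	int enabled = 0;

	for (i = 0; i < sizeof(panic_knobs) / sizeof(panic_knobs[0]); i++) {
		int v = read_knob(panic_knobs[i]);

		if (v < 0)
			printf("%s: unavailable\n", panic_knobs[i]);
		else
			printf("%s = %d\n", panic_knobs[i], v);
		if (v > 0)
			enabled++;
	}
	return enabled;
}
```

A non-zero return from report_panic_knobs() means at least one watchdog is configured to panic, which is the situation in which a stalled migration would reboot the partition rather than only log a backtrace.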
* Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
  2022-06-06 20:00     ` Nathan Lynch
@ 2022-06-09  7:45       ` Michael Ellerman
  2022-06-09  9:09         ` Laurent Dufour
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2022-06-09 7:45 UTC
  To: Nathan Lynch, Laurent Dufour
  Cc: linux-kernel, npiggin, paulus, linuxppc-dev, haren

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Laurent Dufour <ldufour@linux.ibm.com> writes:
...
>
>> There are ongoing investigations to clarify where and how this latency is
>> happening. I'm not excluding any other issue in the Linux kernel, but right
>> now, this looks to be the best option to prevent system crash during
>> LPM.
>
> It will prevent the likely crash mode for enterprise distros with
> default watchdog tunables that our internal test environments happen to
> use. But if someone were to run the same scenario with softlockup_panic
> enabled, or with the RCU stall timeout lower than the watchdog
> threshold, the failure mode would be different.
>
> Basically I'm saying:
> * Some users may actually want the OS to panic when it's in this state,
>   because their applications can't work correctly.
> * But if we're going to inhibit one watchdog, we should inhibit them
>   all.

I'm sympathetic to both of your arguments.

But I think there is a key difference between the NMI watchdog and other
watchdogs, which is that the NMI watchdog will use the unsafe NMI to
interrupt other CPUs, and that can cause the system to crash when other
watchdogs would just print a backtrace.

We had the same problem with the rcu_sched stall detector until we
changed it to use the "safe" NMI, see:
  5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")


So even if the NMI watchdog is disabled there are still the other
watchdogs enabled, which should print backtraces by default, and if
desired can also be configured to cause a panic.

Instead of disabling the NMI watchdog, can we instead increase the
timeout (by how much?) during LPM, so that it is less likely to fire in
normal usage, but is still there as a backup if the system is completely
clogged.

cheers

* Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
  2022-06-09  7:45       ` Michael Ellerman
@ 2022-06-09  9:09         ` Laurent Dufour
  0 siblings, 0 replies; 9+ messages in thread
From: Laurent Dufour @ 2022-06-09 9:09 UTC
  To: Michael Ellerman, Nathan Lynch
  Cc: linux-kernel, npiggin, paulus, linuxppc-dev, haren

On 09/06/2022, 09:45:49, Michael Ellerman wrote:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Laurent Dufour <ldufour@linux.ibm.com> writes:
> ...
>>
>>> There are ongoing investigations to clarify where and how this latency is
>>> happening. I'm not excluding any other issue in the Linux kernel, but right
>>> now, this looks to be the best option to prevent system crash during
>>> LPM.
>>
>> It will prevent the likely crash mode for enterprise distros with
>> default watchdog tunables that our internal test environments happen to
>> use. But if someone were to run the same scenario with softlockup_panic
>> enabled, or with the RCU stall timeout lower than the watchdog
>> threshold, the failure mode would be different.
>>
>> Basically I'm saying:
>> * Some users may actually want the OS to panic when it's in this state,
>>   because their applications can't work correctly.
>> * But if we're going to inhibit one watchdog, we should inhibit them
>>   all.
>
> I'm sympathetic to both of your arguments.
>
> But I think there is a key difference between the NMI watchdog and other
> watchdogs, which is that the NMI watchdog will use the unsafe NMI to
> interrupt other CPUs, and that can cause the system to crash when other
> watchdogs would just print a backtrace.
>
> We had the same problem with the rcu_sched stall detector until we
> changed it to use the "safe" NMI, see:
>   5cc05910f26e ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
>
>
> So even if the NMI watchdog is disabled there are still the other
> watchdogs enabled, which should print backtraces by default, and if
> desired can also be configured to cause a panic.
>
> Instead of disabling the NMI watchdog, can we instead increase the
> timeout (by how much?) during LPM, so that it is less likely to fire in
> normal usage, but is still there as a backup if the system is completely
> clogged.

That's probably doable, tweaking wd_smp_panic_timeout_tb and
wd_panic_timeout_tb when the LPM is in progress.

I'll add a new sysctl value, so the administrator will have the capability
to change that and also fully disable the NMI watchdog during LPM if they
want.

I've no idea what the default factor should be; I guess this will be
somewhat empirical.

I'll rework my patch in that way.

cheers,
Laurent.

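The idea of stretching the watchdog timeout while a migration is in progress comes down to a small piece of arithmetic. The sketch below is one possible reading of it, not the reworked patch: lpm_scaled_timeout(), the percentage-based factor_pct, and the lpm_in_progress flag are all hypothetical, and the thread leaves the default factor open.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a watchdog timeout (in timebase ticks) while a migration is in
 * progress. factor_pct is a hypothetical tunable: 0 leaves the timeout
 * unchanged, 100 doubles it, 200 triples it.
 */
uint64_t lpm_scaled_timeout(uint64_t timeout_tb, unsigned int factor_pct,
			    int lpm_in_progress)
{
	if (!lpm_in_progress)
		return timeout_tb;

	/*
	 * Divide before multiplying to reduce the overflow risk for large
	 * tick counts. This rounds the added part down to a multiple of
	 * factor_pct, which is acceptable for a coarse timeout.
	 */
	return timeout_tb + (timeout_tb / 100) * factor_pct;
}

/* Usage: a 200% factor triples a 1000-tick timeout during migration only. */
void example_scaled_timeout(void)
{
	assert(lpm_scaled_timeout(1000, 200, 1) == 3000);
	assert(lpm_scaled_timeout(1000, 200, 0) == 1000);
}
```

Expressing the factor as a percentage keeps the computation in integers, which matters in kernel code where floating point is not available; a real implementation would also need a way to express "disable the watchdog entirely", which this sketch does not model.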