From: Nathan Lynch <nathanl@linux.ibm.com>
To: Laurent Dufour <ldufour@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	mpe@ellerman.id.au, benh@kernel.crashing.org, paulus@samba.org,
	haren@linux.vnet.ibm.com, npiggin@gmail.com
Subject: Re: [PATCH 0/2] Disabling NMI watchdog during LPM's memory transfer
Date: Thu, 02 Jun 2022 12:58:31 -0500
Message-ID: <87a6av0wxk.fsf@linux.ibm.com>
In-Reply-To: <20220601155315.35109-1-ldufour@linux.ibm.com>

Laurent Dufour <ldufour@linux.ibm.com> writes:
> When a partition is transferred, once it arrives at the destination node,
> the partition is active but much of its memory must be transferred from the
> start node.
>
> It depends on the activity in the partition, but the more CPU the partition
> has, the more memory to be transferred is likely to be. This causes latency
> when accessing pages that need to be transferred, and often, for large
> partitions, it triggers the NMI watchdog.

It also triggers warnings from other watchdogs and subsystems that
have soft latency requirements: softlockup, RCU, workqueue. The issue
is more general than the NMI watchdog.

> The NMI watchdog causes the CPU stack to dump where it appears to be
> stuck. In this case, it does not bring much information since it can happen
> during any memory access of the kernel.

When the site of a watchdog backtrace shows a thread stuck on a routine
memory access as opposed to something like a lock acquisition, that is
actually useful information that shouldn't be discarded. It tells us the
platform is failing to adequately virtualize partition memory. This
isn't a benign situation and it's likely to unacceptably affect real
workloads. The kernel is ideally situated to detect and warn about this.

> In addition, the NMI interrupt mechanism is not secure and can generate a
> dump system in the event that the interruption is taken while
> MSR[RI]=0.

This sounds like a general problem with that facility that isn't
specific to partition migration? Maybe it should be disabled altogether
until that can be fixed?

> Given how often hard lockups are detected when transferring large
> partitions, it seems best to disable the watchdog NMI until the memory
> transfer from the start node is complete.

At this time, I'm far from convinced. Disabling the watchdog is going to
make the underlying problems in the platform and/or network harder to
understand.

