linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Ene <sebastianene@google.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>,
	Rob Herring <robh+dt@kernel.org>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-watchdog@vger.kernel.org, will@kernel.org,
	qperret@google.com, maz@kernel.org
Subject: Re: [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs
Date: Fri, 8 Apr 2022 14:03:16 +0000	[thread overview]
Message-ID: <YlBApO66qMP7oC3c@google.com> (raw)
In-Reply-To: <ebc0923a-48c1-ccd4-6b89-c4ba9ac48da4@roeck-us.net>

On Wed, Apr 06, 2022 at 09:52:05AM -0700, Guenter Roeck wrote:
> On 4/6/22 09:31, Sebastian Ene wrote:
> > On Tue, Apr 05, 2022 at 02:15:51PM -0700, Guenter Roeck wrote:
> > > Sebastian,
> > > 
> > 
> > Hello Guenter,
> > 
> > > On Tue, Apr 05, 2022 at 02:19:55PM +0000, Sebastian Ene wrote:
> > > > This patch adds support for a virtual watchdog which relies on the
> > > > per-cpu hrtimers to pet at regular intervals.
> > > > 
> > > 
> > > The watchdog subsystem is not intended to detect soft and hard lockups.
> > > It is intended to detect userspace issues. A watchdog driver requires
> > > a userspace compinent which needs to ping the watchdog on a regular basis
> > > to prevent timeouts (and watchdog drivers are supposed to use the
> > > watchdog kernel API).
> > > 
> > 
> > Thanks for getting back ! I wanted to create a mechanism to detect
> > stalls on vCPUs and I am not sure if the current watchdog subsystem has a way
> > to create per-CPU binded watchdogs (in the same way as Power PC has
> > kernel/watchdog.c).
> > The per-CPU watchdog is needed to account for time that the guest is not
> > running(either scheduled out or waiting for an event) to prevent spurious
> > reset events caused by the watchdog.
> > 
> > > What you have here is a CPU stall detection mechanism, similar to the
> > > existing soft/hard lockup detection mechanism. This code does not
> > > belong into the watchdog subsystem; it is similar to the existing
> > > hard/softlockup detection code (kernel/watchdog.c) and should reside
> > > at the same location.
> > > 
> > 
> > I agree that this doesn't belong to the watchdog subsytem but the current
> > stall detection mechanism calls through MMIO into a virtual device
> > 'qemu,virt-watchdog'. Calling a device from (kernel/watchdog.c) isn't
> > something that we should avoid ?
> > 

Hello Guenter,

> 
> You are introducing qemu,virt-watchdog, so it seems to me that any argument
> along that line doesn't really apply.
> 

I am trying to follow your guidelines to make this work, so I would be grateful
if you have some time to share your thoughts on this. 

> I think it is more a matter for core kernel developers to discuss and
> decide how this functionality is best instantiated. It doesn't _have_
> to be a device, after all, just like the current lockup detection
> code is not a device. Either case, I am not really the right person
> to discuss this since it is a matter of core kernel code which I am
> not sufficiently familiar with. All I can say is that watchdog drivers
> in the watchdog subsystem have a different scope.

This watchdog device tracks the elapsed time on a per-cpu basis, since KVM
schedules vCPUs independently.
I am attempting to re-write it to use the watchdog-core infrastructure but
doing this will loose the per-cpu watchdog binding and exposing it to
userspace would require a strong thread affinity setting. How can I overcome
this problem ?

Having it like a hard lockup detector mechanism doesn’t work either because
when the watchdog expires, we rely on crosvm (not the guest kernel) to handle
this event and reset the machine. We cannot inject the reset event back into
the guest as we don’t have NMI support on arm64.

> 
> Guenter

Thanks,
Sebastian

> 
> > > Having said that, I could imagine a watchdog driver to be used in VMs,
> > > but that would be similar to existing watchdog drivers. The easiest way
> > > to get there would probably be to just instantiate one of the watchdog
> > > devices already supported by qemu.
> > > 
> > 
> > I am looking forward for your response,
> > 
> > > Guenter
> > 
> > Cheers,
> > Sebastian
> 

      reply	other threads:[~2022-04-08 14:03 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-05 14:19 [PATCH 0/2] Detect stalls on guest vCPUS Sebastian Ene
2022-04-05 14:19 ` [PATCH 1/2] dt-bindings: watchdog: Add qemu,vm-watchdog compatible Sebastian Ene
2022-04-06 12:25   ` Rob Herring
2022-04-06 16:34     ` Sebastian Ene
2022-04-05 14:19 ` [PATCH 2/2] watchdog: Add a mechanism to detect stalls on guest vCPUs Sebastian Ene
2022-04-05 21:15   ` Guenter Roeck
2022-04-06 16:31     ` Sebastian Ene
2022-04-06 16:52       ` Guenter Roeck
2022-04-08 14:03         ` Sebastian Ene [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YlBApO66qMP7oC3c@google.com \
    --to=sebastianene@google.com \
    --cc=devicetree@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-watchdog@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=maz@kernel.org \
    --cc=qperret@google.com \
    --cc=robh+dt@kernel.org \
    --cc=will@kernel.org \
    --cc=wim@linux-watchdog.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).