All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Ernberg <john.ernberg@actia.se>
To: Bertrand Marquis <Bertrand.Marquis@arm.com>,
	Julien Grall <julien@xen.org>
Cc: Peng Fan <peng.fan@nxp.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Michal Orzel <michal.orzel@amd.com>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	Jonas Blixt <jonas.blixt@actia.se>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue
Date: Thu, 21 Mar 2024 16:05:37 +0000	[thread overview]
Message-ID: <1f5eba1a-7dc3-41f2-8986-26452c264770@actia.se> (raw)
In-Reply-To: <2F07325F-428F-4570-BE14-B84DD0A1C9F7@arm.com>

Hi Bertrand,

On 3/21/24 08:41, Bertrand Marquis wrote:
> Hi,
> 
>> On 20 Mar 2024, at 18:40, Julien Grall <julien@xen.org> wrote:
>>
>> Hi John,
>>
>> On 20/03/2024 16:24, John Ernberg wrote:
>>> Hi Bertrand,
>>> On 3/13/24 11:07, Bertrand Marquis wrote:
>>>> Hi,
>>>>
>>>>> On 8 Mar 2024, at 15:04, Julien Grall <julien@xen.org> wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> Thank you for the reply.
>>>>>
>>>>> On 08/03/2024 13:40, John Ernberg wrote:
>>>>>> On 3/7/24 00:07, Julien Grall wrote:
>>>>>>>    > Ping on the watchdog discussion bits.
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>>
>>>>>>> On 06/03/2024 13:13, John Ernberg wrote:
>>>>>>>> On 2/9/24 14:14, John Ernberg wrote:
>>>>>>>>>
>>>>>>>>>>       * IMX_SIP_TIMER_*:  This seems to be related to the watchdog.
>>>>>>>>>> Shouldn't dom0 rely on the watchdog provided by Xen instead? So those
>>>>>>>>>> call will be used by Xen.
>>>>>>>>>
>>>>>>>>> That is indeed a watchdog SIP, and also for setting the SoC internal RTC
>>>>>>>>> if it is being used.
>>>>>>>>>
>>>>>>>>> I looked around if there was previous discussion and only really
>>>>>>>>> found [3].
>>>>>>>>> Is the xen/xen/include/watchdog.h header meant to be for this kind of
>>>>>>>>> watchdog support or is that more for the VM watchdog? Looking at the x86
>>>>>>>>> ACPI NMI watchdog it seems like the former, but I have never worked with
>>>>>>>>> x86 nor ACPI.
>>>>>>>
>>>>>>> include/watchdog.h contains helper to configure the watchdog for Xen. We
>>>>>>> also have per-VM watchdog and this is configured by the hypercall
>>>>>>> SCHEDOP_watchdog.
>>>>>>>
>>>>>>>>>
>>>>>>>>> Currently forwarding it to Dom0 has not caused any watchdog resets with
>>>>>>>>> our watchdog timeout settings, our specific Dom0 setup and VM count.
>>>>>>>
>>>>>>> IIUC, the SMC API for the watchdog would be similar to the ACPI NMI
>>>>>>> watchdog. So I think it would make more sense if this is not exposed to
>>>>>>> dom0 (even if Xen is doing nothing with it).
>>>>>>>
>>>>>>> Can you try to hide the SMCs and check if dom0 still behave properly?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>> This SMC manages a hardware watchdog, if it's not pinged within a
>>>>>> specific interval the entire board resets.
>>>>>
>>>>> Do you know what's the default interval? Is it large enough so Xen + dom0 can boot (at least up to when the watchdog driver is initialized)?
>>>>>
>>>>>> If I block the SMCs the watchdog driver in Dom0 will fail to ping the
>>>>>> watchdog, triggering a board reset because the system looks to have
>>>>>> become unresponsive. The reason this patch set started is because we
>>>>>> couldn't ping the watchdog when running with Xen.
>>>>>> In our specific system the bootloader enables the watchdog as early as
>>>>>> possible so that we can get watchdog protection for as much of the boot
>>>>>> as we possibly can.
>>>>>> So, if we are to block the SMC from Dom0, then Xen needs to take over
>>>>>> the pinging. It could be implemented similarly to the NMI watchdog,
>>>>>> except that the system will reset if the ping is missed rather than
>>>>>> backtrace.
>>>>>> It would also mean that Xen will get a whole watchdog driver-category
>>>>>> due to the watchdog being vendor and sometimes even SoC specific when it
>>>>>> comes to Arm.
>>>>>> My understanding of the domain watchdog code is that today the domain
>>>>>> needs to call SCHEDOP_watchdog at least once to start the watchdog
>>>>>> timer. Since watchdog protection through the whole boot process is
>>>>>> desirable we'd need some core changes, such as an option to start the
>>>>>> domain watchdog on init. >
>>>>>> It's quite a big change to make
>>>>>
>>>>> For clarification, above you seem to mention two changes:
>>>>>
>>>>> 1) Allow Xen to use the HW watchdog
>>>>> 2) Allow the domain to use the watchdog early
>>>>>
>>>>> I am assuming by big change, you are referring to 2?
>>>>>
>>>>> , while I am not against doing it if it
>>>>>> makes sense, I now wonder if Xen should manage hardware watchdogs.
>>>>>> Looking at the domain watchdog code it looks like if a domain does not
>>>>>> get enough execution time, the watchdog will not be pinged enough and
>>>>>> the guest will be reset. So either watchdog approach requires Dom0 to
>>>>>> get execution time. Dom0 also needs to service all the PV backends it's
>>>>>> responsible for. I'm not sure it's valuable to add another layer of
>>>>>> watchdog for this scenario as the end result (checking that the entire
>>>>>> system works) is achieved without it as well.
>>>>>> So, before I try to find the time to make a proposal for moving the
>>>>>> hardware watchdog bit to Xen, do we really want it?
>>>>>
>>>>> Thanks for the details. Given that the watchdog is enabled by the bootloader, I think we want Xen to drive the watchdog for two reasons:
>>>>> 1) In true dom0less environment, dom0 would not exist
>>>>> 2) You are relying on Xen + Dom0 to boot (or at least enough to get the watchdog working) within the watchdog interval.
>>>>
>>>> Definitely we need to consider the case where there is no Dom0.
>>>>
>>>> I think there are in fact 3 different use cases here:
>>>> - watchdog fully driven in a domain (dom0 or another): would work if it is expected
>>>>     that Xen + Domain boot time is under the watchdog initial refresh rate. I think this
>>>>     could make sense on some applications where your system depends on a specific
>>>>     domain to be properly booted to work. This would require an initial refresh time
>>>>     configurable in the boot loader probably.
>>> This is our use-case. ^
>>> Our dom0 is monitoring and managing the other domains in our system.
>>> Without dom0 working the system isn't really working as a whole.
>>> @Julien: Would you be ok with the patch set continuing in the direction
>>> of the
>>> original proposal, letting another party (or me at a later time) implement
>>> the fully driven by Xen option?
>> I am concerned about this particular point from Bertrand:
>>
>> "would work if it is expected that Xen + Domain boot time is under the watchdog initial refresh rate."
>>
>> How will a user be able to figure out how to initially configure the watchdog? Is this something you can easily configure in the bootloader at runtime?

If you go through the trouble of enabling the watchdog in the bootloader you
usually know what you're doing and set the timeout yourself.

Since our systems can be mounted in really annoying locations (both in
installations and world-wise) we need as much auto-recovery as possible to
avoid people having to travel to collect a unit that just needed a reset due
to some unexpected obscure issue at startup.
> 
> Definitely here it would be better to have the watchdog turned off by default and document how to enable it in the firmware.
> 
> Even if a long timeout is configured by default, a user could run into trouble if using a linux without watchdog or not running linux or using dom0less without a linux having access to it.
> I agree with Julien here that the concern could be that users would come to us instead of NXP if there is system is doing a reset without reasons after some seconds or minutes.

I could add myself as a reviewer for the iMX parts if that helps routing 
such
questions (and future patches) also to me. We have experience with the QXP,
and the QM (for the supported parts by this patch set) is identical.

Would that help with the concerns?

> 
>>
>>
>> Overall, I am not for this approach at least in the current status. I would be more inclined if there are some documentations explaining how this is supposed to be configured on NXP, so others can use the code.
>>
>> Anyway, this is why we have multiple Arm maintainers for this kind of situation. If they other agrees with you, then they can ack the patch and this can be merged.
> 
> I agree with Stefano that it would be good to have those board supported.
> 
> One thing i could suggest until there is a watchdog driver inside Xen is to have a clear Warning at Xen boot on those boards in the console so that we could at least identify the reason easily if a reset occurs for someone.

How do other SoCs deal with this currently? The iMX SoCs aren't the only 
ones
with a watchdog, just the first one added to Xen that pings the watchdog 
over
an SMC. What about the OMAPs, the R-Cars, Xilinx's, Exynos' and so on?

Thanks! // John Ernberg

  reply	other threads:[~2024-03-21 16:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 11:50 [PATCH 0/2] Xen: ARM: Improved NXP iMX8 platform support John Ernberg
2024-01-31 11:50 ` [PATCH 2/2] xen/drivers: imx-lpuart: Add iMX8QXP compatible John Ernberg
2024-01-31 12:29   ` Julien Grall
2024-01-31 15:31     ` John Ernberg
2024-01-31 11:50 ` [PATCH 1/2] xen/arm: Add imx8q{m,x} platform glue John Ernberg
2024-01-31 12:22   ` Julien Grall
2024-01-31 15:32     ` John Ernberg
2024-02-01  4:10       ` Peng Fan
2024-02-01 16:15         ` John Ernberg
2024-02-01 12:20       ` Julien Grall
2024-02-01 16:17         ` John Ernberg
2024-02-02  9:19           ` Julien Grall
2024-02-02 22:09             ` Stefano Stabellini
2024-02-04  9:40             ` Peng Fan
2024-02-05 10:13               ` Julien Grall
2024-02-09 13:14                 ` John Ernberg
2024-03-06 13:13                   ` John Ernberg
2024-03-06 23:07                     ` Julien Grall
2024-03-08 13:40                       ` John Ernberg
2024-03-08 14:04                         ` Julien Grall
2024-03-11  8:39                           ` John Ernberg
2024-03-13 10:07                           ` Bertrand Marquis
2024-03-20 16:24                             ` John Ernberg
2024-03-20 17:40                               ` Julien Grall
2024-03-20 23:53                                 ` Stefano Stabellini
2024-03-21  1:09                                   ` Peng Fan
2024-03-21  7:41                                 ` Bertrand Marquis
2024-03-21 16:05                                   ` John Ernberg [this message]
2024-03-21 16:15                                     ` Bertrand Marquis
2024-03-25 12:18                                       ` John Ernberg
2024-03-26  8:07                                         ` Bertrand Marquis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1f5eba1a-7dc3-41f2-8986-26452c264770@actia.se \
    --to=john.ernberg@actia.se \
    --cc=Bertrand.Marquis@arm.com \
    --cc=Volodymyr_Babchuk@epam.com \
    --cc=jonas.blixt@actia.se \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=peng.fan@nxp.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.