linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Should arm64 have a custom crash shutdown handler?
@ 2022-05-04 20:00 Guilherme G. Piccoli
  2022-05-05  7:29 ` Marc Zyngier
  2022-05-05 11:10 ` Mark Rutland
  0 siblings, 2 replies; 14+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-04 20:00 UTC (permalink / raw)
  To: Catalin Marinas, will Deacon, Michael Kelley (LINUX),
	Vitaly Kuznetsov, Marc Zyngier, mark.rutland, Russell King,
	Ard Biesheuvel, broonie
  Cc: linux-arm-kernel, linux-kernel, linux-hyperv

Hi folks, this email is to ask feedback / trigger a discussion about the
concept of custom crash shutdown handler, that is "missing" in arm64
while it's present in many architectures [mips, powerpc, x86, sh (!)].

Currently, when we kexec in arm64, the function machine_crash_shutdown()
is called as a handler to disable CPUs and (potentially) do extra
quiesce work. In the aforementioned architectures, there's a way to
override this function, if for example an hypervisor wish to have its
guests running their own custom shutdown machinery.

For powerpc/mips, the approach is a generic shutdown function that might
call other handler-registered functions, whereas x86/sh relies in the
"machine_ops" structure, having the crash shutdown as a callback in such
struct.

The usage for that is very broad, but heavy users are hypervisors like
Hyper-V / KVM (CCed Michael and Vitaly here for this reason). The
discussion about the need for that in arm64 is from another thread [0],
so before start implementing/playing with that, I'd like to ask ARM64
community if there is any feedback and in case it's positive, what is
the best implementation strategy (struct callback vs. handler call), etc.

I've CCed ARM64/ARM32 maintainers plus extra people I found as really
involved with ARM architecture - sorry if I added people I shouldn't or
if I forgot somebody (though the ARM mailing-list is CC).
Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/2787b476-6366-1c83-db80-0393da417497@igalia.com/
See the proposed option (b)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-04 20:00 Should arm64 have a custom crash shutdown handler? Guilherme G. Piccoli
@ 2022-05-05  7:29 ` Marc Zyngier
  2022-05-05 12:44   ` Guilherme G. Piccoli
  2022-05-05 11:10 ` Mark Rutland
  1 sibling, 1 reply; 14+ messages in thread
From: Marc Zyngier @ 2022-05-05  7:29 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Catalin Marinas, will Deacon, Michael Kelley (LINUX),
	Vitaly Kuznetsov, mark.rutland, Russell King, Ard Biesheuvel,
	broonie, linux-arm-kernel, linux-kernel, linux-hyperv

On Wed, 04 May 2022 21:00:42 +0100,
"Guilherme G. Piccoli" <gpiccoli@igalia.com> wrote:
> 
> Hi folks, this email is to ask feedback / trigger a discussion about the
> concept of custom crash shutdown handler, that is "missing" in arm64
> while it's present in many architectures [mips, powerpc, x86, sh (!)].
> 
> Currently, when we kexec in arm64, the function machine_crash_shutdown()
> is called as a handler to disable CPUs and (potentially) do extra
> quiesce work. In the aforementioned architectures, there's a way to
> override this function, if for example an hypervisor wish to have its
> guests running their own custom shutdown machinery.
>
> For powerpc/mips, the approach is a generic shutdown function that might
> call other handler-registered functions, whereas x86/sh relies in the
> "machine_ops" structure, having the crash shutdown as a callback in such
> struct.
> 
> The usage for that is very broad, but heavy users are hypervisors like
> Hyper-V / KVM (CCed Michael and Vitaly here for this reason). The
> discussion about the need for that in arm64 is from another thread [0],
> so before start implementing/playing with that, I'd like to ask ARM64
> community if there is any feedback and in case it's positive, what is
> the best implementation strategy (struct callback vs. handler call), etc.
> 
> I've CCed ARM64/ARM32 maintainers plus extra people I found as really
> involved with ARM architecture - sorry if I added people I shouldn't or
> if I forgot somebody (though the ARM mailing-list is CC).

I have the feeling that you are conflating two different things here:

(1) general shutdown/reboot, whether this because of a crash or not

(2) kexec, for which the whole point is that it is possible to handle
*everything* from within the kernel

On arm64:

(1) is already abstracted via PSCI. The hypervisor can do whatever it
wants there (KVM, not needing anything, just forwards this to
userspace for fun and profit -- if something has to be done, the VMM
is the right spot). I expect other hypervisors to do the same thing
(and that's what the architecture expects anyway).

(2) must, by definition, fit into the architectural envelope. If you
need help from another entity in the system to be able to kexec,
something is broken, because the hypervisor doesn't implement the
architecture correctly (and frankly, we really don't need much to be
able to kexec).

Not having any 'machine_ops' indirection was a conscious decision on
arm64, if only to avoid the nightmare that 32bit was at a time with
every single platform doing their own stuff. Introducing them would
not be an improvement, but simply the admission that hypervisors are
simply too broken for words. And I don't buy the "but x86 has it!"
argument. x86 is a nightmare of PV mess that we can happily ignore,
because we don't do PV for core operations at all.

If something has to be done to quiesce the system, it probably is
related to the system topology, and must be linked to it. We already
have these requirements in order to correctly stop ongoing DMA, shut
down IOMMUs, and other similar stuff. What other requirements does
your favourite hypervisor have?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-04 20:00 Should arm64 have a custom crash shutdown handler? Guilherme G. Piccoli
  2022-05-05  7:29 ` Marc Zyngier
@ 2022-05-05 11:10 ` Mark Rutland
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Rutland @ 2022-05-05 11:10 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Catalin Marinas, will Deacon, Michael Kelley (LINUX),
	Vitaly Kuznetsov, Marc Zyngier, Russell King, Ard Biesheuvel,
	broonie, linux-arm-kernel, linux-kernel, linux-hyperv

On Wed, May 04, 2022 at 05:00:42PM -0300, Guilherme G. Piccoli wrote:
> Hi folks, this email is to ask feedback / trigger a discussion about the
> concept of custom crash shutdown handler, that is "missing" in arm64
> while it's present in many architectures [mips, powerpc, x86, sh (!)].
> 
> Currently, when we kexec in arm64, the function machine_crash_shutdown()
> is called as a handler to disable CPUs and (potentially) do extra
> quiesce work. In the aforementioned architectures, there's a way to
> override this function, if for example an hypervisor wish to have its
> guests running their own custom shutdown machinery.

What exactly do you need to do in this custom shutdown machinery?

The general expectation for arm64 is that any hypervisor can implement PSCI,
and as long as you have that, CPUs (and the VM as a whole) can be shutdown in a
standard way.

I suspect what you're actually after is a mechanism to notify the hypervisor
when the guest crashes, rather than changing the way the shutdown itself
occurs? If so, we already have panic notifiers, and QEMU has a "pvpanic"
device using that. See drivers/misc/pvpanic/.

Thanks,
Mark.

> For powerpc/mips, the approach is a generic shutdown function that might
> call other handler-registered functions, whereas x86/sh relies in the
> "machine_ops" structure, having the crash shutdown as a callback in such
> struct.
> 
> The usage for that is very broad, but heavy users are hypervisors like
> Hyper-V / KVM (CCed Michael and Vitaly here for this reason). The
> discussion about the need for that in arm64 is from another thread [0],
> so before start implementing/playing with that, I'd like to ask ARM64
> community if there is any feedback and in case it's positive, what is
> the best implementation strategy (struct callback vs. handler call), etc.
> 
> I've CCed ARM64/ARM32 maintainers plus extra people I found as really
> involved with ARM architecture - sorry if I added people I shouldn't or
> if I forgot somebody (though the ARM mailing-list is CC).
> Cheers,
> 
> 
> Guilherme
> 
> 
> [0]
> https://lore.kernel.org/lkml/2787b476-6366-1c83-db80-0393da417497@igalia.com/
> See the proposed option (b)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05  7:29 ` Marc Zyngier
@ 2022-05-05 12:44   ` Guilherme G. Piccoli
  2022-05-05 12:53     ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-05 12:44 UTC (permalink / raw)
  To: Marc Zyngier, mark.rutland
  Cc: Catalin Marinas, will Deacon, Michael Kelley (LINUX),
	Vitaly Kuznetsov, Russell King, Ard Biesheuvel, broonie,
	linux-arm-kernel, linux-kernel, linux-hyperv

On 05/05/2022 04:29, Marc Zyngier wrote:
> [...]
> Not having any 'machine_ops' indirection was a conscious decision on
> arm64, if only to avoid the nightmare that 32bit was at a time with
> every single platform doing their own stuff. Introducing them would
> not be an improvement, but simply the admission that hypervisors are
> simply too broken for words. And I don't buy the "but x86 has it!"
> argument. x86 is a nightmare of PV mess that we can happily ignore,
> because we don't do PV for core operations at all.
> 
> If something has to be done to quiesce the system, it probably is
> related to the system topology, and must be linked to it. We already
> have these requirements in order to correctly stop ongoing DMA, shut
> down IOMMUs, and other similar stuff. What other requirements does
> your favourite hypervisor have?
> 

Thanks Marc and Mark for the details. I agree with most part of it, and
in fact panic notifiers was the trigger for this discussion (and they
are in fact used for this purpose to some extent in Hyper-V).

The idea of having this custom handler from kexec comes from Hyper-V
discussion - I feel it's better to show the code, so please take a look
at functions: hv_machine_crash_shutdown()
[arch/x86/kernel/cpu/mshyperv.c] and the one called from there,
hv_crash_handler() [drivers/hv/vmbus_drv.c].

These routines perform last minute clean-ups, right before kdump/kexec
happens, but *after* the panic notifiers. It seems there is no way to
accomplish that without architecture involvement or core kexec code
pollution heh

Anyway, the idea here was to gather a feedback on how "receptive" arm64
community would be to allow such customization, appreciated your feedback =)

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 12:44   ` Guilherme G. Piccoli
@ 2022-05-05 12:53     ` Mark Rutland
  2022-05-05 13:05       ` Guilherme G. Piccoli
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-05-05 12:53 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Marc Zyngier, Catalin Marinas, will Deacon,
	Michael Kelley (LINUX),
	Vitaly Kuznetsov, Russell King, Ard Biesheuvel, broonie,
	linux-arm-kernel, linux-kernel, linux-hyperv

On Thu, May 05, 2022 at 09:44:25AM -0300, Guilherme G. Piccoli wrote:
> On 05/05/2022 04:29, Marc Zyngier wrote:
> > [...]
> > Not having any 'machine_ops' indirection was a conscious decision on
> > arm64, if only to avoid the nightmare that 32bit was at a time with
> > every single platform doing their own stuff. Introducing them would
> > not be an improvement, but simply the admission that hypervisors are
> > simply too broken for words. And I don't buy the "but x86 has it!"
> > argument. x86 is a nightmare of PV mess that we can happily ignore,
> > because we don't do PV for core operations at all.
> > 
> > If something has to be done to quiesce the system, it probably is
> > related to the system topology, and must be linked to it. We already
> > have these requirements in order to correctly stop ongoing DMA, shut
> > down IOMMUs, and other similar stuff. What other requirements does
> > your favourite hypervisor have?
> > 
> 
> Thanks Marc and Mark for the details. I agree with most part of it, and
> in fact panic notifiers was the trigger for this discussion (and they
> are in fact used for this purpose to some extent in Hyper-V).
> 
> The idea of having this custom handler from kexec comes from Hyper-V
> discussion - I feel it's better to show the code, so please take a look
> at functions: hv_machine_crash_shutdown()
> [arch/x86/kernel/cpu/mshyperv.c] and the one called from there,
> hv_crash_handler() [drivers/hv/vmbus_drv.c].
> 
> These routines perform last minute clean-ups, right before kdump/kexec
> happens, but *after* the panic notifiers. It seems there is no way to
> accomplish that without architecture involvement or core kexec code
> pollution heh

Looking at those, the cleanup work is all arch-specific. What exactly would we
need to do on arm64, and why does it need to happen at that point specifically?
On arm64 we don't expect as much paravirtualization as on x86, so it's not
clear to me whether we need anything at all.

> Anyway, the idea here was to gather a feedback on how "receptive" arm64
> community would be to allow such customization, appreciated your feedback =)

... and are you trying to do this for Hyper-V or just using that as an example?

I think we're not going to be very receptive without a more concrete example of
what you want.

What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?

Thanks
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 12:53     ` Mark Rutland
@ 2022-05-05 13:05       ` Guilherme G. Piccoli
  2022-05-05 13:15         ` Mark Rutland
  2022-05-05 13:52         ` Vitaly Kuznetsov
  0 siblings, 2 replies; 14+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-05 13:05 UTC (permalink / raw)
  To: Mark Rutland, Michael Kelley (LINUX)
  Cc: Marc Zyngier, Catalin Marinas, will Deacon, Vitaly Kuznetsov,
	Russell King, Ard Biesheuvel, broonie, linux-arm-kernel,
	linux-kernel, linux-hyperv

On 05/05/2022 09:53, Mark Rutland wrote:
> [...]
> Looking at those, the cleanup work is all arch-specific. What exactly would we
> need to do on arm64, and why does it need to happen at that point specifically?
> On arm64 we don't expect as much paravirtualization as on x86, so it's not
> clear to me whether we need anything at all.
> 
>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
>> community would be to allow such customization, appreciated your feedback =)
> 
> ... and are you trying to do this for Hyper-V or just using that as an example?
> 
> I think we're not going to be very receptive without a more concrete example of
> what you want.
> 
> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
> 
> Thanks
> Mark.

Hi Mark, my plan would be doing that for Hyper-V - kind of the same
code, almost. For example, in hv_crash_handler() there is a stimer
clean-up and the vmbus unload - my understanding is that this same code
would need to run in arm64. Michael Kelley is CCed, he was discussing
with me in the panic notifiers thread and may elaborate more on the needs.

But also (not related with my specific plan), I've seen KVM quiesce code
on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
this is necessary for arm64 or if this already executing in some
abstracted form, I didn't dig deep - probably Vitaly is aware of that,
hence I've CCed him here.

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 13:05       ` Guilherme G. Piccoli
@ 2022-05-05 13:15         ` Mark Rutland
  2022-05-05 13:19           ` Guilherme G. Piccoli
  2022-05-05 13:52         ` Vitaly Kuznetsov
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-05-05 13:15 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Michael Kelley (LINUX),
	Marc Zyngier, Catalin Marinas, will Deacon, Vitaly Kuznetsov,
	Russell King, Ard Biesheuvel, broonie, linux-arm-kernel,
	linux-kernel, linux-hyperv

On Thu, May 05, 2022 at 10:05:18AM -0300, Guilherme G. Piccoli wrote:
> On 05/05/2022 09:53, Mark Rutland wrote:
> > [...]
> > Looking at those, the cleanup work is all arch-specific. What exactly would we
> > need to do on arm64, and why does it need to happen at that point specifically?
> > On arm64 we don't expect as much paravirtualization as on x86, so it's not
> > clear to me whether we need anything at all.
> > 
> >> Anyway, the idea here was to gather a feedback on how "receptive" arm64
> >> community would be to allow such customization, appreciated your feedback =)
> > 
> > ... and are you trying to do this for Hyper-V or just using that as an example?
> > 
> > I think we're not going to be very receptive without a more concrete example of
> > what you want.
> > 
> > What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
> > 
> > Thanks
> > Mark.
> 
> Hi Mark, my plan would be doing that for Hyper-V - kind of the same
> code, almost. For example, in hv_crash_handler() there is a stimer
> clean-up and the vmbus unload - my understanding is that this same code
> would need to run in arm64. Michael Kelley is CCed, he was discussing
> with me in the panic notifiers thread and may elaborate more on the needs.

The key problem here is that there's no justification as to why this cannot be
done in a panic notifier (or a regular reboot notifer for the plain shutdown
or kexec case).

I undertstand that x86 does one thing, and what Marc and I have said is that
fact alone doesn't justify doing the same.

We need to know:

a) What specifically you want to do on arm64

b) Why you think this cannot be done using the existing mechanisms (e.g.
   notifiers).

Without a strong justification, we wouldn't add such hooks to arm64.

Could you start by trying to use the notifiers, and if you encounter a problem,
*then* we consider an alternative? That should mean we have a concrete reason.

Thanks,
Mark.

> But also (not related with my specific plan), I've seen KVM quiesce code
> on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
> this is necessary for arm64 or if this already executing in some
> abstracted form, I didn't dig deep - probably Vitaly is aware of that,
> hence I've CCed him here.
> 
> Cheers,
> 
> 
> Guilherme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 13:15         ` Mark Rutland
@ 2022-05-05 13:19           ` Guilherme G. Piccoli
  0 siblings, 0 replies; 14+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-05 13:19 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Michael Kelley (LINUX),
	Marc Zyngier, Catalin Marinas, will Deacon, Vitaly Kuznetsov,
	Russell King, Ard Biesheuvel, broonie, linux-arm-kernel,
	linux-kernel, linux-hyperv

On 05/05/2022 10:15, Mark Rutland wrote:
> [...]
> Without a strong justification, we wouldn't add such hooks to arm64.
> 
> Could you start by trying to use the notifiers, and if you encounter a problem,
> *then* we consider an alternative? That should mean we have a concrete reason.
> 
> Thanks,
> Mark.

That's a good idea Mark. I'm not expert in Hyper-V - I could try that
for a fact, but let's see if Michael has a strong reason on why this
need to be done in arch code and panic notifiers aren't enough.

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 13:05       ` Guilherme G. Piccoli
  2022-05-05 13:15         ` Mark Rutland
@ 2022-05-05 13:52         ` Vitaly Kuznetsov
  2022-05-05 14:07           ` Guilherme G. Piccoli
  2022-05-05 14:31           ` Mark Rutland
  1 sibling, 2 replies; 14+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-05 13:52 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Mark Rutland, Michael Kelley (LINUX)
  Cc: Marc Zyngier, Catalin Marinas, will Deacon, Russell King,
	Ard Biesheuvel, broonie, linux-arm-kernel, linux-kernel,
	linux-hyperv

"Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:

> On 05/05/2022 09:53, Mark Rutland wrote:
>> [...]
>> Looking at those, the cleanup work is all arch-specific. What exactly would we
>> need to do on arm64, and why does it need to happen at that point specifically?
>> On arm64 we don't expect as much paravirtualization as on x86, so it's not
>> clear to me whether we need anything at all.
>> 
>>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
>>> community would be to allow such customization, appreciated your feedback =)
>> 
>> ... and are you trying to do this for Hyper-V or just using that as an example?
>> 
>> I think we're not going to be very receptive without a more concrete example of
>> what you want.
>> 
>> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
>> 
>> Thanks
>> Mark.
>
> Hi Mark, my plan would be doing that for Hyper-V - kind of the same
> code, almost. For example, in hv_crash_handler() there is a stimer
> clean-up and the vmbus unload - my understanding is that this same code
> would need to run in arm64. Michael Kelley is CCed, he was discussing
> with me in the panic notifiers thread and may elaborate more on the needs.
>
> But also (not related with my specific plan), I've seen KVM quiesce code
> on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
> this is necessary for arm64 or if this already executing in some
> abstracted form, I didn't dig deep - probably Vitaly is aware of that,
> hence I've CCed him here.

Speaking about the difference between reboot notifiers call chain and
machine_ops.crash_shutdown for KVM/x86, the main difference is that
reboot notifier is called on some CPU while the VM is fully functional,
this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
on_each_cpu()). When we're in a crash situation,
machine_ops.crash_shutdown is called on the CPU which crashed. We can't
count on IPIs still being functional so we do the very basic minimum so
*this* CPU can boot kdump kernel. There's no guarantee other CPUs can
still boot but normally we do kdump with 'nprocs=1'.

For Hyper-V, the situation is similar: hv_crash_handler() intitiates
VMbus unload on the crashing CPU only, there's no mechanism to do
'global' unload so other CPUs will likely not be able to connect Vmbus
devices in kdump kernel but this should not be necessary.

There's a crash_kexec_post_notifiers mechanism which can be used instead
but it's disabled by default so using machine_ops.crash_shutdown is
better.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 13:52         ` Vitaly Kuznetsov
@ 2022-05-05 14:07           ` Guilherme G. Piccoli
  2022-05-05 14:31           ` Mark Rutland
  1 sibling, 0 replies; 14+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-05 14:07 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Mark Rutland, Michael Kelley (LINUX)
  Cc: Marc Zyngier, Catalin Marinas, will Deacon, Russell King,
	Ard Biesheuvel, broonie, linux-arm-kernel, linux-kernel,
	linux-hyperv

On 05/05/2022 10:52, Vitaly Kuznetsov wrote:
> [...]
> For Hyper-V, the situation is similar: hv_crash_handler() intitiates
> VMbus unload on the crashing CPU only, there's no mechanism to do
> 'global' unload so other CPUs will likely not be able to connect Vmbus
> devices in kdump kernel but this should not be necessary.
> 
> There's a crash_kexec_post_notifiers mechanism which can be used instead
> but it's disabled by default so using machine_ops.crash_shutdown is
> better.
> 

Thanks a bunch Vitaly, for the clarification!

Just as a heads-up: there's been a panic notifiers refactor proposed [0]
in which some notifiers will run before kdump by default, not requiring
"crash_kexec_post_notifiers" (which BTW is *unfortunately* hardcoded as
'Y' for hyper-v, since a11589563e96).

Cheers,


Guilherme


[0]
https://lore.kernel.org/lkml/20220427224924.592546-1-gpiccoli@igalia.com/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 13:52         ` Vitaly Kuznetsov
  2022-05-05 14:07           ` Guilherme G. Piccoli
@ 2022-05-05 14:31           ` Mark Rutland
  2022-05-05 14:51             ` Vitaly Kuznetsov
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-05-05 14:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Guilherme G. Piccoli, Michael Kelley (LINUX),
	Marc Zyngier, Catalin Marinas, will Deacon, Russell King,
	Ard Biesheuvel, broonie, linux-arm-kernel, linux-kernel,
	linux-hyperv

On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
> 
> > On 05/05/2022 09:53, Mark Rutland wrote:
> >> [...]
> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
> >> need to do on arm64, and why does it need to happen at that point specifically?
> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
> >> clear to me whether we need anything at all.
> >> 
> >>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
> >>> community would be to allow such customization, appreciated your feedback =)
> >> 
> >> ... and are you trying to do this for Hyper-V or just using that as an example?
> >> 
> >> I think we're not going to be very receptive without a more concrete example of
> >> what you want.
> >> 
> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
> >> 
> >> Thanks
> >> Mark.
> >
> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
> > code, almost. For example, in hv_crash_handler() there is a stimer
> > clean-up and the vmbus unload - my understanding is that this same code
> > would need to run in arm64. Michael Kelley is CCed, he was discussing
> > with me in the panic notifiers thread and may elaborate more on the needs.
> >
> > But also (not related with my specific plan), I've seen KVM quiesce code
> > on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
> > this is necessary for arm64 or if this already executing in some
> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
> > hence I've CCed him here.
> 
> Speaking about the difference between reboot notifiers call chain and
> machine_ops.crash_shutdown for KVM/x86, the main difference is that
> reboot notifier is called on some CPU while the VM is fully functional,
> this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
> on_each_cpu()). When we're in a crash situation,
> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
> count on IPIs still being functional so we do the very basic minimum so
> *this* CPU can boot kdump kernel. There's no guarantee other CPUs can
> still boot but normally we do kdump with 'nprocs=1'.

Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
use a PV mechanism (and practically speaking will either be GICv2 or GICv3).

> For Hyper-V, the situation is similar: hv_crash_handler() intitiates
> VMbus unload on the crashing CPU only, there's no mechanism to do
> 'global' unload so other CPUs will likely not be able to connect Vmbus
> devices in kdump kernel but this should not be necessary.

Given kdump is best-effort (and we can't rely on secondary CPUs even making it
into the kdump kernel), I also don't think that should be necessary.

> There's a crash_kexec_post_notifiers mechanism which can be used instead
> but it's disabled by default so using machine_ops.crash_shutdown is
> better.

Another option is to defer this to the kdump kernel. On arm64 at least, we know
if we're in a kdump kernel early on, and can reset some state based upon that.

Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
to just before the kdump kernel detects and initializes anything relating to
hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
the first hypercall, and do the cleanup/reset there.

Maybe we need more data for the vmbus bits? ... if so it seems that could blow
up anyway when the first kernel was tearing down.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 14:31           ` Mark Rutland
@ 2022-05-05 14:51             ` Vitaly Kuznetsov
  2022-05-06 11:01               ` Mark Rutland
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Kuznetsov @ 2022-05-05 14:51 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Guilherme G. Piccoli, Michael Kelley (LINUX),
	Marc Zyngier, Catalin Marinas, will Deacon, Russell King,
	Ard Biesheuvel, broonie, linux-arm-kernel, linux-kernel,
	linux-hyperv

Mark Rutland <mark.rutland@arm.com> writes:

> On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
>> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
>> 
>> > On 05/05/2022 09:53, Mark Rutland wrote:
>> >> [...]
>> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
>> >> need to do on arm64, and why does it need to happen at that point specifically?
>> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
>> >> clear to me whether we need anything at all.
>> >> 
>> >>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
>> >>> community would be to allow such customization, appreciated your feedback =)
>> >> 
>> >> ... and are you trying to do this for Hyper-V or just using that as an example?
>> >> 
>> >> I think we're not going to be very receptive without a more concrete example of
>> >> what you want.
>> >> 
>> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
>> >> 
>> >> Thanks
>> >> Mark.
>> >
>> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
>> > code, almost. For example, in hv_crash_handler() there is a stimer
>> > clean-up and the vmbus unload - my understanding is that this same code
>> > would need to run in arm64. Michael Kelley is CCed, he was discussing
>> > with me in the panic notifiers thread and may elaborate more on the needs.
>> >
>> > But also (not related with my specific plan), I've seen KVM quiesce code
>> > on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
>> > this is necessary for arm64 or if this already executing in some
>> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
>> > hence I've CCed him here.
>> 
>> Speaking about the difference between reboot notifiers call chain and
>> machine_ops.crash_shutdown for KVM/x86, the main difference is that
>> reboot notifier is called on some CPU while the VM is fully functional,
>> this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
>> on_each_cpu()). When we're in a crash situation,
>> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
>> count on IPIs still being functional so we do the very basic minimum so
>> *this* CPU can boot kdump kernel. There's no guarantee other CPUs can
>> still boot but normally we do kdump with 'nprocs=1'.
>
> Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
> use a PV mechanism (and practically speaking will either be GICv2 or GICv3).
>

This isn't really about PV: when the kernel is crashing, you have no
idea what's going on on other CPUs, they may be crashing too, locked in
a tight loop, ... so sending an IPI there to do some work and expecting
it to report back is dangerous.

>> For Hyper-V, the situation is similar: hv_crash_handler() intitiates
>> VMbus unload on the crashing CPU only, there's no mechanism to do
>> 'global' unload so other CPUs will likely not be able to connect Vmbus
>> devices in kdump kernel but this should not be necessary.
>
> Given kdump is best-effort (and we can't rely on secondary CPUs even making it
> into the kdump kernel), I also don't think that should be necessary.

Yes, exactly.

>
>> There's a crash_kexec_post_notifiers mechanism which can be used instead
>> but it's disabled by default so using machine_ops.crash_shutdown is
>> better.
>
> Another option is to defer this to the kdump kernel. On arm64 at least, we know
> if we're in a kdump kernel early on, and can reset some state based upon that.
>
> Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
> to just before the kdump kernel detects and initializes anything relating to
> hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
> the first hypercall, and do the cleanup/reset there.

In theory yes, it is possible to try sending CHANNELMSG_UNLOAD on kdump
kernel boot and not upon crash, I don't remember if this approach was
tried in the past. 

>
> Maybe we need more data for the vmbus bits? ... if so it seems that could blow
> up anyway when the first kernel was tearing down.

Not sure I understood what you mean... From what I remember, there were
issues with CHANNELMSG_UNLOAD handling on the Hyper-V host side in the
past (it was taking *minutes* for the host to reply) but this is
orthogonal to the fact that we need to do this cleanup so kdump kernel
is able to connect to Vmbus devices again.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Should arm64 have a custom crash shutdown handler?
  2022-05-05 14:51             ` Vitaly Kuznetsov
@ 2022-05-06 11:01               ` Mark Rutland
  2022-05-30  1:51                 ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 14+ messages in thread
From: Mark Rutland @ 2022-05-06 11:01 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Guilherme G. Piccoli, Michael Kelley (LINUX),
	Marc Zyngier, Catalin Marinas, will Deacon, Russell King,
	Ard Biesheuvel, broonie, linux-arm-kernel, linux-kernel,
	linux-hyperv

On Thu, May 05, 2022 at 04:51:54PM +0200, Vitaly Kuznetsov wrote:
> Mark Rutland <mark.rutland@arm.com> writes:
> 
> > On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
> >> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
> >> 
> >> > On 05/05/2022 09:53, Mark Rutland wrote:
> >> >> [...]
> >> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
> >> >> need to do on arm64, and why does it need to happen at that point specifically?
> >> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
> >> >> clear to me whether we need anything at all.
> >> >> 
> >> >>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
> >> >>> community would be to allow such customization, appreciated your feedback =)
> >> >> 
> >> >> ... and are you trying to do this for Hyper-V or just using that as an example?
> >> >> 
> >> >> I think we're not going to be very receptive without a more concrete example of
> >> >> what you want.
> >> >> 
> >> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
> >> >> 
> >> >> Thanks
> >> >> Mark.
> >> >
> >> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
> >> > code, almost. For example, in hv_crash_handler() there is a stimer
> >> > clean-up and the vmbus unload - my understanding is that this same code
> >> > would need to run in arm64. Michael Kelley is CCed, he was discussing
> >> > with me in the panic notifiers thread and may elaborate more on the needs.
> >> >
> >> > But also (not related with my specific plan), I've seen KVM quiesce code
> >> > on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
> >> > this is necessary for arm64 or if this already executing in some
> >> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
> >> > hence I've CCed him here.
> >> 
> >> Speaking about the difference between reboot notifiers call chain and
> >> machine_ops.crash_shutdown for KVM/x86, the main difference is that
> >> reboot notifier is called on some CPU while the VM is fully functional,
> >> this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
> >> on_each_cpu()). When we're in a crash situation,
> >> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
> >> count on IPIs still being functional so we do the very basic minimum so
> >> *this* CPU can boot kdump kernel. There's no guarantee other CPUs can
> >> still boot but normally we do kdump with 'nprocs=1'.
> >
> > Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
> > use a PV mechanism (and practically speaking will either be GICv2 or GICv3).
> >
> 
> This isn't really about PV: when the kernel is crashing, you have no
> idea what's going on on other CPUs, they may be crashing too, locked in
> a tight loop, ... so sending an IPI there to do some work and expecting
> it to report back is dangerous.

Sorry, I misunderstood what you meant about IPIs. I thought you meant that some
enlightened IPI mechanism might be broken, rather than you simply cannot rely
on secondary CPUs to do anything (which is true regardless of whether the
kernel is running under a hypervisor).

So I understand not calling all the regular reboot notifiers in case they do
something like that, but it seems like we should be able to do that with a
panic notifier, since that could *should* follow the principle that you can't
rely on a working IPI.

[...]

> >> There's a crash_kexec_post_notifiers mechanism which can be used instead
> >> but it's disabled by default so using machine_ops.crash_shutdown is
> >> better.
> >
> > Another option is to defer this to the kdump kernel. On arm64 at least, we know
> > if we're in a kdump kernel early on, and can reset some state based upon that.
> >
> > Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
> > to just before the kdump kernel detects and initializes anything relating to
> > hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
> > the first hypercall, and do the cleanup/reset there.
> 
> In theory yes, it is possible to try sending CHANNELMSG_UNLOAD on kdump
> kernel boot and not upon crash, I don't remember if this approach was
> tried in the past. 
> 
> > Maybe we need more data for the vmbus bits? ... if so it seems that could blow
> > up anyway when the first kernel was tearing down.
> 
> Not sure I understood what you mean... From what I remember, there were
> issues with CHANNELMSG_UNLOAD handling on the Hyper-V host side in the
> past (it was taking *minutes* for the host to reply) but this is
> orthogonal to the fact that we need to do this cleanup so kdump kernel
> is able to connect to Vmbus devices again.

I was thinking that if it was necessary to have some context (e.g. pointers to
buffers which are active) in order to do the teardown, it might be painful to
do that in the kdump kernel itself.

Otherwise, I think doing the teardown in the kdump kernel itself would be
preferable, since there's a greater likelihood that kernel infrastructure will
work relative to doing that in the kernel which crashed, and it gives the kdump
kernel the option to detect when something cannot be torn down, and not use
that feature.

Thanks,
Mark.
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: Should arm64 have a custom crash shutdown handler?
  2022-05-06 11:01               ` Mark Rutland
@ 2022-05-30  1:51                 ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 14+ messages in thread
From: Michael Kelley (LINUX) @ 2022-05-30  1:51 UTC (permalink / raw)
  To: Mark Rutland, vkuznets
  Cc: Guilherme G. Piccoli, Marc Zyngier, Catalin Marinas, will Deacon,
	Russell King, Ard Biesheuvel, broonie, linux-arm-kernel,
	linux-kernel, linux-hyperv

From: Mark Rutland <mark.rutland@arm.com> Sent: Friday, May 6, 2022 4:01 AM
> 
> On Thu, May 05, 2022 at 04:51:54PM +0200, Vitaly Kuznetsov wrote:
> > Mark Rutland <mark.rutland@arm.com> writes:
> >
> > > On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
> > >> "Guilherme G. Piccoli" <gpiccoli@igalia.com> writes:
> > >>
> > >> > On 05/05/2022 09:53, Mark Rutland wrote:
> > >> >> [...]
> > >> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
> > >> >> need to do on arm64, and why does it need to happen at that point specifically?
> > >> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
> > >> >> clear to me whether we need anything at all.
> > >> >>
> > >> >>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
> > >> >>> community would be to allow such customization, appreciated your feedback =)
> > >> >>
> > >> >> ... and are you trying to do this for Hyper-V or just using that as an example?
> > >> >>
> > >> >> I think we're not going to be very receptive without a more concrete example of
> > >> >> what you want.
> > >> >>
> > >> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
> > >> >>
> > >> >> Thanks
> > >> >> Mark.
> > >> >
> > >> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
> > >> > code, almost. For example, in hv_crash_handler() there is a stimer
> > >> > clean-up and the vmbus unload - my understanding is that this same code
> > >> > would need to run in arm64. Michael Kelley is CCed, he was discussing
> > >> > with me in the panic notifiers thread and may elaborate more on the needs.
> > >> >
> > >> > But also (not related with my specific plan), I've seen KVM quiesce code
> > >> > on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
> > >> > this is necessary for arm64 or if this already executing in some
> > >> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
> > >> > hence I've CCed him here.
> > >>
> > >> Speaking about the difference between reboot notifiers call chain and
> > >> machine_ops.crash_shutdown for KVM/x86, the main difference is that
> > >> reboot notifier is called on some CPU while the VM is fully functional,
> > >> this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
> > >> on_each_cpu()). When we're in a crash situation,
> > >> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
> > >> count on IPIs still being functional so we do the very basic minimum so
> > >> *this* CPU can boot kdump kernel. There's no guarantee other CPUs can
> > >> still boot but normally we do kdump with 'nprocs=1'.
> > >
> > > Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
> > > use a PV mechanism (and practically speaking will either be GICv2 or GICv3).
> > >
> >
> > This isn't really about PV: when the kernel is crashing, you have no
> > idea what's going on on other CPUs, they may be crashing too, locked in
> > a tight loop, ... so sending an IPI there to do some work and expecting
> > it to report back is dangerous.
> 
> Sorry, I misunderstood what you meant about IPIs. I thought you meant that some
> enlightened IPI mechanism might be broken, rather than you simply cannot rely
> on secondary CPUs to do anything (which is true regardless of whether the
> kernel is running under a hypervisor).
> 
> So I understand not calling all the regular reboot notifiers in case they do
> something like that, but it seems like we should be able to do that with a
> panic notifier, since that could *should* follow the principle that you can't
> rely on a working IPI.
> 
> [...]
> 
> > >> There's a crash_kexec_post_notifiers mechanism which can be used instead
> > >> but it's disabled by default so using machine_ops.crash_shutdown is
> > >> better.
> > >
> > > Another option is to defer this to the kdump kernel. On arm64 at least, we know
> > > if we're in a kdump kernel early on, and can reset some state based upon that.
> > >
> > > Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
> > > to just before the kdump kernel detects and initializes anything relating to
> > > hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
> > > the first hypercall, and do the cleanup/reset there.
> >
> > In theory yes, it is possible to try sending CHANNELMSG_UNLOAD on kdump
> > kernel boot and not upon crash, I don't remember if this approach was
> > tried in the past.
> >
> > > Maybe we need more data for the vmbus bits? ... if so it seems that could blow
> > > up anyway when the first kernel was tearing down.
> >
> > Not sure I understood what you mean... From what I remember, there were
> > issues with CHANNELMSG_UNLOAD handling on the Hyper-V host side in the
> > past (it was taking *minutes* for the host to reply) but this is
> > orthogonal to the fact that we need to do this cleanup so kdump kernel
> > is able to connect to Vmbus devices again.
> 
> I was thinking that if it was necessary to have some context (e.g. pointers to
> buffers which are active) in order to do the teardown, it might be painful to
> do that in the kdump kernel itself.
> 
> Otherwise, I think doing the teardown in the kdump kernel itself would be
> preferable, since there's a greater likelihood that kernel infrastructure will
> work relative to doing that in the kernel which crashed, and it gives the kdump
> kernel the option to detect when something cannot be torn down, and not use
> that feature.
> 

Apologies for the delay in joining this thread.  In addition to being out on
vacation, I've been doing some further investigation to make sure I have
my info right.

The idea of doing the VMbus teardown in the kdump kernel itself is intriguing,
but has its own problems.  Sending the CHANNELMSG_UNLOAD in the kdump
kernel should work OK.  But Hyper-V will ack the command, the ack comes
back into a queue in the original kernel memory.  We can't re-initiate the
VMbus connection in the kdump kernel until we have the ack.  We don't need
any data from the ack, so we *could* just wait 100 seconds and assume the
ack has come in (in unusual cases, the ack really can take that long for
reasons documented in the code).  Given a choice of doing the VMbus
teardown in the kdump kernel (including waiting an extra 100 seconds) vs.
doing the teardown in a panic notifier in the original kernel, I think the panic
notifier approach is preferable.  The risk of a failure that prevents kdump
from working seems only very slightly higher when the teardown is done in
the original kernel.

The bigger problem is with a normal kexec().   On x86/x64, we depend on
the machine_ops.shutdown() to run the code to do the VMbus teardown.
Today, the teardown isn't happening at all on ARM64, leaving kexec() at risk
of a variety of failures.  kexec() shuts down all devices, so individual VMbus
synthetic devices get properly shutdown.  But VMbus is bus, not a device,
and the VMbus connection is managed by the VMbus bus driver.  From what
I can see, there's no mechanism to explicitly shut down busses upon kexec().

Just brainstorming, I'm wondering if we could create a dummy VMbus
device that would teardown the VMbus connection in the case of a kexec().
Kexec() appears to explicitly shutdown devices in reverse order, which would
work since the dummy device can be created before any of the other
synthetic VMbus devices show up.

I'm open to other ideas as well.  I understand the desire not to open
floodgates by adding the equivalent of machine_ops on the ARM64 side.

Michael

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-30  1:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-04 20:00 Should arm64 have a custom crash shutdown handler? Guilherme G. Piccoli
2022-05-05  7:29 ` Marc Zyngier
2022-05-05 12:44   ` Guilherme G. Piccoli
2022-05-05 12:53     ` Mark Rutland
2022-05-05 13:05       ` Guilherme G. Piccoli
2022-05-05 13:15         ` Mark Rutland
2022-05-05 13:19           ` Guilherme G. Piccoli
2022-05-05 13:52         ` Vitaly Kuznetsov
2022-05-05 14:07           ` Guilherme G. Piccoli
2022-05-05 14:31           ` Mark Rutland
2022-05-05 14:51             ` Vitaly Kuznetsov
2022-05-06 11:01               ` Mark Rutland
2022-05-30  1:51                 ` Michael Kelley (LINUX)
2022-05-05 11:10 ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).