Thoughts on kexec / SBI

From: Nick Kossifidis @ 2019-03-23 18:05 UTC
  To: linux-riscv; +Cc: atish.patra, Anup.Patel, palmer

Hello all,

I'm working on implementing kexec on RISC-V (just kexec for now, not
kdump yet), since we want to be able to test new kernel images easily on
our testbeds. It's part of a larger project where I'm trying to build a
unified way of testing the various Linux-capable RISC-V targets we have
in the lab (https://github.com/mickflemm/yarvt).

The issue is that we don't have a way of stopping the secondary harts in
a recoverable way, so at this point I have two options. Either I call
smp_send_stop() in machine_shutdown() and come back from kexec with a
single hart running (which should be fine for kdump, by the way), or I
keep some code in a reserved memory region for the secondary harts to
execute, which the boot hart patches once the new kernel is in place so
that they jump into it. The second option is needlessly complicated, and
it also reduces the flexibility of the process, since we can't use the
whole RAM for the loaded kimage. It will also become obsolete once we
have proper handling of CPU suspend / per-hart reset through SBI. So I'm
going with the first option until then.

I'd like to jump-start the discussion on how we can handle this through
SBI. My initial thought was this:

a) Add a new IPI code, IPI_SUSPEND, handled by the firmware (the
firmware's code/data persists across kexecs), that puts the hart in a
wfi loop, polling a variable in this hart's scratch buffer that will
hold the new virtual address to jump to (where the new kernel is
located). Alternatively, this call may do power management and
completely shut down the CPU, if the hardware supports it.

b) Add another IPI code, IPI_RESUME, that wakes up the harts by
providing them with the virtual address to jump to. This can be a simple
write of the new address to the remote hart's scratch buffer plus an
interrupt, or it can use power management to power up the hart and
either give it a new reset vector or rely on some handling in the
BootROM code (this, by the way, is also an open discussion in the TEE
group, where we are discussing Secure Boot). If the hart is already
running (i.e. no IPI_SUSPEND has been sent to it before, e.g. during
boot), the firmware will ignore the event. Alternatively, we may trigger
IPI_SUSPEND in the firmware during boot, so that the boot process
happens on a single hart and the other harts wait for the OS to wake
them up.

c) During machine_shutdown() / machine_crash_shutdown(), we issue
IPI_SUSPEND to all other harts, and in setup_smp() we issue IPI_RESUME
to wake them up, just in case.

This way we'll also be able to enable ARCH_SUSPEND_POSSIBLE and
implement CPU hot-plugging (for PM_SLEEP_SMP), even on platforms without
hardware support for power management.
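Along the same lines, CPU hot-plugging could reuse the two calls:
offlining a CPU parks it in firmware with IPI_SUSPEND, and onlining it
is an IPI_RESUME to the secondary entry point. The sketch below is only
a single-threaded bookkeeping model, assuming hypothetical wrappers
sbi_ipi_suspend() / sbi_ipi_resume() and plain bitmasks in place of the
kernel's cpumasks; it also assumes the variant from b) where the
firmware holds harts 1 to 3 at boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS 4

/* Hypothetical firmware state: bit set = hart parked by IPI_SUSPEND.
 * Harts 1-3 start parked (single-hart boot variant from b). */
static unsigned long fw_parked = 0xEUL;

/* Kernel view: bit set = CPU online. Only hart 0 runs after boot. */
static unsigned long cpu_online_bits = 0x1UL;

/* Hypothetical SBI wrapper for IPI_SUSPEND. */
static void sbi_ipi_suspend(unsigned long hartid)
{
	fw_parked |= 1UL << hartid;
}

/* Hypothetical SBI wrapper for IPI_RESUME; false if the event is ignored. */
static bool sbi_ipi_resume(unsigned long hartid, uintptr_t entry)
{
	(void)entry; /* firmware would make the hart jump here */
	if (!(fw_parked & (1UL << hartid)))
		return false; /* already running: event ignored */
	fw_parked &= ~(1UL << hartid);
	return true;
}

/* Offlining: the CPU leaves the online set and parks itself in firmware. */
static void sketch_cpu_down(unsigned long hartid)
{
	cpu_online_bits &= ~(1UL << hartid);
	sbi_ipi_suspend(hartid);
}

/* Onlining: ask the firmware to send the hart to the secondary entry. */
static bool sketch_cpu_up(unsigned long hartid, uintptr_t secondary_entry)
{
	if (!sbi_ipi_resume(hartid, secondary_entry))
		return false;
	cpu_online_bits |= 1UL << hartid;
	return true;
}
```

Starting from the boot state, calling sketch_cpu_up() for harts 1 to 3
brings cpu_online_bits to 0xF with nothing left parked; sketch_cpu_down(2)
then leaves 0xB online with hart 2 parked in firmware, and a further
sketch_cpu_up(2, ...) brings it back. Calling sketch_cpu_up(0, ...)
returns false, since hart 0 was never parked.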

What do you think?

Regards,
Nick

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
