linux-riscv.lists.infradead.org archive mirror
* Thoughts on kexec / SBI
@ 2019-03-23 18:05 Nick Kossifidis
       [not found] ` <6948FE9F-68BE-448A-A8B5-A293C297B088@wdc.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Kossifidis @ 2019-03-23 18:05 UTC (permalink / raw)
  To: linux-riscv; +Cc: atish.patra, Anup.Patel, palmer

Hello all,

I'm working on implementing kexec on RISC-V (just kexec for now, not 
kdump yet) since we want to be able to test new kernel images easily on 
our testbeds. It's part of a larger project where I'm trying to have a 
unified way of testing the various Linux-capable RISC-V targets 
(https://github.com/mickflemm/yarvt) that we have in the lab.

The issue is that we don't have a way of stopping the secondary harts in 
a recoverable way, so at this point I have two options. Either I call 
smp_send_stop() in machine_shutdown() and come back from kexec with a 
single hart running (which should be fine for kdump btw), or I reserve a 
memory region holding some code for the secondary harts to execute, 
which the boot hart patches once the new kernel is in place so that they 
jump to the new kernel. The second option is needlessly complicated and 
it also reduces the flexibility of the process, since we can't use the 
whole RAM for the loaded kimage. It will also be obsolete once we have 
proper handling of CPU suspend / per-hart reset through SBI. So I'm 
going for the first option until then.

I'd like to jump-start the discussion on how we can handle things 
through SBI. My initial thought was this:

a) Have a new IPI code IPI_SUSPEND, handled by the firmware (its 
code/data is persistent across kexecs), that puts the hart in a wfi 
loop, checking a variable in this hart's scratch buffer that contains 
the new virtual address to jump to (where the new kernel is located). 
Alternatively this call may do power management and completely shut down 
the CPU, if the hardware supports this.

b) Have another IPI code IPI_RESUME, that wakes up the harts by 
providing them with the virtual address to jump to. This can be a simple 
write of the new address to the remote hart's scratch buffer plus an 
interrupt, or it can use power management to power up the hart and 
either provide it with a new reset vector, or do some handling in the 
BootROM code for that (this btw is also an open discussion in the TEE 
group where we are discussing Secure Boot). Also, in case the hart is 
already running (an IPI_SUSPEND hasn't been sent to it before, e.g. 
during boot), the firmware will ignore the event. Alternatively, the 
firmware may issue IPI_SUSPEND during boot so that the boot process 
happens with a single hart and the other harts wait for the OS to wake 
them up.

c) During machine_shutdown / machine_crash_shutdown, we issue 
IPI_SUSPEND to all other harts, and in setup_smp() we issue an 
IPI_RESUME to wake them up just in case.
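
A minimal sketch of the handshake described in a) and b), written as a 
single-threaded C model rather than real firmware code. Everything in it 
is an assumption made for illustration: the struct hart_scratch layout, 
the function names fw_handle_suspend / fw_send_resume / 
fw_wait_loop_step, and the use of 0 as the "no address yet" marker. Real 
firmware would sit in an actual wfi loop and would need proper memory 
ordering between the hart that writes the address and the hart that 
reads it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS 4

/* Hypothetical per-hart scratch area, owned by the firmware. */
struct hart_scratch {
	bool suspended;         /* hart is parked in the firmware wait loop */
	uintptr_t resume_addr;  /* 0 means "no address published yet" */
};

static struct hart_scratch scratch[NR_HARTS];

/* IPI_SUSPEND handler on the target hart: park it, drop any stale address. */
static void fw_handle_suspend(unsigned int hart)
{
	scratch[hart].resume_addr = 0;
	scratch[hart].suspended = true;
}

/*
 * IPI_RESUME, sender side: publish the address the parked hart should
 * jump to. A hart that was never suspended ignores the event, as in b).
 * Returns true if the request was accepted.
 */
static bool fw_send_resume(unsigned int hart, uintptr_t addr)
{
	if (!scratch[hart].suspended)
		return false;
	scratch[hart].resume_addr = addr;
	return true;
}

/*
 * One iteration of the parked hart's loop, i.e. what it would do after
 * each wakeup from wfi. Returns the address to jump to, or 0 to keep
 * waiting.
 */
static uintptr_t fw_wait_loop_step(unsigned int hart)
{
	uintptr_t addr = scratch[hart].resume_addr;

	if (!scratch[hart].suspended || addr == 0)
		return 0;

	scratch[hart].suspended = false;
	scratch[hart].resume_addr = 0;
	return addr;
}
```

In this model, suspending hart 1 and stepping its loop yields 0 until 
fw_send_resume(1, addr) is called, after which the next step returns 
addr. A resume sent to a hart that was never suspended is rejected.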

This way we'll also be able to announce ARCH_SUSPEND_POSSIBLE and 
implement CPU hot-plugging (for PM_SLEEP_SMP) even on platforms that 
don't support power management in hardware.

What do you think ?

Regards,
Nick

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: Thoughts on kexec / SBI
       [not found] ` <6948FE9F-68BE-448A-A8B5-A293C297B088@wdc.com>
@ 2019-03-24 22:17   ` Nick Kossifidis
  2019-03-26  1:58     ` Palmer Dabbelt
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Kossifidis @ 2019-03-24 22:17 UTC (permalink / raw)
  To: Atish Patra; +Cc: Nick Kossifidis, Anup Patel, linux-riscv, palmer

Hello Atish,

On 2019-03-24 00:31, Atish Patra wrote:
> Hi Nick,
> 
> On Mar 23, 2019, at 11:05 AM, Nick Kossifidis <mick@ics.forth.gr>
> wrote:
> 
>> Hello all,
>> 
>> I'm working on implementing kexec on RISC-V (just kexec for now, not
>> kdump yet) since we want to be able to test new kernel images easily
>> on our testbeds. It's part of a larger project where I'm trying to
>> have a unified way of testing various linux-capable RISC-V targets
>> (https://github.com/mickflemm/yarvt) that we have in the lab.
> 
> Great. This repo will be very helpful to beginners. I had a quick look
> at sifive-fu540/commands.sh.
> The objcopy step is redundant for OpenSBI, as it already generates
> fw_payload.bin as well.
> 

I hope so, thanks for the update, I prefer it this way since it shows 
how the .bin is generated.

>> The issue is that we don't have a way of stopping the secondary
>> harts in a recoverable way, so at this point I have two options,
>> either I'm going to call smp_send_stop() on machine_shutdown() and
>> come back from kexec with a single hart running (which should be
>> fine for kdump btw), or I need to have a reserved memory space where
>> I'll have to keep some code to be executed by the secondary harts,
>> that will be patched by the boot hart once the new kernel is in
>> place, to let them execute the new kernel. The second option is too
>> complicated for no reason and it also reduces the flexibility of the
>> process since we can't use the whole RAM for the loaded kimage, plus
>> it will be obsolete when we have proper handling of CPU suspend /
>> per-hart reset, through SBI. So I'm going for the first option until
>> then.
>> 
>> I'd like to jump-start the discussion on how we can handle things
>> through SBI. My initial thought was this:
>> 
>> a) Have a new IPI code IPI_SUSPEND, handled by the firmware (its
>> code/data is persistent across kexecs), that puts the hart on a wfi
>> loop, checking for a variable that contains the new virtual address
>> to jump to (where the new kernel is located) on this hart's scratch
>> buffer. Alternatively this call may do power management and
>> completely shut down the CPU, if the hardware supports this.
>> 
>> b) Have another IPI code IPI_RESUME, that wakes up the harts by
>> providing them with the virtual address to jump to. This can be a
>> simple write of the new address on the remote hart's scratch buffer
>> + interrupt, or it can use power management to power up the hart and
>> either provide it with a new reset vector, or do some handling on
>> the BootROM code for that (this btw is also an open discussion on
>> the TEE group where we are discussing Secure Boot). Also in case the
>> hart is already running (an IPI_SUSPEND hasn't been sent on it
>> before, e.g. during boot), the firmware will ignore the event,
>> alternatively we may call IPI_SUSPEND on the firmware during boot so
>> that the boot process happens with a single hart and other harts
>> wait for the OS to wake them up.
> 
> Looks good to me. I had a similar IPI approach (IPI WAKEUP) for the CPU
> hotplug implementation, except for the resume address requirement.
> https://lkml.org/lkml/2018/9/6/214
> 
> The only issue is that all these IPIs will be redundant once the hart
> power management SBI extension (a much cleaner interface) is
> available. My hotplug patch is on hold for that reason as well.
> 
> 
> Regards,
> Atish
> 

My proposal is about the new SBI API/extension. As I mentioned, the IPI 
calls I'm talking about need to be implemented on the firmware side; 
they won't be handled by the OS the way IPI_SOFT is. So since you prefer 
to have separate calls for each of the hart states, the idea is:

sbi_hart_suspend(cpumask) -> send an IPI_SUSPEND to cpus on cpumask
sbi_hart_resume(cpumask) -> send an IPI_RESUME to cpus on cpumask
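
A rough sketch of how the two calls above could walk a hart mask, 
modelled in plain C with an in-memory state table. The names 
sbi_hart_suspend and sbi_hart_resume come from the proposal; the 
uint64_t mask type, the return value (number of harts whose state 
changed), the hart_state table and the set_state helper are all 
assumptions made for illustration, not the interface of any SBI 
specification. The firmware-internal IPI is reduced here to a state 
change.

```c
#include <assert.h>
#include <stdint.h>

#define NR_HARTS 8

enum hart_state { HART_RUNNING, HART_SUSPENDED };

/* Hypothetical firmware-side view of each hart; all harts start running. */
static enum hart_state hart_state[NR_HARTS];

/*
 * Move every hart selected by hart_mask from state 'from' to state 'to'.
 * Harts that are not in 'from' are skipped, so resuming a running hart
 * (or suspending a suspended one) is a no-op.
 * Returns the number of harts whose state changed.
 */
static unsigned int set_state(uint64_t hart_mask, enum hart_state from,
			      enum hart_state to)
{
	unsigned int changed = 0;

	for (unsigned int hart = 0; hart < NR_HARTS; hart++) {
		if (!(hart_mask & (UINT64_C(1) << hart)))
			continue;
		if (hart_state[hart] != from)
			continue;
		/* Real firmware would send its internal IPI here. */
		hart_state[hart] = to;
		changed++;
	}
	return changed;
}

/* Stand-in for "send an IPI_SUSPEND to the harts in the mask". */
static unsigned int sbi_hart_suspend(uint64_t hart_mask)
{
	return set_state(hart_mask, HART_RUNNING, HART_SUSPENDED);
}

/* Stand-in for "send an IPI_RESUME to the harts in the mask". */
static unsigned int sbi_hart_resume(uint64_t hart_mask)
{
	return set_state(hart_mask, HART_SUSPENDED, HART_RUNNING);
}
```

With this shape the IPI codes never appear in the caller-visible 
interface: the OS side only passes a mask, and how the firmware reaches 
each hart stays an implementation detail.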

Regards,
Nick


* Re: Thoughts on kexec / SBI
  2019-03-24 22:17   ` Nick Kossifidis
@ 2019-03-26  1:58     ` Palmer Dabbelt
  2019-03-26  3:24       ` Nick Kossifidis
  0 siblings, 1 reply; 5+ messages in thread
From: Palmer Dabbelt @ 2019-03-26  1:58 UTC (permalink / raw)
  To: mick; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley

On Sun, 24 Mar 2019 15:17:10 PDT (-0700), mick@ics.forth.gr wrote:
> My proposal is about the new SBI API/extension. As I mentioned, the IPI
> calls I'm talking about need to be implemented on the firmware side;
> they won't be handled by the OS the way IPI_SOFT is. So since you prefer
> to have separate calls for each of the hart states, the idea is:
>
> sbi_hart_suspend(cpumask) -> send an IPI_SUSPEND to cpus on cpumask
> sbi_hart_resume(cpumask) -> send an IPI_RESUME to cpus on cpumask

The IPI types are currently not exposed between Linux and the firmware and I'd 
like to keep it that way.  Let's just prototype the first version of the hart 
power management extension instead.  Implementing them in the firmware via IPIs 
is the only valid mechanism on our existing implementation, but the codes can 
stay within the firmware.


* Re: Thoughts on kexec / SBI
  2019-03-26  1:58     ` Palmer Dabbelt
@ 2019-03-26  3:24       ` Nick Kossifidis
  2019-03-26  6:57         ` Palmer Dabbelt
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Kossifidis @ 2019-03-26  3:24 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley

On 2019-03-26 03:58, Palmer Dabbelt wrote:
> The IPI types are currently not exposed between Linux and the firmware
> and I'd like to keep it that way.  Let's just prototype the first
> version of the hart power management extension instead.  Implementing
> them in the firmware via IPIs is the only valid mechanism on our
> existing implementation, but the codes can stay within the firmware.

I had OpenSBI in mind when I mentioned the IPI_* codes. I totally agree 
with you that these shouldn't be exposed to / handled by the OS.


* Re: Thoughts on kexec / SBI
  2019-03-26  3:24       ` Nick Kossifidis
@ 2019-03-26  6:57         ` Palmer Dabbelt
  0 siblings, 0 replies; 5+ messages in thread
From: Palmer Dabbelt @ 2019-03-26  6:57 UTC (permalink / raw)
  To: mick; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley

On Mon, 25 Mar 2019 20:24:34 PDT (-0700), mick@ics.forth.gr wrote:
> I had OpenSBI in mind when I mentioned the IPI_* codes. I totally agree
> with you that these shouldn't be exposed to / handled by the OS.

OK, cool, I think we're all on the same page.
