* Thoughts on kexec / SBI
From: Nick Kossifidis @ 2019-03-23 18:05 UTC
To: linux-riscv; +Cc: atish.patra, Anup.Patel, palmer
Hello all,

I'm working on implementing kexec on RISC-V (just kexec for now, not
kdump yet), since we want to be able to test new kernel images easily on
our testbeds. It's part of a larger project where I'm trying to have a
unified way of testing the various Linux-capable RISC-V targets that we
have in the lab (https://github.com/mickflemm/yarvt).

The issue is that we don't have a way of stopping the secondary harts in
a recoverable way, so at this point I have two options. Either I call
smp_send_stop() from machine_shutdown() and come back from kexec with a
single hart running (which should be fine for kdump, btw), or I keep a
reserved memory region holding some code for the secondary harts to
execute, which the boot hart patches once the new kernel is in place so
that they jump to the new kernel. The second option is needlessly
complicated, and it also reduces the flexibility of the process since we
can't use the whole RAM for the loaded kimage. It will also be obsolete
once we have proper handling of CPU suspend / per-hart reset through
SBI. So I'm going with the first option until then.

I'd like to jump-start the discussion on how we can handle things
through SBI. My initial thought was this:

a) Have a new IPI code, IPI_SUSPEND, handled by the firmware (whose
code/data persists across kexecs), that puts the hart in a wfi loop,
checking this hart's scratch buffer for a variable that contains the new
virtual address to jump to (where the new kernel is located).
Alternatively, this call may do power management and completely shut
down the CPU, if the hardware supports it.

b) Have another IPI code, IPI_RESUME, that wakes up the harts by
providing them with the virtual address to jump to. This can be a simple
write of the new address to the remote hart's scratch buffer plus an
interrupt, or it can use power management to power up the hart and
either provide it with a new reset vector or do some handling in the
BootROM code for that (this, btw, is also an open discussion in the TEE
group, where we are discussing Secure Boot). If the hart is already
running (no IPI_SUSPEND has been sent to it before, e.g. during boot),
the firmware will ignore the event. Alternatively, the firmware may
issue IPI_SUSPEND during boot, so that the boot process happens with a
single hart and the other harts wait for the OS to wake them up.

c) During machine_shutdown() / machine_crash_shutdown(), we issue
IPI_SUSPEND to all other harts, and in setup_smp() we issue an
IPI_RESUME to wake them up, just in case.

This way we'll also be able to announce ARCH_SUSPEND_POSSIBLE and
implement CPU hot-plugging (for PM_SLEEP_SMP) even on platforms that
don't support power management in hardware.
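
To make (a)-(c) a bit more concrete, here is a minimal user-space model
of the firmware side, in C. It's only a sketch under assumptions: the
names (fw_ipi, FW_IPI_SUSPEND, FW_IPI_RESUME, hart_scratch,
fw_handle_ipi) and NR_HARTS are made up for illustration and are not
existing OpenSBI or kernel symbols, and the wfi loop and the actual jump
are reduced to state changes on the per-hart scratch area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS 4  /* hypothetical hart count for the model */

/* Hypothetical firmware-internal IPI codes from points a) and b). */
enum fw_ipi { FW_IPI_SUSPEND, FW_IPI_RESUME };

/*
 * Hypothetical per-hart scratch area. It lives in firmware memory,
 * so it would survive a kexec.
 */
struct hart_scratch {
    bool      parked;       /* hart is sitting in the wfi loop */
    uintptr_t resume_addr;  /* 0 means "no address provided yet" */
};

static struct hart_scratch scratch[NR_HARTS];

/*
 * Model of the firmware handling one IPI on behalf of one hart.
 * Returns true if the event changed the hart's state, false if it
 * was ignored.
 */
static bool fw_handle_ipi(unsigned int hart, enum fw_ipi code, uintptr_t addr)
{
    struct hart_scratch *s;

    assert(hart < NR_HARTS);
    s = &scratch[hart];

    switch (code) {
    case FW_IPI_SUSPEND:
        if (s->parked)
            return false;       /* already parked, nothing to do */
        s->resume_addr = 0;
        s->parked = true;       /* real firmware would loop on wfi here */
        return true;
    case FW_IPI_RESUME:
        if (!s->parked || addr == 0)
            return false;       /* hart already running: ignore the event */
        s->resume_addr = addr;
        s->parked = false;      /* real firmware would jump to addr here */
        return true;
    }
    return false;
}
```

In this model, an FW_IPI_RESUME sent to a hart that was never suspended
returns false and leaves its scratch area untouched, which is the
"firmware will ignore the event" case from (b). A suspend followed by a
resume with a non-zero address leaves that address in resume_addr.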

What do you think?

Regards,
Nick
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: Thoughts on kexec / SBI
From: Nick Kossifidis @ 2019-03-24 22:17 UTC
To: Atish Patra; +Cc: Nick Kossifidis, Anup Patel, linux-riscv, palmer
Hello Atish,
On 2019-03-24 00:31, Atish Patra wrote:
> Hi Nick,
>
> On Mar 23, 2019, at 11:05 AM, Nick Kossifidis <mick@ics.forth.gr>
> wrote:
>
>> Hello all,
>>
>> I'm working on implementing kexec on RISC-V (just kexec for now, not
>> kdump yet) since we want to be able to test new kernel images easily
>> on our testbeds. It's part of a larger project where I'm trying to
>> have a unified way of testing various linux-capable RISC-V targets
>> (https://github.com/mickflemm/yarvt) that we have on the lab.
>
> Great. This repo will be very helpful to beginners. I had a quick look
> at the sifive-fu540/commands.sh.
> objcopy step is redundant for OpenSBI as it already generates the
> fw_payload.bin as well.
>
I hope so, thanks for the update. I prefer it this way since it shows
how the .bin is generated.
>> The issue is that we don't have a way of stopping the secondary
>> harts in a recoverable way, so at this point I have two options,
>> either I'm going to call smp_send_stop() on machine_shutdown() and
>> come back from kexec with a single hart running (which should be
>> fine for kdump btw), or I need to have a reserved memory space where
>> I'll have to keep some code to be executed by the secondary harts,
>> that will be patched by the boot hart once the new kernel is in
>> place, to let them execute the new kernel. The second option is too
>> complicated for no reason and it also reduces the flexibility of the
>> process since we can't use the whole RAM for the loaded kimage, plus
>> it will be obsolete when we have proper handling of CPU suspend /
>> per-hart reset, through SBI. So I'm going for the first option until
>> then.
>>
>> I'd like to jump-start the discussion on how we can handle things
>> through SBI, my initial thought was this:
>>
>> a) Have a new IPI code IPI_SUSPEND, handled by the firmware (its
>> code/data is persistent across kexecs), that puts the hart on a wfi
>> loop, checking for a variable that contains the new virtual address
>> to jump to (where the new kernel is located) on this hart's scratch
>> buffer. Alternatively this call may do power management and
>> completely shut down the CPU, if the hardware supports this.
>>
>> b) Have another IPI code IPI_RESUME, that wakes up the harts by
>> providing them with the virtual address to jump to. This can be a
>> simple write of the new address on the remote hart's scratch buffer
>> + interrupt, or it can use power management to power up the hart and
>> either provide it with a new reset vector, or do some handling on
>> the BootROM code for that (this btw is also an open discussion on
>> the TEE group where we are discussing Secure Boot). Also in case the
>> hart is already running (an IPI_SUSPEND hasn't been sent on it
>> before, e.g. during boot), the firmware will ignore the event,
>> alternatively we may call IPI_SUSPEND on the firmware during boot so
>> that the boot process happens with a single hart and other harts
>> wait for the OS to wake them up.
>
> Looks good to me. I had a similar IPI approach (IPI WAKEUP) for my CPU
> hotplug implementation, except for the resume address requirement.
> https://lkml.org/lkml/2018/9/6/214
>
> The only issue is that all these IPIs will be redundant once the hart
> power management SBI extension (a much cleaner interface) is
> available. My hotplug patch is on hold for that reason as well.
>
>
> Regards,
> Atish
>
My proposal is about the new SBI API/extension. As I mentioned, the IPI
calls I'm talking about need to be implemented on the firmware side;
they won't be handled by the OS like IPI_SOFT. So, since you prefer to
have separate calls for each of the hart states, the idea is:

sbi_hart_suspend(cpumask) -> send an IPI_SUSPEND to the cpus in cpumask
sbi_hart_resume(cpumask) -> send an IPI_RESUME to the cpus in cpumask
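
Roughly, as a user-space model in C (a sketch only, not a proposed
implementation). The assumptions: a plain unsigned long stands in for
the cpumask, the calls return the number of harts signalled (the
proposal above doesn't define a return value), and fw_ipi, last_ipi,
fw_send_ipi and fw_send_ipi_mask are hypothetical firmware-internal
names. The point it illustrates is that the IPI codes stay private to
the firmware and the OS only ever sees the two calls.

```c
#include <assert.h>

#define NR_HARTS 8  /* hypothetical hart count for the model */

/* Firmware-internal IPI codes: never visible to the OS. */
enum fw_ipi { FW_IPI_NONE, FW_IPI_SUSPEND, FW_IPI_RESUME };

/* Last IPI code delivered to each hart, kept so the model can be inspected. */
static enum fw_ipi last_ipi[NR_HARTS];

/* Stand-in for the firmware's real IPI delivery mechanism. */
static void fw_send_ipi(unsigned int hart, enum fw_ipi code)
{
    assert(hart < NR_HARTS);
    last_ipi[hart] = code;
}

/*
 * Deliver 'code' to every hart whose bit is set in hart_mask.
 * Bits at or above NR_HARTS are ignored.
 * Returns the number of harts signalled.
 */
static unsigned int fw_send_ipi_mask(unsigned long hart_mask, enum fw_ipi code)
{
    unsigned int hart, sent = 0;

    for (hart = 0; hart < NR_HARTS; hart++) {
        if (hart_mask & (1UL << hart)) {
            fw_send_ipi(hart, code);
            sent++;
        }
    }
    return sent;
}

/* The only two entry points the OS would see. */
unsigned int sbi_hart_suspend(unsigned long hart_mask)
{
    return fw_send_ipi_mask(hart_mask, FW_IPI_SUSPEND);
}

unsigned int sbi_hart_resume(unsigned long hart_mask)
{
    return fw_send_ipi_mask(hart_mask, FW_IPI_RESUME);
}
```

For example, sbi_hart_suspend(0x6) would signal harts 1 and 2 and leave
hart 0 alone, and a later sbi_hart_resume(0x2) would signal only hart 1.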
Regards,
Nick
* Re: Thoughts on kexec / SBI
From: Palmer Dabbelt @ 2019-03-26 1:58 UTC
To: mick; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley
On Sun, 24 Mar 2019 15:17:10 PDT (-0700), mick@ics.forth.gr wrote:
> My proposal is about the new SBI API/extension, as I mention these IPI
> calls i'm talking about need to be implemented on the firmware side,
> they won't be handled by the OS like IPI_SOFT. So since you prefer to
> have separate calls for each of the hart states the idea is:
>
> sbi_hart_suspend(cpumask) -> send an IPI_SUSPEND to cpus on cpumask
> sbi_hart_resume(cpumask) -> send an IPI_RESUME to cpus on cpumask
The IPI types are currently not exposed between Linux and the firmware and I'd
like to keep it that way. Let's just prototype the first version of the hart
power management extension instead. Implementing them in the firmware via IPIs
is the only valid mechanism on our existing implementation, but the codes can
stay within the firmware.
* Re: Thoughts on kexec / SBI
From: Nick Kossifidis @ 2019-03-26 3:24 UTC
To: Palmer Dabbelt; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley
On 2019-03-26 03:58, Palmer Dabbelt wrote:
> The IPI types are currently not exposed between Linux and the firmware
> and I'd like to keep it that way. Let's just prototype the first
> version of the hart power management extension instead. Implementing
> them in the firmware via IPIs is the only valid mechanism on our
> existing implementation, but the codes can stay within the firmware.
I had OpenSBI in mind when I mentioned the IPI_* codes; I totally agree
with you that these shouldn't be exposed to / handled by the OS.
* Re: Thoughts on kexec / SBI
From: Palmer Dabbelt @ 2019-03-26 6:57 UTC
To: mick; +Cc: mick, Atish Patra, linux-riscv, Anup Patel, Paul Walmsley
On Mon, 25 Mar 2019 20:24:34 PDT (-0700), mick@ics.forth.gr wrote:
> I had OpenSBI in mind when I mentioned the IPI_* codes, I totally agree
> with you these shouldn't be exposed / handled by the OS.
OK, cool, I think we're all on the same page.