From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: David Gibson <david@gibson.dropbear.id.au>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Fri, 18 Jun 2021 13:19:33 +1000 [thread overview]
Message-ID: <5082f436-1e36-05f7-5b7a-0b190351ffbb@ozlabs.ru> (raw)
In-Reply-To: <48f4473-3f44-4b6-7bf2-a8381fa8268e@eik.bme.hu>
On 6/17/21 21:29, BALATON Zoltan wrote:
> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>> On 17/06/2021 19:16, BALATON Zoltan wrote:
>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>> On 16/06/2021 20:34, BALATON Zoltan wrote:
>>>>> On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>> On 6/15/21 20:29, BALATON Zoltan wrote:
>>>>>>> On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>>>> a combination of a hypervisor and firmware. The features it
>>>>>>>> specifies
>>>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>>>
>>>>>>>> Since the beginning, the runtime component of the firmware
>>>>>>>> (RTAS) has
>>>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>>>> a hypercall implemented in qemu. The boot time firmware
>>>>>>>> component is
>>>>>>>> SLOF - but a build that's specific to qemu, and has always
>>>>>>>> needed to be
>>>>>>>> updated in sync with it. Even though we've managed to limit the
>>>>>>>> amount
>>>>>>>> of runtime communication we need between qemu and SLOF, there's
>>>>>>>> some,
>>>>>>>> and it has become increasingly awkward to handle as we've
>>>>>>>> implemented
>>>>>>>> new features.
>>>>>>>>
>>>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>>>> enabled by a new "x-vof" pseries machine option (stands for
>>>>>>>> "Virtual Open
>>>>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT
>>>>>>>> hcall
>>>>>>>> which implements Open Firmware Client Interface (OF CI). This
>>>>>>>> allows
>>>>>>>> using a smaller stateless firmware which does not have to manage
>>>>>>>> the device tree.
>>>>>>>>
>>>>>>>> The new "vof.bin" firmware image is included with source code under
>>>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>>>
>>>>>>>> This implements a handful of CI methods just to get -kernel/-initrd
>>>>>>>> working. In particular, this implements the device tree fetching
>>>>>>>> and
>>>>>>>> simple memory allocator - "claim" (an OF CI memory allocator)
>>>>>>>> and updates
>>>>>>>> "/memory@0/available" to report the client about available memory.
>>>>>>>>
>>>>>>>> This implements changing some device tree properties which we
>>>>>>>> know how
>>>>>>>> to deal with, the rest is ignored. To allow changes, this skips
>>>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some
>>>>>>>> room for
>>>>>>>> appending.
>>>>>>>>
>>>>>>>> In absence of SLOF, this assigns phandles to device tree nodes
>>>>>>>> to make
>>>>>>>> device tree traversing work.
>>>>>>>>
>>>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a
>>>>>>>> tree.
>>>>>>>>
>>>>>>>> This adds basic instances support which are managed by a hash map
>>>>>>>> ihandle -> [phandle].
>>>>>>>>
>>>>>>>> Before the guest started, the used memory is:
>>>>>>>> 0..e60 - the initial firmware
>>>>>>>> 8000..10000 - stack
>>>>>>>> 400000.. - kernel
>>>>>>>> 3ea0000.. - initramdisk
>>>>>>>>
>>>>>>>> This OF CI does not implement "interpret".
>>>>>>>>
>>>>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead,
>>>>>>>> this
>>>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>>>
>>>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>>>> However this is just enough for the petitboot kernel and
>>>>>>>> initradmdisk to
>>>>>>>> boot from any possible source. Note this requires reasonably
>>>>>>>> recent guest
>>>>>>>> kernel with:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>>>>>>>> The immediate benefit is much faster booting time which especially
>>>>>>>> crucial with fully emulated early CPU bring up environments.
>>>>>>>> Also this
>>>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day.
>>>>>>>>
>>>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be
>>>>>>>> reused by
>>>>>>>> other POWERPC boards which do not support pSeries.
>>>>>>>>
>>>>>>>> This make VOF optional, it is disabled by default, add --enable-vof
>>>>>>>> to ./configure to enable it.
>>>>>>>>
>>>>>>>> This assumes potential support for booting from QEMU backends
>>>>>>>> such as blockdev or netdev without devices/drivers used.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> The example command line is:
>>>>>>>>
>>>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>>>>>>>> -nodefaults \
>>>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>>>> -nographic \
>>>>>>>> -vga none \
>>>>>>>> -enable-kvm \
>>>>>>>> -m 8G \
>>>>>>>> -machine
>>>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
>>>>>>>> \
>>>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>>>> -drive
>>>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw
>>>>>>>> \
>>>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>>>> -snapshot \
>>>>>>>> -smp 8,threads=8 \
>>>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>>>> -trace events=qemu_trace_events \
>>>>>>>> -d guest_errors \
>>>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>>
>>>>>>> I haven't looked at it in detail yet, just some quick comments I
>>>>>>> have on first skim through.
>>>>>>>
>>>>>>>> ---
>>>>>>>> Changes:
>>>>>>>> v21:
>>>>>>>> * s/ld/ldz/ in entry.S
>>>>>>>
>>>>>>> Typo? Has this become lwz?
>>>>>>
>>>>>> Yup, lwz.
>>>>>>
>>>>>>>
>>>>>>>> * moved CONFIG_VOF from
>>>>>>>> default-configs/devices/ppc64-softmmu.mak to Kconfig
>>>>>>>> * made CONFIG_VOF optional
>>>>>>>
>>>>>>> This won't work for pegasos2, see below.
>>>>>>>
>>>>>>>> * s/l.lds/vof.lds/
>>>>>>>> * force 32 BE in spapr_machine_reset() instead of the firmware
>>>>>>>> * added checks for non-null methods of VofMachineIfClass
>>>>>>>> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a
>>>>>>>> better comment
>>>>>>>> * added path_offset wrapper for handling mixed case for addresses
>>>>>>>> after "@" in node names
>>>>>>>> * changed getprop() to check for actual "name" property in the fdt
>>>>>>>> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as
>>>>>>>> (unlike similar
>>>>>>>> rtas_ld/ldl_be_*) they return error codes
>>>>>>>> * VOF_MEM_READ uses now address_space_read (it was
>>>>>>>> address_space_read_full
>>>>>>>> before, not sure why)
>>>>>>> [...]
>>>>>>>> ---
>>>>>>>> configure | 9 +
>>>>>>>> pc-bios/vof/Makefile | 23 +
>>>>>>>> include/hw/ppc/spapr.h | 25 +-
>>>>>>>> include/hw/ppc/vof.h | 55 ++
>>>>>>>> pc-bios/vof/vof.h | 43 ++
>>>>>>>> hw/ppc/spapr.c | 87 +++-
>>>>>>>> hw/ppc/spapr_hcall.c | 29 +-
>>>>>>>> hw/ppc/spapr_vof.c | 153 ++++++
>>>>>>>> hw/ppc/vof.c | 1052
>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>> pc-bios/vof/bootmem.c | 14 +
>>>>>>>> pc-bios/vof/ci.c | 91 ++++
>>>>>>>> pc-bios/vof/libc.c | 92 ++++
>>>>>>>> pc-bios/vof/main.c | 21 +
>>>>>>>> tests/qtest/rtas-test.c | 17 +-
>>>>>>>> MAINTAINERS | 12 +
>>>>>>>> hw/ppc/Kconfig | 3 +
>>>>>>>> hw/ppc/meson.build | 3 +
>>>>>>>> hw/ppc/trace-events | 24 +
>>>>>>>> meson.build | 1 +
>>>>>>>> pc-bios/README | 2 +
>>>>>>>> pc-bios/vof-nvram.bin | Bin 0 -> 16384 bytes
>>>>>>>> pc-bios/vof.bin | Bin 0 -> 3784 bytes
>>>>>>>> pc-bios/vof/entry.S | 49 ++
>>>>>>>> pc-bios/vof/vof.lds | 48 ++
>>>>>>>> 24 files changed, 1840 insertions(+), 13 deletions(-)
>>>>>>>> create mode 100644 pc-bios/vof/Makefile
>>>>>>>> create mode 100644 include/hw/ppc/vof.h
>>>>>>>> create mode 100644 pc-bios/vof/vof.h
>>>>>>>> create mode 100644 hw/ppc/spapr_vof.c
>>>>>>>> create mode 100644 hw/ppc/vof.c
>>>>>>>> create mode 100644 pc-bios/vof/bootmem.c
>>>>>>>> create mode 100644 pc-bios/vof/ci.c
>>>>>>>> create mode 100644 pc-bios/vof/libc.c
>>>>>>>> create mode 100644 pc-bios/vof/main.c
>>>>>>>> create mode 100644 pc-bios/vof-nvram.bin
>>>>>>>> create mode 100755 pc-bios/vof.bin
>>>>>>>> create mode 100644 pc-bios/vof/entry.S
>>>>>>>> create mode 100644 pc-bios/vof/vof.lds
>>>>>>>>
>>> [...]
>>>>>>>> diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..65ca2fed0d41
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/include/hw/ppc/vof.h
>>>>>>>> @@ -0,0 +1,55 @@
>>>>>>>> +/*
>>>>>>>> + * Virtual Open Firmware
>>>>>>>> + *
>>>>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>>>>>> + */
>>>>>>>> +#ifndef HW_VOF_H
>>>>>>>> +#define HW_VOF_H
>>>>>>>> +
>>>>>>>> +typedef struct Vof {
>>>>>>>> + uint64_t top_addr; /* copied from rma_size */
>>>>>>>> + GArray *claimed; /* array of SpaprOfClaimed */
>>>>>>>> + uint64_t claimed_base;
>>>>>>>> + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>>>>>>>> + uint32_t of_instance_last;
>>>>>>>> + char *bootargs;
>>>>>>>> + long fw_size;
>>>>>>>> +} Vof;
>>>>>>>> +
>>>>>>>> +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
>>>>>>>> + target_ulong args_real);
>>>>>>>> +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size,
>>>>>>>> uint64_t align);
>>>>>>>> +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
>>>>>>>> +void vof_cleanup(Vof *vof);
>>>>>>>> +void vof_build_dt(void *fdt, Vof *vof);
>>>>>>>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const char
>>>>>>>> *nodename,
>>>>>>>> + const char *prop, const char
>>>>>>>> *path);
>>>>>>>> +
>>>>>>>> +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
>>>>>>>> +
>>>>>>>> +typedef struct VofMachineIfClass VofMachineIfClass;
>>>>>>>> +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE,
>>>>>>>> TYPE_VOF_MACHINE_IF)
>>>>>>>> +
>>>>>>>> +struct VofMachineIfClass {
>>>>>>>> + InterfaceClass parent;
>>>>>>>> + target_ulong (*client_architecture_support)(MachineState
>>>>>>>> *ms, CPUState *cs,
>>>>>>>> + target_ulong vec);
>>>>>>>> + void (*quiesce)(MachineState *ms);
>>>>>>>> + bool (*setprop)(MachineState *ms, const char *path, const
>>>>>>>> char *propname,
>>>>>>>> + void *val, int vallen);
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * Initial stack size is from
>>>>>>>> + *
>>>>>>>> https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html
>>>>>>>> + */
>>>>>>>> +#define VOF_STACK_SIZE 0x8000
>>>>>>>
>>>>>>> Maybe also add a define for RTAS_SIZE here? We'll need to put
>>>>>>> that in the device tree but it depends on the rtas shim size
>>>>>>> that's part of VOF so it should be defined here instead of
>>>>>>> hardcoding it in boards that use VOF so it can be updated later
>>>>>>> at one place if needed.
>>>>>>
>>>>>> This is rtas-size for pseries:
>>>>>>
>>>>>> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
>>>>>> ms->smp.max_cpus * sizeof(uint64_t)*2 + sizeof(uint64_t)));
>>>>>>
>>>>>> => depends on cpus => depends on the command line.
>>>>>>
>>>>>>
>>>>>> RTAS_SIZE is not used by anything in pseries anymore, I'll send a
>>>>>> patch to ditch it.
>>>>>
>>>>> I mean you need to have at least the size of code in
>>>>> pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined but
>>>>> that value is not available in QEMU where one needs to add it to
>>>>> the device tree. So a define for that should be here in vof.h.
>>>>> Currently I've counted instructions and have
>>>>>
>>>>> qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
>>>>>
>>>>> in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead that
>>>>> you define corresponding to hv_rtas_size. You'll probably need the
>>>>> same even after changing above rtas size calculation in spapr
>>>>> because client has to allocate memory for instantiate-rtas.
>>>>
>>>>
>>>> Ah fair point. I do not like "20" here and I think the right thing
>>>> will be adding whatever number of bytes to rtas-size in the firmware
>>>> itself and update it in QEMU via "setprop" as we do for
>>>> "linux,rtas-base". And then do the same in SLOF.
>>>
>>> This is not the base address but the size of the shim with the
>>> hypercall that instantiate-rtas copies. Why does it need to be updated?
>>
>> The vm kernel allocates the space for it.
>>
>>> And why does it need to be more bytes than necessary?
>>
>> What is necessary? It is definitely way more than 20 bytes.
>
> I thought instantiate-rtas only copies the hv_rtas routine as the
> comment in qemu/pc-bios/vof/entry.S says
It does only copy the code, correct.
> and that routine is 20 bytes.
There is no "#define XXX 20" anywhere though. QEMU does not know and
does not need to know that it is 20, it does not manage the RTAS blob.
> What else is needed? If that's not enough then we even more need a
> define for it as boards using VOF have no idea otherwise.
>
>>> I don't know what you do for spapr and why do you need larger
>>> rtas-size than this but for pegasos2 this /rtas/rtas-size property is
>>> only used by guests to allocate memory for rtas so all I need is how
>>> many bytes are needed for hv_rtas in pc-bios/vof/entry.S which is
>>> what should be #defined in vof.h. I've found 20 is just enough so you
>>> could add that to vof.h.
>>
>> I am thinking now that may be the property should be created by
>> vof.bin and not QEMU, QEMU just has to tell how many bytes on top it
>> needs.
>
> Maybe. If it's always in /rtas/rtas-size on every OF implementation (if
> that path is kind of standard for rtas) then that could also work or you
> could have an vof_init_rtas() function or similar that sets this, maybe
> pass it "/rtas" as path argument or even the whole property path
> ("/rtas/rtas-size") to avoid hard coding it and let the board tell it
> where it expects this property, then the value can be set by this
> function so that's within VOF then. But I think just adding a define for
> it in vof.h is enough and simple. Then boards can add whatever they need
> and put that in the property where they like.
My idea is that boards like pegasos put a zero in such property and VOF
then adjusts it to whatever it is + 20.
--
Alexey
next prev parent reply other threads:[~2021-06-18 3:20 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-15 7:06 [PATCH qemu v21] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-06-15 10:29 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:34 ` BALATON Zoltan
2021-06-17 2:23 ` Alexey Kardashevskiy
2021-06-17 9:16 ` BALATON Zoltan
2021-06-17 10:28 ` Alexey Kardashevskiy
2021-06-17 11:29 ` BALATON Zoltan
2021-06-18 3:19 ` Alexey Kardashevskiy [this message]
2021-06-18 10:13 ` BALATON Zoltan
2021-06-22 7:49 ` Alexey Kardashevskiy
2021-06-19 9:28 ` David Gibson
2021-06-15 21:09 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:26 ` BALATON Zoltan
2021-06-17 2:40 ` Alexey Kardashevskiy
2021-06-17 9:21 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5082f436-1e36-05f7-5b7a-0b190351ffbb@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=balaton@eik.bme.hu \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).