qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Fri, 18 Jun 2021 13:19:33 +1000	[thread overview]
Message-ID: <5082f436-1e36-05f7-5b7a-0b190351ffbb@ozlabs.ru> (raw)
In-Reply-To: <48f4473-3f44-4b6-7bf2-a8381fa8268e@eik.bme.hu>



On 6/17/21 21:29, BALATON Zoltan wrote:
> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>> On 17/06/2021 19:16, BALATON Zoltan wrote:
>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>> On 16/06/2021 20:34, BALATON Zoltan wrote:
>>>>> On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>> On 6/15/21 20:29, BALATON Zoltan wrote:
>>>>>>> On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>>>> a combination of a hypervisor and firmware. The features it 
>>>>>>>> specifies
>>>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>>>
>>>>>>>> Since the beginning, the runtime component of the firmware 
>>>>>>>> (RTAS) has
>>>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>>>> a hypercall implemented in qemu. The boot time firmware 
>>>>>>>> component is
>>>>>>>> SLOF - but a build that's specific to qemu, and has always 
>>>>>>>> needed to be
>>>>>>>> updated in sync with it. Even though we've managed to limit the 
>>>>>>>> amount
>>>>>>>> of runtime communication we need between qemu and SLOF, there's 
>>>>>>>> some,
>>>>>>>> and it has become increasingly awkward to handle as we've 
>>>>>>>> implemented
>>>>>>>> new features.
>>>>>>>>
>>>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>>>> enabled by a new "x-vof" pseries machine option (stands for 
>>>>>>>> "Virtual Open
>>>>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT 
>>>>>>>> hcall
>>>>>>>> which implements Open Firmware Client Interface (OF CI). This 
>>>>>>>> allows
>>>>>>>> using a smaller stateless firmware which does not have to manage
>>>>>>>> the device tree.
>>>>>>>>
>>>>>>>> The new "vof.bin" firmware image is included with source code under
>>>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>>>
>>>>>>>> This implements a handful of CI methods just to get -kernel/-initrd
>>>>>>>> working. In particular, this implements the device tree fetching 
>>>>>>>> and
>>>>>>>> simple memory allocator - "claim" (an OF CI memory allocator) 
>>>>>>>> and updates
>>>>>>>> "/memory@0/available" to report the client about available memory.
>>>>>>>>
>>>>>>>> This implements changing some device tree properties which we 
>>>>>>>> know how
>>>>>>>> to deal with, the rest is ignored. To allow changes, this skips
>>>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some 
>>>>>>>> room for
>>>>>>>> appending.
>>>>>>>>
>>>>>>>> In absence of SLOF, this assigns phandles to device tree nodes 
>>>>>>>> to make
>>>>>>>> device tree traversing work.
>>>>>>>>
>>>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a 
>>>>>>>> tree.
>>>>>>>>
>>>>>>>> This adds basic instances support which are managed by a hash map
>>>>>>>> ihandle -> [phandle].
>>>>>>>>
>>>>>>>> Before the guest started, the used memory is:
>>>>>>>> 0..e60 - the initial firmware
>>>>>>>> 8000..10000 - stack
>>>>>>>> 400000.. - kernel
>>>>>>>> 3ea0000.. - initramdisk
>>>>>>>>
>>>>>>>> This OF CI does not implement "interpret".
>>>>>>>>
>>>>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, 
>>>>>>>> this
>>>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>>>
>>>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>>>> However this is just enough for the petitboot kernel and 
>>>>>>>> initradmdisk to
>>>>>>>> boot from any possible source. Note this requires reasonably 
>>>>>>>> recent guest
>>>>>>>> kernel with:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 
>>>>>>>> The immediate benefit is much faster booting time which especially
>>>>>>>> crucial with fully emulated early CPU bring up environments. 
>>>>>>>> Also this
>>>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day.
>>>>>>>>
>>>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be 
>>>>>>>> reused by
>>>>>>>> other POWERPC boards which do not support pSeries.
>>>>>>>>
>>>>>>>> This make VOF optional, it is disabled by default, add --enable-vof
>>>>>>>> to ./configure to enable it.
>>>>>>>>
>>>>>>>> This assumes potential support for booting from QEMU backends
>>>>>>>> such as blockdev or netdev without devices/drivers used.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> The example command line is:
>>>>>>>>
>>>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>>>>>>>> -nodefaults \
>>>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>>>> -nographic \
>>>>>>>> -vga none \
>>>>>>>> -enable-kvm \
>>>>>>>> -m 8G \
>>>>>>>> -machine 
>>>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off 
>>>>>>>> \
>>>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>>>> -drive 
>>>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw 
>>>>>>>> \
>>>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>>>> -snapshot \
>>>>>>>> -smp 8,threads=8 \
>>>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>>>> -trace events=qemu_trace_events \
>>>>>>>> -d guest_errors \
>>>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>>
>>>>>>> I haven't looked at it in detail yet, just some quick comments I 
>>>>>>> have on first skim through.
>>>>>>>
>>>>>>>> ---
>>>>>>>> Changes:
>>>>>>>> v21:
>>>>>>>> * s/ld/ldz/ in entry.S
>>>>>>>
>>>>>>> Typo? Has this become lwz?
>>>>>>
>>>>>> Yup, lwz.
>>>>>>
>>>>>>>
>>>>>>>> * moved CONFIG_VOF from 
>>>>>>>> default-configs/devices/ppc64-softmmu.mak to Kconfig
>>>>>>>> * made CONFIG_VOF optional
>>>>>>>
>>>>>>> This won't work for pegasos2, see below.
>>>>>>>
>>>>>>>> * s/l.lds/vof.lds/
>>>>>>>> * force 32 BE in spapr_machine_reset() instead of the firmware
>>>>>>>> * added checks for non-null methods of VofMachineIfClass
>>>>>>>> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a 
>>>>>>>> better comment
>>>>>>>> * added  path_offset wrapper for handling mixed case for addresses
>>>>>>>> after "@" in node names
>>>>>>>> * changed getprop() to check for actual "name" property in the fdt
>>>>>>>> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as 
>>>>>>>> (unlike similar
>>>>>>>> rtas_ld/ldl_be_*) they return error codes
>>>>>>>> * VOF_MEM_READ uses now address_space_read (it was 
>>>>>>>> address_space_read_full
>>>>>>>> before, not sure why)
>>>>>>> [...]
>>>>>>>> ---
>>>>>>>> configure               |    9 +
>>>>>>>> pc-bios/vof/Makefile    |   23 +
>>>>>>>> include/hw/ppc/spapr.h  |   25 +-
>>>>>>>> include/hw/ppc/vof.h    |   55 ++
>>>>>>>> pc-bios/vof/vof.h       |   43 ++
>>>>>>>> hw/ppc/spapr.c          |   87 +++-
>>>>>>>> hw/ppc/spapr_hcall.c    |   29 +-
>>>>>>>> hw/ppc/spapr_vof.c      |  153 ++++++
>>>>>>>> hw/ppc/vof.c            | 1052 
>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>> pc-bios/vof/bootmem.c   |   14 +
>>>>>>>> pc-bios/vof/ci.c        |   91 ++++
>>>>>>>> pc-bios/vof/libc.c      |   92 ++++
>>>>>>>> pc-bios/vof/main.c      |   21 +
>>>>>>>> tests/qtest/rtas-test.c |   17 +-
>>>>>>>> MAINTAINERS             |   12 +
>>>>>>>> hw/ppc/Kconfig          |    3 +
>>>>>>>> hw/ppc/meson.build      |    3 +
>>>>>>>> hw/ppc/trace-events     |   24 +
>>>>>>>> meson.build             |    1 +
>>>>>>>> pc-bios/README          |    2 +
>>>>>>>> pc-bios/vof-nvram.bin   |  Bin 0 -> 16384 bytes
>>>>>>>> pc-bios/vof.bin         |  Bin 0 -> 3784 bytes
>>>>>>>> pc-bios/vof/entry.S     |   49 ++
>>>>>>>> pc-bios/vof/vof.lds     |   48 ++
>>>>>>>> 24 files changed, 1840 insertions(+), 13 deletions(-)
>>>>>>>> create mode 100644 pc-bios/vof/Makefile
>>>>>>>> create mode 100644 include/hw/ppc/vof.h
>>>>>>>> create mode 100644 pc-bios/vof/vof.h
>>>>>>>> create mode 100644 hw/ppc/spapr_vof.c
>>>>>>>> create mode 100644 hw/ppc/vof.c
>>>>>>>> create mode 100644 pc-bios/vof/bootmem.c
>>>>>>>> create mode 100644 pc-bios/vof/ci.c
>>>>>>>> create mode 100644 pc-bios/vof/libc.c
>>>>>>>> create mode 100644 pc-bios/vof/main.c
>>>>>>>> create mode 100644 pc-bios/vof-nvram.bin
>>>>>>>> create mode 100755 pc-bios/vof.bin
>>>>>>>> create mode 100644 pc-bios/vof/entry.S
>>>>>>>> create mode 100644 pc-bios/vof/vof.lds
>>>>>>>>
>>> [...]
>>>>>>>> diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..65ca2fed0d41
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/include/hw/ppc/vof.h
>>>>>>>> @@ -0,0 +1,55 @@
>>>>>>>> +/*
>>>>>>>> + * Virtual Open Firmware
>>>>>>>> + *
>>>>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>>>>>> + */
>>>>>>>> +#ifndef HW_VOF_H
>>>>>>>> +#define HW_VOF_H
>>>>>>>> +
>>>>>>>> +typedef struct Vof {
>>>>>>>> +    uint64_t top_addr; /* copied from rma_size */
>>>>>>>> +    GArray *claimed; /* array of SpaprOfClaimed */
>>>>>>>> +    uint64_t claimed_base;
>>>>>>>> +    GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>>>>>>>> +    uint32_t of_instance_last;
>>>>>>>> +    char *bootargs;
>>>>>>>> +    long fw_size;
>>>>>>>> +} Vof;
>>>>>>>> +
>>>>>>>> +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
>>>>>>>> +                    target_ulong args_real);
>>>>>>>> +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size, 
>>>>>>>> uint64_t align);
>>>>>>>> +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
>>>>>>>> +void vof_cleanup(Vof *vof);
>>>>>>>> +void vof_build_dt(void *fdt, Vof *vof);
>>>>>>>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const char 
>>>>>>>> *nodename,
>>>>>>>> +                               const char *prop, const char 
>>>>>>>> *path);
>>>>>>>> +
>>>>>>>> +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
>>>>>>>> +
>>>>>>>> +typedef struct VofMachineIfClass VofMachineIfClass;
>>>>>>>> +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE, 
>>>>>>>> TYPE_VOF_MACHINE_IF)
>>>>>>>> +
>>>>>>>> +struct VofMachineIfClass {
>>>>>>>> +    InterfaceClass parent;
>>>>>>>> +    target_ulong (*client_architecture_support)(MachineState 
>>>>>>>> *ms, CPUState *cs,
>>>>>>>> +                                                target_ulong vec);
>>>>>>>> +    void (*quiesce)(MachineState *ms);
>>>>>>>> +    bool (*setprop)(MachineState *ms, const char *path, const 
>>>>>>>> char *propname,
>>>>>>>> +                    void *val, int vallen);
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * Initial stack size is from
>>>>>>>> + * 
>>>>>>>> https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html 
>>>>>>>> + */
>>>>>>>> +#define VOF_STACK_SIZE       0x8000
>>>>>>>
>>>>>>> Maybe also add a define for RTAS_SIZE here? We'll need to put 
>>>>>>> that in the device tree but it depends on the rtas shim size 
>>>>>>> that's part of VOF so it should be defined here instead of 
>>>>>>> hardcoding it in boards that use VOF so it can be updated later 
>>>>>>> at one place if needed.
>>>>>>
>>>>>> This is rtas-size for pseries:
>>>>>>
>>>>>> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
>>>>>>          ms->smp.max_cpus * sizeof(uint64_t)*2 + sizeof(uint64_t)));
>>>>>>
>>>>>> => depends on cpus => depends on the command line.
>>>>>>
>>>>>>
>>>>>> RTAS_SIZE is not used by anything in pseries anymore, I'll send a 
>>>>>> patch to ditch it.
>>>>>
>>>>> I mean you need to have at least the size of code in 
>>>>> pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined but 
>>>>> that value is not available in QEMU where one needs to add it to 
>>>>> the device tree. So a define for that should be here in vof.h. 
>>>>> Currently I've counted instructions and have
>>>>>
>>>>>      qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
>>>>>
>>>>> in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead that 
>>>>> you define corresponding to hv_rtas_size. You'll probably need the 
>>>>> same even after changing above rtas size calculation in spapr 
>>>>> because client has to allocate memory for instantiate-rtas.
>>>>
>>>>
>>>> Ah fair point. I do not like "20" here and I think the right thing 
>>>> will be adding whatever number of bytes to rtas-size in the firmware 
>>>> itself and update it in QEMU via "setprop" as we do for 
>>>> "linux,rtas-base". And then do the same in SLOF.
>>>
>>> This is not the base address but the size of the shim with the 
>>> hypercall that instantiate-rtas copies. Why does it need to be updated?
>>
>> The vm kernel allocates the space for it.
>>
>>> And why does it need to be more bytes than necessary?
>>
>> What is necessary? It is definitely way more than 20 bytes.
> 
> I thought instantiate-rtas only copies the hv_rtas routine as the 
> comment in qemu/pc-bios/vof/entry.S says

It does only copy the code, correct.

> and that routine is 20 bytes. 


There is no "#define XXX 20" anywhere though. QEMU does not know and 
does not need to know that it is 20, it does not manage the RTAS blob.


> What else is needed? If that's not enough then we even more need a 
> define for it as boards using VOF have no idea otherwise.
> 
>>> I don't know what you do for spapr and why do you need larger 
>>> rtas-size than this but for pegasos2 this /rtas/rtas-size property is 
>>> only used by guests to allocate memory for rtas so all I need is how 
>>> many bytes are needed for hv_rtas in pc-bios/vof/entry.S which is 
>>> what should be #defined in vof.h. I've found 20 is just enough so you 
>>> could add that to vof.h.
>>
>> I am thinking now that may be the property should be created by 
>> vof.bin and not QEMU, QEMU just has to tell how many bytes on top it 
>> needs.
> 
> Maybe. If it's always in /rtas/rtas-size on every OF implementation (if 
> that path is kind of standard for rtas) then that could also work or you 
> could have an vof_init_rtas() function or similar that sets this, maybe 
> pass it "/rtas" as path argument or even the whole property path 
> ("/rtas/rtas-size") to avoid hard coding it and let the board tell it 
> where it expects this property, then the value can be set by this 
> function so that's within VOF then. But I think just adding a define for 
> it in vof.h is enough and simple. Then boards can add whatever they need 
> and put that in the property where they like.


My idea is that boards like pegasos put a zero in such property and VOF 
then adjusts it to whatever it is + 20.


-- 
Alexey


  reply	other threads:[~2021-06-18  3:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15  7:06 [PATCH qemu v21] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-06-15 10:29 ` BALATON Zoltan
2021-06-16  6:49   ` Alexey Kardashevskiy
2021-06-16 10:34     ` BALATON Zoltan
2021-06-17  2:23       ` Alexey Kardashevskiy
2021-06-17  9:16         ` BALATON Zoltan
2021-06-17 10:28           ` Alexey Kardashevskiy
2021-06-17 11:29             ` BALATON Zoltan
2021-06-18  3:19               ` Alexey Kardashevskiy [this message]
2021-06-18 10:13                 ` BALATON Zoltan
2021-06-22  7:49                   ` Alexey Kardashevskiy
2021-06-19  9:28           ` David Gibson
2021-06-15 21:09 ` BALATON Zoltan
2021-06-16  6:49   ` Alexey Kardashevskiy
2021-06-16 10:26     ` BALATON Zoltan
2021-06-17  2:40       ` Alexey Kardashevskiy
2021-06-17  9:21         ` BALATON Zoltan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5082f436-1e36-05f7-5b7a-0b190351ffbb@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=balaton@eik.bme.hu \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).