From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v16] spapr: Implement Open Firmware client interface
Date: Thu, 25 Mar 2021 14:25:33 +1100 [thread overview]
Message-ID: <98565b10-debd-be0a-79f7-9f08737a49d1@ozlabs.ru> (raw)
In-Reply-To: <YFv69rtZd6yzKAtU@yekko.fritz.box>
On 25/03/2021 13:52, David Gibson wrote:
> On Tue, Mar 23, 2021 at 01:58:30PM +1100, Alexey Kardashevskiy wrote:
>> The PAPR platform which describes an OS environment that's presented by
>> a combination of a hypervisor and firmware. The features it specifies
>> require collaboration between the firmware and the hypervisor.
>>
>> Since the beginning, the runtime component of the firmware (RTAS) has
>> been implemented as a 20 byte shim which simply forwards it to
>> a hypercall implemented in qemu. The boot time firmware component is
>> SLOF - but a build that's specific to qemu, and has always needed to be
>> updated in sync with it. Even though we've managed to limit the amount
>> of runtime communication we need between qemu and SLOF, there's some,
>> and it has become increasingly awkward to handle as we've implemented
>> new features.
>>
>> This implements a boot time OF client interface (CI) which is
>> enabled by a new "x-vof" pseries machine option (stands for "Virtual Open
>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
>> which implements Open Firmware Client Interface (OF CI). This allows
>> using a smaller stateless firmware which does not have to manage
>> the device tree.
>>
>> The new "vof.bin" firmware image is included with source code under
>> pc-bios/. It also includes RTAS blob.
>>
>> This implements a handful of CI methods just to get -kernel/-initrd
>> working. In particular, this implements the device tree fetching and
>> simple memory allocator - "claim" (an OF CI memory allocator) and updates
>> "/memory@0/available" to report the client about available memory.
>>
>> This implements changing some device tree properties which we know how
>> to deal with, the rest is ignored. To allow changes, this skips
>> fdt_pack() when x-vof=on as not packing the blob leaves some room for
>> appending.
>>
>> In absence of SLOF, this assigns phandles to device tree nodes to make
>> device tree traversing work.
>>
>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
>>
>> This adds basic instances support which are managed by a hash map
>> ihandle -> [phandle].
>>
>> Before the guest started, the used memory is:
>> 0..e60 - the initial firmware
>> 8000..10000 - stack
>> 400000.. - kernel
>> 3ea0000.. - initramdisk
>>
>> This OF CI does not implement "interpret".
>>
>> Unlike SLOF, this does not format uninitialized nvram. Instead, this
>> includes a disk image with pre-formatted nvram.
>>
>> With this basic support, this can only boot into kernel directly.
>> However this is just enough for the petitboot kernel and initradmdisk to
>> boot from any possible source. Note this requires reasonably recent guest
>> kernel with:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>>
>> The immediate benefit is much faster booting time which especially
>> crucial with fully emulated early CPU bring up environments. Also this
>> may come handy when/if GRUB-in-the-userspace sees light of the day.
>>
>> This separates VOF and sPAPR in a hope that VOF bits may be reused by
>> other POWERPC boards which do not support pSeries.
>>
>> This is coded in assumption that later on we might be adding support for
>> booting from QEMU backends (blockdev is the first candidate) without
>> devices/drivers in between as OF1275 does not require that and
>> it is quite easy to so.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> I have some comments below, but they're basically all trivial at this
> point. We've missed qemu-6.0 obviously, but I'm hoping I can merge
> the next spin to my ppc-for-6.1 tree.
>
>> ---
>>
>> The example command line is:
>>
>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>> -nodefaults \
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>> -mon id=MON0,chardev=STDIO0,mode=readline \
>> -nographic \
>> -vga none \
>> -enable-kvm \
>> -m 2G \
>> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
>> -kernel pbuild/kernel-le-guest/vmlinux \
>> -initrd pb/rootfs.cpio.xz \
>> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
>
> Removing the need for a prebuild NVRAM image is something I'd like to
> see as a followup.
We do not _need_ NVRAM in the VM to begin with, or is this a
requirement? The whole VOF thing is more like a hack and I do not recall
myself on doing anything useful with NVRAM.
If we really need it, then when to format it - in QEMU or VOF.bin? This
alone will trigger a (lengthy) discussion :)
>> -global spapr-nvram.drive=DRIVE0 \
>> -snapshot \
>> -smp 8,threads=8 \
>> -L /home/aik/t/qemu-ppc64-bios/ \
>> -trace events=qemu_trace_events \
>> -d guest_errors \
>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>> -mon chardev=SOCKET0,mode=control
>>
>> ---
>> Changes:
>> v16:
>> * rebased on dwg/ppc-for-6.1
>> * s/SpaprVofInterface/VofMachineInterface/
>>
>> v15:
>> * bugfix: claimed memory for the VOF itself
>> * ditched OF_STACK_ADDR and allocate one instead, now it starts from 0x8000
>> because it is aligned to its size (no particular reason though)
>> * coding style
>> * moved nvram.bin up one level
>> * ditched bool in the firmware
>> * made debugging code conditional using trace_event_get_state() + qemu_loglevel_mask()
>> * renamed the CAS interface to SpaprVofInterface
>> * added "write" which for now dumps the message and ihandle via
>> trace point for early debug assistance
>> * commented on when we allocate of_instances in vof_build_dt()
>> * store fw_size is SpaprMachine to let spapr_vof_reset() claim it
>> * many small fixes from v14's review
>>
>> v14:
>> * check for truncates in readstr()
>> * ditched a separate vof_reset()
>> * spapr->vof is a pointer now, dropped the "on" field
>> * removed rtas_base from vof and updated comment why we allow setting it
>> * added myself to maintainers
>> * updated commit log about blockdev and other possible platforms
>> * added a note why new hcall is 0x5
>> * no in place endianness convertion in spapr_h_vof_client
>> * converted all cpu_physical_memory_read/write to address_space_rw
>> * git mv hw/ppc/spapr_vof_client.c hw/ppc/spapr_vof.c
>>
>> v13:
>> * rebase on latest ppc-for-6.0
>> * shuffled code around to touch spapr.c less
>>
>> v12:
>> * split VOF and SPAPR
>>
>> v11:
>> * added g_autofree
>> * fixed gcc warnings
>> * fixed few leaks
>> * added nvram image to make "nvram --print-config" not crash;
>> Note that contrary to MIN_NVRAM_SIZE (8 * KiB), the actual minimum size
>> is 16K, or it just does not work (empty output from "nvram")
>>
>> v10:
>> * now rebased to compile with meson
>>
>> v9:
>> * remove special handling of /rtas/rtas-size as now we always add it in QEMU
>> * removed leftovers from scsi/grub/stdout/stdin/...
>>
>> v8:
>> * no read/write/seek
>> * no @dev in instances
>> * the machine flag is "x-vof" for now
>>
>> v7:
>> * now we have a small firmware which loads at 0 as SLOF and starts from
>> 0x100 as SLOF
>> * no MBR/ELF/GRUB business in QEMU anymore
>> * blockdev is a separate patch
>> * networking is a separate patch
>>
>> v6:
>> * borrowed a big chunk of commit log introduction from David
>> * fixed initial stack pointer (points to the highest address of stack)
>> * traces for "interpret" and others
>> * disabled translate_kernel_address() hack so grub can load (work in
>> progress)
>> * added "milliseconds" for grub
>> * fixed "claim" allocator again
>> * moved FDT_MAX_SIZE to spapr.h as spapr_of_client.c wants it too for CAS
>> * moved the most code possible from spapr.c to spapr_of_client.c, such as
>> RTAS, prom entry and FDT build/finalize
>> * separated blobs
>> * GRUB now proceeds to its console prompt (there are still other issues)
>> * parse MBR/GPT to find PReP and load GRUB
>>
>> v5:
>> * made instances keep device and chardev pointers
>> * removed VIO dependencies
>> * print error if RTAS memory is not claimed as it should have been
>> * pack FDT as "quiesce"
>>
>> v4:
>> * fixed open
>> * validate ihandles in "call-method"
>>
>> v3:
>> * fixed phandles allocation
>> * s/__be32/uint32_t/ as we do not normally have __be32 type in qemu
>> * fixed size of /chosen/stdout
>> * bunch of renames
>> * do not create rtas properties at all, let the client deal with it;
>> instead setprop allows changing these in the FDT
>> * no more packing FDT when bios=off - nobody needs it and getprop does not
>> work otherwise
>> * allow updating initramdisk device tree properties (for zImage)
>> * added instances
>> * fixed stdout on OF's "write"
>> * removed special handling for stdout in OF client, spapr-vty handles it
>> instead
>>
>> v2:
>> * fixed claim()
>> * added "setprop"
>> * cleaner client interface and RTAS blobs management
>> * boots to petitboot and further to the target system
>> * more trace points
>> ---
>> pc-bios/vof/Makefile | 18 +
>> hw/ppc/vof.h | 37 ++
>> include/hw/ppc/spapr.h | 17 +-
>> pc-bios/vof/vof.h | 38 ++
>> hw/ppc/spapr.c | 83 +++-
>> hw/ppc/spapr_hcall.c | 26 +-
>> hw/ppc/spapr_vof.c | 148 +++++++
>> hw/ppc/vof.c | 982 +++++++++++++++++++++++++++++++++++++++++
>> pc-bios/vof/bootmem.c | 14 +
>> pc-bios/vof/ci.c | 91 ++++
>> pc-bios/vof/libc.c | 92 ++++
>> pc-bios/vof/main.c | 21 +
>> MAINTAINERS | 11 +
>> hw/ppc/meson.build | 2 +
>> hw/ppc/trace-events | 24 +
>> pc-bios/README | 2 +
>> pc-bios/vof-nvram.bin | Bin 0 -> 16384 bytes
>> pc-bios/vof.bin | Bin 0 -> 3128 bytes
>> pc-bios/vof/entry.S | 51 +++
>> pc-bios/vof/l.lds | 48 ++
>> 20 files changed, 1693 insertions(+), 12 deletions(-)
>> create mode 100644 pc-bios/vof/Makefile
>> create mode 100644 hw/ppc/vof.h
>> create mode 100644 pc-bios/vof/vof.h
>> create mode 100644 hw/ppc/spapr_vof.c
>> create mode 100644 hw/ppc/vof.c
>> create mode 100644 pc-bios/vof/bootmem.c
>> create mode 100644 pc-bios/vof/ci.c
>> create mode 100644 pc-bios/vof/libc.c
>> create mode 100644 pc-bios/vof/main.c
>> create mode 100644 pc-bios/vof-nvram.bin
>> create mode 100755 pc-bios/vof.bin
>> create mode 100644 pc-bios/vof/entry.S
>> create mode 100644 pc-bios/vof/l.lds
>>
>> diff --git a/pc-bios/vof/Makefile b/pc-bios/vof/Makefile
>> new file mode 100644
>> index 000000000000..1451e0551818
>> --- /dev/null
>> +++ b/pc-bios/vof/Makefile
>> @@ -0,0 +1,18 @@
>> +all: build-all
>> +
>> +build-all: vof.bin
>> +
>> +%.o: %.S
>> + cc -m32 -mbig-endian -c -o $@ $<
>> +
>> +%.o: %.c
>> + cc -m32 -mbig-endian -c -fno-stack-protector -o $@ $<
>> +
>> +vof.elf: entry.o main.o ci.o bootmem.o libc.o
>> + ld -nostdlib -e_start -Tl.lds -EB -o $@ $^
>> +
>> +%.bin: %.elf
>> + objcopy -O binary -j .text -j .data -j .toc -j .got2 $^ $@
>> +
>> +clean:
>> + rm -f *.o vof.bin vof.elf *~
>> diff --git a/hw/ppc/vof.h b/hw/ppc/vof.h
>
> Qemu conventions meant this should be in include/hw/ppc/vof.h.
>
>> new file mode 100644
>> index 000000000000..fc397e4e0c9b
>> --- /dev/null
>> +++ b/hw/ppc/vof.h
>> @@ -0,0 +1,37 @@
>> + /* Virtual Open Firmware */
>> +#ifndef HW_VOF_H
>> +#define HW_VOF_H
>
> Probably worth adding an SPDX-License-Identifier line, just for
> completeness.
>
>> +
>> +typedef struct Vof {
>> + uint32_t top_addr; /* copied from rma_size */
>> + GArray *claimed; /* array of SpaprOfClaimed */
>> + uint64_t claimed_base;
>> + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>> + uint32_t of_instance_last;
>> + char *bootargs;
>> + uint32_t initrd_base; /* Updated in spapr at CAS */
>> + long initrd_size; /* Updated in spapr at CAS */
>> +} Vof;
>> +
>> +uint32_t vof_client_call(void *fdt, Vof *vof, const char *service,
>> + uint32_t *args, unsigned nargs,
>> + uint32_t *rets, unsigned nrets);
>> +uint64_t vof_claim(void *fdt, Vof *vof, uint64_t virt, uint64_t size,
>> + uint64_t align);
>> +void vof_cleanup(Vof *vof);
>> +void vof_build_dt(void *fdt, Vof *vof, uint32_t top_addr);
>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const char *nodename,
>> + const char *prop, const char *path);
>> +
>> +#define TYPE_VOF_MACHINE "vof-machine"
>> +
>> +typedef struct VofMachineClass VofMachineClass;
>
> Probably call this VofMachineInterface rather than VofMachineClass.
>
>> +DECLARE_CLASS_CHECKERS(VofMachineClass, VOF_MACHINE, TYPE_VOF_MACHINE)
>> +
>> +struct VofMachineClass {
>> + InterfaceClass parent;
>> + target_ulong (*client_architecture_support)(CPUState *cs, target_ulong vec);
>> + void (*quiesce)(void);
>> +};
>> +
>> +#endif /* HW_VOF_H */
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index bf7cab7a2ce1..3cb121dae707 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -12,6 +12,7 @@
>> #include "hw/ppc/spapr_xive.h" /* For SpaprXive */
>> #include "hw/ppc/xics.h" /* For ICSState */
>> #include "hw/ppc/spapr_tpm_proxy.h"
>> +#include "hw/ppc/vof.h"
>>
>> struct SpaprVioBus;
>> struct SpaprPhbState;
>> @@ -180,6 +181,8 @@ struct SpaprMachineState {
>> uint64_t kernel_addr;
>> uint32_t initrd_base;
>> long initrd_size;
>> + long fw_size;
>> + Vof *vof;
>> uint64_t rtc_offset; /* Now used only during incoming migration */
>> struct PPCTimebase tb;
>> bool has_graphics;
>> @@ -554,7 +557,9 @@ struct SpaprMachineState {
>> /* Client Architecture support */
>> #define KVMPPC_H_CAS (KVMPPC_HCALL_BASE + 0x2)
>> #define KVMPPC_H_UPDATE_DT (KVMPPC_HCALL_BASE + 0x3)
>> -#define KVMPPC_HCALL_MAX KVMPPC_H_UPDATE_DT
>> +/* 0x4 was used for KVMPPC_H_UPDATE_PHANDLE in SLOF */
>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
>> +#define KVMPPC_HCALL_MAX KVMPPC_H_VOF_CLIENT
>>
>> /*
>> * The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating
>> @@ -945,4 +950,14 @@ bool spapr_check_pagesize(SpaprMachineState *spapr, hwaddr pagesize,
>> void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>> hwaddr spapr_get_rtas_addr(void);
>> bool spapr_memory_hot_unplug_supported(SpaprMachineState *spapr);
>> +
>> +void spapr_vof_reset(SpaprMachineState *spapr, void *fdt,
>> + target_ulong *stack_ptr, Error **errp);
>> +void spapr_vof_quiesce(void);
>> +target_ulong spapr_h_vof_client(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> + target_ulong opcode, target_ulong *args);
>> +target_ulong spapr_vof_client_architecture_support(CPUState *cs,
>> + target_ulong ovec_addr);
>> +void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt);
>> +
>> #endif /* HW_SPAPR_H */
>> diff --git a/pc-bios/vof/vof.h b/pc-bios/vof/vof.h
>> new file mode 100644
>> index 000000000000..29c80374d6dd
>> --- /dev/null
>> +++ b/pc-bios/vof/vof.h
>> @@ -0,0 +1,38 @@
>
> Especially as some guest code that's a bit more separated from the
> qemu core, it's probably worth putting an SPDX line here.
>
>> +#include <stdarg.h>
>> +
>> +typedef unsigned char uint8_t;
>> +typedef unsigned short uint16_t;
>> +typedef unsigned long uint32_t;
>> +typedef unsigned long long uint64_t;
>> +#define NULL (0)
>> +#define PROM_ERROR (-1u)
>> +typedef unsigned long ihandle;
>> +typedef unsigned long phandle;
>> +typedef int size_t;
>> +typedef void client(void);
>> +
>> +/* globals */
>> +extern void _prom_entry(void); /* OF CI entry point (i.e. this firmware) */
>> +
>> +void do_boot(unsigned long addr, unsigned long r3, unsigned long r4);
>> +
>> +/* libc */
>> +int strlen(const char *s);
>> +int strcmp(const char *s1, const char *s2);
>> +void *memcpy(void *dest, const void *src, size_t n);
>> +int memcmp(const void *ptr1, const void *ptr2, size_t n);
>> +void *memmove(void *dest, const void *src, size_t n);
>> +void *memset(void *dest, int c, size_t size);
>> +
>> +/* CI wrappers */
>> +void ci_panic(const char *str);
>> +phandle ci_finddevice(const char *path);
>> +uint32_t ci_getprop(phandle ph, const char *propname, void *prop, int len);
>> +
>> +/* booting from -kernel */
>> +void boot_from_memory(uint64_t initrd, uint64_t initrdsize);
>> +
>> +/* Entry points for CI and RTAS */
>> +extern uint32_t ci_entry(uint32_t params);
>> +extern unsigned long hv_rtas(unsigned long params);
>> +extern unsigned int hv_rtas_size;
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 73a06df3b1b1..0a41fd893c47 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -103,6 +103,7 @@
>> #define RTAS_MAX_ADDR 0x80000000 /* RTAS must stay below that */
>> #define FW_MAX_SIZE 0x400000
>> #define FW_FILE_NAME "slof.bin"
>> +#define FW_FILE_NAME_VOF "vof.bin"
>> #define FW_OVERHEAD 0x2800000
>> #define KERNEL_LOAD_ADDR FW_MAX_SIZE
>>
>> @@ -1625,22 +1626,41 @@ static void spapr_machine_reset(MachineState *machine)
>>
>> fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE);
>>
>> - rc = fdt_pack(fdt);
>> + if (spapr->vof) {
>> + target_ulong stack_ptr = 0;
>>
>> - /* Should only fail if we've built a corrupted tree */
>> - assert(rc == 0);
>> + /*
>> + * Claims initramdisk and stack which changes "available" so
>> + * doing it befofe packing.
>> + */
>> + spapr_vof_reset(spapr, fdt, &stack_ptr, &error_fatal);
>>
>> - /* Load the fdt */
>> + spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
>> + stack_ptr, spapr->initrd_base,
>> + spapr->initrd_size);
>> + /* We do not pack the FDT as the client may change properties. */
>> + } else {
>> + rc = fdt_pack(fdt);
>> + /* Should only fail if we've built a corrupted tree */
>> + assert(rc == 0);
>> +
>> + spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
>> + 0, fdt_addr, 0);
>> + }
>> qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
>> - cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
>> +
>> g_free(spapr->fdt_blob);
>> spapr->fdt_size = fdt_totalsize(fdt);
>> spapr->fdt_initial_size = spapr->fdt_size;
>> spapr->fdt_blob = fdt;
>>
>> /* Set up the entry state */
>> - spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, 0, fdt_addr, 0);
>> first_ppc_cpu->env.gpr[5] = 0;
>> + /* VOF client does not expect the FDT so we do not load it to the VM. */
>> + if (!spapr->vof) {
>> + /* Load the fdt */
>> + cpu_physical_memory_write(fdt_addr, spapr->fdt_blob, spapr->fdt_size);
>> + }
>>
>> spapr->fwnmi_system_reset_addr = -1;
>> spapr->fwnmi_machine_check_addr = -1;
>> @@ -2640,13 +2660,14 @@ static void spapr_machine_init(MachineState *machine)
>> SpaprMachineState *spapr = SPAPR_MACHINE(machine);
>> SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
>> MachineClass *mc = MACHINE_GET_CLASS(machine);
>> - const char *bios_name = machine->firmware ?: FW_FILE_NAME;
>> + const char *bios_default = !!spapr->vof ? FW_FILE_NAME_VOF : FW_FILE_NAME;
>> + const char *bios_name = machine->firmware ?: bios_default;
>> const char *kernel_filename = machine->kernel_filename;
>> const char *initrd_filename = machine->initrd_filename;
>> PCIHostState *phb;
>> int i;
>> MemoryRegion *sysmem = get_system_memory();
>> - long load_limit, fw_size;
>> + long load_limit;
>> char *filename;
>> Error *resize_hpt_err = NULL;
>>
>> @@ -2963,8 +2984,8 @@ static void spapr_machine_init(MachineState *machine)
>> error_report("Could not find LPAR firmware '%s'", bios_name);
>> exit(1);
>> }
>> - fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
>> - if (fw_size <= 0) {
>> + spapr->fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
>> + if (spapr->fw_size <= 0) {
>> error_report("Could not load LPAR firmware '%s'", filename);
>> exit(1);
>> }
>> @@ -2997,6 +3018,10 @@ static void spapr_machine_init(MachineState *machine)
>> }
>>
>> qemu_cond_init(&spapr->fwnmi_machine_check_interlock_cond);
>> +
>> + if (spapr->vof) {
>> + spapr_register_hypercall(KVMPPC_H_VOF_CLIENT, spapr_h_vof_client);
>> + }
>> }
>>
>> #define DEFAULT_KVM_TYPE "auto"
>> @@ -3187,6 +3212,28 @@ static void spapr_set_resize_hpt(Object *obj, const char *value, Error **errp)
>> }
>> }
>>
>> +static bool spapr_get_vof(Object *obj, Error **errp)
>> +{
>> + SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> + return spapr->vof != NULL;
>> +}
>> +
>> +static void spapr_set_vof(Object *obj, bool value, Error **errp)
>> +{
>> + SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> +
>> + if (spapr->vof) {
>> + vof_cleanup(spapr->vof);
>> + g_free(spapr->vof);
>> + spapr->vof = NULL;
>> + }
>> + if (!value) {
>> + return;
>> + }
>> + spapr->vof = g_malloc0(sizeof(*spapr->vof));
>> +}
>> +
>> static char *spapr_get_ic_mode(Object *obj, Error **errp)
>> {
>> SpaprMachineState *spapr = SPAPR_MACHINE(obj);
>> @@ -3312,6 +3359,10 @@ static void spapr_instance_init(Object *obj)
>> stringify(KERNEL_LOAD_ADDR)
>> " for -kernel is the default");
>> spapr->kernel_addr = KERNEL_LOAD_ADDR;
>> + object_property_add_bool(obj, "x-vof", spapr_get_vof, spapr_set_vof);
>> + object_property_set_description(obj, "x-vof",
>> + "Enable Virtual Open Firmware");
>> +
>> /* The machine class defines the default interrupt controller mode */
>> spapr->irq = smc->irq;
>> object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
>> @@ -4470,6 +4521,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>> XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
>> InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
>> XiveFabricClass *xfc = XIVE_FABRIC_CLASS(oc);
>> + VofMachineClass *vmc = VOF_MACHINE_CLASS(oc);
>>
>> mc->desc = "pSeries Logical Partition (PAPR compliant)";
>> mc->ignore_boot_device_suffixes = true;
>> @@ -4549,6 +4601,9 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>> smc->smp_threads_vsmt = true;
>> smc->nr_xirqs = SPAPR_NR_XIRQS;
>> xfc->match_nvt = spapr_match_nvt;
>> +
>> + vmc->client_architecture_support = spapr_vof_client_architecture_support;
>> + vmc->quiesce = spapr_vof_quiesce;
>> }
>>
>> static const TypeInfo spapr_machine_info = {
>> @@ -4568,6 +4623,7 @@ static const TypeInfo spapr_machine_info = {
>> { TYPE_XICS_FABRIC },
>> { TYPE_INTERRUPT_STATS_PROVIDER },
>> { TYPE_XIVE_FABRIC },
>> + { TYPE_VOF_MACHINE },
>> { }
>> },
>> };
>> @@ -5036,9 +5092,16 @@ static void spapr_machine_2_1_class_options(MachineClass *mc)
>> }
>> DEFINE_SPAPR_MACHINE(2_1, "2.1", false);
>>
>> +static const TypeInfo vof_machine_info = {
>> + .name = TYPE_VOF_MACHINE,
>> + .parent = TYPE_INTERFACE,
>> + .class_size = sizeof(VofMachineClass),
>> +};
>
> I think this belongs in vof.c rather than spapr.c
>
>> +
>> static void spapr_machine_register_types(void)
>> {
>> type_register_static(&spapr_machine_info);
>> + type_register_static(&vof_machine_info);
>> }
>>
>> type_init(spapr_machine_register_types)
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index 7b5cd3553c26..0cdf90af6afb 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -1806,7 +1806,13 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>> spapr_setup_hpt(spapr);
>> }
>>
>> - fdt = spapr_build_fdt(spapr, false, fdt_bufsize);
>> + if (spapr->vof && spapr->vof->initrd_base && spapr->vof->initrd_size) {
>> + /* Update initramdisk location so the right area gets reserved below */
>> + spapr->initrd_base = spapr->vof->initrd_base;
>> + spapr->initrd_size = spapr->vof->initrd_size;
>> + }
>> +
>> + fdt = spapr_build_fdt(spapr, spapr->vof != NULL, fdt_bufsize);
>>
>> g_free(spapr->fdt_blob);
>> spapr->fdt_size = fdt_totalsize(fdt);
>> @@ -1850,6 +1856,24 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>> return ret;
>> }
>>
>> +target_ulong spapr_vof_client_architecture_support(CPUState *cs,
>> + target_ulong ovec_addr)
>> +{
>> + SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +
>> + target_ulong ret = do_client_architecture_support(POWERPC_CPU(cs), spapr,
>> + ovec_addr, FDT_MAX_SIZE);
>> +
>> + /*
>> + * This adds stdout and generates phandles for boottime and CAS FDTs.
>> + * It is alright to update the FDT here as do_client_architecture_support()
>> + * does not pack it.
>> + */
>> + spapr_vof_client_dt_finalize(spapr, spapr->fdt_blob);
>> +
>> + return ret;
>> +}
>> +
>> static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu,
>> SpaprMachineState *spapr,
>> target_ulong opcode,
>> diff --git a/hw/ppc/spapr_vof.c b/hw/ppc/spapr_vof.c
>> new file mode 100644
>> index 000000000000..8a58364490f4
>> --- /dev/null
>> +++ b/hw/ppc/spapr_vof.c
>
> SPDX.
>
>> @@ -0,0 +1,148 @@
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +#include <sys/ioctl.h>
>> +#include "qapi/error.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_vio.h"
>> +#include "hw/ppc/fdt.h"
>> +#include "sysemu/sysemu.h"
>> +#include "qom/qom-qobject.h"
>> +#include "trace.h"
>> +
>> +/* Copied from SLOF, and 4K is definitely not enough for GRUB */
>> +#define OF_STACK_SIZE 0x8000
>> +
>> +/* Defined as Big Endian */
>> +struct prom_args {
>> + uint32_t service;
>> + uint32_t nargs;
>> + uint32_t nret;
>> + uint32_t args[10];
>> +} QEMU_PACKED;
>> +
>> +target_ulong spapr_h_vof_client(PowerPCCPU *cpu, SpaprMachineState *spapr,
>> + target_ulong opcode, target_ulong *_args)
>> +{
>> + target_ulong args_real = ppc64_phys_to_real(_args[0]);
>> + struct prom_args args_be;
>> + uint32_t args[ARRAY_SIZE(args_be.args)];
>> + uint32_t rets[ARRAY_SIZE(args_be.args)] = { 0 }, ret;
>> + char service[64];
>> + unsigned nargs, nret, i;
>> +
>> + if (address_space_rw(&address_space_memory, args_real,
>> + MEMTXATTRS_UNSPECIFIED, &args_be, sizeof(args_be),
>> + false) != MEMTX_OK) {
>> + return H_HARDWARE;
>
> Probably H_PARAMETER rather than H_HARDWARE - the most likely cause
> here is that a bad address was specified for the arguments.
>
>> + }
>> + nargs = be32_to_cpu(args_be.nargs);
>> + if (nargs >= ARRAY_SIZE(args_be.args)) {
>> + return H_PARAMETER;
>> + }
>> +
>> + if (address_space_rw(&address_space_memory, be32_to_cpu(args_be.service),
>> + MEMTXATTRS_UNSPECIFIED, service, sizeof(service),
>> + false) != MEMTX_OK) {
>> + return H_HARDWARE;
>
> Likewise here and the rest.
>
>> + }
>> + if (strnlen(service, sizeof(service)) == sizeof(service)) {
>> + /* Too long service name */
>> + return H_PARAMETER;
>> + }
>> +
>> + for (i = 0; i < nargs; ++i) {
>> + args[i] = be32_to_cpu(args_be.args[i]);
>> + }
>> +
>> + nret = be32_to_cpu(args_be.nret);
>> + ret = vof_client_call(spapr->fdt_blob, spapr->vof, service,
>> + args, nargs, rets, nret);
>> + if (!nret) {
>> + return H_SUCCESS;
>> + }
>> +
>> + args_be.args[nargs] = cpu_to_be32(ret);
>> + for (i = 1; i < nret; ++i) {
>> + args_be.args[nargs + i] = cpu_to_be32(rets[i - 1]);
>> + }
>> +
>> + if (address_space_rw(&address_space_memory,
>> + args_real + offsetof(struct prom_args, args[nargs]),
>> + MEMTXATTRS_UNSPECIFIED, args_be.args + nargs,
>> + sizeof(args_be.args[0]) * nret, true) != MEMTX_OK) {
>> + return H_HARDWARE;
>> + }
>> +
>> + return H_SUCCESS;
>> +}
>> +
>> +void spapr_vof_client_dt_finalize(SpaprMachineState *spapr, void *fdt)
>> +{
>> + char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
>> +
>> + vof_build_dt(fdt, spapr->vof, spapr->rma_size);
>> +
>> + /*
>> + * SLOF-less setup requires an open instance of stdout for early
>> + * kernel printk. By now all phandles are settled so we can open
>> + * the default serial console.
>> + */
>> + if (stdout_path) {
>> + _FDT(vof_client_open_store(fdt, spapr->vof, "/chosen", "stdout",
>> + stdout_path));
>> + }
>> +}
>> +
>> +void spapr_vof_reset(SpaprMachineState *spapr, void *fdt,
>> + target_ulong *stack_ptr, Error **errp)
>> +{
>> + Vof *vof = spapr->vof;
>> +
>> + vof_cleanup(vof);
>> +
>> + spapr_vof_client_dt_finalize(spapr, fdt);
>> +
>> + if (vof_claim(spapr->fdt_blob, vof, 0, spapr->fw_size, 0) == -1) {
>> + error_setg(errp, "Memory for firmware is in use");
>
> This could probably be an assert, yes? IIUC this the very first
> claim, so if this fails then we've placed things incorrectly in the
> first place, so it's a code error rather than a user error.
Passing &error_fatal as errp is an assert pretty much but more
informative imho.
In a followup I'd rather shuffle this function to claim kernel/initrd
first and only then claim space the VOF firmware which can fail so I can
then try allocating the space for it, return that to
spapr_machine_reset() to change the entry point. This way I'll be able
to pass 0 to -machine pseries,kernel-addr= to allow having kernel at 0
so when debugging via the qemu gdb stub, I could set breakpoints at
addresses from "objdump vmlinux" without adding the default kernel addr
0x400000 every time. When I do that - this is definitely going to be not
an assert.
>
>> + return;
>> + }
>> +
>> + *stack_ptr = vof_claim(spapr->fdt_blob, vof, 0, OF_STACK_SIZE,
>> + OF_STACK_SIZE);
>> + if (*stack_ptr == -1) {
>
> Likewise here.
This passes alignment so it is allocating and not claiming and if it
fails to allocate - it is most likely because the user cafefully
prepared the vof blob big enough to exhaust the VM memory.
>
>> + error_setg(errp, "Memory allocation for stack failed");
>> + return;
>> + }
>> + /*
>> + * Stack grows downwards and we also reserve here space for
>> + * the minimum stack frame.
>> + */
>> + *stack_ptr += OF_STACK_SIZE - 0x20;
>> +
>> + if (spapr->kernel_size &&
>> + vof_claim(spapr->fdt_blob, vof, spapr->kernel_addr, spapr->kernel_size,
>> + 0) == -1) {
>> + error_setg(errp, "Memory for kernel is in use");
>> + return;
>> + }
>> +
>> + if (spapr->initrd_size &&
>> + vof_claim(spapr->fdt_blob, vof, spapr->initrd_base, spapr->initrd_size,
>> + 0) == -1) {
>> + error_setg(errp, "Memory for initramdisk is in use");
>> + return;
>> + }
>> +
>> + /*
>> + * We skip writing FDT as nothing expects it; OF client interface is
>> + * going to be used for reading the device tree.
>> + */
>> +}
>> +
>> +void spapr_vof_quiesce(void)
>> +{
>> + SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>> +
>> + spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
>> + spapr->fdt_initial_size = spapr->fdt_size;
>
> I suspect this doesn't matter. AFAICT the only use of
> fdt_initial_size is for the H_UPDATE_DT call, which shouldn't be used
> with VOF. But, that can be a later cleanup.
It does not (like the whole quiesce thing - I only pack the FDT to get
errors if the client tries updating it) but I really like having all fdt
properties in sync.
--
Alexey
next prev parent reply other threads:[~2021-03-25 3:26 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-23 2:58 [PATCH qemu v16] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-03-25 2:52 ` David Gibson
2021-03-25 3:25 ` Alexey Kardashevskiy [this message]
2021-03-31 1:03 ` David Gibson
2021-04-01 0:17 ` Alexey Kardashevskiy
2021-04-01 3:10 ` David Gibson
2021-03-31 11:54 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98565b10-debd-be0a-79f7-9f08737a49d1@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).