* [PATCH qemu v6 0/6] spapr: Kill SLOF
@ 2020-02-03  3:29 Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit Alexey Kardashevskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

This is v6 of an effort to implement the Open Firmware Client Interface
in QEMU. The feature is described in 6/6; 1/6..5/6 are small
but necessary preparations.

With this thing, I can boot unmodified Ubuntu 18.04 and Fedora 30
directly from the disk without SLOF.


This is based on sha1
532fe321cf06 Richard Henderson "target/ppc: Use probe_write for DCBZ".

Please comment. Thanks.



Alexey Kardashevskiy (6):
  ppc: Start CPU in the default mode which is big-endian 32bit
  ppc/spapr: Move GPRs setup to one place
  spapr/spapr: Make vty_getchars public
  spapr/cas: Separate CAS handling from rebuilding the FDT
  spapr: Allow changing offset for -kernel image
  spapr: Implement Open Firmware client interface

 hw/ppc/Makefile.objs            |    1 +
 include/hw/ppc/spapr.h          |   29 +-
 include/hw/ppc/spapr_cpu_core.h |    4 +-
 include/hw/ppc/spapr_vio.h      |    1 +
 hw/char/spapr_vty.c             |    2 +-
 hw/ppc/spapr.c                  |  139 ++-
 hw/ppc/spapr_cpu_core.c         |    7 +-
 hw/ppc/spapr_hcall.c            |   73 +-
 hw/ppc/spapr_of_client.c        | 1526 +++++++++++++++++++++++++++++++
 hw/ppc/spapr_rtas.c             |    2 +-
 target/ppc/translate_init.inc.c |    6 -
 hw/ppc/trace-events             |   24 +
 12 files changed, 1744 insertions(+), 70 deletions(-)
 create mode 100644 hw/ppc/spapr_of_client.c

-- 
2.17.1




* [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-12  5:43   ` David Gibson
  2020-02-03  3:29 ` [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

At the moment we enforce 64-bit mode on a CPU at reset. This makes no
difference for SLOF or Linux, as both set the desired mode straight away.
However, if we ever boot something other than these two, this might not
work: GRUB, for example, expects the default MSR state and does not work
properly otherwise.

This removes setting MSR_SF from the PPC CPU reset.
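
As a minimal sketch of what the removed lines controlled: MSR_SF is a single
bit in the machine state register, and reset no longer forces it on. The bit
position (63) and the helper names below are assumptions made for this
illustration, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the "sixty-four bit mode" flag in the MSR. */
#define MOCK_MSR_SF 63

/* True if the given MSR value selects 64-bit mode. */
static inline bool mock_msr_is_64bit(uint64_t msr)
{
    return (msr >> MOCK_MSR_SF) & 1;
}

/* What the removed reset code did for 64-bit MMU models. */
static inline uint64_t mock_msr_force_64bit(uint64_t msr)
{
    return msr | (1ULL << MOCK_MSR_SF);
}
```

With the patch applied, the reset value corresponds to the first case
(bit clear) and the firmware or OS is expected to switch modes itself.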

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target/ppc/translate_init.inc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 53995f62eab2..f6a676cf55e8 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10710,12 +10710,6 @@ static void ppc_cpu_reset(CPUState *s)
 #endif
 #endif
 
-#if defined(TARGET_PPC64)
-    if (env->mmu_model & POWERPC_MMU_64) {
-        msr |= (1ULL << MSR_SF);
-    }
-#endif
-
     hreg_store_msr(env, msr, 1);
 
 #if !defined(CONFIG_USER_ONLY)
-- 
2.17.1




* [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-12 18:44   ` Fabiano Rosas
  2020-02-13  8:41   ` Greg Kurz
  2020-02-03  3:29 ` [PATCH qemu v6 3/6] spapr/spapr: Make vty_getchars public Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

At the moment "pseries" starts in SLOF, which only expects the FDT blob
pointer in r3. As we are going to introduce Open Firmware support in
QEMU, we will be booting OF clients directly, and these expect a stack
pointer in r1 and the OF entry point in r5. In addition, Linux looks at
r3/r4 for the initramdisk location (vmlinux can find this from the device
tree, but zImage from distro kernels cannot).

This extends spapr_cpu_set_entry_state() to take more registers. This
should cause no behavioral change.
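
The two entry conventions described above can be sketched as follows. This is
a self-contained illustration using a mock CPU state; MockCpuEnv and the
mock_* helpers are hypothetical names, and passing the initramdisk base in r3
and its size in r4 is an assumption about how r3/r4 are split.

```c
#include <stdint.h>

/* Hypothetical stand-in for the register file in CPUPPCState. */
typedef struct {
    uint64_t nip;
    uint64_t gpr[32];
} MockCpuEnv;

/* Mirrors the extended helper: one place that sets all entry registers. */
static void mock_set_entry_state(MockCpuEnv *env, uint64_t nip,
                                 uint64_t r1, uint64_t r3,
                                 uint64_t r4, uint64_t r5)
{
    env->nip = nip;
    env->gpr[1] = r1;
    env->gpr[3] = r3;
    env->gpr[4] = r4;
    env->gpr[5] = r5;
}

/* SLOF boot: only the FDT pointer in r3 matters, the rest stay zero. */
static void mock_enter_slof(MockCpuEnv *env, uint64_t entry,
                            uint64_t fdt_addr)
{
    mock_set_entry_state(env, entry, 0, fdt_addr, 0, 0);
}

/* OF client boot: stack in r1, initramdisk in r3/r4, OF entry in r5. */
static void mock_enter_of_client(MockCpuEnv *env, uint64_t entry,
                                 uint64_t stack_ptr, uint64_t initrd_base,
                                 uint64_t initrd_size, uint64_t prom_entry)
{
    mock_set_entry_state(env, entry, stack_ptr, initrd_base, initrd_size,
                         prom_entry);
}
```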

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/ppc/spapr_cpu_core.h | 4 +++-
 hw/ppc/spapr.c                  | 4 ++--
 hw/ppc/spapr_cpu_core.c         | 7 ++++++-
 hw/ppc/spapr_rtas.c             | 2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index 1c4cc6559c52..edd7214fafcf 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -40,7 +40,9 @@ typedef struct SpaprCpuCoreClass {
 } SpaprCpuCoreClass;
 
 const char *spapr_get_cpu_core_type(const char *cpu_type);
-void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3);
+void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
+                               target_ulong r1, target_ulong r3,
+                               target_ulong r4, target_ulong r5);
 
 typedef struct SpaprCpuState {
     uint64_t vpa_addr;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c9b2e0a5e060..660a4b60e072 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1674,8 +1674,8 @@ static void spapr_machine_reset(MachineState *machine)
     spapr->fdt_blob = fdt;
 
     /* Set up the entry state */
-    spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr);
-    first_ppc_cpu->env.gpr[5] = 0;
+    spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
+                              0, fdt_addr, 0, 0);
 
     spapr->cas_reboot = false;
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index d09125d9afd4..696b76598dd7 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -84,13 +84,18 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
     spapr_irq_cpu_intc_reset(spapr, cpu);
 }
 
-void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3)
+void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
+                               target_ulong r1, target_ulong r3,
+                               target_ulong r4, target_ulong r5)
 {
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     CPUPPCState *env = &cpu->env;
 
     env->nip = nip;
+    env->gpr[1] = r1;
     env->gpr[3] = r3;
+    env->gpr[4] = r4;
+    env->gpr[5] = r5;
     kvmppc_set_reg_ppc_online(cpu, 1);
     CPU(cpu)->halted = 0;
     /* Enable Power-saving mode Exit Cause exceptions */
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 656fdd221665..9e3cbd70bbd9 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -190,7 +190,7 @@ static void rtas_start_cpu(PowerPCCPU *callcpu, SpaprMachineState *spapr,
      */
     newcpu->env.tb_env->tb_offset = callcpu->env.tb_env->tb_offset;
 
-    spapr_cpu_set_entry_state(newcpu, start, r3);
+    spapr_cpu_set_entry_state(newcpu, start, 0, r3, 0, 0);
 
     qemu_cpu_kick(CPU(newcpu));
 
-- 
2.17.1




* [PATCH qemu v6 3/6] spapr/spapr: Make vty_getchars public
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 4/6] spapr/cas: Separate CAS handling from rebuilding the FDT Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

A serial device fetches data from the chardev backend as soon as input
arrives and stores it in an internal, device-specific buffer; every
char device implements this separately. Since there is no unified
interface to read such a buffer, we will have to read characters directly
from VIO_SPAPR_VTY_DEVICE. The OF client is going to need this.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/ppc/spapr_vio.h | 1 +
 hw/char/spapr_vty.c        | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index bed7df60e35c..77e9b73bdfe0 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -130,6 +130,7 @@ int spapr_vio_send_crq(SpaprVioDevice *dev, uint8_t *crq);
 
 SpaprVioDevice *vty_lookup(SpaprMachineState *spapr, target_ulong reg);
 void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len);
+int vty_getchars(SpaprVioDevice *sdev, uint8_t *buf, int max);
 void spapr_vty_create(SpaprVioBus *bus, Chardev *chardev);
 void spapr_vlan_create(SpaprVioBus *bus, NICInfo *nd);
 void spapr_vscsi_create(SpaprVioBus *bus);
diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
index ecb94f5673ca..1c00da75b4f1 100644
--- a/hw/char/spapr_vty.c
+++ b/hw/char/spapr_vty.c
@@ -52,7 +52,7 @@ static void vty_receive(void *opaque, const uint8_t *buf, int size)
     }
 }
 
-static int vty_getchars(SpaprVioDevice *sdev, uint8_t *buf, int max)
+int vty_getchars(SpaprVioDevice *sdev, uint8_t *buf, int max)
 {
     SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
     int n = 0;
-- 
2.17.1




* [PATCH qemu v6 4/6] spapr/cas: Separate CAS handling from rebuilding the FDT
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2020-02-03  3:29 ` [PATCH qemu v6 3/6] spapr/spapr: Make vty_getchars public Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-03  3:29 ` [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

At the moment "ibm,client-architecture-support" ("CAS") is implemented
in SLOF, and QEMU assists via the custom H_CAS hypercall, which copies
an updated flattened device tree (FDT) blob to SLOF memory; SLOF
then uses the blob to update its internal tree.

When we enable the OpenFirmware client interface in QEMU, we won't need
to copy the FDT to the guest as the client is expected to fetch
the device tree using the client interface.

This moves FDT rebuild out to a separate helper which is going to be
called from the "ibm,client-architecture-support" handler and leaves
writing FDT to the guest in the H_CAS handler.

This should not cause any behavioral change.
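
The part that stays in the H_CAS handler can be sketched roughly as below:
reserve room for the update header before the rebuild, then write the header
followed by the blob. MockUpdateHeader and the mock_* helpers are hypothetical
stand-ins, and a plain byte buffer replaces guest physical memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SpaprDeviceTreeUpdateHeader. */
typedef struct {
    uint32_t version_id;
} MockUpdateHeader;

/*
 * Before the rebuild: reserve room for the header.
 * Returns the space left for the FDT, or -1 if the buffer is too small.
 */
static int64_t mock_cas_fdt_space(uint64_t bufsize)
{
    if (bufsize < sizeof(MockUpdateHeader)) {
        return -1;
    }
    return (int64_t)(bufsize - sizeof(MockUpdateHeader));
}

/*
 * After the rebuild: write the header, then the FDT blob right behind it.
 * Returns the number of bytes written to the (mock) guest buffer.
 */
static size_t mock_cas_write_reply(uint8_t *guest_buf, const void *fdt,
                                   size_t fdt_size)
{
    MockUpdateHeader hdr = { .version_id = 1 };

    memcpy(guest_buf, &hdr, sizeof(hdr));
    memcpy(guest_buf + sizeof(hdr), fdt, fdt_size);
    return sizeof(hdr) + fdt_size;
}
```

The OF client path is expected to skip both steps, since the client fetches
the tree through the client interface instead.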

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/ppc/spapr.h |  7 +++++
 hw/ppc/spapr.c         |  1 -
 hw/ppc/spapr_hcall.c   | 67 ++++++++++++++++++++++++++----------------
 3 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index a1fba95c824b..3b50f36c338a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -102,6 +102,8 @@ typedef enum {
 #define SPAPR_CAP_FIXED_CCD             0x03
 #define SPAPR_CAP_FIXED_NA              0x10 /* Lets leave a bit of a gap... */
 
+#define FDT_MAX_SIZE                    0x100000
+
 typedef struct SpaprCapabilities SpaprCapabilities;
 struct SpaprCapabilities {
     uint8_t caps[SPAPR_CAP_NUM];
@@ -551,6 +553,11 @@ void spapr_register_hypercall(target_ulong opcode, spapr_hcall_fn fn);
 target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
                              target_ulong *args);
 
+target_ulong do_client_architecture_support(PowerPCCPU *cpu,
+                                            SpaprMachineState *spapr,
+                                            target_ulong addr,
+                                            target_ulong fdt_bufsize);
+
 /* Virtual Processor Area structure constants */
 #define VPA_MIN_SIZE           640
 #define VPA_SIZE_OFFSET        0x4
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 660a4b60e072..60153bf0b771 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -95,7 +95,6 @@
  *
  * We load our kernel at 4M, leaving space for SLOF initial image
  */
-#define FDT_MAX_SIZE            0x100000
 #define RTAS_MAX_ADDR           0x80000000 /* RTAS must stay below that */
 #define FW_MAX_SIZE             0x400000
 #define FW_FILE_NAME            "slof.bin"
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b8bb66b5c0d4..da50d8ee5dd7 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1660,16 +1660,12 @@ static bool spapr_hotplugged_dev_before_cas(void)
     return false;
 }
 
-static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
-                                                  SpaprMachineState *spapr,
-                                                  target_ulong opcode,
-                                                  target_ulong *args)
+target_ulong do_client_architecture_support(PowerPCCPU *cpu,
+                                            SpaprMachineState *spapr,
+                                            target_ulong vec,
+                                            target_ulong fdt_bufsize)
 {
-    /* Working address in data buffer */
-    target_ulong addr = ppc64_phys_to_real(args[0]);
-    target_ulong fdt_buf = args[1];
-    target_ulong fdt_bufsize = args[2];
-    target_ulong ov_table;
+    target_ulong ov_table; /* Working address in data buffer */
     uint32_t cas_pvr;
     SpaprOptionVector *ov1_guest, *ov5_guest, *ov5_cas_old;
     bool guest_radix;
@@ -1689,7 +1685,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
         }
     }
 
-    cas_pvr = cas_check_pvr(spapr, cpu, &addr, &raw_mode_supported, &local_err);
+    cas_pvr = cas_check_pvr(spapr, cpu, &vec, &raw_mode_supported, &local_err);
     if (local_err) {
         error_report_err(local_err);
         return H_HARDWARE;
@@ -1712,7 +1708,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     }
 
     /* For the future use: here @ov_table points to the first option vector */
-    ov_table = addr;
+    ov_table = vec;
 
     ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
     if (!ov1_guest) {
@@ -1836,7 +1832,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     if (!spapr->cas_reboot) {
         void *fdt;
-        SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
 
         /* If spapr_machine_reset() did not set up a HPT but one is necessary
          * (because the guest isn't going to use radix) then set it up here. */
@@ -1845,21 +1840,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
             spapr_setup_hpt_and_vrma(spapr);
         }
 
-        if (fdt_bufsize < sizeof(hdr)) {
-            error_report("SLOF provided insufficient CAS buffer "
-                         TARGET_FMT_lu " (min: %zu)", fdt_bufsize, sizeof(hdr));
-            exit(EXIT_FAILURE);
-        }
-
-        fdt_bufsize -= sizeof(hdr);
-
         fdt = spapr_build_fdt(spapr, false, fdt_bufsize);
-        _FDT((fdt_pack(fdt)));
-
-        cpu_physical_memory_write(fdt_buf, &hdr, sizeof(hdr));
-        cpu_physical_memory_write(fdt_buf + sizeof(hdr), fdt,
-                                  fdt_totalsize(fdt));
-        trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
 
         g_free(spapr->fdt_blob);
         spapr->fdt_size = fdt_totalsize(fdt);
@@ -1874,6 +1855,40 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     return H_SUCCESS;
 }
 
+static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
+                                                  SpaprMachineState *spapr,
+                                                  target_ulong opcode,
+                                                  target_ulong *args)
+{
+    target_ulong vec = ppc64_phys_to_real(args[0]);
+    target_ulong fdt_buf = args[1];
+    target_ulong fdt_bufsize = args[2];
+    target_ulong ret;
+    SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
+
+    if (fdt_bufsize < sizeof(hdr)) {
+        error_report("SLOF provided insufficient CAS buffer "
+                     TARGET_FMT_lu " (min: %zu)", fdt_bufsize, sizeof(hdr));
+        exit(EXIT_FAILURE);
+    }
+
+    fdt_bufsize -= sizeof(hdr);
+
+    ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
+    if (ret == H_SUCCESS) {
+        _FDT((fdt_pack(spapr->fdt_blob)));
+        spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
+        spapr->fdt_initial_size = spapr->fdt_size;
+
+        cpu_physical_memory_write(fdt_buf, &hdr, sizeof(hdr));
+        cpu_physical_memory_write(fdt_buf + sizeof(hdr), spapr->fdt_blob,
+                                  spapr->fdt_size);
+        trace_spapr_cas_continue(spapr->fdt_size + sizeof(hdr));
+    }
+
+    return ret;
+}
+
 static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
                                               SpaprMachineState *spapr,
                                               target_ulong opcode,
-- 
2.17.1




* [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2020-02-03  3:29 ` [PATCH qemu v6 4/6] spapr/cas: Separate CAS handling from rebuilding the FDT Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-12 18:54   ` Fabiano Rosas
  2020-02-13  2:58   ` David Gibson
  2020-02-03  3:29 ` [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
  2020-02-05  4:59 ` [PATCH qemu v6] spapr: OF CI networking Alexey Kardashevskiy
  6 siblings, 2 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

This allows moving the kernel in guest memory. The option is useful
for step debugging (as Linux is linked at 0x0); it also allows loading
GRUB, which is normally linked to run at 0x200000.

This uses the existing kernel address by default.
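
The address arithmetic affected by the new field can be sketched in isolation
as below. The two expressions follow the ones in the diff; the mock_* names,
the standalone form and the example addresses are illustrative only.

```c
#include <stdint.h>

/*
 * ELF load address translation: keep the low 28 bits of the link address
 * and rebase them onto the configured kernel address.
 */
static uint64_t mock_translate_kernel_address(uint64_t kernel_addr,
                                              uint64_t addr)
{
    return (addr & 0x0fffffff) + kernel_addr;
}

/*
 * initrd placement: first 64K boundary past the kernel, plus at least
 * 64K of slack ("a bit of space just in case").
 */
static uint64_t mock_initrd_base(uint64_t kernel_addr, uint64_t kernel_size)
{
    return (kernel_addr + kernel_size + 0x1ffff) & ~0xffffULL;
}
```

Since "kernel-addr" is registered as a machine property, it would presumably
be set as "-machine pseries,kernel-addr=<address>"; without it the existing
default (the kernel is loaded at 4M) applies.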

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/ppc/spapr.h |  1 +
 hw/ppc/spapr.c         | 38 +++++++++++++++++++++++++++++++-------
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3b50f36c338a..32e831a395ae 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -164,6 +164,7 @@ struct SpaprMachineState {
     void *fdt_blob;
     long kernel_size;
     bool kernel_le;
+    uint64_t kernel_addr;
     uint32_t initrd_base;
     long initrd_size;
     uint64_t rtc_offset; /* Now used only during incoming migration */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 60153bf0b771..b59e9dc360fe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1054,7 +1054,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt)
     }
 
     if (spapr->kernel_size) {
-        uint64_t kprop[2] = { cpu_to_be64(KERNEL_LOAD_ADDR),
+        uint64_t kprop[2] = { cpu_to_be64(spapr->kernel_addr),
                               cpu_to_be64(spapr->kernel_size) };
 
         _FDT(fdt_setprop(fdt, chosen, "qemu,boot-kernel",
@@ -1242,7 +1242,8 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
     /* Build memory reserve map */
     if (reset) {
         if (spapr->kernel_size) {
-            _FDT((fdt_add_mem_rsv(fdt, KERNEL_LOAD_ADDR, spapr->kernel_size)));
+            _FDT((fdt_add_mem_rsv(fdt, spapr->kernel_addr,
+                                  spapr->kernel_size)));
         }
         if (spapr->initrd_size) {
             _FDT((fdt_add_mem_rsv(fdt, spapr->initrd_base,
@@ -1270,7 +1271,9 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
 
 static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 {
-    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
+    SpaprMachineState *spapr = opaque;
+
+    return (addr & 0x0fffffff) + spapr->kernel_addr;
 }
 
 static void emulate_spapr_hypercall(PPCVirtualHypervisor *vhyp,
@@ -2947,14 +2950,15 @@ static void spapr_machine_init(MachineState *machine)
         uint64_t lowaddr = 0;
 
         spapr->kernel_size = load_elf(kernel_filename, NULL,
-                                      translate_kernel_address, NULL,
+                                      translate_kernel_address, spapr,
                                       NULL, &lowaddr, NULL, NULL, 1,
                                       PPC_ELF_MACHINE, 0, 0);
         if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
             spapr->kernel_size = load_elf(kernel_filename, NULL,
-                                          translate_kernel_address, NULL, NULL,
+                                          translate_kernel_address, spapr, NULL,
                                           &lowaddr, NULL, NULL, 0,
-                                          PPC_ELF_MACHINE, 0, 0);
+                                          PPC_ELF_MACHINE,
+                                          0, 0);
             spapr->kernel_le = spapr->kernel_size > 0;
         }
         if (spapr->kernel_size < 0) {
@@ -2968,7 +2972,7 @@ static void spapr_machine_init(MachineState *machine)
             /* Try to locate the initrd in the gap between the kernel
              * and the firmware. Add a bit of space just in case
              */
-            spapr->initrd_base = (KERNEL_LOAD_ADDR + spapr->kernel_size
+            spapr->initrd_base = (spapr->kernel_addr + spapr->kernel_size
                                   + 0x1ffff) & ~0xffff;
             spapr->initrd_size = load_image_targphys(initrd_filename,
                                                      spapr->initrd_base,
@@ -3214,6 +3218,18 @@ static void spapr_set_vsmt(Object *obj, Visitor *v, const char *name,
     visit_type_uint32(v, name, (uint32_t *)opaque, errp);
 }
 
+static void spapr_get_kernel_addr(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    visit_type_uint64(v, name, (uint64_t *)opaque, errp);
+}
+
+static void spapr_set_kernel_addr(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    visit_type_uint64(v, name, (uint64_t *)opaque, errp);
+}
+
 static char *spapr_get_ic_mode(Object *obj, Error **errp)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3319,6 +3335,14 @@ static void spapr_instance_init(Object *obj)
     object_property_add_bool(obj, "vfio-no-msix-emulation",
                              spapr_get_msix_emulation, NULL, NULL);
 
+    object_property_add(obj, "kernel-addr", "uint64", spapr_get_kernel_addr,
+                        spapr_set_kernel_addr, NULL, &spapr->kernel_addr,
+                        &error_abort);
+    object_property_set_description(obj, "kernel-addr",
+                                    stringify(KERNEL_LOAD_ADDR)
+                                    " for -kernel is the default",
+                                    NULL);
+    spapr->kernel_addr = KERNEL_LOAD_ADDR;
     /* The machine class defines the default interrupt controller mode */
     spapr->irq = smc->irq;
     object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
-- 
2.17.1




* [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2020-02-03  3:29 ` [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image Alexey Kardashevskiy
@ 2020-02-03  3:29 ` Alexey Kardashevskiy
  2020-02-03 13:03   ` BALATON Zoltan
  2020-02-05  4:59 ` [PATCH qemu v6] spapr: OF CI networking Alexey Kardashevskiy
  6 siblings, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-03  3:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Paolo Bonzini, qemu-ppc, Alexey Kardashevskiy,
	David Gibson

The PAPR platform describes an OS environment that's presented by
a combination of a hypervisor and firmware. The features it specifies
require collaboration between the firmware and the hypervisor.

Since the beginning, the runtime component of the firmware (RTAS) has
been implemented as a 20 byte shim which simply forwards each call to
a hypercall implemented in qemu. The boottime firmware component is
SLOF - but a build that's specific to qemu, and has always needed to be
updated in sync with it. Even though we've managed to limit the amount
of runtime communication we need between qemu and SLOF, there's some,
and it's become increasingly awkward to handle as we've implemented
new features.

This implements a boot time OF client interface (CI) which is
enabled by a new "vof" pseries machine option (stands for "Virtual Open
Firmware"). When enabled, QEMU does not load SLOF; instead it copies
a small RTAS-like 20-byte shim and jumps to the image from
"-kernel". If no -kernel is specified, this tries loading a bootloader
from boot devices.

This adds very basic support to read MBR/GPT to find a PReP partition
which is then loaded as an ELF. spapr-vty and virtio-scsi are supported
(it is basically adding a "disk" node under a SCSI host).
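
As a rough sketch of the MBR half of that lookup, under stated assumptions:
the classic 512-byte MBR layout (four 16-byte entries at offset 446, signature
0x55 0xAA at offset 510) and partition type 0x41 for PReP boot. The function
and macro names are hypothetical, and the GPT case is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MBR_SIZE        512
#define MOCK_MBR_TABLE_OFF   446   /* start of the partition table */
#define MOCK_MBR_ENTRY_SIZE  16
#define MOCK_MBR_ENTRIES     4
#define MOCK_MBR_TYPE_PREP   0x41  /* assumed PReP boot partition type */

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t mock_le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Scan the four primary entries for a PReP partition.
 * On success, stores the first LBA and the sector count and returns true.
 */
static bool mock_mbr_find_prep(const uint8_t mbr[MOCK_MBR_SIZE],
                               uint32_t *lba, uint32_t *sectors)
{
    int i;

    if (mbr[510] != 0x55 || mbr[511] != 0xaa) {
        return false; /* no valid MBR signature */
    }
    for (i = 0; i < MOCK_MBR_ENTRIES; i++) {
        const uint8_t *e = mbr + MOCK_MBR_TABLE_OFF + i * MOCK_MBR_ENTRY_SIZE;

        if (e[4] == MOCK_MBR_TYPE_PREP) {
            *lba = mock_le32_at(e + 8);
            *sectors = mock_le32_at(e + 12);
            return true;
        }
    }
    return false;
}
```

The partition contents would then be handed to the ELF loader, as the
paragraph above describes.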

This adds support for a console. For output any serial device can be used;
for stdin the support is limited to spapr-vty, as allowing input from
a serial device requires device-model specific code (output is simpler).

Note that this implements blockdev and chardev support by hooking OF CI
calls to the backends bypassing any devices and drivers in between.

This implements a handful of CI methods just to get Linux and GRUB going;
Linux requires even less. In particular, this implements the device tree
fetching, reading from block device, read-write stdout/stdin,
ibm,client-architecture-support and instantiate-rtas.

This implements changing some device tree properties which we know how
to deal with, the rest is ignored. To allow changes, this skips
fdt_pack() when vof=on as not packing the blob leaves some room for
appending.

In the absence of SLOF, this assigns "phandles" to device tree nodes to
make device tree traversal work.

When vof=off, this adds "/chosen" every time QEMU (re)builds a tree.

This implements "claim" (an OF CI memory allocator) and updates
"/memory@0/available" to report available memory to the client.

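A "claim"-style allocator over a list of {start, size} ranges could look
roughly like the sketch below. It is not the allocator from
spapr_of_client.c: the fixed-size table, the mock_* names and the convention
that align == 0 means "claim exactly at virt" are assumptions made for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_CLAIMS 32
#define MOCK_CLAIM_FAIL ((uint64_t)-1)

typedef struct {
    uint64_t start;
    uint64_t size;
} MockClaimed;

typedef struct {
    MockClaimed r[MOCK_MAX_CLAIMS];
    size_t n;
    uint64_t base;   /* lowest address considered for aligned requests */
    uint64_t limit;  /* end of claimable memory */
} MockClaimState;

/* True if [start, start + size) overlaps no recorded range. */
static bool mock_claim_avail(const MockClaimState *s, uint64_t start,
                             uint64_t size)
{
    size_t i;

    for (i = 0; i < s->n; i++) {
        if (start < s->r[i].start + s->r[i].size &&
            s->r[i].start < start + size) {
            return false;
        }
    }
    return true;
}

/* Returns the claimed address, or MOCK_CLAIM_FAIL. */
static uint64_t mock_claim(MockClaimState *s, uint64_t virt, uint64_t size,
                           uint64_t align)
{
    uint64_t ret;

    if (size == 0 || s->n == MOCK_MAX_CLAIMS) {
        return MOCK_CLAIM_FAIL;
    }
    if (align == 0) {
        /* Fixed request: the exact range must be inside memory and free. */
        if (virt + size > s->limit || !mock_claim_avail(s, virt, size)) {
            return MOCK_CLAIM_FAIL;
        }
        ret = virt;
    } else {
        /* Aligned request: first free aligned slot at or above base. */
        ret = (s->base + align - 1) / align * align;
        while (ret + size <= s->limit && !mock_claim_avail(s, ret, size)) {
            ret += align;
        }
        if (ret + size > s->limit) {
            return MOCK_CLAIM_FAIL;
        }
    }
    s->r[s->n].start = ret;
    s->r[s->n].size = size;
    s->n++;
    return ret;
}
```

Whatever remains unclaimed is what "/memory@0/available" would report.
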
This adds a machine ready hook which looks for a bootloader as this cannot
be done from:
- machine init: too early, devices and their bootindexes
are not known yet;
- machine reset: too late, all images must be registered with the image
loader ("loader.c") before that.

This disables the translate_kernel_address() hack in the ELF loader when
vof=on to allow passing a GRUB image via -kernel (requires the
"kernel-addr=0x200000" machine option as this is how GRUB is linked).

This adds basic instances support which are managed by a hashmap
ihandle -> [phandle, DeviceState, CharBackend, BlockBackend].
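
The ihandle mapping can be illustrated with a small fixed-size table. This
sketch deliberately swaps the hashmap for an array to stay self-contained;
the MockOf* types, the mock_of_* functions and the treatment of ihandle 0 as
invalid are assumptions, and the pointers are opaque placeholders for
DeviceState, CharBackend and BlockBackend.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_OF_MAX_INSTANCES 16

typedef struct {
    uint32_t phandle;   /* 0 marks a closed slot */
    void *dev;          /* placeholder for DeviceState */
    void *chardev;      /* placeholder for CharBackend */
    void *blk;          /* placeholder for BlockBackend */
} MockOfInstance;

typedef struct {
    MockOfInstance slot[MOCK_OF_MAX_INSTANCES + 1]; /* index 0 unused */
    uint32_t last;                                  /* last ihandle issued */
} MockOfInstances;

/* Open an instance of the node @phandle; returns its ihandle or 0. */
static uint32_t mock_of_open(MockOfInstances *t, uint32_t phandle,
                             void *dev, void *chardev, void *blk)
{
    uint32_t ihandle;

    if (phandle == 0 || t->last >= MOCK_OF_MAX_INSTANCES) {
        return 0;
    }
    ihandle = ++t->last;
    t->slot[ihandle] = (MockOfInstance){ phandle, dev, chardev, blk };
    return ihandle;
}

/* Validate an ihandle, e.g. before serving "call-method". */
static MockOfInstance *mock_of_lookup(MockOfInstances *t, uint32_t ihandle)
{
    if (ihandle == 0 || ihandle > t->last || t->slot[ihandle].phandle == 0) {
        return NULL;
    }
    return &t->slot[ihandle];
}

/* Close an instance; later lookups of the same ihandle fail. */
static void mock_of_close(MockOfInstances *t, uint32_t ihandle)
{
    MockOfInstance *inst = mock_of_lookup(t, ihandle);

    if (inst) {
        inst->phandle = 0;
    }
}
```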

Before the guest starts, the memory in use is:
0..8000 - stack (the size is copied from SLOF, tested 4k - too little)
8000..8020 - OF CI blob
200000..600000 - GRUB

The limitations summary:
1. load_elf only loads from files so this stores a found bootloader in
the current directory and then calls load_elf on it;
2. load_elf does not report used memory;
3. reading serial device is device-model specific;
4. no networking in OF CI at all;
5. no vga;
6. no disk partitions in CI, i.e. no commas to select a partition -
this relies on a bootloader accessing the disk as a whole;
7. "interpret" (executes passed forth expression) does nothing as in this
environment grub only uses it for switching cursor off and similar tasks.


The test command line (basically, this requires a boot order in any form
and "vof=on"):

./ppc64-softmmu/qemu-system-ppc64 \
-nodefaults \
-chardev stdio,id=STDIO0,signal=off,mux=on \
-device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
-mon id=MON0,chardev=STDIO0,mode=readline \
-nographic \
-vga none \
-machine pseries,vof=on \
-m 4G \
-device spapr-vscsi,id=svscsi0 \
-drive id=DRIVE0,if=none,file=img/f30le.qcow2,format=qcow2 \
-device scsi-hd,id=scsi-hd0,drive=DRIVE0,bootindex=1 \
-snapshot \
-enable-kvm \
-smp 8,threads=8 \
-L /home/aik/t/qemu-ppc64-bios/ \
-trace events=qemu_trace_events \
-d guest_errors \
-chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh55056 \
-mon chardev=SOCKET0,mode=control


Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v6:
* borrowed a big chunk of commit log introduction from David
* fixed initial stack pointer (points to the highest address of stack)
* traces for "interpret" and others
* disabled  translate_kernel_address() hack so grub can load (work in
progress)
* added "milliseconds" for grub
* fixed "claim" allocator again
* moved FDT_MAX_SIZE to spapr.h as spapr_of_client.c wants it too for CAS
* moved the most code possible from spapr.c to spapr_of_client.c, such as
RTAS, prom entry and FDT build/finalize
* separated blobs
* GRUB now proceeds to its console prompt (there are still other issues)
* parse MBR/GPT to find PReP and load GRUB

v5:
* made instances keep device and chardev pointers
* removed VIO dependencies
* print error if RTAS memory is not claimed as it should have been
* pack FDT as "quiesce"

v4:
* fixed open
* validate ihandles in "call-method"

v3:
* fixed phandles allocation
* s/__be32/uint32_t/ as we do not normally have __be32 type in qemu
* fixed size of /chosen/stdout
* bunch of renames
* do not create rtas properties at all, let the client deal with it;
instead setprop allows changing these in the FDT
* no more packing FDT when bios=off - nobody needs it and getprop does not
work otherwise
* allow updating initramdisk device tree properties (for zImage)
* added instances
* fixed stdout on OF's "write"
* removed special handling for stdout in OF client, spapr-vty handles it
instead

v2:
* fixed claim()
* added "setprop"
* cleaner client interface and RTAS blobs management
* boots to petitboot and further to the target system
* more trace points
---
 hw/ppc/Makefile.objs     |    1 +
 include/hw/ppc/spapr.h   |   21 +-
 hw/ppc/spapr.c           |  100 ++-
 hw/ppc/spapr_hcall.c     |    6 +-
 hw/ppc/spapr_of_client.c | 1526 ++++++++++++++++++++++++++++++++++++++
 hw/ppc/trace-events      |   24 +
 6 files changed, 1652 insertions(+), 26 deletions(-)
 create mode 100644 hw/ppc/spapr_of_client.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index a4bac57be678..0c2720c1d550 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -8,6 +8,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o spapr_irq.o
 obj-$(CONFIG_PSERIES) += spapr_tpm_proxy.o
+obj-$(CONFIG_PSERIES) += spapr_of_client.o
 obj-$(CONFIG_SPAPR_RNG) +=  spapr_rng.o
 obj-$(call land,$(CONFIG_PSERIES),$(CONFIG_LINUX)) += spapr_pci_vfio.o spapr_pci_nvlink2.o
 # IBM PowerNV
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 32e831a395ae..78576f829959 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -109,6 +109,11 @@ struct SpaprCapabilities {
     uint8_t caps[SPAPR_CAP_NUM];
 };
 
+typedef struct {
+    uint64_t start;
+    uint64_t size;
+} SpaprOfClaimed;
+
 /**
  * SpaprMachineClass:
  */
@@ -165,6 +170,14 @@ struct SpaprMachineState {
     long kernel_size;
     bool kernel_le;
     uint64_t kernel_addr;
+    bool vof; /* Virtual Open Firmware */
+    uint32_t rtas_base;
+    GArray *claimed; /* array of SpaprOfClaimed */
+    uint64_t claimed_base;
+    GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
+    uint32_t of_instance_last;
+    Notifier machine_ready;
+    char *bootargs;
     uint32_t initrd_base;
     long initrd_size;
     uint64_t rtc_offset; /* Now used only during incoming migration */
@@ -526,7 +539,8 @@ struct SpaprMachineState {
 /* Client Architecture support */
 #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
 #define KVMPPC_H_UPDATE_DT      (KVMPPC_HCALL_BASE + 0x3)
-#define KVMPPC_HCALL_MAX        KVMPPC_H_UPDATE_DT
+#define KVMPPC_H_OF_CLIENT      (KVMPPC_HCALL_BASE + 0x5)
+#define KVMPPC_HCALL_MAX        KVMPPC_H_OF_CLIENT
 
 /*
  * The hcall range 0xEF00 to 0xEF80 is reserved for use in facilitating
@@ -795,6 +809,11 @@ struct SpaprEventLogEntry {
 void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space);
 void spapr_events_init(SpaprMachineState *sm);
 void spapr_dt_events(SpaprMachineState *sm, void *fdt);
+void spapr_setup_of_client(SpaprMachineState *spapr, target_ulong *stack_ptr,
+                           target_ulong *prom_entry);
+void spapr_of_client_dt(SpaprMachineState *spapr, void *fdt);
+void spapr_of_client_dt_finalize(SpaprMachineState *spapr);
+void spapr_of_client_machine_init(SpaprMachineState *spapr);
 void close_htab_fd(SpaprMachineState *spapr);
 void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr);
 void spapr_free_hpt(SpaprMachineState *spapr);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b59e9dc360fe..e3f67626a524 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1266,6 +1266,10 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
         }
     }
 
+    if (spapr->vof) {
+        spapr_of_client_dt(spapr, fdt);
+    }
+
     return fdt;
 }
 
@@ -1273,6 +1277,14 @@ static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
 {
     SpaprMachineState *spapr = opaque;
 
+    if (spapr->vof) {
+        /*
+         * Without SLOF we can load the kernel at 0 and avoid the hack below;
+         * everything else (such as GRUB) is loaded the usual way, at the
+         * address it is linked to run at.
+         */
+        return addr & 0xffffffff;
+    }
     return (addr & 0x0fffffff) + spapr->kernel_addr;
 }
 
@@ -1660,24 +1672,41 @@ static void spapr_machine_reset(MachineState *machine)
      */
     fdt_addr = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FDT_MAX_SIZE;
 
+    /* Set up the entry state */
+    if (spapr->vof) {
+        target_ulong stack_ptr = 0;
+        target_ulong prom_entry = 0;
+
+        spapr_setup_of_client(spapr, &stack_ptr, &prom_entry);
+        spapr_cpu_set_entry_state(first_ppc_cpu, spapr->kernel_addr,
+                                  stack_ptr, spapr->initrd_base,
+                                  spapr->initrd_size, prom_entry);
+    } else {
+        spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
+                                  0, fdt_addr, 0, 0);
+    }
+
     fdt = spapr_build_fdt(spapr, true, FDT_MAX_SIZE);
 
-    rc = fdt_pack(fdt);
-
-    /* Should only fail if we've built a corrupted tree */
-    assert(rc == 0);
-
-    /* Load the fdt */
-    qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
-    cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
     g_free(spapr->fdt_blob);
     spapr->fdt_size = fdt_totalsize(fdt);
     spapr->fdt_initial_size = spapr->fdt_size;
     spapr->fdt_blob = fdt;
 
-    /* Set up the entry state */
-    spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
-                              0, fdt_addr, 0, 0);
+    if (spapr->vof) {
+        spapr_of_client_dt_finalize(spapr);
+    } else {
+        /* Load the fdt */
+        rc = fdt_pack(spapr->fdt_blob);
+        /* Should only fail if we've built a corrupted tree */
+        assert(rc == 0);
+
+        spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
+        spapr->fdt_initial_size = spapr->fdt_size;
+        cpu_physical_memory_write(fdt_addr, spapr->fdt_blob, spapr->fdt_size);
+    }
+
+    qemu_fdt_dumpdtb(spapr->fdt_blob, spapr->fdt_size);
 
     spapr->cas_reboot = false;
 
@@ -2986,20 +3015,24 @@ static void spapr_machine_init(MachineState *machine)
         }
     }
 
-    if (bios_name == NULL) {
-        bios_name = FW_FILE_NAME;
+    if (spapr->vof) {
+        spapr_of_client_machine_init(spapr);
+    } else {
+        if (bios_name == NULL) {
+            bios_name = FW_FILE_NAME;
+        }
+        filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+        if (!filename) {
+            error_report("Could not find LPAR firmware '%s'", bios_name);
+            exit(1);
+        }
+        fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
+        if (fw_size <= 0) {
+            error_report("Could not load LPAR firmware '%s'", filename);
+            exit(1);
+        }
+        g_free(filename);
     }
-    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
-    if (!filename) {
-        error_report("Could not find LPAR firmware '%s'", bios_name);
-        exit(1);
-    }
-    fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
-    if (fw_size <= 0) {
-        error_report("Could not load LPAR firmware '%s'", filename);
-        exit(1);
-    }
-    g_free(filename);
 
     /* FIXME: Should register things through the MachineState's qdev
      * interface, this is a legacy from the sPAPREnvironment structure
@@ -3230,6 +3263,20 @@ static void spapr_set_kernel_addr(Object *obj, Visitor *v, const char *name,
     visit_type_uint64(v, name, (uint64_t *)opaque, errp);
 }
 
+static bool spapr_get_vof(Object *obj, Error **errp)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+    return spapr->vof;
+}
+
+static void spapr_set_vof(Object *obj, bool value, Error **errp)
+{
+    SpaprMachineState *spapr = SPAPR_MACHINE(obj);
+
+    spapr->vof = value;
+}
+
 static char *spapr_get_ic_mode(Object *obj, Error **errp)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3343,6 +3390,11 @@ static void spapr_instance_init(Object *obj)
                                     " for -kernel is the default",
                                     NULL);
     spapr->kernel_addr = KERNEL_LOAD_ADDR;
+    object_property_add_bool(obj, "vof", spapr_get_vof, spapr_set_vof, NULL);
+    object_property_set_description(obj, "vof", "Enable Virtual Open Firmware",
+                                    NULL);
+    spapr->vof = false;
+
     /* The machine class defines the default interrupt controller mode */
     spapr->irq = smc->irq;
     object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index da50d8ee5dd7..6a62c92b3f89 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1840,12 +1840,16 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
             spapr_setup_hpt_and_vrma(spapr);
         }
 
-        fdt = spapr_build_fdt(spapr, false, fdt_bufsize);
+        fdt = spapr_build_fdt(spapr, spapr->vof, fdt_bufsize);
 
         g_free(spapr->fdt_blob);
         spapr->fdt_size = fdt_totalsize(fdt);
         spapr->fdt_initial_size = spapr->fdt_size;
         spapr->fdt_blob = fdt;
+
+        if (spapr->vof) {
+            spapr_of_client_dt_finalize(spapr);
+        }
     }
 
     if (spapr->cas_reboot) {
diff --git a/hw/ppc/spapr_of_client.c b/hw/ppc/spapr_of_client.c
new file mode 100644
index 000000000000..31555c356de8
--- /dev/null
+++ b/hw/ppc/spapr_of_client.c
@@ -0,0 +1,1526 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include <sys/ioctl.h>
+#include <termios.h>
+#include "qapi/error.h"
+#include "exec/memory.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_vio.h"
+#include "hw/ppc/fdt.h"
+#include "hw/block/block.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/sysemu.h"
+#include "chardev/char-fe.h"
+#include "qom/qom-qobject.h"
+#include "elf.h"
+#include "hw/ppc/ppc.h"
+#include "hw/loader.h"
+#include "trace.h"
+
+#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
+
+struct gpt_header {
+    char signature[8];
+    char revision[4];
+    uint32_t header_size;
+    uint32_t crc;
+    uint32_t reserved;
+    uint64_t current_lba;
+    uint64_t backup_lba;
+    uint64_t first_usable_lba;
+    uint64_t last_usable_lba;
+    char guid[16];
+    uint64_t partition_entries_lba;
+    uint32_t nr_partition_entries;
+    uint32_t size_partition_entry;
+    uint32_t crc_partitions;
+};
+
+#define GPT_SIGNATURE "EFI PART"
+#define GPT_REVISION "\0\0\1\0" /* revision 1.0 */
+
+struct gpt_entry {
+    char partition_type_guid[16];
+    char unique_guid[16];
+    uint64_t first_lba;
+    uint64_t last_lba;
+    uint64_t attributes;
+    char name[72];                /* UTF-16LE */
+};
+
+#define GPT_MIN_PARTITIONS 128
+#define GPT_PT_ENTRY_SIZE 128
+#define SECTOR_SIZE 512
+
+static int find_prep_partition_on_gpt(BlockBackend *blk, uint8_t *lba01,
+                                      uint64_t *offset, uint64_t *size)
+{
+    unsigned i, partnum, partentrysize;
+    int ret;
+    struct gpt_header *hdr = (struct gpt_header *) (lba01 + SECTOR_SIZE);
+    const char *prep_uuid = "9e1a2d38-c612-4316-aa26-8b49521e5a8b";
+
+    if (memcmp(hdr, "EFI PART", 8)) {
+        return -1;
+    }
+
+    partnum = le32_to_cpu(hdr->nr_partition_entries);
+    partentrysize = le32_to_cpu(hdr->size_partition_entry);
+
+    if (partentrysize < 128 || partentrysize > 512) {
+        return -1;
+    }
+
+    for (i = 0; i < partnum; ++i) {
+        uint8_t partdata[partentrysize];
+        struct gpt_entry *entry = (struct gpt_entry *) partdata;
+        uint64_t first, last;
+        QemuUUID parttype;
+        char *uuid;
+
+        ret = blk_pread(blk, 2 * SECTOR_SIZE + i * partentrysize,
+                        partdata, sizeof(partdata));
+        if (ret < 0) {
+            return ret;
+        } else if (!ret) {
+            return -1;
+        }
+
+        memcpy(parttype.data, entry->partition_type_guid, 16);
+        parttype = qemu_uuid_bswap(parttype);
+        first = le64_to_cpu(entry->first_lba);
+        last = le64_to_cpu(entry->last_lba);
+
+        uuid = qemu_uuid_unparse_strdup(&parttype);
+        if (!strcmp(uuid, prep_uuid)) {
+            *offset = first * SECTOR_SIZE;
+            *size = (last - first + 1) * SECTOR_SIZE; /* last_lba is inclusive */
+        }
+    }
+
+    if (*offset) {
+        return 0;
+    }
+
+    return -1;
+}
+
+struct partition_record {
+    uint8_t bootable;
+    uint8_t start_head;
+    uint16_t start_cylinder;      /* 10 bits */
+    uint8_t start_sector;
+    uint8_t system;
+    uint8_t end_head;
+    uint16_t end_cylinder;        /* 10 bits */
+    uint8_t end_sector;
+    uint32_t start_sector_abs;
+    uint32_t nb_sectors_abs;
+};
+
+static void read_partition(uint8_t *p, struct partition_record *r)
+{
+    r->bootable = p[0];
+    r->start_head = p[1];
+    r->start_cylinder = p[3] | ((p[2] << 2) & 0x0300);
+    r->start_sector = p[2] & 0x3f;
+    r->system = p[4];
+    r->end_head = p[5];
+    r->end_cylinder = p[7] | ((p[6] << 2) & 0x300);
+    r->end_sector = p[6] & 0x3f;
+    r->start_sector_abs = ldl_le_p(p + 8);
+    r->nb_sectors_abs   = ldl_le_p(p + 12);
+}
+
+static int find_prep_partition(BlockBackend *blk, uint64_t *offset,
+                               uint64_t *size)
+{
+    uint8_t lba01[SECTOR_SIZE * 2];
+    int i, rc;
+    int ret = -ENOENT;
+
+    rc = blk_pread(blk, 0, lba01, sizeof(lba01));
+    if (rc < 0) {
+        error_report("error while reading: %s", strerror(-rc));
+        exit(EXIT_FAILURE);
+    }
+
+    if (lba01[510] != 0x55 || lba01[511] != 0xaa) {
+        return find_prep_partition_on_gpt(blk, lba01, offset, size);
+    }
+
+    for (i = 0; i < 4; i++) {
+        struct partition_record part = { 0 };
+
+        read_partition(&lba01[446 + 16 * i], &part);
+        if (!part.system || !part.nb_sectors_abs) {
+            continue;
+        }
+
+        /* 0xEE == GPT */
+        if (part.system == 0xEE) {
+            ret = find_prep_partition_on_gpt(blk, lba01, offset, size);
+        }
+        /* 0x41 == PReP */
+        if (part.system == 0x41) {
+            *offset = (uint64_t)part.start_sector_abs << 9;
+            *size = (uint64_t)part.nb_sectors_abs << 9;
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
+/*
+ * Compiled versions of the RTAS blob and the OF client interface entry point.
+ *
+ * gcc -nostdlib  -mbig -o spapr-rtas.img spapr-rtas.S
+ * objcopy  -O binary -j .text  spapr-rtas.img spapr-rtas.bin
+ *
+ *   .globl  _start
+ *   _start:
+ *           mr      4,3
+ *           lis     3,KVMPPC_H_RTAS@h
+ *           ori     3,3,KVMPPC_H_RTAS@l
+ *           sc      1
+ *           blr
+ * ...
+ */
+static const uint8_t rtas_blob[] = {
+    0x7c, 0x64, 0x1b, 0x78,
+    0x3c, 0x60, 0x00, 0x00,
+    0x60, 0x63, 0xf0, 0x00,
+    0x44, 0x00, 0x00, 0x22,
+    0x4e, 0x80, 0x00, 0x20
+};
+
+/*
+ * ...
+ *           mr      4,3
+ *           lis     3,KVMPPC_H_OF_CLIENT@h
+ *           ori     3,3,KVMPPC_H_OF_CLIENT@l
+ *           sc      1
+ *           blr
+ */
+static const uint8_t of_client_blob[] = {
+    0x7c, 0x64, 0x1b, 0x78,
+    0x3c, 0x60, 0x00, 0x00,
+    0x60, 0x63, 0xf0, 0x05,
+    0x44, 0x00, 0x00, 0x22,
+    0x4e, 0x80, 0x00, 0x20
+};
+
+typedef struct {
+    DeviceState *dev;
+    CharBackend *cbe;
+    BlockBackend *blk;
+    uint64_t blk_pos;
+    uint16_t blk_physical_block_size;
+    char *path; /* the path used to open the instance */
+    uint32_t phandle;
+} SpaprOfInstance;
+
+/*
+ * The OF 1275 "nextprop" description suggests it is 32 bytes max but
+ * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long.
+ */
+#define OF_PROPNAME_LEN_MAX 64
+
+/* Copied from SLOF, and 4K is definitely not enough for GRUB */
+#define OF_STACK_SIZE       0x8000
+
+/* Defined as Big Endian */
+struct prom_args {
+    uint32_t service;
+    uint32_t nargs;
+    uint32_t nret;
+    uint32_t args[10];
+};
+
+static void readstr(hwaddr pa, char *buf, int size)
+{
+    cpu_physical_memory_read(pa, buf, size);
+    if (buf[size - 1] != '\0') {
+        buf[size - 1] = '\0';
+        if (strlen(buf) == size - 1) {
+            trace_spapr_of_client_error_str_truncated(buf, size);
+        }
+    }
+}
+
+static bool cmpservice(const char *s, size_t len,
+                       unsigned nargs, unsigned nret,
+                       const char *s1, size_t len1,
+                       unsigned nargscheck, unsigned nretcheck)
+{
+    if (strcmp(s, s1)) {
+        return false;
+    }
+    if ((nargscheck && (nargs != nargscheck)) ||
+        (nretcheck && (nret != nretcheck))) {
+        trace_spapr_of_client_error_param(s, nargscheck, nretcheck, nargs,
+                                          nret);
+        return false;
+    }
+
+    return true;
+}
+
+static void split_path(const char *fullpath, char **node, char **unit,
+                       char **part)
+{
+    const char *c, *p = NULL, *u = NULL;
+
+    *node = *unit = *part = NULL;
+
+    if (fullpath[0] == '\0') {
+        *node = g_strdup(fullpath);
+        return;
+    }
+
+    for (c = fullpath + strlen(fullpath) - 1; c > fullpath; --c) {
+        if (*c == '/') {
+            break;
+        }
+        if (*c == ':') {
+            p = c + 1;
+            continue;
+        }
+        if (*c == '@') {
+            u = c + 1;
+            continue;
+        }
+    }
+
+    if (p && u && p < u) {
+        p = NULL;
+    }
+
+    if (u && p) {
+        *node = g_strndup(fullpath, u - fullpath - 1);
+        *unit = g_strndup(u, p - u - 1);
+        *part = g_strdup(p);
+    } else if (!u && p) {
+        *node = g_strndup(fullpath, p - fullpath - 1);
+        *part = g_strdup(p);
+    } else if (!p && u) {
+        *node = g_strndup(fullpath, u - fullpath - 1);
+        *unit = g_strdup(u);
+    } else {
+        *node = g_strdup(fullpath);
+    }
+}
+
+static void prop_format(char *tval, int tlen, const void *prop, int len)
+{
+    int i;
+    const char *c;
+    char *t;
+    const char bin[] = "...";
+
+    for (i = 0, c = prop; i < len; ++i, ++c) {
+        if (*c == '\0' && i == len - 1) {
+            strncpy(tval, prop, tlen - 1);
+            return;
+        }
+        if (*c < 0x20 || *c >= 0x80) {
+            /* Not reliably printable string so assume it is binary */
+            break;
+        }
+    }
+
+    /* Dumped byte by byte, binary data reads as big endian (which it is) */
+    for (i = 0, c = prop, t = tval; i < len; ++i, ++c) {
+        if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) {
+            strcpy(t, bin);
+            return;
+        }
+        if (i && i % 4 == 0 && i != len - 1) {
+            strcat(t, " ");
+            ++t;
+        }
+        t += sprintf(t, "%02X", *c & 0xFF);
+    }
+}
+
+static int of_client_fdt_path_offset(const void *fdt, const char *node,
+                                     const char *unit)
+{
+    int offset;
+
+    offset = fdt_path_offset(fdt, node);
+
+    if (offset < 0 && unit) {
+        char *tmp = g_strdup_printf("%s@%s", node, unit);
+
+        offset = fdt_path_offset(fdt, tmp);
+        g_free(tmp);
+    }
+
+    return offset;
+}
+
+static uint32_t of_client_finddevice(const void *fdt, uint32_t nodeaddr)
+{
+    char *node, *unit, *part;
+    char fullnode[1024];
+    uint32_t ret = -1;
+    int offset;
+
+    readstr(nodeaddr, fullnode, sizeof(fullnode));
+
+    split_path(fullnode, &node, &unit, &part);
+    offset = of_client_fdt_path_offset(fdt, node, unit);
+    if (offset >= 0) {
+        ret = fdt_get_phandle(fdt, offset);
+    }
+    trace_spapr_of_client_finddevice(fullnode, ret);
+    g_free(node);
+    g_free(unit);
+    g_free(part);
+    return (uint32_t) ret;
+}
+
+static uint32_t of_client_getprop(const void *fdt, uint32_t nodeph,
+                                  uint32_t pname, uint32_t valaddr,
+                                  uint32_t vallen)
+{
+    char propname[OF_PROPNAME_LEN_MAX + 1];
+    uint32_t ret = 0;
+    int proplen = 0;
+    const void *prop;
+    char trval[64] = "";
+    int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph);
+
+    readstr(pname, propname, sizeof(propname));
+    if (strcmp(propname, "name") == 0) {
+        prop = fdt_get_name(fdt, nodeoff, &proplen);
+        proplen += 1;
+    } else {
+        prop = fdt_getprop(fdt, nodeoff, propname, &proplen);
+    }
+
+    if (prop) {
+        int cb = MIN(proplen, vallen);
+
+        cpu_physical_memory_write(valaddr, prop, cb);
+        /*
+         * OF1275 says:
+         * "Size is either the actual size of the property, or -1 if name
+         * does not exist", hence returning proplen instead of cb.
+         */
+        ret = proplen;
+        prop_format(trval, sizeof(trval), prop, ret);
+    } else {
+        ret = -1;
+    }
+    trace_spapr_of_client_getprop(nodeph, propname, ret, trval);
+
+    return ret;
+}
+
+static uint32_t of_client_getproplen(const void *fdt, uint32_t nodeph,
+                                     uint32_t pname)
+{
+    char propname[OF_PROPNAME_LEN_MAX + 1];
+    uint32_t ret = 0;
+    int proplen = 0;
+    const void *prop;
+    int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph);
+
+    readstr(pname, propname, sizeof(propname));
+    if (strcmp(propname, "name") == 0) {
+        prop = fdt_get_name(fdt, nodeoff, &proplen);
+        proplen += 1;
+    } else {
+        prop = fdt_getprop(fdt, nodeoff, propname, &proplen);
+    }
+
+    if (prop) {
+        ret = proplen;
+    } else {
+        ret = -1;
+    }
+    trace_spapr_of_client_getproplen(nodeph, propname, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_setprop(SpaprMachineState *spapr,
+                                  uint32_t nodeph, uint32_t pname,
+                                  uint32_t valaddr, uint32_t vallen)
+{
+    char propname[OF_PROPNAME_LEN_MAX + 1];
+    uint32_t ret = -1;
+    int offset;
+    char trval[64] = "";
+
+    readstr(pname, propname, sizeof(propname));
+    /*
+     * We only allow changing properties which we know how to update on
+     * the QEMU side.
+     */
+    if (vallen == sizeof(uint32_t)) {
+        uint32_t val32 = ldl_be_phys(first_cpu->as, valaddr);
+
+        if ((strcmp(propname, "linux,rtas-base") == 0) ||
+            (strcmp(propname, "linux,rtas-entry") == 0)) {
+            spapr->rtas_base = val32;
+        } else if (strcmp(propname, "linux,initrd-start") == 0) {
+            spapr->initrd_base = val32;
+        } else if (strcmp(propname, "linux,initrd-end") == 0) {
+            spapr->initrd_size = val32 - spapr->initrd_base;
+        } else {
+            goto trace_exit;
+        }
+    } else if (vallen == sizeof(uint64_t)) {
+        uint64_t val64 = ldq_be_phys(first_cpu->as, valaddr);
+
+        if (strcmp(propname, "linux,initrd-start") == 0) {
+            spapr->initrd_base = val64;
+        } else if (strcmp(propname, "linux,initrd-end") == 0) {
+            spapr->initrd_size = val64 - spapr->initrd_base;
+        } else {
+            goto trace_exit;
+        }
+    } else if (strcmp(propname, "bootargs") == 0) {
+        char val[1024];
+
+        readstr(valaddr, val, sizeof(val));
+        g_free(spapr->bootargs);
+        spapr->bootargs = g_strdup(val);
+    } else {
+        goto trace_exit;
+    }
+
+    offset = fdt_node_offset_by_phandle(spapr->fdt_blob, nodeph);
+    if (offset >= 0) {
+        uint8_t data[vallen];
+
+        cpu_physical_memory_read(valaddr, data, vallen);
+        if (!fdt_setprop(spapr->fdt_blob, offset, propname, data, vallen)) {
+            ret = vallen;
+            prop_format(trval, sizeof(trval), data, ret);
+        }
+    }
+
+trace_exit:
+    trace_spapr_of_client_setprop(nodeph, propname, trval, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_nextprop(const void *fdt, uint32_t phandle,
+                                   uint32_t prevaddr, uint32_t nameaddr)
+{
+    int offset = fdt_node_offset_by_phandle(fdt, phandle);
+    char prev[OF_PROPNAME_LEN_MAX + 1];
+    const char *tmp;
+
+    readstr(prevaddr, prev, sizeof(prev));
+    for (offset = fdt_first_property_offset(fdt, offset);
+         offset >= 0;
+         offset = fdt_next_property_offset(fdt, offset)) {
+
+        if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) {
+            return 0;
+        }
+        if (prev[0] == '\0' || strcmp(prev, tmp) == 0) {
+            if (prev[0] != '\0') {
+                offset = fdt_next_property_offset(fdt, offset);
+                if (offset < 0) {
+                    return 0;
+                }
+            }
+            if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) {
+                return 0;
+            }
+
+            cpu_physical_memory_write(nameaddr, tmp, strlen(tmp) + 1);
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
+static uint32_t of_client_peer(const void *fdt, uint32_t phandle)
+{
+    int ret;
+
+    if (phandle == 0) {
+        ret = fdt_path_offset(fdt, "/");
+    } else {
+        ret = fdt_next_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle));
+    }
+
+    if (ret < 0) {
+        ret = 0;
+    } else {
+        ret = fdt_get_phandle(fdt, ret);
+    }
+
+    return ret;
+}
+
+static uint32_t of_client_child(const void *fdt, uint32_t phandle)
+{
+    int ret = fdt_first_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle));
+
+    if (ret < 0) {
+        ret = 0;
+    } else {
+        ret = fdt_get_phandle(fdt, ret);
+    }
+
+    return ret;
+}
+
+static uint32_t of_client_parent(const void *fdt, uint32_t phandle)
+{
+    int ret = fdt_parent_offset(fdt, fdt_node_offset_by_phandle(fdt, phandle));
+
+    if (ret < 0) {
+        ret = 0;
+    } else {
+        ret = fdt_get_phandle(fdt, ret);
+    }
+
+    return ret;
+}
+
+static DeviceState *of_client_find_qom_dev(BusState *bus, const char *path)
+{
+    BusChild *kid;
+
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        const char *p = qdev_get_fw_dev_path(kid->child);
+        BusState *child;
+
+        if (p && strcmp(path, p) == 0) {
+            return kid->child;
+        }
+        QLIST_FOREACH(child, &kid->child->child_bus, sibling) {
+            DeviceState *d = of_client_find_qom_dev(child, path);
+
+            if (d) {
+                return d;
+            }
+        }
+    }
+    return NULL;
+}
+
+static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
+{
+    int offset;
+    uint32_t ret = 0;
+    SpaprOfInstance *inst = NULL;
+    char *node, *unit, *part;
+
+    if (spapr->of_instance_last == 0xFFFFFFFF) {
+        /* We do not recycle ihandles yet */
+        goto trace_exit;
+    }
+
+    split_path(path, &node, &unit, &part);
+    if (part && strcmp(part, "0")) {
+        error_report("Partitions are not supported yet");
+        g_free(part);
+        part = NULL;
+    }
+
+    offset = of_client_fdt_path_offset(spapr->fdt_blob, node, unit);
+    if (offset < 0) {
+        trace_spapr_of_client_error_unknown_path(path);
+        goto trace_exit;
+    }
+
+    inst = g_new0(SpaprOfInstance, 1);
+    inst->phandle = fdt_get_phandle(spapr->fdt_blob, offset);
+    g_assert(inst->phandle);
+    ++spapr->of_instance_last;
+
+    inst->dev = of_client_find_qom_dev(sysbus_get_default(), node);
+    if (!inst->dev) {
+        char *tmp = g_strdup_printf("%s@%s", node, unit);
+        inst->dev = of_client_find_qom_dev(sysbus_get_default(), tmp);
+        g_free(tmp);
+    }
+    inst->path = g_strdup(path);
+    g_hash_table_insert(spapr->of_instances,
+                        GINT_TO_POINTER(spapr->of_instance_last),
+                        inst);
+    ret = spapr->of_instance_last;
+
+    if (inst->dev) {
+        const char *cdevstr = object_property_get_str(OBJECT(inst->dev),
+                                                      "chardev", NULL);
+        const char *blkstr = object_property_get_str(OBJECT(inst->dev),
+                                                     "drive", NULL);
+
+        if (cdevstr) {
+            Chardev *cdev = qemu_chr_find(cdevstr);
+
+            if (cdev) {
+                inst->cbe = cdev->be;
+            }
+        } else if (blkstr) {
+            BlockConf conf = { 0 };
+
+            inst->blk = blk_by_name(blkstr);
+            conf.blk = inst->blk;
+            blkconf_blocksizes(&conf);
+            inst->blk_physical_block_size = conf.physical_block_size;
+        }
+    }
+
+trace_exit:
+    trace_spapr_of_client_open(path, inst ? inst->phandle : 0, ret);
+    g_free(node);
+    g_free(unit);
+    g_free(part);
+
+    return ret;
+}
+
+static uint32_t of_client_open(SpaprMachineState *spapr, uint32_t pathaddr)
+{
+    char path[256];
+
+    readstr(pathaddr, path, sizeof(path));
+
+    return spapr_of_client_open(spapr, path);
+}
+
+static void of_client_close(SpaprMachineState *spapr, uint32_t ihandle)
+{
+    if (!g_hash_table_remove(spapr->of_instances, GINT_TO_POINTER(ihandle))) {
+        trace_spapr_of_client_error_unknown_ihandle_close(ihandle);
+    }
+}
+
+static uint32_t of_client_instance_to_package(SpaprMachineState *spapr,
+                                              uint32_t ihandle)
+{
+    gpointer instp = g_hash_table_lookup(spapr->of_instances,
+                                         GINT_TO_POINTER(ihandle));
+    uint32_t ret = -1;
+
+    if (instp) {
+        ret = ((SpaprOfInstance *)instp)->phandle;
+    }
+    trace_spapr_of_client_instance_to_package(ihandle, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_package_to_path(const void *fdt, uint32_t phandle,
+                                          uint32_t buf, uint32_t len)
+{
+    uint32_t ret = -1;
+    char tmp[256] = "";
+
+    if (0 == fdt_get_path(fdt, fdt_node_offset_by_phandle(fdt, phandle), tmp,
+                          sizeof(tmp))) {
+        tmp[sizeof(tmp) - 1] = 0;
+        ret = MIN(len, strlen(tmp) + 1);
+        cpu_physical_memory_write(buf, tmp, ret);
+    }
+
+    trace_spapr_of_client_package_to_path(phandle, tmp, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_instance_to_path(SpaprMachineState *spapr,
+                                           uint32_t ihandle, uint32_t buf,
+                                           uint32_t len)
+{
+    uint32_t ret = -1;
+    uint32_t phandle = of_client_instance_to_package(spapr, ihandle);
+    char tmp[256] = "";
+
+    if (phandle != -1) {
+        if (0 == fdt_get_path(spapr->fdt_blob,
+                              fdt_node_offset_by_phandle(spapr->fdt_blob,
+                                                         phandle),
+                              tmp, sizeof(tmp))) {
+            tmp[sizeof(tmp) - 1] = 0;
+            ret = MIN(len, strlen(tmp) + 1);
+            cpu_physical_memory_write(buf, tmp, ret);
+        }
+    }
+    trace_spapr_of_client_instance_to_path(ihandle, phandle, tmp, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_write(SpaprMachineState *spapr, uint32_t ihandle,
+                                uint32_t buf, uint32_t len)
+{
+    char tmp[256];
+    int toread, toprint, cb = MIN(len, 1024);
+    SpaprOfInstance *inst = (SpaprOfInstance *)
+        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
+
+    while (cb > 0) {
+        toread = MIN(cb, sizeof(tmp) - 1);
+
+        cpu_physical_memory_read(buf, tmp, toread);
+
+        toprint = toread;
+        if (inst) {
+            if (inst->cbe) {
+                toprint = qemu_chr_fe_write_all(inst->cbe, (uint8_t *) tmp,
+                                                toprint);
+            } else if (inst->blk) {
+                trace_spapr_of_client_blk_write(ihandle, len);
+            }
+        } else {
+            /* We normally open stdout so this is a fallback */
+            tmp[toprint] = '\0';
+            printf("DBG[%u]%s", ihandle, tmp);
+        }
+        buf += toprint;
+        cb -= toprint;
+    }
+
+    return len;
+}
+
+static uint32_t of_client_read(SpaprMachineState *spapr, uint32_t ihandle,
+                               uint32_t bufaddr, uint32_t len)
+{
+    uint32_t ret = 0;
+    SpaprOfInstance *inst = (SpaprOfInstance *)
+        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
+
+    if (inst) {
+        hwaddr xlat = 0;
+        hwaddr xlen = len;
+        MemoryRegion *mr = address_space_translate(&address_space_memory,
+                                                   bufaddr, &xlat, &xlen, true,
+                                                   MEMTXATTRS_UNSPECIFIED);
+
+        if (mr && xlen == len) {
+            uint8_t *buf = memory_region_get_ram_ptr(mr) + xlat;
+
+            if (inst->cbe) {
+                SpaprVioDevice *sdev = VIO_SPAPR_DEVICE(inst->dev);
+
+                ret = vty_getchars(sdev, buf, len); /* qemu_chr_fe_read_all? */
+            } else if (inst->blk) {
+                int rc = blk_pread(inst->blk, inst->blk_pos, buf, len);
+
+                if (rc > 0) {
+                    ret = rc;
+                }
+                trace_spapr_of_client_blk_read(ihandle, inst->blk_pos, len,
+                                               ret);
+                if (rc > 0) {
+                    inst->blk_pos += rc;
+                }
+            }
+        }
+    }
+
+    return ret;
+}
+
+static uint32_t of_client_seek(SpaprMachineState *spapr, uint32_t ihandle,
+                               uint32_t hi, uint32_t lo)
+{
+    uint32_t ret = -1;
+    uint64_t pos = ((uint64_t) hi << 32) | lo;
+    SpaprOfInstance *inst = (SpaprOfInstance *)
+        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
+
+    if (inst) {
+        if (inst->blk) {
+            inst->blk_pos = pos;
+            ret = 1;
+            trace_spapr_of_client_blk_seek(ihandle, pos, ret);
+        }
+    }
+
+    return ret;
+}
+
+static void of_client_clamed_dump(GArray *claimed)
+{
+#ifdef DEBUG
+    int i;
+    SpaprOfClaimed c;
+
+    for (i = 0; i < claimed->len; ++i) {
+        c = g_array_index(claimed, SpaprOfClaimed, i);
+        error_printf("CLAIMED %"PRIx64"..%"PRIx64" size=%"PRIu64"\n",
+                     c.start, c.start + c.size, c.size);
+    }
+#endif
+}
+
+static bool of_client_claim_avail(GArray *claimed, uint64_t virt, uint64_t size)
+{
+    int i;
+    SpaprOfClaimed c;
+
+    for (i = 0; i < claimed->len; ++i) {
+        c = g_array_index(claimed, SpaprOfClaimed, i);
+        if ((c.start <= virt && virt < c.start + c.size) ||
+            (virt <= c.start && c.start < virt + size)) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static void of_client_claim_add(GArray *claimed, uint64_t virt, uint64_t size)
+{
+    SpaprOfClaimed newclaim;
+
+    newclaim.start = virt;
+    newclaim.size = size;
+    g_array_append_val(claimed, newclaim);
+}
+
+/*
+ * "claim" claims memory at @virt if @align==0; otherwise it allocates
+ * memory at the requested alignment.
+ */
+static void of_client_dt_memory_available(void *fdt, GArray *claimed,
+                                          uint64_t base);
+
+static uint64_t of_client_claim(SpaprMachineState *spapr, uint64_t virt,
+                                uint64_t size, uint64_t align)
+{
+    uint64_t ret;
+
+    if (align == 0) {
+        if (!of_client_claim_avail(spapr->claimed, virt, size)) {
+            ret = -1;
+        } else {
+            ret = virt;
+        }
+    } else {
+        spapr->claimed_base = ALIGN(spapr->claimed_base, align);
+        while (1) {
+            if (spapr->claimed_base >= spapr->rma_size) {
+                error_report("Out of RMA memory for the OF client");
+                return -1;
+            }
+            if (of_client_claim_avail(spapr->claimed, spapr->claimed_base,
+                                      size)) {
+                break;
+            }
+            spapr->claimed_base = ALIGN(spapr->claimed_base + size, align);
+        }
+        ret = spapr->claimed_base;
+    }
+
+    if (ret != -1) {
+        spapr->claimed_base = MAX(spapr->claimed_base, ret + size);
+        of_client_claim_add(spapr->claimed, ret, size);
+        /* The client reads "/memory@0/available" to know where it can claim */
+        of_client_dt_memory_available(spapr->fdt_blob, spapr->claimed,
+                                      spapr->claimed_base);
+    }
+    trace_spapr_of_client_claim(virt, size, align, ret);
+
+    return ret;
+}
+
+static uint32_t of_client_release(SpaprMachineState *spapr, uint64_t virt,
+                                  uint64_t size)
+{
+    uint32_t ret = -1;
+    int i;
+    GArray *claimed = spapr->claimed;
+    SpaprOfClaimed c;
+
+    for (i = 0; i < claimed->len; ++i) {
+        c = g_array_index(claimed, SpaprOfClaimed, i);
+        if (c.start == virt && c.size == size) {
+            g_array_remove_index(claimed, i);
+            ret = 0;
+            break;
+        }
+    }
+
+    trace_spapr_of_client_release(virt, size, ret);
+
+    return ret;
+}
+
+static void of_client_instantiate_rtas(SpaprMachineState *spapr, uint32_t base)
+{
+    uint64_t check_claim = of_client_claim(spapr, base, sizeof(rtas_blob), 0);
+    /*
+     * The client should have claimed RTAS memory, make sure it has or
+     * just claim it here with a warning.
+     */
+    if (check_claim != -1) {
+        error_report("The OF client did not claim RTAS memory at 0x%x", base);
+    }
+    spapr->rtas_base = base;
+    cpu_physical_memory_write(base, rtas_blob, sizeof(rtas_blob));
+}
+
+static uint32_t of_client_call_method(SpaprMachineState *spapr,
+                                      uint32_t methodaddr, uint32_t ihandle,
+                                      uint32_t param1, uint32_t param2,
+                                      uint32_t param3, uint32_t param4,
+                                      uint32_t *ret2)
+{
+    uint32_t ret = -1;
+    char method[256] = "";
+    SpaprOfInstance *inst = NULL;
+
+    if (!ihandle) {
+        goto trace_exit;
+    }
+
+    inst = (SpaprOfInstance *) g_hash_table_lookup(spapr->of_instances,
+                                                   GINT_TO_POINTER(ihandle));
+    if (!inst) {
+        goto trace_exit;
+    }
+
+    readstr(methodaddr, method, sizeof(method));
+
+    if (strcmp(inst->path, "/") == 0) {
+        if (strcmp(method, "ibm,client-architecture-support") == 0) {
+            ret = do_client_architecture_support(POWERPC_CPU(first_cpu), spapr,
+                                                 param1, FDT_MAX_SIZE);
+            *ret2 = 0;
+        }
+    } else if (strcmp(inst->path, "/rtas") == 0) {
+        if (strcmp(method, "instantiate-rtas") == 0) {
+            of_client_instantiate_rtas(spapr, param1);
+            ret = 0;
+            *ret2 = param1; /* rtasbase */
+        }
+    } else if (inst->cbe) {
+        if (strcmp(method, "color!") == 0) {
+            /* do not bother about colors now */
+            ret = 0;
+        }
+    } else if (inst->blk) {
+        if (strcmp(method, "block-size") == 0) {
+            ret = 0;
+            *ret2 = inst->blk_physical_block_size;
+        } else if (strcmp(method, "#blocks") == 0) {
+            ret = 0;
+            *ret2 = blk_getlength(inst->blk) / inst->blk_physical_block_size;
+        }
+     } else if (inst->dev) {
+        if (strcmp(method, "vscsi-report-luns") == 0) {
+            /* TODO: Not implemented yet, not clear when it is really needed */
+            ret = -1;
+            *ret2 = 1;
+        }
+    } else {
+        trace_spapr_of_client_error_unknown_method(method);
+    }
+
+trace_exit:
+    trace_spapr_of_client_method(ihandle, method, param1, ret, *ret2);
+
+    return ret;
+}
+
+static uint32_t of_client_call_interpret(SpaprMachineState *spapr,
+                                         uint32_t cmdaddr, uint32_t param1,
+                                         uint32_t param2, uint32_t *ret2)
+{
+    uint32_t ret = -1;
+    char cmd[256] = "";
+
+    readstr(cmdaddr, cmd, sizeof(cmd));
+    trace_spapr_of_client_interpret(cmd, param1, param2, ret, *ret2);
+
+    return ret;
+}
+
+static void of_client_quiesce(SpaprMachineState *spapr)
+{
+    /* We could just as well free the blob, as it is of no use from now on */
+    int rc = fdt_pack(spapr->fdt_blob);
+    /* Should only fail if we've built a corrupted tree */
+    assert(rc == 0);
+
+    spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
+    spapr->fdt_initial_size = spapr->fdt_size;
+    of_client_clamed_dump(spapr->claimed);
+}
+
+static target_ulong spapr_h_of_client(PowerPCCPU *cpu, SpaprMachineState *spapr,
+                                      target_ulong opcode, target_ulong *args)
+{
+    target_ulong of_client_args = ppc64_phys_to_real(args[0]);
+    struct prom_args pargs = { 0 };
+    char service[64];
+    unsigned nargs, nret;
+    int i, servicelen;
+
+    cpu_physical_memory_read(of_client_args, &pargs, sizeof(pargs));
+    nargs = be32_to_cpu(pargs.nargs);
+    nret = be32_to_cpu(pargs.nret);
+    readstr(be32_to_cpu(pargs.service), service, sizeof(service));
+    servicelen = strlen(service);
+
+    if (nargs >= ARRAY_SIZE(pargs.args) ||
+        nret > ARRAY_SIZE(pargs.args) - nargs) {
+        /* Bounds checking: something is just very wrong */
+        return H_PARAMETER;
+    }
+
+#define cmpserv(s, a, r) \
+    cmpservice(service, servicelen, nargs, nret, (s), sizeof(s), (a), (r))
+
+    if (cmpserv("finddevice", 1, 1)) {
+        pargs.args[nargs] =
+            of_client_finddevice(spapr->fdt_blob,
+                                 be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("getprop", 4, 1)) {
+        pargs.args[nargs] =
+            of_client_getprop(spapr->fdt_blob,
+                              be32_to_cpu(pargs.args[0]),
+                              be32_to_cpu(pargs.args[1]),
+                              be32_to_cpu(pargs.args[2]),
+                              be32_to_cpu(pargs.args[3]));
+    } else if (cmpserv("getproplen", 2, 1)) {
+        pargs.args[nargs] =
+            of_client_getproplen(spapr->fdt_blob,
+                                 be32_to_cpu(pargs.args[0]),
+                                 be32_to_cpu(pargs.args[1]));
+    } else if (cmpserv("setprop", 4, 1)) {
+        pargs.args[nargs] =
+            of_client_setprop(spapr,
+                              be32_to_cpu(pargs.args[0]),
+                              be32_to_cpu(pargs.args[1]),
+                              be32_to_cpu(pargs.args[2]),
+                              be32_to_cpu(pargs.args[3]));
+    } else if (cmpserv("nextprop", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_nextprop(spapr->fdt_blob,
+                               be32_to_cpu(pargs.args[0]),
+                               be32_to_cpu(pargs.args[1]),
+                               be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("peer", 1, 1)) {
+        pargs.args[nargs] =
+            of_client_peer(spapr->fdt_blob,
+                           be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("child", 1, 1)) {
+        pargs.args[nargs] =
+            of_client_child(spapr->fdt_blob,
+                            be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("parent", 1, 1)) {
+        pargs.args[nargs] =
+            of_client_parent(spapr->fdt_blob,
+                             be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("open", 1, 1)) {
+        pargs.args[nargs] = of_client_open(spapr, be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("close", 1, 0)) {
+        of_client_close(spapr, be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("instance-to-package", 1, 1)) {
+        pargs.args[nargs] =
+            of_client_instance_to_package(spapr,
+                                          be32_to_cpu(pargs.args[0]));
+    } else if (cmpserv("package-to-path", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_package_to_path(spapr->fdt_blob,
+                                      be32_to_cpu(pargs.args[0]),
+                                      be32_to_cpu(pargs.args[1]),
+                                      be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("instance-to-path", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_instance_to_path(spapr,
+                                       be32_to_cpu(pargs.args[0]),
+                                       be32_to_cpu(pargs.args[1]),
+                                       be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("write", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_write(spapr,
+                            be32_to_cpu(pargs.args[0]),
+                            be32_to_cpu(pargs.args[1]),
+                            be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("read", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_read(spapr,
+                           be32_to_cpu(pargs.args[0]),
+                           be32_to_cpu(pargs.args[1]),
+                           be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("seek", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_seek(spapr,
+                           be32_to_cpu(pargs.args[0]),
+                           be32_to_cpu(pargs.args[1]),
+                           be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("claim", 3, 1)) {
+        pargs.args[nargs] =
+            of_client_claim(spapr,
+                            be32_to_cpu(pargs.args[0]),
+                            be32_to_cpu(pargs.args[1]),
+                            be32_to_cpu(pargs.args[2]));
+    } else if (cmpserv("release", 2, 0)) {
+        pargs.args[nargs] =
+            of_client_release(spapr,
+                              be32_to_cpu(pargs.args[0]),
+                              be32_to_cpu(pargs.args[1]));
+    } else if (cmpserv("call-method", 0, 0)) {
+        pargs.args[nargs] =
+            of_client_call_method(spapr,
+                                  be32_to_cpu(pargs.args[0]),
+                                  be32_to_cpu(pargs.args[1]),
+                                  be32_to_cpu(pargs.args[2]),
+                                  be32_to_cpu(pargs.args[3]),
+                                  be32_to_cpu(pargs.args[4]),
+                                  be32_to_cpu(pargs.args[5]),
+                                  &pargs.args[nargs + 1]);
+    } else if (cmpserv("interpret", 0, 0)) {
+        pargs.args[nargs] =
+            of_client_call_interpret(spapr,
+                                     be32_to_cpu(pargs.args[0]),
+                                     be32_to_cpu(pargs.args[1]),
+                                     be32_to_cpu(pargs.args[2]),
+                                     &pargs.args[nargs + 1]);
+    } else if (cmpserv("milliseconds", 0, 1)) {
+        pargs.args[nargs] = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+    } else if (cmpserv("quiesce", 0, 0)) {
+        of_client_quiesce(spapr);
+    } else if (cmpserv("exit", 0, 0)) {
+        error_report("Stopped as the VM requested \"exit\"");
+        vm_stop(RUN_STATE_PAUSED); /* Or qemu_system_guest_panicked(NULL); ? */
+    } else {
+        trace_spapr_of_client_error_unknown_service(service, nargs, nret);
+        pargs.args[nargs] = -1;
+    }
+
+    for (i = 0; i < nret; ++i) {
+        pargs.args[nargs + i] = cpu_to_be32(pargs.args[nargs + i]);
+    }
+
+    /* Copy only what is needed: GRUB allocates the bare minimum on stack */
+    cpu_physical_memory_write(of_client_args, &pargs,
+                              sizeof(uint32_t) * (3 + nargs + nret));
+
+    return H_SUCCESS;
+}
+
+static void of_instance_free(gpointer data)
+{
+    SpaprOfInstance *inst = (SpaprOfInstance *) data;
+
+    g_free(inst->path);
+    g_free(inst);
+}
+
+void spapr_setup_of_client(SpaprMachineState *spapr, target_ulong *stack_ptr,
+                           target_ulong *prom_entry)
+{
+    if (spapr->claimed) {
+        g_array_unref(spapr->claimed);
+    }
+    if (spapr->of_instances) {
+        g_hash_table_unref(spapr->of_instances);
+    }
+
+    spapr->claimed = g_array_new(false, false, sizeof(SpaprOfClaimed));
+    spapr->of_instances = g_hash_table_new_full(g_direct_hash, g_direct_equal,
+                                                NULL, of_instance_free);
+
+    *stack_ptr = of_client_claim(spapr, 0, OF_STACK_SIZE, OF_STACK_SIZE);
+    if (*stack_ptr == -1) {
+        error_report("Memory allocation for stack failed");
+        exit(1);
+    }
+    /*
+     * Stack grows downwards and we also reserve here space for
+     * the minimum stack frame.
+     */
+    *stack_ptr += OF_STACK_SIZE - 0x20;
+
+    *prom_entry = of_client_claim(spapr, 0, sizeof(of_client_blob),
+                                  sizeof(of_client_blob));
+    if (*prom_entry == -1) {
+        error_report("Memory allocation for OF client failed");
+        exit(1);
+    }
+
+    cpu_physical_memory_write(*prom_entry, of_client_blob,
+                              sizeof(of_client_blob));
+
+    if (of_client_claim(spapr, spapr->kernel_addr,
+                        spapr->kernel_size, 0) == -1) {
+        error_report("Memory for kernel is in use");
+        exit(1);
+    }
+
+    if (spapr->initrd_size &&
+        of_client_claim(spapr, spapr->initrd_base,
+                        spapr->initrd_size, 0) == -1) {
+        error_report("Memory for initramdisk is in use");
+        exit(1);
+    }
+
+    /*
+     * We skip writing FDT as nothing expects it; OF client interface is
+     * going to be used for reading the device tree.
+     */
+}
+
+static gint of_claimed_compare_func(gconstpointer a, gconstpointer b)
+{
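+    const SpaprOfClaimed *ca = a, *cb = b;
+
+    /* Compare explicitly: a uint64_t difference does not fit in gint */
+    return (ca->start > cb->start) - (ca->start < cb->start);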
+}
+
+static void of_client_dt_memory_available(void *fdt, GArray *claimed,
+                                          uint64_t base)
+{
+    int i, n, offset, proplen = 0;
+    uint64_t *mem0_reg;
+    struct { uint64_t start, size; } *avail;
+
+    if (!fdt || !claimed) {
+        return;
+    }
+
+    offset = fdt_path_offset(fdt, "/memory@0");
+    _FDT(offset);
+
+    mem0_reg = (uint64_t *) fdt_getprop(fdt, offset, "reg", &proplen);
+    if (!mem0_reg || proplen != 2 * sizeof(uint64_t)) {
+        return;
+    }
+
+    g_array_sort(claimed, of_claimed_compare_func);
+    of_client_clamed_dump(claimed);
+
+    avail = g_malloc0(sizeof(uint64_t) * 2 * claimed->len);
+    for (i = 0, n = 0; i < claimed->len; ++i) {
+        SpaprOfClaimed c = g_array_index(claimed, SpaprOfClaimed, i);
+
+        avail[n].start = c.start + c.size;
+        if (i < claimed->len - 1) {
+            SpaprOfClaimed cn = g_array_index(claimed, SpaprOfClaimed, i + 1);
+
+            avail[n].size = cn.start - avail[n].start;
+        } else {
+            avail[n].size = be64_to_cpu(mem0_reg[1]) - avail[n].start;
+        }
+
+        if (avail[n].size) {
+#ifdef DEBUG
+            error_printf("AVAIL %lx..%lx size=%ld\n", avail[n].start,
+                         avail[n].start + avail[n].size, avail[n].size);
+#endif
+            avail[n].start = cpu_to_be64(avail[n].start);
+            avail[n].size = cpu_to_be64(avail[n].size);
+            ++n;
+        }
+    }
+    _FDT((fdt_setprop(fdt, offset, "available", avail,
+                      sizeof(uint64_t) * 2 * n)));
+    g_free(avail);
+}
+
+void spapr_of_client_dt(SpaprMachineState *spapr, void *fdt)
+{
+    uint32_t phandle;
+    int i, offset, proplen = 0;
+    const void *prop;
+    bool found = false;
+    GArray *phandles = g_array_new(false, false, sizeof(uint32_t));
+    const char *nodename;
+    char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
+    int aliases;
+    aliases = fdt_add_subnode(fdt, 0, "aliases");
+
+    if (stdout_path) {
+        fdt_setprop_string(fdt, aliases, "hvterm", stdout_path);
+    }
+
+    /* Add options now, doing it at the end of this __func__ breaks it :-/ */
+    offset = fdt_add_subnode(fdt, 0, "options");
+    if (offset > 0) {
+        struct winsize ws;
+
+        if (ioctl(1, TIOCGWINSZ, &ws) != -1) {
+            _FDT(fdt_setprop_cell(fdt, offset, "screen-#columns", ws.ws_col));
+            _FDT(fdt_setprop_cell(fdt, offset, "screen-#rows", ws.ws_row));
+        }
+        _FDT(fdt_setprop_cell(fdt, offset, "real-mode?", 1));
+    }
+
+    /* Add "disk" nodes to SCSI hosts */
+    for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
+         offset >= 0;
+         offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
+
+        nodename = fdt_get_name(fdt, offset, NULL);
+        if (strncmp(nodename, "scsi@", 5) == 0 ||
+            strncmp(nodename, "v-scsi@", 7) == 0) {
+            int disk_node_off = fdt_add_subnode(fdt, offset, "disk");
+
+            fdt_setprop_string(fdt, disk_node_off, "device_type", "block");
+        }
+    }
+
+    /* Find all predefined phandles */
+    for (offset = fdt_next_node(fdt, -1, NULL);
+         offset >= 0;
+         offset = fdt_next_node(fdt, offset, NULL)) {
+        prop = fdt_getprop(fdt, offset, "phandle", &proplen);
+        if (prop && proplen == sizeof(uint32_t)) {
+            phandle = fdt32_ld(prop);
+            g_array_append_val(phandles, phandle);
+        }
+    }
+
+    /* Assign phandles skipping the predefined ones */
+    for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
+         offset >= 0;
+         offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
+
+        prop = fdt_getprop(fdt, offset, "phandle", &proplen);
+        if (prop) {
+            continue;
+        }
+        /* Check if the current phandle is not allocated already */
+        for ( ; ; ++phandle) {
+            for (i = 0, found = false; i < phandles->len; ++i) {
+                if (phandle == g_array_index(phandles, uint32_t, i)) {
+                    found = true;
+                    break;
+                }
+            }
+            if (!found) {
+                break;
+            }
+        }
+        _FDT(fdt_setprop_cell(fdt, offset, "phandle", phandle));
+    }
+    g_array_unref(phandles);
+
+    of_client_dt_memory_available(fdt, spapr->claimed, spapr->claimed_base);
+
+    /* Advertise RTAS presence */
+    offset = fdt_path_offset(fdt, "/rtas");
+    _FDT(offset);
+    _FDT(fdt_setprop_cell(fdt, offset, "rtas-size", sizeof(rtas_blob)));
+}
+
+void spapr_of_client_dt_finalize(SpaprMachineState *spapr)
+{
+    void *fdt = spapr->fdt_blob;
+    char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
+    int chosen = fdt_path_offset(fdt, "/chosen");
+    size_t cb = 0;
+    char *bootlist = get_boot_devices_list(&cb);
+
+    /*
+     * SLOF-less setup requires an open instance of stdout for early
+     * kernel printk. By now all phandles are settled so we can open
+     * the default serial console.
+     */
+    if (stdout_path) {
+        _FDT(fdt_setprop_cell(fdt, chosen, "stdout",
+                              spapr_of_client_open(spapr, stdout_path)));
+        _FDT(fdt_setprop_cell(fdt, chosen, "stdin",
+                              spapr_of_client_open(spapr, stdout_path)));
+    }
+
+    if (bootlist) {
+        _FDT(fdt_setprop_string(fdt, chosen, "bootpath", bootlist));
+        _FDT(fdt_setprop_string(fdt, chosen, "bootargs",
+                                spapr->bootargs ? spapr->bootargs : ""));
+    }
+}
+
+static void spapr_of_client_machine_ready(Notifier *n, void *opaque)
+{
+    SpaprMachineState *spapr = container_of(n, SpaprMachineState,
+                                            machine_ready);
+    size_t cb = 0;
+    char *bootlist;
+    const char *blkstr;
+    BlockBackend *blk;
+    char *cur, *next;
+    DeviceState *qdev;
+    uint64_t offset = 0, size = 0;
+    uint8_t *grub;
+    int rc;
+
+    if (spapr->kernel_size) {
+        return;
+    }
+
+    bootlist = get_boot_devices_list(&cb);
+
+    if (!bootlist) {
+        return;
+    }
+
+    for (cur = bootlist; cb > 0; cur = next) {
+        for (next = cur; cb > 0; ++next, --cb) {
+            if (*next == '\n') {
+                *next = '\0';
+                ++next;
+                --cb;
+                break;
+            }
+        }
+
+        qdev = of_client_find_qom_dev(sysbus_get_default(), cur);
+        if (!qdev) {
+            continue;
+        }
+
+        blkstr = object_property_get_str(OBJECT(qdev), "drive", NULL);
+        if (!blkstr) {
+            continue;
+        }
+
+        blk = blk_by_name(blkstr);
+        if (!blk) {
+            continue;
+        }
+
+        if (find_prep_partition(blk, &offset, &size)) {
+            continue;
+        }
+
+        grub = g_malloc0(size);
+        if (!grub) {
+            continue;
+        }
+
+        rc = blk_pread(blk, offset, grub, size);
+        if (rc <= 0) {
+            g_free(grub);
+            continue;
+        }
+
+        g_file_set_contents("my.grub", (void *) grub, size, NULL);
+        spapr->kernel_size = load_elf("my.grub", NULL, NULL, NULL,
+                                      NULL, &spapr->kernel_addr,
+                                      NULL, NULL, 1,
+                                      PPC_ELF_MACHINE, 0, 0);
+        spapr->kernel_size = size;
+
+        trace_spapr_of_client_blk_bootloader_read(offset, size);
+        break;
+    }
+
+    g_free(bootlist);
+}
+
+void spapr_of_client_machine_init(SpaprMachineState *spapr)
+{
+    spapr_register_hypercall(KVMPPC_H_OF_CLIENT, spapr_h_of_client);
+    spapr->machine_ready.notify = spapr_of_client_machine_ready;
+    qemu_add_machine_init_done_notifier(&spapr->machine_ready);
+}
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index 9ea620f23c85..757afb66834e 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -21,6 +21,30 @@ spapr_update_dt(unsigned cb) "New blob %u bytes"
 spapr_update_dt_failed_size(unsigned cbold, unsigned cbnew, unsigned magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x"
 spapr_update_dt_failed_check(unsigned cbold, unsigned cbnew, unsigned magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x"
 
+# spapr_of_client.c
+spapr_of_client_error_str_truncated(const char *s, int len) "%s truncated to %d"
+spapr_of_client_error_param(const char *method, int nargscheck, int nretcheck, int nargs, int nret) "%s takes/returns %d/%d, not %d/%d"
+spapr_of_client_error_unknown_service(const char *service, int nargs, int nret) "\"%s\" args=%d rets=%d"
+spapr_of_client_error_unknown_method(const char *method) "\"%s\""
+spapr_of_client_error_unknown_ihandle_close(uint32_t ih) "ih=0x%x"
+spapr_of_client_error_unknown_path(const char *path) "\"%s\""
+spapr_of_client_finddevice(const char *path, uint32_t ph) "\"%s\" => ph=0x%x"
+spapr_of_client_claim(uint32_t virt, uint32_t size, uint32_t align, uint32_t ret) "virt=0x%x size=0x%x align=0x%x => 0x%x"
+spapr_of_client_release(uint32_t virt, uint32_t size, uint32_t ret) "virt=0x%x size=0x%x => 0x%x"
+spapr_of_client_method(uint32_t ihandle, const char *method, uint32_t param, uint32_t ret, uint32_t ret2) "ih=0x%x \"%s\"(0x%x) => 0x%x 0x%x"
+spapr_of_client_getprop(uint32_t ph, const char *prop, uint32_t ret, const char *val) "ph=0x%x \"%s\" => len=%d [%s]"
+spapr_of_client_getproplen(uint32_t ph, const char *prop, uint32_t ret) "ph=0x%x \"%s\" => len=%d"
+spapr_of_client_setprop(uint32_t ph, const char *prop, const char *val, uint32_t ret) "ph=0x%x \"%s\" [%s] => len=%d"
+spapr_of_client_open(const char *path, uint32_t ph, uint32_t ih) "%s ph=0x%x => ih=0x%x"
+spapr_of_client_interpret(const char *cmd, uint32_t param1, uint32_t param2, uint32_t ret, uint32_t ret2) "[%s] 0x%x 0x%x => 0x%x 0x%x"
+spapr_of_client_blk_write(uint32_t ih, uint32_t len) "0x%x => len=%d"
+spapr_of_client_blk_read(uint32_t ih, uint64_t pos, uint32_t len, uint32_t ret) "ih=0x%x @0x%"PRIx64" size=%d => %d"
+spapr_of_client_blk_seek(uint32_t ih, uint64_t pos, uint32_t ret) "ih=0x%x 0x%"PRIx64" => %d"
+spapr_of_client_blk_bootloader_read(uint64_t offset, uint64_t size) "0x%"PRIx64" size=0x%"PRIx64
+spapr_of_client_package_to_path(uint32_t ph, const char *tmp, uint32_t ret) "ph=0x%x => %s len=%d"
+spapr_of_client_instance_to_path(uint32_t ih, uint32_t ph, const char *tmp, uint32_t ret) "ih=0x%x ph=0x%x => %s len=%d"
+spapr_of_client_instance_to_package(uint32_t ih, uint32_t ph) "ih=0x%x => ph=0x%x"
+
 # spapr_hcall_tpm.c
 spapr_h_tpm_comm(const char *device_path, uint64_t operation) "tpm_device_path=%s operation=0x%"PRIu64
 spapr_tpm_execute(uint64_t data_in, uint64_t data_in_sz, uint64_t data_out, uint64_t data_out_sz) "data_in=0x%"PRIx64", data_in_sz=%"PRIu64", data_out=0x%"PRIx64", data_out_sz=%"PRIu64
-- 
2.17.1
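
For readers following the claim bookkeeping in the patch above, the sketch
below restates two pieces of it as standalone C outside QEMU: the overlap
test from of_client_claim_avail() and the gap computation from
of_client_dt_memory_available(). The names claim_t, claims_overlap() and
compute_available() are made up for this illustration, and mem_size stands
in for the size cell of the /memory@0 "reg" property. The overlap test is
written in the shorter half-open-interval form, which should agree with the
patch for non-empty ranges. As in the patch, memory below the first claim is
not reported. This is a simplified sketch under those assumptions, not the
code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SpaprOfClaimed: the range [start, start + size) */
typedef struct {
    uint64_t start;
    uint64_t size;
} claim_t;

/* True if [a_start, a_start + a_size) and [b_start, b_start + b_size) intersect */
static int claims_overlap(uint64_t a_start, uint64_t a_size,
                          uint64_t b_start, uint64_t b_size)
{
    return a_start < b_start + b_size && b_start < a_start + a_size;
}

/*
 * Fill avail[] with the gap that follows each claimed range, up to mem_size.
 * claims[] must be sorted by start, must not overlap and must end at or
 * below mem_size; avail[] needs room for n entries.
 * Returns the number of non-empty gaps written.
 */
static size_t compute_available(const claim_t *claims, size_t n,
                                uint64_t mem_size, claim_t *avail)
{
    size_t i, out = 0;

    for (i = 0; i < n; ++i) {
        uint64_t gap_start = claims[i].start + claims[i].size;
        uint64_t gap_end = (i + 1 < n) ? claims[i + 1].start : mem_size;

        assert(gap_end >= gap_start); /* violated by unsorted or overlapping input */
        if (gap_end > gap_start) {
            avail[out].start = gap_start;
            avail[out].size = gap_end - gap_start;
            ++out;
        }
    }
    return out;
}
```

A caller would sort the claims first (the patch uses g_array_sort() for
that) and then convert each resulting pair to big-endian before storing it
in the "available" property.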




* Re: [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface
  2020-02-03  3:29 ` [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
@ 2020-02-03 13:03   ` BALATON Zoltan
  2020-02-05  4:18     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 17+ messages in thread
From: BALATON Zoltan @ 2020-02-03 13:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Peter Maydell, David Gibson, qemu-ppc, qemu-devel, Paolo Bonzini


On Mon, 3 Feb 2020, Alexey Kardashevskiy wrote:
[...]
> diff --git a/hw/ppc/spapr_of_client.c b/hw/ppc/spapr_of_client.c
> new file mode 100644
> index 000000000000..31555c356de8
> --- /dev/null
> +++ b/hw/ppc/spapr_of_client.c
> @@ -0,0 +1,1526 @@
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include <sys/ioctl.h>
> +#include <termios.h>
> +#include "qapi/error.h"
> +#include "exec/memory.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_vio.h"

I haven't read all of this as it's a lot of code but I was thinking this 
might also be useful to implement a replacement for the proprietary 
firmware of the pegasos2 emulation later. See:

https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2

for details. I likely don't need a complete OpenFirmware for booting Linux
and MorphOS, as these only get some data from the device tree and then use
their own drivers (MorphOS may use some RTAS, but I'm not sure). So
something like this might be enough to get them booting on pegasos2 without
the original firmware, and it would be easier than reimplementing the
firmware or modifying OpenFirmware to emulate the different device tree
that machine has.

The question is whether it's possible to make this file independent of
spapr, or to gather the OF emulation parts in a separate file that could be
reused later. This could be done afterwards, but if there's anything you can
do now to make it easier, that would help, so I thought it a good idea to
make you aware of this possible use case.

Regards,
BALATON Zoltan
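
To make the reuse question above a little more concrete: two parts of
spapr_h_of_client() do not obviously depend on spapr, namely decoding the
big-endian argument array and matching the service name against its
expected arity. The sketch below shows what a machine-independent version
of just those two steps might look like. Everything in it is hypothetical:
of_service_t, of_load_be32(), of_parse_counts(), of_find_service() and
OF_MAX_CELLS are names invented for this illustration, and the capacity of
10 cells is an assumption. Only the { service, nargs, nret, args[] } layout
and the arity values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_MAX_CELLS 10 /* assumed capacity of the args[] array */

/* Hypothetical descriptor for one client-interface service */
typedef struct {
    const char *name;
    unsigned nargs; /* expected number of input cells */
    unsigned nret;  /* expected number of return cells */
} of_service_t;

/* Arities copied from the cmpserv() calls in the patch (subset only) */
static const of_service_t of_services[] = {
    { "finddevice",   1, 1 },
    { "getprop",      4, 1 },
    { "getproplen",   2, 1 },
    { "claim",        3, 1 },
    { "milliseconds", 0, 1 },
};

/* Read one big-endian 32-bit cell regardless of host byte order */
static uint32_t of_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode nargs and nret from a raw argument array laid out as
 * { service, nargs, nret, args[] }, all big-endian 32-bit cells.
 * Returns 0 on success, -1 if the counts do not fit in OF_MAX_CELLS.
 */
static int of_parse_counts(const uint8_t *raw, unsigned *nargs, unsigned *nret)
{
    uint32_t a, r;

    assert(raw != NULL);
    a = of_load_be32(raw + 4);
    r = of_load_be32(raw + 8);

    if (a > OF_MAX_CELLS || r > OF_MAX_CELLS - a) {
        return -1;
    }
    *nargs = a;
    *nret = r;
    return 0;
}

/* Return the table entry matching both name and arity, or NULL */
static const of_service_t *of_find_service(const char *name,
                                           unsigned nargs, unsigned nret)
{
    size_t i;

    for (i = 0; i < sizeof(of_services) / sizeof(of_services[0]); ++i) {
        const of_service_t *s = &of_services[i];

        if (strcmp(s->name, name) == 0 &&
            s->nargs == nargs && s->nret == nret) {
            return s;
        }
    }
    return NULL;
}
```

A machine such as pegasos2 could then attach its own handlers to the table
entries, while the device tree walking and the claim bookkeeping would
still need to be separated from SpaprMachineState by other means.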

> +#include "hw/ppc/fdt.h"
> +#include "hw/block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "sysemu/sysemu.h"
> +#include "chardev/char-fe.h"
> +#include "qom/qom-qobject.h"
> +#include "elf.h"
> +#include "hw/ppc/ppc.h"
> +#include "hw/loader.h"
> +#include "trace.h"
> +
> +#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
> +
> +struct gpt_header {
> +    char signature[8];
> +    char revision[4];
> +    uint32_t header_size;
> +    uint32_t crc;
> +    uint32_t reserved;
> +    uint64_t current_lba;
> +    uint64_t backup_lba;
> +    uint64_t first_usable_lba;
> +    uint64_t last_usable_lba;
> +    char guid[16];
> +    uint64_t partition_entries_lba;
> +    uint32_t nr_partition_entries;
> +    uint32_t size_partition_entry;
> +    uint32_t crc_partitions;
> +};
> +
> +#define GPT_SIGNATURE "EFI PART"
> +#define GPT_REVISION "\0\0\1\0" /* revision 1.0 */
> +
> +struct gpt_entry {
> +    char partition_type_guid[16];
> +    char unique_guid[16];
> +    uint64_t first_lba;
> +    uint64_t last_lba;
> +    uint64_t attributes;
> +    char name[72];                /* UTF-16LE */
> +};
> +
> +#define GPT_MIN_PARTITIONS 128
> +#define GPT_PT_ENTRY_SIZE 128
> +#define SECTOR_SIZE 512
> +
> +static int find_prep_partition_on_gpt(BlockBackend *blk, uint8_t *lba01,
> +                                      uint64_t *offset, uint64_t *size)
> +{
> +    unsigned i, partnum, partentrysize;
> +    int ret;
> +    struct gpt_header *hdr = (struct gpt_header *) (lba01 + SECTOR_SIZE);
> +    const char *prep_uuid = "9e1a2d38-c612-4316-aa26-8b49521e5a8b";
> +
> +    if (memcmp(hdr, "EFI PART", 8)) {
> +        return -1;
> +    }
> +
> +    partnum = le32_to_cpu(hdr->nr_partition_entries);
> +    partentrysize = le32_to_cpu(hdr->size_partition_entry);
> +
> +    if (partentrysize < 128 || partentrysize > 512) {
> +        return -1;
> +    }
> +
> +    for (i = 0; i < partnum; ++i) {
> +        uint8_t partdata[partentrysize];
> +        struct gpt_entry *entry = (struct gpt_entry *) partdata;
> +        unsigned long first, last;
> +        QemuUUID parttype;
> +        char *uuid;
> +
> +        ret = blk_pread(blk, 2 * SECTOR_SIZE + i * partentrysize,
> +                        partdata, sizeof(partdata));
> +        if (ret < 0) {
> +            return ret;
> +        } else if (!ret) {
> +            return -1;
> +        }
> +
> +        memcpy(parttype.data, entry->partition_type_guid, 16);
> +        parttype = qemu_uuid_bswap(parttype);
> +        first = le64_to_cpu(entry->first_lba);
> +        last = le64_to_cpu(entry->last_lba);
> +
> +        uuid = qemu_uuid_unparse_strdup(&parttype);
> +        if (!strcmp(uuid, prep_uuid)) {
> +            *offset = first * SECTOR_SIZE;
> +            /* last_lba is inclusive */
> +            *size = (last - first + 1) * SECTOR_SIZE;
> +        }
> +        g_free(uuid);
> +    }
> +
> +    if (*offset) {
> +        return 0;
> +    }
> +
> +    return -1;
> +}
> +
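
A note on the byte-range arithmetic above: as far as I know the UEFI
spec defines a GPT entry's ending LBA as the last block that belongs to
the partition, so the range is inclusive. Below is a minimal standalone
sketch of the conversion under that assumption. The names
gpt_entry_byte_range and SKETCH_SECTOR_SIZE are made up for
illustration and are not part of the patch; overflow of the
multiplication is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u /* assumed logical block size */

/*
 * Hypothetical helper: convert an inclusive [first_lba, last_lba] range
 * into a byte offset and a byte size. Returns false on a malformed range.
 */
static bool gpt_entry_byte_range(uint64_t first_lba, uint64_t last_lba,
                                 uint64_t *offset, uint64_t *size)
{
    assert(offset && size);

    if (last_lba < first_lba) {
        return false;
    }
    *offset = first_lba * SKETCH_SECTOR_SIZE;
    /* last_lba is the final block of the partition, hence the +1 */
    *size = (last_lba - first_lba + 1) * SKETCH_SECTOR_SIZE;
    return true;
}
```

Rejecting last_lba < first_lba up front also keeps a corrupted entry
from producing a huge size through unsigned wraparound.
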
> +struct partition_record {
> +    uint8_t bootable;
> +    uint8_t start_head;
> +    uint32_t start_cylinder;
> +    uint8_t start_sector;
> +    uint8_t system;
> +    uint8_t end_head;
> +    uint8_t end_cylinder;
> +    uint8_t end_sector;
> +    uint32_t start_sector_abs;
> +    uint32_t nb_sectors_abs;
> +};
> +
> +static void read_partition(uint8_t *p, struct partition_record *r)
> +{
> +    r->bootable = p[0];
> +    r->start_head = p[1];
> +    r->start_cylinder = p[3] | ((p[2] << 2) & 0x0300);
> +    r->start_sector = p[2] & 0x3f;
> +    r->system = p[4];
> +    r->end_head = p[5];
> +    r->end_cylinder = p[7] | ((p[6] << 2) & 0x300);
> +    r->end_sector = p[6] & 0x3f;
> +    r->start_sector_abs = ldl_le_p(p + 8);
> +    r->nb_sectors_abs   = ldl_le_p(p + 12);
> +}
> +
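
For readers unfamiliar with the MBR layout that read_partition()
decodes: each of the four entries is 16 bytes, with the type byte at
offset 4, the packed CHS sector/cylinder fields at offsets 2 and 3, and
two little-endian 32-bit values (start LBA, sector count) at offsets 8
and 12. A standalone sketch of the same decoding follows; the
sketch_* names are hypothetical and only the fields needed to locate a
partition are kept.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a possibly unaligned pointer. */
static uint32_t sketch_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct sketch_mbr_entry {
    uint8_t type;            /* 0x41 == PReP, 0xEE == protective GPT */
    uint16_t start_cylinder; /* 10 bits: 2 from byte 2, 8 from byte 3 */
    uint8_t start_sector;    /* low 6 bits of byte 2 */
    uint32_t lba_start;
    uint32_t lba_count;
};

/* Decode one 16-byte MBR partition entry. */
static void sketch_decode_mbr_entry(const uint8_t *p,
                                    struct sketch_mbr_entry *e)
{
    assert(p && e);

    e->type = p[4];
    e->start_sector = p[2] & 0x3f;
    e->start_cylinder = (uint16_t)(p[3] | ((p[2] << 2) & 0x300));
    e->lba_start = sketch_le32(p + 8);
    e->lba_count = sketch_le32(p + 12);
}
```
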
> +static int find_prep_partition(BlockBackend *blk, uint64_t *offset,
> +                               uint64_t *size)
> +{
> +    uint8_t lba01[SECTOR_SIZE * 2];
> +    int i;
> +    int ret = -ENOENT;
> +
> +    ret = blk_pread(blk, 0, lba01, sizeof(lba01));
> +    if (ret < 0) {
> +        error_report("error while reading: %s", strerror(-ret));
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (lba01[510] != 0x55 || lba01[511] != 0xaa) {
> +        return find_prep_partition_on_gpt(blk, lba01, offset, size);
> +    }
> +
> +    for (i = 0; i < 4; i++) {
> +        struct partition_record part = { 0 };
> +
> +        read_partition(&lba01[446 + 16 * i], &part);
> +        if (!part.system || !part.nb_sectors_abs) {
> +            continue;
> +        }
> +
> +        /* 0xEE == GPT */
> +        if (part.system == 0xEE) {
> +            ret = find_prep_partition_on_gpt(blk, lba01, offset, size);
> +        }
> +        /* 0x41 == PReP */
> +        if (part.system == 0x41) {
> +            *offset = (uint64_t)part.start_sector_abs << 9;
> +            *size = (uint64_t)part.nb_sectors_abs << 9;
> +            ret = 0;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +/*
> + * Below are compiled versions of the RTAS blob and of the OF client
> + * interface entry point.
> + *
> + * gcc -nostdlib  -mbig -o spapr-rtas.img spapr-rtas.S
> + * objcopy  -O binary -j .text  spapr-rtas.img spapr-rtas.bin
> + *
> + *   .globl  _start
> + *   _start:
> + *           mr      4,3
> + *           lis     3,KVMPPC_H_RTAS@h
> + *           ori     3,3,KVMPPC_H_RTAS@l
> + *           sc      1
> + *           blr
> + * ...
> + */
> +static const uint8_t rtas_blob[] = {
> +    0x7c, 0x64, 0x1b, 0x78,
> +    0x3c, 0x60, 0x00, 0x00,
> +    0x60, 0x63, 0xf0, 0x00,
> +    0x44, 0x00, 0x00, 0x22,
> +    0x4e, 0x80, 0x00, 0x20
> +};
> +
> +/*
> + * ...
> + *           mr      4,3
> + *           lis     3,KVMPPC_H_OF_CLIENT@h
> + *           ori     3,3,KVMPPC_H_OF_CLIENT@l
> + *           sc      1
> + *           blr
> + */
> +static const uint8_t of_client_blob[] = {
> +    0x7c, 0x64, 0x1b, 0x78,
> +    0x3c, 0x60, 0x00, 0x00,
> +    0x60, 0x63, 0xf0, 0x05,
> +    0x44, 0x00, 0x00, 0x22,
> +    0x4e, 0x80, 0x00, 0x20
> +};
> +
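
The two blobs differ only in the immediate of the "ori" instruction.
One way to sanity-check such hand-assembled bytes is to decode the
"lis 3,hi" / "ori 3,3,lo" pair back into the value that ends up in r3.
The sketch below does that; sketch_blob_hcall is a hypothetical helper,
and the opcode masks assume the standard Power ISA encodings of addis
(primary opcode 15) and ori (primary opcode 24) with r3 as in the
listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one big-endian 32-bit instruction word. */
static uint32_t sketch_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: recover the value loaded into r3 by the
 * "lis 3,hi; ori 3,3,lo" pair at words 1 and 2 of the blob.
 * Returns 0 if the blob is too short or the opcodes do not match.
 */
static uint32_t sketch_blob_hcall(const uint8_t *blob, size_t len)
{
    uint32_t lis, ori;

    assert(blob);
    if (len < 12) {
        return 0;
    }
    lis = sketch_be32(blob + 4);
    ori = sketch_be32(blob + 8);
    if ((lis & 0xffff0000u) != 0x3c600000u ||
        (ori & 0xffff0000u) != 0x60630000u) {
        return 0;
    }
    return ((lis & 0xffffu) << 16) | (ori & 0xffffu);
}
```

Run against the byte arrays quoted above, this yields 0xf000 for
rtas_blob and 0xf005 for of_client_blob.
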
> +typedef struct {
> +    DeviceState *dev;
> +    CharBackend *cbe;
> +    BlockBackend *blk;
> +    uint64_t blk_pos;
> +    uint16_t blk_physical_block_size;
> +    char *path; /* the path used to open the instance */
> +    uint32_t phandle;
> +} SpaprOfInstance;
> +
> +/*
> + * OF 1275 "nextprop" description suggests it is 32 bytes max but
> + * LoPAPR defines "ibm,query-interrupt-source-number" which is 33 chars long.
> + */
> +#define OF_PROPNAME_LEN_MAX 64
> +
> +/* Copied from SLOF, and 4K is definitely not enough for GRUB */
> +#define OF_STACK_SIZE       0x8000
> +
> +/* Defined as Big Endian */
> +struct prom_args {
> +    uint32_t service;
> +    uint32_t nargs;
> +    uint32_t nret;
> +    uint32_t args[10];
> +};
> +
> +static void readstr(hwaddr pa, char *buf, int size)
> +{
> +    cpu_physical_memory_read(pa, buf, size);
> +    if (buf[size - 1] != '\0') {
> +        buf[size - 1] = '\0';
> +        if (strlen(buf) == size - 1) {
> +            trace_spapr_of_client_error_str_truncated(buf, size);
> +        }
> +    }
> +}
> +
> +static bool cmpservice(const char *s, size_t len,
> +                       unsigned nargs, unsigned nret,
> +                       const char *s1, size_t len1,
> +                       unsigned nargscheck, unsigned nretcheck)
> +{
> +    if (strcmp(s, s1)) {
> +        return false;
> +    }
> +    if ((nargscheck && (nargs != nargscheck)) ||
> +        (nretcheck && (nret != nretcheck))) {
> +        trace_spapr_of_client_error_param(s, nargscheck, nretcheck, nargs,
> +                                          nret);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static void split_path(const char *fullpath, char **node, char **unit,
> +                       char **part)
> +{
> +    const char *c, *p = NULL, *u = NULL;
> +
> +    *node = *unit = *part = NULL;
> +
> +    if (fullpath[0] == '\0') {
> +        *node = g_strdup(fullpath);
> +        return;
> +    }
> +
> +    for (c = fullpath + strlen(fullpath) - 1; c > fullpath; --c) {
> +        if (*c == '/') {
> +            break;
> +        }
> +        if (*c == ':') {
> +            p = c + 1;
> +            continue;
> +        }
> +        if (*c == '@') {
> +            u = c + 1;
> +            continue;
> +        }
> +    }
> +
> +    if (p && u && p < u) {
> +        p = NULL;
> +    }
> +
> +    if (u && p) {
> +        *node = g_strndup(fullpath, u - fullpath - 1);
> +        *unit = g_strndup(u, p - u - 1);
> +        *part = g_strdup(p);
> +    } else if (!u && p) {
> +        *node = g_strndup(fullpath, p - fullpath - 1);
> +        *part = g_strdup(p);
> +    } else if (!p && u) {
> +        *node = g_strndup(fullpath, u - fullpath - 1);
> +        *unit = g_strdup(u);
> +    } else {
> +        *node = g_strdup(fullpath);
> +    }
> +}
> +
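
To make the intended "node@unit:part" decomposition easier to follow,
here is a simplified standalone sketch without glib. It only inspects
the last path component, like split_path() does, but it is not
identical: it looks for the first '@' and then for the first ':' after
it, and it copies into fixed-size buffers (silently truncating). The
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sketch_path {
    char node[128];
    char unit[64];
    char part[64];
};

/* Copy at most n bytes of src into dst, always NUL-terminating. */
static void sketch_copy(char *dst, size_t dstlen, const char *src, size_t n)
{
    snprintf(dst, dstlen, "%.*s", (int)n, src);
}

/* Split "/a/b@unit:part" into node "/a/b", unit "unit", part "part". */
static void sketch_split_path(const char *full, struct sketch_path *out)
{
    const char *comp, *at, *colon, *end, *node_end;

    assert(full && out);
    end = full + strlen(full);
    comp = strrchr(full, '/');      /* start of the last component */
    if (!comp) {
        comp = full;
    }
    at = strchr(comp, '@');
    colon = strchr(at ? at : comp, ':');
    node_end = at ? at : (colon ? colon : end);

    sketch_copy(out->node, sizeof(out->node), full,
                (size_t)(node_end - full));
    out->unit[0] = '\0';
    out->part[0] = '\0';
    if (at) {
        const char *unit_end = colon ? colon : end;

        sketch_copy(out->unit, sizeof(out->unit), at + 1,
                    (size_t)(unit_end - (at + 1)));
    }
    if (colon) {
        sketch_copy(out->part, sizeof(out->part), colon + 1,
                    (size_t)(end - (colon + 1)));
    }
}
```

Unit addresses of intermediate components (e.g. "v-scsi@71000003")
stay part of the node string, which matches what the fdt lookup
expects.
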
> +static void prop_format(char *tval, int tlen, const void *prop, int len)
> +{
> +    int i;
> +    const char *c;
> +    char *t;
> +    const char bin[] = "...";
> +
> +    for (i = 0, c = prop; i < len; ++i, ++c) {
> +        if (*c == '\0' && i == len - 1) {
> +            strncpy(tval, prop, tlen - 1);
> +            return;
> +        }
> +        if (*c < 0x20 || *c >= 0x80) {
> +            /* Not reliably printable string so assume it is binary */
> +            break;
> +        }
> +    }
> +
> +    /* Conveniently, the hex dump reads as big endian (which the data is) */
> +    for (i = 0, c = prop, t = tval; i < len; ++i, ++c) {
> +        if (t >= tval + tlen - sizeof(bin) - 1 - 2 - 1) {
> +            strcpy(t, bin);
> +            return;
> +        }
> +        if (i && i % 4 == 0 && i != len - 1) {
> +            strcat(t, " ");
> +            ++t;
> +        }
> +        t += sprintf(t, "%02X", *c & 0xFF);
> +    }
> +}
> +
> +static int of_client_fdt_path_offset(const void *fdt, const char *node,
> +                                     const char *unit)
> +{
> +    int offset;
> +
> +    offset = fdt_path_offset(fdt, node);
> +
> +    if (offset < 0 && unit) {
> +        char *tmp = g_strdup_printf("%s@%s", node, unit);
> +
> +        offset = fdt_path_offset(fdt, tmp);
> +        g_free(tmp);
> +    }
> +
> +    return offset;
> +}
> +
> +static uint32_t of_client_finddevice(const void *fdt, uint32_t nodeaddr)
> +{
> +    char *node, *unit, *part;
> +    char fullnode[1024];
> +    uint32_t ret = -1;
> +    int offset;
> +
> +    readstr(nodeaddr, fullnode, sizeof(fullnode));
> +
> +    split_path(fullnode, &node, &unit, &part);
> +    offset = of_client_fdt_path_offset(fdt, node, unit);
> +    if (offset >= 0) {
> +        ret = fdt_get_phandle(fdt, offset);
> +    }
> +    trace_spapr_of_client_finddevice(fullnode, ret);
> +    g_free(node);
> +    g_free(unit);
> +    g_free(part);
> +    return (uint32_t) ret;
> +}
> +
> +static uint32_t of_client_getprop(const void *fdt, uint32_t nodeph,
> +                                  uint32_t pname, uint32_t valaddr,
> +                                  uint32_t vallen)
> +{
> +    char propname[OF_PROPNAME_LEN_MAX + 1];
> +    uint32_t ret = 0;
> +    int proplen = 0;
> +    const void *prop;
> +    char trval[64] = "";
> +    int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph);
> +
> +    readstr(pname, propname, sizeof(propname));
> +    if (strcmp(propname, "name") == 0) {
> +        prop = fdt_get_name(fdt, nodeoff, &proplen);
> +        proplen += 1;
> +    } else {
> +        prop = fdt_getprop(fdt, nodeoff, propname, &proplen);
> +    }
> +
> +    if (prop) {
> +        int cb = MIN(proplen, vallen);
> +
> +        cpu_physical_memory_write(valaddr, prop, cb);
> +        /*
> +         * OF1275 says:
> +         * "Size is either the actual size of the property, or –1 if name
> +         * does not exist", hence returning proplen instead of cb.
> +         */
> +        ret = proplen;
> +        prop_format(trval, sizeof(trval), prop, ret);
> +    } else {
> +        ret = -1;
> +    }
> +    trace_spapr_of_client_getprop(nodeph, propname, ret, trval);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_getproplen(const void *fdt, uint32_t nodeph,
> +                                     uint32_t pname)
> +{
> +    char propname[OF_PROPNAME_LEN_MAX + 1];
> +    uint32_t ret = 0;
> +    int proplen = 0;
> +    const void *prop;
> +    int nodeoff = fdt_node_offset_by_phandle(fdt, nodeph);
> +
> +    readstr(pname, propname, sizeof(propname));
> +    if (strcmp(propname, "name") == 0) {
> +        prop = fdt_get_name(fdt, nodeoff, &proplen);
> +        proplen += 1;
> +    } else {
> +        prop = fdt_getprop(fdt, nodeoff, propname, &proplen);
> +    }
> +
> +    if (prop) {
> +        ret = proplen;
> +    } else {
> +        ret = -1;
> +    }
> +    trace_spapr_of_client_getproplen(nodeph, propname, ret);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_setprop(SpaprMachineState *spapr,
> +                                  uint32_t nodeph, uint32_t pname,
> +                                  uint32_t valaddr, uint32_t vallen)
> +{
> +    char propname[OF_PROPNAME_LEN_MAX + 1];
> +    uint32_t ret = -1;
> +    int offset;
> +    char trval[64] = "";
> +
> +    readstr(pname, propname, sizeof(propname));
> +    /*
> +     * We only allow changing properties which we know how to update on
> +     * the QEMU side.
> +     */
> +    if (vallen == sizeof(uint32_t)) {
> +        uint32_t val32 = ldl_be_phys(first_cpu->as, valaddr);
> +
> +        if ((strcmp(propname, "linux,rtas-base") == 0) ||
> +            (strcmp(propname, "linux,rtas-entry") == 0)) {
> +            spapr->rtas_base = val32;
> +        } else if (strcmp(propname, "linux,initrd-start") == 0) {
> +            spapr->initrd_base = val32;
> +        } else if (strcmp(propname, "linux,initrd-end") == 0) {
> +            spapr->initrd_size = val32 - spapr->initrd_base;
> +        } else {
> +            goto trace_exit;
> +        }
> +    } else if (vallen == sizeof(uint64_t)) {
> +        uint64_t val64 = ldq_be_phys(first_cpu->as, valaddr);
> +
> +        if (strcmp(propname, "linux,initrd-start") == 0) {
> +            spapr->initrd_base = val64;
> +        } else if (strcmp(propname, "linux,initrd-end") == 0) {
> +            spapr->initrd_size = val64 - spapr->initrd_base;
> +        } else {
> +            goto trace_exit;
> +        }
> +    } else if (strcmp(propname, "bootargs") == 0) {
> +        char val[1024];
> +
> +        readstr(valaddr, val, sizeof(val));
> +        g_free(spapr->bootargs);
> +        spapr->bootargs = g_strdup(val);
> +    } else {
> +        goto trace_exit;
> +    }
> +
> +    offset = fdt_node_offset_by_phandle(spapr->fdt_blob, nodeph);
> +    if (offset >= 0) {
> +        uint8_t data[vallen];
> +
> +        cpu_physical_memory_read(valaddr, data, vallen);
> +        if (!fdt_setprop(spapr->fdt_blob, offset, propname, data, vallen)) {
> +            ret = vallen;
> +            prop_format(trval, sizeof(trval), data, ret);
> +        }
> +    }
> +
> +trace_exit:
> +    trace_spapr_of_client_setprop(nodeph, propname, trval, ret);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_nextprop(const void *fdt, uint32_t phandle,
> +                                   uint32_t prevaddr, uint32_t nameaddr)
> +{
> +    int offset = fdt_node_offset_by_phandle(fdt, phandle);
> +    char prev[OF_PROPNAME_LEN_MAX + 1];
> +    const char *tmp;
> +
> +    readstr(prevaddr, prev, sizeof(prev));
> +    for (offset = fdt_first_property_offset(fdt, offset);
> +         offset >= 0;
> +         offset = fdt_next_property_offset(fdt, offset)) {
> +
> +        if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) {
> +            return 0;
> +        }
> +        if (prev[0] == '\0' || strcmp(prev, tmp) == 0) {
> +            if (prev[0] != '\0') {
> +                offset = fdt_next_property_offset(fdt, offset);
> +                if (offset < 0) {
> +                    return 0;
> +                }
> +            }
> +            if (!fdt_getprop_by_offset(fdt, offset, &tmp, NULL)) {
> +                return 0;
> +            }
> +
> +            cpu_physical_memory_write(nameaddr, tmp, strlen(tmp) + 1);
> +            return 1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static uint32_t of_client_peer(const void *fdt, uint32_t phandle)
> +{
> +    int ret;
> +
> +    if (phandle == 0) {
> +        ret = fdt_path_offset(fdt, "/");
> +    } else {
> +        ret = fdt_next_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle));
> +    }
> +
> +    if (ret < 0) {
> +        ret = 0;
> +    } else {
> +        ret = fdt_get_phandle(fdt, ret);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_child(const void *fdt, uint32_t phandle)
> +{
> +    int ret = fdt_first_subnode(fdt, fdt_node_offset_by_phandle(fdt, phandle));
> +
> +    if (ret < 0) {
> +        ret = 0;
> +    } else {
> +        ret = fdt_get_phandle(fdt, ret);
> +    }
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_parent(const void *fdt, uint32_t phandle)
> +{
> +    int ret = fdt_parent_offset(fdt, fdt_node_offset_by_phandle(fdt, phandle));
> +
> +    if (ret < 0) {
> +        ret = 0;
> +    } else {
> +        ret = fdt_get_phandle(fdt, ret);
> +    }
> +
> +    return ret;
> +}
> +
> +static DeviceState *of_client_find_qom_dev(BusState *bus, const char *path)
> +{
> +    BusChild *kid;
> +
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        const char *p = qdev_get_fw_dev_path(kid->child);
> +        BusState *child;
> +
> +        if (p && strcmp(path, p) == 0) {
> +            return kid->child;
> +        }
> +        QLIST_FOREACH(child, &kid->child->child_bus, sibling) {
> +            DeviceState *d = of_client_find_qom_dev(child, path);
> +
> +            if (d) {
> +                return d;
> +            }
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
> +{
> +    int offset;
> +    uint32_t ret = 0;
> +    SpaprOfInstance *inst = NULL;
> +    char *node, *unit, *part;
> +
> +    if (spapr->of_instance_last == 0xFFFFFFFF) {
> +        /* We do not recycle ihandles yet */
> +        goto trace_exit;
> +    }
> +
> +    split_path(path, &node, &unit, &part);
> +    if (part && strcmp(part, "0")) {
> +        error_report("Partitions are not supported yet");
> +        g_free(part);
> +        part = NULL;
> +    }
> +
> +    offset = of_client_fdt_path_offset(spapr->fdt_blob, node, unit);
> +    if (offset < 0) {
> +        trace_spapr_of_client_error_unknown_path(path);
> +        goto trace_exit;
> +    }
> +
> +    inst = g_new0(SpaprOfInstance, 1);
> +    inst->phandle = fdt_get_phandle(spapr->fdt_blob, offset);
> +    g_assert(inst->phandle);
> +    ++spapr->of_instance_last;
> +
> +    inst->dev = of_client_find_qom_dev(sysbus_get_default(), node);
> +    if (!inst->dev) {
> +        char *tmp = g_strdup_printf("%s@%s", node, unit);
> +        inst->dev = of_client_find_qom_dev(sysbus_get_default(), tmp);
> +        g_free(tmp);
> +    }
> +    inst->path = g_strdup(path);
> +    g_hash_table_insert(spapr->of_instances,
> +                        GINT_TO_POINTER(spapr->of_instance_last),
> +                        inst);
> +    ret = spapr->of_instance_last;
> +
> +    if (inst->dev) {
> +        const char *cdevstr = object_property_get_str(OBJECT(inst->dev),
> +                                                      "chardev", NULL);
> +        const char *blkstr = object_property_get_str(OBJECT(inst->dev),
> +                                                     "drive", NULL);
> +
> +        if (cdevstr) {
> +            Chardev *cdev = qemu_chr_find(cdevstr);
> +
> +            if (cdev) {
> +                inst->cbe = cdev->be;
> +            }
> +        } else if (blkstr) {
> +            BlockConf conf = { 0 };
> +
> +            inst->blk = blk_by_name(blkstr);
> +            conf.blk = inst->blk;
> +            blkconf_blocksizes(&conf);
> +            inst->blk_physical_block_size = conf.physical_block_size;
> +        }
> +    }
> +
> +trace_exit:
> +    trace_spapr_of_client_open(path, inst ? inst->phandle : 0, ret);
> +    g_free(node);
> +    g_free(unit);
> +    g_free(part);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_open(SpaprMachineState *spapr, uint32_t pathaddr)
> +{
> +    char path[256];
> +
> +    readstr(pathaddr, path, sizeof(path));
> +
> +    return spapr_of_client_open(spapr, path);
> +}
> +
> +static void of_client_close(SpaprMachineState *spapr, uint32_t ihandle)
> +{
> +    if (!g_hash_table_remove(spapr->of_instances, GINT_TO_POINTER(ihandle))) {
> +        trace_spapr_of_client_error_unknown_ihandle_close(ihandle);
> +    }
> +}
> +
> +static uint32_t of_client_instance_to_package(SpaprMachineState *spapr,
> +                                              uint32_t ihandle)
> +{
> +    gpointer instp = g_hash_table_lookup(spapr->of_instances,
> +                                         GINT_TO_POINTER(ihandle));
> +    uint32_t ret = -1;
> +
> +    if (instp) {
> +        ret = ((SpaprOfInstance *)instp)->phandle;
> +    }
> +    trace_spapr_of_client_instance_to_package(ihandle, ret);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_package_to_path(const void *fdt, uint32_t phandle,
> +                                          uint32_t buf, uint32_t len)
> +{
> +    uint32_t ret = -1;
> +    char tmp[256] = "";
> +
> +    if (0 == fdt_get_path(fdt, fdt_node_offset_by_phandle(fdt, phandle), tmp,
> +                          sizeof(tmp))) {
> +        tmp[sizeof(tmp) - 1] = 0;
> +        ret = MIN(len, strlen(tmp) + 1);
> +        cpu_physical_memory_write(buf, tmp, ret);
> +    }
> +
> +    trace_spapr_of_client_package_to_path(phandle, tmp, ret);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_instance_to_path(SpaprMachineState *spapr,
> +                                           uint32_t ihandle, uint32_t buf,
> +                                           uint32_t len)
> +{
> +    uint32_t ret = -1;
> +    uint32_t phandle = of_client_instance_to_package(spapr, ihandle);
> +    char tmp[256] = "";
> +
> +    if (phandle != -1) {
> +        if (0 == fdt_get_path(spapr->fdt_blob,
> +                              fdt_node_offset_by_phandle(spapr->fdt_blob,
> +                                                         phandle),
> +                              tmp, sizeof(tmp))) {
> +            tmp[sizeof(tmp) - 1] = 0;
> +            ret = MIN(len, strlen(tmp) + 1);
> +            cpu_physical_memory_write(buf, tmp, ret);
> +        }
> +    }
> +    trace_spapr_of_client_instance_to_path(ihandle, phandle, tmp, ret);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_write(SpaprMachineState *spapr, uint32_t ihandle,
> +                                uint32_t buf, uint32_t len)
> +{
> +    char tmp[256];
> +    int toread, toprint, cb = MIN(len, 1024);
> +    SpaprOfInstance *inst = (SpaprOfInstance *)
> +        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
> +
> +    while (cb > 0) {
> +        toread = MIN(cb, sizeof(tmp) - 1);
> +
> +        cpu_physical_memory_read(buf, tmp, toread);
> +
> +        toprint = toread;
> +        if (inst) {
> +            if (inst->cbe) {
> +                toprint = qemu_chr_fe_write_all(inst->cbe, (uint8_t *) tmp,
> +                                                toprint);
> +            } else if (inst->blk) {
> +                trace_spapr_of_client_blk_write(ihandle, len);
> +            }
> +        } else {
> +            /* We normally open stdout so this is a fallback */
> +            tmp[toprint] = '\0';
> +            printf("DBG[%d]%s", ihandle, tmp);
> +        }
> +        buf += toprint;
> +        cb -= toprint;
> +    }
> +
> +    return len;
> +}
> +
> +static uint32_t of_client_read(SpaprMachineState *spapr, uint32_t ihandle,
> +                               uint32_t bufaddr, uint32_t len)
> +{
> +    uint32_t ret = 0;
> +    SpaprOfInstance *inst = (SpaprOfInstance *)
> +        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
> +
> +    if (inst) {
> +        hwaddr xlat = 0;
> +        hwaddr xlen = len;
> +        MemoryRegion *mr = address_space_translate(&address_space_memory,
> +                                                   bufaddr, &xlat, &xlen, true,
> +                                                   MEMTXATTRS_UNSPECIFIED);
> +
> +        if (mr && xlen == len) {
> +            uint8_t *buf = memory_region_get_ram_ptr(mr) + xlat;
> +
> +            if (inst->cbe) {
> +                SpaprVioDevice *sdev = VIO_SPAPR_DEVICE(inst->dev);
> +
> +                ret = vty_getchars(sdev, buf, len); /* qemu_chr_fe_read_all? */
> +            } else if (inst->blk) {
> +                int rc = blk_pread(inst->blk, inst->blk_pos, buf, len);
> +
> +                if (rc > 0) {
> +                    ret = rc;
> +                }
> +                trace_spapr_of_client_blk_read(ihandle, inst->blk_pos, len,
> +                                               ret);
> +                if (rc > 0) {
> +                    inst->blk_pos += rc;
> +                }
> +            }
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_seek(SpaprMachineState *spapr, uint32_t ihandle,
> +                               uint32_t hi, uint32_t lo)
> +{
> +    uint32_t ret = -1;
> +    uint64_t pos = ((uint64_t) hi << 32) | lo;
> +    SpaprOfInstance *inst = (SpaprOfInstance *)
> +        g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
> +
> +    if (inst) {
> +        if (inst->blk) {
> +            inst->blk_pos = pos;
> +            ret = 1;
> +            trace_spapr_of_client_blk_seek(ihandle, pos, ret);
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static void of_client_clamed_dump(GArray *claimed)
> +{
> +#ifdef DEBUG
> +    int i;
> +    SpaprOfClaimed c;
> +
> +    for (i = 0; i < claimed->len; ++i) {
> +        c = g_array_index(claimed, SpaprOfClaimed, i);
> +        error_printf("CLAIMED %lx..%lx size=%ld\n", c.start, c.start + c.size,
> +                     c.size);
> +    }
> +#endif
> +}
> +
> +static bool of_client_claim_avail(GArray *claimed, uint64_t virt, uint64_t size)
> +{
> +    int i;
> +    SpaprOfClaimed c;
> +
> +    for (i = 0; i < claimed->len; ++i) {
> +        c = g_array_index(claimed, SpaprOfClaimed, i);
> +        if ((c.start <= virt && virt < c.start + c.size) ||
> +            (virt <= c.start && c.start < virt + size)) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
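
The two-clause test in of_client_claim_avail() is the usual interval
overlap check. For non-zero sizes it can also be written as a single
symmetric condition on half-open ranges, sketched below as a
standalone function. sketch_ranges_overlap is a hypothetical name, and
address wraparound at the top of the 64-bit space is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the half-open ranges [a, a + asize) and [b, b + bsize)
 * share at least one byte. Both sizes must be non-zero.
 */
static bool sketch_ranges_overlap(uint64_t a, uint64_t asize,
                                  uint64_t b, uint64_t bsize)
{
    assert(asize > 0 && bsize > 0);

    return a < b + bsize && b < a + asize;
}
```

Adjacent ranges (one ending exactly where the other starts) do not
count as overlapping, which is what a claim allocator wants.
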
> +static void of_client_claim_add(GArray *claimed, uint64_t virt, uint64_t size)
> +{
> +    SpaprOfClaimed newclaim;
> +
> +    newclaim.start = virt;
> +    newclaim.size = size;
> +    g_array_append_val(claimed, newclaim);
> +}
> +
> +static void of_client_dt_memory_available(void *fdt, GArray *claimed,
> +                                          uint64_t base);
> +
> +/*
> + * "claim" claims memory at @virt if @align==0; otherwise it allocates
> + * memory at the requested alignment.
> + */
> +static uint64_t of_client_claim(SpaprMachineState *spapr, uint64_t virt,
> +                                uint64_t size, uint64_t align)
> +{
> +    uint64_t ret;
> +
> +    if (align == 0) {
> +        if (!of_client_claim_avail(spapr->claimed, virt, size)) {
> +            ret = -1;
> +        } else {
> +            ret = virt;
> +        }
> +    } else {
> +        spapr->claimed_base = ALIGN(spapr->claimed_base, align);
> +        while (1) {
> +            if (spapr->claimed_base >= spapr->rma_size) {
> +                error_report("Out of RMA memory for the OF client");
> +                return -1;
> +            }
> +            if (of_client_claim_avail(spapr->claimed, spapr->claimed_base,
> +                                      size)) {
> +                break;
> +            }
> +            spapr->claimed_base += size;
> +        }
> +        ret = spapr->claimed_base;
> +    }
> +
> +    if (ret != -1) {
> +        spapr->claimed_base = MAX(spapr->claimed_base, ret + size);
> +        of_client_claim_add(spapr->claimed, ret, size);
> +        /* The client reads "/memory@0/available" to know where it can claim */
> +        of_client_dt_memory_available(spapr->fdt_blob, spapr->claimed,
> +                                      spapr->claimed_base);
> +    }
> +    trace_spapr_of_client_claim(virt, size, align, ret);
> +
> +    return ret;
> +}
> +
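
One observation on the align != 0 path: the search loop advances the
candidate address by size, which looks like it can leave the result
unaligned whenever size is not a multiple of align. Below is a
standalone sketch of one alternative that steps by the alignment
instead. It is only an illustration under stated assumptions (align is
a power of two, no address wraparound), not the author's method, and
all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_claim {
    uint64_t start;
    uint64_t size;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t sketch_align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Return 1 if [start, start + size) overlaps none of the n claims. */
static int sketch_is_free(const struct sketch_claim *c, size_t n,
                          uint64_t start, uint64_t size)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (start < c[i].start + c[i].size && c[i].start < start + size) {
            return 0;
        }
    }
    return 1;
}

/*
 * Find the first aligned free address at or above base such that the
 * whole range fits below limit. Returns UINT64_MAX if nothing fits.
 */
static uint64_t sketch_claim_aligned(const struct sketch_claim *c, size_t n,
                                     uint64_t base, uint64_t limit,
                                     uint64_t size, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);

    for (base = sketch_align_up(base, align);
         base < limit && size <= limit - base;
         base += align) {           /* stepping by align keeps alignment */
        if (sketch_is_free(c, n, base, size)) {
            return base;
        }
    }
    return UINT64_MAX;
}
```

With a single existing claim covering 0x1000..0x27ff, a request for
0x800 bytes aligned to 0x1000 lands at 0x3000 here, whereas stepping
by size would stop at 0x2800.
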
> +static uint32_t of_client_release(SpaprMachineState *spapr, uint64_t virt,
> +                                  uint64_t size)
> +{
> +    uint32_t ret = -1;
> +    int i;
> +    GArray *claimed = spapr->claimed;
> +    SpaprOfClaimed c;
> +
> +    for (i = 0; i < claimed->len; ++i) {
> +        c = g_array_index(claimed, SpaprOfClaimed, i);
> +        if (c.start == virt && c.size == size) {
> +            g_array_remove_index(claimed, i);
> +            ret = 0;
> +            break;
> +        }
> +    }
> +
> +    trace_spapr_of_client_release(virt, size, ret);
> +
> +    return ret;
> +}
> +
> +static void of_client_instantiate_rtas(SpaprMachineState *spapr, uint32_t base)
> +{
> +    uint64_t check_claim = of_client_claim(spapr, base, sizeof(rtas_blob), 0);
> +    /*
> +     * The client should have claimed the RTAS memory already; if it has
> +     * not, claim it here and warn.
> +     */
> +    if (check_claim != -1) {
> +        error_report("The OF client did not claim RTAS memory at 0x%x", base);
> +    }
> +    spapr->rtas_base = base;
> +    cpu_physical_memory_write(base, rtas_blob, sizeof(rtas_blob));
> +}
> +
> +static uint32_t of_client_call_method(SpaprMachineState *spapr,
> +                                      uint32_t methodaddr, uint32_t ihandle,
> +                                      uint32_t param1, uint32_t param2,
> +                                      uint32_t param3, uint32_t param4,
> +                                      uint32_t *ret2)
> +{
> +    uint32_t ret = -1;
> +    char method[256] = "";
> +    SpaprOfInstance *inst = NULL;
> +
> +    if (!ihandle) {
> +        goto trace_exit;
> +    }
> +
> +    inst = (SpaprOfInstance *) g_hash_table_lookup(spapr->of_instances,
> +                                                   GINT_TO_POINTER(ihandle));
> +    if (!inst) {
> +        goto trace_exit;
> +    }
> +
> +    readstr(methodaddr, method, sizeof(method));
> +
> +    if (strcmp(inst->path, "/") == 0) {
> +        if (strcmp(method, "ibm,client-architecture-support") == 0) {
> +            ret = do_client_architecture_support(POWERPC_CPU(first_cpu), spapr,
> +                                                 param1, FDT_MAX_SIZE);
> +            *ret2 = 0;
> +        }
> +    } else if (strcmp(inst->path, "/rtas") == 0) {
> +        if (strcmp(method, "instantiate-rtas") == 0) {
> +            of_client_instantiate_rtas(spapr, param1);
> +            ret = 0;
> +            *ret2 = param1; /* rtasbase */
> +        }
> +    } else if (inst->cbe) {
> +        if (strcmp(method, "color!") == 0) {
> +            /* do not bother about colors now */
> +            ret = 0;
> +        }
> +    } else if (inst->blk) {
> +        if (strcmp(method, "block-size") == 0) {
> +            ret = 0;
> +            *ret2 = inst->blk_physical_block_size;
> +        } else if (strcmp(method, "#blocks") == 0) {
> +            ret = 0;
> +            *ret2 = blk_getlength(inst->blk) / inst->blk_physical_block_size;
> +        }
> +    } else if (inst->dev) {
> +        if (strcmp(method, "vscsi-report-luns") == 0) {
> +            /* TODO: Not implemented yet, not clear when it is really needed */
> +            ret = -1;
> +            *ret2 = 1;
> +        }
> +    } else {
> +        trace_spapr_of_client_error_unknown_method(method);
> +    }
> +
> +trace_exit:
> +    trace_spapr_of_client_method(ihandle, method, param1, ret, *ret2);
> +
> +    return ret;
> +}
> +
> +static uint32_t of_client_call_interpret(SpaprMachineState *spapr,
> +                                         uint32_t cmdaddr, uint32_t param1,
> +                                         uint32_t param2, uint32_t *ret2)
> +{
> +    uint32_t ret = -1;
> +    char cmd[256] = "";
> +
> +    readstr(cmdaddr, cmd, sizeof(cmd));
> +    trace_spapr_of_client_interpret(cmd, param1, param2, ret, *ret2);
> +
> +    return ret;
> +}
> +
> +static void of_client_quiesce(SpaprMachineState *spapr)
> +{
> +    /* We could just as well free the blob; it is not needed from now on */
> +    int rc = fdt_pack(spapr->fdt_blob);
> +    /* Should only fail if we've built a corrupted tree */
> +    assert(rc == 0);
> +
> +    spapr->fdt_size = fdt_totalsize(spapr->fdt_blob);
> +    spapr->fdt_initial_size = spapr->fdt_size;
> +    of_client_clamed_dump(spapr->claimed);
> +}
> +
> +static target_ulong spapr_h_of_client(PowerPCCPU *cpu, SpaprMachineState *spapr,
> +                                      target_ulong opcode, target_ulong *args)
> +{
> +    target_ulong of_client_args = ppc64_phys_to_real(args[0]);
> +    struct prom_args pargs = { 0 };
> +    char service[64];
> +    unsigned nargs, nret;
> +    int i, servicelen;
> +
> +    cpu_physical_memory_read(of_client_args, &pargs, sizeof(pargs));
> +    nargs = be32_to_cpu(pargs.nargs);
> +    nret = be32_to_cpu(pargs.nret);
> +    readstr(be32_to_cpu(pargs.service), service, sizeof(service));
> +    servicelen = strlen(service);
> +
> +    if (nargs >= ARRAY_SIZE(pargs.args)) {
> +        /* Bounds checking: something is just very wrong */
> +        return H_PARAMETER;
> +    }
> +
> +#define cmpserv(s, a, r) \
> +    cmpservice(service, servicelen, nargs, nret, (s), sizeof(s), (a), (r))
> +
> +    if (cmpserv("finddevice", 1, 1)) {
> +        pargs.args[nargs] =
> +            of_client_finddevice(spapr->fdt_blob,
> +                                 be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("getprop", 4, 1)) {
> +        pargs.args[nargs] =
> +            of_client_getprop(spapr->fdt_blob,
> +                              be32_to_cpu(pargs.args[0]),
> +                              be32_to_cpu(pargs.args[1]),
> +                              be32_to_cpu(pargs.args[2]),
> +                              be32_to_cpu(pargs.args[3]));
> +    } else if (cmpserv("getproplen", 2, 1)) {
> +        pargs.args[nargs] =
> +            of_client_getproplen(spapr->fdt_blob,
> +                                 be32_to_cpu(pargs.args[0]),
> +                                 be32_to_cpu(pargs.args[1]));
> +    } else if (cmpserv("setprop", 4, 1)) {
> +        pargs.args[nargs] =
> +            of_client_setprop(spapr,
> +                              be32_to_cpu(pargs.args[0]),
> +                              be32_to_cpu(pargs.args[1]),
> +                              be32_to_cpu(pargs.args[2]),
> +                              be32_to_cpu(pargs.args[3]));
> +    } else if (cmpserv("nextprop", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_nextprop(spapr->fdt_blob,
> +                               be32_to_cpu(pargs.args[0]),
> +                               be32_to_cpu(pargs.args[1]),
> +                               be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("peer", 1, 1)) {
> +        pargs.args[nargs] =
> +            of_client_peer(spapr->fdt_blob,
> +                           be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("child", 1, 1)) {
> +        pargs.args[nargs] =
> +            of_client_child(spapr->fdt_blob,
> +                            be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("parent", 1, 1)) {
> +        pargs.args[nargs] =
> +            of_client_parent(spapr->fdt_blob,
> +                             be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("open", 1, 1)) {
> +        pargs.args[nargs] = of_client_open(spapr, be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("close", 1, 0)) {
> +        of_client_close(spapr, be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("instance-to-package", 1, 1)) {
> +        pargs.args[nargs] =
> +            of_client_instance_to_package(spapr,
> +                                          be32_to_cpu(pargs.args[0]));
> +    } else if (cmpserv("package-to-path", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_package_to_path(spapr->fdt_blob,
> +                                      be32_to_cpu(pargs.args[0]),
> +                                      be32_to_cpu(pargs.args[1]),
> +                                      be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("instance-to-path", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_instance_to_path(spapr,
> +                                       be32_to_cpu(pargs.args[0]),
> +                                       be32_to_cpu(pargs.args[1]),
> +                                       be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("write", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_write(spapr,
> +                            be32_to_cpu(pargs.args[0]),
> +                            be32_to_cpu(pargs.args[1]),
> +                            be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("read", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_read(spapr,
> +                           be32_to_cpu(pargs.args[0]),
> +                           be32_to_cpu(pargs.args[1]),
> +                           be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("seek", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_seek(spapr,
> +                           be32_to_cpu(pargs.args[0]),
> +                           be32_to_cpu(pargs.args[1]),
> +                           be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("claim", 3, 1)) {
> +        pargs.args[nargs] =
> +            of_client_claim(spapr,
> +                            be32_to_cpu(pargs.args[0]),
> +                            be32_to_cpu(pargs.args[1]),
> +                            be32_to_cpu(pargs.args[2]));
> +    } else if (cmpserv("release", 2, 0)) {
> +        pargs.args[nargs] =
> +            of_client_release(spapr,
> +                              be32_to_cpu(pargs.args[0]),
> +                              be32_to_cpu(pargs.args[1]));
> +    } else if (cmpserv("call-method", 0, 0)) {
> +        pargs.args[nargs] =
> +            of_client_call_method(spapr,
> +                                  be32_to_cpu(pargs.args[0]),
> +                                  be32_to_cpu(pargs.args[1]),
> +                                  be32_to_cpu(pargs.args[2]),
> +                                  be32_to_cpu(pargs.args[3]),
> +                                  be32_to_cpu(pargs.args[4]),
> +                                  be32_to_cpu(pargs.args[5]),
> +                                  &pargs.args[nargs + 1]);
> +    } else if (cmpserv("interpret", 0, 0)) {
> +        pargs.args[nargs] =
> +            of_client_call_interpret(spapr,
> +                                     be32_to_cpu(pargs.args[0]),
> +                                     be32_to_cpu(pargs.args[1]),
> +                                     be32_to_cpu(pargs.args[2]),
> +                                     &pargs.args[nargs + 1]);
> +    } else if (cmpserv("milliseconds", 0, 1)) {
> +        pargs.args[nargs] = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
> +    } else if (cmpserv("quiesce", 0, 0)) {
> +        of_client_quiesce(spapr);
> +    } else if (cmpserv("exit", 0, 0)) {
> +        error_report("Stopped as the VM requested \"exit\"");
> +        vm_stop(RUN_STATE_PAUSED); /* Or qemu_system_guest_panicked(NULL); ? */
> +    } else {
> +        trace_spapr_of_client_error_unknown_service(service, nargs, nret);
> +        pargs.args[nargs] = -1;
> +    }
> +
> +    for (i = 0; i < nret; ++i) {
> +        pargs.args[nargs + i] = cpu_to_be32(pargs.args[nargs + i]);
> +    }
> +
> +    /* Copy what is needed as GRUB allocates only required minimum on stack */
> +    cpu_physical_memory_write(of_client_args, &pargs,
> +                              sizeof(uint32_t) * (3 + nargs + nret));
> +
> +    return H_SUCCESS;
> +}
> +
> +static void of_instance_free(gpointer data)
> +{
> +    SpaprOfInstance *inst = (SpaprOfInstance *) data;
> +
> +    g_free(inst->path);
> +    g_free(inst);
> +}
> +
> +void spapr_setup_of_client(SpaprMachineState *spapr, target_ulong *stack_ptr,
> +                           target_ulong *prom_entry)
> +{
> +    if (spapr->claimed) {
> +        g_array_unref(spapr->claimed);
> +    }
> +    if (spapr->of_instances) {
> +        g_hash_table_unref(spapr->of_instances);
> +    }
> +
> +    spapr->claimed = g_array_new(false, false, sizeof(SpaprOfClaimed));
> +    spapr->of_instances = g_hash_table_new_full(g_direct_hash, g_direct_equal,
> +                                                NULL, of_instance_free);
> +
> +    *stack_ptr = of_client_claim(spapr, 0, OF_STACK_SIZE, OF_STACK_SIZE);
> +    if (*stack_ptr == -1) {
> +        error_report("Memory allocation for stack failed");
> +        exit(1);
> +    }
> +    /*
> +     * Stack grows downwards and we also reserve here space for
> +     * the minimum stack frame.
> +     */
> +    *stack_ptr += OF_STACK_SIZE - 0x20;
> +
> +    *prom_entry = of_client_claim(spapr, 0, sizeof(of_client_blob),
> +                                  sizeof(of_client_blob));
> +    if (*prom_entry == -1) {
> +        error_report("Memory allocation for OF client failed");
> +        exit(1);
> +    }
> +
> +    cpu_physical_memory_write(*prom_entry, of_client_blob,
> +                              sizeof(of_client_blob));
> +
> +    if (of_client_claim(spapr, spapr->kernel_addr,
> +                        spapr->kernel_size, 0) == -1) {
> +        error_report("Memory for kernel is in use");
> +        exit(1);
> +    }
> +
> +    if (spapr->initrd_size &&
> +        of_client_claim(spapr, spapr->initrd_base,
> +                        spapr->initrd_size, 0) == -1) {
> +        error_report("Memory for initramdisk is in use");
> +        exit(1);
> +    }
> +
> +    /*
> +     * We skip writing FDT as nothing expects it; OF client interface is
> +     * going to be used for reading the device tree.
> +     */
> +}
> +
> +static gint of_claimed_compare_func(gconstpointer a, gconstpointer b)
> +{
> +    return ((SpaprOfClaimed *)a)->start - ((SpaprOfClaimed *)b)->start;
> +}
> +
> +static void of_client_dt_memory_available(void *fdt, GArray *claimed,
> +                                          uint64_t base)
> +{
> +    int i, n, offset, proplen = 0;
> +    uint64_t *mem0_reg;
> +    struct { uint64_t start, size; } *avail;
> +
> +    if (!fdt || !claimed) {
> +        return;
> +    }
> +
> +    offset = fdt_path_offset(fdt, "/memory@0");
> +    _FDT(offset);
> +
> +    mem0_reg = (uint64_t *) fdt_getprop(fdt, offset, "reg", &proplen);
> +    if (!mem0_reg || proplen != 2 * sizeof(uint64_t)) {
> +        return;
> +    }
> +
> +    g_array_sort(claimed, of_claimed_compare_func);
> +    of_client_clamed_dump(claimed);
> +
> +    avail = g_malloc0(sizeof(uint64_t) * 2 * claimed->len);
> +    for (i = 0, n = 0; i < claimed->len; ++i) {
> +        SpaprOfClaimed c = g_array_index(claimed, SpaprOfClaimed, i);
> +
> +        avail[n].start = c.start + c.size;
> +        if (i < claimed->len - 1) {
> +            SpaprOfClaimed cn = g_array_index(claimed, SpaprOfClaimed, i + 1);
> +
> +            avail[n].size = cn.start - avail[n].start;
> +        } else {
> +            avail[n].size = be64_to_cpu(mem0_reg[1]) - avail[n].start;
> +        }
> +
> +        if (avail[n].size) {
> +#ifdef DEBUG
> +            error_printf("AVAIL %lx..%lx size=%ld\n", avail[n].start,
> +                         avail[n].start + avail[n].size, avail[n].size);
> +#endif
> +            avail[n].start = cpu_to_be64(avail[n].start);
> +            avail[n].size = cpu_to_be64(avail[n].size);
> +            ++n;
> +        }
> +    }
> +    _FDT((fdt_setprop(fdt, offset, "available", avail,
> +                      sizeof(uint64_t) * 2 * n)));
> +    g_free(avail);
> +}
> +
> +void spapr_of_client_dt(SpaprMachineState *spapr, void *fdt)
> +{
> +    uint32_t phandle;
> +    int i, offset, proplen = 0;
> +    const void *prop;
> +    bool found = false;
> +    GArray *phandles = g_array_new(false, false, sizeof(uint32_t));
> +    const char *nodename;
> +    char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
> +    int aliases;
> +    aliases = fdt_add_subnode(fdt, 0, "aliases");
> +
> +    if (stdout_path) {
> +        fdt_setprop_string(fdt, aliases, "hvterm", stdout_path);
> +    }
> +
> +    /* Add options now, doing it at the end of this __func__ breaks it :-/ */
> +    offset = fdt_add_subnode(fdt, 0, "options");
> +    if (offset > 0) {
> +        struct winsize ws;
> +
> +        if (ioctl(1, TIOCGWINSZ, &ws) != -1) {
> +            _FDT(fdt_setprop_cell(fdt, offset, "screen-#columns", ws.ws_col));
> +            _FDT(fdt_setprop_cell(fdt, offset, "screen-#rows", ws.ws_row));
> +        }
> +        _FDT(fdt_setprop_cell(fdt, offset, "real-mode?", 1));
> +    }
> +
> +    /* Add "disk" nodes to SCSI hosts */
> +    for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
> +         offset >= 0;
> +         offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
> +
> +        nodename = fdt_get_name(fdt, offset, NULL);
> +        if (strncmp(nodename, "scsi@", 5) == 0 ||
> +            strncmp(nodename, "v-scsi@", 7) == 0) {
> +            int disk_node_off = fdt_add_subnode(fdt, offset, "disk");
> +
> +            fdt_setprop_string(fdt, disk_node_off, "device_type", "block");
> +        }
> +    }
> +
> +    /* Find all predefined phandles */
> +    for (offset = fdt_next_node(fdt, -1, NULL);
> +         offset >= 0;
> +         offset = fdt_next_node(fdt, offset, NULL)) {
> +        prop = fdt_getprop(fdt, offset, "phandle", &proplen);
> +        if (prop && proplen == sizeof(uint32_t)) {
> +            phandle = fdt32_ld(prop);
> +            g_array_append_val(phandles, phandle);
> +        }
> +    }
> +
> +    /* Assign phandles skipping the predefined ones */
> +    for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
> +         offset >= 0;
> +         offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
> +
> +        prop = fdt_getprop(fdt, offset, "phandle", &proplen);
> +        if (prop) {
> +            continue;
> +        }
> +        /* Check if the current phandle is not allocated already */
> +        for ( ; ; ++phandle) {
> +            for (i = 0, found = false; i < phandles->len; ++i) {
> +                if (phandle == g_array_index(phandles, uint32_t, i)) {
> +                    found = true;
> +                    break;
> +                }
> +            }
> +            if (!found) {
> +                break;
> +            }
> +        }
> +        _FDT(fdt_setprop_cell(fdt, offset, "phandle", phandle));
> +    }
> +    g_array_unref(phandles);
> +
> +    of_client_dt_memory_available(fdt, spapr->claimed, spapr->claimed_base);
> +
> +    /* Advertise RTAS presence */
> +    offset = fdt_path_offset(fdt, "/rtas");
> +    _FDT(offset);
> +    _FDT(fdt_setprop_cell(fdt, offset, "rtas-size", sizeof(rtas_blob)));
> +}
> +
> +void spapr_of_client_dt_finalize(SpaprMachineState *spapr)
> +{
> +    void *fdt = spapr->fdt_blob;
> +    char *stdout_path = spapr_vio_stdout_path(spapr->vio_bus);
> +    int chosen = fdt_path_offset(fdt, "/chosen");
> +    size_t cb = 0;
> +    char *bootlist = get_boot_devices_list(&cb);
> +
> +    /*
> +     * SLOF-less setup requires an open instance of stdout for early
> +     * kernel printk. By now all phandles are settled so we can open
> +     * the default serial console.
> +     */
> +    if (stdout_path) {
> +        _FDT(fdt_setprop_cell(fdt, chosen, "stdout",
> +                              spapr_of_client_open(spapr, stdout_path)));
> +        _FDT(fdt_setprop_cell(fdt, chosen, "stdin",
> +                              spapr_of_client_open(spapr, stdout_path)));
> +    }
> +
> +    if (bootlist) {
> +        _FDT(fdt_setprop_string(fdt, chosen, "bootpath", bootlist));
> +        _FDT(fdt_setprop_string(fdt, chosen, "bootargs",
> +                                spapr->bootargs ? spapr->bootargs : ""));
> +    }
> +}
> +
> +static void spapr_of_client_machine_ready(Notifier *n, void *opaque)
> +{
> +    SpaprMachineState *spapr = container_of(n, SpaprMachineState,
> +                                            machine_ready);
> +    size_t cb = 0;
> +    char *bootlist;
> +    const char *blkstr;
> +    BlockBackend *blk;
> +    char *cur, *next;
> +    DeviceState *qdev;
> +    uint64_t offset = 0, size = 0;
> +    uint8_t *grub;
> +    int rc;
> +
> +    if (spapr->kernel_size) {
> +        return;
> +    }
> +
> +    bootlist = get_boot_devices_list(&cb);
> +
> +    if (!bootlist) {
> +        return;
> +    }
> +
> +    for (cur = bootlist; cb > 0; cur = next) {
> +        for (next = cur; cb > 0; ++next, --cb) {
> +            if (*next == '\n') {
> +                *next = '\0';
> +                ++next;
> +                --cb;
> +                break;
> +            }
> +        }
> +
> +        qdev = of_client_find_qom_dev(sysbus_get_default(), cur);
> +        if (!qdev) {
> +            continue;
> +        }
> +
> +        blkstr = object_property_get_str(OBJECT(qdev), "drive", NULL);
> +        if (!blkstr) {
> +            continue;
> +        }
> +
> +        blk = blk_by_name(blkstr);
> +        if (!blk) {
> +            continue;
> +        }
> +
> +        if (find_prep_partition(blk, &offset, &size)) {
> +            continue;
> +        }
> +
> +        grub = g_malloc0(size);
> +        if (!grub) {
> +            continue;
> +        }
> +
> +        rc = blk_pread(blk, offset, grub, size);
> +        if (rc <= 0) {
> +            g_free(grub);
> +            continue;
> +        }
> +
> +        g_file_set_contents("my.grub", (void *) grub, size, NULL);
> +        spapr->kernel_size = load_elf("my.grub", NULL, NULL, NULL,
> +                                      NULL, &spapr->kernel_addr,
> +                                      NULL, NULL, 1,
> +                                      PPC_ELF_MACHINE, 0, 0);
> +        spapr->kernel_size = size;
> +
> +        trace_spapr_of_client_blk_bootloader_read(offset, size);
> +        break;
> +    }
> +
> +    g_free(bootlist);
> +}
> +
> +void spapr_of_client_machine_init(SpaprMachineState *spapr)
> +{
> +    spapr_register_hypercall(KVMPPC_H_OF_CLIENT, spapr_h_of_client);
> +    spapr->machine_ready.notify = spapr_of_client_machine_ready;
> +    qemu_add_machine_init_done_notifier(&spapr->machine_ready);
> +}
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 9ea620f23c85..757afb66834e 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -21,6 +21,30 @@ spapr_update_dt(unsigned cb) "New blob %u bytes"
> spapr_update_dt_failed_size(unsigned cbold, unsigned cbnew, unsigned magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x"
> spapr_update_dt_failed_check(unsigned cbold, unsigned cbnew, unsigned magic) "Old blob %u bytes, new blob %u bytes, magic 0x%x"
>
> +# spapr_of_client.c
> +spapr_of_client_error_str_truncated(const char *s, int len) "%s truncated to %d"
> +spapr_of_client_error_param(const char *method, int nargscheck, int nretcheck, int nargs, int nret) "%s takes/returns %d/%d, not %d/%d"
> +spapr_of_client_error_unknown_service(const char *service, int nargs, int nret) "\"%s\" args=%d rets=%d"
> +spapr_of_client_error_unknown_method(const char *method) "\"%s\""
> +spapr_of_client_error_unknown_ihandle_close(uint32_t ih) "ih=0x%x"
> +spapr_of_client_error_unknown_path(const char *path) "\"%s\""
> +spapr_of_client_finddevice(const char *path, uint32_t ph) "\"%s\" => ph=0x%x"
> +spapr_of_client_claim(uint32_t virt, uint32_t size, uint32_t align, uint32_t ret) "virt=0x%x size=0x%x align=0x%x => 0x%x"
> +spapr_of_client_release(uint32_t virt, uint32_t size, uint32_t ret) "virt=0x%x size=0x%x => 0x%x"
> +spapr_of_client_method(uint32_t ihandle, const char *method, uint32_t param, uint32_t ret, uint32_t ret2) "ih=0x%x \"%s\"(0x%x) => 0x%x 0x%x"
> +spapr_of_client_getprop(uint32_t ph, const char *prop, uint32_t ret, const char *val) "ph=0x%x \"%s\" => len=%d [%s]"
> +spapr_of_client_getproplen(uint32_t ph, const char *prop, uint32_t ret) "ph=0x%x \"%s\" => len=%d"
> +spapr_of_client_setprop(uint32_t ph, const char *prop, const char *val, uint32_t ret) "ph=0x%x \"%s\" [%s] => len=%d"
> +spapr_of_client_open(const char *path, uint32_t ph, uint32_t ih) "%s ph=0x%x => ih=0x%x"
> +spapr_of_client_interpret(const char *cmd, uint32_t param1, uint32_t param2, uint32_t ret, uint32_t ret2) "[%s] 0x%x 0x%x => 0x%x 0x%x"
> +spapr_of_client_blk_write(uint32_t ih, uint32_t len) "0x%x => len=%d"
> +spapr_of_client_blk_read(uint32_t ih, uint64_t pos, uint32_t len, uint32_t ret) "ih=0x%x @0x%"PRIx64" size=%d => %d"
> +spapr_of_client_blk_seek(uint32_t ih, uint64_t pos, uint32_t ret) "ih=0x%x 0x%"PRIx64" => %d"
> +spapr_of_client_blk_bootloader_read(uint64_t offset, uint64_t size) "0x%"PRIx64" size=0x%"PRIx64
> +spapr_of_client_package_to_path(uint32_t ph, const char *tmp, uint32_t ret) "ph=0x%x => %s len=%d"
> +spapr_of_client_instance_to_path(uint32_t ih, uint32_t ph, const char *tmp, uint32_t ret) "ih=0x%x ph=0x%x => %s len=%d"
> +spapr_of_client_instance_to_package(uint32_t ih, uint32_t ph) "ih=0x%x => ph=0x%x"
> +
> # spapr_hcall_tpm.c
> spapr_h_tpm_comm(const char *device_path, uint64_t operation) "tpm_device_path=%s operation=0x%"PRIu64
> spapr_tpm_execute(uint64_t data_in, uint64_t data_in_sz, uint64_t data_out, uint64_t data_out_sz) "data_in=0x%"PRIx64", data_in_sz=%"PRIu64", data_out=0x%"PRIx64", data_out_sz=%"PRIu64
>
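
For readers following the quoted spapr_h_of_client() hunk: the handler
reads a big-endian argument array from guest memory and writes back only
the header plus the cells the caller declared. Below is a minimal sketch
of that layout and of the write-back size; it is not the patch's code.
The struct name, PROM_MAX_ARGS (10 is an assumption, the real capacity is
defined elsewhere in the patch) and the helper names are made up for
illustration, and the bounds check that covers nret as well as nargs is
an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROM_MAX_ARGS 10 /* assumed capacity, not taken from the patch */

/* Assumed layout of the client interface argument array (big-endian cells). */
struct prom_args_sketch {
    uint32_t service;             /* guest address of the service name */
    uint32_t nargs;               /* number of input cells */
    uint32_t nret;                /* number of return cells */
    uint32_t args[PROM_MAX_ARGS]; /* inputs first, then return values */
};

/* The header is exactly three cells, which is where "3 +" comes from. */
_Static_assert(offsetof(struct prom_args_sketch, args) == 3 * sizeof(uint32_t),
               "header must be three cells");

/* Host-independent big-endian load of one cell. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Host-independent big-endian store of one cell. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24;
    p[1] = v >> 16;
    p[2] = v >> 8;
    p[3] = v;
}

/*
 * Number of bytes to copy back to the guest: header plus declared cells.
 * Returns 0 when nargs + nret does not fit into the array.
 */
static size_t prom_writeback_size(uint32_t nargs, uint32_t nret)
{
    if (nargs > PROM_MAX_ARGS || nret > PROM_MAX_ARGS - nargs) {
        return 0;
    }
    return sizeof(uint32_t) * (3 + (size_t)nargs + nret);
}
```

With these assumptions a "finddevice" call (1 argument, 1 return value)
copies back five cells, so a caller that allocated only those five cells
on its stack is not overwritten.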

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface
  2020-02-03 13:03   ` BALATON Zoltan
@ 2020-02-05  4:18     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-05  4:18 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Peter Maydell, David Gibson, qemu-ppc, qemu-devel, Paolo Bonzini



On 04/02/2020 00:03, BALATON Zoltan wrote:
> On Mon, 3 Feb 2020, Alexey Kardashevskiy wrote:
> [...]
>> diff --git a/hw/ppc/spapr_of_client.c b/hw/ppc/spapr_of_client.c
>> new file mode 100644
>> index 000000000000..31555c356de8
>> --- /dev/null
>> +++ b/hw/ppc/spapr_of_client.c
>> @@ -0,0 +1,1526 @@
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +#include <sys/ioctl.h>
>> +#include <termios.h>
>> +#include "qapi/error.h"
>> +#include "exec/memory.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_vio.h"
> 
> I haven't read all of this as it's a lot of code but I was thinking this
> might also be useful to implement a replacement for the proprietary
> firmware of the pegasos2 emulation later. See:
> 
> https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2
> 
> for details. I likely don't need a complete OpenFirmware for booting
> Linux and MorphOS as these only get some data from the device tree and
> then use their own drivers (MorphOS may use some RTAS but I'm not sure)
> so something like this might be enough to get these booting without the
> original firmware on pegasos2 and is easier than reimplementing the
> firmware or trying to modify OpenFirmware to emulate the different
> device tree that machine has.
> 
> Question is if it's possible to make this file independent of spapr or
> gather the OF emulation parts in a separate file that could be reused
> later? This could be done afterwards but if there's anything now that
> you can do to make it easier then that would help so maybe it's good
> idea to make you aware of this in case you can consider this possible
> use case.


I can separate the OF CI part from SpaprMachineState, but so far I have
not succeeded in convincing the community that this is useful.
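
One possible shape for such a split, purely as an illustration: the
generic OF CI code would dispatch services through a table of callbacks
and treat the machine state as an opaque pointer. Nothing below exists in
QEMU or in this series; OfClientOps, OfClient, of_client_call() and the
toy machine are hypothetical names, and only two services are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OF_FAILURE ((uint32_t)-1)

/* Hypothetical per-machine callbacks; none of these names exist in QEMU. */
typedef struct OfClientOps {
    uint32_t (*finddevice)(void *machine, const char *path);
    uint32_t (*milliseconds)(void *machine);
} OfClientOps;

/* Generic client interface state: knows the ops, not the machine type. */
typedef struct OfClient {
    const OfClientOps *ops;
    void *machine;
} OfClient;

/* Dispatch by service name; unknown or unimplemented services return -1. */
static uint32_t of_client_call(const OfClient *oc, const char *service,
                               const char *arg)
{
    if (!strcmp(service, "finddevice") && oc->ops->finddevice) {
        return oc->ops->finddevice(oc->machine, arg);
    }
    if (!strcmp(service, "milliseconds") && oc->ops->milliseconds) {
        return oc->ops->milliseconds(oc->machine);
    }
    return OF_FAILURE;
}

/* Toy machine used only to exercise the dispatcher. */
typedef struct ToyMachine {
    uint32_t root_phandle;
} ToyMachine;

static uint32_t toy_finddevice(void *machine, const char *path)
{
    ToyMachine *m = machine;

    return strcmp(path, "/") == 0 ? m->root_phandle : OF_FAILURE;
}

static const OfClientOps toy_ops = {
    .finddevice = toy_finddevice,
    /* .milliseconds is left NULL, so the dispatcher reports failure */
};
```

A second machine type would then only have to supply its own ops table
and device tree, while the service dispatch stays shared.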



-- 
Alexey


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH qemu v6] spapr: OF CI networking
  2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2020-02-03  3:29 ` [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
@ 2020-02-05  4:59 ` Alexey Kardashevskiy
  6 siblings, 0 replies; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-05  4:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Paolo Bonzini, qemu-ppc, Peter Maydell,
	David Gibson

This is a *hack* to demonstrate the idea.

I am connecting the OF CI's "net" device to a corresponding network
backend. Unlike blockdev, whose simple API allows just reading from
a device, a network backend requires a client, and when QEMU already
has a device attached, adding another device-agnostic receiver for
the netdev backend is quite tricky.

Is this something that should never ever be done in such a paravirtual
setup? Is there a better way to do this within the existing QEMU besides
implementing drivers in the guest space?

This one manages to get DHCP config working in GRUB (compiled separately
and loaded via "-kernel"), but it is very unstable: it needs synchronization,
correct qemu_flush_queued_packets/qemu_purge_queued_packets calls in places,
etc.
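
The receive path below queues incoming frames in a small ring until the
guest reads them. As a reference point, here is a minimal standalone
sketch of such a ring; it is not the code in this patch. It assumes the
same 32 slots of 2048 bytes, but it additionally records each frame's
length and reads from the slot that was written first, and all names
(NetRing, net_ring_push, net_ring_pop) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NET_SLOTS     32
#define NET_SLOT_SIZE 2048

typedef struct {
    size_t len;                  /* bytes valid in data[] */
    uint8_t data[NET_SLOT_SIZE];
} NetSlot;

typedef struct {
    unsigned head;               /* next slot to write */
    unsigned tail;               /* next slot to read */
    unsigned used;               /* number of queued frames */
    NetSlot slots[NET_SLOTS];
} NetRing;

/* Nonzero while at least one slot is free. */
static int net_ring_can_receive(const NetRing *r)
{
    return r->used < NET_SLOTS;
}

/* Queue one frame; returns its size, or -1 if full or oversized. */
static long net_ring_push(NetRing *r, const uint8_t *buf, size_t size)
{
    if (!net_ring_can_receive(r) || size > NET_SLOT_SIZE) {
        return -1;
    }
    memcpy(r->slots[r->head].data, buf, size);
    r->slots[r->head].len = size;
    r->head = (r->head + 1) % NET_SLOTS;
    r->used++;
    return (long)size;
}

/* Dequeue the oldest frame, truncated to size; returns 0 when empty. */
static size_t net_ring_pop(NetRing *r, uint8_t *buf, size_t size)
{
    size_t n;

    if (!r->used) {
        return 0;
    }
    n = r->slots[r->tail].len < size ? r->slots[r->tail].len : size;
    memcpy(buf, r->slots[r->tail].data, n);
    r->tail = (r->tail + 1) % NET_SLOTS;
    r->used--;
    return n;
}
```

Keeping the length per slot lets the reader return the real frame size
instead of the size of the caller's buffer.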

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_of_client.c | 150 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 142 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_of_client.c b/hw/ppc/spapr_of_client.c
index 31555c356de8..a05d891eb452 100644
--- a/hw/ppc/spapr_of_client.c
+++ b/hw/ppc/spapr_of_client.c
@@ -15,6 +15,7 @@
 #include "elf.h"
 #include "hw/ppc/ppc.h"
 #include "hw/loader.h"
+#include "net/net.h"
 #include "trace.h"
 
 #define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
@@ -211,12 +212,23 @@ static const uint8_t of_client_blob[] = {
     0x4e, 0x80, 0x00, 0x20
 };
 
+typedef uint8_t SpaprOfNetBuffer[2048];
+typedef struct {
+    int head, tail, used;
+    SpaprOfNetBuffer b[32];
+} SpaprOfNetBuffers;
+
 typedef struct {
     DeviceState *dev;
     CharBackend *cbe;
     BlockBackend *blk;
     uint64_t blk_pos;
     uint16_t blk_physical_block_size;
+    NICState *nic;
+    NICConf nicconf;
+    SpaprOfNetBuffers *netbufs;
+    NetClientState *ncpeer;
+    char *params;
     char *path; /* the path used to open the instance */
     uint32_t phandle;
 } SpaprOfInstance;
@@ -612,6 +624,75 @@ static DeviceState *of_client_find_qom_dev(BusState *bus, const char *path)
     return NULL;
 }
 
+static bool network_device_get_mac(DeviceState *qdev, MACAddr *mac)
+{
+    const char *macstr;
+
+    macstr = object_property_get_str(OBJECT(qdev), "mac", NULL);
+    if (!macstr) {
+        return false;
+    }
+
+    return sscanf(macstr, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
+                  &mac->a[0], &mac->a[1], &mac->a[2],
+                  &mac->a[3], &mac->a[4], &mac->a[5]) == 6;
+}
+
+static int of_client_net_can_receive(NetClientState *nc)
+{
+    SpaprOfInstance *inst = qemu_get_nic_opaque(nc);
+    SpaprOfNetBuffers *nb = inst->netbufs;
+
+    return nb->used < ARRAY_SIZE(nb->b);
+}
+
+static ssize_t of_client_net_receive(NetClientState *nc, const uint8_t *buf,
+                                     size_t size)
+{
+    SpaprOfInstance *inst = qemu_get_nic_opaque(nc);
+    SpaprOfNetBuffers *nb = inst->netbufs;
+    int next;
+
+    next = (nb->head + 1) % ARRAY_SIZE(nb->b);
+    g_assert(next != nb->tail);
+    g_assert(size <= ARRAY_SIZE(nb->b[0]));
+
+    memcpy(nb->b[next], buf, size);
+    nb->head = next;
+    ++nb->used;
+
+    return size;
+}
+
+static ssize_t of_client_net_read(SpaprOfInstance *inst, uint8_t *buf,
+                                  size_t size)
+{
+    SpaprOfNetBuffers *nb = inst->netbufs;
+
+    if (!nb->used) {
+        return 0;
+    }
+    g_assert(nb->head != nb->tail);
+    memcpy(buf, nb->b[nb->tail], size);
+    nb->tail = (nb->tail + 1) % ARRAY_SIZE(nb->b);
+    --nb->used;
+
+    return size;
+}
+
+static ssize_t of_client_net_write(SpaprOfInstance *inst, uint8_t *buf,
+                                   size_t size)
+{
+    return qemu_send_packet(qemu_get_queue(inst->nic), buf, size);
+}
+
+static NetClientInfo of_client_net_info = {
+    .type = NET_CLIENT_DRIVER_NIC,
+    .size = sizeof(NICState),
+    .can_receive = of_client_net_can_receive,
+    .receive = of_client_net_receive,
+};
+
 static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
 {
     int offset;
@@ -625,11 +706,6 @@ static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
     }
 
     split_path(path, &node, &unit, &part);
-    if (part && strcmp(part, "0")) {
-        error_report("Error: Do not do partitions now");
-        g_free(part);
-        part = NULL;
-    }
 
     offset = of_client_fdt_path_offset(spapr->fdt_blob, node, unit);
     if (offset < 0) {
@@ -649,6 +725,7 @@ static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
         g_free(tmp);
     }
     inst->path = g_strdup(path);
+    inst->params = part;
     g_hash_table_insert(spapr->of_instances,
                         GINT_TO_POINTER(spapr->of_instance_last),
                         inst);
@@ -659,6 +736,8 @@ static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
                                                       "chardev", NULL);
         const char *blkstr = object_property_get_str(OBJECT(inst->dev),
                                                      "drive", NULL);
+        const char *ncstr = object_property_get_str(OBJECT(inst->dev),
+                                                    "netdev", NULL);
 
         if (cdevstr) {
             Chardev *cdev = qemu_chr_find(cdevstr);
@@ -673,6 +752,28 @@ static uint32_t spapr_of_client_open(SpaprMachineState *spapr, const char *path)
             conf.blk = inst->blk;
             blkconf_blocksizes(&conf);
             inst->blk_physical_block_size = conf.physical_block_size;
+        } else if (ncstr && network_device_get_mac(inst->dev,
+                                                   &inst->nicconf.macaddr)) {
+            /*
+             * We already have a NIC hooked to a netdev backend. To bypass
+             * the NIC, we temporarily replace the netdev's peer with our
+             * OF NIC and restore it when the instance is closed.
+             */
+            inst->nicconf.peers.ncs[0] = qemu_find_netdev(ncstr);
+            if (inst->nicconf.peers.ncs[0]) {
+                inst->nicconf.peers.queues = 1;
+                inst->ncpeer = inst->nicconf.peers.ncs[0]->peer;
+                qemu_flush_queued_packets(inst->ncpeer);
+                inst->nicconf.peers.ncs[0]->peer = NULL;
+                inst->nic = qemu_new_nic(&of_client_net_info, &inst->nicconf,
+                                         "OF1275-CI", "ofnet", inst);
+                qemu_purge_queued_packets(inst->nicconf.peers.ncs[0]);
+            }
+            if (inst->nic) {
+                qemu_format_nic_info_str(qemu_get_queue(inst->nic),
+                                         inst->nicconf.macaddr.a);
+                inst->netbufs = g_malloc0(sizeof(inst->netbufs[0]));
+            }
         }
     }
 
@@ -680,7 +781,6 @@ trace_exit:
     trace_spapr_of_client_open(path, inst ? inst->phandle : 0, ret);
     g_free(node);
     g_free(unit);
-    g_free(part);
 
     return ret;
 }
@@ -760,7 +860,7 @@ static uint32_t of_client_instance_to_path(SpaprMachineState *spapr,
 static uint32_t of_client_write(SpaprMachineState *spapr, uint32_t ihandle,
                                 uint32_t buf, uint32_t len)
 {
-    char tmp[256];
+    char tmp[1025];
     int toread, toprint, cb = MIN(len, 1024);
     SpaprOfInstance *inst = (SpaprOfInstance *)
         g_hash_table_lookup(spapr->of_instances, GINT_TO_POINTER(ihandle));
@@ -777,6 +877,8 @@ static uint32_t of_client_write(SpaprMachineState *spapr, uint32_t ihandle,
                                                 toprint);
             } else if (inst->blk) {
                 trace_spapr_of_client_blk_write(ihandle, len);
+            } else if (inst->nic) {
+                toprint = of_client_net_write(inst, (uint8_t *) tmp, toread);
             }
         } else {
             /* We normally open stdout so this is fallback */
@@ -822,6 +924,8 @@ static uint32_t of_client_read(SpaprMachineState *spapr, uint32_t ihandle,
                 if (rc > 0) {
                     inst->blk_pos += rc;
                 }
+            } else if (inst->nic) {
+                ret = of_client_net_read(inst, buf, len);
             }
         }
     }
@@ -1214,6 +1318,15 @@ static void of_instance_free(gpointer data)
 {
     SpaprOfInstance *inst = (SpaprOfInstance *) data;
 
+    if (inst->ncpeer) {
+        qemu_flush_queued_packets(inst->nicconf.peers.ncs[0]->peer);
+        inst->nicconf.peers.ncs[0]->peer = inst->ncpeer;
+        qemu_flush_queued_packets(inst->nicconf.peers.ncs[0]->peer);
+
+        qemu_del_nic(inst->nic);
+    }
+    g_free(inst->netbufs);
+    g_free(inst->params);
     g_free(inst->path);
     g_free(inst);
 }
@@ -1355,7 +1468,7 @@ void spapr_of_client_dt(SpaprMachineState *spapr, void *fdt)
         _FDT(fdt_setprop_cell(fdt, offset, "real-mode?", 1));
     }
 
-    /* Add "disk" nodes to SCSI hosts */
+    /* Add "disk" nodes to SCSI hosts, same for "network" */
     for (offset = fdt_next_node(fdt, -1, NULL), phandle = 1;
          offset >= 0;
          offset = fdt_next_node(fdt, offset, NULL), ++phandle) {
@@ -1366,6 +1479,27 @@ void spapr_of_client_dt(SpaprMachineState *spapr, void *fdt)
             int disk_node_off = fdt_add_subnode(fdt, offset, "disk");
 
             fdt_setprop_string(fdt, disk_node_off, "device_type", "block");
+        } else if (strncmp(nodename, "ethernet@", 9) == 0 ||
+                   strncmp(nodename, "l-lan@", 6) == 0) {
+
+            char devpath[1024];
+            DeviceState *qdev;
+            MACAddr mac;
+
+            if (fdt_get_path(fdt, offset, devpath, sizeof(devpath) - 1) < 0) {
+                continue;
+            }
+
+            qdev = of_client_find_qom_dev(sysbus_get_default(), devpath);
+            if (!qdev) {
+                continue;
+            }
+            if (network_device_get_mac(qdev, &mac)) {
+                fdt_setprop(fdt, offset, "local-mac-address", mac.a,
+                            sizeof(mac));
+            }
+            fdt_setprop_string(fdt, offset, "device_type", "network");
+            fdt_setprop_cell(fdt, offset, "max-frame-size", 1024);
         }
     }
 
-- 
2.17.1
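
The node-name test in the last hunk can be tried in isolation. The sketch below is only an illustration of that prefix match: is_network_node() is a hypothetical helper name that does not appear in the patch, and the example node names in the usage note are made up.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): returns true when a
 * device tree node name starts with one of the two prefixes that the
 * hunk above treats as network devices.
 */
static bool is_network_node(const char *nodename)
{
    /* The lengths 9 and 6 cover the prefix including the '@'. */
    return strncmp(nodename, "ethernet@", 9) == 0 ||
           strncmp(nodename, "l-lan@", 6) == 0;
}
```

With this helper, a name such as "l-lan@71000003" matches, while "disk" or a bare "ethernet" without a unit address does not.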




* Re: [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit
  2020-02-03  3:29 ` [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit Alexey Kardashevskiy
@ 2020-02-12  5:43   ` David Gibson
  2020-02-13  3:09     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 17+ messages in thread
From: David Gibson @ 2020-02-12  5:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Paolo Bonzini, qemu-ppc, qemu-devel, Peter Maydell


On Mon, Feb 03, 2020 at 02:29:38PM +1100, Alexey Kardashevskiy wrote:
> At the moment we enforce 64bit mode on a CPU at reset. This makes no
> difference for SLOF or Linux, as both set the desired mode straight away.
> However, if we ever boot something other than these two, this might not
> work; for example, GRUB expects the default MSR state and does not work
> properly otherwise.
> 
> This removes setting MSR_SF from the PPC CPU reset.

Hrm.  This is in the core cpu model so it doesn't just affect pseries,
but powernv (and theoretically others) as well.  Generally the cpu
model should have the bare metal behaviour, and we can override it in
the pseries machine if necessary.

So for a bare metal POWER system, what mode do we start in?  I'm
guessing it probably doesn't matter in practice, since the skiboot
firmware also probably does a mode set on entry, but it'd be nice to
get this right in theory.
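
For reference, the reset rule under discussion can be sketched in isolation. This is a minimal illustration, not QEMU code: reset_msr() is a hypothetical function name, and the two macro values are stand-ins assumed here so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Stand-in definitions, assumed for this sketch only. */
#define MSR_SF          63           /* bit number of the 64-bit mode flag */
#define POWERPC_MMU_64  0x00010000u  /* flag marking a 64-bit MMU model */

/*
 * Hypothetical helper: apply the rule from the removed hunk, i.e. set
 * MSR[SF] at reset whenever the CPU has a 64-bit MMU model.
 */
static uint64_t reset_msr(uint64_t msr, uint32_t mmu_model)
{
    if (mmu_model & POWERPC_MMU_64) {
        msr |= (1ULL << MSR_SF);
    }
    return msr;
}
```

With the hunk removed, the equivalent of reset_msr() would return msr unchanged, so a 64-bit CPU would come out of reset with SF clear.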

> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  target/ppc/translate_init.inc.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 53995f62eab2..f6a676cf55e8 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -10710,12 +10710,6 @@ static void ppc_cpu_reset(CPUState *s)
>  #endif
>  #endif
>  
> -#if defined(TARGET_PPC64)
> -    if (env->mmu_model & POWERPC_MMU_64) {
> -        msr |= (1ULL << MSR_SF);
> -    }
> -#endif
> -
>      hreg_store_msr(env, msr, 1);
>  
>  #if !defined(CONFIG_USER_ONLY)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place
  2020-02-03  3:29 ` [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place Alexey Kardashevskiy
@ 2020-02-12 18:44   ` Fabiano Rosas
  2020-02-13  8:41   ` Greg Kurz
  1 sibling, 0 replies; 17+ messages in thread
From: Fabiano Rosas @ 2020-02-12 18:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Peter Maydell, David Gibson, qemu-ppc, Paolo Bonzini

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> At the moment "pseries" starts in SLOF, which only expects the FDT blob
> pointer in r3. As we are going to introduce OpenFirmware support in
> QEMU, we will be booting OF clients directly, and these expect a stack
> pointer in r1 and the OF entry point in r5. In addition, Linux
> looks at r3/r4 for the initramdisk location (vmlinux can find
> this from the device tree, but zImage from distro kernels cannot).
>
> This extends spapr_cpu_set_entry_state() to take more registers. This
> should cause no behavioral change.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
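
The register convention described in the commit message can be sketched as a standalone snippet. This is a simplified model, not the QEMU function: EntryState and set_entry_state() are hypothetical names, and the struct only carries the fields needed to show which argument lands in which register.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the CPU state touched at entry. */
typedef struct {
    uint64_t nip;      /* next instruction pointer (entry point) */
    uint64_t gpr[32];  /* general purpose registers r0..r31 */
} EntryState;

/*
 * Model of the extended helper: r1 is the stack pointer, r3/r4 carry
 * the FDT pointer or initramdisk location, r5 is the OF entry point.
 */
static void set_entry_state(EntryState *env, uint64_t nip,
                            uint64_t r1, uint64_t r3,
                            uint64_t r4, uint64_t r5)
{
    env->nip = nip;
    env->gpr[1] = r1;
    env->gpr[3] = r3;
    env->gpr[4] = r4;
    env->gpr[5] = r5;
}
```

The existing SLOF path corresponds to a call of the form set_entry_state(&env, entry, 0, fdt_addr, 0, 0), which is why the patch is expected to cause no behavioral change there.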

> ---
>  include/hw/ppc/spapr_cpu_core.h | 4 +++-
>  hw/ppc/spapr.c                  | 4 ++--
>  hw/ppc/spapr_cpu_core.c         | 7 ++++++-
>  hw/ppc/spapr_rtas.c             | 2 +-
>  4 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
> index 1c4cc6559c52..edd7214fafcf 100644
> --- a/include/hw/ppc/spapr_cpu_core.h
> +++ b/include/hw/ppc/spapr_cpu_core.h
> @@ -40,7 +40,9 @@ typedef struct SpaprCpuCoreClass {
>  } SpaprCpuCoreClass;
>  
>  const char *spapr_get_cpu_core_type(const char *cpu_type);
> -void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3);
> +void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
> +                               target_ulong r1, target_ulong r3,
> +                               target_ulong r4, target_ulong r5);
>  
>  typedef struct SpaprCpuState {
>      uint64_t vpa_addr;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c9b2e0a5e060..660a4b60e072 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1674,8 +1674,8 @@ static void spapr_machine_reset(MachineState *machine)
>      spapr->fdt_blob = fdt;
>  
>      /* Set up the entry state */
> -    spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr);
> -    first_ppc_cpu->env.gpr[5] = 0;
> +    spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT,
> +                              0, fdt_addr, 0, 0);
>  
>      spapr->cas_reboot = false;
>  
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index d09125d9afd4..696b76598dd7 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -84,13 +84,18 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
>      spapr_irq_cpu_intc_reset(spapr, cpu);
>  }
>  
> -void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3)
> +void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip,
> +                               target_ulong r1, target_ulong r3,
> +                               target_ulong r4, target_ulong r5)
>  {
>      PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>      CPUPPCState *env = &cpu->env;
>  
>      env->nip = nip;
> +    env->gpr[1] = r1;
>      env->gpr[3] = r3;
> +    env->gpr[4] = r4;
> +    env->gpr[5] = r5;
>      kvmppc_set_reg_ppc_online(cpu, 1);
>      CPU(cpu)->halted = 0;
>      /* Enable Power-saving mode Exit Cause exceptions */
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 656fdd221665..9e3cbd70bbd9 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -190,7 +190,7 @@ static void rtas_start_cpu(PowerPCCPU *callcpu, SpaprMachineState *spapr,
>       */
>      newcpu->env.tb_env->tb_offset = callcpu->env.tb_env->tb_offset;
>  
> -    spapr_cpu_set_entry_state(newcpu, start, r3);
> +    spapr_cpu_set_entry_state(newcpu, start, 0, r3, 0, 0);
>  
>      qemu_cpu_kick(CPU(newcpu));



* Re: [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image
  2020-02-03  3:29 ` [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image Alexey Kardashevskiy
@ 2020-02-12 18:54   ` Fabiano Rosas
  2020-02-13  2:58   ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: Fabiano Rosas @ 2020-02-12 18:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Peter Maydell, David Gibson, qemu-ppc, Paolo Bonzini

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> This allows moving the kernel in the guest memory. The option is useful
> for step debugging (as Linux is linked at 0x0); it also allows loading
> grub which is normally linked to run at 0x20000.
>

+1, as this fixes half of the '-S' debugging issue.

> This uses the existing kernel address by default.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
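
The address arithmetic that this patch parameterises can be checked with a small standalone sketch. The two expressions are taken from the hunks below; the function names and the uint64_t return type of initrd_base_for() are assumptions made for illustration (the machine state stores initrd_base as uint32_t).

```c
#include <stdint.h>

/*
 * Keep the low 28 bits of the ELF link address and rebase them onto
 * the configured kernel address (same expression as in the patch).
 */
static uint64_t translate_kernel_address(uint64_t kernel_addr, uint64_t addr)
{
    return (addr & 0x0fffffff) + kernel_addr;
}

/*
 * Hypothetical helper: place the initrd after the kernel, rounded so
 * that the result is 64 KiB aligned and at least 64 KiB past the end
 * of the kernel image.
 */
static uint64_t initrd_base_for(uint64_t kernel_addr, uint64_t kernel_size)
{
    return (kernel_addr + kernel_size + 0x1ffff) & ~(uint64_t)0xffff;
}
```

For example, a kernel of size 0x123456 loaded at 0x400000 gives an initrd base of 0x540000. Since "kernel-addr" is registered as a property of the machine object, it would presumably be set along the lines of "-machine pseries,kernel-addr=0x20000"; that exact command-line form is an assumption, not something shown in this thread.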

> ---
>  include/hw/ppc/spapr.h |  1 +
>  hw/ppc/spapr.c         | 38 +++++++++++++++++++++++++++++++-------
>  2 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 3b50f36c338a..32e831a395ae 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -164,6 +164,7 @@ struct SpaprMachineState {
>      void *fdt_blob;
>      long kernel_size;
>      bool kernel_le;
> +    uint64_t kernel_addr;
>      uint32_t initrd_base;
>      long initrd_size;
>      uint64_t rtc_offset; /* Now used only during incoming migration */
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 60153bf0b771..b59e9dc360fe 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1054,7 +1054,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt)
>      }
>  
>      if (spapr->kernel_size) {
> -        uint64_t kprop[2] = { cpu_to_be64(KERNEL_LOAD_ADDR),
> +        uint64_t kprop[2] = { cpu_to_be64(spapr->kernel_addr),
>                                cpu_to_be64(spapr->kernel_size) };
>  
>          _FDT(fdt_setprop(fdt, chosen, "qemu,boot-kernel",
> @@ -1242,7 +1242,8 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
>      /* Build memory reserve map */
>      if (reset) {
>          if (spapr->kernel_size) {
> -            _FDT((fdt_add_mem_rsv(fdt, KERNEL_LOAD_ADDR, spapr->kernel_size)));
> +            _FDT((fdt_add_mem_rsv(fdt, spapr->kernel_addr,
> +                                  spapr->kernel_size)));
>          }
>          if (spapr->initrd_size) {
>              _FDT((fdt_add_mem_rsv(fdt, spapr->initrd_base,
> @@ -1270,7 +1271,9 @@ void *spapr_build_fdt(SpaprMachineState *spapr, bool reset, size_t space)
>  
>  static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>  {
> -    return (addr & 0x0fffffff) + KERNEL_LOAD_ADDR;
> +    SpaprMachineState *spapr = opaque;
> +
> +    return (addr & 0x0fffffff) + spapr->kernel_addr;
>  }
>  
>  static void emulate_spapr_hypercall(PPCVirtualHypervisor *vhyp,
> @@ -2947,14 +2950,15 @@ static void spapr_machine_init(MachineState *machine)
>          uint64_t lowaddr = 0;
>  
>          spapr->kernel_size = load_elf(kernel_filename, NULL,
> -                                      translate_kernel_address, NULL,
> +                                      translate_kernel_address, spapr,
>                                        NULL, &lowaddr, NULL, NULL, 1,
>                                        PPC_ELF_MACHINE, 0, 0);
>          if (spapr->kernel_size == ELF_LOAD_WRONG_ENDIAN) {
>              spapr->kernel_size = load_elf(kernel_filename, NULL,
> -                                          translate_kernel_address, NULL, NULL,
> +                                          translate_kernel_address, spapr, NULL,
>                                            &lowaddr, NULL, NULL, 0,
> -                                          PPC_ELF_MACHINE, 0, 0);
> +                                          PPC_ELF_MACHINE,
> +                                          0, 0);
>              spapr->kernel_le = spapr->kernel_size > 0;
>          }
>          if (spapr->kernel_size < 0) {
> @@ -2968,7 +2972,7 @@ static void spapr_machine_init(MachineState *machine)
>              /* Try to locate the initrd in the gap between the kernel
>               * and the firmware. Add a bit of space just in case
>               */
> -            spapr->initrd_base = (KERNEL_LOAD_ADDR + spapr->kernel_size
> +            spapr->initrd_base = (spapr->kernel_addr + spapr->kernel_size
>                                    + 0x1ffff) & ~0xffff;
>              spapr->initrd_size = load_image_targphys(initrd_filename,
>                                                       spapr->initrd_base,
> @@ -3214,6 +3218,18 @@ static void spapr_set_vsmt(Object *obj, Visitor *v, const char *name,
>      visit_type_uint32(v, name, (uint32_t *)opaque, errp);
>  }
>  
> +static void spapr_get_kernel_addr(Object *obj, Visitor *v, const char *name,
> +                                  void *opaque, Error **errp)
> +{
> +    visit_type_uint64(v, name, (uint64_t *)opaque, errp);
> +}
> +
> +static void spapr_set_kernel_addr(Object *obj, Visitor *v, const char *name,
> +                                  void *opaque, Error **errp)
> +{
> +    visit_type_uint64(v, name, (uint64_t *)opaque, errp);
> +}
> +
>  static char *spapr_get_ic_mode(Object *obj, Error **errp)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3319,6 +3335,14 @@ static void spapr_instance_init(Object *obj)
>      object_property_add_bool(obj, "vfio-no-msix-emulation",
>                               spapr_get_msix_emulation, NULL, NULL);
>  
> +    object_property_add(obj, "kernel-addr", "uint64", spapr_get_kernel_addr,
> +                        spapr_set_kernel_addr, NULL, &spapr->kernel_addr,
> +                        &error_abort);
> +    object_property_set_description(obj, "kernel-addr",
> +                                    stringify(KERNEL_LOAD_ADDR)
> +                                    " for -kernel is the default",
> +                                    NULL);
> +    spapr->kernel_addr = KERNEL_LOAD_ADDR;
>      /* The machine class defines the default interrupt controller mode */
>      spapr->irq = smc->irq;
>      object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,



* Re: [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image
  2020-02-03  3:29 ` [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image Alexey Kardashevskiy
  2020-02-12 18:54   ` Fabiano Rosas
@ 2020-02-13  2:58   ` David Gibson
  1 sibling, 0 replies; 17+ messages in thread
From: David Gibson @ 2020-02-13  2:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Paolo Bonzini, qemu-ppc, qemu-devel, Peter Maydell


On Mon, Feb 03, 2020 at 02:29:42PM +1100, Alexey Kardashevskiy wrote:
> This allows moving the kernel in the guest memory. The option is useful
> for step debugging (as Linux is linked at 0x0); it also allows loading
> grub which is normally linked to run at 0x20000.
> 
> This uses the existing kernel address by default.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Applied to ppc-for-5.0, since I think it makes sense even without the
rest of the series.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit
  2020-02-12  5:43   ` David Gibson
@ 2020-02-13  3:09     ` Alexey Kardashevskiy
  2020-02-13  3:34       ` David Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Kardashevskiy @ 2020-02-13  3:09 UTC (permalink / raw)
  To: David Gibson; +Cc: Paolo Bonzini, qemu-ppc, qemu-devel, Peter Maydell



On 12/02/2020 16:43, David Gibson wrote:
> On Mon, Feb 03, 2020 at 02:29:38PM +1100, Alexey Kardashevskiy wrote:
>> At the moment we enforce 64bit mode on a CPU at reset. This makes no
>> difference for SLOF or Linux, as both set the desired mode straight away.
>> However, if we ever boot something other than these two, this might not
>> work; for example, GRUB expects the default MSR state and does not work
>> properly otherwise.
>>
>> This removes setting MSR_SF from the PPC CPU reset.
> 
> Hrm.  This is in the core cpu model so it doesn't just affect pseries,
> but powernv (and theoretically others) as well.  Generally the cpu
> model should have the bare metal behaviour, and we can override it in
> the pseries machine if necessary.
> 
> So for a bare metal POWER system, what mode do we start in?  I'm
> guessing it probably doesn't matter in practice, since the skiboot
> firmware also probably does a mode set on entry, but it'd be nice to
> get this right in theory.


Huh. "Figure 65.  MSR setting due to interrupt" of PowerISA 3.0 says
"The SF bit is set to 1" so after all the existing behavior is correct
and my patch is just wrong. Cool.



> 
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  target/ppc/translate_init.inc.c | 6 ------
>>  1 file changed, 6 deletions(-)
>>
>> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
>> index 53995f62eab2..f6a676cf55e8 100644
>> --- a/target/ppc/translate_init.inc.c
>> +++ b/target/ppc/translate_init.inc.c
>> @@ -10710,12 +10710,6 @@ static void ppc_cpu_reset(CPUState *s)
>>  #endif
>>  #endif
>>  
>> -#if defined(TARGET_PPC64)
>> -    if (env->mmu_model & POWERPC_MMU_64) {
>> -        msr |= (1ULL << MSR_SF);
>> -    }
>> -#endif
>> -
>>      hreg_store_msr(env, msr, 1);
>>  
>>  #if !defined(CONFIG_USER_ONLY)
> 

-- 
Alexey



* Re: [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit
  2020-02-13  3:09     ` Alexey Kardashevskiy
@ 2020-02-13  3:34       ` David Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: David Gibson @ 2020-02-13  3:34 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: Paolo Bonzini, qemu-ppc, qemu-devel, Peter Maydell


On Thu, Feb 13, 2020 at 02:09:17PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 12/02/2020 16:43, David Gibson wrote:
> > On Mon, Feb 03, 2020 at 02:29:38PM +1100, Alexey Kardashevskiy wrote:
> >> At the moment we enforce 64bit mode on a CPU at reset. This makes no
> >> difference for SLOF or Linux, as both set the desired mode straight away.
> >> However, if we ever boot something other than these two, this might not
> >> work; for example, GRUB expects the default MSR state and does not work
> >> properly otherwise.
> >>
> >> This removes setting MSR_SF from the PPC CPU reset.
> > 
> > Hrm.  This is in the core cpu model so it doesn't just affect pseries,
> > but powernv (and theoretically others) as well.  Generally the cpu
> > model should have the bare metal behaviour, and we can override it in
> > the pseries machine if necessary.
> > 
> > So for a bare metal POWER system, what mode do we start in?  I'm
> > guessing it probably doesn't matter in practice, since the skiboot
> > firmware also probably does a mode set on entry, but it'd be nice to
> > get this right in theory.
> 
> 
> Huh. "Figure 65.  MSR setting due to interrupt" of PowerISA 3.0 says
> "The SF bit is set to 1" so after all the existing behavior is correct
> and my patch is just wrong. Cool.

Well, I guess SF after interrupt isn't *necessarily* the same as SF at
reset, but it's a good place to start.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place
  2020-02-03  3:29 ` [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place Alexey Kardashevskiy
  2020-02-12 18:44   ` Fabiano Rosas
@ 2020-02-13  8:41   ` Greg Kurz
  1 sibling, 0 replies; 17+ messages in thread
From: Greg Kurz @ 2020-02-13  8:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Peter Maydell, David Gibson, qemu-ppc, qemu-devel, Paolo Bonzini

On Mon,  3 Feb 2020 14:29:39 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment "pseries" starts in SLOF, which only expects the FDT blob
> pointer in r3. As we are going to introduce OpenFirmware support in
> QEMU, we will be booting OF clients directly, and these expect a stack
> pointer in r1 and the OF entry point in r5. In addition, Linux
> looks at r3/r4 for the initramdisk location (vmlinux can find
> this from the device tree, but zImage from distro kernels cannot).
> 
> This extends spapr_cpu_set_entry_state() to take more registers. This
> should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>





end of thread, other threads:[~2020-02-13  8:42 UTC | newest]

Thread overview: 17+ messages
2020-02-03  3:29 [PATCH qemu v6 0/6] spapr: Kill SLOF Alexey Kardashevskiy
2020-02-03  3:29 ` [PATCH qemu v6 1/6] ppc: Start CPU in the default mode which is big-endian 32bit Alexey Kardashevskiy
2020-02-12  5:43   ` David Gibson
2020-02-13  3:09     ` Alexey Kardashevskiy
2020-02-13  3:34       ` David Gibson
2020-02-03  3:29 ` [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place Alexey Kardashevskiy
2020-02-12 18:44   ` Fabiano Rosas
2020-02-13  8:41   ` Greg Kurz
2020-02-03  3:29 ` [PATCH qemu v6 3/6] spapr/spapr: Make vty_getchars public Alexey Kardashevskiy
2020-02-03  3:29 ` [PATCH qemu v6 4/6] spapr/cas: Separate CAS handling from rebuilding the FDT Alexey Kardashevskiy
2020-02-03  3:29 ` [PATCH qemu v6 5/6] spapr: Allow changing offset for -kernel image Alexey Kardashevskiy
2020-02-12 18:54   ` Fabiano Rosas
2020-02-13  2:58   ` David Gibson
2020-02-03  3:29 ` [PATCH qemu v6 6/6] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2020-02-03 13:03   ` BALATON Zoltan
2020-02-05  4:18     ` Alexey Kardashevskiy
2020-02-05  4:59 ` [PATCH qemu v6] spapr: OF CI networking Alexey Kardashevskiy
