qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFCv2 0/6] s390x: initial support for virtio-mem
@ 2020-07-10 15:12 David Hildenbrand
  2020-07-10 15:12 ` [PATCH RFCv2 1/6] s390x: move setting of maximum ram size to machine init David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

This wires up the initial, basic version of virito-mem for s390x. General
information about virtio-mem can be found at [1] and in QEMU commit [2].
Patch #5 contains a short example for s390x.

virtio-mem for x86-64 Linux is part of v5.8-rc1. A branch with a s390x
prototype can be found at:
    git@github.com:davidhildenbrand/linux.git virtio-mem-s390x

Note that the kernel should either be compiled via
 CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, or "memhp_default_state=online"
 should be passed on the kernel cmdline.

This series can be found at:
    git@github.com:davidhildenbrand/qemu.git virtio-mem-s390x-rfcv2

Related to s390x, we'll have to tackle migration of storage keys and
storage attributes (especially, skipping unplugged parts). Not sure if
I am missing something else (any ideas?). For virtio-mem in general, there
are a couple of TODOs, e.g., documented in [1] and [2], both in QEMU and
Linux. However, the basics are around.

I only tested this with fairly small amount of RAM in a z/VM environemnt
and under TCG ...

[1] https://virtio-mem.gitlab.io/
[2] 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")

RFCv1 -> RFCv2:
- "s390x/diag: no need to check for PGM_PRIVILEGED in diag308"
-- Added
- "s390x/diag: implement diag260"
-- Implement according to doc (fix error cases)
-- Implement subcode 0xc.
-- Enable the new diag unconditionally
- "s390x: prepare device memory address space"
-- Expose maxram size now via diag260 (0xc), not via SCLP. Unfmodified
   guests can now boot without any issues. This needed kernel changes
   (updated the Linux kernel branch)
- "s390x: implement virtio-mem-ccw"
-- Force virtio revision 1

David Hildenbrand (6):
  s390x: move setting of maximum ram size to machine init
  s390x/diag: no need to check for PGM_PRIVILEGED in diag308
  s390x/diag: implement diag260
  s390x: prepare device memory address space
  s390x: implement virtio-mem-ccw
  s390x: initial support for virtio-mem

 hw/s390x/Kconfig                   |   1 +
 hw/s390x/Makefile.objs             |   1 +
 hw/s390x/s390-virtio-ccw.c         | 178 ++++++++++++++++++++++++++++-
 hw/s390x/sclp.c                    |  23 +---
 hw/s390x/virtio-ccw-mem.c          | 167 +++++++++++++++++++++++++++
 hw/s390x/virtio-ccw.h              |  13 +++
 hw/virtio/virtio-mem.c             |   2 +
 include/hw/s390x/s390-virtio-ccw.h |   3 +
 target/s390x/diag.c                |  62 +++++++++-
 target/s390x/internal.h            |   2 +
 target/s390x/kvm.c                 |  11 ++
 target/s390x/misc_helper.c         |   6 +
 target/s390x/translate.c           |   7 ++
 13 files changed, 446 insertions(+), 30 deletions(-)
 create mode 100644 hw/s390x/virtio-ccw-mem.c

-- 
2.26.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 1/6] s390x: move setting of maximum ram size to machine init
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
@ 2020-07-10 15:12 ` David Hildenbrand
  2020-07-10 15:12 ` [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308 David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

As we no longer fixup the maximum ram size in sclp code, let's move
setting the maximum ram size to ccw_init()->s390_memory_init(), which
now looks like a better fit.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 19 ++++++++++++++++---
 hw/s390x/sclp.c            | 20 +-------------------
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 023fd25f2b..2e6d292c23 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -160,13 +160,26 @@ static void virtio_ccw_register_hcalls(void)
                                    virtio_ccw_hcall_early_printk);
 }
 
-static void s390_memory_init(MemoryRegion *ram)
+static void s390_memory_init(MachineState *machine)
 {
     MemoryRegion *sysmem = get_system_memory();
     Error *local_err = NULL;
+    uint64_t hw_limit;
+    int ret;
+
+    /* We have to set the memory limit before adding any regions to sysmem. */
+    ret = s390_set_memory_limit(machine->maxram_size, &hw_limit);
+    if (ret == -E2BIG) {
+        error_report("host supports a maximum of %" PRIu64 " GB",
+                     hw_limit / GiB);
+        exit(EXIT_FAILURE);
+    } else if (ret) {
+        error_report("setting the guest size failed");
+        exit(EXIT_FAILURE);
+    }
 
     /* allocate RAM for core */
-    memory_region_add_subregion(sysmem, 0, ram);
+    memory_region_add_subregion(sysmem, 0, machine->ram);
 
     /*
      * Configure the maximum page size. As no memory devices were created
@@ -249,7 +262,7 @@ static void ccw_init(MachineState *machine)
 
     s390_sclp_init();
     /* init memory + setup max page size. Required for the CPU model */
-    s390_memory_init(machine->ram);
+    s390_memory_init(machine);
 
     /* init CPUs (incl. CPU model) early so s390_has_feature() works */
     s390_init_cpus(machine);
diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
index d39f6d7785..f59195e15a 100644
--- a/hw/s390x/sclp.c
+++ b/hw/s390x/sclp.c
@@ -327,32 +327,14 @@ void s390_sclp_init(void)
 
 static void sclp_realize(DeviceState *dev, Error **errp)
 {
-    MachineState *machine = MACHINE(qdev_get_machine());
     SCLPDevice *sclp = SCLP(dev);
-    Error *err = NULL;
-    uint64_t hw_limit;
-    int ret;
 
     /*
      * qdev_device_add searches the sysbus for TYPE_SCLP_EVENTS_BUS. As long
      * as we can't find a fitting bus via the qom tree, we have to add the
      * event facility to the sysbus, so e.g. a sclp console can be created.
      */
-    sysbus_realize(SYS_BUS_DEVICE(sclp->event_facility), &err);
-    if (err) {
-        goto out;
-    }
-
-    ret = s390_set_memory_limit(machine->maxram_size, &hw_limit);
-    if (ret == -E2BIG) {
-        error_setg(&err, "host supports a maximum of %" PRIu64 " GB",
-                   hw_limit / GiB);
-    } else if (ret) {
-        error_setg(&err, "setting the guest size failed");
-    }
-
-out:
-    error_propagate(errp, err);
+    sysbus_realize(SYS_BUS_DEVICE(sclp->event_facility), errp);
 }
 
 static void sclp_memory_init(SCLPDevice *sclp)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
  2020-07-10 15:12 ` [PATCH RFCv2 1/6] s390x: move setting of maximum ram size to machine init David Hildenbrand
@ 2020-07-10 15:12 ` David Hildenbrand
  2020-07-15  9:27   ` Janosch Frank
  2020-07-10 15:12 ` [PATCH RFCv2 3/6] s390x/diag: implement diag260 David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

Whenever we reach this point via KVM or TCG, we already verified that we
are running in the supervisor state.

TCG checks this via IF_PRIV, KVM checks this directly in the diag
instruction handler, before exiting to userspace.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/diag.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index 1a48429564..be70aecd72 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -80,11 +80,6 @@ void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
     uint64_t subcode = env->regs[r3];
     IplParameterBlock *iplb;
 
-    if (env->psw.mask & PSW_MASK_PSTATE) {
-        s390_program_interrupt(env, PGM_PRIVILEGED, ra);
-        return;
-    }
-
     if (subcode & ~0x0ffffULL) {
         s390_program_interrupt(env, PGM_SPECIFICATION, ra);
         return;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 3/6] s390x/diag: implement diag260
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
  2020-07-10 15:12 ` [PATCH RFCv2 1/6] s390x: move setting of maximum ram size to machine init David Hildenbrand
  2020-07-10 15:12 ` [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308 David Hildenbrand
@ 2020-07-10 15:12 ` David Hildenbrand
  2020-07-14 10:17   ` Claudio Imbrenda
  2020-07-10 15:12 ` [PATCH RFCv2 4/6] s390x: prepare device memory address space David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

Let's implement diag260 - "Access Certain Virtual Machine Information",
used under z/VM to expose the storage configuration (especially, layout of
storage extends and thereby holes). For now, the returned information is
completely redundant to the information exposed via SCLP.

We want to reuse diag260 in QEMU to implement memory devices - to have a
mechanism to indicate to the guest OS that the initial ram size and the
maximum possible physical address differ.

The Linux kernel supports diag260 (0x10) to query the available memory
since v4.20. Ancient Linux versions used diag 260 (0xc), but stopped doing
so a while ago.

Let's unconditionally implement the new diag, without any migration
checks (e.g., compatibility machine, CPU model). Although a guest OS
could observe this when migrating between QEMU evrsions, it's somewhat
unlikely to ever trigger due to the way diag260 is used within a guest
OS - called only once or twice during boot.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 target/s390x/diag.c        | 51 ++++++++++++++++++++++++++++++++++++++
 target/s390x/internal.h    |  2 ++
 target/s390x/kvm.c         | 11 ++++++++
 target/s390x/misc_helper.c |  6 +++++
 target/s390x/translate.c   |  7 ++++++
 5 files changed, 77 insertions(+)

diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index be70aecd72..5378fcf582 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -23,6 +23,57 @@
 #include "hw/s390x/pv.h"
 #include "kvm_s390x.h"
 
+void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    const ram_addr_t initial_ram_size = ms->ram_size;
+    const uint64_t subcode = env->regs[r3];
+
+    switch (subcode) {
+    case 0xc:
+        /* The first storage extent maps to our initial ram. */
+        env->regs[r1] = initial_ram_size - 1;
+        /* The highest addressable byte maps to the initial ram size for now. */
+        env->regs[r3] = initial_ram_size - 1;
+        break;
+    case 0x10: {
+        ram_addr_t addr, length;
+        uint64_t tmp;
+
+        if (r1 & 1) {
+            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+            return;
+        }
+
+        addr = env->regs[r1];
+        length = env->regs[r1 + 1];
+        if (!QEMU_IS_ALIGNED(addr, 16) || !QEMU_IS_ALIGNED(length, 16) ||
+            !length) {
+            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+            return;
+        }
+        if (!address_space_access_valid(&address_space_memory, addr, length,
+                                        true, MEMTXATTRS_UNSPECIFIED)) {
+            s390_program_interrupt(env, PGM_ADDRESSING, ra);
+            return;
+        }
+
+        /* Indicate our initial memory ([0 .. ram_size - 1]) */
+        tmp = cpu_to_be64(0);
+        cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
+        tmp = cpu_to_be64(initial_ram_size - 1);
+        cpu_physical_memory_write(addr + sizeof(tmp), &tmp, sizeof(tmp));
+
+        /* Exactly one entry was stored, it always fits into the area. */
+        env->regs[r3] = 1;
+        setcc(env_archcpu(env), 0);
+        break;
+    }
+    default:
+        s390_program_interrupt(env, PGM_SPECIFICATION, ra);
+    }
+}
+
 int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
 {
     uint64_t func = env->regs[r1];
diff --git a/target/s390x/internal.h b/target/s390x/internal.h
index b1e0ebf67f..a7a3df9a3b 100644
--- a/target/s390x/internal.h
+++ b/target/s390x/internal.h
@@ -372,6 +372,8 @@ int mmu_translate_real(CPUS390XState *env, target_ulong raddr, int rw,
 
 
 /* misc_helper.c */
+void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3,
+                     uintptr_t ra);
 int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3);
 void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3,
                      uintptr_t ra);
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index f2f75d2a57..d6de3ad86c 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -1565,6 +1565,14 @@ static int handle_hypercall(S390CPU *cpu, struct kvm_run *run)
     return ret;
 }
 
+static void kvm_handle_diag_260(S390CPU *cpu, struct kvm_run *run)
+{
+    const uint64_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
+    const uint64_t r3 = run->s390_sieic.ipa & 0x000f;
+
+    handle_diag_260(&cpu->env, r1, r3, 0);
+}
+
 static void kvm_handle_diag_288(S390CPU *cpu, struct kvm_run *run)
 {
     uint64_t r1, r3;
@@ -1614,6 +1622,9 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
      */
     func_code = decode_basedisp_rs(&cpu->env, ipb, NULL) & DIAG_KVM_CODE_MASK;
     switch (func_code) {
+    case 0x260:
+        kvm_handle_diag_260(cpu, run);
+        break;
     case DIAG_TIMEREVENT:
         kvm_handle_diag_288(cpu, run);
         break;
diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
index 58dbc023eb..d7274eb320 100644
--- a/target/s390x/misc_helper.c
+++ b/target/s390x/misc_helper.c
@@ -116,6 +116,12 @@ void HELPER(diag)(CPUS390XState *env, uint32_t r1, uint32_t r3, uint32_t num)
     uint64_t r;
 
     switch (num) {
+    case 0x260:
+        qemu_mutex_lock_iothread();
+        handle_diag_260(env, r1, r3, GETPC());
+        qemu_mutex_unlock_iothread();
+        r = 0;
+        break;
     case 0x500:
         /* KVM hypercall */
         qemu_mutex_lock_iothread();
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 4f6f1e31cd..e3358395f0 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -2397,6 +2397,13 @@ static DisasJumpType op_diag(DisasContext *s, DisasOps *o)
     TCGv_i32 r3 = tcg_const_i32(get_field(s, r3));
     TCGv_i32 func_code = tcg_const_i32(get_field(s, i2));
 
+    /*
+     * Diag 0x260 updates the CC - only for some subcodes. Calculate the
+     * current cc, such that the helper can simply overwrite it conditionally.
+     */
+    if (get_field(s, i2) == 0x260) {
+        gen_op_calc_cc(s);
+    }
     gen_helper_diag(cpu_env, r1, r3, func_code);
 
     tcg_temp_free_i32(func_code);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 4/6] s390x: prepare device memory address space
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
                   ` (2 preceding siblings ...)
  2020-07-10 15:12 ` [PATCH RFCv2 3/6] s390x/diag: implement diag260 David Hildenbrand
@ 2020-07-10 15:12 ` David Hildenbrand
  2020-07-10 15:14 ` [PATCH RFCv2 5/6] s390x: implement virtio-mem-ccw David Hildenbrand
  2020-07-10 15:14 ` [PATCH RFCv2 6/6] s390x: initial support for virtio-mem David Hildenbrand
  5 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

Let's allocate the device memory information and setup the device
memory address space. Expose the maximum ramsize via diag260. The RAM
size returned via SCLP is not modified. Guest OSs which support
virtio-mem are expected to consult diag260.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c         | 43 ++++++++++++++++++++++++++++++
 hw/s390x/sclp.c                    |  3 ++-
 include/hw/s390x/s390-virtio-ccw.h |  3 +++
 target/s390x/diag.c                | 10 +++++--
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 2e6d292c23..577590e623 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -160,6 +160,35 @@ static void virtio_ccw_register_hcalls(void)
                                    virtio_ccw_hcall_early_printk);
 }
 
+static void s390_device_memory_init(MachineState *machine)
+{
+    MemoryRegion *sysmem = get_system_memory();
+
+    machine->device_memory = g_malloc0(sizeof(*machine->device_memory));
+
+    /* initialize device memory address space */
+    if (machine->ram_size < machine->maxram_size) {
+        ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
+
+        if (QEMU_ALIGN_UP(machine->maxram_size, MiB) != machine->maxram_size) {
+            error_report("maximum memory size must by aligned to 1 MB");
+            exit(EXIT_FAILURE);
+        }
+
+        machine->device_memory->base = machine->ram_size;
+        if (machine->device_memory->base + device_mem_size < device_mem_size) {
+            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                         machine->maxram_size);
+            exit(EXIT_FAILURE);
+        }
+
+        memory_region_init(&machine->device_memory->mr, OBJECT(machine),
+                           "device-memory", device_mem_size);
+        memory_region_add_subregion(sysmem, machine->device_memory->base,
+                                    &machine->device_memory->mr);
+    }
+}
+
 static void s390_memory_init(MachineState *machine)
 {
     MemoryRegion *sysmem = get_system_memory();
@@ -194,6 +223,11 @@ static void s390_memory_init(MachineState *machine)
     s390_skeys_init();
     /* Initialize storage attributes device */
     s390_stattrib_init();
+
+    /* Support for memory devices is glued to compat machines. */
+    if (memory_devices_allowed()) {
+        s390_device_memory_init(machine);
+    }
 }
 
 static void s390_init_ipl_dev(const char *kernel_filename,
@@ -617,6 +651,7 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data)
     s390mc->cpu_model_allowed = true;
     s390mc->css_migration_enabled = true;
     s390mc->hpage_1m_allowed = true;
+    s390mc->memory_devices_allowed = true;
     mc->init = ccw_init;
     mc->reset = s390_machine_reset;
     mc->hot_add_cpu = s390_hot_add_cpu;
@@ -713,6 +748,11 @@ bool hpage_1m_allowed(void)
     return get_machine_class()->hpage_1m_allowed;
 }
 
+bool memory_devices_allowed(void)
+{
+    return get_machine_class()->memory_devices_allowed;
+}
+
 static char *machine_get_loadparm(Object *obj, Error **errp)
 {
     S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
@@ -831,8 +871,11 @@ static void ccw_machine_5_0_instance_options(MachineState *machine)
 
 static void ccw_machine_5_0_class_options(MachineClass *mc)
 {
+    S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
+
     ccw_machine_5_1_class_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_0, hw_compat_5_0_len);
+    s390mc->memory_devices_allowed = false;
 }
 DEFINE_CCW_MACHINE(5_0, "5.0", false);
 
diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
index f59195e15a..972995d928 100644
--- a/hw/s390x/sclp.c
+++ b/hw/s390x/sclp.c
@@ -22,6 +22,7 @@
 #include "hw/s390x/event-facility.h"
 #include "hw/s390x/s390-pci-bus.h"
 #include "hw/s390x/ipl.h"
+#include "hw/s390x/s390-virtio-ccw.h"
 
 static inline SCLPDevice *get_sclp_device(void)
 {
@@ -110,7 +111,7 @@ static void read_SCP_info(SCLPDevice *sclp, SCCB *sccb)
         read_info->rnsize2 = cpu_to_be32(rnsize);
     }
 
-    /* we don't support standby memory, maxram_size is never exposed */
+    /* We don't support standby memory, maxram_size is exposed via diag260. */
     rnmax = machine->ram_size >> sclp->increment_size;
     if (rnmax < 0x10000) {
         read_info->rnmax = cpu_to_be16(rnmax);
diff --git a/include/hw/s390x/s390-virtio-ccw.h b/include/hw/s390x/s390-virtio-ccw.h
index cd1dccc6e3..efbd9f1da1 100644
--- a/include/hw/s390x/s390-virtio-ccw.h
+++ b/include/hw/s390x/s390-virtio-ccw.h
@@ -41,6 +41,7 @@ typedef struct S390CcwMachineClass {
     bool cpu_model_allowed;
     bool css_migration_enabled;
     bool hpage_1m_allowed;
+    bool memory_devices_allowed;
 } S390CcwMachineClass;
 
 /* runtime-instrumentation allowed by the machine */
@@ -49,6 +50,8 @@ bool ri_allowed(void);
 bool cpu_model_allowed(void);
 /* 1M huge page mappings allowed by the machine */
 bool hpage_1m_allowed(void);
+/* Allow memory devices are allowed (memory device address space created). */
+bool memory_devices_allowed(void);
 
 /**
  * Returns true if (vmstate based) migration of the channel subsystem
diff --git a/target/s390x/diag.c b/target/s390x/diag.c
index 5378fcf582..9abd23b64a 100644
--- a/target/s390x/diag.c
+++ b/target/s390x/diag.c
@@ -27,14 +27,20 @@ void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
     const ram_addr_t initial_ram_size = ms->ram_size;
+    const ram_addr_t maximum_ram_size = ms->maxram_size;
     const uint64_t subcode = env->regs[r3];
 
     switch (subcode) {
     case 0xc:
         /* The first storage extent maps to our initial ram. */
         env->regs[r1] = initial_ram_size - 1;
-        /* The highest addressable byte maps to the initial ram size for now. */
-        env->regs[r3] = initial_ram_size - 1;
+        if (memory_devices_allowed()) {
+            /* The highest addressable byte maps to the maximum ram size. */
+            env->regs[r3] = maximum_ram_size - 1;
+        } else {
+            /* The highest addressable byte maps to the initial ram size. */
+            env->regs[r3] = initial_ram_size - 1;
+        }
         break;
     case 0x10: {
         ram_addr_t addr, length;
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 5/6] s390x: implement virtio-mem-ccw
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
                   ` (3 preceding siblings ...)
  2020-07-10 15:12 ` [PATCH RFCv2 4/6] s390x: prepare device memory address space David Hildenbrand
@ 2020-07-10 15:14 ` David Hildenbrand
  2020-07-10 15:14 ` [PATCH RFCv2 6/6] s390x: initial support for virtio-mem David Hildenbrand
  5 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

Add a proper CCW proxy device, similar to the PCI variant.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/virtio-ccw-mem.c | 167 ++++++++++++++++++++++++++++++++++++++
 hw/s390x/virtio-ccw.h     |  13 +++
 2 files changed, 180 insertions(+)
 create mode 100644 hw/s390x/virtio-ccw-mem.c

diff --git a/hw/s390x/virtio-ccw-mem.c b/hw/s390x/virtio-ccw-mem.c
new file mode 100644
index 0000000000..7b45c68ce4
--- /dev/null
+++ b/hw/s390x/virtio-ccw-mem.c
@@ -0,0 +1,167 @@
+/*
+ * Virtio MEM CCW device
+ *
+ * Copyright (C) 2020 Red Hat, Inc.
+ *
+ * Authors:
+ *  David Hildenbrand <david@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "virtio-ccw.h"
+#include "hw/mem/memory-device.h"
+#include "qapi/qapi-events-misc.h"
+
+static void virtio_ccw_mem_realize(VirtioCcwDevice *ccw_dev, Error **errp)
+{
+    VirtIOMEMCcw *ccw_mem = VIRTIO_MEM_CCW(ccw_dev);
+    DeviceState *vdev = DEVICE(&ccw_mem->vdev);
+
+    qdev_realize(vdev, BUS(&ccw_dev->bus), errp);
+}
+
+static void virtio_ccw_mem_set_addr(MemoryDeviceState *md, uint64_t addr,
+                                    Error **errp)
+{
+    object_property_set_uint(OBJECT(md), addr, VIRTIO_MEM_ADDR_PROP, errp);
+}
+
+static uint64_t virtio_ccw_mem_get_addr(const MemoryDeviceState *md)
+{
+    return object_property_get_uint(OBJECT(md), VIRTIO_MEM_ADDR_PROP,
+                                    &error_abort);
+}
+
+static MemoryRegion *virtio_ccw_mem_get_memory_region(MemoryDeviceState *md,
+                                                      Error **errp)
+{
+    VirtIOMEMCcw *ccw_mem = VIRTIO_MEM_CCW(md);
+    VirtIOMEM *vmem = VIRTIO_MEM(&ccw_mem->vdev);
+    VirtIOMEMClass *vmc = VIRTIO_MEM_GET_CLASS(vmem);
+
+    return vmc->get_memory_region(vmem, errp);
+}
+
+static uint64_t virtio_ccw_mem_get_plugged_size(const MemoryDeviceState *md,
+                                                Error **errp)
+{
+    return object_property_get_uint(OBJECT(md), VIRTIO_MEM_SIZE_PROP,
+                                    errp);
+}
+
+static void virtio_ccw_mem_fill_device_info(const MemoryDeviceState *md,
+                                            MemoryDeviceInfo *info)
+{
+    VirtioMEMDeviceInfo *vi = g_new0(VirtioMEMDeviceInfo, 1);
+    VirtIOMEMCcw *ccw_mem = VIRTIO_MEM_CCW(md);
+    VirtIOMEM *vmem = VIRTIO_MEM(&ccw_mem->vdev);
+    VirtIOMEMClass *vpc = VIRTIO_MEM_GET_CLASS(vmem);
+    DeviceState *dev = DEVICE(md);
+
+    if (dev->id) {
+        vi->has_id = true;
+        vi->id = g_strdup(dev->id);
+    }
+
+    /* let the real device handle everything else */
+    vpc->fill_device_info(vmem, vi);
+
+    info->u.virtio_mem.data = vi;
+    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_MEM;
+}
+
+static void virtio_ccw_mem_size_change_notify(Notifier *notifier, void *data)
+{
+    VirtIOMEMCcw *ccw_mem = container_of(notifier, VirtIOMEMCcw,
+                                         size_change_notifier);
+    DeviceState *dev = DEVICE(ccw_mem);
+    const uint64_t * const size_p = data;
+    const char *id = NULL;
+
+    if (dev->id) {
+        id = g_strdup(dev->id);
+    }
+
+    qapi_event_send_memory_device_size_change(!!id, id, *size_p);
+}
+
+static void virtio_ccw_mem_instance_init(Object *obj)
+{
+    VirtioCcwDevice *ccw_dev = VIRTIO_CCW_DEVICE(obj);
+    VirtIOMEMCcw *ccw_mem = VIRTIO_MEM_CCW(obj);
+    VirtIOMEMClass *vmc;
+    VirtIOMEM *vmem;
+
+    ccw_dev->force_revision_1 = true;
+    virtio_instance_init_common(obj, &ccw_mem->vdev, sizeof(ccw_mem->vdev),
+                                TYPE_VIRTIO_MEM);
+
+    ccw_mem->size_change_notifier.notify = virtio_ccw_mem_size_change_notify;
+    vmem = VIRTIO_MEM(&ccw_mem->vdev);
+    vmc = VIRTIO_MEM_GET_CLASS(vmem);
+    /*
+     * We never remove the notifier again, as we expect both devices to
+     * disappear at the same time.
+     */
+    vmc->add_size_change_notifier(vmem, &ccw_mem->size_change_notifier);
+
+    object_property_add_alias(obj, VIRTIO_MEM_BLOCK_SIZE_PROP,
+                              OBJECT(&ccw_mem->vdev),
+                              VIRTIO_MEM_BLOCK_SIZE_PROP);
+    object_property_add_alias(obj, VIRTIO_MEM_SIZE_PROP, OBJECT(&ccw_mem->vdev),
+                              VIRTIO_MEM_SIZE_PROP);
+    object_property_add_alias(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP,
+                              OBJECT(&ccw_mem->vdev),
+                              VIRTIO_MEM_REQUESTED_SIZE_PROP);
+}
+
+static Property virtio_ccw_mem_properties[] = {
+    DEFINE_PROP_BIT("ioeventfd", VirtioCcwDevice, flags,
+                    VIRTIO_CCW_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_UINT32("max_revision", VirtioCcwDevice, max_rev,
+                       VIRTIO_CCW_MAX_REV),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_ccw_mem_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtIOCCWDeviceClass *k = VIRTIO_CCW_DEVICE_CLASS(klass);
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
+
+    k->realize = virtio_ccw_mem_realize;
+    device_class_set_props(dc, virtio_ccw_mem_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+
+    mdc->get_addr = virtio_ccw_mem_get_addr;
+    mdc->set_addr = virtio_ccw_mem_set_addr;
+    mdc->get_plugged_size = virtio_ccw_mem_get_plugged_size;
+    mdc->get_memory_region = virtio_ccw_mem_get_memory_region;
+    mdc->fill_device_info = virtio_ccw_mem_fill_device_info;
+}
+
+static const TypeInfo virtio_ccw_mem = {
+    .name = TYPE_VIRTIO_MEM_CCW,
+    .parent = TYPE_VIRTIO_CCW_DEVICE,
+    .instance_size = sizeof(VirtIOMEMCcw),
+    .instance_init = virtio_ccw_mem_instance_init,
+    .class_init = virtio_ccw_mem_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_MEMORY_DEVICE },
+        { }
+    },
+};
+
+static void virtio_ccw_mem_register(void)
+{
+    type_register_static(&virtio_ccw_mem);
+}
+
+type_init(virtio_ccw_mem_register)
diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index c0e3355248..77aa87c41f 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -29,6 +29,7 @@
 #endif /* CONFIG_VHOST_VSOCK */
 #include "hw/virtio/virtio-gpu.h"
 #include "hw/virtio/virtio-input.h"
+#include "hw/virtio/virtio-mem.h"
 
 #include "hw/s390x/s390_flic.h"
 #include "hw/s390x/css.h"
@@ -256,4 +257,16 @@ typedef struct VirtIOInputHIDCcw {
     VirtIOInputHID vdev;
 } VirtIOInputHIDCcw;
 
+/* virtio-mem-ccw */
+
+#define TYPE_VIRTIO_MEM_CCW "virtio-mem-ccw"
+#define VIRTIO_MEM_CCW(obj) \
+        OBJECT_CHECK(VirtIOMEMCcw, (obj), TYPE_VIRTIO_MEM_CCW)
+
+typedef struct VirtIOMEMCcw {
+    VirtioCcwDevice parent_obj;
+    VirtIOMEM vdev;
+    Notifier size_change_notifier;
+} VirtIOMEMCcw;
+
 #endif
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFCv2 6/6] s390x: initial support for virtio-mem
  2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
                   ` (4 preceding siblings ...)
  2020-07-10 15:14 ` [PATCH RFCv2 5/6] s390x: implement virtio-mem-ccw David Hildenbrand
@ 2020-07-10 15:14 ` David Hildenbrand
  5 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-10 15:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, David Hildenbrand, Halil Pasic,
	Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson

Let's wire up the initial, basic virtio-mem implementation in QEMU. It will
have to see some important extensions (esp., resizeable allocations)
before it can be considered production ready. Also, the focus on the Linux
driver side is on memory hotplug, there are a lot of things optimize in
the future to improve memory unplug capabilities. However, the basics
are in place.

Block migration for now, as we'll have to take proper care of storage
keys and storage attributes. Also, make sure to not hotplug huge pages
to a setup without huge pages.

With a Linux guest that supports virtio-mem (and has
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE set for now), a basic example.

1. Start a VM with 2G initial memory and a virtio-mem device with a maximum
   capacity of 18GB (and an initial size of 300M):
    sudo qemu-system-s390x \
        --enable-kvm \
        -m 2G,maxmem=20G \
        -smp 4 \
        -nographic \
        -chardev socket,id=monitor,path=/var/tmp/monitor,server,nowait \
        -mon chardev=monitor,mode=readline \
        -net nic -net user \
        -hda s390x.cow2 \
        -object memory-backend-ram,id=mem0,size=18G \
        -device virtio-mem-ccw,id=vm0,memdev=mem0,requested-size=300M

2. Query the current size of virtio-mem device:
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vm0"
      memaddr: 0x80000000
      node: 0
      requested-size: 314572800
      size: 314572800
      max-size: 19327352832
      block-size: 1048576
      memdev: /objects/mem0

3. Request to grow it to 8GB:
    (qemu) qom-set vm0 requested-size 8G
    (qemu) info memory-devices
    Memory device [virtio-mem]: "vm0"
      memaddr: 0x80000000
      node: 0
      requested-size: 8589934592
      size: 8589934592
      max-size: 19327352832
      block-size: 1048576
      memdev: /objects/mem0

4. Request to shrink it to 800M (might take a while, might not fully
   succeed, and might not be able to remove memory blocks in Linux):
  (qemu) qom-set vm0 requested-size 800M
  (qemu) info memory-devices
  Memory device [virtio-mem]: "vm0"
    memaddr: 0x80000000
    node: 0
    requested-size: 838860800
    size: 838860800
    max-size: 19327352832
    block-size: 1048576
    memdev: /objects/mem0

Note: Due to lack of resizeable allocations, we will go ahead and
reserve a 18GB vmalloc area + size the QEMU RAM slot + KVM mamory slot
18GB. echo 1 > /proc/sys/vm/overcommit_memory might be required for
now. In the future, this area will instead grow on actual demand and shrink
when possible.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/Kconfig           |   1 +
 hw/s390x/Makefile.objs     |   1 +
 hw/s390x/s390-virtio-ccw.c | 116 ++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-mem.c     |   2 +
 4 files changed, 118 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
index 5e7d8a2bae..b8619c1adc 100644
--- a/hw/s390x/Kconfig
+++ b/hw/s390x/Kconfig
@@ -10,3 +10,4 @@ config S390_CCW_VIRTIO
     select SCLPCONSOLE
     select VIRTIO_CCW
     select MSI_NONBROKEN
+    select VIRTIO_MEM_SUPPORTED
diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index a46a1c7894..924775d6f0 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -20,6 +20,7 @@ obj-$(CONFIG_VIRTIO_NET) += virtio-ccw-net.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio-ccw-blk.o
 obj-$(call land,$(CONFIG_VIRTIO_9P),$(CONFIG_VIRTFS)) += virtio-ccw-9p.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock-ccw.o
+obj-$(CONFIG_VIRTIO_MEM) += virtio-ccw-mem.o
 endif
 obj-y += css-bridge.o
 obj-y += ccw-device.o
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 577590e623..e714035077 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -45,6 +45,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/s390x/pv.h"
 #include "migration/blocker.h"
+#include "hw/mem/memory-device.h"
 
 static Error *pv_mig_blocker;
 
@@ -542,11 +543,119 @@ static void s390_machine_reset(MachineState *machine)
     s390_ipl_clear_reset_request();
 }
 
+static void s390_virtio_md_pre_plug(HotplugHandler *hotplug_dev,
+                                    DeviceState *dev, Error **errp)
+{
+    HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
+    MemoryDeviceState *md = MEMORY_DEVICE(dev);
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+    Error *local_err = NULL;
+
+    if (!hotplug_dev2 && dev->hotplugged) {
+        /*
+         * Without a bus hotplug handler, we cannot control the plug/unplug
+         * order. We should never reach this point when hotplugging, however,
+         * better add a safety net.
+         */
+        error_setg(errp, "hotplug of virtio based memory devices not supported"
+                         " on this bus.");
+        return;
+    }
+
+    /*
+     * KVM does not support device memory with a bigger page size than initial
+     * memory. The new memory backend is not mapped yet, so
+     * qemu_maxrampagesize() won't consider it.
+     */
+    if (kvm_enabled()) {
+        MemoryRegion *mr = mdc->get_memory_region(md, &local_err);
+
+        if (local_err) {
+            goto out;
+        }
+        if (qemu_ram_pagesize(mr->ram_block) > qemu_maxrampagesize()) {
+            error_setg(&local_err, "Device memory has a bigger page size than"
+                       " initial memory");
+            goto out;
+        }
+    }
+
+    /*
+     * First, see if we can plug this memory device at all. If that
+     * succeeds, branch of to the actual hotplug handler.
+     */
+    memory_device_pre_plug(md, MACHINE(hotplug_dev), NULL, &local_err);
+    if (!local_err && hotplug_dev2) {
+        hotplug_handler_pre_plug(hotplug_dev2, dev, &local_err);
+    }
+out:
+    error_propagate(errp, local_err);
+}
+
+static void s390_virtio_md_plug(HotplugHandler *hotplug_dev,
+                                DeviceState *dev, Error **errp)
+{
+    HotplugHandler *hotplug_dev2 = qdev_get_bus_hotplug_handler(dev);
+    static Error *migration_blocker;
+    bool add_blocker = !migration_blocker;
+    Error *local_err = NULL;
+
+    /*
+     * Until we support migration of storage keys and storage attributes
+     * for anything that's not initial memory, let's block migration.
+     */
+    if (add_blocker) {
+        error_setg(&migration_blocker, "storage keys/attributes not yet"
+                   " migrated for memory devices");
+        migrate_add_blocker(migration_blocker, &local_err);
+        if (local_err) {
+            error_free_or_abort(&migration_blocker);
+            goto out;
+        }
+    }
+
+    /*
+     * Plug the memory device first and then branch off to the actual
+     * hotplug handler. If that one fails, we can easily undo the memory
+     * device bits.
+     */
+    memory_device_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
+    if (hotplug_dev2) {
+        hotplug_handler_plug(hotplug_dev2, dev, &local_err);
+        if (local_err) {
+            memory_device_unplug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev));
+            if (add_blocker) {
+                migrate_del_blocker(migration_blocker);
+                error_free_or_abort(&migration_blocker);
+            }
+        }
+    }
+out:
+    error_propagate(errp, local_err);
+}
+
+static void s390_virtio_md_unplug_request(HotplugHandler *hotplug_dev,
+                                          DeviceState *dev, Error **errp)
+{
+    /* We don't support hot unplug of virtio based memory devices */
+    error_setg(errp, "virtio based memory devices cannot be unplugged.");
+}
+
+static void s390_machine_device_pre_plug(HotplugHandler *hotplug_dev,
+                                         DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_CCW)) {
+        s390_virtio_md_pre_plug(hotplug_dev, dev, errp);
+    }
+}
+
 static void s390_machine_device_plug(HotplugHandler *hotplug_dev,
                                      DeviceState *dev, Error **errp)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         s390_cpu_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_CCW)) {
+        s390_virtio_md_plug(hotplug_dev, dev, errp);
     }
 }
 
@@ -555,7 +664,8 @@ static void s390_machine_device_unplug_request(HotplugHandler *hotplug_dev,
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         error_setg(errp, "CPU hot unplug not supported on this machine");
-        return;
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_CCW)) {
+        s390_virtio_md_unplug_request(hotplug_dev, dev, errp);
     }
 }
 
@@ -596,7 +706,8 @@ static const CPUArchIdList *s390_possible_cpu_arch_ids(MachineState *ms)
 static HotplugHandler *s390_get_hotplug_handler(MachineState *machine,
                                                 DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_CCW)) {
         return HOTPLUG_HANDLER(machine);
     }
     return NULL;
@@ -668,6 +779,7 @@ static void ccw_machine_class_init(ObjectClass *oc, void *data)
     mc->possible_cpu_arch_ids = s390_possible_cpu_arch_ids;
     /* it is overridden with 'host' cpu *in kvm_arch_init* */
     mc->default_cpu_type = S390_CPU_TYPE_NAME("qemu");
+    hc->pre_plug = s390_machine_device_pre_plug;
     hc->plug = s390_machine_device_plug;
     hc->unplug_request = s390_machine_device_unplug_request;
     nc->nmi_monitor_handler = s390_nmi;
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 65850530e7..e1b3275089 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -53,6 +53,8 @@
  */
 #if defined(TARGET_X86_64) || defined(TARGET_I386)
 #define VIRTIO_MEM_USABLE_EXTENT (2 * (128 * MiB))
+#elif defined(TARGET_S390X)
+#define VIRTIO_MEM_USABLE_EXTENT (2 * (256 * MiB))
 #else
 #error VIRTIO_MEM_USABLE_EXTENT not defined
 #endif
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFCv2 3/6] s390x/diag: implement diag260
  2020-07-10 15:12 ` [PATCH RFCv2 3/6] s390x/diag: implement diag260 David Hildenbrand
@ 2020-07-14 10:17   ` Claudio Imbrenda
  2020-07-15  9:19     ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Claudio Imbrenda @ 2020-07-14 10:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, qemu-devel, Halil Pasic, Christian Borntraeger,
	qemu-s390x, Richard Henderson

On Fri, 10 Jul 2020 17:12:36 +0200
David Hildenbrand <david@redhat.com> wrote:

> Let's implement diag260 - "Access Certain Virtual Machine
> Information", used under z/VM to expose the storage configuration
> (especially, layout of storage extends and thereby holes). For now,
> the returned information is completely redundant to the information
> exposed via SCLP.
> 
> We want to reuse diag260 in QEMU to implement memory devices - to
> have a mechanism to indicate to the guest OS that the initial ram
> size and the maximum possible physical address differ.
> 
> The Linux kernel supports diag260 (0x10) to query the available memory
> since v4.20. Ancient Linux versions used diag 260 (0xc), but stopped
> doing so a while ago.
> 
> Let's unconditionally implement the new diag, without any migration
> checks (e.g., compatibility machine, CPU model). Although a guest OS
> could observe this when migrating between QEMU evrsions, it's somewhat
> unlikely to ever trigger due to the way diag260 is used within a guest
> OS - called only once or twice during boot.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  target/s390x/diag.c        | 51
> ++++++++++++++++++++++++++++++++++++++ target/s390x/internal.h    |
> 2 ++ target/s390x/kvm.c         | 11 ++++++++
>  target/s390x/misc_helper.c |  6 +++++
>  target/s390x/translate.c   |  7 ++++++
>  5 files changed, 77 insertions(+)
> 
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index be70aecd72..5378fcf582 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -23,6 +23,57 @@
>  #include "hw/s390x/pv.h"
>  #include "kvm_s390x.h"
>  
> +void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3,
> uintptr_t ra) +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    const ram_addr_t initial_ram_size = ms->ram_size;
> +    const uint64_t subcode = env->regs[r3];
> +
> +    switch (subcode) {
> +    case 0xc:
> +        /* The first storage extent maps to our initial ram. */
> +        env->regs[r1] = initial_ram_size - 1;
> +        /* The highest addressable byte maps to the initial ram size
> for now. */
> +        env->regs[r3] = initial_ram_size - 1;
> +        break;
> +    case 0x10: {
> +        ram_addr_t addr, length;
> +        uint64_t tmp;
> +
> +        if (r1 & 1) {
> +            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> +            return;
> +        }
> +
> +        addr = env->regs[r1];
> +        length = env->regs[r1 + 1];
> +        if (!QEMU_IS_ALIGNED(addr, 16) || !QEMU_IS_ALIGNED(length,
> 16) ||
> +            !length) {
> +            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> +            return;
> +        }
> +        if (!address_space_access_valid(&address_space_memory, addr,
> length,
> +                                        true,
> MEMTXATTRS_UNSPECIFIED)) {
> +            s390_program_interrupt(env, PGM_ADDRESSING, ra);
> +            return;
> +        }
> +
> +        /* Indicate our initial memory ([0 .. ram_size - 1]) */
> +        tmp = cpu_to_be64(0);
> +        cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
> +        tmp = cpu_to_be64(initial_ram_size - 1);
> +        cpu_physical_memory_write(addr + sizeof(tmp), &tmp,
> sizeof(tmp)); +
> +        /* Exactly one entry was stored, it always fits into the
> area. */

maybe I missed something, but I have the impression that your
implementation of DIAG 260 always only returns the first extent?

shouldn't it return all the hotplugged areas once hotplugging is
enabled?

> +        env->regs[r3] = 1;
> +        setcc(env_archcpu(env), 0);
> +        break;
> +    }
> +    default:
> +        s390_program_interrupt(env, PGM_SPECIFICATION, ra);
> +    }
> +}
> +
>  int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
>  {
>      uint64_t func = env->regs[r1];
> diff --git a/target/s390x/internal.h b/target/s390x/internal.h
> index b1e0ebf67f..a7a3df9a3b 100644
> --- a/target/s390x/internal.h
> +++ b/target/s390x/internal.h
> @@ -372,6 +372,8 @@ int mmu_translate_real(CPUS390XState *env,
> target_ulong raddr, int rw, 
>  
>  /* misc_helper.c */
> +void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3,
> +                     uintptr_t ra);
>  int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3);
>  void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3,
>                       uintptr_t ra);
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index f2f75d2a57..d6de3ad86c 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -1565,6 +1565,14 @@ static int handle_hypercall(S390CPU *cpu,
> struct kvm_run *run) return ret;
>  }
>  
> +static void kvm_handle_diag_260(S390CPU *cpu, struct kvm_run *run)
> +{
> +    const uint64_t r1 = (run->s390_sieic.ipa & 0x00f0) >> 4;
> +    const uint64_t r3 = run->s390_sieic.ipa & 0x000f;
> +
> +    handle_diag_260(&cpu->env, r1, r3, 0);
> +}
> +
>  static void kvm_handle_diag_288(S390CPU *cpu, struct kvm_run *run)
>  {
>      uint64_t r1, r3;
> @@ -1614,6 +1622,9 @@ static int handle_diag(S390CPU *cpu, struct
> kvm_run *run, uint32_t ipb) */
>      func_code = decode_basedisp_rs(&cpu->env, ipb, NULL) &
> DIAG_KVM_CODE_MASK; switch (func_code) {
> +    case 0x260:
> +        kvm_handle_diag_260(cpu, run);
> +        break;
>      case DIAG_TIMEREVENT:
>          kvm_handle_diag_288(cpu, run);
>          break;
> diff --git a/target/s390x/misc_helper.c b/target/s390x/misc_helper.c
> index 58dbc023eb..d7274eb320 100644
> --- a/target/s390x/misc_helper.c
> +++ b/target/s390x/misc_helper.c
> @@ -116,6 +116,12 @@ void HELPER(diag)(CPUS390XState *env, uint32_t
> r1, uint32_t r3, uint32_t num) uint64_t r;
>  
>      switch (num) {
> +    case 0x260:
> +        qemu_mutex_lock_iothread();
> +        handle_diag_260(env, r1, r3, GETPC());
> +        qemu_mutex_unlock_iothread();
> +        r = 0;
> +        break;
>      case 0x500:
>          /* KVM hypercall */
>          qemu_mutex_lock_iothread();
> diff --git a/target/s390x/translate.c b/target/s390x/translate.c
> index 4f6f1e31cd..e3358395f0 100644
> --- a/target/s390x/translate.c
> +++ b/target/s390x/translate.c
> @@ -2397,6 +2397,13 @@ static DisasJumpType op_diag(DisasContext *s,
> DisasOps *o) TCGv_i32 r3 = tcg_const_i32(get_field(s, r3));
>      TCGv_i32 func_code = tcg_const_i32(get_field(s, i2));
>  
> +    /*
> +     * Diag 0x260 updates the CC - only for some subcodes. Calculate
> the
> +     * current cc, such that the helper can simply overwrite it
> conditionally.
> +     */
> +    if (get_field(s, i2) == 0x260) {
> +        gen_op_calc_cc(s);
> +    }
>      gen_helper_diag(cpu_env, r1, r3, func_code);
>  
>      tcg_temp_free_i32(func_code);



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFCv2 3/6] s390x/diag: implement diag260
  2020-07-14 10:17   ` Claudio Imbrenda
@ 2020-07-15  9:19     ` David Hildenbrand
  2020-07-15 10:53       ` Heiko Carstens
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2020-07-15  9:19 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Heiko Carstens,
	Cornelia Huck, qemu-devel, Halil Pasic, Christian Borntraeger,
	qemu-s390x, Richard Henderson

On 14.07.20 12:17, Claudio Imbrenda wrote:
> On Fri, 10 Jul 2020 17:12:36 +0200
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Let's implement diag260 - "Access Certain Virtual Machine
>> Information", used under z/VM to expose the storage configuration
>> (especially, layout of storage extends and thereby holes). For now,
>> the returned information is completely redundant to the information
>> exposed via SCLP.
>>
>> We want to reuse diag260 in QEMU to implement memory devices - to
>> have a mechanism to indicate to the guest OS that the initial ram
>> size and the maximum possible physical address differ.
>>
>> The Linux kernel supports diag260 (0x10) to query the available memory
>> since v4.20. Ancient Linux versions used diag 260 (0xc), but stopped
>> doing so a while ago.
>>
>> Let's unconditionally implement the new diag, without any migration
>> checks (e.g., compatibility machine, CPU model). Although a guest OS
>> could observe this when migrating between QEMU evrsions, it's somewhat
>> unlikely to ever trigger due to the way diag260 is used within a guest
>> OS - called only once or twice during boot.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  target/s390x/diag.c        | 51
>> ++++++++++++++++++++++++++++++++++++++ target/s390x/internal.h    |
>> 2 ++ target/s390x/kvm.c         | 11 ++++++++
>>  target/s390x/misc_helper.c |  6 +++++
>>  target/s390x/translate.c   |  7 ++++++
>>  5 files changed, 77 insertions(+)
>>
>> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
>> index be70aecd72..5378fcf582 100644
>> --- a/target/s390x/diag.c
>> +++ b/target/s390x/diag.c
>> @@ -23,6 +23,57 @@
>>  #include "hw/s390x/pv.h"
>>  #include "kvm_s390x.h"
>>  
>> +void handle_diag_260(CPUS390XState *env, uint64_t r1, uint64_t r3,
>> uintptr_t ra) +{
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    const ram_addr_t initial_ram_size = ms->ram_size;
>> +    const uint64_t subcode = env->regs[r3];
>> +
>> +    switch (subcode) {
>> +    case 0xc:
>> +        /* The first storage extent maps to our initial ram. */
>> +        env->regs[r1] = initial_ram_size - 1;
>> +        /* The highest addressable byte maps to the initial ram size
>> for now. */
>> +        env->regs[r3] = initial_ram_size - 1;
>> +        break;
>> +    case 0x10: {
>> +        ram_addr_t addr, length;
>> +        uint64_t tmp;
>> +
>> +        if (r1 & 1) {
>> +            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>> +            return;
>> +        }
>> +
>> +        addr = env->regs[r1];
>> +        length = env->regs[r1 + 1];
>> +        if (!QEMU_IS_ALIGNED(addr, 16) || !QEMU_IS_ALIGNED(length,
>> 16) ||
>> +            !length) {
>> +            s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>> +            return;
>> +        }
>> +        if (!address_space_access_valid(&address_space_memory, addr,
>> length,
>> +                                        true,
>> MEMTXATTRS_UNSPECIFIED)) {
>> +            s390_program_interrupt(env, PGM_ADDRESSING, ra);
>> +            return;
>> +        }
>> +
>> +        /* Indicate our initial memory ([0 .. ram_size - 1]) */
>> +        tmp = cpu_to_be64(0);
>> +        cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
>> +        tmp = cpu_to_be64(initial_ram_size - 1);
>> +        cpu_physical_memory_write(addr + sizeof(tmp), &tmp,
>> sizeof(tmp)); +
>> +        /* Exactly one entry was stored, it always fits into the
>> area. */
> 
> maybe I missed something, but I have the impression that your
> implementation of DIAG 260 always only returns the first extent?

We only indicate boot memory (e.g., -m 2G), never any memory part of the
device memory address space, because such memory has different semantics.

> 
> shouldn't it return all the hotplugged areas once hotplugging is
> enabled?

No, that would be dangerous and wrong. Memory ranges part of memory
devices never must be indicated as part of hw/firmware interfaces to
indicate valid boot memory. Memory provided via memory devices
(virtio-mem, virtio-pmem, ...) has different semantics than ordinary
hotplugged memory, and unmodified OSs (esp., older Linux versions)
should not silently try to make use of any such memory. It's not just
some hotplugged memory a guest OS should detect+use during boot as
system ram. Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308
  2020-07-10 15:12 ` [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308 David Hildenbrand
@ 2020-07-15  9:27   ` Janosch Frank
  0 siblings, 0 replies; 12+ messages in thread
From: Janosch Frank @ 2020-07-15  9:27 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: Thomas Huth, Michael S . Tsirkin, Heiko Carstens, Cornelia Huck,
	Halil Pasic, Christian Borntraeger, qemu-s390x, Claudio Imbrenda,
	Richard Henderson


[-- Attachment #1.1: Type: text/plain, Size: 1127 bytes --]

On 7/10/20 5:12 PM, David Hildenbrand wrote:
> Whenever we reach this point via KVM or TCG, we already verified that we
> are running in the supervisor state.
> 
> TCG checks this via IF_PRIV, KVM checks this directly in the diag
> instruction handler, before exiting to userspace.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

I'm trusting you on the tcg part.

Acked-by: Janosch Frank <frankja@linux.ibm.com>

> ---
>  target/s390x/diag.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index 1a48429564..be70aecd72 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -80,11 +80,6 @@ void handle_diag_308(CPUS390XState *env, uint64_t r1, uint64_t r3, uintptr_t ra)
>      uint64_t subcode = env->regs[r3];
>      IplParameterBlock *iplb;
>  
> -    if (env->psw.mask & PSW_MASK_PSTATE) {
> -        s390_program_interrupt(env, PGM_PRIVILEGED, ra);
> -        return;
> -    }
> -
>      if (subcode & ~0x0ffffULL) {
>          s390_program_interrupt(env, PGM_SPECIFICATION, ra);
>          return;
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFCv2 3/6] s390x/diag: implement diag260
  2020-07-15  9:19     ` David Hildenbrand
@ 2020-07-15 10:53       ` Heiko Carstens
  2020-07-15 11:13         ` David Hildenbrand
  0 siblings, 1 reply; 12+ messages in thread
From: Heiko Carstens @ 2020-07-15 10:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Cornelia Huck,
	qemu-devel, Halil Pasic, Christian Borntraeger, qemu-s390x,
	Claudio Imbrenda, Richard Henderson

On Wed, Jul 15, 2020 at 11:19:03AM +0200, David Hildenbrand wrote:
> On 14.07.20 12:17, Claudio Imbrenda wrote:
> > shouldn't it return all the hotplugged areas once hotplugging is
> > enabled?
> 
> No, that would be dangerous and wrong. Memory ranges part of memory
> devices never must be indicated as part of hw/firmware interfaces to
> indicate valid boot memory. Memory provided via memory devices
> (virtio-mem, virtio-pmem, ...) has different semantics than ordinary
> hotplugged memory, and unmodified OSs (esp., older Linux versions)
> should not silently try to make use of any such memory. It's not just
> some hotplugged memory a guest OS should detect+use during boot as
> system ram. Thanks!

How is kdump supposed to work, if there is no mechanism to figure out
which memory ranges have been added dynamically to the system?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFCv2 3/6] s390x/diag: implement diag260
  2020-07-15 10:53       ` Heiko Carstens
@ 2020-07-15 11:13         ` David Hildenbrand
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2020-07-15 11:13 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Thomas Huth, Janosch Frank, Michael S . Tsirkin, Cornelia Huck,
	qemu-devel, Halil Pasic, Christian Borntraeger, qemu-s390x,
	Claudio Imbrenda, Richard Henderson

On 15.07.20 12:53, Heiko Carstens wrote:
> On Wed, Jul 15, 2020 at 11:19:03AM +0200, David Hildenbrand wrote:
>> On 14.07.20 12:17, Claudio Imbrenda wrote:
>>> shouldn't it return all the hotplugged areas once hotplugging is
>>> enabled?
>>
>> No, that would be dangerous and wrong. Memory ranges part of memory
>> devices never must be indicated as part of hw/firmware interfaces to
>> indicate valid boot memory. Memory provided via memory devices
>> (virtio-mem, virtio-pmem, ...) has different semantics than ordinary
>> hotplugged memory, and unmodified OSs (esp., older Linux versions)
>> should not silently try to make use of any such memory. It's not just
>> some hotplugged memory a guest OS should detect+use during boot as
>> system ram. Thanks!
> 
> How is kdump supposed to work, if there is no mechanism to figure out
> which memory ranges have been added dynamically to the system?

Like other archs (esp x86-64), we would want to revive letting user
space (kexec-tools) prepare the dump header, specifying the memory
ranges. Would be necessary under KVM only, if virtio-mem is detected.

(Note: I am not sure if we would currently dump standby memory that has
been onlined from a kdump kernel.)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-15 11:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-10 15:12 [PATCH RFCv2 0/6] s390x: initial support for virtio-mem David Hildenbrand
2020-07-10 15:12 ` [PATCH RFCv2 1/6] s390x: move setting of maximum ram size to machine init David Hildenbrand
2020-07-10 15:12 ` [PATCH RFCv2 2/6] s390x/diag: no need to check for PGM_PRIVILEGED in diag308 David Hildenbrand
2020-07-15  9:27   ` Janosch Frank
2020-07-10 15:12 ` [PATCH RFCv2 3/6] s390x/diag: implement diag260 David Hildenbrand
2020-07-14 10:17   ` Claudio Imbrenda
2020-07-15  9:19     ` David Hildenbrand
2020-07-15 10:53       ` Heiko Carstens
2020-07-15 11:13         ` David Hildenbrand
2020-07-10 15:12 ` [PATCH RFCv2 4/6] s390x: prepare device memory address space David Hildenbrand
2020-07-10 15:14 ` [PATCH RFCv2 5/6] s390x: implement virtio-mem-ccw David Hildenbrand
2020-07-10 15:14 ` [PATCH RFCv2 6/6] s390x: initial support for virtio-mem David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).