* [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
@ 2019-07-29 14:52 Igor Mammedov
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable Igor Mammedov
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Igor Mammedov @ 2019-07-29 14:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert


While looking into unifying guest RAM allocation to use hostmem backends
for initial RAM (especially when -mem-path is used) and retiring the
memory_region_allocate_system_memory() API, leaving only a single hostmem
backend, I was inspecting how it is currently used by boards, and it turns
out several boards abuse it by calling the function several times (despite
the documented contract forbidding it).

s390 is one such board, where a KVM limitation on memslot size got propagated
into the board design and memory_region_allocate_system_memory() was abused
to satisfy the KVM requirement for a max RAM chunk size where a memory region
alias would suffice.

Unfortunately, memory_region_allocate_system_memory() usage created a migration
dependency whereby guest RAM is transferred in the migration stream as several
RAMBlocks if it's more than KVM_SLOT_MAX_BYTES.

In order to replace these several RAM chunks with a single memdev, keep it
working with the KVM memslot size limit, and stay migration compatible, the
following was done:
   * [2/2] use memory region aliases to partition hostmem backend RAM into
           KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side working
   * [1/2] hacked memory region aliases (to ram memory regions only) to have
           their own RAMBlocks pointing to RAM chunks owned by the aliased
           memory region. While it's admittedly a hack, it's relatively
           simple and allows board code to reshape the migration stream as
           necessary

           I haven't tried to use migratable aliases on x86 machines, but
           with them it could be possible to drop the legacy RAM allocation
           and the compat knob (cd5ff8333a), dropping '-numa node,mem'
           completely even for old machines.

PS:
   Tested with ping-pong cross-version migration on an s390 machine
   (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a large
    enough host)
     

Igor Mammedov (2):
  memory: make MemoryRegion alias migratable
  s390: do not call memory_region_allocate_system_memory() multiple
    times

 exec.c                     |  7 ++++---
 hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
 memory.c                   |  5 +++++
 3 files changed, 24 insertions(+), 8 deletions(-)

-- 
2.18.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
@ 2019-07-29 14:52 ` Igor Mammedov
  2019-07-29 17:53   ` Dr. David Alan Gilbert
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Igor Mammedov @ 2019-07-29 14:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

use qemu_ram_alloc_from_ptr() to create an aliased RAMBlock
pointing to the part of the original memory region.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 exec.c   | 7 ++++---
 memory.c | 5 +++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index 3e78de3b8f..daef0cd54f 100644
--- a/exec.c
+++ b/exec.c
@@ -2313,7 +2313,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
                                         new_block->used_length,
                                         DIRTY_CLIENTS_ALL);
 
-    if (new_block->host) {
+    if (new_block->host && !new_block->mr->alias) {
         qemu_ram_setup_dump(new_block->host, new_block->max_length);
         qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
         /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
@@ -2671,7 +2671,8 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
 
     rcu_read_lock();
     block = atomic_rcu_read(&ram_list.mru_block);
-    if (block && block->host && host - block->host < block->max_length) {
+    if (block && !block->mr->alias && block->host &&
+        host - block->host < block->max_length) {
         goto found;
     }
 
@@ -2680,7 +2681,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
         if (block->host == NULL) {
             continue;
         }
-        if (host - block->host < block->max_length) {
+        if (!block->mr->alias && host - block->host < block->max_length) {
             goto found;
         }
     }
diff --git a/memory.c b/memory.c
index 5d8c9a9234..d710c17a26 100644
--- a/memory.c
+++ b/memory.c
@@ -1678,6 +1678,11 @@ void memory_region_init_alias(MemoryRegion *mr,
     memory_region_init(mr, owner, name, size);
     mr->alias = orig;
     mr->alias_offset = offset;
+    if (orig->ram_block && size) {
+        mr->ram_block = qemu_ram_alloc_from_ptr(size,
+                                                orig->ram_block->host + offset,
+                                                mr, &error_fatal);
+    }
 }
 
 void memory_region_init_rom_nomigrate(MemoryRegion *mr,
-- 
2.18.1




* [Qemu-devel] [PATCH RFC 2/2] s390: do not call memory_region_allocate_system_memory() multiple times
  2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable Igor Mammedov
@ 2019-07-29 14:52 ` Igor Mammedov
  2019-07-29 14:58 ` [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Cornelia Huck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Igor Mammedov @ 2019-07-29 14:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

s390 was trying to solve the limited KVM memslot size issue by abusing
memory_region_allocate_system_memory(), which breaks the API contract
whereby the function may be called only once.

s390 should have used memory aliases to fragment initial memory into
smaller chunks to satisfy KVM's memslot limitation. But it's a bit
late now, since the allocated pieces are transferred in the migration
stream separately, so it's not possible to just replace the broken
layout with the correct one. To work around this, the previous patch
made MemoryRegion aliases migratable, and this patch switches to using
them to split the big initial RAM chunk into smaller pieces of up to
KVM_SLOT_MAX_BYTES each and registers the aliases for migration. That
should keep migration and the guest ABI (RAM mapping) compatible with
previous QEMU versions.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
I don't have access to a suitable system to test it, so I've simulated
it with smaller chunks on an x86 host. Ping-pong migration between old
and new QEMU worked fine. The KVM part should be fine, as memslots use
mapped MemoryRegions (in this case they would be the aliases) as far
as I know, but if someone could test it on a big enough host, that
would be nice.
---
 hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 5b6a9a4e55..77910419e6 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -163,21 +163,31 @@ static void virtio_ccw_register_hcalls(void)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
+    MemoryRegion *ram = g_new(MemoryRegion, 1);
     ram_addr_t chunk, offset = 0;
     unsigned int number = 0;
     Error *local_err = NULL;
     gchar *name;
 
     /* allocate RAM for core */
+    memory_region_allocate_system_memory(ram, NULL, "s390.whole.ram", mem_size);
+    /*
+     * memory_region_allocate_system_memory() registers allocated RAM for
+     * migration, however for compat reasons the RAM should be passed over
+     * as RAMBlocks of size up to KVM_SLOT_MAX_BYTES. So unregister the just
+     * allocated RAM so it won't be migrated directly. Aliases will take care
+     * of segmenting RAM into legacy chunks that are migration and KVM compatible.
+     */
+    vmstate_unregister_ram(ram, NULL);
     name = g_strdup_printf("s390.ram");
     while (mem_size) {
-        MemoryRegion *ram = g_new(MemoryRegion, 1);
-        uint64_t size = mem_size;
+        MemoryRegion *alias = g_new(MemoryRegion, 1);
 
         /* KVM does not allow memslots >= 8 TB */
-        chunk = MIN(size, KVM_SLOT_MAX_BYTES);
-        memory_region_allocate_system_memory(ram, NULL, name, chunk);
-        memory_region_add_subregion(sysmem, offset, ram);
+        chunk = MIN(mem_size, KVM_SLOT_MAX_BYTES);
+        memory_region_init_alias(alias, NULL, name, ram, offset, chunk);
+        vmstate_register_ram_global(alias);
+        memory_region_add_subregion(sysmem, offset, alias);
         mem_size -= chunk;
         offset += chunk;
         g_free(name);
-- 
2.18.1




* Re: [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable Igor Mammedov
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
@ 2019-07-29 14:58 ` Cornelia Huck
  2019-07-30 15:22 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger
  2019-08-02  8:04 ` David Hildenbrand
  4 siblings, 0 replies; 18+ messages in thread
From: Cornelia Huck @ 2019-07-29 14:58 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Thomas Huth, David Hildenbrand, qemu-devel, dgilbert,
	Christian Borntraeger, qemu-s390x, pbonzini

On Mon, 29 Jul 2019 10:52:27 -0400
Igor Mammedov <imammedo@redhat.com> wrote:

cc:ing some folks for awareness.

> While looking into unifying guest RAM allocation to use hostmem backends
> for initial RAM (especially when -mem-path is used) and retiring the
> memory_region_allocate_system_memory() API, leaving only a single hostmem
> backend, I was inspecting how it is currently used by boards, and it turns
> out several boards abuse it by calling the function several times (despite
> the documented contract forbidding it).
> 
> s390 is one such board, where a KVM limitation on memslot size got propagated
> into the board design and memory_region_allocate_system_memory() was abused
> to satisfy the KVM requirement for a max RAM chunk size where a memory region
> alias would suffice.
> 
> Unfortunately, memory_region_allocate_system_memory() usage created a migration
> dependency whereby guest RAM is transferred in the migration stream as several
> RAMBlocks if it's more than KVM_SLOT_MAX_BYTES.
> 
> In order to replace these several RAM chunks with a single memdev, keep it
> working with the KVM memslot size limit, and stay migration compatible, the
> following was done:
>    * [2/2] use memory region aliases to partition hostmem backend RAM into
>            KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side working
>    * [1/2] hacked memory region aliases (to ram memory regions only) to have
>            their own RAMBlocks pointing to RAM chunks owned by the aliased
>            memory region. While it's admittedly a hack, it's relatively
>            simple and allows board code to reshape the migration stream as
>            necessary
> 
>            I haven't tried to use migratable aliases on x86 machines, but
>            with them it could be possible to drop the legacy RAM allocation
>            and the compat knob (cd5ff8333a), dropping '-numa node,mem'
>            completely even for old machines.
> 
> PS:
>    Tested with ping-pong cross-version migration on an s390 machine
>    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a large
>     enough host)
>      
> 
> Igor Mammedov (2):
>   memory: make MemoryRegion alias migratable
>   s390: do not call memory_region_allocate_system_memory() multiple
>     times
> 
>  exec.c                     |  7 ++++---
>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
>  memory.c                   |  5 +++++
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 




* Re: [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable Igor Mammedov
@ 2019-07-29 17:53   ` Dr. David Alan Gilbert
  2019-07-30 13:25     ` Igor Mammedov
  0 siblings, 1 reply; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-29 17:53 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: pbonzini, qemu-s390x, qemu-devel

* Igor Mammedov (imammedo@redhat.com) wrote:
> use qemu_ram_alloc_from_ptr() to create an aliased RAMBlock
> pointing to the part of the original memory region.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  exec.c   | 7 ++++---
>  memory.c | 5 +++++
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 3e78de3b8f..daef0cd54f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2313,7 +2313,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
>                                          new_block->used_length,
>                                          DIRTY_CLIENTS_ALL);
>  
> -    if (new_block->host) {
> +    if (new_block->host && !new_block->mr->alias) {
>          qemu_ram_setup_dump(new_block->host, new_block->max_length);
>          qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
>          /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
> @@ -2671,7 +2671,8 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
>  
>      rcu_read_lock();
>      block = atomic_rcu_read(&ram_list.mru_block);
> -    if (block && block->host && host - block->host < block->max_length) {
> +    if (block && !block->mr->alias && block->host &&
> +        host - block->host < block->max_length) {
>          goto found;
>      }
>  
> @@ -2680,7 +2681,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
>          if (block->host == NULL) {
>              continue;
>          }
> -        if (host - block->host < block->max_length) {
> +        if (!block->mr->alias && host - block->host < block->max_length) {
>              goto found;
>          }
>      }
> diff --git a/memory.c b/memory.c
> index 5d8c9a9234..d710c17a26 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1678,6 +1678,11 @@ void memory_region_init_alias(MemoryRegion *mr,
>      memory_region_init(mr, owner, name, size);
>      mr->alias = orig;
>      mr->alias_offset = offset;
> +    if (orig->ram_block && size) {
> +        mr->ram_block = qemu_ram_alloc_from_ptr(size,
> +                                                orig->ram_block->host + offset,
> +                                                mr, &error_fatal);
> +    }
>  }

Doesn't this cause new memory regions to be created in other existing
machines, e.g. x86's mem-smram, or the various PCI vga hacks?

Dave
>  
>  void memory_region_init_rom_nomigrate(MemoryRegion *mr,
> -- 
> 2.18.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-29 17:53   ` Dr. David Alan Gilbert
@ 2019-07-30 13:25     ` Igor Mammedov
  2019-07-30 13:34       ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Mammedov @ 2019-07-30 13:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: pbonzini, qemu-s390x, qemu-devel

On Mon, 29 Jul 2019 18:53:15 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Igor Mammedov (imammedo@redhat.com) wrote:
> > use qemu_ram_alloc_from_ptr() to create an aliased RAMBlock
> > pointing to the part of the original memory region.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  exec.c   | 7 ++++---
> >  memory.c | 5 +++++
> >  2 files changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 3e78de3b8f..daef0cd54f 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2313,7 +2313,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp, bool shared)
> >                                          new_block->used_length,
> >                                          DIRTY_CLIENTS_ALL);
> >  
> > -    if (new_block->host) {
> > +    if (new_block->host && !new_block->mr->alias) {
> >          qemu_ram_setup_dump(new_block->host, new_block->max_length);
> >          qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
> >          /* MADV_DONTFORK is also needed by KVM in absence of synchronous MMU */
> > @@ -2671,7 +2671,8 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> >  
> >      rcu_read_lock();
> >      block = atomic_rcu_read(&ram_list.mru_block);
> > -    if (block && block->host && host - block->host < block->max_length) {
> > +    if (block && !block->mr->alias && block->host &&
> > +        host - block->host < block->max_length) {
> >          goto found;
> >      }
> >  
> > @@ -2680,7 +2681,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
> >          if (block->host == NULL) {
> >              continue;
> >          }
> > -        if (host - block->host < block->max_length) {
> > +        if (!block->mr->alias && host - block->host < block->max_length) {
> >              goto found;
> >          }
> >      }
> > diff --git a/memory.c b/memory.c
> > index 5d8c9a9234..d710c17a26 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1678,6 +1678,11 @@ void memory_region_init_alias(MemoryRegion *mr,
> >      memory_region_init(mr, owner, name, size);
> >      mr->alias = orig;
> >      mr->alias_offset = offset;
> > +    if (orig->ram_block && size) {
> > +        mr->ram_block = qemu_ram_alloc_from_ptr(size,
> > +                                                orig->ram_block->host + offset,
> > +                                                mr, &error_fatal);
> > +    }
> >  }
> 
> Doesn't this cause new memory regions to be created in other existing
> machines, e.g. x86's mem-smram, or the various PCI vga hacks?

I'd guess you meant RAMBlocks instead of memory regions; if that's it,
then yes, every alias pointing to a RAM-backed memory region will have
a RAMBlock that points to the aliased part of the aliased memory region.

(I didn't intend to limit it to s390 only, as it potentially
 could be used for x86 initial memory refactoring as well.)
If it's an issue, we could probably address it by adding an additional API,
  memory_region_alias_make_migratable(),
and calling it on selected aliases, but the fewer APIs we have the better,
so I went for reusing the existing vmstate_register_ram_global().

(I admit faking a RAMBlock is a hack (albeit a working one),
 but my attempt to teach the migration code to accept aliases directly
 didn't work out well and was very fragile; I don't remember the specifics
 any more, and probably not the code either, since it was an utter failure.)

PS:
 this patch is by no means complete (I probably cut corners on the cleanup
 path), but it should be sufficient to demo the idea and see if it's
 acceptable.


> Dave
> >  
> >  void memory_region_init_rom_nomigrate(MemoryRegion *mr,
> > -- 
> > 2.18.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-30 13:25     ` Igor Mammedov
@ 2019-07-30 13:34       ` Paolo Bonzini
  2019-07-30 14:35         ` Igor Mammedov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-07-30 13:34 UTC (permalink / raw)
  To: Igor Mammedov, Dr. David Alan Gilbert; +Cc: qemu-s390x, qemu-devel

On 30/07/19 15:25, Igor Mammedov wrote:
> I'd guess you meant RAMBlocks instead of memory regions; if that's it,
> then yes, every alias pointing to a RAM-backed memory region will have
> a RAMBlock that points to the aliased part of the aliased memory region.

The question is just, does it break migration from old QEMU to new QEMU
on x86 (which has plenty of RAM-backed aliases)?  If not, explain that
in the commit message or in a code comment.

Paolo



* Re: [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-30 13:34       ` Paolo Bonzini
@ 2019-07-30 14:35         ` Igor Mammedov
  2019-07-30 15:41           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Mammedov @ 2019-07-30 14:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-s390x, Dr. David Alan Gilbert, qemu-devel

On Tue, 30 Jul 2019 15:34:54 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 30/07/19 15:25, Igor Mammedov wrote:
> > I'd guess you meant RAMBlocks instead of memory regions; if that's it,
> > then yes, every alias pointing to a RAM-backed memory region will have
> > a RAMBlock that points to the aliased part of the aliased memory region.
> 
> The question is just, does it break migration from old QEMU to new QEMU
> on x86 (which has plenty of RAM-backed aliases)?  If not, explain that
> in the commit message or in a code comment.

pc/q35 old<->new and new<->old ping-pong tests worked fine (QEMU with default devices).

My understanding was that one needs to call vmstate_register_ram()
to make a memory region migratable, which marks the specific RAMBlock
as migratable. So just assigning a new RAMBlock to mr->ram_block shouldn't
affect migration, unless vmstate_register_ram() is called on the alias.

I'll add that to the commit message.

> 
> Paolo
> 




* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
                   ` (2 preceding siblings ...)
  2019-07-29 14:58 ` [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Cornelia Huck
@ 2019-07-30 15:22 ` Christian Borntraeger
  2019-07-30 15:49   ` Igor Mammedov
  2019-08-02  8:04 ` David Hildenbrand
  4 siblings, 1 reply; 18+ messages in thread
From: Christian Borntraeger @ 2019-07-30 15:22 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

I remember that you sent a similar patch a while ago and something broke on s390x.
Have you changed something from the old patch set?

On 29.07.19 16:52, Igor Mammedov wrote:
> While looking into unifying guest RAM allocation to use hostmem backends
> for initial RAM (especially when -mem-path is used) and retiring the
> memory_region_allocate_system_memory() API, leaving only a single hostmem
> backend, I was inspecting how it is currently used by boards, and it turns
> out several boards abuse it by calling the function several times (despite
> the documented contract forbidding it).
> 
> s390 is one such board, where a KVM limitation on memslot size got propagated
> into the board design and memory_region_allocate_system_memory() was abused
> to satisfy the KVM requirement for a max RAM chunk size where a memory region
> alias would suffice.
> 
> Unfortunately, memory_region_allocate_system_memory() usage created a migration
> dependency whereby guest RAM is transferred in the migration stream as several
> RAMBlocks if it's more than KVM_SLOT_MAX_BYTES.
> 
> In order to replace these several RAM chunks with a single memdev, keep it
> working with the KVM memslot size limit, and stay migration compatible, the
> following was done:
>    * [2/2] use memory region aliases to partition hostmem backend RAM into
>            KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side working
>    * [1/2] hacked memory region aliases (to ram memory regions only) to have
>            their own RAMBlocks pointing to RAM chunks owned by the aliased
>            memory region. While it's admittedly a hack, it's relatively
>            simple and allows board code to reshape the migration stream as
>            necessary
> 
>            I haven't tried to use migratable aliases on x86 machines, but
>            with them it could be possible to drop the legacy RAM allocation
>            and the compat knob (cd5ff8333a), dropping '-numa node,mem'
>            completely even for old machines.
> 
> PS:
>    Tested with ping-pong cross-version migration on an s390 machine
>    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a large
>     enough host)
>      
> 
> Igor Mammedov (2):
>   memory: make MemoryRegion alias migratable
>   s390: do not call memory_region_allocate_system_memory() multiple
>     times
> 
>  exec.c                     |  7 ++++---
>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
>  memory.c                   |  5 +++++
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 




* Re: [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable
  2019-07-30 14:35         ` Igor Mammedov
@ 2019-07-30 15:41           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 18+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-30 15:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: Paolo Bonzini, qemu-s390x, qemu-devel

* Igor Mammedov (imammedo@redhat.com) wrote:
> On Tue, 30 Jul 2019 15:34:54 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 30/07/19 15:25, Igor Mammedov wrote:
> > > I'd guess you meant RAMBlocks instead of memory regions; if that's it,
> > > then yes, every alias pointing to a RAM-backed memory region will have
> > > a RAMBlock that points to the aliased part of the aliased memory region.
> > 
> > The question is just, does it break migration from old QEMU to new QEMU
> > on x86 (which has plenty of RAM-backed aliases)?  If not, explain that
> > in the commit message or in a code comment.
> 
> pc/q35 old<->new and new<->old ping-pong tests worked fine (QEMU with default devices).
> 
> My understanding was that one needs to call vmstate_register_ram()
> to make a memory region migratable, which marks the specific RAMBlock
> as migratable. So just assigning a new RAMBlock to mr->ram_block shouldn't
> affect migration, unless vmstate_register_ram() is called on the alias.

You're right; that's actually relatively new and I forgot about it.
From Cedric's b895de50271 in 3.0; before that we migrated every
RAMBlock.

So yes, I think you should be OK.

Dave

> I'll add it to commit message.
> 
> > 
> > Paolo
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-07-30 15:22 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger
@ 2019-07-30 15:49   ` Igor Mammedov
  0 siblings, 0 replies; 18+ messages in thread
From: Igor Mammedov @ 2019-07-30 15:49 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: pbonzini, qemu-s390x, qemu-devel, dgilbert

On Tue, 30 Jul 2019 17:22:01 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> I remember that you send a similar patch a while ago and something broke on s390x.
> Have you changed something from the old patchs set?
Thanks for the reminder, I totally forgot about it.

It was "[PATCH v1 5/5] s390: do not call memory_region_allocate_system_memory() multiple times";
now, looking at the history, it all comes back: this series is incomplete as is,
because memory region aliases are merged back into one big memory section
after the flatview is rendered, so KVM gets too big a chunk of RAM and it breaks.

So aliases solve only half of the problem (keeping the migration side working);
to fix the KVM side, I'd add splitting of memory sections into chunks in
kvm_set_phys_mem(), to keep KVM specifics in KVM code only.
The board would only have to set a max size value and the KVM code would use it
for splitting. I'll try to find that patch.


> 
> On 29.07.19 16:52, Igor Mammedov wrote:
> > While looking into unifying guest RAM allocation to use hostmem backends
> > for initial RAM (especially when -mem-path is used) and retiring the
> > memory_region_allocate_system_memory() API, leaving only a single hostmem
> > backend, I was inspecting how it is currently used by boards, and it turns
> > out several boards abuse it by calling the function several times (despite
> > the documented contract forbidding it).
> > 
> > s390 is one such board, where a KVM limitation on memslot size got propagated
> > into the board design and memory_region_allocate_system_memory() was abused
> > to satisfy the KVM requirement for a max RAM chunk size where a memory region
> > alias would suffice.
> > 
> > Unfortunately, memory_region_allocate_system_memory() usage created a migration
> > dependency whereby guest RAM is transferred in the migration stream as several
> > RAMBlocks if it's more than KVM_SLOT_MAX_BYTES.
> > 
> > In order to replace these several RAM chunks with a single memdev, keep it
> > working with the KVM memslot size limit, and stay migration compatible, the
> > following was done:
> >    * [2/2] use memory region aliases to partition hostmem backend RAM into
> >            KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side working
> >    * [1/2] hacked memory region aliases (to ram memory regions only) to have
> >            their own RAMBlocks pointing to RAM chunks owned by the aliased
> >            memory region. While it's admittedly a hack, it's relatively
> >            simple and allows board code to reshape the migration stream as
> >            necessary
> > 
> >            I haven't tried to use migratable aliases on x86 machines, but
> >            with them it could be possible to drop the legacy RAM allocation
> >            and the compat knob (cd5ff8333a), dropping '-numa node,mem'
> >            completely even for old machines.
> > 
> > PS:
> >    Tested with ping-pong cross-version migration on an s390 machine
> >    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a large
> >     enough host)
> >      
> > 
> > Igor Mammedov (2):
> >   memory: make MemoryRegion alias migratable
> >   s390: do not call memory_region_allocate_system_memory() multiple
> >     times
> > 
> >  exec.c                     |  7 ++++---
> >  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
> >  memory.c                   |  5 +++++
> >  3 files changed, 24 insertions(+), 8 deletions(-)
> > 
> 




* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
                   ` (3 preceding siblings ...)
  2019-07-30 15:22 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger
@ 2019-08-02  8:04 ` David Hildenbrand
  2019-08-02  8:23   ` David Hildenbrand
                     ` (2 more replies)
  4 siblings, 3 replies; 18+ messages in thread
From: David Hildenbrand @ 2019-08-02  8:04 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

On 29.07.19 16:52, Igor Mammedov wrote:
> While looking into unifying guest RAM allocation to use hostmem backends
> for initial RAM (especially when -mem-path is used) and retiring the
> memory_region_allocate_system_memory() API, leaving only a single hostmem
> backend, I was inspecting how it is currently used by boards, and it turns
> out several boards abuse it by calling the function several times (despite
> the documented contract forbidding it).
> 
> s390 is one such board, where a KVM limitation on memslot size got propagated
> into the board design and memory_region_allocate_system_memory() was abused
> to satisfy the KVM requirement for a max RAM chunk size where a memory region
> alias would suffice.
> 
> Unfortunately, memory_region_allocate_system_memory() usage created a migration
> dependency whereby guest RAM is transferred in the migration stream as several
> RAMBlocks if it's more than KVM_SLOT_MAX_BYTES.

So if I understand it correctly, we only call
memory_region_allocate_system_memory() in case the guest initial memory
size exceeds KVM_SLOT_MAX_BYTES - ~8TB.

Do we *really* care about keeping migration of systems running that most
probably nobody (except Christian ;) ) really uses? (especially not in
production).

I am fine keeping migration running if it's easy, but introducing hacks
(reading below) for such obscure use cases - I don't know.

@Christian: Please prove me wrong. :)

> 
> In order to replace these several RAM chunks with a single memdev, keep it
> working with the KVM memslot size limit, and stay migration compatible, the
> following was done:
>    * [2/2] use memory region aliases to partition the hostmem backend RAM
>            into KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side
>            working
>    * [1/2] hacked memory region aliases (to RAM memory regions only) to
>            have their own RAMBlocks pointing to RAM chunks owned by the
>            aliased memory region. While it's admittedly a hack, it's
>            relatively simple and allows board code to reshape the migration
>            stream as necessary
> 
>            I haven't tried to use migratable aliases on x86 machines, but
>            with them it could be possible to drop the legacy RAM allocation
>            and the compat knob (cd5ff8333a), removing '-numa node,mem'
>            completely even for old machines.
> 
> PS:
>    Tested with ping-pong cross-version migration on an s390 machine
>    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a
>     large enough host)
>      
> 
> Igor Mammedov (2):
>   memory: make MemoryRegion alias migratable
>   s390: do not call memory_region_allocate_system_memory() multiple
>     times
> 
>  exec.c                     |  7 ++++---
>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
>  memory.c                   |  5 +++++
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 


-- 

Thanks,

David / dhildenb
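[Editor's note] The chunking described in the quoted cover letter (patch 2/2) can be modeled with simple arithmetic. This is an illustrative sketch only; the real implementation is C code in hw/s390x/s390-virtio-ccw.c, the names below are invented, and the 8 TiB value assumes KVM_SLOT_MAX_BYTES comes from a 2^31-page slot limit at 4 KiB pages.

```python
# Toy model of patch 2/2: carve one big RAM backend into memory-region-alias
# chunks no larger than KVM_SLOT_MAX_BYTES, so each chunk fits in one KVM
# memslot. Names and the exact constant below are assumptions, not QEMU code.

KVM_SLOT_MAX_BYTES = (1 << 31) * 4096   # 2^31 pages * 4 KiB = 8 TiB (assumed)

def partition_ram(ram_size):
    """Return (offset, size) alias chunks covering ram_size bytes."""
    chunks = []
    offset = 0
    while offset < ram_size:
        size = min(KVM_SLOT_MAX_BYTES, ram_size - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

TiB = 1 << 40
# Guests up to 8 TiB need a single chunk, so nothing changes for them;
# a 20 TiB guest is split into 8 + 8 + 4 TiB aliases.
assert partition_ram(7 * TiB) == [(0, 7 * TiB)]
assert len(partition_ram(20 * TiB)) == 3
```

Each (offset, size) pair corresponds to one memory region alias into the single hostmem backend, registered with KVM as its own memslot.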



* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:04 ` David Hildenbrand
@ 2019-08-02  8:23   ` David Hildenbrand
  2019-08-02  8:26   ` David Hildenbrand
  2019-08-02  8:29   ` Christian Borntraeger
  2 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2019-08-02  8:23 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

On 02.08.19 10:04, David Hildenbrand wrote:
> On 29.07.19 16:52, Igor Mammedov wrote:
>> While looking into unifying guest RAM allocation to use hostmem backends
>> for initial RAM (especially when -mempath is used) and retiring the
>> memory_region_allocate_system_memory() API, leaving only a single hostmem
>> backend, I was inspecting how it is currently used by boards, and it turns
>> out several boards abuse it by calling the function several times (despite
>> the documented contract forbidding it).
>>
>> s390 is one such board, where the KVM limitation on memslot size got
>> propagated into the board design and memory_region_allocate_system_memory()
>> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
>> where a memory region alias would have sufficed.
>>
>> Unfortunately, this memory_region_allocate_system_memory() usage created a
>> migration dependency where guest RAM is transferred in the migration stream
>> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
> 
> So if I understand it correctly, we only call
> memory_region_allocate_system_memory() in case the guest initial memory
> size exceeds KVM_SLOT_MAX_BYTES - ~8TB.
> 
> Do we *really* care about keeping migration of systems running that most
> probably nobody (except Christian ;) ) really uses? (especially not in
> production).
> 
> I am fine keeping migration running if it's easy, but introducing hacks
> (reading below) for such obscure use cases - I don't know.
> 
> @Christian: Please prove me wrong. :)

For reference:

https://access.redhat.com/articles/rhel-kvm-limits

RHEL 7/8 supports up to 2TB maximum memory in KVM guests.


https://www.suse.com/releasenotes/s390x/SUSE-SLES/15-SP1/

SLES 15 seems to support up to 4TB in a VM


If migration failing would mean "abort migration, continue on migration
source", I could sleep good at night. (as long as nothing crashes)

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:04 ` David Hildenbrand
  2019-08-02  8:23   ` David Hildenbrand
@ 2019-08-02  8:26   ` David Hildenbrand
  2019-08-02  8:29   ` Christian Borntraeger
  2 siblings, 0 replies; 18+ messages in thread
From: David Hildenbrand @ 2019-08-02  8:26 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: pbonzini, qemu-s390x, dgilbert

On 02.08.19 10:04, David Hildenbrand wrote:
> On 29.07.19 16:52, Igor Mammedov wrote:
>> While looking into unifying guest RAM allocation to use hostmem backends
>> for initial RAM (especially when -mempath is used) and retiring the
>> memory_region_allocate_system_memory() API, leaving only a single hostmem
>> backend, I was inspecting how it is currently used by boards, and it turns
>> out several boards abuse it by calling the function several times (despite
>> the documented contract forbidding it).
>>
>> s390 is one such board, where the KVM limitation on memslot size got
>> propagated into the board design and memory_region_allocate_system_memory()
>> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
>> where a memory region alias would have sufficed.
>>
>> Unfortunately, this memory_region_allocate_system_memory() usage created a
>> migration dependency where guest RAM is transferred in the migration stream
>> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
> 
> So if I understand it correctly, we only call
> memory_region_allocate_system_memory() in case the guest initial memory
> size exceeds KVM_SLOT_MAX_BYTES - ~8TB.

(to clarify, I meant: call it more than once)


-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:04 ` David Hildenbrand
  2019-08-02  8:23   ` David Hildenbrand
  2019-08-02  8:26   ` David Hildenbrand
@ 2019-08-02  8:29   ` Christian Borntraeger
  2019-08-02  8:37     ` David Hildenbrand
  2019-08-02  9:18     ` Igor Mammedov
  2 siblings, 2 replies; 18+ messages in thread
From: Christian Borntraeger @ 2019-08-02  8:29 UTC (permalink / raw)
  To: David Hildenbrand, Igor Mammedov, qemu-devel
  Cc: pbonzini, qemu-s390x, dgilbert



On 02.08.19 10:04, David Hildenbrand wrote:
> On 29.07.19 16:52, Igor Mammedov wrote:
>> While looking into unifying guest RAM allocation to use hostmem backends
>> for initial RAM (especially when -mempath is used) and retiring the
>> memory_region_allocate_system_memory() API, leaving only a single hostmem
>> backend, I was inspecting how it is currently used by boards, and it turns
>> out several boards abuse it by calling the function several times (despite
>> the documented contract forbidding it).
>>
>> s390 is one such board, where the KVM limitation on memslot size got
>> propagated into the board design and memory_region_allocate_system_memory()
>> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
>> where a memory region alias would have sufficed.
>>
>> Unfortunately, this memory_region_allocate_system_memory() usage created a
>> migration dependency where guest RAM is transferred in the migration stream
>> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
> 
> So if I understand it correctly, we only call
> memory_region_allocate_system_memory() in case the guest initial memory
> size exceeds KVM_SLOT_MAX_BYTES - ~8TB.

We always call it. We just call it twice for > 8TB
> 
> Do we *really* care about keeping migration of systems running that most
> probably nobody (except Christian ;) ) really uses? (especially not in
> production).
> 
> I am fine keeping migration running if it's easy, but introducing hacks
> (reading below) for such obscure use cases - I don't know.
> 
> @Christian: Please prove me wrong. :)

For the time being we can block migration for guests > 8TB if that helps (it should not
fail in a guest killing fashion), but we should
1. continue to be able to migrate guests < 8TB
2. continue to be 

On the other hand I find "and suddenly it fails if you go beyond this" really
unpleasant. So it would be interesting to see the next round of patches to 
check how "hacky" those really are.


> 
>>
>> In order to replace these several RAM chunks with a single memdev, keep it
>> working with the KVM memslot size limit, and stay migration compatible, the
>> following was done:
>>    * [2/2] use memory region aliases to partition the hostmem backend RAM
>>            into KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side
>>            working
>>    * [1/2] hacked memory region aliases (to RAM memory regions only) to
>>            have their own RAMBlocks pointing to RAM chunks owned by the
>>            aliased memory region. While it's admittedly a hack, it's
>>            relatively simple and allows board code to reshape the migration
>>            stream as necessary
>>
>>            I haven't tried to use migratable aliases on x86 machines, but
>>            with them it could be possible to drop the legacy RAM allocation
>>            and the compat knob (cd5ff8333a), removing '-numa node,mem'
>>            completely even for old machines.
>>
>> PS:
>>    Tested with ping-pong cross-version migration on an s390 machine
>>    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a
>>     large enough host)
>>      
>>
>> Igor Mammedov (2):
>>   memory: make MemoryRegion alias migratable
>>   s390: do not call memory_region_allocate_system_memory() multiple
>>     times
>>
>>  exec.c                     |  7 ++++---
>>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
>>  memory.c                   |  5 +++++
>>  3 files changed, 24 insertions(+), 8 deletions(-)
>>
> 
> 



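[Editor's note] Christian's point above — that the function is always called, and called twice only for guests above 8 TiB — pins down exactly which guests have a migration-compat problem. A toy model (the "s390.ram" / "s390.ram.N" id scheme here is an assumption for illustration, not necessarily QEMU's exact RAMBlock ids):

```python
# Toy model of the migration-compat question: an old QEMU that called
# memory_region_allocate_system_memory() once per <=8 TiB chunk put one
# RAMBlock per call on the wire, so only >8 TiB guests ever had more than
# one RAMBlock for initial RAM. Ids below are invented for illustration.

KVM_SLOT_MAX_BYTES = (1 << 31) * 4096   # 8 TiB (assumed)

def legacy_ramblock_ids(ram_size):
    n = -(-ram_size // KVM_SLOT_MAX_BYTES)   # ceil(ram_size / max slot size)
    return ["s390.ram" if i == 0 else "s390.ram.%d" % i for i in range(n)]

TiB = 1 << 40
# Guests <= 8 TiB were always a single RAMBlock, so their migration stream
# is unchanged by the rework; requirement 1 above holds for free.
assert legacy_ramblock_ids(4 * TiB) == ["s390.ram"]
# Only >8 TiB guests had multiple RAMBlocks on the wire.
assert legacy_ramblock_ids(20 * TiB) == ["s390.ram", "s390.ram.1", "s390.ram.2"]
```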

* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:29   ` Christian Borntraeger
@ 2019-08-02  8:37     ` David Hildenbrand
  2019-08-02 10:24       ` Christian Borntraeger
  2019-08-02  9:18     ` Igor Mammedov
  1 sibling, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2019-08-02  8:37 UTC (permalink / raw)
  To: Christian Borntraeger, Igor Mammedov, qemu-devel
  Cc: pbonzini, qemu-s390x, dgilbert

On 02.08.19 10:29, Christian Borntraeger wrote:
> 
> 
> On 02.08.19 10:04, David Hildenbrand wrote:
>> On 29.07.19 16:52, Igor Mammedov wrote:
>>> While looking into unifying guest RAM allocation to use hostmem backends
>>> for initial RAM (especially when -mempath is used) and retiring the
>>> memory_region_allocate_system_memory() API, leaving only a single hostmem
>>> backend, I was inspecting how it is currently used by boards, and it turns
>>> out several boards abuse it by calling the function several times (despite
>>> the documented contract forbidding it).
>>>
>>> s390 is one such board, where the KVM limitation on memslot size got
>>> propagated into the board design and memory_region_allocate_system_memory()
>>> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
>>> where a memory region alias would have sufficed.
>>>
>>> Unfortunately, this memory_region_allocate_system_memory() usage created a
>>> migration dependency where guest RAM is transferred in the migration stream
>>> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
>>
>> So if I understand it correctly, we only call
>> memory_region_allocate_system_memory() in case the guest initial memory
>> size exceeds KVM_SLOT_MAX_BYTES - ~8TB.
> 
> We always call it. We just call it twice for > 8TB

Yeah, that's what I meant.

>>
>> Do we *really* care about keeping migration of systems running that most
>> probably nobody (except Christian ;) ) really uses? (especially not in
>> production).
>>
>> I am fine keeping migration running if it's easy, but introducing hacks
>> (reading below) for such obscure use cases - I don't know.
>>
>> @Christian: Please prove me wrong. :)
> 
> For the time being we can block migration for guests > 8TB if that helps (it should not
> fail in a guest killing fashion), but we should
> 1. continue to be able to migrate guests < 8TB
> 2. continue to be 
> 
> On the other hand I find "and suddenly it fails if you go beyond this" really
> unpleasant. So it would be interesting to see the next round of patches to 
> check how "hacky" those really are.

I mean migration will work perfectly fine once we've fixed it for new QEMU
versions. It's only broken between older QEMU versions and the fixed one.

Looking at the log, I can see that this was introduced with v2.12.0.

I would document this in the next release notes: "Migration of unusually
big VMs (>= 8TB) will not work from/to previous QEMU versions (up to
v2.12; before that, starting such guests didn't even work)."

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:29   ` Christian Borntraeger
  2019-08-02  8:37     ` David Hildenbrand
@ 2019-08-02  9:18     ` Igor Mammedov
  1 sibling, 0 replies; 18+ messages in thread
From: Igor Mammedov @ 2019-08-02  9:18 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: pbonzini, qemu-s390x, qemu-devel, dgilbert, David Hildenbrand

On Fri, 2 Aug 2019 10:29:43 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 02.08.19 10:04, David Hildenbrand wrote:
> > On 29.07.19 16:52, Igor Mammedov wrote:  
> >> While looking into unifying guest RAM allocation to use hostmem backends
> >> for initial RAM (especially when -mempath is used) and retiring the
> >> memory_region_allocate_system_memory() API, leaving only a single hostmem
> >> backend, I was inspecting how it is currently used by boards, and it turns
> >> out several boards abuse it by calling the function several times (despite
> >> the documented contract forbidding it).
> >>
> >> s390 is one such board, where the KVM limitation on memslot size got
> >> propagated into the board design and memory_region_allocate_system_memory()
> >> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
> >> where a memory region alias would have sufficed.
> >>
> >> Unfortunately, this memory_region_allocate_system_memory() usage created a
> >> migration dependency where guest RAM is transferred in the migration stream
> >> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
> > 
> > So if I understand it correctly, we only call
> > memory_region_allocate_system_memory() in case the guest initial memory
> > size exceeds KVM_SLOT_MAX_BYTES - ~8TB.  
> 
> We always call it. We just call it twice for > 8TB
> > 
> > Do we *really* care about keeping migration of systems running that most
> > probably nobody (except Christian ;) ) really uses? (especially not in
> > production).
> > 
> > I am fine keeping migration running if it's easy, but introducing hacks
> > (reading below) for such obscure use cases - I don't know.
> > 
> > @Christian: Please prove me wrong. :)  
> 
> For the time being we can block migration for guests > 8TB if that helps (it should not
> fail in a guest killing fashion), but we should
> 1. continue to be able to migrate guests < 8TB
> 2. continue to be 
> 
> On the other hand I find "and suddenly it fails if you go beyond this" really
> unpleasant. So it would be interesting to see the next round of patches to 
> check how "hacky" those really are.

I've looked into cleaning up the "migratable aliases";
so far they work fine (at least in the configurations I was able to test;
there were no regressions on x86 machines, which use aliases quite a bit).
As I've written before, the aliases could be used for x86 later, but
that is yet to be proven, so I'd prefer to delay this patch if possible.

However, I'd prefer to intentionally break migration of >8TB
guests in favor of simpler code, rather than keep around a compat mode
that isn't (and won't be) used in practice.

As for the new round of patches (including the missing KVM fixup),
I'm going to post them now-ish for you to check out.

> 
> >   
> >>
> >> In order to replace these several RAM chunks with a single memdev, keep it
> >> working with the KVM memslot size limit, and stay migration compatible, the
> >> following was done:
> >>    * [2/2] use memory region aliases to partition the hostmem backend RAM
> >>            into KVM_SLOT_MAX_BYTES chunks, which should keep the KVM side
> >>            working
> >>    * [1/2] hacked memory region aliases (to RAM memory regions only) to
> >>            have their own RAMBlocks pointing to RAM chunks owned by the
> >>            aliased memory region. While it's admittedly a hack, it's
> >>            relatively simple and allows board code to reshape the migration
> >>            stream as necessary
> >>
> >>            I haven't tried to use migratable aliases on x86 machines, but
> >>            with them it could be possible to drop the legacy RAM allocation
> >>            and the compat knob (cd5ff8333a), removing '-numa node,mem'
> >>            completely even for old machines.
> >>
> >> PS:
> >>    Tested with ping-pong cross-version migration on an s390 machine
> >>    (with a reduced KVM_SLOT_MAX_BYTES, since I don't have access to a
> >>     large enough host)
> >>      
> >>
> >> Igor Mammedov (2):
> >>   memory: make MemoryRegion alias migratable
> >>   s390: do not call memory_region_allocate_system_memory() multiple
> >>     times
> >>
> >>  exec.c                     |  7 ++++---
> >>  hw/s390x/s390-virtio-ccw.c | 20 +++++++++++++++-----
> >>  memory.c                   |  5 +++++
> >>  3 files changed, 24 insertions(+), 8 deletions(-)
> >>  
> > 
> >   
> 



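[Editor's note] The "migratable alias" hack Igor refers to (patch 1/2) — an alias that exposes its own RAMBlock while pointing into memory owned by the aliased region — can be modeled roughly as follows. This is pure illustration: QEMU's actual MemoryRegion/RAMBlock machinery is C, and every name here is invented.

```python
# Rough model of patch 1/2: each alias carries its own RAMBlock id but
# points into a single backing allocation, so RAM is allocated once while
# the migration stream can still be shaped into several named blocks.

class AliasRamBlock:
    def __init__(self, idstr, backing, offset, size):
        self.idstr = idstr      # id used in the migration stream
        self.backing = backing  # the single backing allocation
        self.offset = offset    # where this alias starts in it
        self.size = size

    def write(self, addr, value):
        self.backing[self.offset + addr] = value

backing = bytearray(4096)       # stands in for one hostmem backend

blocks = [AliasRamBlock("s390.ram", backing, 0, 2048),
          AliasRamBlock("s390.ram.1", backing, 2048, 2048)]

blocks[1].write(0, 0xAB)        # a write through the second alias...
assert backing[2048] == 0xAB    # ...lands in the shared backing memory
assert [b.idstr for b in blocks] == ["s390.ram", "s390.ram.1"]
```

The point of the design choice: the board can name and partition the on-the-wire blocks to match what older QEMU versions sent, without giving up the single allocation.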

* Re: [Qemu-devel] [qemu-s390x] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory()
  2019-08-02  8:37     ` David Hildenbrand
@ 2019-08-02 10:24       ` Christian Borntraeger
  0 siblings, 0 replies; 18+ messages in thread
From: Christian Borntraeger @ 2019-08-02 10:24 UTC (permalink / raw)
  To: David Hildenbrand, Igor Mammedov, qemu-devel
  Cc: pbonzini, qemu-s390x, dgilbert



On 02.08.19 10:37, David Hildenbrand wrote:
> On 02.08.19 10:29, Christian Borntraeger wrote:
>>
>>
>> On 02.08.19 10:04, David Hildenbrand wrote:
>>> On 29.07.19 16:52, Igor Mammedov wrote:
>>>> While looking into unifying guest RAM allocation to use hostmem backends
>>>> for initial RAM (especially when -mempath is used) and retiring the
>>>> memory_region_allocate_system_memory() API, leaving only a single hostmem
>>>> backend, I was inspecting how it is currently used by boards, and it turns
>>>> out several boards abuse it by calling the function several times (despite
>>>> the documented contract forbidding it).
>>>>
>>>> s390 is one such board, where the KVM limitation on memslot size got
>>>> propagated into the board design and memory_region_allocate_system_memory()
>>>> was abused to satisfy the KVM requirement for the maximum RAM chunk size,
>>>> where a memory region alias would have sufficed.
>>>>
>>>> Unfortunately, this memory_region_allocate_system_memory() usage created a
>>>> migration dependency where guest RAM is transferred in the migration stream
>>>> as several RAMBlocks if it's larger than KVM_SLOT_MAX_BYTES.
>>>
>>> So if I understand it correctly, we only call
>>> memory_region_allocate_system_memory() in case the guest initial memory
>>> size exceeds KVM_SLOT_MAX_BYTES - ~8TB.
>>
>> We always call it. We just call it twice for > 8TB
> 
> Yeah, that's what I meant.
> 
>>>
>>> Do we *really* care about keeping migration of systems running that most
>>> probably nobody (except Christian ;) ) really uses? (especially not in
>>> production).
>>>
>>> I am fine keeping migration running if it's easy, but introducing hacks
>>> (reading below) for such obscure use cases - I don't know.
>>>
>>> @Christian: Please prove me wrong. :)
>>
>> For the time being we can block migration for guests > 8TB if that helps (it should not
>> fail in a guest killing fashion), but we should
>> 1. continue to be able to migrate guests < 8TB
>> 2. continue to be 
>>
>> On the other hand I find "and suddenly it fails if you go beyond this" really
>> unpleasant. So it would be interesting to see the next round of patches to 
>> check how "hacky" those really are.
> 
> I mean migration will work perfectly fine once we've fixed it for new QEMU
> versions. It's only broken between older QEMU versions and the fixed one.

I think that would be fine. We just have to make sure that migration really
fails in a way that lets the system continue to run.
> 
> Looking at the log I can see that this was introduced with v2.12.0.
> 
> I would document this in the next release notes: "Migration of unusually
> big VMs (>= 8TB) will not work from/to previous QEMU versions (up to
> v2.12; before that, starting such guests didn't even work)."

Yes, sounds good.




end of thread, other threads:[~2019-08-02 10:25 UTC | newest]

Thread overview: 18+ messages
2019-07-29 14:52 [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 1/2] memory: make MemoryRegion alias migratable Igor Mammedov
2019-07-29 17:53   ` Dr. David Alan Gilbert
2019-07-30 13:25     ` Igor Mammedov
2019-07-30 13:34       ` Paolo Bonzini
2019-07-30 14:35         ` Igor Mammedov
2019-07-30 15:41           ` Dr. David Alan Gilbert
2019-07-29 14:52 ` [Qemu-devel] [PATCH RFC 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
2019-07-29 14:58 ` [Qemu-devel] [PATCH RFC 0/2] s390: stop abusing memory_region_allocate_system_memory() Cornelia Huck
2019-07-30 15:22 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger
2019-07-30 15:49   ` Igor Mammedov
2019-08-02  8:04 ` David Hildenbrand
2019-08-02  8:23   ` David Hildenbrand
2019-08-02  8:26   ` David Hildenbrand
2019-08-02  8:29   ` Christian Borntraeger
2019-08-02  8:37     ` David Hildenbrand
2019-08-02 10:24       ` Christian Borntraeger
2019-08-02  9:18     ` Igor Mammedov
