* [PATCH 0/2] exclude hyperv synic sections from vhost
@ 2020-01-08 13:53 Dr. David Alan Gilbert (git)
  2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2020-01-08 13:53 UTC (permalink / raw)
  To: qemu-devel, vkuznets, mst, jasowang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hyperv's synic (that we emulate) is a feature that allows the guest
to place some magic (4k) pages of RAM anywhere it likes in GPA.
This confuses vhost's RAM section merging when these pages
land over the top of hugepages.

Since they're not normal RAM, and they shouldn't have vhost DMAing
into them, exclude them from the vhost set.

I do that by marking them as device-ram and then excluding device-ram
from vhost.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041

Dr. David Alan Gilbert (2):
  vhost: Don't pass ram device sections
  hyperv/synic: Allocate as ram_device

 hw/hyperv/hyperv.c | 14 ++++++++------
 hw/virtio/vhost.c  |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.24.1




* [PATCH 1/2] vhost: Don't pass ram device sections
  2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
@ 2020-01-08 13:53 ` Dr. David Alan Gilbert (git)
  2020-01-09 11:45   ` Michael S. Tsirkin
  2020-01-09 12:38   ` Roman Kagan
  2020-01-08 13:53 ` [PATCH 2/2] hyperv/synic: Allocate as ram_device Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2020-01-08 13:53 UTC (permalink / raw)
  To: qemu-devel, vkuznets, mst, jasowang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Don't pass RAM blocks that are marked as ram devices to vhost.
There's normally something special about them and they're not
normally just shared memory.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/virtio/vhost.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4da0d5a6c5..c81f0be71b 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -402,6 +402,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
     bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
                      ~(1 << DIRTY_MEMORY_MIGRATION);
     result = memory_region_is_ram(section->mr) &&
+        !memory_region_is_ram_device(section->mr) &&
         !memory_region_is_rom(section->mr);
 
     /* Vhost doesn't handle any block which is doing dirty-tracking other
-- 
2.24.1




* [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
  2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
@ 2020-01-08 13:53 ` Dr. David Alan Gilbert (git)
  2020-01-09 11:48   ` Michael S. Tsirkin
  2020-01-09 13:03   ` Roman Kagan
  2020-01-08 14:26 ` [PATCH 0/2] exclude hyperv synic sections from vhost Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 35+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2020-01-08 13:53 UTC (permalink / raw)
  To: qemu-devel, vkuznets, mst, jasowang

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Mark the synic pages as ram_device so that they won't be visible
to vhost.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 hw/hyperv/hyperv.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index da8ce82725..4de3ec411d 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
     msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
     eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
 
-    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
-                           sizeof(*synic->msg_page), &error_abort);
-    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
-                           sizeof(*synic->event_page), &error_abort);
-    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
-    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
+    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
+                                    sizeof(*synic->msg_page));
+    synic->event_page = qemu_memalign(qemu_real_host_page_size,
+                                      sizeof(*synic->event_page));
+    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
+                           sizeof(*synic->msg_page), synic->msg_page);
+    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
+                           sizeof(*synic->event_page), synic->event_page);
 
     g_free(msgp_name);
     g_free(eventp_name);
-- 
2.24.1




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
  2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
  2020-01-08 13:53 ` [PATCH 2/2] hyperv/synic: Allocate as ram_device Dr. David Alan Gilbert (git)
@ 2020-01-08 14:26 ` Vitaly Kuznetsov
  2020-01-09  3:00 ` Jason Wang
  2020-01-09 11:53 ` Roman Kagan
  4 siblings, 0 replies; 35+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-08 14:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: Roman Kagan, jasowang, qemu-devel, mst

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.
>
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.
>
> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
>

Cc: Roman, who's The Synic Overlord, to make sure he doesn't miss this.

-- 
Vitaly




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2020-01-08 14:26 ` [PATCH 0/2] exclude hyperv synic sections from vhost Vitaly Kuznetsov
@ 2020-01-09  3:00 ` Jason Wang
  2020-01-09  9:07   ` Vitaly Kuznetsov
  2020-01-09 12:02   ` Dr. David Alan Gilbert
  2020-01-09 11:53 ` Roman Kagan
  4 siblings, 2 replies; 35+ messages in thread
From: Jason Wang @ 2020-01-09  3:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel, vkuznets, mst


On 2020/1/8 9:53 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.


Hi David:

A silly question: is this because of the alignment when adding sections?
If so, what's the reason for doing the alignment, given it's not a must
for the vhost memory table?

Thanks


>
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.
>
> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
>
> Dr. David Alan Gilbert (2):
>    vhost: Don't pass ram device sections
>    hyperv/synic: Allocate as ram_device
>
>   hw/hyperv/hyperv.c | 14 ++++++++------
>   hw/virtio/vhost.c  |  1 +
>   2 files changed, 9 insertions(+), 6 deletions(-)
>




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-09  3:00 ` Jason Wang
@ 2020-01-09  9:07   ` Vitaly Kuznetsov
  2020-01-09 12:02   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 35+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-09  9:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: Roman Kagan, mst, Dr. David Alan Gilbert (git), qemu-devel

Jason Wang <jasowang@redhat.com> writes:

> On 2020/1/8 9:53 PM, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> Hyperv's synic (that we emulate) is a feature that allows the guest
>> to place some magic (4k) pages of RAM anywhere it likes in GPA.
>> This confuses vhost's RAM section merging when these pages
>> land over the top of hugepages.
>
>
> Hi David:
>
> A silly question: is this because of the alignment when adding sections?
> If so, what's the reason for doing the alignment, given it's not a must
> for the vhost memory table?

SynIC regions are two 4k pages and they are picked by the guest, not the
host.  They can be anywhere in the guest's RAM.

-- 
Vitaly




* Re: [PATCH 1/2] vhost: Don't pass ram device sections
  2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
@ 2020-01-09 11:45   ` Michael S. Tsirkin
  2020-01-09 11:56     ` Michael S. Tsirkin
  2020-01-09 12:38   ` Roman Kagan
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 11:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel

On Wed, Jan 08, 2020 at 01:53:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Don't pass RAM blocks that are marked as ram devices to vhost.
> There's normally something special about them and they're not
> normally just shared memory.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

So I think this is good by itself.

> ---
>  hw/virtio/vhost.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 4da0d5a6c5..c81f0be71b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -402,6 +402,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
>      bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
>                       ~(1 << DIRTY_MEMORY_MIGRATION);
>      result = memory_region_is_ram(section->mr) &&
> +        !memory_region_is_ram_device(section->mr) &&
>          !memory_region_is_rom(section->mr);
>  
>      /* Vhost doesn't handle any block which is doing dirty-tracking other
> -- 
> 2.24.1




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-08 13:53 ` [PATCH 2/2] hyperv/synic: Allocate as ram_device Dr. David Alan Gilbert (git)
@ 2020-01-09 11:48   ` Michael S. Tsirkin
  2020-01-09 12:08     ` Dr. David Alan Gilbert
  2020-01-09 13:03   ` Roman Kagan
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 11:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel

On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Mark the synic pages as ram_device so that they won't be visible
> to vhost.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


I think I disagree with this one.
 * A RAM device represents a mapping to a physical device, such as to a PCI
 * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
 * into the VM address space and access to the region will modify memory
 * directly.  However, the memory region should not be included in a memory
 * dump (device may not be enabled/mapped at the time of the dump), and
 * operations incompatible with manipulating MMIO should be avoided.  Replaces
 * skip_dump flag.

Looks like an abuse of notation.



> ---
>  hw/hyperv/hyperv.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> index da8ce82725..4de3ec411d 100644
> --- a/hw/hyperv/hyperv.c
> +++ b/hw/hyperv/hyperv.c
> @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
>      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
>      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
>  
> -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> -                           sizeof(*synic->msg_page), &error_abort);
> -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> -                           sizeof(*synic->event_page), &error_abort);
> -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> +                                    sizeof(*synic->msg_page));
> +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> +                                      sizeof(*synic->event_page));
> +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> +                           sizeof(*synic->msg_page), synic->msg_page);
> +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> +                           sizeof(*synic->event_page), synic->event_page);
>  
>      g_free(msgp_name);
>      g_free(eventp_name);
> -- 
> 2.24.1




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2020-01-09  3:00 ` Jason Wang
@ 2020-01-09 11:53 ` Roman Kagan
  2020-01-09 12:16   ` Dr. David Alan Gilbert
  4 siblings, 1 reply; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 11:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel, mst

On Wed, Jan 08, 2020 at 01:53:51PM +0000, Dr. David Alan Gilbert (git) wrote:
> Hyperv's synic (that we emulate) is a feature that allows the guest
> to place some magic (4k) pages of RAM anywhere it likes in GPA.
> This confuses vhost's RAM section merging when these pages
> land over the top of hugepages.
> 
> Since they're not normal RAM, and they shouldn't have vhost DMAing
> into them, exclude them from the vhost set.

But they *are* normal RAM.  If the guest driver sets up the device to
DMA to a SynIC page, so be it, and the guest deserves what it gets.

> I do that by marking them as device-ram and then excluding device-ram
> from vhost.
> 
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041

A while back Vitaly pointed me at
https://bugs.launchpad.net/qemu/+bug/1811533, which appeared to be the
same issue, but I failed to reproduce the problem.  Can you please
provide some more detail on how it's triggered?

Thanks,
Roman.



* Re: [PATCH 1/2] vhost: Don't pass ram device sections
  2020-01-09 11:45   ` Michael S. Tsirkin
@ 2020-01-09 11:56     ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 11:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel

On Thu, Jan 09, 2020 at 06:45:24AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 08, 2020 at 01:53:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Don't pass RAM blocks that are marked as ram devices to vhost.
> > There's normally something special about them and they're not
> > normally just shared memory.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> So I think this is good by itself.

Hmm, second thoughts: the vhost kernel backend can handle any pointer,
and what we care about for vhost-user is that memory_region_get_fd
gives us an fd.

So why does it matter that there's something special about it?  I worry
that this will break things like pass-through of virtio-pmem for nested
virt.
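
Something along these lines, perhaps - a rough sketch only, with a
hypothetical helper name:

    /* What vhost-user actually needs is a RAM block it can hand over
     * as an fd; for a RAM region memory_region_get_fd() returns -1
     * when there's no backing fd to share with the daemon. */
    static bool vhost_user_can_map(MemoryRegionSection *section)
    {
        return memory_region_is_ram(section->mr) &&
               memory_region_get_fd(section->mr) >= 0;
    }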


> > ---
> >  hw/virtio/vhost.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 4da0d5a6c5..c81f0be71b 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -402,6 +402,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
> >      bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
> >                       ~(1 << DIRTY_MEMORY_MIGRATION);
> >      result = memory_region_is_ram(section->mr) &&
> > +        !memory_region_is_ram_device(section->mr) &&
> >          !memory_region_is_rom(section->mr);
> >  
> >      /* Vhost doesn't handle any block which is doing dirty-tracking other
> > -- 
> > 2.24.1




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-09  3:00 ` Jason Wang
  2020-01-09  9:07   ` Vitaly Kuznetsov
@ 2020-01-09 12:02   ` Dr. David Alan Gilbert
  2020-01-09 12:14     ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 12:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: vkuznets, qemu-devel, mst

* Jason Wang (jasowang@redhat.com) wrote:
> 
> On 2020/1/8 9:53 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> 
> 
> Hi David:
> 
> A silly question: is this because of the alignment when adding sections?
> If so, what's the reason for doing the alignment, given it's not a must
> for the vhost memory table?

Page alignment is a bit odd with vhost-user - the daemon ends up having
to mmap each of the sections itself, and it still has to map them as
hugepages for the mmap to work.  In the old world you could sometimes
have the daemon mmapping the same chunk of memory twice into the
vhost-user process: without the aggregation you'd get a hugepage mapping
of the 0-2MB chunk to back the 0-512K section, and then maybe another
mapping of the same 0-2MB chunk for some of the other parts above 512K.
With postcopy we can't have multiple mappings of the same part of guest
memory; we need a single mapping for userfault.
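
For example (made-up numbers): take a 4k synic page dropped at 512K
into a region backed by a single 2MB hugepage:

  0K           512K 516K                       2MB
  |---- RAM -----|syn|------------ RAM ---------|

vhost then sees two RAM sections (0-512K and 516K-2MB), and the
vhost-user daemon has to mmap each of them out of the same 2MB
hugepage - two mappings of one page.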

Also, given the 16 separate synic regions, you'd probably end up with a
lot of wasted vhost sections.

Dave




> Thanks
> 
> 
> > 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> > 
> > I do that by marking them as device-ram and then excluding device-ram
> > from vhost.
> > 
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> > 
> > Dr. David Alan Gilbert (2):
> >    vhost: Don't pass ram device sections
> >    hyperv/synic: Allocate as ram_device
> > 
> >   hw/hyperv/hyperv.c | 14 ++++++++------
> >   hw/virtio/vhost.c  |  1 +
> >   2 files changed, 9 insertions(+), 6 deletions(-)
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 11:48   ` Michael S. Tsirkin
@ 2020-01-09 12:08     ` Dr. David Alan Gilbert
  2020-01-09 12:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 12:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, vkuznets, qemu-devel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Mark the synic pages as ram_device so that they won't be visible
> > to vhost.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> I think I disagree with this one.
>  * A RAM device represents a mapping to a physical device, such as to a PCI
>  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
>  * into the VM address space and access to the region will modify memory
>  * directly.  However, the memory region should not be included in a memory
>  * dump (device may not be enabled/mapped at the time of the dump), and
>  * operations incompatible with manipulating MMIO should be avoided.  Replaces
>  * skip_dump flag.
> 
> Looks like an abuse of notation.

OK, it did feel a bit like that - any suggestions of another way to do
it?
  This clearly isn't normal RAM.

Dave

> 
> 
> > ---
> >  hw/hyperv/hyperv.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > index da8ce82725..4de3ec411d 100644
> > --- a/hw/hyperv/hyperv.c
> > +++ b/hw/hyperv/hyperv.c
> > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> >  
> > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > -                           sizeof(*synic->msg_page), &error_abort);
> > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > -                           sizeof(*synic->event_page), &error_abort);
> > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > +                                    sizeof(*synic->msg_page));
> > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > +                                      sizeof(*synic->event_page));
> > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > +                           sizeof(*synic->msg_page), synic->msg_page);
> > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > +                           sizeof(*synic->event_page), synic->event_page);
> >  
> >      g_free(msgp_name);
> >      g_free(eventp_name);
> > -- 
> > 2.24.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-09 12:02   ` Dr. David Alan Gilbert
@ 2020-01-09 12:14     ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 12:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: vkuznets, Jason Wang, qemu-devel

On Thu, Jan 09, 2020 at 12:02:16PM +0000, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
> > 
> > On 2020/1/8 9:53 PM, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Hyperv's synic (that we emulate) is a feature that allows the guest
> > > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > > This confuses vhost's RAM section merging when these pages
> > > land over the top of hugepages.
> > 
> > 
> > Hi David:
> > 
> > A silly question: is this because of the alignment when adding sections?
> > If so, what's the reason for doing the alignment, given it's not a must
> > for the vhost memory table?
> 
> Page alignment is a bit odd with vhost-user - the daemon ends up having
> to mmap each of the sections itself, and it still has to map them as
> hugepages for the mmap to work.  In the old world you could sometimes
> have the daemon mmapping the same chunk of memory twice into the
> vhost-user process: without the aggregation you'd get a hugepage mapping
> of the 0-2MB chunk to back the 0-512K section, and then maybe another
> mapping of the same 0-2MB chunk for some of the other parts above 512K.
> With postcopy we can't have multiple mappings of the same part of guest
> memory; we need a single mapping for userfault.
> 
> Also, given the 16 separate synic regions, you'd probably end up with a
> lot of wasted vhost sections.
> 
> Dave

So I'd worry that this is more of an abuse of the interface.
E.g. it means the pages are skipped in dumps, which is not nice.

And for vhost I worry these patches will break pass-through of
PCI-attached memory.


> 
> 
> 
> > Thanks
> > 
> > 
> > > 
> > > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > > into them, exclude them from the vhost set.
> > > 
> > > I do that by marking them as device-ram and then excluding device-ram
> > > from vhost.
> > > 
> > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> > > 
> > > Dr. David Alan Gilbert (2):
> > >    vhost: Don't pass ram device sections
> > >    hyperv/synic: Allocate as ram_device
> > > 
> > >   hw/hyperv/hyperv.c | 14 ++++++++------
> > >   hw/virtio/vhost.c  |  1 +
> > >   2 files changed, 9 insertions(+), 6 deletions(-)
> > > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 0/2] exclude hyperv synic sections from vhost
  2020-01-09 11:53 ` Roman Kagan
@ 2020-01-09 12:16   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 12:16 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel, vkuznets, mst, jasowang

* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Wed, Jan 08, 2020 at 01:53:51PM +0000, Dr. David Alan Gilbert (git) wrote:
> > Hyperv's synic (that we emulate) is a feature that allows the guest
> > to place some magic (4k) pages of RAM anywhere it likes in GPA.
> > This confuses vhost's RAM section merging when these pages
> > land over the top of hugepages.
> > 
> > Since they're not normal RAM, and they shouldn't have vhost DMAing
> > into them, exclude them from the vhost set.
> 
> But they *are* normal RAM.  If the guest driver sets up the device to
> DMA to a SynIC page, so be it, and the guest deserves what it gets.

I don't think that's guaranteed to work.
However, in our case the guest isn't doing anything that crazy; it's
just setting the GPA of these pages to an inconvenient place for us.

> > I do that by marking them as device-ram and then excluding device-ram
> > from vhost.
> > 
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1779041
> 
> A while back Vitaly pointed me at
> https://bugs.launchpad.net/qemu/+bug/1811533, which appeared to be the
> same issue, but I failed to reproduce the problem.  Can you please
> provide some more detail on how it's triggered?

Our test script is:
/usr/libexec/qemu-kvm \
    -name 'dgilbert-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 4096 \
    -object memory-backend-file,size=1024M,prealloc=no,mem-path=/mnt/kvm_hugepage,policy=default,id=mem-mem0 \
    -object memory-backend-file,size=3072M,prealloc=no,mem-path=/mnt/kvm_hugepage,policy=default,id=mem-mem1  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -numa node,memdev=mem-mem0  \
    -numa node,memdev=mem-mem1  \
    -cpu SandyBridge,hv_stimer,hv_time,hv_synic,hv_vpindex \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=./win2019-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:5c:d7:9f:cd:48,id=id7ex9m8,netdev=idim5Sro,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idim5Sro,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio -qmp tcp:0:4444,server,nowait

Dave

> Thanks,
> Roman.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 12:08     ` Dr. David Alan Gilbert
@ 2020-01-09 12:18       ` Michael S. Tsirkin
  2020-01-09 12:22         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 12:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: jasowang, vkuznets, qemu-devel

On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Mark the synic pages as ram_device so that they won't be visible
> > > to vhost.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > 
> > I think I disagree with this one.
> >  * A RAM device represents a mapping to a physical device, such as to a PCI
> >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> >  * into the VM address space and access to the region will modify memory
> >  * directly.  However, the memory region should not be included in a memory
> >  * dump (device may not be enabled/mapped at the time of the dump), and
> >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> >  * skip_dump flag.
> > 
> > Looks like an abuse of notation.
> 
> OK, it did feel a bit like that - any suggestions of another way to do
> it?
>   This clearly isn't normal RAM.
> 
> Dave

If it's just an optimization for vhost/postcopy/etc, then I think
an API that says how this isn't normal ram would be ok.
E.g. it's not DMA'd into? Then maybe _nodma?
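
Hypothetically (nothing like this exists today, just a sketch):

    /* a flag any kind of region could set, rather than pretending to
     * be a ram_device */
    void memory_region_set_nodma(MemoryRegion *mr, bool nodma);
    bool memory_region_is_nodma(MemoryRegion *mr);

and vhost_section() would test the flag instead:

    result = memory_region_is_ram(section->mr) &&
             !memory_region_is_nodma(section->mr) &&
             !memory_region_is_rom(section->mr);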

> > 
> > 
> > > ---
> > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > index da8ce82725..4de3ec411d 100644
> > > --- a/hw/hyperv/hyperv.c
> > > +++ b/hw/hyperv/hyperv.c
> > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > >  
> > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > -                           sizeof(*synic->msg_page), &error_abort);
> > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > -                           sizeof(*synic->event_page), &error_abort);
> > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > +                                    sizeof(*synic->msg_page));
> > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > +                                      sizeof(*synic->event_page));
> > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > +                           sizeof(*synic->event_page), synic->event_page);
> > >  
> > >      g_free(msgp_name);
> > >      g_free(eventp_name);
> > > -- 
> > > 2.24.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 12:18       ` Michael S. Tsirkin
@ 2020-01-09 12:22         ` Dr. David Alan Gilbert
  2020-01-09 13:00           ` Vitaly Kuznetsov
  2020-01-09 13:06           ` Michael S. Tsirkin
  0 siblings, 2 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 12:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, vkuznets, qemu-devel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Mark the synic pages as ram_device so that they won't be visible
> > > > to vhost.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > 
> > > 
> > > I think I disagree with this one.
> > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > >  * into the VM address space and access to the region will modify memory
> > >  * directly.  However, the memory region should not be included in a memory
> > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > >  * skip_dump flag.
> > > 
> > > Looks like an abuse of notation.
> > 
> > OK, it did feel a bit like that - any suggestions of another way to do
> > it?
> >   This clearly isn't normal RAM.
> > 
> > Dave
> 
> If it's just an optimization for vhost/postcopy/etc, then I think

Note it's not an optimisation; postcopy fails unless you can aggregate
the members of the hugepage.
And I think vhost-user will fail if you have too many sections - and
the 16 sections from synic I think will blow the slots available.

> an API that says how this isn't normal ram would be ok.
> E.g. it's not DMA'd into? Then maybe _nodma?

Do we want a new memory_region_init for that or just to be able to add
a flag?

Dave

> > > 
> > > 
> > > > ---
> > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > index da8ce82725..4de3ec411d 100644
> > > > --- a/hw/hyperv/hyperv.c
> > > > +++ b/hw/hyperv/hyperv.c
> > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > >  
> > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > +                                    sizeof(*synic->msg_page));
> > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > +                                      sizeof(*synic->event_page));
> > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > >  
> > > >      g_free(msgp_name);
> > > >      g_free(eventp_name);
> > > > -- 
> > > > 2.24.1
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 1/2] vhost: Don't pass ram device sections
  2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
  2020-01-09 11:45   ` Michael S. Tsirkin
@ 2020-01-09 12:38   ` Roman Kagan
  2020-01-09 12:42     ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 12:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel, mst

On Wed, Jan 08, 2020 at 01:53:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> Don't pass RAM blocks that are marked as ram devices to vhost.
> There's normally something special about them and they're not
> normally just shared memory.

Does this "something special" about them, whatever it is, make them
automatically ineligible for DMA?  Why should vhost ignore them?

/me goes reading what ram_device memory regions are...

Roman.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/virtio/vhost.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 4da0d5a6c5..c81f0be71b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -402,6 +402,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
>      bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
>                       ~(1 << DIRTY_MEMORY_MIGRATION);
>      result = memory_region_is_ram(section->mr) &&
> +        !memory_region_is_ram_device(section->mr) &&
>          !memory_region_is_rom(section->mr);
>  
>      /* Vhost doesn't handle any block which is doing dirty-tracking other
> -- 
> 2.24.1
> 
> 



* Re: [PATCH 1/2] vhost: Don't pass ram device sections
  2020-01-09 12:38   ` Roman Kagan
@ 2020-01-09 12:42     ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 12:42 UTC (permalink / raw)
  To: Roman Kagan, Dr. David Alan Gilbert (git),
	qemu-devel, vkuznets, jasowang

On Thu, Jan 09, 2020 at 12:38:07PM +0000, Roman Kagan wrote:
> On Wed, Jan 08, 2020 at 01:53:52PM +0000, Dr. David Alan Gilbert (git) wrote:
> > Don't pass RAM blocks that are marked as ram devices to vhost.
> > There's normally something special about them and they're not
> > normally just shared memory.
> 
> Does this "something special" about them, whatever it is, make them
> automatically ineligible for DMA?  Why should vhost ignore them?
> /me goes reading what ram_device memory regions are...
> 
> Roman.

Well, device RAM can be DMA'd into.

But vhost is the host side of virtio, as opposed to a random device; we
know what it's doing, so skipping specific areas that we know are of no
interest to virtio devices, for performance/internal implementation
reasons, is OK.



> 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  hw/virtio/vhost.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 4da0d5a6c5..c81f0be71b 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -402,6 +402,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
> >      bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
> >                       ~(1 << DIRTY_MEMORY_MIGRATION);
> >      result = memory_region_is_ram(section->mr) &&
> > +        !memory_region_is_ram_device(section->mr) &&
> >          !memory_region_is_rom(section->mr);
> >  
> >      /* Vhost doesn't handle any block which is doing dirty-tracking other
> > -- 
> > 2.24.1
> > 
> > 




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 12:22         ` Dr. David Alan Gilbert
@ 2020-01-09 13:00           ` Vitaly Kuznetsov
  2020-01-09 13:24             ` Roman Kagan
  2020-01-09 13:06           ` Michael S. Tsirkin
  1 sibling, 1 reply; 35+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-09 13:00 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: jasowang, Michael S. Tsirkin, qemu-devel, Roman Kagan

"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> And I think vhost-user will fail if you have too many sections - and
> the 16 sections from synic I think will blow the slots available.
>

SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
has, so we're potentially looking at hundreds of such regions.

-- 
Vitaly




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-08 13:53 ` [PATCH 2/2] hyperv/synic: Allocate as ram_device Dr. David Alan Gilbert (git)
  2020-01-09 11:48   ` Michael S. Tsirkin
@ 2020-01-09 13:03   ` Roman Kagan
  2020-01-09 13:08     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 13:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: jasowang, vkuznets, qemu-devel, mst

On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> Mark the synic pages as ram_device so that they won't be visible
> to vhost.

Unless I'm misreading the code this also makes them invisible to
migration.

I need some more reading on the ram_device region behavior to better
understand other potential consequences.

Thanks,
Roman.

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  hw/hyperv/hyperv.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> index da8ce82725..4de3ec411d 100644
> --- a/hw/hyperv/hyperv.c
> +++ b/hw/hyperv/hyperv.c
> @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
>      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
>      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
>  
> -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> -                           sizeof(*synic->msg_page), &error_abort);
> -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> -                           sizeof(*synic->event_page), &error_abort);
> -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> +                                    sizeof(*synic->msg_page));
> +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> +                                      sizeof(*synic->event_page));
> +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> +                           sizeof(*synic->msg_page), synic->msg_page);
> +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> +                           sizeof(*synic->event_page), synic->event_page);
>  
>      g_free(msgp_name);
>      g_free(eventp_name);
> -- 
> 2.24.1
> 
> 



* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 12:22         ` Dr. David Alan Gilbert
  2020-01-09 13:00           ` Vitaly Kuznetsov
@ 2020-01-09 13:06           ` Michael S. Tsirkin
  2020-01-09 13:22             ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 13:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: jasowang, vkuznets, qemu-devel

On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Mark the synic pages as ram_device so that they won't be visible
> > > > > to vhost.
> > > > > 
> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > 
> > > > 
> > > > I think I disagree with this one.
> > > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > > >  * into the VM address space and access to the region will modify memory
> > > >  * directly.  However, the memory region should not be included in a memory
> > > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > > >  * skip_dump flag.
> > > > 
> > > > Looks like an abuse of notation.
> > > 
> > > OK, it did feel a bit like that - any suggestions of another way to do
> > > it?
> > >   This clearly isn't normal RAM.
> > > 
> > > Dave
> > 
> > If it's just an optimization for vhost/postcopy/etc, then I think
> 
> Note it's not an optimisation; postcopy fails unless you can aggregate
> the members of the hugepage.
> And I think vhost-user will fail if you have too many sections - and
> the 16 sections from synic I think will blow the slots available.

Right, so both are internal reasons.

> > an API that says how this isn't normal ram would be ok.
> > E.g. it's not DMA'd into? Then maybe _nodma?
> 
> Do we want a new memory_region_init for that or just to be able to add
> a flag?
> 
> Dave

I think a flag API is preferable since this can apply to any kind of
region. But can go either way, Paolo's the maintainer there.
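
I.e. something like this (hypothetical - the setter is the _nodma idea
from earlier, it doesn't exist):

    memory_region_init_ram_ptr(&synic->msg_page_mr, obj, msgp_name,
                               sizeof(*synic->msg_page), synic->msg_page);
    /* plain RAM, but flagged so vhost and friends leave it alone */
    memory_region_set_nodma(&synic->msg_page_mr, true);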

> > > > 
> > > > 
> > > > > ---
> > > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > > index da8ce82725..4de3ec411d 100644
> > > > > --- a/hw/hyperv/hyperv.c
> > > > > +++ b/hw/hyperv/hyperv.c
> > > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > > >  
> > > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > > +                                    sizeof(*synic->msg_page));
> > > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > > +                                      sizeof(*synic->event_page));
> > > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > > >  
> > > > >      g_free(msgp_name);
> > > > >      g_free(eventp_name);
> > > > > -- 
> > > > > 2.24.1
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:03   ` Roman Kagan
@ 2020-01-09 13:08     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 13:08 UTC (permalink / raw)
  To: Roman Kagan, qemu-devel, vkuznets, mst, jasowang

* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > Mark the synic pages as ram_device so that they won't be visible
> > to vhost.
> 
> Unless I'm misreading the code this also makes them invisible to
> migration.
> 
> I need some more reading on the ram_device region behavior to better
> understand other potential consequences.

Hmm you might be right; that's not intended.
Looking at memory.c I see we have vmstate_register_ram calls in
  memory_region_init_ram
  memory_region_init_rom
  memory_region_init_rom_device

but not
  memory_region_init_ram_device_ptr

hmm; OK this needs changing then.
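
Presumably something like this on top of patch 2 would do it (untested):

    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
                                      sizeof(*synic->msg_page), synic->msg_page);
    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
                                      sizeof(*synic->event_page), synic->event_page);
    /* the _ptr variants don't register the block for migration, so do
     * it by hand, as memory_region_init_ram() would have done */
    vmstate_register_ram(&synic->msg_page_mr, dev);
    vmstate_register_ram(&synic->event_page_mr, dev);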

Dave

> Thanks,
> Roman.
> 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  hw/hyperv/hyperv.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > index da8ce82725..4de3ec411d 100644
> > --- a/hw/hyperv/hyperv.c
> > +++ b/hw/hyperv/hyperv.c
> > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> >  
> > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > -                           sizeof(*synic->msg_page), &error_abort);
> > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > -                           sizeof(*synic->event_page), &error_abort);
> > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > +                                    sizeof(*synic->msg_page));
> > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > +                                      sizeof(*synic->event_page));
> > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > +                           sizeof(*synic->msg_page), synic->msg_page);
> > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > +                           sizeof(*synic->event_page), synic->event_page);
> >  
> >      g_free(msgp_name);
> >      g_free(eventp_name);
> > -- 
> > 2.24.1
> > 
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:06           ` Michael S. Tsirkin
@ 2020-01-09 13:22             ` Dr. David Alan Gilbert
  2020-01-09 13:27               ` Michael S. Tsirkin
  2020-01-09 15:31               ` Paolo Bonzini
  0 siblings, 2 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 13:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, pbonzini; +Cc: jasowang, vkuznets, qemu-devel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > 
> > > > > > Mark the synic pages as ram_device so that they won't be visible
> > > > > > to vhost.
> > > > > > 
> > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > 
> > > > > 
> > > > > I think I disagree with this one.
> > > > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > > > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > > > >  * into the VM address space and access to the region will modify memory
> > > > >  * directly.  However, the memory region should not be included in a memory
> > > > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > > > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > > > >  * skip_dump flag.
> > > > > 
> > > > > Looks like an abuse of notation.
> > > > 
> > > > OK, it did feel a bit like that - any suggestions of another way to do
> > > > it?
> > > >   This clearly isn't normal RAM.
> > > > 
> > > > Dave
> > > 
> > > If it's just an optimization for vhost/postcopy/etc, then I think
> > 
> > Note it's not an optimisation; postcopy fails unless you can aggregate
> > the members of the hugepage.
> > And I think vhost-user will fail if you have too many sections - and
> > the 16 sections from synic I think will blow the slots available.
> 
> Right, so both are internal reasons.
> 
> > > an API that says how this isn't normal ram would be ok.
> > > E.g. it's not DMA'd into? Then maybe _nodma?
> > 
> > Do we want a new memory_region_init for that or just to be able to add
> > a flag?
> > 
> > Dave
> 
> I think a flag API is preferable since this can apply to any kind of
> region. But can go either way, Paolo's the maintainer there.

(Copying Paolo in)
So what exactly does this flag mean; to me it's 'no vhost' - but is it
actually more general?

Dave

> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > > > index da8ce82725..4de3ec411d 100644
> > > > > > --- a/hw/hyperv/hyperv.c
> > > > > > +++ b/hw/hyperv/hyperv.c
> > > > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > > > >  
> > > > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > +                                    sizeof(*synic->msg_page));
> > > > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > +                                      sizeof(*synic->event_page));
> > > > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > > > >  
> > > > > >      g_free(msgp_name);
> > > > > >      g_free(eventp_name);
> > > > > > -- 
> > > > > > 2.24.1
> > > > > 
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:00           ` Vitaly Kuznetsov
@ 2020-01-09 13:24             ` Roman Kagan
  2020-01-09 13:28               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 13:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: jasowang, Michael S. Tsirkin, Dr. David Alan Gilbert, qemu-devel

On Thu, Jan 09, 2020 at 02:00:00PM +0100, Vitaly Kuznetsov wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> 
> > And I think vhost-user will fail if you have too many sections - and
> > the 16 sections from synic I think will blow the slots available.
> >
> 
> SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
> has, so we're potentially looking at hundreds of such regions.

Indeed.

I think my original idea to implement overlay pages word-for-word to the
HyperV spec was a mistake, as it led to fragmentation and memslot
waste.

I'll look into reworking it without actually mapping extra pages over
the existing RAM, but achieving overlay semantics by just shoving the
*content* of the "overlaid" memory somewhere.

That said, I haven't yet fully understood how the reported issue came
about, and thus whether the proposed approach would resolve it too.

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:22             ` Dr. David Alan Gilbert
@ 2020-01-09 13:27               ` Michael S. Tsirkin
  2020-01-09 13:28                 ` Michael S. Tsirkin
  2020-01-09 15:31               ` Paolo Bonzini
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 13:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: jasowang, pbonzini, vkuznets, qemu-devel

On Thu, Jan 09, 2020 at 01:22:42PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > Mark the synic pages as ram_device so that they won't be visible
> > > > > > > to vhost.
> > > > > > > 
> > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > 
> > > > > > 
> > > > > > I think I disagree with this one.
> > > > > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > > > > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > > > > >  * into the VM address space and access to the region will modify memory
> > > > > >  * directly.  However, the memory region should not be included in a memory
> > > > > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > > > > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > > > > >  * skip_dump flag.
> > > > > > 
> > > > > > Looks like an abuse of notation.
> > > > > 
> > > > > OK, it did feel a bit like that - any suggestions of another way to do
> > > > > it?
> > > > >   This clearly isn't normal RAM.
> > > > > 
> > > > > Dave
> > > > 
> > > > If it's just an optimization for vhost/postcopy/etc, then I think
> > > 
> > > Note it's not an optimisation; postcopy fails unless you can aggregate
> > > the members of the hugepage.
> > > And I think vhost-user will fail if you have too many sections - and
> > > the 16 sections from synic I think will blow the slots available.
> > 
> > Right, so both are internal reasons.
> > 
> > > > an API that says how this isn't normal ram would be ok.
> > > > E.g. it's not DMA'd into? Then maybe _nodma?
> > > 
> > > Do we want a new memory_region_init for that or just to be able to add
> > > a flag?
> > > 
> > > Dave
> > 
> > I think a flag API is preferable since this can apply to any kind of
> > region. But can go either way, Paolo's the maintainer there.
> 
> (Copying Paolo in)
> So what exactly does this flag mean; to me it's 'no vhost' - but is it
> actually more general?
> 
> Dave

I think it's also handy for VFIO, which should skip it too.


> > > > > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > > > > index da8ce82725..4de3ec411d 100644
> > > > > > > --- a/hw/hyperv/hyperv.c
> > > > > > > +++ b/hw/hyperv/hyperv.c
> > > > > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > > > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > > > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > > > > >  
> > > > > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > +                                    sizeof(*synic->msg_page));
> > > > > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > +                                      sizeof(*synic->event_page));
> > > > > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > > > > >  
> > > > > > >      g_free(msgp_name);
> > > > > > >      g_free(eventp_name);
> > > > > > > -- 
> > > > > > > 2.24.1
> > > > > > 
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:24             ` Roman Kagan
@ 2020-01-09 13:28               ` Dr. David Alan Gilbert
  2020-01-09 16:12                 ` Roman Kagan
  0 siblings, 1 reply; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 13:28 UTC (permalink / raw)
  To: Roman Kagan, Vitaly Kuznetsov, qemu-devel, jasowang, Michael S. Tsirkin

* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Thu, Jan 09, 2020 at 02:00:00PM +0100, Vitaly Kuznetsov wrote:
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > 
> > > And I think vhost-user will fail if you have too many sections - and
> > > the 16 sections from synic I think will blow the slots available.
> > >
> > 
> > SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
> > has, so we're potentially looking at hundreds of such regions.
> 
> Indeed.
> 
> I think my original idea to implement overlay pages word-for-word to the
> > HyperV spec was a mistake, as it led to fragmentation and memslot
> waste.
> 
> I'll look into reworking it without actually mapping extra pages over
> the existing RAM, but achieving overlay semantics by just shoving the
> *content* of the "overlaid" memory somewhere.
> 
> That said, I haven't yet fully understood how the reported issue came
> about, and thus whether the proposed approach would resolve it too.

The problem happens when we end up with:

 a)  0-512k  RAM
 b)  512k +  synic
 c)  570kish-640k  RAM

the page alignment code rounds
  (a) to 0-2MB   - aligning to the hugepage it's in
  (b) leaves as is
  (c) aligns to 0-2MB

  it then tries to coalesce (c) and (a) and notices (b) got in the way
and fails it.
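
As a standalone illustration of that rounding (assuming 2MB hugepages;
this is only the arithmetic, not the actual QEMU code):

    #include <stdio.h>
    #include <stdint.h>

    #define HPAGE (2ULL * 1024 * 1024)
    #define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
    #define ALIGN_UP(x, a)   ALIGN_DOWN((x) + (a) - 1, (a))

    int main(void)
    {
        uint64_t a_end = 512 * 1024;                       /* (a) 0-512k RAM */
        uint64_t c_start = 570 * 1024, c_end = 640 * 1024; /* (c) RAM        */

        /* both RAM sections round out to the same 0-2MB hugepage */
        printf("(a) -> 0x0..0x%llx\n",
               (unsigned long long)ALIGN_UP(a_end, HPAGE));
        printf("(c) -> 0x%llx..0x%llx\n",
               (unsigned long long)ALIGN_DOWN(c_start, HPAGE),
               (unsigned long long)ALIGN_UP(c_end, HPAGE));
        /* both print 0x0..0x200000, but the synic page (b) at 512k sits
         * inside that range, so the two can't be merged into one section */
        return 0;
    }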

Given the guest can put Synic anywhere I'm not sure that changing its
implementation would help here.
(And changing its implementation would probably break migration
compatibility.)

Dave

> Thanks,
> Roman.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:27               ` Michael S. Tsirkin
@ 2020-01-09 13:28                 ` Michael S. Tsirkin
  2020-01-09 13:40                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 13:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: jasowang, pbonzini, vkuznets, qemu-devel

On Thu, Jan 09, 2020 at 08:28:00AM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 09, 2020 at 01:22:42PM +0000, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > 
> > > > > > > > Mark the synic pages as ram_device so that they won't be visible
> > > > > > > > to vhost.
> > > > > > > > 
> > > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > 
> > > > > > > I think I disagree with this one.
> > > > > > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > > > > > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > > > > > >  * into the VM address space and access to the region will modify memory
> > > > > > >  * directly.  However, the memory region should not be included in a memory
> > > > > > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > > > > > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > > > > > >  * skip_dump flag.
> > > > > > > 
> > > > > > > Looks like an abuse of notation.
> > > > > > 
> > > > > > OK, it did feel a bit like that - any suggestions of another way to do
> > > > > > it?
> > > > > >   This clearly isn't normal RAM.
> > > > > > 
> > > > > > Dave
> > > > > 
> > > > > If it's just an optimization for vhost/postcopy/etc, then I think
> > > > 
> > > > Note it's not an optimisation; postcopy fails unless you can aggregate
> > > > the members of the hugepage.
> > > > And I think vhost-user will fail if you have too many sections - and
> > > > the 16 sections from synic I think will blow the slots available.
> > > 
> > > Right, so both are internal reasons.
> > > 
> > > > > an API that says how this isn't normal ram would be ok.
> > > > > E.g. it's not DMA'd into? Then maybe _nodma?
> > > > 
> > > > Do we want a new memory_region_init for that or just to be able to add
> > > > a flag?
> > > > 
> > > > Dave
> > > 
> > > I think a flag API is preferable since this can apply to any kind of
> > > region. But can go either way, Paolo's the maintainer there.
> > 
> > (Copying Paolo in)
> > So what exactly does this flag mean; to me it's 'no vhost' - but is it
> > actually more general?
> > 
> > Dave
> 
> I think it's also handy for VFIO, which should skip it too.

BTW if it's "per cpu" then that is another way to put it.
Neither vfio nor vhost has a concept of cpu, so neither
can support accessing per-cpu things.

> 
> > > > > > > 
> > > > > > > 
> > > > > > > > ---
> > > > > > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > > > > > index da8ce82725..4de3ec411d 100644
> > > > > > > > --- a/hw/hyperv/hyperv.c
> > > > > > > > +++ b/hw/hyperv/hyperv.c
> > > > > > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > > > > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > > > > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > > > > > >  
> > > > > > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > > > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > > > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > > > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > > > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > > > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > > +                                    sizeof(*synic->msg_page));
> > > > > > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > > +                                      sizeof(*synic->event_page));
> > > > > > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > > > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > > > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > > > > > >  
> > > > > > > >      g_free(msgp_name);
> > > > > > > >      g_free(eventp_name);
> > > > > > > > -- 
> > > > > > > > 2.24.1
> > > > > > > 
> > > > > > --
> > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > > 
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:28                 ` Michael S. Tsirkin
@ 2020-01-09 13:40                   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: jasowang, pbonzini, vkuznets, qemu-devel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, Jan 09, 2020 at 08:28:00AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 09, 2020 at 01:22:42PM +0000, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> > > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > > On Thu, Jan 09, 2020 at 12:08:20PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > > > > On Wed, Jan 08, 2020 at 01:53:53PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > > 
> > > > > > > > > Mark the synic pages as ram_device so that they won't be visible
> > > > > > > > > to vhost.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I think I disagree with this one.
> > > > > > > >  * A RAM device represents a mapping to a physical device, such as to a PCI
> > > > > > > >  * MMIO BAR of an vfio-pci assigned device.  The memory region may be mapped
> > > > > > > >  * into the VM address space and access to the region will modify memory
> > > > > > > >  * directly.  However, the memory region should not be included in a memory
> > > > > > > >  * dump (device may not be enabled/mapped at the time of the dump), and
> > > > > > > >  * operations incompatible with manipulating MMIO should be avoided.  Replaces
> > > > > > > >  * skip_dump flag.
> > > > > > > > 
> > > > > > > > Looks like an abuse of notation.
> > > > > > > 
> > > > > > > OK, it did feel a bit like that - any suggestions of another way to do
> > > > > > > it?
> > > > > > >   This clearly isn't normal RAM.
> > > > > > > 
> > > > > > > Dave
> > > > > > 
> > > > > > If it's just an optimization for vhost/postcopy/etc, then I think
> > > > > 
> > > > > Note it's not an optimisation; postcopy fails unless you can aggregate
> > > > > the members of the hugepage.
> > > > > And I think vhost-user will fail if you have too many sections - and
> > > > > the 16 sections from synic I think will blow the slots available.
> > > > 
> > > > Right, so both are internal reasons.
> > > > 
> > > > > > an API that says how this isn't normal ram would be ok.
> > > > > > E.g. it's not DMA'd into? Then maybe _nodma?
> > > > > 
> > > > > Do we want a new memory_region_init for that or just to be able to add
> > > > > a flag?
> > > > > 
> > > > > Dave
> > > > 
> > > > I think a flag API is preferable since this can apply to any kind of
> > > > region. But can go either way, Paolo's the maintainer there.
> > > 
> > > (Copying Paolo in)
> > > So what exactly does this flag mean; to me it's 'no vhost' - but is it
> > > actually more general?
> > > 
> > > Dave
> > 
> > I think it's also handy for VFIO, which should skip it too.
> 
> BTW if it's "per cpu" then that is another way to put it.
> Neither vfio nor vhost has a concept of cpu, so neither
> can support accessing per-cpu things.

In this case it's not really per-cpu in that sense; there's one of these
regions per cpu, but I think each region is mapped by all cpus.

Dave

> > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > ---
> > > > > > > > >  hw/hyperv/hyperv.c | 14 ++++++++------
> > > > > > > > >  1 file changed, 8 insertions(+), 6 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
> > > > > > > > > index da8ce82725..4de3ec411d 100644
> > > > > > > > > --- a/hw/hyperv/hyperv.c
> > > > > > > > > +++ b/hw/hyperv/hyperv.c
> > > > > > > > > @@ -95,12 +95,14 @@ static void synic_realize(DeviceState *dev, Error **errp)
> > > > > > > > >      msgp_name = g_strdup_printf("synic-%u-msg-page", vp_index);
> > > > > > > > >      eventp_name = g_strdup_printf("synic-%u-event-page", vp_index);
> > > > > > > > >  
> > > > > > > > > -    memory_region_init_ram(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > > > -                           sizeof(*synic->msg_page), &error_abort);
> > > > > > > > > -    memory_region_init_ram(&synic->event_page_mr, obj, eventp_name,
> > > > > > > > > -                           sizeof(*synic->event_page), &error_abort);
> > > > > > > > > -    synic->msg_page = memory_region_get_ram_ptr(&synic->msg_page_mr);
> > > > > > > > > -    synic->event_page = memory_region_get_ram_ptr(&synic->event_page_mr);
> > > > > > > > > +    synic->msg_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > > > +                                    sizeof(*synic->msg_page));
> > > > > > > > > +    synic->event_page = qemu_memalign(qemu_real_host_page_size,
> > > > > > > > > +                                      sizeof(*synic->event_page));
> > > > > > > > > +    memory_region_init_ram_device_ptr(&synic->msg_page_mr, obj, msgp_name,
> > > > > > > > > +                           sizeof(*synic->msg_page), synic->msg_page);
> > > > > > > > > +    memory_region_init_ram_device_ptr(&synic->event_page_mr, obj, eventp_name,
> > > > > > > > > +                           sizeof(*synic->event_page), synic->event_page);
> > > > > > > > >  
> > > > > > > > >      g_free(msgp_name);
> > > > > > > > >      g_free(eventp_name);
> > > > > > > > > -- 
> > > > > > > > > 2.24.1
> > > > > > > > 
> > > > > > > --
> > > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > > > 
> > > > > --
> > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:22             ` Dr. David Alan Gilbert
  2020-01-09 13:27               ` Michael S. Tsirkin
@ 2020-01-09 15:31               ` Paolo Bonzini
  2020-01-09 15:38                 ` Dr. David Alan Gilbert
  2020-01-09 15:40                 ` Vitaly Kuznetsov
  1 sibling, 2 replies; 35+ messages in thread
From: Paolo Bonzini @ 2020-01-09 15:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Michael S. Tsirkin; +Cc: jasowang, vkuznets, qemu-devel

On 09/01/20 14:22, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
>>> Do we want a new memory_region_init for that or just to be able to add
>>> a flag?
>>>
>> I think a flag API is preferable since this can apply to any kind of
>> region. But can go either way, Paolo's the maintainer there.
> 
> (Copying Paolo in)
> So what exactly does this flag mean; to me it's 'no vhost' - but is it
> actually more general?

It has two more effects in addition to no vhost:

1) it is skipped when dumping the guest (is this a good or bad idea for
SynIC?)

2) accesses to the region will use the specified size (e.g. 4 bytes for
address_space_stl, 1 byte for address_space_stb) instead of a memcpy.
Doesn't really matter for SynIC regions.
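
(The effect of (2) is roughly the following - a paraphrase of the
ram-device access path rather than the exact code in memory.c, and it
ignores the endianness handling:)

    #include <stdint.h>

    /* one sized load per access, never a wide memcpy over the region */
    static uint64_t ram_device_read_sketch(const void *host, uint64_t addr,
                                           unsigned size)
    {
        const uint8_t *p = (const uint8_t *)host + addr;

        switch (size) {
        case 1: return *p;
        case 2: return *(const uint16_t *)p;
        case 4: return *(const uint32_t *)p;
        case 8: return *(const uint64_t *)p;
        }
        return ~0ULL;
    }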

If (1) is a good idea, then it's 2 out of 3 and I guess the patch is okay.

Paolo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 15:31               ` Paolo Bonzini
@ 2020-01-09 15:38                 ` Dr. David Alan Gilbert
  2020-01-09 15:40                 ` Vitaly Kuznetsov
  1 sibling, 0 replies; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 15:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: jasowang, vkuznets, qemu-devel, Michael S. Tsirkin

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 09/01/20 14:22, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> >> On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
> >>> Do we want a new memory_region_init for that or just to be able to add
> >>> a flag?
> >>>
> >> I think a flag API is preferable since this can apply to any kind of
> >> region. But can go either way, Paolo's the maintainer there.
> > 
> > (Copying Paolo in)
> > So what exactly does this flag mean; to me it's 'no vhost' - but is it
> > actually more general?
> 
> It has two more effects in addition to no vhost:
> 
> 1) it is skipped when dumping the guest (is this a good or bad idea for
> SynIC?)
> 
> 2) accesses to the region will use the specified size (e.g. 4 bytes for
> address_space_stl, 1 byte for address_space_stb) instead of a memcpy.
> Doesn't really matter for SynIC regions.
> 
> If (1) is a good idea, then it's 2 out of 3 and I guess the patch is okay.

It's probably best to keep them in the dump because they give some info
on the current state of the Windows guest and interrupts.
Also, as Roman points out, the ram-device pages aren't migrated, so we
need to fix that as well.

So, do we add a new flag? If so, is no-vhost what we want?

Dave

> Paolo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 15:31               ` Paolo Bonzini
  2020-01-09 15:38                 ` Dr. David Alan Gilbert
@ 2020-01-09 15:40                 ` Vitaly Kuznetsov
  2020-07-07 10:54                   ` Paolo Bonzini
  1 sibling, 1 reply; 35+ messages in thread
From: Vitaly Kuznetsov @ 2020-01-09 15:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Roman Kagan, jasowang, Michael S. Tsirkin, qemu-devel,
	Dr. David Alan Gilbert

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/01/20 14:22, Dr. David Alan Gilbert wrote:
>> * Michael S. Tsirkin (mst@redhat.com) wrote:
>>> On Thu, Jan 09, 2020 at 12:22:37PM +0000, Dr. David Alan Gilbert wrote:
>>>> Do we want a new memory_region_init for that or just to be able to add
>>>> a flag?
>>>>
>>> I think a flag API is preferable since this can apply to any kind of
>>> region. But can go either way, Paolo's the maintainer there.
>> 
>> (Copying Paolo in)
>> So what exactly does this flag mean; to me it's 'no vhost' - but is it
>> actually more general?
>
> It has two more effects in addition to no vhost:
>
> 1) it is skipped when dumping the guest (is this a good or bad idea for
> SynIC?)

Imagine we have a not-yet-consumed message sitting in the message page, or
a signalled event; do I understand correctly that these are going to get
lost upon migration? This may not work then -- unless we transfer the
in-QEMU synic state somehow separately.

-- 
Vitaly



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 13:28               ` Dr. David Alan Gilbert
@ 2020-01-09 16:12                 ` Roman Kagan
  2020-01-09 16:27                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 16:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: jasowang, Vitaly Kuznetsov, qemu-devel, Michael S. Tsirkin

On Thu, Jan 09, 2020 at 01:28:21PM +0000, Dr. David Alan Gilbert wrote:
> * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > On Thu, Jan 09, 2020 at 02:00:00PM +0100, Vitaly Kuznetsov wrote:
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > > 
> > > > And I think vhost-user will fail if you have too many sections - and
> > > > the 16 sections from synic I think will blow the slots available.
> > > >
> > > 
> > > SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
> > > has, so we're potentially looking at hundreds of such regions.
> > 
> > Indeed.
> > 
> > I think my original idea to implement overlay pages word-for-word to the
> > HyperV spec was a mistake, as it led to fragmentation and memslot
> > waste.
> > 
> > I'll look into reworking it without actually mapping extra pages over
> > the existing RAM, but achieving overlay semantics by just shoving the
> > *content* of the "overlaid" memory somewhere.
> > 
> > That said, I haven't yet fully understood how the reported issue came
> > about, and thus whether the proposed approach would resolve it too.
> 
> The problem happens when we end up with:
> 
>  a)  0-512k  RAM
>  b)  512k +  synic
>  c)  570kish-640k  RAM
> 
> the page alignment code rounds
>   (a) to 0-2MB   - aligning to the hugepage it's in
>   (b) leaves as is
>   (c) aligns to 0-2MB
> 
>   it then tries to coalesce (c) and (a) and notices (b) got in the way
> and fails it.

I see, thanks.  The only bit I still haven't quite followed is how this
failure results in a quiet vhost malfunction rather than a refusal to
start vhost.

> Given the guest can put Synic anywhere I'm not sure that changing its
> implementation would help here.

There would be no (b) nor (separate) (c): synic would just refer to some
memory straight from (a), regardless of its paging granularity.

> (And changing its implementation would probably break migration
> compatibility.)

I'm afraid I see no better option.

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 16:12                 ` Roman Kagan
@ 2020-01-09 16:27                   ` Dr. David Alan Gilbert
  2020-01-09 17:13                     ` Roman Kagan
  0 siblings, 1 reply; 35+ messages in thread
From: Dr. David Alan Gilbert @ 2020-01-09 16:27 UTC (permalink / raw)
  To: Roman Kagan, Vitaly Kuznetsov, qemu-devel, jasowang, Michael S. Tsirkin

* Roman Kagan (rkagan@virtuozzo.com) wrote:
> On Thu, Jan 09, 2020 at 01:28:21PM +0000, Dr. David Alan Gilbert wrote:
> > * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > > On Thu, Jan 09, 2020 at 02:00:00PM +0100, Vitaly Kuznetsov wrote:
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > > > 
> > > > > And I think vhost-user will fail if you have too many sections - and
> > > > > the 16 sections from synic I think will blow the slots available.
> > > > >
> > > > 
> > > > SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
> > > > has, so we're potentially looking at hundreds of such regions.
> > > 
> > > Indeed.
> > > 
> > > I think my original idea to implement overlay pages word-for-word to the
> > > HyperV spec was a mistake, as it led to fragmentation and memslot
> > > waste.
> > > 
> > > I'll look into reworking it without actually mapping extra pages over
> > > the existing RAM, but achieving overlay semantics by just shoving the
> > > *content* of the "overlaid" memory somewhere.
> > > 
> > > That said, I haven't yet fully understood how the reported issue came
> > > about, and thus whether the proposed approach would resolve it too.
> > 
> > The problem happens when we end up with:
> > 
> >  a)  0-512k  RAM
> >  b)  512k +  synic
> >  c)  570kish-640k  RAM
> > 
> > the page alignment code rounds
> >   (a) to 0-2MB   - aligning to the hugepage it's in
> >   (b) leaves as is
> >   (c) aligns to 0-2MB
> > 
> >   it then tries to coalesce (c) and (a) and notices (b) got in the way
> > and fails it.
> 
> I see, thanks.  The only bit I still haven't quite followed is how this
> failure results in a quiet vhost malfunction rather than a refusal to
> start vhost.

Because there's no way to fail in vhost_region_add_section other than to
abort:

            if (mrs_gpa < prev_gpa_start) {
                error_report("%s:Section rounded to %"PRIx64
                             " prior to previous %"PRIx64,
                             __func__, mrs_gpa, prev_gpa_start);
                /* A way to cleanly fail here would be better */
                return;
            }
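
i.e. the section is just dropped after the error_report and vhost
carries on with an incomplete memory map, rather than failing the
device outright.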

> > Given the guest can put Synic anywhere I'm not sure that changing its
> > implementation would help here.
> 
> There would be no (b) nor (separate) (c): synic would just refer to some
> memory straight from (a), regardless of its paging granularity.

Oh, if it's actually memory from main RAM, then sure, but I guess you'd
have to reserve that somehow to stop the OS using it.

> > (And changing its implementation would probably break migration
> > compatibility.)
> 
> I'm afraid I see no better option.

Migration compatibility!

Dave

> Thanks,
> Roman.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 16:27                   ` Dr. David Alan Gilbert
@ 2020-01-09 17:13                     ` Roman Kagan
  0 siblings, 0 replies; 35+ messages in thread
From: Roman Kagan @ 2020-01-09 17:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: jasowang, Vitaly Kuznetsov, qemu-devel, Michael S. Tsirkin

On Thu, Jan 09, 2020 at 04:27:45PM +0000, Dr. David Alan Gilbert wrote:
> * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > On Thu, Jan 09, 2020 at 01:28:21PM +0000, Dr. David Alan Gilbert wrote:
> > > * Roman Kagan (rkagan@virtuozzo.com) wrote:
> > > > On Thu, Jan 09, 2020 at 02:00:00PM +0100, Vitaly Kuznetsov wrote:
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > > > > 
> > > > > > And I think vhost-user will fail if you have too many sections - and
> > > > > > the 16 sections from synic I think will blow the slots available.
> > > > > >
> > > > > 
> > > > > SynIC is percpu; it will allocate two 4k pages for every vCPU the guest
> > > > > has, so we're potentially looking at hundreds of such regions.
> > > > 
> > > > Indeed.
> > > > 
> > > > I think my original idea to implement overlay pages word-for-word to the
> > > > HyperV spec was a mistake, as it led to fragmentation and memslot
> > > > waste.
> > > > 
> > > > I'll look into reworking it without actually mapping extra pages over
> > > > the existing RAM, but achieving overlay semantics by just shoving the
> > > > *content* of the "overlaid" memory somewhere.
> > > > 
> > > > That said, I haven't yet fully understood how the reported issue came
> > > > about, and thus whether the proposed approach would resolve it too.
> > > 
> > > The problem happens when we end up with:
> > > 
> > >  a)  0-512k  RAM
> > >  b)  512k +  synic
> > >  c)  570kish-640k  RAM
> > > 
> > > the page alignment code rounds
> > >   (a) to 0-2MB   - aligning to the hugepage it's in
> > >   (b) leaves as is
> > >   (c) aligns to 0-2MB
> > > 
> > >   it then tries to coalesce (c) and (a) and notices (b) got in the way
> > > and fails it.
> > 
> > I see, thanks.  The only bit I still haven't quite followed is how this
> > failure results in a quiet vhost malfunction rather than a refusal to
> > start vhost.
> 
> Because there's no way to fail in vhost_region_add_section other than to
> abort:
> 
>             if (mrs_gpa < prev_gpa_start) {
>                 error_report("%s:Section rounded to %"PRIx64
>                              " prior to previous %"PRIx64,
>                              __func__, mrs_gpa, prev_gpa_start);
>                 /* A way to cleanly fail here would be better */
>                 return;
>             }
> 
> > > Given the guest can put Synic anywhere I'm not sure that changing its
> > > implementation would help here.
> > 
> > There would be no (b) nor (separate) (c): synic would just refer to some
> > memory straight from (a), regardless of its paging granularity.
> 
> Oh, if it's actually memory from main RAM, then sure, but I guess you'd
> have to reserve that somehow to stop the OS using it.

It's up to the OS to use it.

> > > (And changing its implementation would probably break migration
> > > compatibility.)
> > 
> > I'm afraid I see no better option.
> 
> Migration compatibility!

Hmm, good point!

I think I should be able to achieve that by keeping the current scheme
of allocating a one-page RAM memory region for every synic page but,
instead of mapping it on top of the general RAM, swapping the content with
the page being "overlaid".  Let me see if it works out (and hope the
QEMU gang won't shoot me for such an (ab)use of memory regions).
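
Something like the following, i.e. a plain content exchange (a
hypothetical sketch only - the function name and page size are made up):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* exchange the synic page's content with the overlaid RAM page,
     * instead of layering a separate memory region on top of it */
    static void synic_swap_overlay(void *synic_page, void *ram_page,
                                   size_t len)
    {
        uint8_t tmp[4096];

        assert(len <= sizeof(tmp));
        memcpy(tmp, ram_page, len);
        memcpy(ram_page, synic_page, len);
        memcpy(synic_page, tmp, len);
    }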

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] hyperv/synic: Allocate as ram_device
  2020-01-09 15:40                 ` Vitaly Kuznetsov
@ 2020-07-07 10:54                   ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2020-07-07 10:54 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Roman Kagan, jasowang, Michael S. Tsirkin, qemu-devel,
	Dr. David Alan Gilbert

On 09/01/20 16:40, Vitaly Kuznetsov wrote:
>>>>> Do we want a new memory_region_init for that or just to be able to add
>>>>> a flag?
>>>>>
>>>> I think a flag API is preferable since this can apply to any kind of
>>>> region. But can go either way, Paolo's the maintainer there.
>>> (Copying Paolo in)
>>> So what exactly does this flag mean; to me it's 'no vhost' - but is it
>>> actually more general?
>> It has two more effects in addition to no vhost:
>>
>> 1) it is skipped when dumping the guest (is this a good or bad idea for
>> SynIC?)
> Imagine we have a not-yet-consumed message sitting in the message page, or
> a signalled event; do I understand correctly that these are going to get
> lost upon migration? This may not work then -- unless we transfer the
> in-QEMU synic state somehow separately.

(Thread necromancy).

This is just dumping (and it should probably be extended to core
dumping, i.e. MADV_DONTDUMP).  Migration is controlled separately with
vmstate_register_ram.
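
(I.e., keeping the field names from this patch, the pages would stay in
the migration stream with something along these lines - a sketch, the
exact call site in synic_realize may differ:)

    /* register the backing RAM for migration despite the _ptr init;
     * 'dev' is the SynIC's DeviceState as in synic_realize() */
    vmstate_register_ram(&synic->msg_page_mr, dev);
    vmstate_register_ram(&synic->event_page_mr, dev);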

Paolo



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread

Thread overview: 35+ messages
2020-01-08 13:53 [PATCH 0/2] exclude hyperv synic sections from vhost Dr. David Alan Gilbert (git)
2020-01-08 13:53 ` [PATCH 1/2] vhost: Don't pass ram device sections Dr. David Alan Gilbert (git)
2020-01-09 11:45   ` Michael S. Tsirkin
2020-01-09 11:56     ` Michael S. Tsirkin
2020-01-09 12:38   ` Roman Kagan
2020-01-09 12:42     ` Michael S. Tsirkin
2020-01-08 13:53 ` [PATCH 2/2] hyperv/synic: Allocate as ram_device Dr. David Alan Gilbert (git)
2020-01-09 11:48   ` Michael S. Tsirkin
2020-01-09 12:08     ` Dr. David Alan Gilbert
2020-01-09 12:18       ` Michael S. Tsirkin
2020-01-09 12:22         ` Dr. David Alan Gilbert
2020-01-09 13:00           ` Vitaly Kuznetsov
2020-01-09 13:24             ` Roman Kagan
2020-01-09 13:28               ` Dr. David Alan Gilbert
2020-01-09 16:12                 ` Roman Kagan
2020-01-09 16:27                   ` Dr. David Alan Gilbert
2020-01-09 17:13                     ` Roman Kagan
2020-01-09 13:06           ` Michael S. Tsirkin
2020-01-09 13:22             ` Dr. David Alan Gilbert
2020-01-09 13:27               ` Michael S. Tsirkin
2020-01-09 13:28                 ` Michael S. Tsirkin
2020-01-09 13:40                   ` Dr. David Alan Gilbert
2020-01-09 15:31               ` Paolo Bonzini
2020-01-09 15:38                 ` Dr. David Alan Gilbert
2020-01-09 15:40                 ` Vitaly Kuznetsov
2020-07-07 10:54                   ` Paolo Bonzini
2020-01-09 13:03   ` Roman Kagan
2020-01-09 13:08     ` Dr. David Alan Gilbert
2020-01-08 14:26 ` [PATCH 0/2] exclude hyperv synic sections from vhost Vitaly Kuznetsov
2020-01-09  3:00 ` Jason Wang
2020-01-09  9:07   ` Vitaly Kuznetsov
2020-01-09 12:02   ` Dr. David Alan Gilbert
2020-01-09 12:14     ` Michael S. Tsirkin
2020-01-09 11:53 ` Roman Kagan
2020-01-09 12:16   ` Dr. David Alan Gilbert
