qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM
@ 2019-08-17  9:32 Peter Xu
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags Peter Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Peter Xu @ 2019-08-17  9:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, peterx

I can easily crash QEMU as long as KVM is used with e1000 and reboot
many times, then I hit this and QEMU aborts [1]:

  kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)

To reproduce this issue and also to avoid rebooting so many times,
simply dump the devcount from KVM would work too with this patch
applied to kernel:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c6a91b044d8d..c4f1e8a5a93c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3841,6 +3841,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,

        memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
        new_bus->dev_count++;
+       pr_info("%s: dev_count++ (%d)\n", __func__, new_bus->dev_count);
        new_bus->range[i] = range;
        memcpy(new_bus->range + i + 1, bus->range + i,
                (bus->dev_count - i) * sizeof(struct kvm_io_range));
@@ -3879,6 +3880,7 @@ void kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,

        memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range));
        new_bus->dev_count--;
+       pr_info("%s: dev_count-- (%d)\n", __func__, new_bus->dev_count);
        memcpy(new_bus->range + i, bus->range + i + 1,
               (new_bus->dev_count - i) * sizeof(struct kvm_io_range));

Just watch it increase with reboots...

After some digging, it seems to be the coalesced mmio device that
overflowed the kvm io device count.

I suspect it's not working from the very beginning when the coalesced
interfaces were introduced...  We had a fix for the addition
previously but it seems that the deletion part was still broken.  This
patchset tries to fix the two problems related to the deletion part.

IMHO the 2nd patch is a workaround of KVM in that KVM should allow
KVM_UNREGISTER_COALESCED_MMIO to work even if the user specified a
very large zone that covers multiple mmio devices.  I've a KVM patch
for that, however I didn't send it because it'll slightly change the
syscall behavior (of course it won't break any existing users in most
cases).  Please shoot if anyone thought I should post that for good.

I _think_ this should be needed by stables as well because e1000 is
still widely used (is it?) and triggering it is still not that hard
(simply reboot enough times, this should be even worse if we got more
MMIO devices, e.g., multiple e1000-like devices). I'll leave
maintainers to judge.

Please have a look, thanks.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1741863

Peter Xu (2):
  memory: Replace has_coalesced_range with add/del flags
  memory: Split zones when do coalesced_io_del()

 memory.c | 51 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 8 deletions(-)

-- 
2.21.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags
  2019-08-17  9:32 [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
@ 2019-08-17  9:32 ` Peter Xu
  2019-08-19 14:30   ` Paolo Bonzini
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del() Peter Xu
  2019-08-19  9:32 ` [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2019-08-17  9:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, peterx

The previous has_coalesced_range counter has a problem in that it only
works for additions of coalesced mmio ranges but not deletions.  The
reason is that has_coalesced_range information can be lost when the
FlatView updates the topology again when the updated region is not
covering the coalesced regions. When that happens, due to
flatrange_equal() is not checking against has_coalesced_range, the new
FlatRange will be seen as the same one as the old and the new
instance (whose has_coalesced_range will be zero) will replace the old
instance (whose has_coalesced_range _could_ be non-zero).

To fix it, we don't cache has_coalesced_range at all in the FlatRange.
Instead we introduce two flags to make sure the coalesced_io_{add|del}
will only be called once for every FlatRange instance.  This will even
work if another FlatRange replaces current one.

Without this patch, MemoryListener.coalesced_io_del is hardly being
called due to has_coalesced_range will be mostly zero in
flat_range_coalesced_io_del() when topologies frequently change for
the "memory" address space.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 memory.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/memory.c b/memory.c
index 8141486832..1a2b465a96 100644
--- a/memory.c
+++ b/memory.c
@@ -217,7 +217,13 @@ struct FlatRange {
     bool romd_mode;
     bool readonly;
     bool nonvolatile;
-    int has_coalesced_range;
+    /*
+     * Flags to show whether we have delievered the
+     * coalesced_io_{add|del} events to the listeners for this
+     * FlatRange.
+     */
+    bool coalesced_mmio_add_done;
+    bool coalesced_mmio_del_done;
 };
 
 #define FOR_EACH_FLAT_RANGE(var, view)          \
@@ -654,7 +660,8 @@ static void render_memory_region(FlatView *view,
     fr.romd_mode = mr->romd_mode;
     fr.readonly = readonly;
     fr.nonvolatile = nonvolatile;
-    fr.has_coalesced_range = 0;
+    fr.coalesced_mmio_add_done = false;
+    fr.coalesced_mmio_del_done = false;
 
     /* Render the region itself into any gaps left by the current view. */
     for (i = 0; i < view->nr && int128_nz(remain); ++i) {
@@ -857,14 +864,16 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 
 static void flat_range_coalesced_io_del(FlatRange *fr, AddressSpace *as)
 {
-    if (!fr->has_coalesced_range) {
+    if (QTAILQ_EMPTY(&fr->mr->coalesced)) {
         return;
     }
 
-    if (--fr->has_coalesced_range > 0) {
+    if (fr->coalesced_mmio_del_done) {
         return;
     }
 
+    fr->coalesced_mmio_del_done = true;
+
     MEMORY_LISTENER_UPDATE_REGION(fr, as, Reverse, coalesced_io_del,
                                   int128_get64(fr->addr.start),
                                   int128_get64(fr->addr.size));
@@ -880,10 +889,12 @@ static void flat_range_coalesced_io_add(FlatRange *fr, AddressSpace *as)
         return;
     }
 
-    if (fr->has_coalesced_range++) {
+    if (fr->coalesced_mmio_add_done) {
         return;
     }
 
+    fr->coalesced_mmio_add_done = true;
+
     QTAILQ_FOREACH(cmr, &mr->coalesced, link) {
         tmp = addrrange_shift(cmr->addr,
                               int128_sub(fr->addr.start,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del()
  2019-08-17  9:32 [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags Peter Xu
@ 2019-08-17  9:32 ` Peter Xu
  2019-08-19 14:24   ` Paolo Bonzini
  2019-08-19  9:32 ` [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2019-08-17  9:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, peterx

It is a workaround of current KVM's KVM_UNREGISTER_COALESCED_MMIO
interface.  The kernel interface only allows to unregister an mmio
device with exactly the zone size when registered, or any smaller zone
that is included in the device mmio zone.  It does not support the
userspace to specify a very large zone to remove all the small mmio
devices within the zone covered.

Logically speaking it would be nicer to fix this from KVM side, though
in all cases we still need to coop with old kernels so let's do this.

This patch has nothing to do with 3ac7d43a6fbb5d4a3 because this is
probably broken from the very beginning when the
KVM_UNREGISTER_COALESCED_MMIO interface is introduced in kernel.
However to make the backport to stables easier, I'm still using the
commit 3ac7d43a6fbb5d4a3 to track this problem because this will
depend on that otherwise even additions of mmio devices won't work.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 memory.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/memory.c b/memory.c
index 1a2b465a96..b24cdd13cf 100644
--- a/memory.c
+++ b/memory.c
@@ -864,6 +864,9 @@ static void address_space_update_ioeventfds(AddressSpace *as)
 
 static void flat_range_coalesced_io_del(FlatRange *fr, AddressSpace *as)
 {
+    CoalescedMemoryRange *cmr;
+    AddrRange tmp;
+
     if (QTAILQ_EMPTY(&fr->mr->coalesced)) {
         return;
     }
@@ -874,9 +877,30 @@ static void flat_range_coalesced_io_del(FlatRange *fr, AddressSpace *as)
 
     fr->coalesced_mmio_del_done = true;
 
-    MEMORY_LISTENER_UPDATE_REGION(fr, as, Reverse, coalesced_io_del,
-                                  int128_get64(fr->addr.start),
-                                  int128_get64(fr->addr.size));
+    /*
+     * We split the big region into smaller ones to satisfy KVM's
+     * KVM_UNREGISTER_COALESCED_MMIO interface, where it does not
+     * allow to specify a large region to unregister all the devices
+     * under that zone instead it only accepts exact zones or even a
+     * smaller zone of previously registered mmio device.  Logically
+     * speaking we should better fix KVM to allow the userspace to
+     * unregister multiple mmio devices within a large requested zone,
+     * but in all cases we'll still need to live with old kernels.  So
+     * let's simply break the zones into exactly the small pieces when
+     * we do coalesced_io_add().
+     */
+    QTAILQ_FOREACH(cmr, &fr->mr->coalesced, link) {
+        tmp = addrrange_shift(cmr->addr,
+                              int128_sub(fr->addr.start,
+                                         int128_make64(fr->offset_in_region)));
+        if (!addrrange_intersects(tmp, fr->addr)) {
+            continue;
+        }
+        tmp = addrrange_intersection(tmp, fr->addr);
+        MEMORY_LISTENER_UPDATE_REGION(fr, as, Reverse, coalesced_io_del,
+                                      int128_get64(tmp.start),
+                                      int128_get64(tmp.size));
+    }
 }
 
 static void flat_range_coalesced_io_add(FlatRange *fr, AddressSpace *as)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM
  2019-08-17  9:32 [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags Peter Xu
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del() Peter Xu
@ 2019-08-19  9:32 ` Peter Xu
  2 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2019-08-19  9:32 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini

On Sat, Aug 17, 2019 at 05:32:35PM +0800, Peter Xu wrote:
> I can easily crash QEMU as long as KVM is used with e1000 and reboot
> many times, then I hit this and QEMU aborts [1]:
> 
>   kvm_mem_ioeventfd_add: error adding ioeventfd: No space left on device (28)

Reproducer:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=on -smp 8 -m 2G -cpu host \
     -device e1000,netdev=net0 \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device e1000,netdev=net1 \
     -netdev user,id=net1 \
     -device e1000,netdev=net2 \
     -netdev user,id=net2 \
     -device e1000,netdev=net3 \
     -netdev user,id=net3 \
     -drive file=/images/default.qcow2,if=none,cache=none,id=drive0 \
     -device virtio-blk-pci,drive=drive0

This should crash for no more than 2-3 reboots.  The more e1000, the
faster.

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del()
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del() Peter Xu
@ 2019-08-19 14:24   ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2019-08-19 14:24 UTC (permalink / raw)
  To: Peter Xu, qemu-devel

On 17/08/19 11:32, Peter Xu wrote:
> It is a workaround of current KVM's KVM_UNREGISTER_COALESCED_MMIO
> interface.  The kernel interface only allows to unregister an mmio
> device with exactly the zone size when registered, or any smaller zone
> that is included in the device mmio zone.  It does not support the
> userspace to specify a very large zone to remove all the small mmio
> devices within the zone covered.
> 
> Logically speaking it would be nicer to fix this from KVM side, though
> in all cases we still need to coop with old kernels so let's do this.
> 
> This patch has nothing to do with 3ac7d43a6fbb5d4a3 because this is
> probably broken from the very beginning when the
> KVM_UNREGISTER_COALESCED_MMIO interface is introduced in kernel.
> However to make the backport to stables easier, I'm still using the
> commit 3ac7d43a6fbb5d4a3 to track this problem because this will
> depend on that otherwise even additions of mmio devices won't work.
> 
> Fixes: 3ac7d43a6fbb5d4a3
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  memory.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)

This is still messy because memory_region_add_coalescing and
memory_region_clear_coalescing modify fr->mr->colesced.

It's not hard to fix it, but not trivial either.  Probably it is
sufficient to replace memory_region_update_coalesced_range and
memory_region_update_coalesced_range_as with two pairs:

- memory_region_add_coalesced_range and
memory_region_add_coalesced_range_as, which call a new function
flat_range_coalesced_io_add_one to call the listener only on the
newly-added range (and set coalesced_mmio_add_done).
memory_region_add_coalescing then can call
memory_region_add_coalesced_range_as

- memory_region_clear_coalesced_ranges and
memory_region_clear_coalesced_ranges_as, which call
flat_range_coalesced_io_del.  Now memory_region_clear_coalescing can
call memory_region_clear_coalesced_ranges *before* emptying the list, or
exit immediately if it is empty.

Thanks,

Paolo

> diff --git a/memory.c b/memory.c
> index 1a2b465a96..b24cdd13cf 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -864,6 +864,9 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>  
>  static void flat_range_coalesced_io_del(FlatRange *fr, AddressSpace *as)
>  {
> +    CoalescedMemoryRange *cmr;
> +    AddrRange tmp;
> +
>      if (QTAILQ_EMPTY(&fr->mr->coalesced)) {
>          return;
>      }
> @@ -874,9 +877,30 @@ static void flat_range_coalesced_io_del(FlatRange *fr, AddressSpace *as)
>  
>      fr->coalesced_mmio_del_done = true;
>  
> -    MEMORY_LISTENER_UPDATE_REGION(fr, as, Reverse, coalesced_io_del,
> -                                  int128_get64(fr->addr.start),
> -                                  int128_get64(fr->addr.size));
> +    /*
> +     * We split the big region into smaller ones to satisfy KVM's
> +     * KVM_UNREGISTER_COALESCED_MMIO interface, where it does not
> +     * allow to specify a large region to unregister all the devices
> +     * under that zone instead it only accepts exact zones or even a
> +     * smaller zone of previously registered mmio device.  Logically
> +     * speaking we should better fix KVM to allow the userspace to
> +     * unregister multiple mmio devices within a large requested zone,
> +     * but in all cases we'll still need to live with old kernels.  So
> +     * let's simply break the zones into exactly the small pieces when
> +     * we do coalesced_io_add().
> +     */
> +    QTAILQ_FOREACH(cmr, &fr->mr->coalesced, link) {
> +        tmp = addrrange_shift(cmr->addr,
> +                              int128_sub(fr->addr.start,
> +                                         int128_make64(fr->offset_in_region)));
> +        if (!addrrange_intersects(tmp, fr->addr)) {
> +            continue;
> +        }
> +        tmp = addrrange_intersection(tmp, fr->addr);
> +        MEMORY_LISTENER_UPDATE_REGION(fr, as, Reverse, coalesced_io_del,
> +                                      int128_get64(tmp.start),
> +                                      int128_get64(tmp.size));
> +    }
>  }
>  
>  static void flat_range_coalesced_io_add(FlatRange *fr, AddressSpace *as)
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags
  2019-08-17  9:32 ` [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags Peter Xu
@ 2019-08-19 14:30   ` Paolo Bonzini
  2019-08-20  4:52     ` Peter Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2019-08-19 14:30 UTC (permalink / raw)
  To: Peter Xu, qemu-devel

On 17/08/19 11:32, Peter Xu wrote:
> The previous has_coalesced_range counter has a problem in that it only
> works for additions of coalesced mmio ranges but not deletions.  The
> reason is that has_coalesced_range information can be lost when the
> FlatView updates the topology again when the updated region is not
> covering the coalesced regions. When that happens, due to
> flatrange_equal() is not checking against has_coalesced_range, the new
> FlatRange will be seen as the same one as the old and the new
> instance (whose has_coalesced_range will be zero) will replace the old
> instance (whose has_coalesced_range _could_ be non-zero).
> 
> To fix it, we don't cache has_coalesced_range at all in the FlatRange.
> Instead we introduce two flags to make sure the coalesced_io_{add|del}
> will only be called once for every FlatRange instance.  This will even
> work if another FlatRange replaces current one.

It's still a bit ugly that coalesced_mmio_add_done ends up not being set
on the new (but equal) FlatRange.

Would something like this work too?

diff --git a/memory.c b/memory.c
index edd0c13..fc91f06 100644
--- a/memory.c
+++ b/memory.c
@@ -939,6 +939,7 @@ static void address_space_update_topology_pass(AddressSpace *as,
             /* In both and unchanged (except logging may have changed) */
 
             if (adding) {
+                frnew->has_coalesced_range = frold->has_coalesced_range;
                 MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, region_nop);
                 if (frnew->dirty_log_mask & ~frold->dirty_log_mask) {
                     MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, log_start,

Thanks,

Paolo

> Without this patch, MemoryListener.coalesced_io_del is hardly being
> called due to has_coalesced_range will be mostly zero in
> flat_range_coalesced_io_del() when topologies frequently change for
> the "memory" address space.



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags
  2019-08-19 14:30   ` Paolo Bonzini
@ 2019-08-20  4:52     ` Peter Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2019-08-20  4:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Mon, Aug 19, 2019 at 04:30:45PM +0200, Paolo Bonzini wrote:
> On 17/08/19 11:32, Peter Xu wrote:
> > The previous has_coalesced_range counter has a problem in that it only
> > works for additions of coalesced mmio ranges but not deletions.  The
> > reason is that has_coalesced_range information can be lost when the
> > FlatView updates the topology again when the updated region is not
> > covering the coalesced regions. When that happens, due to
> > flatrange_equal() is not checking against has_coalesced_range, the new
> > FlatRange will be seen as the same one as the old and the new
> > instance (whose has_coalesced_range will be zero) will replace the old
> > instance (whose has_coalesced_range _could_ be non-zero).
> > 
> > To fix it, we don't cache has_coalesced_range at all in the FlatRange.
> > Instead we introduce two flags to make sure the coalesced_io_{add|del}
> > will only be called once for every FlatRange instance.  This will even
> > work if another FlatRange replaces current one.
> 
> It's still a bit ugly that coalesced_mmio_add_done ends up not being set
> on the new (but equal) FlatRange.
> 
> Would something like this work too?
> 
> diff --git a/memory.c b/memory.c
> index edd0c13..fc91f06 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -939,6 +939,7 @@ static void address_space_update_topology_pass(AddressSpace *as,
>              /* In both and unchanged (except logging may have changed) */
>  
>              if (adding) {
> +                frnew->has_coalesced_range = frold->has_coalesced_range;
>                  MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, region_nop);
>                  if (frnew->dirty_log_mask & ~frold->dirty_log_mask) {
>                      MEMORY_LISTENER_UPDATE_REGION(frnew, as, Forward, log_start,

This seems to be a much better (and, shorter) idea. :-)

I'll verify it and repost if it goes well.

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-08-20  4:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-17  9:32 [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu
2019-08-17  9:32 ` [Qemu-devel] [PATCH 1/2] memory: Replace has_coalesced_range with add/del flags Peter Xu
2019-08-19 14:30   ` Paolo Bonzini
2019-08-20  4:52     ` Peter Xu
2019-08-17  9:32 ` [Qemu-devel] [PATCH 2/2] memory: Split zones when do coalesced_io_del() Peter Xu
2019-08-19 14:24   ` Paolo Bonzini
2019-08-19  9:32 ` [Qemu-devel] [PATCH 0/2] memory: Fix up coalesced_io_del not working for KVM Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).