* [Qemu-devel] Memory region allocation races
From: Andrey Korolyov @ 2016-02-17 12:22 UTC
  To: qemu-devel; +Cc: zwu.kernel, Igor Mammedov

Hello Igor, everyone,

we seem to be running into the "virtio: error trying to map MMIO
memory" issue on 'legacy' vhost-net with 64 regions, both on VMs with
a relatively small number of DIMMs (fewer than ten of 512MB each) and
on larger ones, where it can appear on literally every boot. I would
tentatively link the problem to memory fragmentation, since busier
hypervisors tend to halt VMs with this error more often, though it is
still a very rare issue and cannot be reproduced deterministically.
There also seems to be a (very non-obvious) link to CVE-2015-5307: we
started seeing these stops more frequently after rolling out the
corresponding patch. Before it, we saw this stop roughly once per 1M
machine-hours; now it appears ~20 times more often across the whole
infrastructure. Could 4de7255f7d2be5e51664c6ac6011ffd6e5463571 +
1e0994730f772580ff98754eb5595190cdf371ef (and the rest of the queue,
as carried in the RHEL kernel, for example) be of interest for fixing
the issue, or must there be another cause, given that the problem was
amplified by a patch that should only change timing (and therefore
racing) conditions? Static configurations, where all memory is
populated without DIMMs, are not affected, so we are dealing only
with backend device memory accesses above the populated base
(> 512MB):

-m 512,slots=31,maxmem=16384M \
-object memory-backend-ram,id=mem0,size=512M \
-device pc-dimm,id=dimm0,node=0,memdev=mem0 ......
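
(each further hotplugged DIMM repeats the same backend/device pair;
the mem1/dimm1 ids below are purely illustrative, standing in for
the elided tail above)

-object memory-backend-ram,id=mem1,size=512M \
-device pc-dimm,id=dimm1,node=0,memdev=mem1 \
...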

Thanks!


* Re: [Qemu-devel] Memory region allocation races
From: Igor Mammedov @ 2016-02-17 15:30 UTC
  To: Andrey Korolyov; +Cc: zwu.kernel, qemu-devel

On Wed, 17 Feb 2016 15:22:29 +0300
Andrey Korolyov <andrey@xdel.ru> wrote:

> [...] Could
> 4de7255f7d2be5e51664c6ac6011ffd6e5463571 +
> 1e0994730f772580ff98754eb5595190cdf371ef
These commits relate to supporting more than 64 regions,
so it's unlikely they would fix this issue.
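
For context, that limit is enforced kernel-side when QEMU pushes the
guest memory table into vhost. A rough userspace sketch of where it
bites (struct vhost_memory and VHOST_SET_MEM_TABLE are the real uapi
interface; vhost_fd and the set_mem_table helper name are made up for
illustration):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* vhost_fd: an open /dev/vhost-net descriptor with VHOST_SET_OWNER
 * already done. */
static int set_mem_table(int vhost_fd, unsigned int nregions)
{
    struct vhost_memory *mem;
    int r;

    mem = calloc(1, sizeof(*mem) + nregions * sizeof(mem->regions[0]));
    if (!mem)
        return -1;
    mem->nregions = nregions;
    /* ... fill guest_phys_addr / memory_size / userspace_addr for
     * each region: one entry per RAM chunk, and every hotplugged
     * pc-dimm contributes one more ... */
    r = ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
    /* before kernel 4.3 this fails with errno == E2BIG as soon as
     * nregions > VHOST_MEMORY_MAX_NREGIONS (64); later kernels turned
     * the cap into the max_mem_regions module parameter */
    free(mem);
    return r;
}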

What version of QEMU are you using?
Do you have the following QEMU series applied?
 [PATCH 0/6] virtio: handle non contigious s/g entries
It's probably what you are missing.



* Re: [Qemu-devel] Memory region allocation races
From: Andrey Korolyov @ 2016-02-17 19:07 UTC
  To: Igor Mammedov; +Cc: Zhi Yong Wu, qemu-devel

On Wed, Feb 17, 2016 at 6:30 PM, Igor Mammedov <imammedo@redhat.com> wrote:
> [...]
> What version of QEMU are you using?
> Do you have the following QEMU series applied?
>  [PATCH 0/6] virtio: handle non contigious s/g entries
> It's probably what you are missing.

Thanks, I'd missed this queue at the time!


* Re: [Qemu-devel] Memory region allocation races
From: Andrey Korolyov @ 2016-02-22 15:19 UTC
  To: Igor Mammedov; +Cc: Zhi Yong Wu, qemu-devel

> Thanks, I'd missed this queue at the time!

JFYI: virtqueue_map makes it far easier to hit 'unlimited growth' of
the writeback cache with a stuck storage backend (the writes appear
to be acked to the guest OS but are actually still floating in the
cache at that moment). We have no stable reproducer yet to put it
under valgrind; the observations come from events at scale. The cache
issue has been around for a very long time, at least since 1.1, and
we have been catching its appearance (unfortunately only while doing
delayed OOM work) with the RBD backend for years. Hopefully it will
be possible to trigger it more deterministically with this queue and
finally pin it down :)
