* [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
@ 2019-02-14 4:39 ` David Gibson
2019-02-28 13:36 ` Michael S. Tsirkin
2019-02-14 4:39 ` [Qemu-devel] [PATCH 2/5] virtio-balloon: Corrections to address verification David Gibson
` (4 subsequent siblings)
5 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-02-14 4:39 UTC (permalink / raw)
To: mst, qemu-devel; +Cc: qemu-ppc, David Gibson, David Hildenbrand
When the balloon is inflated, we discard memory place in it using madvise()
with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
sounds like it makes sense but is actually unnecessary.
The misleadingly named MADV_DONTNEED just discards the memory in question,
it doesn't set any persistent state on it in-kernel; all that's necessary
to bring the memory back is to touch it. MADV_WILLNEED in contrast
specifically says that the memory will be used soon and faults it in.
This patch simplify's the balloon operation by dropping the madvise()
on deflate. This might have an impact on performance - it will move a
delay at deflate time until that memory is actually touched, which
might be more latency sensitive. However:
* Memory that's being given back to the guest by deflating the
balloon *might* be used soon, but it equally could just sit around
in the guest's pools until needed (or even be faulted out again if
the host is under memory pressure).
* Usually, the timescale over which you'll be adjusting the balloon
is long enough that a few extra faults after deflation aren't
going to make a difference.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/virtio/virtio-balloon.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a12677d4d5..43af521884 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -35,9 +35,8 @@
static void balloon_page(void *addr, int deflate)
{
- if (!qemu_balloon_is_inhibited()) {
- qemu_madvise(addr, BALLOON_PAGE_SIZE,
- deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+ if (!qemu_balloon_is_inhibited() && !deflate) {
+ qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
}
}
--
2.20.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-02-14 4:39 ` [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
@ 2019-02-28 13:36 ` Michael S. Tsirkin
2019-03-05 0:52 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-02-28 13:36 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> When the balloon is inflated, we discard memory place in it using madvise()
> with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> sounds like it makes sense but is actually unnecessary.
>
> The misleadingly named MADV_DONTNEED just discards the memory in question,
> it doesn't set any persistent state on it in-kernel; all that's necessary
> to bring the memory back is to touch it. MADV_WILLNEED in contrast
> specifically says that the memory will be used soon and faults it in.
>
> This patch simplify's the balloon operation by dropping the madvise()
> on deflate. This might have an impact on performance - it will move a
> delay at deflate time until that memory is actually touched, which
> might be more latency sensitive. However:
>
> * Memory that's being given back to the guest by deflating the
> balloon *might* be used soon, but it equally could just sit around
> in the guest's pools until needed (or even be faulted out again if
> the host is under memory pressure).
>
> * Usually, the timescale over which you'll be adjusting the balloon
> is long enough that a few extra faults after deflation aren't
> going to make a difference.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I'm having second thoughts about this. It might affect performance but
probably won't but we have no idea. Might cause latency jitter after
deflate where it previously didn't happen. This kind of patch should
really be accompanied by benchmarking results, not philosophy.
> ---
> hw/virtio/virtio-balloon.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a12677d4d5..43af521884 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -35,9 +35,8 @@
>
> static void balloon_page(void *addr, int deflate)
> {
> - if (!qemu_balloon_is_inhibited()) {
> - qemu_madvise(addr, BALLOON_PAGE_SIZE,
> - deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> + if (!qemu_balloon_is_inhibited() && !deflate) {
> + qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
> }
> }
>
> --
> 2.20.1
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-02-28 13:36 ` Michael S. Tsirkin
@ 2019-03-05 0:52 ` David Gibson
2019-03-05 2:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-03-05 0:52 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]
On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > When the balloon is inflated, we discard memory place in it using madvise()
> > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > sounds like it makes sense but is actually unnecessary.
> >
> > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > it doesn't set any persistent state on it in-kernel; all that's necessary
> > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > specifically says that the memory will be used soon and faults it in.
> >
> > This patch simplify's the balloon operation by dropping the madvise()
> > on deflate. This might have an impact on performance - it will move a
> > delay at deflate time until that memory is actually touched, which
> > might be more latency sensitive. However:
> >
> > * Memory that's being given back to the guest by deflating the
> > balloon *might* be used soon, but it equally could just sit around
> > in the guest's pools until needed (or even be faulted out again if
> > the host is under memory pressure).
> >
> > * Usually, the timescale over which you'll be adjusting the balloon
> > is long enough that a few extra faults after deflation aren't
> > going to make a difference.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>
> I'm having second thoughts about this. It might affect performance but
> probably won't but we have no idea. Might cause latency jitter after
> deflate where it previously didn't happen. This kind of patch should
> really be accompanied by benchmarking results, not philosophy.
I guess I see your point, much as it's annoying to spend time
benchmarking a device that's basically broken by design.
That said.. I don't really know how I'd go about benchmarking it. Any
guesses at a suitable workload which would be most likely to show a
performance degradation here?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-05 0:52 ` David Gibson
@ 2019-03-05 2:29 ` Michael S. Tsirkin
2019-03-05 5:03 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-03-05 2:29 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > When the balloon is inflated, we discard memory place in it using madvise()
> > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > sounds like it makes sense but is actually unnecessary.
> > >
> > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > specifically says that the memory will be used soon and faults it in.
> > >
> > > This patch simplify's the balloon operation by dropping the madvise()
> > > on deflate. This might have an impact on performance - it will move a
> > > delay at deflate time until that memory is actually touched, which
> > > might be more latency sensitive. However:
> > >
> > > * Memory that's being given back to the guest by deflating the
> > > balloon *might* be used soon, but it equally could just sit around
> > > in the guest's pools until needed (or even be faulted out again if
> > > the host is under memory pressure).
> > >
> > > * Usually, the timescale over which you'll be adjusting the balloon
> > > is long enough that a few extra faults after deflation aren't
> > > going to make a difference.
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > I'm having second thoughts about this. It might affect performance but
> > probably won't but we have no idea. Might cause latency jitter after
> > deflate where it previously didn't happen. This kind of patch should
> > really be accompanied by benchmarking results, not philosophy.
>
> I guess I see your point, much as it's annoying to spend time
> benchmarking a device that's basically broken by design.
Because of 4K page thing? It's an annoying bug for sure. There were
patches to add a feature bit to just switch to plan s/g format, but they
were abandoned. You are welcome to revive them though.
Additionally or alternatively, we can easily add a field specifying page size.
> That said.. I don't really know how I'd go about benchmarking it. Any
> guesses at a suitable workload which would be most likely to show a
> performance degradation here?
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
Here's one idea - I tried to come up with a worst case scenario here.
Basically based on idea by Alex Duyck. All credits are his, all bugs are
mine:
Setup:
Memory-15837 MB
Guest Memory Size-5 GB
Swap-Disabled
Test Program-Simple program which allocates 4GB memory via malloc, touches it via memset and exits.
Use case-Number of guests that can be launched completely including the successful execution of the test program.
Procedure:
Setup:
A first guest is launched and once its console is up,
test allocation program is executed with 4 GB memory request (Due to
this the guest occupies almost 4-5 GB of memory in the host)
Afterwards balloon is inflated by 4Gbyte in the guest.
We continue launching the guests until a guest gets
killed due to low memory condition in the host.
Now repeatedly, in each guest in turn, balloon is deflated and
test allocation program is executed with 4 GB memory request (Due to
this the guest occupies almost 4-5 GB of memory in the host)
After program finishes balloon is inflated by 4GB again.
Then we switch to another guest.
Time how many cycles of this we can do.
Hope this helps.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-05 2:29 ` Michael S. Tsirkin
@ 2019-03-05 5:03 ` David Gibson
2019-03-05 14:41 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-03-05 5:03 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
[-- Attachment #1: Type: text/plain, Size: 4431 bytes --]
On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > > When the balloon is inflated, we discard memory place in it using madvise()
> > > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > > sounds like it makes sense but is actually unnecessary.
> > > >
> > > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > > specifically says that the memory will be used soon and faults it in.
> > > >
> > > > This patch simplify's the balloon operation by dropping the madvise()
> > > > on deflate. This might have an impact on performance - it will move a
> > > > delay at deflate time until that memory is actually touched, which
> > > > might be more latency sensitive. However:
> > > >
> > > > * Memory that's being given back to the guest by deflating the
> > > > balloon *might* be used soon, but it equally could just sit around
> > > > in the guest's pools until needed (or even be faulted out again if
> > > > the host is under memory pressure).
> > > >
> > > > * Usually, the timescale over which you'll be adjusting the balloon
> > > > is long enough that a few extra faults after deflation aren't
> > > > going to make a difference.
> > > >
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > I'm having second thoughts about this. It might affect performance but
> > > probably won't but we have no idea. Might cause latency jitter after
> > > deflate where it previously didn't happen. This kind of patch should
> > > really be accompanied by benchmarking results, not philosophy.
> >
> > I guess I see your point, much as it's annoying to spend time
> > benchmarking a device that's basically broken by design.
>
> Because of 4K page thing?
For one thing. I believe David H has bunch of other reasons.
> It's an annoying bug for sure. There were
> patches to add a feature bit to just switch to plan s/g format, but they
> were abandoned. You are welcome to revive them though.
> Additionally or alternatively, we can easily add a field specifying
> page size.
We could, but I'm pretty disinclined to work on this when virtio-mem
is a better solution in nearly every way.
> > That said.. I don't really know how I'd go about benchmarking it. Any
> > guesses at a suitable workload which would be most likely to show a
> > performance degradation here?
>
> Here's one idea - I tried to come up with a worst case scenario here.
> Basically based on idea by Alex Duyck. All credits are his, all bugs are
> mine:
Ok. I'll try to find time to implement this and test it.
> Setup:
> Memory-15837 MB
> Guest Memory Size-5 GB
> Swap-Disabled
> Test Program-Simple program which allocates 4GB memory via malloc, touches it via memset and exits.
> Use case-Number of guests that can be launched completely including the successful execution of the test program.
> Procedure:
> Setup:
> A first guest is launched and once its console is up,
> test allocation program is executed with 4 GB memory request (Due to
> this the guest occupies almost 4-5 GB of memory in the host)
> Afterwards balloon is inflated by 4Gbyte in the guest.
> We continue launching the guests until a guest gets
> killed due to low memory condition in the host.
>
>
> Now repeatedly, in each guest in turn, balloon is deflated and
> test allocation program is executed with 4 GB memory request (Due to
> this the guest occupies almost 4-5 GB of memory in the host)
> After program finishes balloon is inflated by 4GB again.
>
> Then we switch to another guest.
>
> Time how many cycles of this we can do.
>
>
> Hope this helps.
>
>
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-05 5:03 ` David Gibson
@ 2019-03-05 14:41 ` Michael S. Tsirkin
2019-03-05 23:35 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-03-05 14:41 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > > On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > > > When the balloon is inflated, we discard memory place in it using madvise()
> > > > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > > > sounds like it makes sense but is actually unnecessary.
> > > > >
> > > > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > > > specifically says that the memory will be used soon and faults it in.
> > > > >
> > > > > This patch simplify's the balloon operation by dropping the madvise()
> > > > > on deflate. This might have an impact on performance - it will move a
> > > > > delay at deflate time until that memory is actually touched, which
> > > > > might be more latency sensitive. However:
> > > > >
> > > > > * Memory that's being given back to the guest by deflating the
> > > > > balloon *might* be used soon, but it equally could just sit around
> > > > > in the guest's pools until needed (or even be faulted out again if
> > > > > the host is under memory pressure).
> > > > >
> > > > > * Usually, the timescale over which you'll be adjusting the balloon
> > > > > is long enough that a few extra faults after deflation aren't
> > > > > going to make a difference.
> > > > >
> > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > >
> > > > I'm having second thoughts about this. It might affect performance but
> > > > probably won't but we have no idea. Might cause latency jitter after
> > > > deflate where it previously didn't happen. This kind of patch should
> > > > really be accompanied by benchmarking results, not philosophy.
> > >
> > > I guess I see your point, much as it's annoying to spend time
> > > benchmarking a device that's basically broken by design.
> >
> > Because of 4K page thing?
>
> For one thing. I believe David H has bunch of other reasons.
>
> > It's an annoying bug for sure. There were
> > patches to add a feature bit to just switch to plan s/g format, but they
> > were abandoned. You are welcome to revive them though.
> > Additionally or alternatively, we can easily add a field specifying
> > page size.
>
> We could, but I'm pretty disinclined to work on this when virtio-mem
> is a better solution in nearly every way.
Then one way would be to just let balloon be. Make it behave same as
always and don't make changes to it :)
> > > That said.. I don't really know how I'd go about benchmarking it. Any
> > > guesses at a suitable workload which would be most likely to show a
> > > performance degradation here?
> >
> > Here's one idea - I tried to come up with a worst case scenario here.
> > Basically based on idea by Alex Duyck. All credits are his, all bugs are
> > mine:
>
> Ok. I'll try to find time to implement this and test it.
>
> > Setup:
> > Memory-15837 MB
> > Guest Memory Size-5 GB
> > Swap-Disabled
> > Test Program-Simple program which allocates 4GB memory via malloc, touches it via memset and exits.
> > Use case-Number of guests that can be launched completely including the successful execution of the test program.
> > Procedure:
> > Setup:
> > A first guest is launched and once its console is up,
> > test allocation program is executed with 4 GB memory request (Due to
> > this the guest occupies almost 4-5 GB of memory in the host)
> > Afterwards balloon is inflated by 4Gbyte in the guest.
> > We continue launching the guests until a guest gets
> > killed due to low memory condition in the host.
> >
> >
> > Now repeatedly, in each guest in turn, balloon is deflated and
> > test allocation program is executed with 4 GB memory request (Due to
> > this the guest occupies almost 4-5 GB of memory in the host)
> > After program finishes balloon is inflated by 4GB again.
> >
> > Then we switch to another guest.
> >
> > Time how many cycles of this we can do.
> >
> >
> > Hope this helps.
> >
> >
> >
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-05 14:41 ` Michael S. Tsirkin
@ 2019-03-05 23:35 ` David Gibson
2019-03-06 0:14 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-03-05 23:35 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
[-- Attachment #1: Type: text/plain, Size: 5314 bytes --]
On Tue, Mar 05, 2019 at 09:41:34AM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> > On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > > > On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > > > > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > > > > When the balloon is inflated, we discard memory place in it using madvise()
> > > > > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > > > > sounds like it makes sense but is actually unnecessary.
> > > > > >
> > > > > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > > > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > > > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > > > > specifically says that the memory will be used soon and faults it in.
> > > > > >
> > > > > > This patch simplify's the balloon operation by dropping the madvise()
> > > > > > on deflate. This might have an impact on performance - it will move a
> > > > > > delay at deflate time until that memory is actually touched, which
> > > > > > might be more latency sensitive. However:
> > > > > >
> > > > > > * Memory that's being given back to the guest by deflating the
> > > > > > balloon *might* be used soon, but it equally could just sit around
> > > > > > in the guest's pools until needed (or even be faulted out again if
> > > > > > the host is under memory pressure).
> > > > > >
> > > > > > * Usually, the timescale over which you'll be adjusting the balloon
> > > > > > is long enough that a few extra faults after deflation aren't
> > > > > > going to make a difference.
> > > > > >
> > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > >
> > > > > I'm having second thoughts about this. It might affect performance but
> > > > > probably won't but we have no idea. Might cause latency jitter after
> > > > > deflate where it previously didn't happen. This kind of patch should
> > > > > really be accompanied by benchmarking results, not philosophy.
> > > >
> > > > I guess I see your point, much as it's annoying to spend time
> > > > benchmarking a device that's basically broken by design.
> > >
> > > Because of 4K page thing?
> >
> > For one thing. I believe David H has bunch of other reasons.
> >
> > > It's an annoying bug for sure. There were
> > > patches to add a feature bit to just switch to plan s/g format, but they
> > > were abandoned. You are welcome to revive them though.
> > > Additionally or alternatively, we can easily add a field specifying
> > > page size.
> >
> > We could, but I'm pretty disinclined to work on this when virtio-mem
> > is a better solution in nearly every way.
>
> Then one way would be to just let balloon be. Make it behave same as
> always and don't make changes to it :)
I'd love to, but it is in real world use, so I think we do need to fix
serious bugs in it - at least if they can be fixed on one side,
without needing to roll out both qemu and guest changes (which adding
page size negotiation would require).
>
> > > > That said.. I don't really know how I'd go about benchmarking it. Any
> > > > guesses at a suitable workload which would be most likely to show a
> > > > performance degradation here?
> > >
> > > Here's one idea - I tried to come up with a worst case scenario here.
> > > Basically based on idea by Alex Duyck. All credits are his, all bugs are
> > > mine:
> >
> > Ok. I'll try to find time to implement this and test it.
> >
> > > Setup:
> > > Memory-15837 MB
> > > Guest Memory Size-5 GB
> > > Swap-Disabled
> > > Test Program-Simple program which allocates 4GB memory via malloc, touches it via memset and exits.
> > > Use case-Number of guests that can be launched completely including the successful execution of the test program.
> > > Procedure:
> > > Setup:
> > > A first guest is launched and once its console is up,
> > > test allocation program is executed with 4 GB memory request (Due to
> > > this the guest occupies almost 4-5 GB of memory in the host)
> > > Afterwards balloon is inflated by 4Gbyte in the guest.
> > > We continue launching the guests until a guest gets
> > > killed due to low memory condition in the host.
> > >
> > >
> > > Now repeatedly, in each guest in turn, balloon is deflated and
> > > test allocation program is executed with 4 GB memory request (Due to
> > > this the guest occupies almost 4-5 GB of memory in the host)
> > > After program finishes balloon is inflated by 4GB again.
> > >
> > > Then we switch to another guest.
> > >
> > > Time how many cycles of this we can do.
> > >
> > >
> > > Hope this helps.
> > >
> > >
> > >
> >
>
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-05 23:35 ` David Gibson
@ 2019-03-06 0:14 ` Michael S. Tsirkin
2019-03-06 0:58 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06 0:14 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
On Wed, Mar 06, 2019 at 10:35:12AM +1100, David Gibson wrote:
> On Tue, Mar 05, 2019 at 09:41:34AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> > > On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > > > > On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > > > > > When the balloon is inflated, we discard memory place in it using madvise()
> > > > > > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > > > > > sounds like it makes sense but is actually unnecessary.
> > > > > > >
> > > > > > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > > > > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > > > > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > > > > > specifically says that the memory will be used soon and faults it in.
> > > > > > >
> > > > > > > This patch simplify's the balloon operation by dropping the madvise()
> > > > > > > on deflate. This might have an impact on performance - it will move a
> > > > > > > delay at deflate time until that memory is actually touched, which
> > > > > > > might be more latency sensitive. However:
> > > > > > >
> > > > > > > * Memory that's being given back to the guest by deflating the
> > > > > > > balloon *might* be used soon, but it equally could just sit around
> > > > > > > in the guest's pools until needed (or even be faulted out again if
> > > > > > > the host is under memory pressure).
> > > > > > >
> > > > > > > * Usually, the timescale over which you'll be adjusting the balloon
> > > > > > > is long enough that a few extra faults after deflation aren't
> > > > > > > going to make a difference.
> > > > > > >
> > > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > >
> > > > > > I'm having second thoughts about this. It might affect performance but
> > > > > > probably won't but we have no idea. Might cause latency jitter after
> > > > > > deflate where it previously didn't happen. This kind of patch should
> > > > > > really be accompanied by benchmarking results, not philosophy.
> > > > >
> > > > > I guess I see your point, much as it's annoying to spend time
> > > > > benchmarking a device that's basically broken by design.
> > > >
> > > > Because of 4K page thing?
> > >
> > > For one thing. I believe David H has bunch of other reasons.
> > >
> > > > It's an annoying bug for sure. There were
> > > > patches to add a feature bit to just switch to plan s/g format, but they
> > > > were abandoned. You are welcome to revive them though.
> > > > Additionally or alternatively, we can easily add a field specifying
> > > > page size.
> > >
> > > We could, but I'm pretty disinclined to work on this when virtio-mem
> > > is a better solution in nearly every way.
> >
> > Then one way would be to just let balloon be. Make it behave same as
> > always and don't make changes to it :)
>
> I'd love to, but it is in real world use, so I think we do need to fix
> serious bugs in it - at least if they can be fixed on one side,
> without needing to roll out both qemu and guest changes (which adding
> page size negotiation would require).
Absolutely I'm just saying don't add optimizations in that case :)
> >
> > > > > That said.. I don't really know how I'd go about benchmarking it. Any
> > > > > guesses at a suitable workload which would be most likely to show a
> > > > > performance degradation here?
> > > >
> > > > Here's one idea - I tried to come up with a worst case scenario here.
> > > > Basically based on idea by Alex Duyck. All credits are his, all bugs are
> > > > mine:
> > >
> > > Ok. I'll try to find time to implement this and test it.
> > >
> > > > Setup:
> > > > Memory-15837 MB
> > > > Guest Memory Size-5 GB
> > > > Swap-Disabled
> > > > Test Program-Simple program which allocates 4GB memory via malloc, touches it via memset and exits.
> > > > Use case-Number of guests that can be launched completely including the successful execution of the test program.
> > > > Procedure:
> > > > Setup:
> > > > A first guest is launched and once its console is up,
> > > > test allocation program is executed with 4 GB memory request (Due to
> > > > this the guest occupies almost 4-5 GB of memory in the host)
> > > > Afterwards balloon is inflated by 4Gbyte in the guest.
> > > > We continue launching the guests until a guest gets
> > > > killed due to low memory condition in the host.
> > > >
> > > >
> > > > Now repeatedly, in each guest in turn, balloon is deflated and
> > > > test allocation program is executed with 4 GB memory request (Due to
> > > > this the guest occupies almost 4-5 GB of memory in the host)
> > > > After program finishes balloon is inflated by 4GB again.
> > > >
> > > > Then we switch to another guest.
> > > >
> > > > Time how many cycles of this we can do.
> > > >
> > > >
> > > > Hope this helps.
> > > >
> > > >
> > > >
> > >
> >
> >
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
2019-03-06 0:14 ` Michael S. Tsirkin
@ 2019-03-06 0:58 ` David Gibson
0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2019-03-06 0:58 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc, David Hildenbrand
[-- Attachment #1: Type: text/plain, Size: 4131 bytes --]
On Tue, Mar 05, 2019 at 07:14:09PM -0500, Michael S. Tsirkin wrote:
> On Wed, Mar 06, 2019 at 10:35:12AM +1100, David Gibson wrote:
> > On Tue, Mar 05, 2019 at 09:41:34AM -0500, Michael S. Tsirkin wrote:
> > > On Tue, Mar 05, 2019 at 04:03:00PM +1100, David Gibson wrote:
> > > > On Mon, Mar 04, 2019 at 09:29:24PM -0500, Michael S. Tsirkin wrote:
> > > > > On Tue, Mar 05, 2019 at 11:52:08AM +1100, David Gibson wrote:
> > > > > > On Thu, Feb 28, 2019 at 08:36:58AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Feb 14, 2019 at 03:39:12PM +1100, David Gibson wrote:
> > > > > > > > When the balloon is inflated, we discard memory place in it using madvise()
> > > > > > > > with MADV_DONTNEED. And when we deflate it we use MADV_WILLNEED, which
> > > > > > > > sounds like it makes sense but is actually unnecessary.
> > > > > > > >
> > > > > > > > The misleadingly named MADV_DONTNEED just discards the memory in question,
> > > > > > > > it doesn't set any persistent state on it in-kernel; all that's necessary
> > > > > > > > to bring the memory back is to touch it. MADV_WILLNEED in contrast
> > > > > > > > specifically says that the memory will be used soon and faults it in.
> > > > > > > >
> > > > > > > > This patch simplify's the balloon operation by dropping the madvise()
> > > > > > > > on deflate. This might have an impact on performance - it will move a
> > > > > > > > delay at deflate time until that memory is actually touched, which
> > > > > > > > might be more latency sensitive. However:
> > > > > > > >
> > > > > > > > * Memory that's being given back to the guest by deflating the
> > > > > > > > balloon *might* be used soon, but it equally could just sit around
> > > > > > > > in the guest's pools until needed (or even be faulted out again if
> > > > > > > > the host is under memory pressure).
> > > > > > > >
> > > > > > > > * Usually, the timescale over which you'll be adjusting the balloon
> > > > > > > > is long enough that a few extra faults after deflation aren't
> > > > > > > > going to make a difference.
> > > > > > > >
> > > > > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > >
> > > > > > > I'm having second thoughts about this. It might affect performance but
> > > > > > > probably won't but we have no idea. Might cause latency jitter after
> > > > > > > deflate where it previously didn't happen. This kind of patch should
> > > > > > > really be accompanied by benchmarking results, not philosophy.
> > > > > >
> > > > > > I guess I see your point, much as it's annoying to spend time
> > > > > > benchmarking a device that's basically broken by design.
> > > > >
> > > > > Because of 4K page thing?
> > > >
> > > > For one thing. I believe David H has bunch of other reasons.
> > > >
> > > > > It's an annoying bug for sure. There were
> > > > > patches to add a feature bit to just switch to plan s/g format, but they
> > > > > were abandoned. You are welcome to revive them though.
> > > > > Additionally or alternatively, we can easily add a field specifying
> > > > > page size.
> > > >
> > > > We could, but I'm pretty disinclined to work on this when virtio-mem
> > > > is a better solution in nearly every way.
> > >
> > > Then one way would be to just let balloon be. Make it behave same as
> > > always and don't make changes to it :)
> >
> > I'd love to, but it is in real world use, so I think we do need to fix
> > serious bugs in it - at least if they can be fixed on one side,
> > without needing to roll out both qemu and guest changes (which adding
> > page size negotiation would require).
>
>
> Absolutely I'm just saying don't add optimizations in that case :)
I don't plan to.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Qemu-devel] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
2019-02-14 4:39 ` [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
@ 2019-02-14 4:39 ` David Gibson
2019-02-22 9:08 ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2019-02-14 4:39 ` [Qemu-devel] [PATCH 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
` (3 subsequent siblings)
5 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-02-14 4:39 UTC (permalink / raw)
To: mst, qemu-devel; +Cc: qemu-ppc, David Gibson, David Hildenbrand
The virtio-balloon device's verification of the address given to it by the
guest has a number of faults:
* The addresses here are guest physical addresses, which should be
'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
pretty subtle and confusing)
* We don't check for section.mr being NULL, which is the main way that
memory_region_find() reports basic failures. We really need to check
that before looking at any other section fields, because
memory_region_find() doesn't initialize them on the failure path
* We're passing a length of '1' to memory_region_find(), but really the
guest is requesting that we put the entire page into the balloon,
so it makes more sense to call it with BALLOON_PAGE_SIZE
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/virtio/virtio-balloon.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 43af521884..eb357824d8 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
}
while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
- ram_addr_t pa;
- ram_addr_t addr;
+ hwaddr pa;
+ hwaddr addr;
int p = virtio_ldl_p(vdev, &pfn);
- pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
+ pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
offset += 4;
- /* FIXME: remove get_system_memory(), but how? */
- section = memory_region_find(get_system_memory(), pa, 1);
- if (!int128_nz(section.size) ||
- !memory_region_is_ram(section.mr) ||
+ section = memory_region_find(get_system_memory(), pa,
+ BALLOON_PAGE_SIZE);
+ if (!section.mr) {
+ trace_virtio_balloon_bad_addr(pa);
+ continue;
+ }
+ if (!memory_region_is_ram(section.mr) ||
memory_region_is_rom(section.mr) ||
memory_region_is_romd(section.mr)) {
trace_virtio_balloon_bad_addr(pa);
--
2.20.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-14 4:39 ` [Qemu-devel] [PATCH 2/5] virtio-balloon: Corrections to address verification David Gibson
@ 2019-02-22 9:08 ` Greg Kurz
2019-02-24 23:37 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Greg Kurz @ 2019-02-22 9:08 UTC (permalink / raw)
To: David Gibson; +Cc: mst, qemu-devel, David Hildenbrand, qemu-ppc
On Thu, 14 Feb 2019 15:39:13 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> The virtio-balloon device's verification of the address given to it by the
> guest has a number of faults:
> * The addresses here are guest physical addresses, which should be
> 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> pretty subtle and confusing)
> * We don't check for section.mr being NULL, which is the main way that
> memory_region_find() reports basic failures. We really need to check
> that before looking at any other section fields, because
> memory_region_find() doesn't initialize them on the failure path
> * We're passing a length of '1' to memory_region_find(), but really the
> guest is requesting that we put the entire page into the balloon,
> so it makes more sense to call it with BALLOON_PAGE_SIZE
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> hw/virtio/virtio-balloon.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 43af521884..eb357824d8 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> }
>
> while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> - ram_addr_t pa;
> - ram_addr_t addr;
> + hwaddr pa;
> + hwaddr addr;
> int p = virtio_ldl_p(vdev, &pfn);
>
> - pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> + pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> offset += 4;
>
> - /* FIXME: remove get_system_memory(), but how? */
> - section = memory_region_find(get_system_memory(), pa, 1);
> - if (!int128_nz(section.size) ||
> - !memory_region_is_ram(section.mr) ||
> + section = memory_region_find(get_system_memory(), pa,
> + BALLOON_PAGE_SIZE);
> + if (!section.mr) {
> + trace_virtio_balloon_bad_addr(pa);
> + continue;
> + }
memory_region_unref(section.mr) with section.mr == NULL is safe and
resolves to a nop. Not sure you need a separate if to handle this
case.
Apart from that,
Reviewed-by: Greg Kurz <groug@kaod.org>
> + if (!memory_region_is_ram(section.mr) ||
> memory_region_is_rom(section.mr) ||
> memory_region_is_romd(section.mr)) {
> trace_virtio_balloon_bad_addr(pa);
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-22 9:08 ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2019-02-24 23:37 ` David Gibson
2019-02-25 9:26 ` Greg Kurz
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-02-24 23:37 UTC (permalink / raw)
To: Greg Kurz; +Cc: mst, qemu-devel, David Hildenbrand, qemu-ppc
[-- Attachment #1: Type: text/plain, Size: 3368 bytes --]
On Fri, Feb 22, 2019 at 10:08:22AM +0100, Greg Kurz wrote:
> On Thu, 14 Feb 2019 15:39:13 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > The virtio-balloon device's verification of the address given to it by the
> > guest has a number of faults:
> > * The addresses here are guest physical addresses, which should be
> > 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> > pretty subtle and confusing)
> > * We don't check for section.mr being NULL, which is the main way that
> > memory_region_find() reports basic failures. We really need to check
> > that before looking at any other section fields, because
> > memory_region_find() doesn't initialize them on the failure path
> > * We're passing a length of '1' to memory_region_find(), but really the
> > guest is requesting that we put the entire page into the balloon,
> > so it makes more sense to call it with BALLOON_PAGE_SIZE
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > hw/virtio/virtio-balloon.c | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 43af521884..eb357824d8 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > }
> >
> > while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> > - ram_addr_t pa;
> > - ram_addr_t addr;
> > + hwaddr pa;
> > + hwaddr addr;
> > int p = virtio_ldl_p(vdev, &pfn);
> >
> > - pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> > + pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> > offset += 4;
> >
> > - /* FIXME: remove get_system_memory(), but how? */
> > - section = memory_region_find(get_system_memory(), pa, 1);
> > - if (!int128_nz(section.size) ||
> > - !memory_region_is_ram(section.mr) ||
> > + section = memory_region_find(get_system_memory(), pa,
> > + BALLOON_PAGE_SIZE);
> > + if (!section.mr) {
> > + trace_virtio_balloon_bad_addr(pa);
> > + continue;
> > + }
>
> memory_region_unref(section.mr) with section.mr == NULL is safe and
> resolves to a nop. Not sure you need a separate if to handle this
> case.
memory_region_is_ram() and friends are not, however - they will
dereference their argument unconditionally.
>
> Apart from that,
>
> Reviewed-by: Greg Kurz <groug@kaod.org>
>
> > + if (!memory_region_is_ram(section.mr) ||
> > memory_region_is_rom(section.mr) ||
> > memory_region_is_romd(section.mr)) {
> > trace_virtio_balloon_bad_addr(pa);
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-24 23:37 ` David Gibson
@ 2019-02-25 9:26 ` Greg Kurz
2019-02-26 23:20 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Greg Kurz @ 2019-02-25 9:26 UTC (permalink / raw)
To: David Gibson; +Cc: mst, qemu-devel, David Hildenbrand, qemu-ppc
[-- Attachment #1: Type: text/plain, Size: 3808 bytes --]
On Mon, 25 Feb 2019 10:37:11 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Feb 22, 2019 at 10:08:22AM +0100, Greg Kurz wrote:
> > On Thu, 14 Feb 2019 15:39:13 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > The virtio-balloon device's verification of the address given to it by the
> > > guest has a number of faults:
> > > * The addresses here are guest physical addresses, which should be
> > > 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> > > pretty subtle and confusing)
> > > * We don't check for section.mr being NULL, which is the main way that
> > > memory_region_find() reports basic failures. We really need to check
> > > that before looking at any other section fields, because
> > > memory_region_find() doesn't initialize them on the failure path
> > > * We're passing a length of '1' to memory_region_find(), but really the
> > > guest is requesting that we put the entire page into the balloon,
> > > so it makes more sense to call it with BALLOON_PAGE_SIZE
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > > hw/virtio/virtio-balloon.c | 17 ++++++++++-------
> > > 1 file changed, 10 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > index 43af521884..eb357824d8 100644
> > > --- a/hw/virtio/virtio-balloon.c
> > > +++ b/hw/virtio/virtio-balloon.c
> > > @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > }
> > >
> > > while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> > > - ram_addr_t pa;
> > > - ram_addr_t addr;
> > > + hwaddr pa;
> > > + hwaddr addr;
> > > int p = virtio_ldl_p(vdev, &pfn);
> > >
> > > - pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > + pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > offset += 4;
> > >
> > > - /* FIXME: remove get_system_memory(), but how? */
> > > - section = memory_region_find(get_system_memory(), pa, 1);
> > > - if (!int128_nz(section.size) ||
> > > - !memory_region_is_ram(section.mr) ||
> > > + section = memory_region_find(get_system_memory(), pa,
> > > + BALLOON_PAGE_SIZE);
> > > + if (!section.mr) {
> > > + trace_virtio_balloon_bad_addr(pa);
> > > + continue;
> > > + }
> >
> > memory_region_unref(section.mr) with section.mr == NULL is safe and
> > resolves to a nop. Not sure you need a separate if to handle this
> > case.
>
> memory_region_is_ram() and friends are not, however - they will
> dereference their argument unconditionally.
>
Indeed but the two ifs can be merged anyway:
if (!section.mr ||
!memory_region_is_ram(section.mr) ||
memory_region_is_rom(section.mr) ||
memory_region_is_romd(section.mr)) {
trace_virtio_balloon_bad_addr(pa);
memory_region_unref(section.mr);
continue;
}
> >
> > Apart from that,
> >
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >
> > > + if (!memory_region_is_ram(section.mr) ||
> > > memory_region_is_rom(section.mr) ||
> > > memory_region_is_romd(section.mr)) {
> > > trace_virtio_balloon_bad_addr(pa);
> >
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-25 9:26 ` Greg Kurz
@ 2019-02-26 23:20 ` David Gibson
2019-02-28 9:09 ` Greg Kurz
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-02-26 23:20 UTC (permalink / raw)
To: Greg Kurz; +Cc: mst, qemu-devel, David Hildenbrand, qemu-ppc
[-- Attachment #1: Type: text/plain, Size: 4383 bytes --]
On Mon, Feb 25, 2019 at 10:26:51AM +0100, Greg Kurz wrote:
> On Mon, 25 Feb 2019 10:37:11 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > On Fri, Feb 22, 2019 at 10:08:22AM +0100, Greg Kurz wrote:
> > > On Thu, 14 Feb 2019 15:39:13 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > The virtio-balloon device's verification of the address given to it by the
> > > > guest has a number of faults:
> > > > * The addresses here are guest physical addresses, which should be
> > > > 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> > > > pretty subtle and confusing)
> > > > * We don't check for section.mr being NULL, which is the main way that
> > > > memory_region_find() reports basic failures. We really need to check
> > > > that before looking at any other section fields, because
> > > > memory_region_find() doesn't initialize them on the failure path
> > > > * We're passing a length of '1' to memory_region_find(), but really the
> > > > guest is requesting that we put the entire page into the balloon,
> > > > so it makes more sense to call it with BALLOON_PAGE_SIZE
> > > >
> > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > hw/virtio/virtio-balloon.c | 17 ++++++++++-------
> > > > 1 file changed, 10 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > > index 43af521884..eb357824d8 100644
> > > > --- a/hw/virtio/virtio-balloon.c
> > > > +++ b/hw/virtio/virtio-balloon.c
> > > > @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > > }
> > > >
> > > > while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> > > > - ram_addr_t pa;
> > > > - ram_addr_t addr;
> > > > + hwaddr pa;
> > > > + hwaddr addr;
> > > > int p = virtio_ldl_p(vdev, &pfn);
> > > >
> > > > - pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > > + pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > > offset += 4;
> > > >
> > > > - /* FIXME: remove get_system_memory(), but how? */
> > > > - section = memory_region_find(get_system_memory(), pa, 1);
> > > > - if (!int128_nz(section.size) ||
> > > > - !memory_region_is_ram(section.mr) ||
> > > > + section = memory_region_find(get_system_memory(), pa,
> > > > + BALLOON_PAGE_SIZE);
> > > > + if (!section.mr) {
> > > > + trace_virtio_balloon_bad_addr(pa);
> > > > + continue;
> > > > + }
> > >
> > > memory_region_unref(section.mr) with section.mr == NULL is safe and
> > > resolves to a nop. Not sure you need a separate if to handle this
> > > case.
> >
> > memory_region_is_ram() and friends are not, however - they will
> > dereference their argument unconditionally.
> >
>
> Indeed but the two ifs can be merged anyway:
>
> if (!section.mr ||
> !memory_region_is_ram(section.mr) ||
> memory_region_is_rom(section.mr) ||
> memory_region_is_romd(section.mr)) {
> trace_virtio_balloon_bad_addr(pa);
> memory_region_unref(section.mr);
> continue;
> }
Oh, I see what you mean. Hrm, I still kind of prefer visually
separating the validity check from tests which depend on that validity.
>
> > >
> > > Apart from that,
> > >
> > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > >
> > > > + if (!memory_region_is_ram(section.mr) ||
> > > > memory_region_is_rom(section.mr) ||
> > > > memory_region_is_romd(section.mr)) {
> > > > trace_virtio_balloon_bad_addr(pa);
> > >
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/5] virtio-balloon: Corrections to address verification
2019-02-26 23:20 ` David Gibson
@ 2019-02-28 9:09 ` Greg Kurz
0 siblings, 0 replies; 38+ messages in thread
From: Greg Kurz @ 2019-02-28 9:09 UTC (permalink / raw)
To: David Gibson; +Cc: David Hildenbrand, qemu-ppc, qemu-devel, mst
[-- Attachment #1: Type: text/plain, Size: 4535 bytes --]
On Wed, 27 Feb 2019 10:20:12 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:
> On Mon, Feb 25, 2019 at 10:26:51AM +0100, Greg Kurz wrote:
> > On Mon, 25 Feb 2019 10:37:11 +1100
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > > On Fri, Feb 22, 2019 at 10:08:22AM +0100, Greg Kurz wrote:
> > > > On Thu, 14 Feb 2019 15:39:13 +1100
> > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >
> > > > > The virtio-balloon device's verification of the address given to it by the
> > > > > guest has a number of faults:
> > > > > * The addresses here are guest physical addresses, which should be
> > > > > 'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
> > > > > pretty subtle and confusing)
> > > > > * We don't check for section.mr being NULL, which is the main way that
> > > > > memory_region_find() reports basic failures. We really need to check
> > > > > that before looking at any other section fields, because
> > > > > memory_region_find() doesn't initialize them on the failure path
> > > > > * We're passing a length of '1' to memory_region_find(), but really the
> > > > > guest is requesting that we put the entire page into the balloon,
> > > > > so it makes more sense to call it with BALLOON_PAGE_SIZE
> > > > >
> > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > > hw/virtio/virtio-balloon.c | 17 ++++++++++-------
> > > > > 1 file changed, 10 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > > > > index 43af521884..eb357824d8 100644
> > > > > --- a/hw/virtio/virtio-balloon.c
> > > > > +++ b/hw/virtio/virtio-balloon.c
> > > > > @@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > > > }
> > > > >
> > > > > while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
> > > > > - ram_addr_t pa;
> > > > > - ram_addr_t addr;
> > > > > + hwaddr pa;
> > > > > + hwaddr addr;
> > > > > int p = virtio_ldl_p(vdev, &pfn);
> > > > >
> > > > > - pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > > > + pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
> > > > > offset += 4;
> > > > >
> > > > > - /* FIXME: remove get_system_memory(), but how? */
> > > > > - section = memory_region_find(get_system_memory(), pa, 1);
> > > > > - if (!int128_nz(section.size) ||
> > > > > - !memory_region_is_ram(section.mr) ||
> > > > > + section = memory_region_find(get_system_memory(), pa,
> > > > > + BALLOON_PAGE_SIZE);
> > > > > + if (!section.mr) {
> > > > > + trace_virtio_balloon_bad_addr(pa);
> > > > > + continue;
> > > > > + }
> > > >
> > > > memory_region_unref(section.mr) with section.mr == NULL is safe and
> > > > resolves to a nop. Not sure you need a separate if to handle this
> > > > case.
> > >
> > > memory_region_is_ram() and friends are not, however - they will
> > > dereference their argument unconditionally.
> > >
> >
> > Indeed but the two ifs can be merged anyway:
> >
> > if (!section.mr ||
> > !memory_region_is_ram(section.mr) ||
> > memory_region_is_rom(section.mr) ||
> > memory_region_is_romd(section.mr)) {
> > trace_virtio_balloon_bad_addr(pa);
> > memory_region_unref(section.mr);
> > continue;
> > }
>
> Oh, I see what you mean. Hrm, I still kind of prefer visually
> separating the validity check from tests which depend on that validity.
>
Makes sense. It's alright then :)
> >
> > > >
> > > > Apart from that,
> > > >
> > > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > > >
> > > > > + if (!memory_region_is_ram(section.mr) ||
> > > > > memory_region_is_rom(section.mr) ||
> > > > > memory_region_is_romd(section.mr)) {
> > > > > trace_virtio_balloon_bad_addr(pa);
> > > >
> > >
> >
>
>
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* [Qemu-devel] [PATCH 3/5] virtio-balloon: Rework ballon_page() interface
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
2019-02-14 4:39 ` [Qemu-devel] [PATCH 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
2019-02-14 4:39 ` [Qemu-devel] [PATCH 2/5] virtio-balloon: Corrections to address verification David Gibson
@ 2019-02-14 4:39 ` David Gibson
2019-02-14 4:39 ` [Qemu-devel] [PATCH 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
` (2 subsequent siblings)
5 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2019-02-14 4:39 UTC (permalink / raw)
To: mst, qemu-devel; +Cc: qemu-ppc, David Gibson, David Hildenbrand
This replaces the balloon_page() internal interface with
ballon_inflate_page(), with a slightly different interface. The new
interface will make future alterations simpler.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/virtio/virtio-balloon.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index eb357824d8..bf93148486 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -33,11 +33,12 @@
#define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT)
-static void balloon_page(void *addr, int deflate)
+static void balloon_inflate_page(VirtIOBalloon *balloon,
+ MemoryRegion *mr, hwaddr offset)
{
- if (!qemu_balloon_is_inhibited() && !deflate) {
- qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
- }
+ void *addr = memory_region_get_ram_ptr(mr) + offset;
+
+ qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
}
static const char *balloon_stat_names[] = {
@@ -222,7 +223,6 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
hwaddr pa;
- hwaddr addr;
int p = virtio_ldl_p(vdev, &pfn);
pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
@@ -244,11 +244,9 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
trace_virtio_balloon_handle_output(memory_region_name(section.mr),
pa);
- /* Using memory_region_get_ram_ptr is bending the rules a bit, but
- should be OK because we only want a single page. */
- addr = section.offset_within_region;
- balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
- !!(vq == s->dvq));
+ if (!qemu_balloon_is_inhibited() && vq != s->dvq) {
+ balloon_inflate_page(s, section.mr, section.offset_within_region);
+ }
memory_region_unref(section.mr);
}
--
2.20.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [Qemu-devel] [PATCH 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
` (2 preceding siblings ...)
2019-02-14 4:39 ` [Qemu-devel] [PATCH 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
@ 2019-02-14 4:39 ` David Gibson
2019-02-14 4:39 ` [Qemu-devel] [PATCH 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
2019-02-28 13:39 ` [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB Michael S. Tsirkin
5 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2019-02-14 4:39 UTC (permalink / raw)
To: mst, qemu-devel; +Cc: qemu-ppc, David Gibson
Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
discard RAM pages inserted into the balloon. This is basically a Linux
only interface (MADV_DONTNEED exists on some other platforms, but doesn't
always have the same semantics). It also doesn't work on hugepages and has
some other limitations.
It turns out that postcopy also needs to discard chunks of memory, and uses
a better interface for it: ram_block_discard_range(). It doesn't cover
every case, but it covers more than going direct to madvise() and this
gives us a single place to update for more possibilities in future.
There are some subtleties here to maintain the current balloon behaviour:
* For now, we just ignore requests to balloon in a hugepage backed region.
That matches current behaviour, because MADV_DONTNEED on a hugepage would
simply fail, and we ignore the error.
* If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
non-host-page-aligned addresses. These would also fail in madvise(),
which we then ignored. ram_block_discard_range() error_report()s calls
on unaligned addresses, so we explicitly check that case to avoid
spamming the logs.
* We now call ram_block_discard_range() with the *host* page size, whereas
we previously called madvise() with BALLOON_PAGE_SIZE. Surprisingly,
this also matches existing behaviour. Although the kernel fails madvise
on unaligned addresses, it will round unaligned sizes *up* to the host
page size. Yes, this means that if BALLOON_PAGE_SIZE < guest page size
we can incorrectly discard more memory than the guest asked us to. I'm
planning to address that soon.
Errors other than the ones discussed above, will now be reported by
ram_block_discard_range(), rather than silently ignored, which means we
have a much better chance of seeing when something is going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index bf93148486..e4cd8d566b 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
MemoryRegion *mr, hwaddr offset)
{
void *addr = memory_region_get_ram_ptr(mr) + offset;
+ RAMBlock *rb;
+ size_t rb_page_size;
+ ram_addr_t ram_offset;
- qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
+ /* XXX is there a better way to get to the RAMBlock than via a
+ * host address? */
+ rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+ rb_page_size = qemu_ram_pagesize(rb);
+
+ /* Silently ignore hugepage RAM blocks */
+ if (rb_page_size != getpagesize()) {
+ return;
+ }
+
+ /* Silently ignore unaligned requests */
+ if (ram_offset & (rb_page_size - 1)) {
+ return;
+ }
+
+ ram_block_discard_range(rb, ram_offset, rb_page_size);
+ /* We ignore errors from ram_block_discard_range(), because it has
+ * already reported them, and failing to discard a balloon page is
+ * not fatal */
}
static const char *balloon_stat_names[] = {
--
2.20.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [Qemu-devel] [PATCH 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
` (3 preceding siblings ...)
2019-02-14 4:39 ` [Qemu-devel] [PATCH 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
@ 2019-02-14 4:39 ` David Gibson
2019-03-05 16:06 ` [Qemu-devel] [PULL 23/26] " Peter Maydell
2019-02-28 13:39 ` [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB Michael S. Tsirkin
5 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-02-14 4:39 UTC (permalink / raw)
To: mst, qemu-devel; +Cc: qemu-ppc, David Gibson
The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.
Now, we handle this very badly: we silently ignore balloon requests that
aren't host page aligned, and for requests that are host page aligned we
discard the entire host page. The latter can corrupt guest memory if its
page size is smaller than the host's.
The obvious choice would be to disable the balloon if the host page size is
not 4kiB. However, that would break the special case where host and guest
have the same page size, but that's larger than 4kiB. That case currently
works by accident[1] - and is used in practice on many production POWER
systems where 64kiB has long been the Linux default page size on both host
and guest.
To make the balloon safe, without breaking that useful special case, we
need to accumulate 4kiB balloon requests until we have a whole contiguous
host page to discard.
We could in principle do that across all guest memory, but it would require
a large bitmap to track. This patch represents a compromise: we track
ballooned subpages for a single contiguous host page at a time. This means
that if the guest discards all 4kiB chunks of a host page in succession,
we will discard it. This is the expected behaviour in the (host page) ==
(guest page) != 4kiB case we want to support.
If the guest scatters 4kiB requests across different host pages, we don't
discard anything, and issue a warning. Not ideal, but at least we don't
corrupt guest memory as the previous version could.
Warning reporting is kind of a compromise here. Determining whether we're
in a problematic state at realize() time is tricky, because we'd have to
look at the host pagesizes of all memory backends, but we can't really know
if some of those backends could be for special purpose memory that's not
subject to ballooning.
Reporting only when the guest tries to balloon a partial page also isn't
great because if the guest page size happens to line up it won't indicate
that we're in a non ideal situation. It could also cause alarming repeated
warnings whenever a migration is attempted.
So, what we do is warn the first time the guest attempts balloon a partial
host page, whether or not it will end up ballooning the rest of the page
immediately afterwards.
[1] Because when the guest attempts to balloon a page, it will submit
requests for each 4kiB subpage. Most will be ignored, but the one
which happens to be host page aligned will discard the whole lot.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
hw/virtio/virtio-balloon.c | 69 +++++++++++++++++++++++++-----
include/hw/virtio/virtio-balloon.h | 3 ++
2 files changed, 62 insertions(+), 10 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e4cd8d566b..65f861cbef 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -33,33 +33,82 @@
#define BALLOON_PAGE_SIZE (1 << VIRTIO_BALLOON_PFN_SHIFT)
+typedef struct PartiallyBalloonedPage {
+ RAMBlock *rb;
+ ram_addr_t base;
+ unsigned long bitmap[];
+} PartiallyBalloonedPage;
+
static void balloon_inflate_page(VirtIOBalloon *balloon,
MemoryRegion *mr, hwaddr offset)
{
void *addr = memory_region_get_ram_ptr(mr) + offset;
RAMBlock *rb;
size_t rb_page_size;
- ram_addr_t ram_offset;
+ int subpages;
+ ram_addr_t ram_offset, host_page_base;
/* XXX is there a better way to get to the RAMBlock than via a
* host address? */
rb = qemu_ram_block_from_host(addr, false, &ram_offset);
rb_page_size = qemu_ram_pagesize(rb);
+ host_page_base = ram_offset & ~(rb_page_size - 1);
+
+ if (rb_page_size == BALLOON_PAGE_SIZE) {
+ /* Easy case */
- /* Silently ignore hugepage RAM blocks */
- if (rb_page_size != getpagesize()) {
+ ram_block_discard_range(rb, ram_offset, rb_page_size);
+ /* We ignore errors from ram_block_discard_range(), because it
+ * has already reported them, and failing to discard a balloon
+ * page is not fatal */
return;
}
- /* Silently ignore unaligned requests */
- if (ram_offset & (rb_page_size - 1)) {
- return;
+ /* Hard case
+ *
+ * We've put a piece of a larger host page into the balloon - we
+ * need to keep track until we have a whole host page to
+ * discard
+ */
+ warn_report_once(
+"Balloon used with backing page size > 4kiB, this may not be reliable");
+
+ subpages = rb_page_size / BALLOON_PAGE_SIZE;
+
+ if (balloon->pbp
+ && (rb != balloon->pbp->rb
+ || host_page_base != balloon->pbp->base)) {
+ /* We've partially ballooned part of a host page, but now
+ * we're trying to balloon part of a different one. Too hard,
+ * give up on the old partial page */
+ free(balloon->pbp);
+ balloon->pbp = NULL;
}
- ram_block_discard_range(rb, ram_offset, rb_page_size);
- /* We ignore errors from ram_block_discard_range(), because it has
- * already reported them, and failing to discard a balloon page is
- * not fatal */
+ if (!balloon->pbp) {
+ /* Starting on a new host page */
+ size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long);
+ balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen);
+ balloon->pbp->rb = rb;
+ balloon->pbp->base = host_page_base;
+ }
+
+ bitmap_set(balloon->pbp->bitmap,
+ (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE,
+ subpages);
+
+ if (bitmap_full(balloon->pbp->bitmap, subpages)) {
+ /* We've accumulated a full host page, we can actually discard
+ * it now */
+
+ ram_block_discard_range(rb, balloon->pbp->base, rb_page_size);
+ /* We ignore errors from ram_block_discard_range(), because it
+ * has already reported them, and failing to discard a balloon
+ * page is not fatal */
+
+ free(balloon->pbp);
+ balloon->pbp = NULL;
+ }
}
static const char *balloon_stat_names[] = {
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index e0df3528c8..99dcd6d105 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -30,6 +30,8 @@ typedef struct virtio_balloon_stat_modern {
uint64_t val;
} VirtIOBalloonStatModern;
+typedef struct PartiallyBalloonedPage PartiallyBalloonedPage;
+
typedef struct VirtIOBalloon {
VirtIODevice parent_obj;
VirtQueue *ivq, *dvq, *svq;
@@ -42,6 +44,7 @@ typedef struct VirtIOBalloon {
int64_t stats_last_update;
int64_t stats_poll_interval;
uint32_t host_features;
+ PartiallyBalloonedPage *pbp;
} VirtIOBalloon;
#endif
--
2.20.1
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PULL 23/26] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
2019-02-14 4:39 ` [Qemu-devel] [PATCH 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
@ 2019-03-05 16:06 ` Peter Maydell
2019-03-05 23:33 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2019-03-05 16:06 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: QEMU Developers, David Gibson
On Fri, 22 Feb 2019 at 02:41, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> From: David Gibson <david@gibson.dropbear.id.au>
>
> The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
> we can only actually discard memory in units of the host page size.
Hi -- Coverity points out an issue in this patch (CID 1399146):
> + /* Hard case
> + *
> + * We've put a piece of a larger host page into the balloon - we
> + * need to keep track until we have a whole host page to
> + * discard
> + */
> + warn_report_once(
> +"Balloon used with backing page size > 4kiB, this may not be reliable");
> +
> + subpages = rb_page_size / BALLOON_PAGE_SIZE;
> +
> + if (balloon->pbp
> + && (rb != balloon->pbp->rb
> + || host_page_base != balloon->pbp->base)) {
> + /* We've partially ballooned part of a host page, but now
> + * we're trying to balloon part of a different one. Too hard,
> + * give up on the old partial page */
> + free(balloon->pbp);
> + balloon->pbp = NULL;
> }
>
> - ram_block_discard_range(rb, ram_offset, rb_page_size);
> - /* We ignore errors from ram_block_discard_range(), because it has
> - * already reported them, and failing to discard a balloon page is
> - * not fatal */
> + if (!balloon->pbp) {
> + /* Starting on a new host page */
> + size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long);
> + balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen);
We allocate balloon->pbp with g_malloc0() here...
> + balloon->pbp->rb = rb;
> + balloon->pbp->base = host_page_base;
> + }
> +
> + bitmap_set(balloon->pbp->bitmap,
> + (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE,
> + subpages);
> +
> + if (bitmap_full(balloon->pbp->bitmap, subpages)) {
> + /* We've accumulated a full host page, we can actually discard
> + * it now */
> +
> + ram_block_discard_range(rb, balloon->pbp->base, rb_page_size);
> + /* We ignore errors from ram_block_discard_range(), because it
> + * has already reported them, and failing to discard a balloon
> + * page is not fatal */
> +
> + free(balloon->pbp);
...but we free it (here and elsewhere) with free(), not g_free().
thanks
-- PMM
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PULL 23/26] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
2019-03-05 16:06 ` [Qemu-devel] [PULL 23/26] " Peter Maydell
@ 2019-03-05 23:33 ` David Gibson
0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2019-03-05 23:33 UTC (permalink / raw)
To: Peter Maydell; +Cc: Michael S. Tsirkin, QEMU Developers
[-- Attachment #1: Type: text/plain, Size: 2914 bytes --]
On Tue, Mar 05, 2019 at 04:06:54PM +0000, Peter Maydell wrote:
> On Fri, 22 Feb 2019 at 02:41, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > From: David Gibson <david@gibson.dropbear.id.au>
> >
> > The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
> > we can only actually discard memory in units of the host page size.
>
> Hi -- Coverity points out an issue in this patch (CID 1399146):
>
> > + /* Hard case
> > + *
> > + * We've put a piece of a larger host page into the balloon - we
> > + * need to keep track until we have a whole host page to
> > + * discard
> > + */
> > + warn_report_once(
> > +"Balloon used with backing page size > 4kiB, this may not be reliable");
> > +
> > + subpages = rb_page_size / BALLOON_PAGE_SIZE;
> > +
> > + if (balloon->pbp
> > + && (rb != balloon->pbp->rb
> > + || host_page_base != balloon->pbp->base)) {
> > + /* We've partially ballooned part of a host page, but now
> > + * we're trying to balloon part of a different one. Too hard,
> > + * give up on the old partial page */
> > + free(balloon->pbp);
> > + balloon->pbp = NULL;
> > }
> >
> > - ram_block_discard_range(rb, ram_offset, rb_page_size);
> > - /* We ignore errors from ram_block_discard_range(), because it has
> > - * already reported them, and failing to discard a balloon page is
> > - * not fatal */
> > + if (!balloon->pbp) {
> > + /* Starting on a new host page */
> > + size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long);
> > + balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen);
>
>
> We allocate balloon->pbp with g_malloc0() here...
>
> > + balloon->pbp->rb = rb;
> > + balloon->pbp->base = host_page_base;
> > + }
> > +
> > + bitmap_set(balloon->pbp->bitmap,
> > + (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE,
> > + subpages);
> > +
> > + if (bitmap_full(balloon->pbp->bitmap, subpages)) {
> > + /* We've accumulated a full host page, we can actually discard
> > + * it now */
> > +
> > + ram_block_discard_range(rb, balloon->pbp->base, rb_page_size);
> > + /* We ignore errors from ram_block_discard_range(), because it
> > + * has already reported them, and failing to discard a balloon
> > + * page is not fatal */
> > +
> > + free(balloon->pbp);
>
> ...but we free it (here and elsewhere) with free(), not g_free().
Ah. Whoops.
I'll put a fix for that in the series of followup balloon patches I'm
working on right now.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB
2019-02-14 4:39 [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
` (4 preceding siblings ...)
2019-02-14 4:39 ` [Qemu-devel] [PATCH 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
@ 2019-02-28 13:39 ` Michael S. Tsirkin
2019-03-05 0:53 ` David Gibson
5 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-02-28 13:39 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On Thu, Feb 14, 2019 at 03:39:11PM +1100, David Gibson wrote:
> I posted some RFCs for this back in December, but didn't wrap it up in
> time for 3.1. Posting again for inclusion in 4.0.
>
> The virtio-balloon devices was never really thought out for cases
> other than 4kiB pagesize on both guest and host. It works in some
> cases, but in others can be ineffectual or even cause guest memory
> corruption.
>
> This series makes a handful of preliminary cleanups, then makes a
> change to safely, though not perfectly, handle cases with non 4kiB
> pagesizes.
I'd like to see a version of this that does not depend on patch 1 which
is not a cleanup nor a bugfix. Could you look into this please?
We can then debate merits of patch 1 separately.
> Changes since RFC:
> * Further refinement of when to issue warnings in 5/5
>
> David Gibson (5):
> virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
> virtio-balloon: Corrections to address verification
> virtio-balloon: Rework ballon_page() interface
> virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
> virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
>
> hw/virtio/virtio-balloon.c | 102 ++++++++++++++++++++++++-----
> include/hw/virtio/virtio-balloon.h | 3 +
> 2 files changed, 89 insertions(+), 16 deletions(-)
>
> --
> 2.20.1
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB
2019-02-28 13:39 ` [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB Michael S. Tsirkin
@ 2019-03-05 0:53 ` David Gibson
2019-03-05 2:13 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: David Gibson @ 2019-03-05 0:53 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc
[-- Attachment #1: Type: text/plain, Size: 1996 bytes --]
On Thu, Feb 28, 2019 at 08:39:21AM -0500, Michael S. Tsirkin wrote:
> On Thu, Feb 14, 2019 at 03:39:11PM +1100, David Gibson wrote:
> > I posted some RFCs for this back in December, but didn't wrap it up in
> > time for 3.1. Posting again for inclusion in 4.0.
> >
> > The virtio-balloon devices was never really thought out for cases
> > other than 4kiB pagesize on both guest and host. It works in some
> > cases, but in others can be ineffectual or even cause guest memory
> > corruption.
> >
> > This series makes a handful of preliminary cleanups, then makes a
> > change to safely, though not perfectly, handle cases with non 4kiB
> > pagesizes.
>
> I'd like to see a version of this that does not depend on patch 1 which
> is not a cleanup nor a bugfix. Could you look into this please?
Ok... the original series is already applied to master, so I'm not
exactly sure what you want me to do here. Should I try to come up
with a "logical revert" of the first patch? Or do you intend to
revert the whole series, and I rewrite the series without the first
patch?
> We can then debate merits of patch 1 separately.
>
>
> > Changes since RFC:
> > * Further refinement of when to issue warnings in 5/5
> >
> > David Gibson (5):
> > virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
> > virtio-balloon: Corrections to address verification
> > virtio-balloon: Rework ballon_page() interface
> > virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
> > virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
> >
> > hw/virtio/virtio-balloon.c | 102 ++++++++++++++++++++++++-----
> > include/hw/virtio/virtio-balloon.h | 3 +
> > 2 files changed, 89 insertions(+), 16 deletions(-)
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB
2019-03-05 0:53 ` David Gibson
@ 2019-03-05 2:13 ` Michael S. Tsirkin
2019-03-05 4:55 ` David Gibson
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2019-03-05 2:13 UTC (permalink / raw)
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On Tue, Mar 05, 2019 at 11:53:32AM +1100, David Gibson wrote:
> On Thu, Feb 28, 2019 at 08:39:21AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Feb 14, 2019 at 03:39:11PM +1100, David Gibson wrote:
> > > I posted some RFCs for this back in December, but didn't wrap it up in
> > > time for 3.1. Posting again for inclusion in 4.0.
> > >
> > > The virtio-balloon devices was never really thought out for cases
> > > other than 4kiB pagesize on both guest and host. It works in some
> > > cases, but in others can be ineffectual or even cause guest memory
> > > corruption.
> > >
> > > This series makes a handful of preliminary cleanups, then makes a
> > > change to safely, though not perfectly, handle cases with non 4kiB
> > > pagesizes.
> >
> > I'd like to see a version of this that does not depend on patch 1 which
> > is not a cleanup nor a bugfix. Could you look into this please?
>
> Ok... the original series is already applied to master, so I'm not
> exactly sure what you want me to do here. Should I try to come up
> with a "logical revert" of the first patch? Or do you intend to
> revert the whole series, and I rewrite the series without the first
> patch?
Whatever you prefer. Maybe the best idea is to add a flag
that says whether to madvise or not. Default can be compatible.
Hmm?
> > We can then debate merits of patch 1 separately.
> >
> >
> > > Changes since RFC:
> > > * Further refinement of when to issue warnings in 5/5
> > >
> > > David Gibson (5):
> > > virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
> > > virtio-balloon: Corrections to address verification
> > > virtio-balloon: Rework ballon_page() interface
> > > virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
> > > virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
> > >
> > > hw/virtio/virtio-balloon.c | 102 ++++++++++++++++++++++++-----
> > > include/hw/virtio/virtio-balloon.h | 3 +
> > > 2 files changed, 89 insertions(+), 16 deletions(-)
> > >
> >
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [PATCH 0/5] Improve balloon handling of pagesizes other than 4kiB
2019-03-05 2:13 ` Michael S. Tsirkin
@ 2019-03-05 4:55 ` David Gibson
0 siblings, 0 replies; 38+ messages in thread
From: David Gibson @ 2019-03-05 4:55 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, qemu-ppc
[-- Attachment #1: Type: text/plain, Size: 1860 bytes --]
On Mon, Mar 04, 2019 at 09:13:03PM -0500, Michael S. Tsirkin wrote:
> On Tue, Mar 05, 2019 at 11:53:32AM +1100, David Gibson wrote:
> > On Thu, Feb 28, 2019 at 08:39:21AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Feb 14, 2019 at 03:39:11PM +1100, David Gibson wrote:
> > > > I posted some RFCs for this back in December, but didn't wrap it up in
> > > > time for 3.1. Posting again for inclusion in 4.0.
> > > >
> > > > The virtio-balloon devices was never really thought out for cases
> > > > other than 4kiB pagesize on both guest and host. It works in some
> > > > cases, but in others can be ineffectual or even cause guest memory
> > > > corruption.
> > > >
> > > > This series makes a handful of preliminary cleanups, then makes a
> > > > change to safely, though not perfectly, handle cases with non 4kiB
> > > > pagesizes.
> > >
> > > I'd like to see a version of this that does not depend on patch 1 which
> > > is not a cleanup nor a bugfix. Could you look into this please?
> >
> > Ok... the original series is already applied to master, so I'm not
> > exactly sure what you want me to do here. Should I try to come up
> > with a "logical revert" of the first patch? Or do you intend to
> > revert the whole series, and I rewrite the series without the first
> > patch?
>
> Whatever you prefer. Maybe the best idea is to add a flag
> that says whether to madvise or not. Default can be compatible.
> Hmm?
Ok, will do.
As I was looking at this I noticed a bug in the current code.
Extremely unlikely to hit in practice, but could result in guest
memory corruption if it does. So, I'll fix that as well.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 38+ messages in thread