All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  7:46 ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  7:46 UTC (permalink / raw)
  To: rusty
  Cc: fes, mst, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
"negative" feature: it tells you that silent defalte is not supported.
Right now, QEMU refuses migration if the target does not support all the
features that were negotiated.  But then:

- a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
which is wrong;

- a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
is useless.

Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
 1 file modificato, 33 inserzioni(+), 3 rimozioni(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 7a073f4..1a25a18 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -6238,6 +6238,8 @@ bits
 
 \begin_deeper
 \begin_layout Description
+
+\change_deleted 1531152142 1346917221
 VIRTIO_BALLOON_F_MUST_TELL_HOST
 \begin_inset space ~
 \end_inset
@@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
 \end_inset
 
 (1) A virtqueue for reporting guest memory statistics is present.
+\change_inserted 1531152142 1346917193
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1346917219
+VIRTIO_BALLOON_F_SILENT_DEFLATE
+\begin_inset space ~
+\end_inset
+
+(2) Host does not need to be told before pages from the balloon are used.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
 \end_layout
 
 \begin_layout Enumerate
-If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
- use these requested pages until that descriptor in the deflateq has been
- used by the device.
+If the VIRTIO_BALLOON_F_
+\change_deleted 1531152142 1346917234
+MUST_TELL_HOST
+\change_inserted 1531152142 1346917237
+SILENT_DEFLATE
+\change_unchanged
+ feature is 
+\change_inserted 1531152142 1346917241
+not 
+\change_unchanged
+set, the guest may not use these requested pages until that descriptor in
+ the deflateq has been used by the device.
+
+\change_inserted 1531152142 1346917253
+ If it is set, the guest may choose to not use the deflateq at all.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
-- 
1.7.11.2


^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  7:46 ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  7:46 UTC (permalink / raw)
  To: rusty
  Cc: fes, aarcange, riel, kvm, mst, linux-kernel, mikew, yinghan,
	virtualization

VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
"negative" feature: it tells you that silent defalte is not supported.
Right now, QEMU refuses migration if the target does not support all the
features that were negotiated.  But then:

- a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
which is wrong;

- a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
is useless.

Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
 1 file modificato, 33 inserzioni(+), 3 rimozioni(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index 7a073f4..1a25a18 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -6238,6 +6238,8 @@ bits
 
 \begin_deeper
 \begin_layout Description
+
+\change_deleted 1531152142 1346917221
 VIRTIO_BALLOON_F_MUST_TELL_HOST
 \begin_inset space ~
 \end_inset
@@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
 \end_inset
 
 (1) A virtqueue for reporting guest memory statistics is present.
+\change_inserted 1531152142 1346917193
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1346917219
+VIRTIO_BALLOON_F_SILENT_DEFLATE
+\begin_inset space ~
+\end_inset
+
+(2) Host does not need to be told before pages from the balloon are used.
+\change_unchanged
+
 \end_layout
 
 \end_deeper
@@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
 \end_layout
 
 \begin_layout Enumerate
-If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
- use these requested pages until that descriptor in the deflateq has been
- used by the device.
+If the VIRTIO_BALLOON_F_
+\change_deleted 1531152142 1346917234
+MUST_TELL_HOST
+\change_inserted 1531152142 1346917237
+SILENT_DEFLATE
+\change_unchanged
+ feature is 
+\change_inserted 1531152142 1346917241
+not 
+\change_unchanged
+set, the guest may not use these requested pages until that descriptor in
+ the deflateq has been used by the device.
+
+\change_inserted 1531152142 1346917253
+ If it is set, the guest may choose to not use the deflateq at all.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Enumerate
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  7:46 ` Paolo Bonzini
@ 2012-09-06  8:47   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06  8:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote:
> VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
> "negative" feature: it tells you that silent defalte is not supported.
> Right now, QEMU refuses migration if the target does not support all the
> features that were negotiated.  But then:
> 
> - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> which is wrong;
> 
> - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> is useless.
> 
> Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Frankly I think it's a qemu migration bug. I don't see why
we need to tweak spec: just teach qemu to be smarter
during migration.

Can you show a scenario with old driver/new hypervisor or
new driver/old hypervisor that fails?

> ---
>  virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
>  1 file modificato, 33 inserzioni(+), 3 rimozioni(-)
> 
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index 7a073f4..1a25a18 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -6238,6 +6238,8 @@ bits
>  
>  \begin_deeper
>  \begin_layout Description
> +
> +\change_deleted 1531152142 1346917221
>  VIRTIO_BALLOON_F_MUST_TELL_HOST
>  \begin_inset space ~
>  \end_inset
> @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
>  \end_inset
>  
>  (1) A virtqueue for reporting guest memory statistics is present.
> +\change_inserted 1531152142 1346917193
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 1531152142 1346917219
> +VIRTIO_BALLOON_F_SILENT_DEFLATE
> +\begin_inset space ~
> +\end_inset
> +
> +(2) Host does not need to be told before pages from the balloon are used.
> +\change_unchanged
> +
>  \end_layout
>  
>  \end_deeper
> @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
>  \end_layout
>  
>  \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
> - use these requested pages until that descriptor in the deflateq has been
> - used by the device.
> +If the VIRTIO_BALLOON_F_
> +\change_deleted 1531152142 1346917234
> +MUST_TELL_HOST
> +\change_inserted 1531152142 1346917237
> +SILENT_DEFLATE
> +\change_unchanged
> + feature is 
> +\change_inserted 1531152142 1346917241
> +not 
> +\change_unchanged
> +set, the guest may not use these requested pages until that descriptor in
> + the deflateq has been used by the device.
> +
> +\change_inserted 1531152142 1346917253
> + If it is set, the guest may choose to not use the deflateq at all.
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Enumerate
> -- 
> 1.7.11.2

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  8:47   ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06  8:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote:
> VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
> "negative" feature: it tells you that silent defalte is not supported.
> Right now, QEMU refuses migration if the target does not support all the
> features that were negotiated.  But then:
> 
> - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> which is wrong;
> 
> - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> is useless.
> 
> Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Frankly I think it's a qemu migration bug. I don't see why
we need to tweak spec: just teach qemu to be smarter
during migration.

Can you show a scenario with old driver/new hypervisor or
new driver/old hypervisor that fails?

> ---
>  virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
>  1 file modificato, 33 inserzioni(+), 3 rimozioni(-)
> 
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index 7a073f4..1a25a18 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -6238,6 +6238,8 @@ bits
>  
>  \begin_deeper
>  \begin_layout Description
> +
> +\change_deleted 1531152142 1346917221
>  VIRTIO_BALLOON_F_MUST_TELL_HOST
>  \begin_inset space ~
>  \end_inset
> @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
>  \end_inset
>  
>  (1) A virtqueue for reporting guest memory statistics is present.
> +\change_inserted 1531152142 1346917193
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 1531152142 1346917219
> +VIRTIO_BALLOON_F_SILENT_DEFLATE
> +\begin_inset space ~
> +\end_inset
> +
> +(2) Host does not need to be told before pages from the balloon are used.
> +\change_unchanged
> +
>  \end_layout
>  
>  \end_deeper
> @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
>  \end_layout
>  
>  \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
> - use these requested pages until that descriptor in the deflateq has been
> - used by the device.
> +If the VIRTIO_BALLOON_F_
> +\change_deleted 1531152142 1346917234
> +MUST_TELL_HOST
> +\change_inserted 1531152142 1346917237
> +SILENT_DEFLATE
> +\change_unchanged
> + feature is 
> +\change_inserted 1531152142 1346917241
> +not 
> +\change_unchanged
> +set, the guest may not use these requested pages until that descriptor in
> + the deflateq has been used by the device.
> +
> +\change_inserted 1531152142 1346917253
> + If it is set, the guest may choose to not use the deflateq at all.
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Enumerate
> -- 
> 1.7.11.2

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  8:47   ` Michael S. Tsirkin
@ 2012-09-06  9:24     ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  9:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto:
>> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
>> > which is wrong;
>> > 
>> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
>> > is useless.
>> > 
>> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
>> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Frankly I think it's a qemu migration bug. I don't see why
> we need to tweak spec: just teach qemu to be smarter
> during migration.

Of course you can just teach QEMU to be smarter, but that would be a
one-off hack for the only ill-defined feature that says something is
_not_ supported.  Since in practice all virtio_balloon-enbled
hypervisors supported silent deflate (so the bit was always zero), and
no client used it (so it was never checked), it's easier to just reverse
the direction.

In fact, it's not clear how the driver should use the feature.  My guess
is that, if it wants to use silent deflate, it tries to negotiate
VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
negotiation fails.  This is against the logic of all other features.

> Can you show a scenario with old driver/new hypervisor or
> new driver/old hypervisor that fails?

Suppose the driver tried to negotiate the feature as above.  We then
have the two scenarios above.

In the harmless but annoying scenario, the source hypervisor doesn't
support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been
negotiated successfully.  The destination hypervisor supports silent
deflate, so it does _not_ include the feature.  It sees that the guest
requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration.

In the incorrect scenario, you are migrating to an older hypervisor.
The source hypervisor is newer and supports silent deflate, so
VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated.  The destination
hypervisor does not supports silent deflate.  However, the guest is not
requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds.
Next time the guest tries to do use a page from the balloon, badness
happens, because the hypervisor does not expect it.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  9:24     ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  9:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto:
>> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
>> > which is wrong;
>> > 
>> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
>> > is useless.
>> > 
>> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
>> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
>> > 
>> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Frankly I think it's a qemu migration bug. I don't see why
> we need to tweak spec: just teach qemu to be smarter
> during migration.

Of course you can just teach QEMU to be smarter, but that would be a
one-off hack for the only ill-defined feature that says something is
_not_ supported.  Since in practice all virtio_balloon-enbled
hypervisors supported silent deflate (so the bit was always zero), and
no client used it (so it was never checked), it's easier to just reverse
the direction.

In fact, it's not clear how the driver should use the feature.  My guess
is that, if it wants to use silent deflate, it tries to negotiate
VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
negotiation fails.  This is against the logic of all other features.

> Can you show a scenario with old driver/new hypervisor or
> new driver/old hypervisor that fails?

Suppose the driver tried to negotiate the feature as above.  We then
have the two scenarios above.

In the harmless but annoying scenario, the source hypervisor doesn't
support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been
negotiated successfully.  The destination hypervisor supports silent
deflate, so it does _not_ include the feature.  It sees that the guest
requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration.

In the incorrect scenario, you are migrating to an older hypervisor.
The source hypervisor is newer and supports silent deflate, so
VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated.  The destination
hypervisor does not supports silent deflate.  However, the guest is not
requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds.
Next time the guest tries to do use a page from the balloon, badness
happens, because the hypervisor does not expect it.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  9:24     ` Paolo Bonzini
@ 2012-09-06  9:44       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06  9:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Thu, Sep 06, 2012 at 11:24:02AM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto:
> >> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> >> > which is wrong;
> >> > 
> >> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> >> > is useless.
> >> > 
> >> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> >> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> >> > 
> >> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Frankly I think it's a qemu migration bug. I don't see why
> > we need to tweak spec: just teach qemu to be smarter
> > during migration.
> 
> Of course you can just teach QEMU to be smarter, but that would be a
> one-off hack for the only ill-defined feature that says something is
> _not_ supported.  Since in practice all virtio_balloon-enbled
> hypervisors supported silent deflate (so the bit was always zero), and
> no client used it (so it was never checked), it's easier to just reverse
> the direction.
> 
> In fact, it's not clear how the driver should use the feature.  My guess
> is that, if it wants to use silent deflate, it tries to negotiate
> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
> negotiation fails.  This is against the logic of all other features.

Let's take a step back from the implementation details.
You are trying to add a new feature bit, after all.
Why? Why is silent deflate useful? This is what is
missing in all this discussion. If it is not useful
we do not need a bit for it.

> > Can you show a scenario with old driver/new hypervisor or
> > new driver/old hypervisor that fails?
> 
> Suppose the driver tried to negotiate the feature as above.  We then
> have the two scenarios above.
> 
> In the harmless but annoying scenario, the source hypervisor doesn't
> support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been
> negotiated successfully.  The destination hypervisor supports silent
> deflate, so it does _not_ include the feature.  It sees that the guest
> requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration.
> 
> In the incorrect scenario, you are migrating to an older hypervisor.
> The source hypervisor is newer and supports silent deflate, so
> VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated.  The destination
> hypervisor does not supports silent deflate.  However, the guest is not
> requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds.
> Next time the guest tries to do use a page from the balloon, badness
> happens, because the hypervisor does not expect it.
> 
> Paolo

Sorry this is not the example I asked for.  Please give and example
without migration.

Migration is qemu's problem: it is hypervisor's job to
make sure guest sees no change during migration.
It should be able to do this with any hardware it emulates,
there should be no need to change hardware to make it
"migrateable" somehow.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  9:44       ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06  9:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

On Thu, Sep 06, 2012 at 11:24:02AM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 10:47, Michael S. Tsirkin ha scritto:
> >> > - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> >> > which is wrong;
> >> > 
> >> > - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> >> > is useless.
> >> > 
> >> > Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> >> > VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> >> > 
> >> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > Frankly I think it's a qemu migration bug. I don't see why
> > we need to tweak spec: just teach qemu to be smarter
> > during migration.
> 
> Of course you can just teach QEMU to be smarter, but that would be a
> one-off hack for the only ill-defined feature that says something is
> _not_ supported.  Since in practice all virtio_balloon-enbled
> hypervisors supported silent deflate (so the bit was always zero), and
> no client used it (so it was never checked), it's easier to just reverse
> the direction.
> 
> In fact, it's not clear how the driver should use the feature.  My guess
> is that, if it wants to use silent deflate, it tries to negotiate
> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
> negotiation fails.  This is against the logic of all other features.

Let's take a step back from the implementation details.
You are trying to add a new feature bit, after all.
Why? Why is silent deflate useful? This is what is
missing in all this discussion. If it is not useful
we do not need a bit for it.

> > Can you show a scenario with old driver/new hypervisor or
> > new driver/old hypervisor that fails?
> 
> Suppose the driver tried to negotiate the feature as above.  We then
> have the two scenarios above.
> 
> In the harmless but annoying scenario, the source hypervisor doesn't
> support silent deflate, so VIRTIO_BALLOON_F_MUST_TELL_HOST has been
> negotiated successfully.  The destination hypervisor supports silent
> deflate, so it does _not_ include the feature.  It sees that the guest
> requests VIRTIO_BALLOON_F_MUST_TELL_HOST, and fails migration.
> 
> In the incorrect scenario, you are migrating to an older hypervisor.
> The source hypervisor is newer and supports silent deflate, so
> VIRTIO_BALLOON_F_MUST_TELL_HOST was _not_ negotiated.  The destination
> hypervisor does not supports silent deflate.  However, the guest is not
> requesting VIRTIO_BALLOON_F_MUST_TELL_HOST, and migration succeeds.
> Next time the guest tries to do use a page from the balloon, badness
> happens, because the hypervisor does not expect it.
> 
> Paolo

Sorry this is not the example I asked for.  Please give and example
without migration.

Migration is qemu's problem: it is hypervisor's job to
make sure guest sees no change during migration.
It should be able to do this with any hardware it emulates,
there should be no need to change hardware to make it
"migrateable" somehow.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  9:44       ` Michael S. Tsirkin
@ 2012-09-06  9:57         ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto:
>> In fact, it's not clear how the driver should use the feature.  My guess
>> is that, if it wants to use silent deflate, it tries to negotiate
>> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
>> negotiation fails.  This is against the logic of all other features.
> 
> Let's take a step back from the implementation details.
> You are trying to add a new feature bit, after all.
> Why? Why is silent deflate useful? This is what is
> missing in all this discussion. If it is not useful
> we do not need a bit for it.

It is useful because it lets guests inflate the balloon aggressively,
and then use ballooned-out pages even in places where the guest OS
cannot sleep, such as kmalloc(GFP_ATOMIC).

>>> Can you show a scenario with old driver/new hypervisor or
>>> new driver/old hypervisor that fails?
>
> Sorry this is not the example I asked for.  Please give and example
> without migration.
> 
> Migration is qemu's problem: it is hypervisor's job to
> make sure guest sees no change during migration.

Quoting my message: "Of course you can just teach QEMU to be smarter,
but that would be a one-off hack for the only ill-defined feature that
says something is _not_ supported".

Currently migration works the same way for all virtio devices, and
assumes that features are defined only in the "positive" direction:
drivers request features if they want to use it, devices provide
features to say they support something.

Instead, in the case of this feature, the driver requests it before
relying on its lack (which is odd); the device provides if they do not
support something (which is wrong).  You can see that this just cannot
provide backwards-compatibility in the device; it happens to work only
because the feature was there in the first version of the spec.

> It should be able to do this with any hardware it emulates,
> there should be no need to change hardware to make it
> "migrateable" somehow.

Of course, but if we can fix the hardware with no bad effects, let's do
that instead.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06  9:57         ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06  9:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto:
>> In fact, it's not clear how the driver should use the feature.  My guess
>> is that, if it wants to use silent deflate, it tries to negotiate
>> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
>> negotiation fails.  This is against the logic of all other features.
> 
> Let's take a step back from the implementation details.
> You are trying to add a new feature bit, after all.
> Why? Why is silent deflate useful? This is what is
> missing in all this discussion. If it is not useful
> we do not need a bit for it.

It is useful because it lets guests inflate the balloon aggressively,
and then use ballooned-out pages even in places where the guest OS
cannot sleep, such as kmalloc(GFP_ATOMIC).

>>> Can you show a scenario with old driver/new hypervisor or
>>> new driver/old hypervisor that fails?
>
> Sorry this is not the example I asked for.  Please give and example
> without migration.
> 
> Migration is qemu's problem: it is hypervisor's job to
> make sure guest sees no change during migration.

Quoting my message: "Of course you can just teach QEMU to be smarter,
but that would be a one-off hack for the only ill-defined feature that
says something is _not_ supported".

Currently migration works the same way for all virtio devices, and
assumes that features are defined only in the "positive" direction:
drivers request features if they want to use it, devices provide
features to say they support something.

Instead, in the case of this feature, the driver requests it before
relying on its lack (which is odd); the device provides if they do not
support something (which is wrong).  You can see that this just cannot
provide backwards-compatibility in the device; it happens to work only
because the feature was there in the first version of the spec.

> It should be able to do this with any hardware it emulates,
> there should be no need to change hardware to make it
> "migrateable" somehow.

Of course, but if we can fix the hardware with no bad effects, let's do
that instead.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  9:57         ` Paolo Bonzini
@ 2012-09-06 10:53           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 10:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Thu, Sep 06, 2012 at 11:57:22AM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto:
> >> In fact, it's not clear how the driver should use the feature.  My guess
> >> is that, if it wants to use silent deflate, it tries to negotiate
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
> >> negotiation fails.  This is against the logic of all other features.
> > 
> > Let's take a step back from the implementation details.
> > You are trying to add a new feature bit, after all.
> > Why? Why is silent deflate useful? This is what is
> > missing in all this discussion. If it is not useful
> > we do not need a bit for it.
> 
> It is useful because it lets guests inflate the balloon aggressively,
> and then use ballooned-out pages even in places where the guest OS
> cannot sleep, such as kmalloc(GFP_ATOMIC).

Interesting.
Do you intend to develop a driver patch using this?  I'd like to see how
that works.  Because if not, IMO it's best to wait until someone asks
for it.

> >>> Can you show a scenario with old driver/new hypervisor or
> >>> new driver/old hypervisor that fails?
> >
> > Sorry this is not the example I asked for.  Please give and example
> > without migration.
> > 
> > Migration is qemu's problem: it is hypervisor's job to
> > make sure guest sees no change during migration.
> 
> Quoting my message: "Of course you can just teach QEMU to be smarter,
> but that would be a one-off hack for the only ill-defined feature that
> says something is _not_ supported".
>
> Currently migration works the same way for all virtio devices,
> and
> assumes that features are defined only in the "positive" direction:
> drivers request features if they want to use it, devices provide
> features to say they support something.

Well this approach is buggy. If I reread features after migration what
do I see? Something changed right? So this is a bug. Migration should
not change hardware. And it is not a "one off" thing it is
fundamental for any hardware.

Fix that in qemu, and the problem goes away without spec changes.

> Instead, in the case of this feature, the driver requests it before
> relying on its lack (which is odd);

Which code in driver do you refer to?

> the device provides if they do not
> support something (which is wrong).

Not support?
It just seems to be asking guest to tell it about deflates.
If guest acks the bit, we know it will. If it does not,
it will not.

>  You can see that this just cannot
> provide backwards-compatibility in the device;

Sorry I do not understand this meta argument.
There should be an example where a driver and device
fail to work together. And without migration: as
I showed migration is simply broken atm for
an unrelated reason. Otherwise all's well.

> it happens to work only
> because the feature was there in the first version of the spec.

This is how we do compatiblity in virtio. If we want driver to do
something, we add a feature and it can ack, if it does we know it will
do what we want.  Another example is network announce bit.  If driver
acks it, we know we do not need to send gratitious arp from qemu.  You
are saying it is also broken?

> > It should be able to do this with any hardware it emulates,
> > there should be no need to change hardware to make it
> > "migrateable" somehow.
> 
> Of course, but if we can fix the hardware with no bad effects, let's do
> that instead.
> 
> Paolo

Don't fix what is not broken. We get to carry compatibility
in both driver and host for a long time for each feature.

Note: adding
new features adds zero value in this respect - it will not
allow simplifying the hypervisor.
-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06 10:53           ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 10:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

On Thu, Sep 06, 2012 at 11:57:22AM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 11:44, Michael S. Tsirkin ha scritto:
> >> In fact, it's not clear how the driver should use the feature.  My guess
> >> is that, if it wants to use silent deflate, it tries to negotiate
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST, and can use silent deflate if
> >> negotiation fails.  This is against the logic of all other features.
> > 
> > Let's take a step back from the implementation details.
> > You are trying to add a new feature bit, after all.
> > Why? Why is silent deflate useful? This is what is
> > missing in all this discussion. If it is not useful
> > we do not need a bit for it.
> 
> It is useful because it lets guests inflate the balloon aggressively,
> and then use ballooned-out pages even in places where the guest OS
> cannot sleep, such as kmalloc(GFP_ATOMIC).

Interesting.
Do you intend to develop a driver patch using this?  I'd like to see how
that works.  Because if not, IMO it's best to wait until someone asks
for it.

> >>> Can you show a scenario with old driver/new hypervisor or
> >>> new driver/old hypervisor that fails?
> >
> > Sorry this is not the example I asked for.  Please give and example
> > without migration.
> > 
> > Migration is qemu's problem: it is hypervisor's job to
> > make sure guest sees no change during migration.
> 
> Quoting my message: "Of course you can just teach QEMU to be smarter,
> but that would be a one-off hack for the only ill-defined feature that
> says something is _not_ supported".
>
> Currently migration works the same way for all virtio devices,
> and
> assumes that features are defined only in the "positive" direction:
> drivers request features if they want to use it, devices provide
> features to say they support something.

Well this approach is buggy. If I reread features after migration what
do I see? Something changed right? So this is a bug. Migration should
not change hardware. And it is not a "one off" thing it is
fundamental for any hardware.

Fix that in qemu, and the problem goes away without spec changes.

> Instead, in the case of this feature, the driver requests it before
> relying on its lack (which is odd);

Which code in driver do you refer to?

> the device provides if they do not
> support something (which is wrong).

Not support?
It just seems to be asking guest to tell it about deflates.
If guest acks the bit, we know it will. If it does not,
it will not.

>  You can see that this just cannot
> provide backwards-compatibility in the device;

Sorry I do not understand this meta argument.
There should be an example where a driver and device
fail to work together. And without migration: as
I showed migration is simply broken atm for
an unrelated reason. Otherwise all's well.

> it happens to work only
> because the feature was there in the first version of the spec.

This is how we do compatiblity in virtio. If we want driver to do
something, we add a feature and it can ack, if it does we know it will
do what we want.  Another example is network announce bit.  If driver
acks it, we know we do not need to send gratitious arp from qemu.  You
are saying it is also broken?

> > It should be able to do this with any hardware it emulates,
> > there should be no need to change hardware to make it
> > "migrateable" somehow.
> 
> Of course, but if we can fix the hardware with no bad effects, let's do
> that instead.
> 
> Paolo

Don't fix what is not broken. We get to carry compatibility
in both driver and host for a long time for each feature.

Note: adding
new features adds zero value in this respect - it will not
allow simplifying the hypervisor.
-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06 10:53           ` Michael S. Tsirkin
@ 2012-09-06 12:13             ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06 12:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
>> It is useful because it lets guests inflate the balloon aggressively,
>> and then use ballooned-out pages even in places where the guest OS
>> cannot sleep, such as kmalloc(GFP_ATOMIC).
> 
> Interesting.
> Do you intend to develop a driver patch using this?  I'd like to see how
> that works.  Because if not, IMO it's best to wait until someone asks
> for it.

It's been two months, but Frank Swiderski's patch that triggered the
debate is exactly that
(http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.

>> Currently migration works the same way for all virtio devices,
>> and assumes that features are defined only in the "positive" direction:
>> drivers request features if they want to use it, devices provide
>> features to say they support something.
> 
> Well this approach is buggy. If I reread features after migration what
> do I see? Something changed right? So this is a bug. Migration should
> not change hardware.

Exactly, virtio migration currently fails if it would change hardware
due to features not supported in the destination.  Except for
VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
defined in the wrong direction.

> Fix that in qemu, and the problem goes away without spec changes.

That would be a one-off hack, for the sole feature that was defined wrong.

>> Instead, in the case of this feature, the driver requests it before
>> relying on its lack (which is odd);
> 
> Which code in driver do you refer to?

I'm talking of the code Frank should have put in the driver, but he
didn't (so he has a bug).  Something like

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST))
	return -ENODEV;

So it has to request the feature, and then fail if the feature is
present.  That's quite backwards.  Everywhere else you'll find

    if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
        return -ENOTTY;

    BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));

    if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) {
        /* do cool stuff */
    }

etc.


>> the device provides if they do not
>> support something (which is wrong).
> 
> Not support? It just seems to be asking guest to tell it about deflates.
> If guest acks the bit, we know it will. If it does not,
> it will not.

No, it is the other way round.  The host ultimately decides what
features are negotiated, so it doesn't ask anything to the guest.  The
_guest_ is asking the host about the need for explicit deflate.

>>  You can see that this just cannot
>> provide backwards-compatibility in the device;
> 
> Sorry I do not understand this meta argument.
> There should be an example where a driver and device
> fail to work together.

There's nothing that you cannot work around.  Use virtio_has_feature in
the device, invert the migration feature check in the driver.  Why not
just _get it right_ instead?

>> it happens to work only
>> because the feature was there in the first version of the spec.
> 
> This is how we do compatiblity in virtio. If we want driver to do
> something, we add a feature and it can ack, if it does we know it will
> do what we want.  Another example is network announce bit.  If driver
> acks it, we know we do not need to send gratitious arp from qemu.  You
> are saying it is also broken?

No, it's not broken.  A reverse feature, let's call it like
VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken.

VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host
_can_ ask the guest to send a gARP, but it may also send it itself.
Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use
ballooned pages directly, but may also deflate them explicitly.

Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
if set, the host _may not_ rely on the guest to send a gARP.  Similarly
if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
ballooned pages directly.

There are _no_ other negative features besides
VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
reason---because they're broken.

(Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
but it is not so important because it depends on user input more than
hypervisor version).

Reasoning on migration is just another way to see if the feature is
positive.  During migration, new features available on the destination
can always be masked.  But if removing the feature _adds_ a capability
to the hardware, it's wrong.

> Don't fix what is not broken. We get to carry compatibility
> in both driver and host for a long time for each feature.

Sure, but better fix broken things _before_ somebody uses them.

I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
because it's in wide use and it would pose compatibility problems indeed.

But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
code, neither driver nor hypervisor, we get lucky and we can instantly
deprecate it.

> Note: adding new features adds zero value in this respect - it will not
> allow simplifying the hypervisor.

Indeed, it will add one line of code to the hypervisor to advertise the
new feature.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06 12:13             ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06 12:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
>> It is useful because it lets guests inflate the balloon aggressively,
>> and then use ballooned-out pages even in places where the guest OS
>> cannot sleep, such as kmalloc(GFP_ATOMIC).
> 
> Interesting.
> Do you intend to develop a driver patch using this?  I'd like to see how
> that works.  Because if not, IMO it's best to wait until someone asks
> for it.

It's been two months, but Frank Swiderski's patch that triggered the
debate is exactly that
(http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.

>> Currently migration works the same way for all virtio devices,
>> and assumes that features are defined only in the "positive" direction:
>> drivers request features if they want to use it, devices provide
>> features to say they support something.
> 
> Well this approach is buggy. If I reread features after migration what
> do I see? Something changed right? So this is a bug. Migration should
> not change hardware.

Exactly, virtio migration currently fails if it would change hardware
due to features not supported in the destination.  Except for
VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
defined in the wrong direction.

> Fix that in qemu, and the problem goes away without spec changes.

That would be a one-off hack, for the sole feature that was defined wrong.

>> Instead, in the case of this feature, the driver requests it before
>> relying on its lack (which is odd);
> 
> Which code in driver do you refer to?

I'm talking of the code Frank should have put in the driver, but he
didn't (so he has a bug).  Something like

    if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST))
	return -ENODEV;

So it has to request the feature, and then fail if the feature is
present.  That's quite backwards.  Everywhere else you'll find

    if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
        return -ENOTTY;

    BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));

    if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) {
        /* do cool stuff */
    }

etc.


>> the device provides if they do not
>> support something (which is wrong).
> 
> Not support? It just seems to be asking guest to tell it about deflates.
> If guest acks the bit, we know it will. If it does not,
> it will not.

No, it is the other way round.  The host ultimately decides what
features are negotiated, so it doesn't ask anything to the guest.  The
_guest_ is asking the host about the need for explicit deflate.

>>  You can see that this just cannot
>> provide backwards-compatibility in the device;
> 
> Sorry I do not understand this meta argument.
> There should be an example where a driver and device
> fail to work together.

There's nothing that you cannot work around.  Use virtio_has_feature in
the device, invert the migration feature check in the driver.  Why not
just _get it right_ instead?

>> it happens to work only
>> because the feature was there in the first version of the spec.
> 
> This is how we do compatiblity in virtio. If we want driver to do
> something, we add a feature and it can ack, if it does we know it will
> do what we want.  Another example is network announce bit.  If driver
> acks it, we know we do not need to send gratitious arp from qemu.  You
> are saying it is also broken?

No, it's not broken.  A reverse feature, let's call it like
VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken.

VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host
_can_ ask the guest to send a gARP, but it may also send it itself.
Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use
ballooned pages directly, but may also deflate them explicitly.

Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
if set, the host _may not_ rely on the guest to send a gARP.  Similarly
if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
ballooned pages directly.

There are _no_ other negative features besides
VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
reason---because they're broken.

(Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
but it is not so important because it depends on user input more than
hypervisor version).

Reasoning on migration is just another way to see if the feature is
positive.  During migration, new features available on the destination
can always be masked.  But if removing the feature _adds_ a capability
to the hardware, it's wrong.

> Don't fix what is not broken. We get to carry compatibility
> in both driver and host for a long time for each feature.

Sure, but better fix broken things _before_ somebody uses them.

I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
because it's in wide use and it would pose compatibility problems indeed.

But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
code, neither driver nor hypervisor, we get lucky and we can instantly
deprecate it.

> Note: adding new features adds zero value in this respect - it will not
> allow simplifying the hypervisor.

Indeed, it will add one line of code to the hypervisor to advertise the
new feature.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06 12:13             ` Paolo Bonzini
@ 2012-09-06 12:51               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 12:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
> >> It is useful because it lets guests inflate the balloon aggressively,
> >> and then use ballooned-out pages even in places where the guest OS
> >> cannot sleep, such as kmalloc(GFP_ATOMIC).
> > 
> > Interesting.
> > Do you intend to develop a driver patch using this?  I'd like to see how
> > that works.  Because if not, IMO it's best to wait until someone asks
> > for it.
> 
> It's been two months, but Frank Swiderski's patch that triggered the
> debate is exactly that
> (http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
> didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.

He is using a sepate device ID though, so we do not need a feature bit.

> >> Currently migration works the same way for all virtio devices,
> >> and assumes that features are defined only in the "positive" direction:
> >> drivers request features if they want to use it, devices provide
> >> features to say they support something.
> > 
> > Well this approach is buggy. If I reread features after migration what
> > do I see? Something changed right? So this is a bug. Migration should
> > not change hardware.
> 
> Exactly, virtio migration currently fails if it would change hardware
> due to features not supported in the destination.  Except for
> VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
> defined in the wrong direction.

There is nothing wrong with the direction that I can see.
The bug is that migration between backends

> > Fix that in qemu, and the problem goes away without spec changes.
> 
> That would be a one-off hack, for the sole feature that was defined wrong.

Not at all. It's a fundamental bug, as long as it's unfixed talking
about migration is just useless.

> >> Instead, in the case of this feature, the driver requests it before
> >> relying on its lack (which is odd);
> > 
> > Which code in driver do you refer to?
> 
> I'm talking of the code Frank should have put in the driver, but he
> didn't (so he has a bug).  Something like
> 
>     if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST))
> 	return -ENODEV;
> 
> So it has to request the feature, and then fail if the feature is
> present.  That's quite backwards.  Everywhere else you'll find
> 
>     if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
>         return -ENOTTY;
> 
>     BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));

This is a bug BTW - we should not crash on bad device, failing probe
is the right thing.

> 
>     if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) {
>         /* do cool stuff */
>     }
> 
> etc.
> 

See above. Frank's driver does not seem to have a bug.

> >> the device provides if they do not
> >> support something (which is wrong).
> > 
> > Not support? It just seems to be asking guest to tell it about deflates.
> > If guest acks the bit, we know it will. If it does not,
> > it will not.
> 
> No, it is the other way round.  The host ultimately decides what
> features are negotiated, so it doesn't ask anything to the guest.  The
> _guest_ is asking the host about the need for explicit deflate.

Now we are arguing about words.  This is why meta arguments without
specific examples are so bad.

> >>  You can see that this just cannot
> >> provide backwards-compatibility in the device;
> > 
> > Sorry I do not understand this meta argument.
> > There should be an example where a driver and device
> > fail to work together.
> 
> There's nothing that you cannot work around.  Use virtio_has_feature in
> the device, invert the migration feature check in the driver.  Why not
> just _get it right_ instead?

Exactly. Bug is in qemu, fix it _there_. What you do is a work around
in spec: you declare old configurations unsupported.

> >> it happens to work only
> >> because the feature was there in the first version of the spec.
> > 
> > This is how we do compatiblity in virtio. If we want driver to do
> > something, we add a feature and it can ack, if it does we know it will
> > do what we want.  Another example is network announce bit.  If driver
> > acks it, we know we do not need to send gratitious arp from qemu.  You
> > are saying it is also broken?
> 
> No, it's not broken.  A reverse feature, let's call it like
> VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken.
> 
> VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host
> _can_ ask the guest to send a gARP, but it may also send it itself.
> Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use
> ballooned pages directly, but may also deflate them explicitly.
> 
> Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
> if set, the host _may not_ rely on the guest to send a gARP.  Similarly
> if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
> ballooned pages directly.
> 
> There are _no_ other negative features besides
> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
> reason---because they're broken.
> 
> (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
> but it is not so important because it depends on user input more than
> hypervisor version).

Now that we have a specific example, we can talk.
Simply, some features do not need an ack from guest:
they just tell guest something about device.
RO is one such feature.

> Reasoning on migration is just another way to see if the feature is
> positive.  During migration, new features available on the destination
> can always be masked.  But if removing the feature _adds_ a capability
> to the hardware, it's wrong.

Fact is, nothing except migration seems broken. This alone should
make you realize there is a bug in qemu not in driver or protocol.

> > Don't fix what is not broken. We get to carry compatibility
> > in both driver and host for a long time for each feature.
> 
> Sure, but better fix broken things _before_ somebody uses them.
> 
> I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
> because it's in wide use and it would pose compatibility problems indeed.
> 
> But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
> code, neither driver nor hypervisor,
> we get lucky and we can instantly
> deprecate it.

Which driver are you looking at?

 grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l
2

So it does not look like we can just remove it like you did. At minimum
we will need to reserve the bit.
Yes qemu does not seem to set this bit.  Need to check others e.g. kvm tool etc.

Benefit seems very small. Why bother?

> > Note: adding new features adds zero value in this respect - it will not
> > allow simplifying the hypervisor.
> 
> Indeed, it will add one line of code to the hypervisor to advertise the
> new feature.
> 
> Paolo

So there's no point. Migration will stil be broken until it is
fixed properly.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06 12:51               ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 12:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote:
> Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
> >> It is useful because it lets guests inflate the balloon aggressively,
> >> and then use ballooned-out pages even in places where the guest OS
> >> cannot sleep, such as kmalloc(GFP_ATOMIC).
> > 
> > Interesting.
> > Do you intend to develop a driver patch using this?  I'd like to see how
> > that works.  Because if not, IMO it's best to wait until someone asks
> > for it.
> 
> It's been two months, but Frank Swiderski's patch that triggered the
> debate is exactly that
> (http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
> didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.

He is using a sepate device ID though, so we do not need a feature bit.

> >> Currently migration works the same way for all virtio devices,
> >> and assumes that features are defined only in the "positive" direction:
> >> drivers request features if they want to use it, devices provide
> >> features to say they support something.
> > 
> > Well this approach is buggy. If I reread features after migration what
> > do I see? Something changed right? So this is a bug. Migration should
> > not change hardware.
> 
> Exactly, virtio migration currently fails if it would change hardware
> due to features not supported in the destination.  Except for
> VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
> defined in the wrong direction.

There is nothing wrong with the direction that I can see.
The bug is that migration between backends

> > Fix that in qemu, and the problem goes away without spec changes.
> 
> That would be a one-off hack, for the sole feature that was defined wrong.

Not at all. It's a fundamental bug, as long as it's unfixed talking
about migration is just useless.

> >> Instead, in the case of this feature, the driver requests it before
> >> relying on its lack (which is odd);
> > 
> > Which code in driver do you refer to?
> 
> I'm talking of the code Frank should have put in the driver, but he
> didn't (so he has a bug).  Something like
> 
>     if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST))
> 	return -ENODEV;
> 
> So it has to request the feature, and then fail if the feature is
> present.  That's quite backwards.  Everywhere else you'll find
> 
>     if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
>         return -ENOTTY;
> 
>     BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));

This is a bug BTW - we should not crash on bad device, failing probe
is the right thing.

> 
>     if (virtio_has_feature(vblk->vdev, VIRTIO_NET_F_CTRL_VQ)) {
>         /* do cool stuff */
>     }
> 
> etc.
> 

See above. Frank's driver does not seem to have a bug.

> >> the device provides if they do not
> >> support something (which is wrong).
> > 
> > Not support? It just seems to be asking guest to tell it about deflates.
> > If guest acks the bit, we know it will. If it does not,
> > it will not.
> 
> No, it is the other way round.  The host ultimately decides what
> features are negotiated, so it doesn't ask anything to the guest.  The
> _guest_ is asking the host about the need for explicit deflate.

Now we are arguing about words.  This is why meta arguments without
specific examples are so bad.

> >>  You can see that this just cannot
> >> provide backwards-compatibility in the device;
> > 
> > Sorry I do not understand this meta argument.
> > There should be an example where a driver and device
> > fail to work together.
> 
> There's nothing that you cannot work around.  Use virtio_has_feature in
> the device, invert the migration feature check in the driver.  Why not
> just _get it right_ instead?

Exactly. Bug is in qemu, fix it _there_. What you do is a work around
in spec: you declare old configurations unsupported.

> >> it happens to work only
> >> because the feature was there in the first version of the spec.
> > 
> > This is how we do compatiblity in virtio. If we want driver to do
> > something, we add a feature and it can ack, if it does we know it will
> > do what we want.  Another example is network announce bit.  If driver
> > acks it, we know we do not need to send gratitious arp from qemu.  You
> > are saying it is also broken?
> 
> No, it's not broken.  A reverse feature, let's call it like
> VIRTIO_NET_F_HOST_MUST_SEND_GARP, would be broken.
> 
> VIRTIO_NET_F_GUEST_ANNOUNCE is a "positive" feature: if set, the host
> _can_ ask the guest to send a gARP, but it may also send it itself.
> Similarly if VIRTIO_BALLOON_F_SILENT_DEFLATE is set, the guest _can_ use
> ballooned pages directly, but may also deflate them explicitly.
> 
> Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
> if set, the host _may not_ rely on the guest to send a gARP.  Similarly
> if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
> ballooned pages directly.
> 
> There are _no_ other negative features besides
> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
> reason---because they're broken.
> 
> (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
> but it is not so important because it depends on user input more than
> hypervisor version).

Now that we have a specific example, we can talk.
Simply, some features do not need an ack from guest:
they just tell guest something about device.
RO is one such feature.

> Reasoning on migration is just another way to see if the feature is
> positive.  During migration, new features available on the destination
> can always be masked.  But if removing the feature _adds_ a capability
> to the hardware, it's wrong.

Fact is, nothing except migration seems broken. This alone should
make you realize there is a bug in qemu not in driver or protocol.

> > Don't fix what is not broken. We get to carry compatibility
> > in both driver and host for a long time for each feature.
> 
> Sure, but better fix broken things _before_ somebody uses them.
> 
> I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
> because it's in wide use and it would pose compatibility problems indeed.
> 
> But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
> code, neither driver nor hypervisor,
> we get lucky and we can instantly
> deprecate it.

Which driver are you looking at?

 grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l
2

So it does not look like we can just remove it like you did. At minimum
we will need to reserve the bit.
Yes qemu does not seem to set this bit.  Need to check others e.g. kvm tool etc.

Benefit seems very small. Why bother?

> > Note: adding new features adds zero value in this respect - it will not
> > allow simplifying the hypervisor.
> 
> Indeed, it will add one line of code to the hypervisor to advertise the
> new feature.
> 
> Paolo

So there's no point. Migration will stil be broken until it is
fixed properly.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06 12:51               ` Michael S. Tsirkin
@ 2012-09-06 13:12                 ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 06/09/2012 14:51, Michael S. Tsirkin ha scritto:
> On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote:
>> Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
>>>> It is useful because it lets guests inflate the balloon aggressively,
>>>> and then use ballooned-out pages even in places where the guest OS
>>>> cannot sleep, such as kmalloc(GFP_ATOMIC).
>>>
>>> Interesting.
>>> Do you intend to develop a driver patch using this?  I'd like to see how
>>> that works.  Because if not, IMO it's best to wait until someone asks
>>> for it.
>>
>> It's been two months, but Frank Swiderski's patch that triggered the
>> debate is exactly that
>> (http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
>> didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.
> 
> He is using a sepate device ID though, so we do not need a feature bit.

It doesn't need to, it can work just as well with the existing device.

>> Exactly, virtio migration currently fails if it would change hardware
>> due to features not supported in the destination.  Except for
>> VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
>> defined in the wrong direction.
> 
> There is nothing wrong with the direction that I can see.
> The bug is that migration between backends

...?

>>     BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> 
> This is a bug BTW - we should not crash on bad device, failing probe
> is the right thing.

(I knew you would say that. :)  The feature has been checked elsewhere,
and this code will be unreachable if the feature was not there in the
first place).

>>>>  You can see that this just cannot
>>>> provide backwards-compatibility in the device;
>>>
>>> Sorry I do not understand this meta argument.
>>> There should be an example where a driver and device
>>> fail to work together.
>>
>> There's nothing that you cannot work around.  Use virtio_has_feature in
>> the device, invert the migration feature check in the driver.  Why not
>> just _get it right_ instead?
> 
> Exactly. Bug is in qemu, fix it _there_. What you do is a work around
> in spec: you declare old configurations unsupported.

Such old configurations do not exist, so it's fine.

>> There are _no_ other negative features besides
>> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
>> reason---because they're broken.
>>
>> (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
>> but it is not so important because it depends on user input more than
>> hypervisor version).
> 
> Now that we have a specific example, we can talk.
> Simply, some features do not need an ack from guest:
> they just tell guest something about device.
> RO is one such feature.

I don't understand how that's relevant.  All features that just say
"this field is there in the configuration at that offset" need no ack
from guest, yet they're expressed as "positive" features.

>> Reasoning on migration is just another way to see if the feature is
>> positive.  During migration, new features available on the destination
>> can always be masked.  But if removing the feature _adds_ a capability
>> to the hardware, it's wrong.
> 
> Fact is, nothing except migration seems broken. This alone should
> make you realize there is a bug in qemu not in driver or protocol.

I'm not entirely sure that you understand why migration is broken and
why it is only broken for this particular feature bit (and in a more
benign way for VIRTIO_BLK_F_RO).

>>> Don't fix what is not broken. We get to carry compatibility
>>> in both driver and host for a long time for each feature.
>>
>> Sure, but better fix broken things _before_ somebody uses them.
>>
>> I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
>> because it's in wide use and it would pose compatibility problems indeed.
>>
>> But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
>> code, neither driver nor hypervisor,
>> we get lucky and we can instantly
>> deprecate it.
> 
> Which driver are you looking at?
> 
>  grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l
> 2

Well, I am also looking at how it is used:

static unsigned int features[] = {
        VIRTIO_BALLOON_F_MUST_TELL_HOST,
        VIRTIO_BALLOON_F_STATS_VQ,
};

        /*
         * Note that if
         * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
         * is true, we *have* to do it in this order
         */
        tell_host(vb, vb->deflate_vq);
        release_pages_by_pfn(vb->pfns, vb->num_pfns);

So no, it is not used with virtio_has_feature in any way that matters.

> So it does not look like we can just remove it like you did. At minimum
> we will need to reserve the bit.

Yes, reserving the bit is necessary.

> Benefit seems very small. Why bother?

Because the problem is clear, and easily solved, and I don't like time
bombs.

>>> Note: adding new features adds zero value in this respect - it will not
>>> allow simplifying the hypervisor.
>>
>> Indeed, it will add one line of code to the hypervisor to advertise the
>> new feature.
> 
> So there's no point. Migration will stil be broken until it is
> fixed properly.

Migration of VIRTIO_BALLOON_F_MUST_TELL_HOST would be broken, but
VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist and will not exist in the
hypervisor, so it will not be broken.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06 13:12                 ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-06 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

Il 06/09/2012 14:51, Michael S. Tsirkin ha scritto:
> On Thu, Sep 06, 2012 at 02:13:14PM +0200, Paolo Bonzini wrote:
>> Il 06/09/2012 12:53, Michael S. Tsirkin ha scritto:
>>>> It is useful because it lets guests inflate the balloon aggressively,
>>>> and then use ballooned-out pages even in places where the guest OS
>>>> cannot sleep, such as kmalloc(GFP_ATOMIC).
>>>
>>> Interesting.
>>> Do you intend to develop a driver patch using this?  I'd like to see how
>>> that works.  Because if not, IMO it's best to wait until someone asks
>>> for it.
>>
>> It's been two months, but Frank Swiderski's patch that triggered the
>> debate is exactly that
>> (http://permalink.gmane.org/gmane.linux.kernel/1318984).  However, he
>> didn't check VIRTIO_BALLOON_F_MUST_TELL_HOST, so he has a bug there.
> 
> He is using a sepate device ID though, so we do not need a feature bit.

It doesn't need to, it can work just as well with the existing device.

>> Exactly, virtio migration currently fails if it would change hardware
>> due to features not supported in the destination.  Except for
>> VIRTIO_BALLOON_F_MUST_TELL_HOST, where it does not fail because it is
>> defined in the wrong direction.
> 
> There is nothing wrong with the direction that I can see.
> The bug is that migration between backends

...?

>>     BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> 
> This is a bug BTW - we should not crash on bad device, failing probe
> is the right thing.

(I knew you would say that. :)  The feature has been checked elsewhere,
and this code will be unreachable if the feature was not there in the
first place).

>>>>  You can see that this just cannot
>>>> provide backwards-compatibility in the device;
>>>
>>> Sorry I do not understand this meta argument.
>>> There should be an example where a driver and device
>>> fail to work together.
>>
>> There's nothing that you cannot work around.  Use virtio_has_feature in
>> the device, invert the migration feature check in the driver.  Why not
>> just _get it right_ instead?
> 
> Exactly. Bug is in qemu, fix it _there_. What you do is a work around
> in spec: you declare old configurations unsupported.

Such old configurations do not exist, so it's fine.

>> There are _no_ other negative features besides
>> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
>> reason---because they're broken.
>>
>> (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
>> but it is not so important because it depends on user input more than
>> hypervisor version).
> 
> Now that we have a specific example, we can talk.
> Simply, some features do not need an ack from guest:
> they just tell guest something about device.
> RO is one such feature.

I don't understand how that's relevant.  All features that just say
"this field is there in the configuration at that offset" need no ack
from guest, yet they're expressed as "positive" features.

>> Reasoning on migration is just another way to see if the feature is
>> positive.  During migration, new features available on the destination
>> can always be masked.  But if removing the feature _adds_ a capability
>> to the hardware, it's wrong.
> 
> Fact is, nothing except migration seems broken. This alone should
> make you realize there is a bug in qemu not in driver or protocol.

I'm not entirely sure that you understand why migration is broken and
why it is only broken for this particular feature bit (and in a more
benign way for VIRTIO_BLK_F_RO).

>>> Don't fix what is not broken. We get to carry compatibility
>>> in both driver and host for a long time for each feature.
>>
>> Sure, but better fix broken things _before_ somebody uses them.
>>
>> I'm not proposing to replace VIRTIO_BLK_F_RO with VIRTIO_BLK_F_RW
>> because it's in wide use and it would pose compatibility problems indeed.
>>
>> But since VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist in any source
>> code, neither driver nor hypervisor,
>> we get lucky and we can instantly
>> deprecate it.
> 
> Which driver are you looking at?
> 
>  grep VIRTIO_BALLOON_F_MUST_TELL_HOST drivers/virtio/*c|wc -l
> 2

Well, I am also looking at how it is used:

static unsigned int features[] = {
        VIRTIO_BALLOON_F_MUST_TELL_HOST,
        VIRTIO_BALLOON_F_STATS_VQ,
};

        /*
         * Note that if
         * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
         * is true, we *have* to do it in this order
         */
        tell_host(vb, vb->deflate_vq);
        release_pages_by_pfn(vb->pfns, vb->num_pfns);

So no, it is not used with virtio_has_feature in any way that matters.

> So it does not look like we can just remove it like you did. At minimum
> we will need to reserve the bit.

Yes, reserving the bit is necessary.

> Benefit seems very small. Why bother?

Because the problem is clear, and easily solved, and I don't like time
bombs.

>>> Note: adding new features adds zero value in this respect - it will not
>>> allow simplifying the hypervisor.
>>
>> Indeed, it will add one line of code to the hypervisor to advertise the
>> new feature.
> 
> So there's no point. Migration will stil be broken until it is
> fixed properly.

Migration of VIRTIO_BALLOON_F_MUST_TELL_HOST would be broken, but
VIRTIO_BALLOON_F_MUST_TELL_HOST does not exist and will not exist in the
hypervisor, so it will not be broken.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06  7:46 ` Paolo Bonzini
@ 2012-09-06 23:41   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 23:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: rusty, fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote:
> VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
> "negative" feature: it tells you that silent defalte is not supported.
> Right now, QEMU refuses migration if the target does not support all the
> features that were negotiated.  But then:
> - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> which is wrong;

It need not be wrong. It depends on how host behaves.
The right thing for qemu to do would be to assume that
since this bit was not acked it can not assume specific
guest behaviour.

Even ignoring that, looking at this at a high level: basically you have
configured two different machines with different qemu flags, and are
complaning that migration did not fail cleanly.

However
- This is still a user/management bug. You should not even try
  such migration. Yes I put a sanity check there but it is
  just a debugging aid. It is not indended to be exhaustive.
  This is far from the only case where user error might cause
  problems silently.
- Yes clean failure would be nicer, if
  you want to guarantee this just teach qemu to send
  all host
  features and verify they match on destination.
  Or more generally send machine configuration.
  No need to change spec or special case this bit at all.

> 
> - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> is useless.

It is correct. device feature bits should not change across migration.


> Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
>  1 file modificato, 33 inserzioni(+), 3 rimozioni(-)
> 
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index 7a073f4..1a25a18 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -6238,6 +6238,8 @@ bits
>  
>  \begin_deeper
>  \begin_layout Description
> +
> +\change_deleted 1531152142 1346917221
>  VIRTIO_BALLOON_F_MUST_TELL_HOST
>  \begin_inset space ~
>  \end_inset
> @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
>  \end_inset
>  
>  (1) A virtqueue for reporting guest memory statistics is present.
> +\change_inserted 1531152142 1346917193
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 1531152142 1346917219
> +VIRTIO_BALLOON_F_SILENT_DEFLATE
> +\begin_inset space ~
> +\end_inset
> +
> +(2) Host does not need to be told before pages from the balloon are used.
> +\change_unchanged
> +
>  \end_layout
>  
>  \end_deeper
> @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
>  \end_layout
>  
>  \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
> - use these requested pages until that descriptor in the deflateq has been
> - used by the device.
> +If the VIRTIO_BALLOON_F_
> +\change_deleted 1531152142 1346917234
> +MUST_TELL_HOST
> +\change_inserted 1531152142 1346917237
> +SILENT_DEFLATE
> +\change_unchanged
> + feature is 
> +\change_inserted 1531152142 1346917241
> +not 
> +\change_unchanged
> +set, the guest may not use these requested pages until that descriptor in
> + the deflateq has been used by the device.
> +
> +\change_inserted 1531152142 1346917253
> + If it is set, the guest may choose to not use the deflateq at all.
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Enumerate
> -- 
> 1.7.11.2

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-06 23:41   ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-06 23:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

On Thu, Sep 06, 2012 at 09:46:50AM +0200, Paolo Bonzini wrote:
> VIRTIO_BALLOON_F_MUST_TELL_HOST cannot be used properly because it is a
> "negative" feature: it tells you that silent defalte is not supported.
> Right now, QEMU refuses migration if the target does not support all the
> features that were negotiated.  But then:
> - a migration from non-MUST_TELL_HOST to MUST_TELL_HOST will succeed,
> which is wrong;

It need not be wrong. It depends on how host behaves.
The right thing for qemu to do would be to assume that
since this bit was not acked it can not assume specific
guest behaviour.

Even ignoring that, looking at this at a high level: basically you have
configured two different machines with different qemu flags, and are
complaning that migration did not fail cleanly.

However
- This is still a user/management bug. You should not even try
  such migration. Yes I put a sanity check there but it is
  just a debugging aid. It is not indended to be exhaustive.
  This is far from the only case where user error might cause
  problems silently.
- Yes clean failure would be nicer, if
  you want to guarantee this just teach qemu to send
  all host
  features and verify they match on destination.
  Or more generally send machine configuration.
  No need to change spec or special case this bit at all.

> 
> - a migration from MUST_TELL_HOST to non-MUST_TELL_HOST will fail, which
> is useless.

It is correct. device feature bits should not change across migration.


> Add instead a new feature VIRTIO_BALLOON_F_SILENT_DEFLATE, and deprecate
> VIRTIO_BALLOON_F_MUST_TELL_HOST since it is never actually used.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  virtio-spec.lyx | 36 +++++++++++++++++++++++++++++++++---
>  1 file modificato, 33 inserzioni(+), 3 rimozioni(-)
> 
> diff --git a/virtio-spec.lyx b/virtio-spec.lyx
> index 7a073f4..1a25a18 100644
> --- a/virtio-spec.lyx
> +++ b/virtio-spec.lyx
> @@ -6238,6 +6238,8 @@ bits
>  
>  \begin_deeper
>  \begin_layout Description
> +
> +\change_deleted 1531152142 1346917221
>  VIRTIO_BALLOON_F_MUST_TELL_HOST
>  \begin_inset space ~
>  \end_inset
> @@ -6251,6 +6253,20 @@ VIRTIO_BALLOON_F_STATS_VQ
>  \end_inset
>  
>  (1) A virtqueue for reporting guest memory statistics is present.
> +\change_inserted 1531152142 1346917193
> +
> +\end_layout
> +
> +\begin_layout Description
> +
> +\change_inserted 1531152142 1346917219
> +VIRTIO_BALLOON_F_SILENT_DEFLATE
> +\begin_inset space ~
> +\end_inset
> +
> +(2) Host does not need to be told before pages from the balloon are used.
> +\change_unchanged
> +
>  \end_layout
>  
>  \end_deeper
> @@ -6401,9 +6417,23 @@ The driver constructs an array of addresses of memory pages it has previously
>  \end_layout
>  
>  \begin_layout Enumerate
> -If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is set, the guest may not
> - use these requested pages until that descriptor in the deflateq has been
> - used by the device.
> +If the VIRTIO_BALLOON_F_
> +\change_deleted 1531152142 1346917234
> +MUST_TELL_HOST
> +\change_inserted 1531152142 1346917237
> +SILENT_DEFLATE
> +\change_unchanged
> + feature is 
> +\change_inserted 1531152142 1346917241
> +not 
> +\change_unchanged
> +set, the guest may not use these requested pages until that descriptor in
> + the deflateq has been used by the device.
> +
> +\change_inserted 1531152142 1346917253
> + If it is set, the guest may choose to not use the deflateq at all.
> +\change_unchanged
> +
>  \end_layout
>  
>  \begin_layout Enumerate
> -- 
> 1.7.11.2

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06 12:13             ` Paolo Bonzini
  (?)
  (?)
@ 2012-09-06 23:45             ` Rusty Russell
  2012-09-07  5:42                 ` Michael S. Tsirkin
  -1 siblings, 1 reply; 71+ messages in thread
From: Rusty Russell @ 2012-09-06 23:45 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, linux-kernel, mikew, yinghan, virtualization

Paolo Bonzini <pbonzini@redhat.com> writes:
> Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
> if set, the host _may not_ rely on the guest to send a gARP.  Similarly
> if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
> ballooned pages directly.
>
> There are _no_ other negative features besides
> VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
> reason---because they're broken.
>
> (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
> but it is not so important because it depends on user input more than
> hypervisor version).

Yes, this is the key observation, and an important lesson for the
future.  Thanks!

Note that these two negative features were in the original spec, where
it's assumed that every device supports them.  That's not explicitly
documented, however.

I like killing the totally unused feature.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-06 23:45             ` Rusty Russell
@ 2012-09-07  5:42                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07  5:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
> > if set, the host _may not_ rely on the guest to send a gARP.  Similarly
> > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
> > ballooned pages directly.
> >
> > There are _no_ other negative features besides
> > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
> > reason---because they're broken.
> >
> > (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
> > but it is not so important because it depends on user input more than
> > hypervisor version).
> 
> Yes, this is the key observation, and an important lesson for the
> future.  Thanks!
> Note that these two negative features were in the original spec, where
> it's assumed that every device supports them.  That's not explicitly
> documented, however.

I'm curious what would we do for the future? I tried to imagine that _RO
was not in the original spec, so virtio-blk expects a r/w device.
Now we can not add _RW - old hypervisors do not set it, and old
drivers do not ack it.
What would a new flag with equivalent functionality be?

> I like killing the totally unused feature.
> 
> Cheers,
> Rusty.

I tried verifying that it is unused. And found this:
Linux drivers currently try to ack this feature if it is there,
so do BSD drivers. Both from v1. And they tell host before leak.
So that is fine.

But looking at windows drivers:
https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/Balloon/sys/balloon.c
they do *not* ack this bit, and BalloonLeak calls TellHost at the last
line.

So it looks like a bug: we should teach driver to tell host first on leak?
Yan, Vadim, can you comment please?

Also if true, looks like this bit will be useful to detect a fixed driver on
the hypervisor side - to avoid unmapping such pages? Rusty what do you
think?

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07  5:42                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07  5:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
> > if set, the host _may not_ rely on the guest to send a gARP.  Similarly
> > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
> > ballooned pages directly.
> >
> > There are _no_ other negative features besides
> > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
> > reason---because they're broken.
> >
> > (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
> > but it is not so important because it depends on user input more than
> > hypervisor version).
> 
> Yes, this is the key observation, and an important lesson for the
> future.  Thanks!
> Note that these two negative features were in the original spec, where
> it's assumed that every device supports them.  That's not explicitly
> documented, however.

I'm curious what would we do for the future? I tried to imagine that _RO
was not in the original spec, so virtio-blk expects a r/w device.
Now we can not add _RW - old hypervisors do not set it, and old
drivers do not ack it.
What would a new flag with equivalent functionality be?

> I like killing the totally unused feature.
> 
> Cheers,
> Rusty.

I tried verifying that it is unused. And found this:
Linux drivers currently try to ack this feature if it is there,
so do BSD drivers. Both from v1. And they tell host before leak.
So that is fine.

But looking at windows drivers:
https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/Balloon/sys/balloon.c
they do *not* ack this bit, and BalloonLeak calls TellHost at the last
line.

So it looks like a bug: we should teach driver to tell host first on leak?
Yan, Vadim, can you comment please?

Also if true, looks like this bit will be useful to detect a fixed driver on
the hypervisor side - to avoid unmapping such pages? Rusty what do you
think?

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07  5:42                 ` Michael S. Tsirkin
@ 2012-09-07  6:39                   ` Rusty Russell
  -1 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-07  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
>> > if set, the host _may not_ rely on the guest to send a gARP.  Similarly
>> > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
>> > ballooned pages directly.
>> >
>> > There are _no_ other negative features besides
>> > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
>> > reason---because they're broken.
>> >
>> > (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
>> > but it is not so important because it depends on user input more than
>> > hypervisor version).
>> 
>> Yes, this is the key observation, and an important lesson for the
>> future.  Thanks!
>> Note that these two negative features were in the original spec, where
>> it's assumed that every device supports them.  That's not explicitly
>> documented, however.
>
> I'm curious what would we do for the future? I tried to imagine that _RO
> was not in the original spec, so virtio-blk expects a r/w device.
> Now we can not add _RW - old hypervisors do not set it, and old
> drivers do not ack it.
> What would a new flag with equivalent functionality be?

Backwards compatibility in the R/O case would actually work: just fail
writes.  Because it's just friendly advice to the OS, really.

The final test is always: does it break users?  If there are no users
who will notice, we can do anything.  If there are users, we have to
keep backwards compatibility, and that implies we can't add "must know"
features.

> So it looks like a bug: we should teach driver to tell host first on leak?
> Yan, Vadim, can you comment please?
>
> Also if true, looks like this bit will be useful to detect a fixed driver on
> the hypervisor side - to avoid unmapping such pages? Rusty what do you
> think?

So, feature is unimplemented in qemu, and broken in drivers.  I starting
to share Paolo's dislike of it.

Don't understand why we'd care about fixed drivers though, if we remove
the feature bit....

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07  6:39                   ` Rusty Russell
  0 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-07  6:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Fri, Sep 07, 2012 at 09:15:46AM +0930, Rusty Russell wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> > Instead, VIRTIO_NET_F_HOST_MUST_SEND_GARP would be a "negative" feature:
>> > if set, the host _may not_ rely on the guest to send a gARP.  Similarly
>> > if VIRTIO_BALLOON_F_MUST_TELL_HOST is set, the guest _may not_ use
>> > ballooned pages directly.
>> >
>> > There are _no_ other negative features besides
>> > VIRTIO_BALLOON_F_MUST_TELL_HOST in the spec, and for a good
>> > reason---because they're broken.
>> >
>> > (Hmm, actually we have one, VIRTIO_BLK_F_RO.  It is also a bit broken,
>> > but it is not so important because it depends on user input more than
>> > hypervisor version).
>> 
>> Yes, this is the key observation, and an important lesson for the
>> future.  Thanks!
>> Note that these two negative features were in the original spec, where
>> it's assumed that every device supports them.  That's not explicitly
>> documented, however.
>
> I'm curious what would we do for the future? I tried to imagine that _RO
> was not in the original spec, so virtio-blk expects a r/w device.
> Now we can not add _RW - old hypervisors do not set it, and old
> drivers do not ack it.
> What would a new flag with equivalent functionality be?

Backwards compatibility in the R/O case would actually work: just fail
writes.  Because it's just friendly advice to the OS, really.

The final test is always: does it break users?  If there are no users
who will notice, we can do anything.  If there are users, we have to
keep backwards compatibility, and that implies we can't add "must know"
features.

> So it looks like a bug: we should teach driver to tell host first on leak?
> Yan, Vadim, can you comment please?
>
> Also if true, looks like this bit will be useful to detect a fixed driver on
> the hypervisor side - to avoid unmapping such pages? Rusty what do you
> think?

So, feature is unimplemented in qemu, and broken in drivers.  I starting
to share Paolo's dislike of it.

Don't understand why we'd care about fixed drivers though, if we remove
the feature bit....

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07  6:39                   ` Rusty Russell
@ 2012-09-07  9:27                     ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07  9:27 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, fes, aarcange, riel, kvm, linux-kernel,
	mikew, yinghan, virtualization, yvugenfi, vrozenfe

Il 07/09/2012 08:39, Rusty Russell ha scritto:
>> > So it looks like a bug: we should teach driver to tell host first on leak?
>> > Yan, Vadim, can you comment please?
>> >
>> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> > think?
> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> to share Paolo's dislike of it.
> 
> Don't understand why we'd care about fixed drivers though, if we remove
> the feature bit....

Hmm, Michael has a point here.  Basically, the Windows driver is using
silent deflate, but not telling the host (yet) about it.  So, we must
assume that a driver that does not negotiate
VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate.

Here's a way to proceed.

We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally.
If not available, at worst the guest driver may refuse to start, or
revert to using the deflateq.

We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since
that's how it's being used.  Now for the device there are three cases:

- does not support silent deflate at all: it should always propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
  negotiate it, the device must assume that the guest will use silent
  deflate, and fail to start the guest if the device does not support
  silent deflate.

- optionally supports silent deflate: it should always propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
  negotiate it, the device must assume that the guest will use silent
  deflate

- always supports silent deflate: does not need to do anything,
  current behavior works fine.  But the driver might as well propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine.  (This
  is a hardware change, so it must be versioned, yadda yadda).

I can prepare a spec patch for this.


BTW, since we have in the archives an example of using silent deflate,
here is an example of non-silent deflate.  It may help understanding the
above with an actual example of a device.  Suppose a guest is using PCI
passthrough, so it has all memory pinned.

- If the guest will _not_ use silent deflate, we can unlock memory on
inflate and lock it back on deflate.  (The question is what to do if
locking fail; left for when someone actually implements this thing).

- If the guest will use silent deflate, we cannot do that.

So this is the second case above.  The device must propose
VIRTIO_BALLOON_F_WILL_TELL_HOST.  Then:

- if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE,
  we cannot do the munlock/mlock

- if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST,
  we can do the munlock/mlock

- if the guest does not negotiate either, the driver is buggy
  and we cannot do the munlock/mlock

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07  9:27                     ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07  9:27 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, Michael S. Tsirkin, yvugenfi,
	linux-kernel, mikew, yinghan, virtualization

Il 07/09/2012 08:39, Rusty Russell ha scritto:
>> > So it looks like a bug: we should teach driver to tell host first on leak?
>> > Yan, Vadim, can you comment please?
>> >
>> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> > think?
> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> to share Paolo's dislike of it.
> 
> Don't understand why we'd care about fixed drivers though, if we remove
> the feature bit....

Hmm, Michael has a point here.  Basically, the Windows driver is using
silent deflate, but not telling the host (yet) about it.  So, we must
assume that a driver that does not negotiate
VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate.

Here's a way to proceed.

We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally.
If not available, at worst the guest driver may refuse to start, or
revert to using the deflateq.

We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since
that's how it's being used.  Now for the device there are three cases:

- does not support silent deflate at all: it should always propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
  negotiate it, the device must assume that the guest will use silent
  deflate, and fail to start the guest if the device does not support
  silent deflate.

- optionally supports silent deflate: it should always propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
  negotiate it, the device must assume that the guest will use silent
  deflate

- always supports silent deflate: does not need to do anything,
  current behavior works fine.  But the driver might as well propose
  VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine.  (This
  is a hardware change, so it must be versioned, yadda yadda).

I can prepare a spec patch for this.


BTW, since we have in the archives an example of using silent deflate,
here is an example of non-silent deflate.  It may help understanding the
above with an actual example of a device.  Suppose a guest is using PCI
passthrough, so it has all memory pinned.

- If the guest will _not_ use silent deflate, we can unlock memory on
inflate and lock it back on deflate.  (The question is what to do if
locking fail; left for when someone actually implements this thing).

- If the guest will use silent deflate, we cannot do that.

So this is the second case above.  The device must propose
VIRTIO_BALLOON_F_WILL_TELL_HOST.  Then:

- if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE,
  we cannot do the munlock/mlock

- if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST,
  we can do the munlock/mlock

- if the guest does not negotiate either, the driver is buggy
  and we cannot do the munlock/mlock

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07  6:39                   ` Rusty Russell
@ 2012-09-07 10:43                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 10:43 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> > So it looks like a bug: we should teach driver to tell host first on leak?
> > Yan, Vadim, can you comment please?
> >
> > Also if true, looks like this bit will be useful to detect a fixed driver on
> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> > think?
> 
> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> to share Paolo's dislike of it.

What is broken in drivers?

To me it looks like it works exactly as advertized: linux and bsd ack
TELL_HOST and tells host first. windows does not do either.  Do windows
drivers currently are not broken: they just do not support the feature.

The claim that drivers and hyprevisors do not support it is false:
qemu does not support it but drivers do and in non trivial way.

> Don't understand why we'd care about fixed drivers though, if we remove
> the feature bit....
> 
> Cheers,
> Rusty.

Do we really know there are no hypervisors implementing it?
As I said above drivers do have support.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 10:43                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 10:43 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> > So it looks like a bug: we should teach driver to tell host first on leak?
> > Yan, Vadim, can you comment please?
> >
> > Also if true, looks like this bit will be useful to detect a fixed driver on
> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> > think?
> 
> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> to share Paolo's dislike of it.

What is broken in drivers?

To me it looks like it works exactly as advertized: linux and bsd ack
TELL_HOST and tells host first. windows does not do either.  Do windows
drivers currently are not broken: they just do not support the feature.

The claim that drivers and hyprevisors do not support it is false:
qemu does not support it but drivers do and in non trivial way.

> Don't understand why we'd care about fixed drivers though, if we remove
> the feature bit....
> 
> Cheers,
> Rusty.

Do we really know there are no hypervisors implementing it?
As I said above drivers do have support.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07  9:27                     ` Paolo Bonzini
@ 2012-09-07 10:53                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 10:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 11:27:42AM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 08:39, Rusty Russell ha scritto:
> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> > Yan, Vadim, can you comment please?
> >> >
> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> > think?
> > So, feature is unimplemented in qemu, and broken in drivers.  I starting
> > to share Paolo's dislike of it.
> > 
> > Don't understand why we'd care about fixed drivers though, if we remove
> > the feature bit....
> 
> Hmm, Michael has a point here.  Basically, the Windows driver is using
> silent deflate, but not telling the host (yet) about it.  So, we must
> assume that a driver that does not negotiate
> VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate.
> 
> Here's a way to proceed.
> 
> We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally.
> If not available, at worst the guest driver may refuse to start, or
> revert to using the deflateq.
> 
> We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since
> that's how it's being used.  Now for the device there are three cases:
> 
> - does not support silent deflate at all: it should always propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
>   negotiate it, the device must assume that the guest will use silent
>   deflate, and fail to start the guest if the device does not support
>   silent deflate.
> 
> - optionally supports silent deflate: it should always propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
>   negotiate it, the device must assume that the guest will use silent
>   deflate
> 
> - always supports silent deflate: does not need to do anything,
>   current behavior works fine.  But the driver might as well propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine.  (This
>   is a hardware change, so it must be versioned, yadda yadda).
> 
> I can prepare a spec patch for this.
> 
> 
> BTW, since we have in the archives an example of using silent deflate,
> here is an example of non-silent deflate.  It may help understanding the
> above with an actual example of a device.  Suppose a guest is using PCI
> passthrough, so it has all memory pinned.
> 
> - If the guest will _not_ use silent deflate, we can unlock memory on
> inflate and lock it back on deflate.  (The question is what to do if
> locking fail; left for when someone actually implements this thing).
> 
> - If the guest will use silent deflate, we cannot do that.
> 
> So this is the second case above.  The device must propose
> VIRTIO_BALLOON_F_WILL_TELL_HOST.  Then:
> 
> - if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE,
>   we cannot do the munlock/mlock
> 
> - if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST,
>   we can do the munlock/mlock
> 
> - if the guest does not negotiate either, the driver is buggy
>   and we cannot do the munlock/mlock
> 
> Paolo


Let us start with what is broken currently. Looking at
it very closely, I think the answer is nothing.
Even migration in qemu is not broken as you claimed initially.

Next, consider the interface proposed here. You defacto declare
all existing drivers buggy. This is a wrong thing to do.
You also use two feature bits for a single simple thing,
this is inelegant.

Last, let us consider how existing feature can be used
in the hypervisor. If driver did not ack
MUST_TELL_HOST, it is *not* buggy but it means we can not
do munlock. This applies to current windows drivers.
If driver *did* ack MUST_TELL_HOST, we can munlock
and mlock back on leak.
Seems useful, driver support is already there,
so removing the MUST_TELL_HOST bit seems like a bad idea.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 10:53                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 10:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Fri, Sep 07, 2012 at 11:27:42AM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 08:39, Rusty Russell ha scritto:
> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> > Yan, Vadim, can you comment please?
> >> >
> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> > think?
> > So, feature is unimplemented in qemu, and broken in drivers.  I starting
> > to share Paolo's dislike of it.
> > 
> > Don't understand why we'd care about fixed drivers though, if we remove
> > the feature bit....
> 
> Hmm, Michael has a point here.  Basically, the Windows driver is using
> silent deflate, but not telling the host (yet) about it.  So, we must
> assume that a driver that does not negotiate
> VIRTIO_BALLOON_F_MUST_TELL_HOST _will_ use silent deflate.
> 
> Here's a way to proceed.
> 
> We add VIRTIO_BALLOON_F_SILENT_DEFLATE, which is negotiated normally.
> If not available, at worst the guest driver may refuse to start, or
> revert to using the deflateq.
> 
> We rename VIRTIO_BALLOON_F_MUST_TELL_HOST to WILL_TELL_HOST, since
> that's how it's being used.  Now for the device there are three cases:
> 
> - does not support silent deflate at all: it should always propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
>   negotiate it, the device must assume that the guest will use silent
>   deflate, and fail to start the guest if the device does not support
>   silent deflate.
> 
> - optionally supports silent deflate: it should always propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST; if the (bad) driver does not
>   negotiate it, the device must assume that the guest will use silent
>   deflate
> 
> - always supports silent deflate: does not need to do anything,
>   current behavior works fine.  But the driver might as well propose
>   VIRTIO_BALLOON_F_WILL_TELL_HOST, so that migration works fine.  (This
>   is a hardware change, so it must be versioned, yadda yadda).
> 
> I can prepare a spec patch for this.
> 
> 
> BTW, since we have in the archives an example of using silent deflate,
> here is an example of non-silent deflate.  It may help understanding the
> above with an actual example of a device.  Suppose a guest is using PCI
> passthrough, so it has all memory pinned.
> 
> - If the guest will _not_ use silent deflate, we can unlock memory on
> inflate and lock it back on deflate.  (The question is what to do if
> locking fail; left for when someone actually implements this thing).
> 
> - If the guest will use silent deflate, we cannot do that.
> 
> So this is the second case above.  The device must propose
> VIRTIO_BALLOON_F_WILL_TELL_HOST.  Then:
> 
> - if the guest negotiates VIRTIO_BALLOON_F_SILENT_DEFLATE,
>   we cannot do the munlock/mlock
> 
> - if the guest negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST,
>   we can do the munlock/mlock
> 
> - if the guest does not negotiate either, the driver is buggy
>   and we cannot do the munlock/mlock
> 
> Paolo


Let us start with what is broken currently. Looking at
it very closely, I think the answer is nothing.
Even migration in qemu is not broken as you claimed initially.

Next, consider the interface proposed here. You defacto declare
all existing drivers buggy. This is a wrong thing to do.
You also use two feature bits for a single simple thing,
this is inelegant.

Last, let us consider how existing feature can be used
in the hypervisor. If driver did not ack
MUST_TELL_HOST, it is *not* buggy but it means we can not
do munlock. This applies to current windows drivers.
If driver *did* ack MUST_TELL_HOST, we can munlock
and mlock back on leak.
Seems useful, driver support is already there,
so removing the MUST_TELL_HOST bit seems like a bad idea.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 10:53                       ` Michael S. Tsirkin
@ 2012-09-07 11:20                         ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 11:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto:
> Let us start with what is broken currently. Looking at
> it very closely, I think the answer is nothing.
> Even migration in qemu is not broken as you claimed initially.

Correct, migration would be broken as soon as QEMU starts using
MUST_TELL_HOST.  I'm trying to think ahead, since we have many ideas
floating around on the implementation of ballooning.

If you implement the mlock/munlock trick, you must start using
MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks.

> Next, consider the interface proposed here. You defacto declare
> all existing drivers buggy.

No, only Windows (and it is buggy, it calls tell_host last).  Linux and
BSD drivers do negotiate MUST_TELL_HOST, and are not buggy.

> This is a wrong thing to do.
> You also use two feature bits for a single simple thing,
> this is inelegant.

True, but the choice is:

1) add a once-only hack to QEMU that fixes migration of
VIRTIO_BALLOON_F_MUST_TELL_HOST;

2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
guests cannot use anymore silent deflate, which is a regression.

3) use two bits.  One tells the device that the driver supports chatty
deflate; one tells the driver that the device supports silent deflate.

So in the end you do use two feature bits for two different things.
Plus, both feature bits are "positive" and I'm happy.

> Last, let us consider how existing feature can be used
> in the hypervisor. If driver did not ack
> MUST_TELL_HOST, it is *not* buggy but it means we can not
> do munlock. This applies to current windows drivers.
> If driver *did* ack MUST_TELL_HOST, we can munlock
> and mlock back on leak.
> Seems useful, driver support is already there,
> so removing the MUST_TELL_HOST bit seems like a bad idea.

Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than
killing it.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 11:20                         ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 11:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto:
> Let us start with what is broken currently. Looking at
> it very closely, I think the answer is nothing.
> Even migration in qemu is not broken as you claimed initially.

Correct, migration would be broken as soon as QEMU starts using
MUST_TELL_HOST.  I'm trying to think ahead, since we have many ideas
floating around on the implementation of ballooning.

If you implement the mlock/munlock trick, you must start using
MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks.

> Next, consider the interface proposed here. You defacto declare
> all existing drivers buggy.

No, only Windows (and it is buggy, it calls tell_host last).  Linux and
BSD drivers do negotiate MUST_TELL_HOST, and are not buggy.

> This is a wrong thing to do.
> You also use two feature bits for a single simple thing,
> this is inelegant.

True, but the choice is:

1) add a once-only hack to QEMU that fixes migration of
VIRTIO_BALLOON_F_MUST_TELL_HOST;

2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
guests cannot use anymore silent deflate, which is a regression.

3) use two bits.  One tells the device that the driver supports chatty
deflate; one tells the driver that the device supports silent deflate.

So in the end you do use two feature bits for two different things.
Plus, both feature bits are "positive" and I'm happy.

> Last, let us consider how existing feature can be used
> in the hypervisor. If driver did not ack
> MUST_TELL_HOST, it is *not* buggy but it means we can not
> do munlock. This applies to current windows drivers.
> If driver *did* ack MUST_TELL_HOST, we can munlock
> and mlock back on leak.
> Seems useful, driver support is already there,
> so removing the MUST_TELL_HOST bit seems like a bad idea.

Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than
killing it.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 11:20                         ` Paolo Bonzini
@ 2012-09-07 12:17                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 12:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 01:20:57PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto:
> > Let us start with what is broken currently. Looking at
> > it very closely, I think the answer is nothing.
> > Even migration in qemu is not broken as you claimed initially.
> 
> Correct, migration would be broken as soon as QEMU starts using
> MUST_TELL_HOST.  I'm trying to think ahead, since we have many ideas
> floating around on the implementation of ballooning.
> 
> If you implement the mlock/munlock trick, you must start using
> MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks.

Migration does not break.

Since I wrote this code in qemu let me explain what is going on.

qemu requires that local and remote side are started with
same feature bits.
To support cross version migration, code in hw/pc_piix.c
disables features if you require migration from/to old qemu.

At some point I added a sanity check:
if we get guest features we know that any bit
set there must be set in host features.
Yes, this catches some user mistakes.

This was never intended as a compatibility guarantee.
User is still required to start qemu such
that host features match exactly, anything else
can lead to failures some of them hard to debug.

Here is a simple example:

1. guest reads host features
2. guest is migrated - check passes since no features are acked
3. guest acks features -> failure

This applies to any feature. Nothing special with this one.

Yes, we can if we want to make this more robust
against user errors, e.g. by migrating host feature
bits. Patches welcome. If we do it will help all
features, not just this one.


> > Next, consider the interface proposed here. You defacto declare
> > all existing drivers buggy.
> 
> No, only Windows (and it is buggy, it calls tell_host last).

It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
host at any point, it behaves exactly
to spec. You can not retroactively declare drivers buggy like that.

>  Linux and
> BSD drivers do negotiate MUST_TELL_HOST, and are not buggy.
> 
> > This is a wrong thing to do.
> > You also use two feature bits for a single simple thing,
> > this is inelegant.
> 
> True, but the choice is:
> 
> 1) add a once-only hack to QEMU that fixes migration of
> VIRTIO_BALLOON_F_MUST_TELL_HOST;
> 
> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
> guests cannot use anymore silent deflate, which is a regression.
> 
> 3) use two bits.  One tells the device that the driver supports chatty
> deflate; one tells the driver that the device supports silent deflate.

The right thing to do is either
4. realize we can not address all user errors, so no real issue
5. address this class of user errors by migrating host features

> So in the end you do use two feature bits for two different things.
> Plus, both feature bits are "positive" and I'm happy.

I am not happy.
We lose compatibility with all existing drivers
so it will take years until the feature is actually
useful.

> > Last, let us consider how existing feature can be used
> > in the hypervisor. If driver did not ack
> > MUST_TELL_HOST, it is *not* buggy but it means we can not
> > do munlock. This applies to current windows drivers.
> > If driver *did* ack MUST_TELL_HOST, we can munlock
> > and mlock back on leak.
> > Seems useful, driver support is already there,
> > so removing the MUST_TELL_HOST bit seems like a bad idea.
> 
> Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than
> killing it.
> 
> Paolo

Is this just a matter of naming? Same functionality:
driver that acks this bit will tell host first,
driver that does not will not?

If yes that is fine.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 12:17                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 12:17 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Fri, Sep 07, 2012 at 01:20:57PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 12:53, Michael S. Tsirkin ha scritto:
> > Let us start with what is broken currently. Looking at
> > it very closely, I think the answer is nothing.
> > Even migration in qemu is not broken as you claimed initially.
> 
> Correct, migration would be broken as soon as QEMU starts using
> MUST_TELL_HOST.  I'm trying to think ahead, since we have many ideas
> floating around on the implementation of ballooning.
> 
> If you implement the mlock/munlock trick, you must start using
> MUST_TELL_HOST in QEMU to advertise it to guests, and migration breaks.

Migration does not break.

Since I wrote this code in qemu let me explain what is going on.

qemu requires that local and remote side are started with
same feature bits.
To support cross version migration, code in hw/pc_piix.c
disables features if you require migration from/to old qemu.

At some point I added a sanity check:
if we get guest features we know that any bit
set there must be set in host features.
Yes, this catches some user mistakes.

This was never intended as a compatibility guarantee.
User is still required to start qemu such
that host features match exactly, anything else
can lead to failures some of them hard to debug.

Here is a simple example:

1. guest reads host features
2. guest is migrated - check passes since no features are acked
3. guest acks features -> failure

This applies to any feature. Nothing special with this one.

Yes, we can if we want to make this more robust
against user errors, e.g. by migrating host feature
bits. Patches welcome. If we do it will help all
features, not just this one.


> > Next, consider the interface proposed here. You defacto declare
> > all existing drivers buggy.
> 
> No, only Windows (and it is buggy, it calls tell_host last).

It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
host at any point, it behaves exactly
to spec. You can not retroactively declare drivers buggy like that.

>  Linux and
> BSD drivers do negotiate MUST_TELL_HOST, and are not buggy.
> 
> > This is a wrong thing to do.
> > You also use two feature bits for a single simple thing,
> > this is inelegant.
> 
> True, but the choice is:
> 
> 1) add a once-only hack to QEMU that fixes migration of
> VIRTIO_BALLOON_F_MUST_TELL_HOST;
> 
> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
> guests cannot use anymore silent deflate, which is a regression.
> 
> 3) use two bits.  One tells the device that the driver supports chatty
> deflate; one tells the driver that the device supports silent deflate.

The right thing to do is either
4. realize we can not address all user errors, so no real issue
5. address this class of user errors by migrating host features

> So in the end you do use two feature bits for two different things.
> Plus, both feature bits are "positive" and I'm happy.

I am not happy.
We lose compatibility with all existing drivers
so it will take years until the feature is actually
useful.

> > Last, let us consider how existing feature can be used
> > in the hypervisor. If driver did not ack
> > MUST_TELL_HOST, it is *not* buggy but it means we can not
> > do munlock. This applies to current windows drivers.
> > If driver *did* ack MUST_TELL_HOST, we can munlock
> > and mlock back on leak.
> > Seems useful, driver support is already there,
> > so removing the MUST_TELL_HOST bit seems like a bad idea.
> 
> Indeed, repurposing MUST_TELL_HOST to WILL_TELL_HOST is better than
> killing it.
> 
> Paolo

Is this just a matter of naming? Same functionality:
driver that acks this bit will tell host first,
driver that does not will not?

If yes that is fine.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 12:17                           ` Michael S. Tsirkin
@ 2012-09-07 12:22                             ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 12:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto:
>>> Next, consider the interface proposed here. You defacto declare
>>> all existing drivers buggy.
>>
>> No, only Windows (and it is buggy, it calls tell_host last).
> 
> It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
> host at any point, it behaves exactly
> to spec. You can not retroactively declare drivers buggy like that.

Well, according to your reading of the spec.

In the spec I'm reading "Host must be told before pages from the balloon
are used".  Doesn't say anything about the guest.

Now, indeed a very free interpretation is "Guest will tell host before
pages from the balloon are used".  It turns out that it's exactly what
guests have been doing, hence that's exactly what I'm proposing now:
rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.

>> True, but the choice is:
>>
>> 1) add a once-only hack to QEMU that fixes migration of
>> VIRTIO_BALLOON_F_MUST_TELL_HOST;
>>
>> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
>> guests cannot use anymore silent deflate, which is a regression.
>>
>> 3) use two bits.  One tells the device that the driver supports chatty
>> deflate; one tells the driver that the device supports silent deflate.
> 
> The right thing to do is either
> 4. realize we can not address all user errors, so no real issue
> 5. address this class of user errors by migrating host features
> 
>> So in the end you do use two feature bits for two different things.
>> Plus, both feature bits are "positive" and I'm happy.
> 
> I am not happy.
> We lose compatibility with all existing drivers

How so?

> so it will take years until the feature is actually
> useful.

No, we don't!  Windows guests will just not be able to use munlock/mlock
until the driver is fixed.  Which will probably happen before someone
writes the munlock/mlock code.

> Is this just a matter of naming? Same functionality:
> driver that acks this bit will tell host first,
> driver that does not will not?
> 
> If yes that is fine.

Yes, that part we agree on I think.  We disagree that (after the rename)
QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.

_Plus_ adding the new feature bit, which is needed to actually tell the
driver that the host supports the silent deflate.  Spec patch on the way.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 12:22                             ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 12:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto:
>>> Next, consider the interface proposed here. You defacto declare
>>> all existing drivers buggy.
>>
>> No, only Windows (and it is buggy, it calls tell_host last).
> 
> It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
> host at any point, it behaves exactly
> to spec. You can not retroactively declare drivers buggy like that.

Well, according to your reading of the spec.

In the spec I'm reading "Host must be told before pages from the balloon
are used".  Doesn't say anything about the guest.

Now, indeed a very free interpretation is "Guest will tell host before
pages from the balloon are used".  It turns out that it's exactly what
guests have been doing, hence that's exactly what I'm proposing now:
rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.

>> True, but the choice is:
>>
>> 1) add a once-only hack to QEMU that fixes migration of
>> VIRTIO_BALLOON_F_MUST_TELL_HOST;
>>
>> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
>> guests cannot use anymore silent deflate, which is a regression.
>>
>> 3) use two bits.  One tells the device that the driver supports chatty
>> deflate; one tells the driver that the device supports silent deflate.
> 
> The right thing to do is either
> 4. realize we can not address all user errors, so no real issue
> 5. address this class of user errors by migrating host features
> 
>> So in the end you do use two feature bits for two different things.
>> Plus, both feature bits are "positive" and I'm happy.
> 
> I am not happy.
> We lose compatibility with all existing drivers

How so?

> so it will take years until the feature is actually
> useful.

No, we don't!  Windows guests will just not be able to use munlock/mlock
until the driver is fixed.  Which will probably happen before someone
writes the munlock/mlock code.

> Is this just a matter of naming? Same functionality:
> driver that acks this bit will tell host first,
> driver that does not will not?
> 
> If yes that is fine.

Yes, that part we agree on I think.  We disagree that (after the rename)
QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.

_Plus_ adding the new feature bit, which is needed to actually tell the
driver that the host supports the silent deflate.  Spec patch on the way.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 12:22                             ` Paolo Bonzini
@ 2012-09-07 12:44                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 02:22:47PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto:
> >>> Next, consider the interface proposed here. You defacto declare
> >>> all existing drivers buggy.
> >>
> >> No, only Windows (and it is buggy, it calls tell_host last).
> > 
> > It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
> > host at any point, it behaves exactly
> > to spec. You can not retroactively declare drivers buggy like that.
> 
> Well, according to your reading of the spec.
> 
> In the spec I'm reading "Host must be told before pages from the balloon
> are used".  Doesn't say anything about the guest.

No? How is host told then? By divine force?

> Now, indeed a very free interpretation is "Guest will tell host before
> pages from the balloon are used".  It turns out that it's exactly what
> guests have been doing, hence that's exactly what I'm proposing now:
> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.

Rename is fine.

> >> True, but the choice is:
> >>
> >> 1) add a once-only hack to QEMU that fixes migration of
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST;
> >>
> >> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
> >> guests cannot use anymore silent deflate, which is a regression.
> >>
> >> 3) use two bits.  One tells the device that the driver supports chatty
> >> deflate; one tells the driver that the device supports silent deflate.
> > 
> > The right thing to do is either
> > 4. realize we can not address all user errors, so no real issue
> > 5. address this class of user errors by migrating host features
> > 
> >> So in the end you do use two feature bits for two different things.
> >> Plus, both feature bits are "positive" and I'm happy.
> > 
> > I am not happy.
> > We lose compatibility with all existing drivers
> 
> How so?
> 
> > so it will take years until the feature is actually
> > useful.
> 
> No, we don't!  Windows guests will just not be able to use munlock/mlock
> until the driver is fixed.  Which will probably happen before someone
> writes the munlock/mlock code.

If the only change is rename then ofcourse things keep working.
I don't care about the name it is up to Rusty.

> > Is this just a matter of naming? Same functionality:
> > driver that acks this bit will tell host first,
> > driver that does not will not?
> > 
> > If yes that is fine.
> 
> Yes, that part we agree on I think.  We disagree that (after the rename)
> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.

Not always. It must be off if compatibility with old qemu is disabled.

> _Plus_ adding the new feature bit, which is needed to actually tell the

This part I do not get.  What is silent deflate, why is it useful
and what it has to do with what we are discussing here?

> driver that the host supports the silent deflate.
>  Spec patch on the way.
> 
> Paolo

Hang on.
Can we please talk about motivation? These patches which come
without motivation are very hard to review.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 12:44                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Fri, Sep 07, 2012 at 02:22:47PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 14:17, Michael S. Tsirkin ha scritto:
> >>> Next, consider the interface proposed here. You defacto declare
> >>> all existing drivers buggy.
> >>
> >> No, only Windows (and it is buggy, it calls tell_host last).
> > 
> > It is not buggy. It does not ack MUST_TELL_HOST. So it is free to tell
> > host at any point, it behaves exactly
> > to spec. You can not retroactively declare drivers buggy like that.
> 
> Well, according to your reading of the spec.
> 
> In the spec I'm reading "Host must be told before pages from the balloon
> are used".  Doesn't say anything about the guest.

No? How is host told then? By divine force?

> Now, indeed a very free interpretation is "Guest will tell host before
> pages from the balloon are used".  It turns out that it's exactly what
> guests have been doing, hence that's exactly what I'm proposing now:
> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.

Rename is fine.

> >> True, but the choice is:
> >>
> >> 1) add a once-only hack to QEMU that fixes migration of
> >> VIRTIO_BALLOON_F_MUST_TELL_HOST;
> >>
> >> 2) always advertise VIRTIO_BALLOON_F_MUST_TELL_HOST.  If you do this,
> >> guests cannot use anymore silent deflate, which is a regression.
> >>
> >> 3) use two bits.  One tells the device that the driver supports chatty
> >> deflate; one tells the driver that the device supports silent deflate.
> > 
> > The right thing to do is either
> > 4. realize we can not address all user errors, so no real issue
> > 5. address this class of user errors by migrating host features
> > 
> >> So in the end you do use two feature bits for two different things.
> >> Plus, both feature bits are "positive" and I'm happy.
> > 
> > I am not happy.
> > We lose compatibility with all existing drivers
> 
> How so?
> 
> > so it will take years until the feature is actually
> > useful.
> 
> No, we don't!  Windows guests will just not be able to use munlock/mlock
> until the driver is fixed.  Which will probably happen before someone
> writes the munlock/mlock code.

If the only change is rename then ofcourse things keep working.
I don't care about the name it is up to Rusty.

> > Is this just a matter of naming? Same functionality:
> > driver that acks this bit will tell host first,
> > driver that does not will not?
> > 
> > If yes that is fine.
> 
> Yes, that part we agree on I think.  We disagree that (after the rename)
> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.

Not always. It must be off if compatibility with old qemu is disabled.

> _Plus_ adding the new feature bit, which is needed to actually tell the

This part I do not get.  What is silent deflate, why is it useful
and what it has to do with what we are discussing here?

> driver that the host supports the silent deflate.
>  Spec patch on the way.
> 
> Paolo

Hang on.
Can we please talk about motivation? These patches which come
without motivation are very hard to review.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 12:44                               ` Michael S. Tsirkin
@ 2012-09-07 14:04                                 ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 14:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto:
>> Well, according to your reading of the spec.
>>
>> In the spec I'm reading "Host must be told before pages from the balloon
>> are used".  Doesn't say anything about the guest.
> 
> No? How is host told then? By divine force?

For a simple madvise-based implementation such as the one in QEMU, the
host doesn't need to be told about anything (except periodic updating of
the "actual" field, or the statsq).

The guest doesn't need at all to use the deflateq in fact.

>> Now, indeed a very free interpretation is "Guest will tell host before
>> pages from the balloon are used".  It turns out that it's exactly what
>> guests have been doing, hence that's exactly what I'm proposing now:
>> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> Rename is fine.

Cool.

>> Yes, that part we agree on I think.  We disagree that (after the rename)
>> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> Not always. It must be off if compatibility with old qemu is disabled.

Yes, of course.

>> _Plus_ adding the new feature bit, which is needed to actually tell the
> 
> This part I do not get.  What is silent deflate, why is it useful
> and what it has to do with what we are discussing here?

Silent deflate is deflating just by using the page, without using the
deflateq at all.  So it can be done even from GFP_ATOMIC context.

But knowing that the guest will _not_ deflate silently also benefits the
host. This is the cause of unpinning ballooned pages and pinning them
again upon deflation. This allows cooperative memory overcommit even if
the guests' memory is pinned, similar to what Xen did before it
supported overcommit.  This is the second feature bit.

There are three cases:

* guest will never do silent deflation: current Linux driver.  Host may
do munlock/mlock dance.  Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST
only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST.

* guest will always do silent deflation: current Windows driver.
Negotiates nothing, or if it cares it can negotiate
VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.

* guest may do silent deflation if available: combo of Linux driver +
Frank's driver.  Negotiates both features, looks at
VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:

  ** If PCI passthrough, the host will not negotiate
     VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
     current Linux driver, the host can do the munlock/mlock dance.

  ** If no PCI passthrough, the host will negotiate
     VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
     aggressively and not use the deflateq.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 14:04                                 ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 14:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto:
>> Well, according to your reading of the spec.
>>
>> In the spec I'm reading "Host must be told before pages from the balloon
>> are used".  Doesn't say anything about the guest.
> 
> No? How is host told then? By divine force?

For a simple madvise-based implementation such as the one in QEMU, the
host doesn't need to be told about anything (except periodic updating of
the "actual" field, or the statsq).

The guest doesn't need at all to use the deflateq in fact.

>> Now, indeed a very free interpretation is "Guest will tell host before
>> pages from the balloon are used".  It turns out that it's exactly what
>> guests have been doing, hence that's exactly what I'm proposing now:
>> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> Rename is fine.

Cool.

>> Yes, that part we agree on I think.  We disagree that (after the rename)
>> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> Not always. It must be off if compatibility with old qemu is disabled.

Yes, of course.

>> _Plus_ adding the new feature bit, which is needed to actually tell the
> 
> This part I do not get.  What is silent deflate, why is it useful
> and what it has to do with what we are discussing here?

Silent deflate is deflating just by using the page, without using the
deflateq at all.  So it can be done even from GFP_ATOMIC context.

But knowing that the guest will _not_ deflate silently also benefits the
host. This is the cause of unpinning ballooned pages and pinning them
again upon deflation. This allows cooperative memory overcommit even if
the guests' memory is pinned, similar to what Xen did before it
supported overcommit.  This is the second feature bit.

There are three cases:

* guest will never do silent deflation: current Linux driver.  Host may
do munlock/mlock dance.  Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST
only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST.

* guest will always do silent deflation: current Windows driver.
Negotiates nothing, or if it cares it can negotiate
VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.

* guest may do silent deflation if available: combo of Linux driver +
Frank's driver.  Negotiates both features, looks at
VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:

  ** If PCI passthrough, the host will not negotiate
     VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
     current Linux driver, the host can do the munlock/mlock dance.

  ** If no PCI passthrough, the host will negotiate
     VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
     aggressively and not use the deflateq.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 14:04                                 ` Paolo Bonzini
@ 2012-09-07 14:25                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 14:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 04:04:13PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto:
> >> Well, according to your reading of the spec.
> >>
> >> In the spec I'm reading "Host must be told before pages from the balloon
> >> are used".  Doesn't say anything about the guest.
> > 
> > No? How is host told then? By divine force?
> 
> For a simple madvise-based implementation such as the one in QEMU, the
> host doesn't need to be told about anything (except periodic updating of
> the "actual" field, or the statsq).
> 
> The guest doesn't need at all to use the deflateq in fact.
> 
> >> Now, indeed a very free interpretation is "Guest will tell host before
> >> pages from the balloon are used".  It turns out that it's exactly what
> >> guests have been doing, hence that's exactly what I'm proposing now:
> >> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.
> > 
> > Rename is fine.
> 
> Cool.
> 
> >> Yes, that part we agree on I think.  We disagree that (after the rename)
> >> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.
> > 
> > Not always. It must be off if compatibility with old qemu is disabled.
> 
> Yes, of course.
> 
> >> _Plus_ adding the new feature bit, which is needed to actually tell the
> > 
> > This part I do not get.  What is silent deflate, why is it useful
> > and what it has to do with what we are discussing here?
> 
> Silent deflate is deflating just by using the page, without using the
> deflateq at all.  So it can be done even from GFP_ATOMIC context.

BTW I don't see a need to avoid deflateq.
Without MUST_TELL_HOST you can just deflate and then
bounce telling the host out to a thread.

> But knowing that the guest will _not_ deflate silently also benefits the
> host. This is the cause of unpinning ballooned pages and pinning them
> again upon deflation. This allows cooperative memory overcommit even if
> the guests' memory is pinned, similar to what Xen did before it
> supported overcommit.  This is the second feature bit.

This is the MUST_TELL_HOST.

> There are three cases:
> 
> * guest will never do silent deflation: current Linux driver.  Host may
> do munlock/mlock dance.  Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST
> only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> * guest will always do silent deflation: current Windows driver.

Not exactly. It is not silent. It tells host
just after use.

> Negotiates nothing, or if it cares it can negotiate
> VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
>
> * guest may do silent deflation if available: combo of Linux driver +
> Frank's driver.  Negotiates both features, looks at
> VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
> 
>   ** If PCI passthrough, the host will not negotiate
>      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
>      current Linux driver, the host can do the munlock/mlock dance.

So this case is just existing interface. OK.

>   ** If no PCI passthrough, the host will negotiate
>      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
>      aggressively and not use the deflateq.
> 
> Paolo

This last trickery confuses me.  It just does not make sense to set both
SILENT and TELL: either you are silent or you tell.

What is the benefit of avoiding host notification?
It seems cleaner for the host to know the state.



-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 14:25                                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07 14:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Fri, Sep 07, 2012 at 04:04:13PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 14:44, Michael S. Tsirkin ha scritto:
> >> Well, according to your reading of the spec.
> >>
> >> In the spec I'm reading "Host must be told before pages from the balloon
> >> are used".  Doesn't say anything about the guest.
> > 
> > No? How is host told then? By divine force?
> 
> For a simple madvise-based implementation such as the one in QEMU, the
> host doesn't need to be told about anything (except periodic updating of
> the "actual" field, or the statsq).
> 
> The guest doesn't need at all to use the deflateq in fact.
> 
> >> Now, indeed a very free interpretation is "Guest will tell host before
> >> pages from the balloon are used".  It turns out that it's exactly what
> >> guests have been doing, hence that's exactly what I'm proposing now:
> >> rename the feature to VIRTIO_BALLOON_F_WILL_TELL_HOST.
> > 
> > Rename is fine.
> 
> Cool.
> 
> >> Yes, that part we agree on I think.  We disagree that (after the rename)
> >> QEMU should start always proposing VIRTIO_BALLOON_F_WILL_TELL_HOST.
> > 
> > Not always. It must be off if compatibility with old qemu is disabled.
> 
> Yes, of course.
> 
> >> _Plus_ adding the new feature bit, which is needed to actually tell the
> > 
> > This part I do not get.  What is silent deflate, why is it useful
> > and what it has to do with what we are discussing here?
> 
> Silent deflate is deflating just by using the page, without using the
> deflateq at all.  So it can be done even from GFP_ATOMIC context.

BTW I don't see a need to avoid deflateq.
Without MUST_TELL_HOST you can just deflate and then
bounce telling the host out to a thread.

> But knowing that the guest will _not_ deflate silently also benefits the
> host. This is the cause of unpinning ballooned pages and pinning them
> again upon deflation. This allows cooperative memory overcommit even if
> the guests' memory is pinned, similar to what Xen did before it
> supported overcommit.  This is the second feature bit.

This is the MUST_TELL_HOST.

> There are three cases:
> 
> * guest will never do silent deflation: current Linux driver.  Host may
> do munlock/mlock dance.  Negotiates VIRTIO_BALLOON_F_WILL_TELL_HOST
> only, features = VIRTIO_BALLOON_F_WILL_TELL_HOST.
> 
> * guest will always do silent deflation: current Windows driver.

Not exactly. It is not silent. It tells host
just after use.

> Negotiates nothing, or if it cares it can negotiate
> VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
>
> * guest may do silent deflation if available: combo of Linux driver +
> Frank's driver.  Negotiates both features, looks at
> VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
> 
>   ** If PCI passthrough, the host will not negotiate
>      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
>      current Linux driver, the host can do the munlock/mlock dance.

So this case is just existing interface. OK.

>   ** If no PCI passthrough, the host will negotiate
>      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
>      aggressively and not use the deflateq.
> 
> Paolo

This last trickery confuses me.  It just does not make sense to set both
SILENT and TELL: either you are silent or you tell.

What is the benefit of avoiding host notification?
It seems cleaner for the host to know the state.



-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 14:25                                   ` Michael S. Tsirkin
@ 2012-09-07 14:44                                     ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto:
>> > Silent deflate is deflating just by using the page, without using the
>> > deflateq at all.  So it can be done even from GFP_ATOMIC context.
> BTW I don't see a need to avoid deflateq.
> Without MUST_TELL_HOST you can just deflate and then
> bounce telling the host out to a thread.

Yes, that's fine too.

>> > But knowing that the guest will _not_ deflate silently also benefits the
>> > host. This is the cause of unpinning ballooned pages and pinning them
>> > again upon deflation. This allows cooperative memory overcommit even if
>> > the guests' memory is pinned, similar to what Xen did before it
>> > supported overcommit.  This is the second feature bit.
> This is the MUST_TELL_HOST.

Almost.  One is "the guest, if really needed, can tell the host of
pages".  If not negotiated, and the host does not support it, the host
must break the guest (e.g. fail to offer any virtqueues).

The other is "the guest, though, would prefer not to do so".  It is
different because the guest can proceed in a fallback mode even if the
host doesn't offer it.

You're interpreting features as something that dictates behavior, but
they're really just advisory.  You could negotiate VIRTIO_BLK_F_TOPOLOGY
and end up never reading the fields; you could negotiate
VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.

>> > * guest will always do silent deflation: current Windows driver.
> Not exactly. It is not silent. It tells host
> just after use.

Yeah, but no difference from the POV of the driver.

Delaying or avoiding is the same in the end.  The spec says it well: "In
this case, deflation advice is merely a courtesy".

>> > Negotiates nothing, or if it cares it can negotiate
>> > VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
>> >
>> > * guest may do silent deflation if available: combo of Linux driver +
>> > Frank's driver.  Negotiates both features, looks at
>> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
>> > 
>> >   ** If PCI passthrough, the host will not negotiate
>> >      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
>> >      current Linux driver, the host can do the munlock/mlock dance.
> So this case is just existing interface. OK.
> 
>> >   ** If no PCI passthrough, the host will negotiate
>> >      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
>> >      aggressively and not use the deflateq.
>> > 
> This last trickery confuses me.  It just does not make sense to set both
> SILENT and TELL: either you are silent or you tell.

"I can be chatty if you ask me, but I'd rather be silent if you let me".

TELL is a request of the host to the guest; the guest can agree or not.

SILENT is a request of the guest to the host; the host can agree or not.

> What is the benefit of avoiding host notification?
> It seems cleaner for the host to know the state.

Yeah, if you want to do it you can.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-07 14:44                                     ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-07 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto:
>> > Silent deflate is deflating just by using the page, without using the
>> > deflateq at all.  So it can be done even from GFP_ATOMIC context.
> BTW I don't see a need to avoid deflateq.
> Without MUST_TELL_HOST you can just deflate and then
> bounce telling the host out to a thread.

Yes, that's fine too.

>> > But knowing that the guest will _not_ deflate silently also benefits the
>> > host. This is the cause of unpinning ballooned pages and pinning them
>> > again upon deflation. This allows cooperative memory overcommit even if
>> > the guests' memory is pinned, similar to what Xen did before it
>> > supported overcommit.  This is the second feature bit.
> This is the MUST_TELL_HOST.

Almost.  One is "the guest, if really needed, can tell the host of
pages".  If not negotiated, and the host does not support it, the host
must break the guest (e.g. fail to offer any virtqueues).

The other is "the guest, though, would prefer not to do so".  It is
different because the guest can proceed in a fallback mode even if the
host doesn't offer it.

You're interpreting features as something that dictates behavior, but
they're really just advisory.  You could negotiate VIRTIO_BLK_F_TOPOLOGY
and end up never reading the fields; you could negotiate
VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.

>> > * guest will always do silent deflation: current Windows driver.
> Not exactly. It is not silent. It tells host
> just after use.

Yeah, but no difference from the POV of the driver.

Delaying or avoiding is the same in the end.  The spec says it well: "In
this case, deflation advice is merely a courtesy".

>> > Negotiates nothing, or if it cares it can negotiate
>> > VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
>> >
>> > * guest may do silent deflation if available: combo of Linux driver +
>> > Frank's driver.  Negotiates both features, looks at
>> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
>> > 
>> >   ** If PCI passthrough, the host will not negotiate
>> >      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
>> >      current Linux driver, the host can do the munlock/mlock dance.
> So this case is just existing interface. OK.
> 
>> >   ** If no PCI passthrough, the host will negotiate
>> >      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
>> >      aggressively and not use the deflateq.
>> > 
> This last trickery confuses me.  It just does not make sense to set both
> SILENT and TELL: either you are silent or you tell.

"I can be chatty if you ask me, but I'd rather be silent if you let me".

TELL is a request of the host to the guest; the guest can agree or not.

SILENT is a request of the guest to the host; the host can agree or not.

> What is the benefit of avoiding host notification?
> It seems cleaner for the host to know the state.

Yeah, if you want to do it you can.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 10:43                     ` Michael S. Tsirkin
@ 2012-09-08  5:06                       ` Rusty Russell
  -1 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-08  5:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
>> > So it looks like a bug: we should teach driver to tell host first on leak?
>> > Yan, Vadim, can you comment please?
>> >
>> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> > think?
>> 
>> So, feature is unimplemented in qemu, and broken in drivers.  I starting
>> to share Paolo's dislike of it.
>
> What is broken in drivers?

Because supporting the feature is *not* optional for a driver.

If the device said MUST_TELL_HOST, it meant that the driver *had* to
tell the host before it touched the page, otherwise Bad Things might
happen.  It was in the original spec precisely to allow devices to
actually *remove* pages.

Noone ever noticed the windows driver didn't support it, because qemu
never requires MUST_TELL_HOST.

So in practice, it's now an optional feature.  Since no device used it
anyway, we're better off discarding it than trying to fix it.

If someone wants an *optional* "tell me first" feature later, that's
easy to add, but I don't see why they'd want to.

> Do we really know there are no hypervisors implementing it?

As much as can be known.  Qemu doesn't, lkvm doesn't.

> As I said above drivers do have support.

Not the windows drivers.  So it's optional, thus removing it will likely
harm noone.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-08  5:06                       ` Rusty Russell
  0 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-08  5:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
>> > So it looks like a bug: we should teach driver to tell host first on leak?
>> > Yan, Vadim, can you comment please?
>> >
>> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> > think?
>> 
>> So, feature is unimplemented in qemu, and broken in drivers.  I starting
>> to share Paolo's dislike of it.
>
> What is broken in drivers?

Because supporting the feature is *not* optional for a driver.

If the device said MUST_TELL_HOST, it meant that the driver *had* to
tell the host before it touched the page, otherwise Bad Things might
happen.  It was in the original spec precisely to allow devices to
actually *remove* pages.

Noone ever noticed the windows driver didn't support it, because qemu
never requires MUST_TELL_HOST.

So in practice, it's now an optional feature.  Since no device used it
anyway, we're better off discarding it than trying to fix it.

If someone wants an *optional* "tell me first" feature later, that's
easy to add, but I don't see why they'd want to.

> Do we really know there are no hypervisors implementing it?

As much as can be known.  Qemu doesn't, lkvm doesn't.

> As I said above drivers do have support.

Not the windows drivers.  So it's optional, thus removing it will likely
harm noone.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08  5:06                       ` Rusty Russell
  (?)
  (?)
@ 2012-09-08 10:32                       ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-08 10:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, fes, aarcange, riel, kvm, yvugenfi,
	linux-kernel, mikew, yinghan, virtualization

Il 08/09/2012 07:06, Rusty Russell ha scritto:
>> > Do we really know there are no hypervisors implementing it?
> As much as can be known.  Qemu doesn't, lkvm doesn't.
> 

However, there are cases in which it would be nice to have it.
Repurposing the bit to how it has been used so far (as a guest->host
communication bit, not host->guest) is a better choice.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08  5:06                       ` Rusty Russell
  (?)
@ 2012-09-08 10:32                       ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-08 10:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, Michael S. Tsirkin, yvugenfi,
	linux-kernel, mikew, yinghan, virtualization

Il 08/09/2012 07:06, Rusty Russell ha scritto:
>> > Do we really know there are no hypervisors implementing it?
> As much as can be known.  Qemu doesn't, lkvm doesn't.
> 

However, there are cases in which it would be nice to have it.
Repurposing the bit to how it has been used so far (as a guest->host
communication bit, not host->guest) is a better choice.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-07 14:44                                     ` Paolo Bonzini
@ 2012-09-08 22:22                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-08 22:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rusty Russell, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Fri, Sep 07, 2012 at 04:44:40PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto:
> >> > Silent deflate is deflating just by using the page, without using the
> >> > deflateq at all.  So it can be done even from GFP_ATOMIC context.
> > BTW I don't see a need to avoid deflateq.
> > Without MUST_TELL_HOST you can just deflate and then
> > bounce telling the host out to a thread.
> 
> Yes, that's fine too.

OK so it's preferable: will work on existing hypervisors.

> >> > But knowing that the guest will _not_ deflate silently also benefits the
> >> > host. This is the cause of unpinning ballooned pages and pinning them
> >> > again upon deflation. This allows cooperative memory overcommit even if
> >> > the guests' memory is pinned, similar to what Xen did before it
> >> > supported overcommit.  This is the second feature bit.
> > This is the MUST_TELL_HOST.
> 
> Almost.  One is "the guest, if really needed, can tell the host of
> pages".  If not negotiated, and the host does not support it, the host
> must break the guest (e.g. fail to offer any virtqueues).

There is no way in spec to break the guest.
You can not fail to offer virtqueues.
Besides, there is no guarantee that virtqueue setup
happens after feature negotiation.

> The other is "the guest, though, would prefer not to do so".  It is
> different because the guest can proceed in a fallback mode even if the
> host doesn't offer it.

I think I get what your proposed SILENT means what I do not get
is the motivation. It looks like a premature optimization to me.

> You're interpreting features as something that dictates behavior, but
> they're really just advisory.

The spec is pretty clear that if guest acks feature it
is a contract that dictates behaviour.
If it doesn't it is either ignored or just informative
depending on feature.

>  You could negotiate VIRTIO_BLK_F_TOPOLOGY
> and end up never reading the fields; you could negotiate
> VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.


Block example is just informative. It does not need to be
negotiated even to be used.
But last example is wrong.
If you ack GUEST_ANNOUNCE hypervisor assumes guest will
announce self, if guest does not do it this break migration.


> >> > * guest will always do silent deflation: current Windows driver.
> > Not exactly. It is not silent. It tells host
> > just after use.
> 
> Yeah, but no difference from the POV of the driver.
> 
> Delaying or avoiding is the same in the end.  The spec says it well: "In
> this case, deflation advice is merely a courtesy".

So it looks like we don't need a new bit to leak in atomic ctx.
Just do not ack MUST_TELL_HOST and delay telling host to a wq.
IMO we should not add random stuff to spec like this just because it
seems like a good idea.

> >> > Negotiates nothing, or if it cares it can negotiate
> >> > VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
> >> >
> >> > * guest may do silent deflation if available: combo of Linux driver +
> >> > Frank's driver.  Negotiates both features, looks at
> >> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
> >> > 
> >> >   ** If PCI passthrough, the host will not negotiate
> >> >      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
> >> >      current Linux driver, the host can do the munlock/mlock dance.
> > So this case is just existing interface. OK.
> > 
> >> >   ** If no PCI passthrough, the host will negotiate
> >> >      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
> >> >      aggressively and not use the deflateq.
> >> > 
> > This last trickery confuses me.  It just does not make sense to set both
> > SILENT and TELL: either you are silent or you tell.
> 
> "I can be chatty if you ask me, but I'd rather be silent if you let me".
> 
> TELL is a request of the host to the guest; the guest can agree or not.
> 
> SILENT is a request of the guest to the host; the host can agree or not.

OK so TELL says *when* to notify host, SILENT if set allows guest
to skip leak notifications? In this case TELL should just be ignored
when SILENT is set. Otherwise it's a reasonable extension.
But I am still not sure *why* do we need SILENT - any data showing
that avoiding these notifications is a win?
Let's not add new features until we see an actual use.


> > It seems cleaner for the host to know the state.
> 
> Yeah, if you want to do it you can.
> 
> Paolo

IMHO, renaming is fine since there is confusion.
But WILL_TELL is also not all that clear imho.

I think the confusion is that TELL_HOST seems to
imply we can avoid telling host at all.
How about being explicit?

VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE

Hmm?


"MUST" "WILL" are not really informative.

Also some confusion I think is from spec text saying "feature is set"
in many cases where it really almost always means "feature is negotiated".

Let's address?

-- 
MSt

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-08 22:22                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-08 22:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Fri, Sep 07, 2012 at 04:44:40PM +0200, Paolo Bonzini wrote:
> Il 07/09/2012 16:25, Michael S. Tsirkin ha scritto:
> >> > Silent deflate is deflating just by using the page, without using the
> >> > deflateq at all.  So it can be done even from GFP_ATOMIC context.
> > BTW I don't see a need to avoid deflateq.
> > Without MUST_TELL_HOST you can just deflate and then
> > bounce telling the host out to a thread.
> 
> Yes, that's fine too.

OK so it's preferable: will work on existing hypervisors.

> >> > But knowing that the guest will _not_ deflate silently also benefits the
> >> > host. This is the cause of unpinning ballooned pages and pinning them
> >> > again upon deflation. This allows cooperative memory overcommit even if
> >> > the guests' memory is pinned, similar to what Xen did before it
> >> > supported overcommit.  This is the second feature bit.
> > This is the MUST_TELL_HOST.
> 
> Almost.  One is "the guest, if really needed, can tell the host of
> pages".  If not negotiated, and the host does not support it, the host
> must break the guest (e.g. fail to offer any virtqueues).

There is no way in spec to break the guest.
You can not fail to offer virtqueues.
Besides, there is no guarantee that virtqueue setup
happens after feature negotiation.

> The other is "the guest, though, would prefer not to do so".  It is
> different because the guest can proceed in a fallback mode even if the
> host doesn't offer it.

I think I get what your proposed SILENT means what I do not get
is the motivation. It looks like a premature optimization to me.

> You're interpreting features as something that dictates behavior, but
> they're really just advisory.

The spec is pretty clear that if guest acks feature it
is a contract that dictates behaviour.
If it doesn't it is either ignored or just informative
depending on feature.

>  You could negotiate VIRTIO_BLK_F_TOPOLOGY
> and end up never reading the fields; you could negotiate
> VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.


Block example is just informative. It does not need to be
negotiated even to be used.
But last example is wrong.
If you ack GUEST_ANNOUNCE hypervisor assumes guest will
announce self, if guest does not do it this break migration.


> >> > * guest will always do silent deflation: current Windows driver.
> > Not exactly. It is not silent. It tells host
> > just after use.
> 
> Yeah, but no difference from the POV of the driver.
> 
> Delaying or avoiding is the same in the end.  The spec says it well: "In
> this case, deflation advice is merely a courtesy".

So it looks like we don't need a new bit to leak in atomic ctx.
Just do not ack MUST_TELL_HOST and delay telling host to a wq.
IMO we should not add random stuff to spec like this just because it
seems like a good idea.

> >> > Negotiates nothing, or if it cares it can negotiate
> >> > VIRTIO_BALLOON_F_SILENT_DEFLATE.  Host mustn't do munlock/mlock dance.
> >> >
> >> > * guest may do silent deflation if available: combo of Linux driver +
> >> > Frank's driver.  Negotiates both features, looks at
> >> > VIRTIO_BALLOON_F_SILENT_DEFLATE host feature to decide how to behave:
> >> > 
> >> >   ** If PCI passthrough, the host will not negotiate
> >> >      VIRTIO_BALLOON_F_SILENT_DEFLATE.  The driver will behave as the
> >> >      current Linux driver, the host can do the munlock/mlock dance.
> > So this case is just existing interface. OK.
> > 
> >> >   ** If no PCI passthrough, the host will negotiate
> >> >      VIRTIO_BALLOON_F_SILENT_DEFLATE, and the driver can balloon more
> >> >      aggressively and not use the deflateq.
> >> > 
> > This last trickery confuses me.  It just does not make sense to set both
> > SILENT and TELL: either you are silent or you tell.
> 
> "I can be chatty if you ask me, but I'd rather be silent if you let me".
> 
> TELL is a request of the host to the guest; the guest can agree or not.
> 
> SILENT is a request of the guest to the host; the host can agree or not.

OK so TELL says *when* to notify host, SILENT if set allows guest
to skip leak notifications? In this case TELL should just be ignored
when SILENT is set. Otherwise it's a reasonable extension.
But I am still not sure *why* do we need SILENT - any data showing
that avoiding these notifications is a win?
Let's not add new features until we see an actual use.


> > It seems cleaner for the host to know the state.
> 
> Yeah, if you want to do it you can.
> 
> Paolo

IMHO, renaming is fine since there is confusion.
But WILL_TELL is also not all that clear imho.

I think the confusion is that TELL_HOST seems to
imply we can avoid telling host at all.
How about being explicit?

VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE

Hmm?


"MUST" "WILL" are not really informative.

Also some confusion I think is from spec text saying "feature is set"
in many cases where it really almost always means "feature is negotiated".

Let's address?

-- 
MSt

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08  5:06                       ` Rusty Russell
@ 2012-09-08 22:37                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-08 22:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> > Yan, Vadim, can you comment please?
> >> >
> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> > think?
> >> 
> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> >> to share Paolo's dislike of it.
> >
> > What is broken in drivers?
> 
> Because supporting the feature is *not* optional for a driver.
> 
> If the device said MUST_TELL_HOST, it meant that the driver *had* to
> tell the host before it touched the page, otherwise Bad Things might
> happen.  It was in the original spec precisely to allow devices to
> actually *remove* pages.
> 
> Noone ever noticed the windows driver didn't support it, because qemu
> never requires MUST_TELL_HOST.
> 
> So in practice, it's now an optional feature.  Since no device used it
> anyway, we're better off discarding it than trying to fix it.

I trust you this was not the intent. But it seems to be
the intent in spec, because almost all features are optional.

And so windows driver authors interpreted it
this way. And it is *useful* like this.  See below.

> If someone wants an *optional* "tell me first" feature later, that's
> easy to add, but I don't see why they'd want to.

Suggested use is for device assignment which needs all guest memory
locked.  hypervisor can unlock pages in balloon but guest must wait for
hypervisor to lock them back before use.

when a hypervisor implements this,
this will work with linux guests but not windows
guests and the existing bit fits exactly the purpose.


> > Do we really know there are no hypervisors implementing it?
> 
> As much as can be known.  Qemu doesn't, lkvm doesn't.

But we can add it in qemu and it will be a useful feature.

> > As I said above drivers do have support.
> 
> Not the windows drivers.  So it's optional, thus removing it will likely
> harm noone.
> 
> Cheers,
> Rusty.

I think the issue is that kvm always locked all guest memory
for assignment. This restriction is removed
with vfio which has separate page tables.
Now that vfio is upstream and work on qemu integration
is being worked on, we might finally see people using this bit
to allow memory overcommit with device assignment.

So let's not remove yet, there is no rush.
Let's document it that this feature is optional,
maybe renaming as Paolo suggests.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-08 22:37                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-08 22:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> > Yan, Vadim, can you comment please?
> >> >
> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> > think?
> >> 
> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> >> to share Paolo's dislike of it.
> >
> > What is broken in drivers?
> 
> Because supporting the feature is *not* optional for a driver.
> 
> If the device said MUST_TELL_HOST, it meant that the driver *had* to
> tell the host before it touched the page, otherwise Bad Things might
> happen.  It was in the original spec precisely to allow devices to
> actually *remove* pages.
> 
> Noone ever noticed the windows driver didn't support it, because qemu
> never requires MUST_TELL_HOST.
> 
> So in practice, it's now an optional feature.  Since no device used it
> anyway, we're better off discarding it than trying to fix it.

I trust you this was not the intent. But it seems to be
the intent in spec, because almost all features are optional.

And so windows driver authors interpreted it
this way. And it is *useful* like this.  See below.

> If someone wants an *optional* "tell me first" feature later, that's
> easy to add, but I don't see why they'd want to.

Suggested use is for device assignment which needs all guest memory
locked.  hypervisor can unlock pages in balloon but guest must wait for
hypervisor to lock them back before use.

when a hypervisor implements this,
this will work with linux guests but not windows
guests and the existing bit fits exactly the purpose.


> > Do we really know there are no hypervisors implementing it?
> 
> As much as can be known.  Qemu doesn't, lkvm doesn't.

But we can add it in qemu and it will be a useful feature.

> > As I said above drivers do have support.
> 
> Not the windows drivers.  So it's optional, thus removing it will likely
> harm noone.
> 
> Cheers,
> Rusty.

I think the issue is that kvm always locked all guest memory
for assignment. This restriction is removed
with vfio which has separate page tables.
Now that vfio is upstream and work on qemu integration
is being worked on, we might finally see people using this bit
to allow memory overcommit with device assignment.

So let's not remove yet, there is no rush.
Let's document it that this feature is optional,
maybe renaming as Paolo suggests.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08 22:37                         ` Michael S. Tsirkin
@ 2012-09-10  1:43                           ` Rusty Russell
  -1 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-10  1:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
>> >> > So it looks like a bug: we should teach driver to tell host first on leak?
>> >> > Yan, Vadim, can you comment please?
>> >> >
>> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> >> > think?
>> >> 
>> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
>> >> to share Paolo's dislike of it.
>> >
>> > What is broken in drivers?
>> 
>> Because supporting the feature is *not* optional for a driver.
>> 
>> If the device said MUST_TELL_HOST, it meant that the driver *had* to
>> tell the host before it touched the page, otherwise Bad Things might
>> happen.  It was in the original spec precisely to allow devices to
>> actually *remove* pages.
>> 
>> Noone ever noticed the windows driver didn't support it, because qemu
>> never requires MUST_TELL_HOST.
>> 
>> So in practice, it's now an optional feature.  Since no device used it
>> anyway, we're better off discarding it than trying to fix it.
>
> I trust you this was not the intent. But it seems to be
> the intent in spec, because almost all features are optional.
>
> And so windows driver authors interpreted it
> this way. And it is *useful* like this.  See below.

...

> Suggested use is for device assignment which needs all guest memory
> locked.  hypervisor can unlock pages in balloon but guest must wait for
> hypervisor to lock them back before use.
>
> when a hypervisor implements this,
> this will work with linux guests but not windows
> guests and the existing bit fits exactly the purpose.

If a hypervisor needs this, and the guest doesn't support it, then
the hypervisor can only abandon the balloon device.  That's not my
definition of "optional".

>> > Do we really know there are no hypervisors implementing it?
>> 
>> As much as can be known.  Qemu doesn't, lkvm doesn't.
>
> But we can add it in qemu and it will be a useful feature.
>
>> > As I said above drivers do have support.
>> 
>> Not the windows drivers.  So it's optional, thus removing it will likely
>> harm noone.
>> 
>> Cheers,
>> Rusty.
>
> I think the issue is that kvm always locked all guest memory
> for assignment. This restriction is removed
> with vfio which has separate page tables.
> Now that vfio is upstream and work on qemu integration
> is being worked on, we might finally see people using this bit
> to allow memory overcommit with device assignment.

That was left-field.... so you're saying some guest might pull a page
from the balloon and DMA to it, and the vfio device needs to know in
advance that it's going to do it?

But what will we do if the guest doesn't ack the bit?

ie. I don't think it can really be optional.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  1:43                           ` Rusty Russell
  0 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-10  1:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
>> >> > So it looks like a bug: we should teach driver to tell host first on leak?
>> >> > Yan, Vadim, can you comment please?
>> >> >
>> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
>> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
>> >> > think?
>> >> 
>> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
>> >> to share Paolo's dislike of it.
>> >
>> > What is broken in drivers?
>> 
>> Because supporting the feature is *not* optional for a driver.
>> 
>> If the device said MUST_TELL_HOST, it meant that the driver *had* to
>> tell the host before it touched the page, otherwise Bad Things might
>> happen.  It was in the original spec precisely to allow devices to
>> actually *remove* pages.
>> 
>> Noone ever noticed the windows driver didn't support it, because qemu
>> never requires MUST_TELL_HOST.
>> 
>> So in practice, it's now an optional feature.  Since no device used it
>> anyway, we're better off discarding it than trying to fix it.
>
> I trust you this was not the intent. But it seems to be
> the intent in spec, because almost all features are optional.
>
> And so windows driver authors interpreted it
> this way. And it is *useful* like this.  See below.

...

> Suggested use is for device assignment which needs all guest memory
> locked.  hypervisor can unlock pages in balloon but guest must wait for
> hypervisor to lock them back before use.
>
> when a hypervisor implements this,
> this will work with linux guests but not windows
> guests and the existing bit fits exactly the purpose.

If a hypervisor needs this, and the guest doesn't support it, then
the hypervisor can only abandon the balloon device.  That's not my
definition of "optional".

>> > Do we really know there are no hypervisors implementing it?
>> 
>> As much as can be known.  Qemu doesn't, lkvm doesn't.
>
> But we can add it in qemu and it will be a useful feature.
>
>> > As I said above drivers do have support.
>> 
>> Not the windows drivers.  So it's optional, thus removing it will likely
>> harm noone.
>> 
>> Cheers,
>> Rusty.
>
> I think the issue is that kvm always locked all guest memory
> for assignment. This restriction is removed
> with vfio which has separate page tables.
> Now that vfio is upstream and work on qemu integration
> is being worked on, we might finally see people using this bit
> to allow memory overcommit with device assignment.

That was left-field.... so you're saying some guest might pull a page
from the balloon and DMA to it, and the vfio device needs to know in
advance that it's going to do it?

But what will we do if the guest doesn't ack the bit?

ie. I don't think it can really be optional.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08 22:22                                       ` Michael S. Tsirkin
  (?)
  (?)
@ 2012-09-10  5:50                                       ` Paolo Bonzini
  2012-09-10  6:03                                           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  5:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>> Almost.  One is "the guest, if really needed, can tell the host of
>> pages".  If not negotiated, and the host does not support it, the host
>> must break the guest (e.g. fail to offer any virtqueues).
> 
> There is no way in spec to break the guest.
> You can not fail to offer virtqueues.

You can always return 0 for the first queue.

> Besides, there is no guarantee that virtqueue setup
> happens after feature negotiation.

It is the only way that makes sense though (unless the guest would write
0 for its features).  Should we change that?

>> The other is "the guest, though, would prefer not to do so".  It is
>> different because the guest can proceed in a fallback mode even if the
>> host doesn't offer it.
> 
> I think I get what your proposed SILENT means what I do not get
> is the motivation. It looks like a premature optimization to me.

The motivation is to let the driver choose between two behaviors: the
current one where ballooning is only done on request, and a more
aggressive one.

> The spec is pretty clear that if guest acks feature it
> is a contract that dictates behaviour.
> If it doesn't it is either ignored or just informative
> depending on feature.
> 
>> You could negotiate VIRTIO_BLK_F_TOPOLOGY
>> and end up never reading the fields; you could negotiate
>> VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.
> 
> Block example is just informative. It does not need to be
> negotiated even to be used. But last example is wrong.
> If you ack GUEST_ANNOUNCE hypervisor assumes guest will
> announce self, if guest does not do it this break migration.

It is wrong indeed, sorry.

Better example: the driver can negotiate VIRTIO_NET_F_CTRL_RX and never
set promiscuous mode.  The device has to obey if it does.

Similarly, if you set VIRTIO_BALLOON_F_SILENT_DEFLATE and only do chatty
deflate later, that's fine.  If you do silent deflate, and the device
negotiated the feature, it has to work.

>> Delaying or avoiding is the same in the end.  The spec says it well: "In
>> this case, deflation advice is merely a courtesy".
> 
> So it looks like we don't need a new bit to leak in atomic ctx.
> Just do not ack MUST_TELL_HOST and delay telling host to a wq.
> IMO we should not add random stuff to spec like this just because it
> seems like a good idea.

But this way you have to choose all-or-none.  If the host cannot do
silent deflate, you cannot balloon anymore, not even in the normal
"cooperative" mode.

> OK so TELL says *when* to notify host, SILENT if set allows guest
> to skip leak notifications? In this case TELL should just be ignored
> when SILENT is set.

Yeah, that was my first idea.  However, there are existing drivers that
ignore SILENT, so that would not be 100% exact.

> IMHO, renaming is fine since there is confusion.
> But WILL_TELL is also not all that clear imho.

> I think the confusion is that TELL_HOST seems to
> imply we can avoid telling host at all.
> How about being explicit?
> 
> VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE

Makes sense.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-08 22:22                                       ` Michael S. Tsirkin
  (?)
@ 2012-09-10  5:50                                       ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  5:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	virtualization

Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>> Almost.  One is "the guest, if really needed, can tell the host of
>> pages".  If not negotiated, and the host does not support it, the host
>> must break the guest (e.g. fail to offer any virtqueues).
> 
> There is no way in spec to break the guest.
> You can not fail to offer virtqueues.

You can always return 0 for the first queue.

> Besides, there is no guarantee that virtqueue setup
> happens after feature negotiation.

It is the only way that makes sense though (unless the guest would write
0 for its features).  Should we change that?

>> The other is "the guest, though, would prefer not to do so".  It is
>> different because the guest can proceed in a fallback mode even if the
>> host doesn't offer it.
> 
> I think I get what your proposed SILENT means what I do not get
> is the motivation. It looks like a premature optimization to me.

The motivation is to let the driver choose between two behaviors: the
current one where ballooning is only done on request, and a more
aggressive one.

> The spec is pretty clear that if guest acks feature it
> is a contract that dictates behaviour.
> If it doesn't it is either ignored or just informative
> depending on feature.
> 
>> You could negotiate VIRTIO_BLK_F_TOPOLOGY
>> and end up never reading the fields; you could negotiate
>> VIRTIO_NET_F_GUEST_ANNOUNCE and never send a guest announcement.
> 
> Block example is just informative. It does not need to be
> negotiated even to be used. But last example is wrong.
> If you ack GUEST_ANNOUNCE hypervisor assumes guest will
> announce self, if guest does not do it this break migration.

It is wrong indeed, sorry.

Better example: the driver can negotiate VIRTIO_NET_F_CTRL_RX and never
set promiscuous mode.  The device has to obey if it does.

Similarly, if you set VIRTIO_BALLOON_F_SILENT_DEFLATE and only do chatty
deflate later, that's fine.  If you do silent deflate, and the device
negotiated the feature, it has to work.

>> Delaying or avoiding is the same in the end.  The spec says it well: "In
>> this case, deflation advice is merely a courtesy".
> 
> So it looks like we don't need a new bit to leak in atomic ctx.
> Just do not ack MUST_TELL_HOST and delay telling host to a wq.
> IMO we should not add random stuff to spec like this just because it
> seems like a good idea.

But this way you have to choose all-or-none.  If the host cannot do
silent deflate, you cannot balloon anymore, not even in the normal
"cooperative" mode.

> OK so TELL says *when* to notify host, SILENT if set allows guest
> to skip leak notifications? In this case TELL should just be ignored
> when SILENT is set.

Yeah, that was my first idea.  However, there are existing drivers that
ignore SILENT, so that would not be 100% exact.

> IMHO, renaming is fine since there is confusion.
> But WILL_TELL is also not all that clear imho.

> I think the confusion is that TELL_HOST seems to
> imply we can avoid telling host at all.
> How about being explicit?
> 
> VIRTIO_BALLOON_F_HOST_ACK_BEFORE_DEFLATE

Makes sense.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  1:43                           ` Rusty Russell
@ 2012-09-10  5:51                             ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  5:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, fes, aarcange, riel, kvm, yvugenfi,
	linux-kernel, mikew, yinghan, virtualization

Il 10/09/2012 03:43, Rusty Russell ha scritto:
>> Now that vfio is upstream and work on qemu integration
>> is being worked on, we might finally see people using this bit
>> to allow memory overcommit with device assignment.
> That was left-field.... so you're saying some guest might pull a page
> from the balloon and DMA to it, and the vfio device needs to know in
> advance that it's going to do it?

Not the vfio device, but the page needs to be pinned with mlock so the
effect is the same.

> But what will we do if the guest doesn't ack the bit?

Just pin the whole memory.

Paolo

> ie. I don't think it can really be optional.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  5:51                             ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  5:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, Michael S. Tsirkin, yvugenfi,
	linux-kernel, mikew, yinghan, virtualization

Il 10/09/2012 03:43, Rusty Russell ha scritto:
>> Now that vfio is upstream and work on qemu integration
>> is being worked on, we might finally see people using this bit
>> to allow memory overcommit with device assignment.
> That was left-field.... so you're saying some guest might pull a page
> from the balloon and DMA to it, and the vfio device needs to know in
> advance that it's going to do it?

Not the vfio device, but the page needs to be pinned with mlock so the
effect is the same.

> But what will we do if the guest doesn't ack the bit?

Just pin the whole memory.

Paolo

> ie. I don't think it can really be optional.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  1:43                           ` Rusty Russell
@ 2012-09-10  5:51                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  5:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

On Mon, Sep 10, 2012 at 11:13:12AM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> >> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> >> > Yan, Vadim, can you comment please?
> >> >> >
> >> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> >> > think?
> >> >> 
> >> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> >> >> to share Paolo's dislike of it.
> >> >
> >> > What is broken in drivers?
> >> 
> >> Because supporting the feature is *not* optional for a driver.
> >> 
> >> If the device said MUST_TELL_HOST, it meant that the driver *had* to
> >> tell the host before it touched the page, otherwise Bad Things might
> >> happen.  It was in the original spec precisely to allow devices to
> >> actually *remove* pages.
> >> 
> >> Noone ever noticed the windows driver didn't support it, because qemu
> >> never requires MUST_TELL_HOST.
> >> 
> >> So in practice, it's now an optional feature.  Since no device used it
> >> anyway, we're better off discarding it than trying to fix it.
> >
> > I trust you this was not the intent. But it seems to be
> > the intent in spec, because almost all features are optional.
> >
> > And so windows driver authors interpreted it
> > this way. And it is *useful* like this.  See below.
> 
> ...
> 
> > Suggested use is for device assignment which needs all guest memory
> > locked.  hypervisor can unlock pages in balloon but guest must wait for
> > hypervisor to lock them back before use.
> >
> > when a hypervisor implements this,
> > this will work with linux guests but not windows
> > guests and the existing bit fits exactly the purpose.
> 
> If a hypervisor needs this, and the guest doesn't support it, then
> the hypervisor can only abandon the balloon device.  That's not my
> definition of "optional".
> 
> >> > Do we really know there are no hypervisors implementing it?
> >> 
> >> As much as can be known.  Qemu doesn't, lkvm doesn't.
> >
> > But we can add it in qemu and it will be a useful feature.
> >
> >> > As I said above drivers do have support.
> >> 
> >> Not the windows drivers.  So it's optional, thus removing it will likely
> >> harm noone.
> >> 
> >> Cheers,
> >> Rusty.
> >
> > I think the issue is that kvm always locked all guest memory
> > for assignment. This restriction is removed
> > with vfio which has separate page tables.
> > Now that vfio is upstream and work on qemu integration
> > is being worked on, we might finally see people using this bit
> > to allow memory overcommit with device assignment.
> 
> That was left-field.... so you're saying some guest might pull a page
> from the balloon and DMA to it, and the vfio device needs to know in
> advance that it's going to do it?
> 
> But what will we do if the guest doesn't ack the bit?
> ie. I don't think it can really be optional.
> 
> Cheers,
> Rusty.

We have several options:
1. No memory overcommit feature. This is always the case ATM!
2. Do not hot-plug assigned device.
3. Hot-unplug assigned device.
4. Some assigned devices can be able to handle memory errors
   e.g. using ATS. Limit hotplug to these.

> I don't think it can really be optional.

It is optional *for the device*.

But I don't insist on my patch. I am merely saying that
1. The bit is useful for host to detect guests
which can't combine memory overcommit with device assignment,
and this set of guests is not empty.

2. The fact that this bit is not optional for drivers is not well documented.
The only hint seems the use of words "feature is set" as
opposed to "feature is negoticated" as with other features.
The spec intended
"feature is set in Device Features bits". Drivers interpreted this
as "feature is set in Guest Features bits".

Hard to blame them, let us give them time to address this.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  5:51                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  5:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

On Mon, Sep 10, 2012 at 11:13:12AM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Sat, Sep 08, 2012 at 02:36:00PM +0930, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> > On Fri, Sep 07, 2012 at 04:09:50PM +0930, Rusty Russell wrote:
> >> >> > So it looks like a bug: we should teach driver to tell host first on leak?
> >> >> > Yan, Vadim, can you comment please?
> >> >> >
> >> >> > Also if true, looks like this bit will be useful to detect a fixed driver on
> >> >> > the hypervisor side - to avoid unmapping such pages? Rusty what do you
> >> >> > think?
> >> >> 
> >> >> So, feature is unimplemented in qemu, and broken in drivers.  I starting
> >> >> to share Paolo's dislike of it.
> >> >
> >> > What is broken in drivers?
> >> 
> >> Because supporting the feature is *not* optional for a driver.
> >> 
> >> If the device said MUST_TELL_HOST, it meant that the driver *had* to
> >> tell the host before it touched the page, otherwise Bad Things might
> >> happen.  It was in the original spec precisely to allow devices to
> >> actually *remove* pages.
> >> 
> >> Noone ever noticed the windows driver didn't support it, because qemu
> >> never requires MUST_TELL_HOST.
> >> 
> >> So in practice, it's now an optional feature.  Since no device used it
> >> anyway, we're better off discarding it than trying to fix it.
> >
> > I trust you this was not the intent. But it seems to be
> > the intent in spec, because almost all features are optional.
> >
> > And so windows driver authors interpreted it
> > this way. And it is *useful* like this.  See below.
> 
> ...
> 
> > Suggested use is for device assignment which needs all guest memory
> > locked.  hypervisor can unlock pages in balloon but guest must wait for
> > hypervisor to lock them back before use.
> >
> > when a hypervisor implements this,
> > this will work with linux guests but not windows
> > guests and the existing bit fits exactly the purpose.
> 
> If a hypervisor needs this, and the guest doesn't support it, then
> the hypervisor can only abandon the balloon device.  That's not my
> definition of "optional".
> 
> >> > Do we really know there are no hypervisors implementing it?
> >> 
> >> As much as can be known.  Qemu doesn't, lkvm doesn't.
> >
> > But we can add it in qemu and it will be a useful feature.
> >
> >> > As I said above drivers do have support.
> >> 
> >> Not the windows drivers.  So it's optional, thus removing it will likely
> >> harm noone.
> >> 
> >> Cheers,
> >> Rusty.
> >
> > I think the issue is that kvm always locked all guest memory
> > for assignment. This restriction is removed
> > with vfio which has separate page tables.
> > Now that vfio is upstream and work on qemu integration
> > is being worked on, we might finally see people using this bit
> > to allow memory overcommit with device assignment.
> 
> That was left-field.... so you're saying some guest might pull a page
> from the balloon and DMA to it, and the vfio device needs to know in
> advance that it's going to do it?
> 
> But what will we do if the guest doesn't ack the bit?
> ie. I don't think it can really be optional.
> 
> Cheers,
> Rusty.

We have several options:
1. No memory overcommit feature. This is always the case ATM!
2. Do not hot-plug assigned device.
3. Hot-unplug assigned device.
4. Some assigned devices can be able to handle memory errors
   e.g. using ATS. Limit hotplug to these.

> I don't think it can really be optional.

It is optional *for the device*.

But I don't insist on my patch. I am merely saying that
1. The bit is useful for host to detect guests
which can't combine memory overcommit with device assignment,
and this set of guests is not empty.

2. The fact that this bit is not optional for drivers is not well documented.
The only hint seems the use of words "feature is set" as
opposed to "feature is negoticated" as with other features.
The spec intended
"feature is set in Device Features bits". Drivers interpreted this
as "feature is set in Guest Features bits".

Hard to blame them, let us give them time to address this.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  5:50                                       ` Paolo Bonzini
@ 2012-09-10  6:03                                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  6:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
> >> Almost.  One is "the guest, if really needed, can tell the host of
> >> pages".  If not negotiated, and the host does not support it, the host
> >> must break the guest (e.g. fail to offer any virtqueues).
> > 
> > There is no way in spec to break the guest.
> > You can not fail to offer virtqueues.
> 
> You can always return 0 for the first queue.

I don't think guest drivers recover gracefully from this.
Do they?

> > Besides, there is no guarantee that virtqueue setup
> > happens after feature negotiation.
> 
> It is the only way that makes sense though (unless the guest would write
> 0 for its features).
>  Should we change that?

Not sure.  This was not always the case. Further
setup can fail with e.g ENOMEM and
driver could retry with a set of more conservative features.

I do think it would be nice to add a generic way for device to
notify guest about an internal failure.
This can only happen after DRIVER_OK status is written though,
and since existing drivers do not expect such failure, it might
be too late.

> >> The other is "the guest, though, would prefer not to do so".  It is
> >> different because the guest can proceed in a fallback mode even if the
> >> host doesn't offer it.
> > 
> > I think I get what your proposed SILENT means what I do not get
> > is the motivation. It looks like a premature optimization to me.
> 
> The motivation is to let the driver choose between two behaviors: the
> current one where ballooning is only done on request, and a more
> aggressive one.

Yes but why is being silent any good? Optimization?
Any data to show that it will help some workload?

...

> > OK so TELL says *when* to notify host, SILENT if set allows guest
> > to skip leak notifications? In this case TELL should just be ignored
> > when SILENT is set.
> 
> Yeah, that was my first idea.  However, there are existing drivers that
> ignore SILENT, so that would not be 100% exact.

Not sure I follow the logic.
They don't ack SILENT so that would be 100% exact.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  6:03                                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  6:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	virtualization

On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
> >> Almost.  One is "the guest, if really needed, can tell the host of
> >> pages".  If not negotiated, and the host does not support it, the host
> >> must break the guest (e.g. fail to offer any virtqueues).
> > 
> > There is no way in spec to break the guest.
> > You can not fail to offer virtqueues.
> 
> You can always return 0 for the first queue.

I don't think guest drivers recover gracefully from this.
Do they?

> > Besides, there is no guarantee that virtqueue setup
> > happens after feature negotiation.
> 
> It is the only way that makes sense though (unless the guest would write
> 0 for its features).
>  Should we change that?

Not sure.  This was not always the case. Further
setup can fail with e.g ENOMEM and
driver could retry with a set of more conservative features.

I do think it would be nice to add a generic way for device to
notify guest about an internal failure.
This can only happen after DRIVER_OK status is written though,
and since existing drivers do not expect such failure, it might
be too late.

> >> The other is "the guest, though, would prefer not to do so".  It is
> >> different because the guest can proceed in a fallback mode even if the
> >> host doesn't offer it.
> > 
> > I think I get what your proposed SILENT means what I do not get
> > is the motivation. It looks like a premature optimization to me.
> 
> The motivation is to let the driver choose between two behaviors: the
> current one where ballooning is only done on request, and a more
> aggressive one.

Yes but why is being silent any good? Optimization?
Any data to show that it will help some workload?

...

> > OK so TELL says *when* to notify host, SILENT if set allows guest
> > to skip leak notifications? In this case TELL should just be ignored
> > when SILENT is set.
> 
> Yeah, that was my first idea.  However, there are existing drivers that
> ignore SILENT, so that would not be 100% exact.

Not sure I follow the logic.
They don't ack SILENT so that would be 100% exact.

-- 
MST

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  6:03                                           ` Michael S. Tsirkin
@ 2012-09-10  6:38                                             ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  6:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
> On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
>> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>>>> Almost.  One is "the guest, if really needed, can tell the host of
>>>> pages".  If not negotiated, and the host does not support it, the host
>>>> must break the guest (e.g. fail to offer any virtqueues).
>>>
>>> There is no way in spec to break the guest.
>>> You can not fail to offer virtqueues.
>>
>> You can always return 0 for the first queue.
> 
> I don't think guest drivers recover gracefully from this.
> Do they?

No, that's the point ("break the guest" is really "break the driver").

>>> Besides, there is no guarantee that virtqueue setup
>>> happens after feature negotiation.
>>
>> It is the only way that makes sense though (unless the guest would write
>> 0 for its features).  Should we change that?
> 
> I do think it would be nice to add a generic way for device to
> notify guest about an internal failure.
> This can only happen after DRIVER_OK status is written though,
> and since existing drivers do not expect such failure, it might
> be too late.

Agreed.

>>>> The other is "the guest, though, would prefer not to do so".  It is
>>>> different because the guest can proceed in a fallback mode even if the
>>>> host doesn't offer it.
>>>
>>> I think I get what your proposed SILENT means what I do not get
>>> is the motivation. It looks like a premature optimization to me.
>>
>> The motivation is to let the driver choose between two behaviors: the
>> current one where ballooning is only done on request, and a more
>> aggressive one.
> 
> Yes but why is being silent any good? Optimization?
> Any data to show that it will help some workload?

Idle guests can move cache pages to the balloon.  You can overcommit
more aggressively, because the host can madvise away a lot more memory.

>>> OK so TELL says *when* to notify host, SILENT if set allows guest
>>> to skip leak notifications? In this case TELL should just be ignored
>>> when SILENT is set.
>>
>> Yeah, that was my first idea.  However, there are existing drivers that
>> ignore SILENT, so that would not be 100% exact.
> 
> Not sure I follow the logic.
> They don't ack SILENT so that would be 100% exact.

Hmm, then I'm not sure I follow yours.  We agreed that delaying
notifications or skipping them is really the same thing, right?

I think we're just stuck in a linguistic problem, with "must not" being
wrong and "does not have to" being too verbose.  Calling it
VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps
it adds more confusion.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  6:38                                             ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  6:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	virtualization

Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
> On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
>> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>>>> Almost.  One is "the guest, if really needed, can tell the host of
>>>> pages".  If not negotiated, and the host does not support it, the host
>>>> must break the guest (e.g. fail to offer any virtqueues).
>>>
>>> There is no way in spec to break the guest.
>>> You can not fail to offer virtqueues.
>>
>> You can always return 0 for the first queue.
> 
> I don't think guest drivers recover gracefully from this.
> Do they?

No, that's the point ("break the guest" is really "break the driver").

>>> Besides, there is no guarantee that virtqueue setup
>>> happens after feature negotiation.
>>
>> It is the only way that makes sense though (unless the guest would write
>> 0 for its features).  Should we change that?
> 
> I do think it would be nice to add a generic way for device to
> notify guest about an internal failure.
> This can only happen after DRIVER_OK status is written though,
> and since existing drivers do not expect such failure, it might
> be too late.

Agreed.

>>>> The other is "the guest, though, would prefer not to do so".  It is
>>>> different because the guest can proceed in a fallback mode even if the
>>>> host doesn't offer it.
>>>
>>> I think I get what your proposed SILENT means what I do not get
>>> is the motivation. It looks like a premature optimization to me.
>>
>> The motivation is to let the driver choose between two behaviors: the
>> current one where ballooning is only done on request, and a more
>> aggressive one.
> 
> Yes but why is being silent any good? Optimization?
> Any data to show that it will help some workload?

Idle guests can move cache pages to the balloon.  You can overcommit
more aggressively, because the host can madvise away a lot more memory.

>>> OK so TELL says *when* to notify host, SILENT if set allows guest
>>> to skip leak notifications? In this case TELL should just be ignored
>>> when SILENT is set.
>>
>> Yeah, that was my first idea.  However, there are existing drivers that
>> ignore SILENT, so that would not be 100% exact.
> 
> Not sure I follow the logic.
> They don't ack SILENT so that would be 100% exact.

Hmm, then I'm not sure I follow yours.  We agreed that delaying
notifications or skipping them is really the same thing, right?

I think we're just stuck in a linguistic problem, with "must not" being
wrong and "does not have to" being too verbose.  Calling it
VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps
it adds more confusion.

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  6:38                                             ` Paolo Bonzini
@ 2012-09-10  6:47                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  6:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote:
> Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
> > On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
> >> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
> >>>> Almost.  One is "the guest, if really needed, can tell the host of
> >>>> pages".  If not negotiated, and the host does not support it, the host
> >>>> must break the guest (e.g. fail to offer any virtqueues).
> >>>
> >>> There is no way in spec to break the guest.
> >>> You can not fail to offer virtqueues.
> >>
> >> You can always return 0 for the first queue.
> > 
> > I don't think guest drivers recover gracefully from this.
> > Do they?
> 
> No, that's the point ("break the guest" is really "break the driver").

You can just stop VM then. No need for a side channel.

...

> >>>> The other is "the guest, though, would prefer not to do so".  It is
> >>>> different because the guest can proceed in a fallback mode even if the
> >>>> host doesn't offer it.
> >>>
> >>> I think I get what your proposed SILENT means what I do not get
> >>> is the motivation. It looks like a premature optimization to me.
> >>
> >> The motivation is to let the driver choose between two behaviors: the
> >> current one where ballooning is only done on request, and a more
> >> aggressive one.
> > 
> > Yes but why is being silent any good? Optimization?
> > Any data to show that it will help some workload?
> 
> Idle guests can move cache pages to the balloon.  You can overcommit
> more aggressively, because the host can madvise away a lot more memory.

IMHO this feature needs more thought. E.g. how will this work with assignment?
If we build something let's build it in a way that plays nicely
with other features.

> >>> OK so TELL says *when* to notify host, SILENT if set allows guest
> >>> to skip leak notifications? In this case TELL should just be ignored
> >>> when SILENT is set.
> >>
> >> Yeah, that was my first idea.  However, there are existing drivers that
> >> ignore SILENT, so that would not be 100% exact.
> > 
> > Not sure I follow the logic.
> > They don't ack SILENT so that would be 100% exact.
> 
> Hmm, then I'm not sure I follow yours.  We agreed that delaying
> notifications or skipping them is really the same thing, right?
> 
> I think we're just stuck in a linguistic problem, with "must not" being
> wrong and "does not have to" being too verbose.  Calling it
> VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps
> it adds more confusion.
> 
> Paolo

Looks like it does :)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  6:47                                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 71+ messages in thread
From: Michael S. Tsirkin @ 2012-09-10  6:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	virtualization

On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote:
> Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
> > On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
> >> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
> >>>> Almost.  One is "the guest, if really needed, can tell the host of
> >>>> pages".  If not negotiated, and the host does not support it, the host
> >>>> must break the guest (e.g. fail to offer any virtqueues).
> >>>
> >>> There is no way in spec to break the guest.
> >>> You can not fail to offer virtqueues.
> >>
> >> You can always return 0 for the first queue.
> > 
> > I don't think guest drivers recover gracefully from this.
> > Do they?
> 
> No, that's the point ("break the guest" is really "break the driver").

You can just stop VM then. No need for a side channel.

...

> >>>> The other is "the guest, though, would prefer not to do so".  It is
> >>>> different because the guest can proceed in a fallback mode even if the
> >>>> host doesn't offer it.
> >>>
> >>> I think I get what your proposed SILENT means what I do not get
> >>> is the motivation. It looks like a premature optimization to me.
> >>
> >> The motivation is to let the driver choose between two behaviors: the
> >> current one where ballooning is only done on request, and a more
> >> aggressive one.
> > 
> > Yes but why is being silent any good? Optimization?
> > Any data to show that it will help some workload?
> 
> Idle guests can move cache pages to the balloon.  You can overcommit
> more aggressively, because the host can madvise away a lot more memory.

IMHO this feature needs more thought. E.g. how will this work with assignment?
If we build something let's build it in a way that plays nicely
with other features.

> >>> OK so TELL says *when* to notify host, SILENT if set allows guest
> >>> to skip leak notifications? In this case TELL should just be ignored
> >>> when SILENT is set.
> >>
> >> Yeah, that was my first idea.  However, there are existing drivers that
> >> ignore SILENT, so that would not be 100% exact.
> > 
> > Not sure I follow the logic.
> > They don't ack SILENT so that would be 100% exact.
> 
> Hmm, then I'm not sure I follow yours.  We agreed that delaying
> notifications or skipping them is really the same thing, right?
> 
> I think we're just stuck in a linguistic problem, with "must not" being
> wrong and "does not have to" being too verbose.  Calling it
> VIRTIO_BALLOON_F_SILENT_DEFLATE was a workaround for this, but perhaps
> it adds more confusion.
> 
> Paolo

Looks like it does :)

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  6:47                                               ` Michael S. Tsirkin
@ 2012-09-10  6:55                                                 ` Paolo Bonzini
  -1 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  6:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, yvugenfi, kvm, linux-kernel, mikew, yinghan,
	virtualization

Il 10/09/2012 08:47, Michael S. Tsirkin ha scritto:
> On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote:
>> Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
>>> On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
>>>> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>>>>>> Almost.  One is "the guest, if really needed, can tell the host of
>>>>>> pages".  If not negotiated, and the host does not support it, the host
>>>>>> must break the guest (e.g. fail to offer any virtqueues).
>>>>>
>>>>> There is no way in spec to break the guest.
>>>>> You can not fail to offer virtqueues.
>>>>
>>>> You can always return 0 for the first queue.
>>>
>>> I don't think guest drivers recover gracefully from this.
>>> Do they?
>>
>> No, that's the point ("break the guest" is really "break the driver").
> 
> You can just stop VM then. No need for a side channel.

Keeping the VM running, just with no balloon driver is preferrable.

>>>>>> The other is "the guest, though, would prefer not to do so".  It is
>>>>>> different because the guest can proceed in a fallback mode even if the
>>>>>> host doesn't offer it.
>>>>>
>>>>> I think I get what your proposed SILENT means what I do not get
>>>>> is the motivation. It looks like a premature optimization to me.
>>>>
>>>> The motivation is to let the driver choose between two behaviors: the
>>>> current one where ballooning is only done on request, and a more
>>>> aggressive one.
>>>
>>> Yes but why is being silent any good? Optimization?
>>> Any data to show that it will help some workload?
>>
>> Idle guests can move cache pages to the balloon.  You can overcommit
>> more aggressively, because the host can madvise away a lot more memory.
> 
> IMHO this feature needs more thought. E.g. how will this work with assignment?

Revert to normal cooperative ballooning.

> If we build something let's build it in a way that plays nicely
> with other features.

Yes, that's the point of SILENT. :)

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-10  6:55                                                 ` Paolo Bonzini
  0 siblings, 0 replies; 71+ messages in thread
From: Paolo Bonzini @ 2012-09-10  6:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	virtualization

Il 10/09/2012 08:47, Michael S. Tsirkin ha scritto:
> On Mon, Sep 10, 2012 at 08:38:09AM +0200, Paolo Bonzini wrote:
>> Il 10/09/2012 08:03, Michael S. Tsirkin ha scritto:
>>> On Mon, Sep 10, 2012 at 07:50:13AM +0200, Paolo Bonzini wrote:
>>>> Il 09/09/2012 00:22, Michael S. Tsirkin ha scritto:
>>>>>> Almost.  One is "the guest, if really needed, can tell the host of
>>>>>> pages".  If not negotiated, and the host does not support it, the host
>>>>>> must break the guest (e.g. fail to offer any virtqueues).
>>>>>
>>>>> There is no way in spec to break the guest.
>>>>> You can not fail to offer virtqueues.
>>>>
>>>> You can always return 0 for the first queue.
>>>
>>> I don't think guest drivers recover gracefully from this.
>>> Do they?
>>
>> No, that's the point ("break the guest" is really "break the driver").
> 
> You can just stop VM then. No need for a side channel.

Keeping the VM running, just with no balloon driver is preferrable.

>>>>>> The other is "the guest, though, would prefer not to do so".  It is
>>>>>> different because the guest can proceed in a fallback mode even if the
>>>>>> host doesn't offer it.
>>>>>
>>>>> I think I get what your proposed SILENT means what I do not get
>>>>> is the motivation. It looks like a premature optimization to me.
>>>>
>>>> The motivation is to let the driver choose between two behaviors: the
>>>> current one where ballooning is only done on request, and a more
>>>> aggressive one.
>>>
>>> Yes but why is being silent any good? Optimization?
>>> Any data to show that it will help some workload?
>>
>> Idle guests can move cache pages to the balloon.  You can overcommit
>> more aggressively, because the host can madvise away a lot more memory.
> 
> IMHO this feature needs more thought. E.g. how will this work with assignment?

Revert to normal cooperative ballooning.

> If we build something let's build it in a way that plays nicely
> with other features.

Yes, that's the point of SILENT. :)

Paolo

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
  2012-09-10  5:51                             ` Michael S. Tsirkin
@ 2012-09-12  6:24                               ` Rusty Russell
  -1 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-12  6:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, fes, aarcange, riel, kvm, linux-kernel, mikew,
	yinghan, virtualization, yvugenfi, vrozenfe

"Michael S. Tsirkin" <mst@redhat.com> writes:
> We have several options:
> 1. No memory overcommit feature. This is always the case ATM!
> 2. Do not hot-plug assigned device.
> 3. Hot-unplug assigned device.
> 4. Some assigned devices can be able to handle memory errors
>    e.g. using ATS. Limit hotplug to these.

OK, let's leave it alone, and the first person to try to implement it in
a device can figure out how to fall back.

I replied on the other thread: I think a SILENT_DEFLATE option is
a future optimization.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works
@ 2012-09-12  6:24                               ` Rusty Russell
  0 siblings, 0 replies; 71+ messages in thread
From: Rusty Russell @ 2012-09-12  6:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: fes, aarcange, riel, kvm, yvugenfi, linux-kernel, mikew, yinghan,
	Paolo Bonzini, virtualization

"Michael S. Tsirkin" <mst@redhat.com> writes:
> We have several options:
> 1. No memory overcommit feature. This is always the case ATM!
> 2. Do not hot-plug assigned device.
> 3. Hot-unplug assigned device.
> 4. Some assigned devices can be able to handle memory errors
>    e.g. using ATS. Limit hotplug to these.

OK, let's leave it alone, and the first person to try to implement it in
a device can figure out how to fall back.

I replied on the other thread: I think a SILENT_DEFLATE option is
a future optimization.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2012-09-12  6:33 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-06  7:46 [PATCH] virtio-balloon spec: provide a version of the "silent deflate" feature that works Paolo Bonzini
2012-09-06  7:46 ` Paolo Bonzini
2012-09-06  8:47 ` Michael S. Tsirkin
2012-09-06  8:47   ` Michael S. Tsirkin
2012-09-06  9:24   ` Paolo Bonzini
2012-09-06  9:24     ` Paolo Bonzini
2012-09-06  9:44     ` Michael S. Tsirkin
2012-09-06  9:44       ` Michael S. Tsirkin
2012-09-06  9:57       ` Paolo Bonzini
2012-09-06  9:57         ` Paolo Bonzini
2012-09-06 10:53         ` Michael S. Tsirkin
2012-09-06 10:53           ` Michael S. Tsirkin
2012-09-06 12:13           ` Paolo Bonzini
2012-09-06 12:13             ` Paolo Bonzini
2012-09-06 12:51             ` Michael S. Tsirkin
2012-09-06 12:51               ` Michael S. Tsirkin
2012-09-06 13:12               ` Paolo Bonzini
2012-09-06 13:12                 ` Paolo Bonzini
2012-09-06 23:45             ` Rusty Russell
2012-09-07  5:42               ` Michael S. Tsirkin
2012-09-07  5:42                 ` Michael S. Tsirkin
2012-09-07  6:39                 ` Rusty Russell
2012-09-07  6:39                   ` Rusty Russell
2012-09-07  9:27                   ` Paolo Bonzini
2012-09-07  9:27                     ` Paolo Bonzini
2012-09-07 10:53                     ` Michael S. Tsirkin
2012-09-07 10:53                       ` Michael S. Tsirkin
2012-09-07 11:20                       ` Paolo Bonzini
2012-09-07 11:20                         ` Paolo Bonzini
2012-09-07 12:17                         ` Michael S. Tsirkin
2012-09-07 12:17                           ` Michael S. Tsirkin
2012-09-07 12:22                           ` Paolo Bonzini
2012-09-07 12:22                             ` Paolo Bonzini
2012-09-07 12:44                             ` Michael S. Tsirkin
2012-09-07 12:44                               ` Michael S. Tsirkin
2012-09-07 14:04                               ` Paolo Bonzini
2012-09-07 14:04                                 ` Paolo Bonzini
2012-09-07 14:25                                 ` Michael S. Tsirkin
2012-09-07 14:25                                   ` Michael S. Tsirkin
2012-09-07 14:44                                   ` Paolo Bonzini
2012-09-07 14:44                                     ` Paolo Bonzini
2012-09-08 22:22                                     ` Michael S. Tsirkin
2012-09-08 22:22                                       ` Michael S. Tsirkin
2012-09-10  5:50                                       ` Paolo Bonzini
2012-09-10  5:50                                       ` Paolo Bonzini
2012-09-10  6:03                                         ` Michael S. Tsirkin
2012-09-10  6:03                                           ` Michael S. Tsirkin
2012-09-10  6:38                                           ` Paolo Bonzini
2012-09-10  6:38                                             ` Paolo Bonzini
2012-09-10  6:47                                             ` Michael S. Tsirkin
2012-09-10  6:47                                               ` Michael S. Tsirkin
2012-09-10  6:55                                               ` Paolo Bonzini
2012-09-10  6:55                                                 ` Paolo Bonzini
2012-09-07 10:43                   ` Michael S. Tsirkin
2012-09-07 10:43                     ` Michael S. Tsirkin
2012-09-08  5:06                     ` Rusty Russell
2012-09-08  5:06                       ` Rusty Russell
2012-09-08 10:32                       ` Paolo Bonzini
2012-09-08 10:32                       ` Paolo Bonzini
2012-09-08 22:37                       ` Michael S. Tsirkin
2012-09-08 22:37                         ` Michael S. Tsirkin
2012-09-10  1:43                         ` Rusty Russell
2012-09-10  1:43                           ` Rusty Russell
2012-09-10  5:51                           ` Paolo Bonzini
2012-09-10  5:51                             ` Paolo Bonzini
2012-09-10  5:51                           ` Michael S. Tsirkin
2012-09-10  5:51                             ` Michael S. Tsirkin
2012-09-12  6:24                             ` Rusty Russell
2012-09-12  6:24                               ` Rusty Russell
2012-09-06 23:41 ` Michael S. Tsirkin
2012-09-06 23:41   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.