* [Xen-devel] PV Shim ballooning
@ 2020-02-11 13:39 Andrew Cooper
From: Andrew Cooper @ 2020-02-11 13:39 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Igor Druzhinin, Wei Liu, Jan Beulich, Roger Pau Monne

Ballooning inside PV shim is currently very broken.

From an instrumented Xen and 32bit PV XTF test:

(d3) (d3) --- Xen Test Framework ---
(d3) (d3) Ballooning: PV 32bit (PAE 3 levels)
(d3) (d3) mr { 0010a940, 1024, 0x7ff0 }
(d3) (d3) About to decrease
(d3) (XEN) *** D { ffff820080000020, nr 1020, done 0 }
(d3) (XEN) d3v0 failed to reserve 267 extents of order 0 for offlining
(d3) (XEN) *** D { ffff82007fffe040, nr 1024, done 1020 }
(d3) (XEN) d3v0 failed to reserve 1024 extents of order 0 for offlining
(d3) (d3) => got 1024

This test takes 1024 frames and calls decrease reservation on them
before unmapping, i.e. the decrease reservation should fail.  Shim
successfully offlines 753 pages (nothing to do with the frames the guest
selected), fails to offline 1291, and despite this returns success.

First of all, the "failed to reserve" message is in
pv_shim_offline_memory(), which is a void function with a semantically
relevant failure case.  This obviously isn't ok.

Second, the way the compat code loops over the translated data is
incompatible with how args.nr_done is used for the call into
pv_shim_offline_memory().

Why is pv_shim_offline_memory() not in decrease_reservation() to begin with?

Furthermore, there is a fundamental difference in ballooning behaviour
between PV and HVM guests, which I don't think we can compensate for.
PV guests need to both call decrease reservation to release the frames,
and unmap the frames (in either order).  For HVM guests, calling
decrease reservation automatically makes the frame unusable no matter
how many outstanding references exist.

Shim can't decrease reservation (HVM with L0 Xen) on any frame whose
reference count didn't drop to 0 from the PV guest's call, and there is
presently nothing to check this condition.

Short of a PGC bit and extra shim logic in free_domheap_page(), I can't
see any way to reconcile the behaviour, except to change the semantics
of decrease reservation for PV guests.  In practice, this would be far
more sensible behaviour, but we have no idea if existing PV guests would
manage.
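For the record, the PGC-bit variant would look roughly like this
user-space model.  PGC_shim_offline, the structure layout and the hook
placement are all invented for illustration, not existing Xen code:

```c
/* User-space model of the PGC-bit idea: tag a frame when the PV guest
 * calls decrease reservation, and only offline it to L0 once the last
 * reference is dropped in free_domheap_page().  All names here are
 * hypothetical. */
#include <assert.h>

#define PGC_shim_offline 0x1u

struct page_info {
    unsigned int count_info;   /* outstanding references (simplified) */
    unsigned int flags;
    int offlined;              /* model: frame handed back to L0 */
};

/* Guest called decrease reservation on this frame: just mark it. */
static void shim_decrease_reservation(struct page_info *pg)
{
    pg->flags |= PGC_shim_offline;
}

/* Called when the final reference is dropped. */
static void free_domheap_page(struct page_info *pg)
{
    assert(pg->count_info == 0);
    if (pg->flags & PGC_shim_offline)
        pg->offlined = 1;  /* real shim: XENMEM_decrease_reservation to L0 */
}

static void put_page(struct page_info *pg)
{
    if (--pg->count_info == 0)
        free_domheap_page(pg);
}
```

With this, a frame which is still mapped when the guest balloons it out
only reaches L0 after the final unmap, in whichever order the two happen.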

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [Xen-devel] PV Shim ballooning
From: Igor Druzhinin @ 2020-02-11 13:49 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: George Dunlap, Wei Liu, Jan Beulich, Roger Pau Monne

On 11/02/2020 13:39, Andrew Cooper wrote:
> Ballooning inside PV shim is currently very broken.
> 
> From an instrumented Xen and 32bit PV XTF test:
> 
> (d3) (d3) --- Xen Test Framework ---
> (d3) (d3) Ballooning: PV 32bit (PAE 3 levels)
> (d3) (d3) mr { 0010a940, 1024, 0x7ff0 }
> (d3) (d3) About to decrease
> (d3) (XEN) *** D { ffff820080000020, nr 1020, done 0 }
> (d3) (XEN) d3v0 failed to reserve 267 extents of order 0 for offlining
> (d3) (XEN) *** D { ffff82007fffe040, nr 1024, done 1020 }
> (d3) (XEN) d3v0 failed to reserve 1024 extents of order 0 for offlining
> (d3) (d3) => got 1024
> 
> This test takes 1024 frames and calls decrease reservation on them,
> before unmapping.  i.e. the decrease reservation should fail.  Shim
> successfully offlines 753 pages (nothing to do with the frames the guest
> selected), and fails to offline 1291, and despite this, returns success.
> 
> First of all, the "failed to reserve" is in pv_shim_offline_memory()
> which is a void function that has a semantically relevant failure case. 
> This obviously isn't ok.
> 
> Second, the way the compat code loops over the translated data is
> incompatible with how args.nr_done is used for the call into
> pv_shim_offline_memory().
> 
> Why is pv_shim_offline_memory() not in decrease_reservation() to begin with?

Could be moved, assuming it will just offline the frames that were already processed.

> Furthermore, there is a fundamental difference in ballooning behaviour
> between PV and HVM guests, which I don't think we can compensate for. 
> PV guests need to call decrease reservation once to release the frames,
> and unmap the frames (in any order).  HVM guests calling decrease
> reservation automatically make the frame unusable no matter how many
> outstanding references exist.
> 
> Shim can't decrease reservation (HVM with L0 Xen) on any frame who's
> reference count didn't drop to 0 from the PV guests' call, and there is
> nothing presently to check this condition.

It allocates the pages from the allocator - so yes, it checks that the
reference count has dropped to 0.

Igor


* Re: [Xen-devel] PV Shim ballooning
From: Roger Pau Monné @ 2020-02-11 16:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, xen-devel, Wei Liu, Jan Beulich, Igor Druzhinin

On Tue, Feb 11, 2020 at 01:39:42PM +0000, Andrew Cooper wrote:
> Ballooning inside PV shim is currently very broken.
> 
> From an instrumented Xen and 32bit PV XTF test:
> 
> (d3) (d3) --- Xen Test Framework ---
> (d3) (d3) Ballooning: PV 32bit (PAE 3 levels)
> (d3) (d3) mr { 0010a940, 1024, 0x7ff0 }
> (d3) (d3) About to decrease
> (d3) (XEN) *** D { ffff820080000020, nr 1020, done 0 }
> (d3) (XEN) d3v0 failed to reserve 267 extents of order 0 for offlining
> (d3) (XEN) *** D { ffff82007fffe040, nr 1024, done 1020 }
> (d3) (XEN) d3v0 failed to reserve 1024 extents of order 0 for offlining
> (d3) (d3) => got 1024
> 
> This test takes 1024 frames and calls decrease reservation on them,
> before unmapping.  i.e. the decrease reservation should fail.  Shim
> successfully offlines 753 pages (nothing to do with the frames the guest
> selected), and fails to offline 1291, and despite this, returns success.
> 
> First of all, the "failed to reserve" is in pv_shim_offline_memory()
> which is a void function that has a semantically relevant failure case. 
> This obviously isn't ok.

So on failure to reserve the pages for offlining we should likely add
them back to the domU and return the number of pages that have been
fully offlined?

Not sure if that's doable, but I think by poking at the extents list
Xen should be able to repopulate the entries.

> 
> Second, the way the compat code loops over the translated data is
> incompatible with how args.nr_done is used for the call into
> pv_shim_offline_memory().

Oh, I would have to check that; I tend to get lost in compat code.  The
code in pv_shim_offline_memory() assumes that args.nr_done will contain
the total number of successfully ballooned-out pages.

> Why is pv_shim_offline_memory() not in decrease_reservation() to begin with?

I guess to try to batch the decrease into a single call to
batch_memory_op, and to keep the symmetry with the call to
pv_shim_online_memory.

But most of this was done in a hurry, so it's likely just there because
that was the first place that seemed sensible enough.

> Furthermore, there is a fundamental difference in ballooning behaviour
> between PV and HVM guests, which I don't think we can compensate for. 
> PV guests need to call decrease reservation once to release the frames,
> and unmap the frames (in any order).  HVM guests calling decrease
> reservation automatically make the frame unusable no matter how many
> outstanding references exist.

Ouch, so you can call XENMEM_decrease_reservation and then unmap the
pages from the guest page-tables and they will be ballooned out?

TBH I had no idea this was possible; I've mostly assumed a model
similar to HVM, where you call decrease_reservation and the pages are
just removed from the physmap.

> Shim can't decrease reservation (HVM with L0 Xen) on any frame who's
> reference count didn't drop to 0 from the PV guests' call, and there is
> nothing presently to check this condition.

But shim will only balloon out free domheap pages (as it gets them
from alloc_domheap_pages), and those shouldn't have any reference by
the guest?

> Short of a PGC bit and extra shim logic in free_domheap_page(), I can't
> see any way to reconcile the behaviour, except to change the semantics
> of decrease reservation for PV guests.  In practice, this would be far
> more sensible behaviour, but we have no idea if existing PV guests would
> manage.

Hm, I guess we could add some hook to free_domheap_page in order to
remove them from the physmap once the guest frees them?

How does Xen know which pages freed by a PV guest should be ballooned
out?

Is that done solely based on the fact that those pages don't have any
reference?

That doesn't seem like a viable option unless we add a new bit to the
page struct in order to signal that those pages should be ballooned
out once freed, as you suggest.

Roger.


* Re: [Xen-devel] PV Shim ballooning
From: Igor Druzhinin @ 2020-02-11 16:29 UTC (permalink / raw)
  To: Roger Pau Monné, Andrew Cooper
  Cc: George Dunlap, xen-devel, Wei Liu, Jan Beulich

On 11/02/2020 16:01, Roger Pau Monné wrote:
> On Tue, Feb 11, 2020 at 01:39:42PM +0000, Andrew Cooper wrote:
>> Shim can't decrease reservation (HVM with L0 Xen) on any frame who's
>> reference count didn't drop to 0 from the PV guests' call, and there is
>> nothing presently to check this condition.
> 
> But shim will only balloon out free domheap pages (as it gets them
> from alloc_domheap_pages), and those shouldn't have any reference by
> the guest?

Correct; however, all the guests that we test in XenRT behave properly.
I'm not aware of any guest that keeps references after calling
decrease_reservation().

>> Short of a PGC bit and extra shim logic in free_domheap_page(), I can't
>> see any way to reconcile the behaviour, except to change the semantics
>> of decrease reservation for PV guests.  In practice, this would be far
>> more sensible behaviour, but we have no idea if existing PV guests would
>> manage.
> 
> Hm, I guess we could add some hook to free_domheap_page in order to
> remove them from the physmap once the guest frees them?
>
> How does Xen know which pages freed by a PV guest should be ballooned
> out?

It doesn't currently.

> Is that done solely based on the fact that those pages don't have any
> reference?

Yes.

> That doesn't seem like a viable option unless we add a new bit to the
> page struct in order to signal that those pages should be ballooned
> out once freed, as you suggest.

Agree. But as I said I'm not aware of any guest that violates the
invariant of decrease_reservation() being the last call.

Igor


* Re: [Xen-devel] PV Shim ballooning
From: Roger Pau Monné @ 2020-02-11 16:42 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On Tue, Feb 11, 2020 at 04:29:36PM +0000, Igor Druzhinin wrote:
> On 11/02/2020 16:01, Roger Pau Monné wrote:
> > On Tue, Feb 11, 2020 at 01:39:42PM +0000, Andrew Cooper wrote:
> >> Shim can't decrease reservation (HVM with L0 Xen) on any frame who's
> >> reference count didn't drop to 0 from the PV guests' call, and there is
> >> nothing presently to check this condition.
> > 
> > But shim will only balloon out free domheap pages (as it gets them
> > from alloc_domheap_pages), and those shouldn't have any reference by
> > the guest?
> 
> Correct, however all the guests that we test in XenRT behave properly.
> I'm not aware of any guest that keeps references after calling
> decrease_reservation().
> 
> >> Short of a PGC bit and extra shim logic in free_domheap_page(), I can't
> >> see any way to reconcile the behaviour, except to change the semantics
> >> of decrease reservation for PV guests.  In practice, this would be far
> >> more sensible behaviour, but we have no idea if existing PV guests would
> >> manage.
> > 
> > Hm, I guess we could add some hook to free_domheap_page in order to
> > remove them from the physmap once the guest frees them?
> >
> > How does Xen know which pages freed by a PV guest should be ballooned
> > out?
> 
> It doesn't currently.

Well, not when running on the shim, but I guess when running as a
classic PV guest the reservation for the guest will be lowered (so
that after the call to decrease_reservation the guest will have an
overcommit of memory), and pages will be removed from the domheap as
references are dropped.

> 
> > Is that done solely based on the fact that those pages don't have any
> > reference?
> 
> Yes.
> 
> > That doesn't seem like a viable option unless we add a new bit to the
> > page struct in order to signal that those pages should be ballooned
> > out once freed, as you suggest.
> 
> Agree. But as I said I'm not aware of any guest that violates the
> invariant of decrease_reservation() being the last call.

Maybe we could piggyback on a page being removed from the domain's
domheap, and use that as the signal that the page should be ballooned
out?

There's already an arch_free_heap_page() that's called when a page is
removed from a domain, which might be suitable for this.  It would
however imply making a hypercall for every page to be ballooned out.

Thanks, Roger.


* Re: [Xen-devel] PV Shim ballooning
From: Igor Druzhinin @ 2020-02-11 17:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On 11/02/2020 16:42, Roger Pau Monné wrote:
> On Tue, Feb 11, 2020 at 04:29:36PM +0000, Igor Druzhinin wrote:
>> Agree. But as I said I'm not aware of any guest that violates the
>> invariant of decrease_reservation() being the last call.
> 
> Maybe we could piggyback on whether a page is removed from the domain
> domheap and use that as a signal that the page should be ballooned
> out?
> 
> There's already an arch_free_heap_page that's called when a page is
> removed from a domain, which might be suitable for this. It would
> however imply making an hypercall for every page to be ballooned out.

I tested that - it doesn't work: too many hypercalls make ballooning
take ages.  This simply cannot be done on a page-by-page basis.

Igor


* Re: [Xen-devel] PV Shim ballooning
From: Roger Pau Monné @ 2020-02-11 18:11 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On Tue, Feb 11, 2020 at 05:29:09PM +0000, Igor Druzhinin wrote:
> On 11/02/2020 16:42, Roger Pau Monné wrote:
> > On Tue, Feb 11, 2020 at 04:29:36PM +0000, Igor Druzhinin wrote:
> >> Agree. But as I said I'm not aware of any guest that violates the
> >> invariant of decrease_reservation() being the last call.
> > 
> > Maybe we could piggyback on whether a page is removed from the domain
> > domheap and use that as a signal that the page should be ballooned
> > out?
> > 
> > There's already an arch_free_heap_page that's called when a page is
> > removed from a domain, which might be suitable for this. It would
> > however imply making an hypercall for every page to be ballooned out.
> 
> I tested that - doesn't work - too many hypercalls make ballooning take
> ages. This simply cannot be done on page-by-page basis.

Why not place them on a list (in arch_free_heap_page) and do the flush
either after a timeout or when the list reaches a certain number of
elements?
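Something like this user-space model of the batching, where BATCH_SIZE
and all the names are illustrative, not existing code:

```c
/* Model of batched offlining: arch_free_heap_page() queues each freed
 * frame, and one (expensive) call to L0 is issued per full batch
 * instead of per page.  BATCH_SIZE is an arbitrary illustrative value. */
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 128

static unsigned long batch[BATCH_SIZE];
static size_t batch_len;
static size_t hypercalls;       /* model: L0 hypercalls issued */
static size_t pages_offlined;

static void flush_batch(void)
{
    if (!batch_len)
        return;
    hypercalls++;   /* real shim: one XENMEM_decrease_reservation per batch */
    pages_offlined += batch_len;
    batch_len = 0;
}

/* Hooked from arch_free_heap_page() in this scheme. */
static void queue_for_offline(unsigned long mfn)
{
    batch[batch_len++] = mfn;
    if (batch_len == BATCH_SIZE)
        flush_batch();  /* a timeout would flush any partial batch */
}
```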

Roger.


* Re: [Xen-devel] PV Shim ballooning
From: Igor Druzhinin @ 2020-02-11 18:39 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On 11/02/2020 18:11, Roger Pau Monné wrote:
> On Tue, Feb 11, 2020 at 05:29:09PM +0000, Igor Druzhinin wrote:
>> On 11/02/2020 16:42, Roger Pau Monné wrote:
>>> On Tue, Feb 11, 2020 at 04:29:36PM +0000, Igor Druzhinin wrote:
>>>> Agree. But as I said I'm not aware of any guest that violates the
>>>> invariant of decrease_reservation() being the last call.
>>>
>>> Maybe we could piggyback on whether a page is removed from the domain
>>> domheap and use that as a signal that the page should be ballooned
>>> out?
>>>
>>> There's already an arch_free_heap_page that's called when a page is
>>> removed from a domain, which might be suitable for this. It would
>>> however imply making an hypercall for every page to be ballooned out.
>>
>> I tested that - doesn't work - too many hypercalls make ballooning take
>> ages. This simply cannot be done on page-by-page basis.
> 
> Why not place them on a list (in arch_free_heap_page) and do the flush
> either after a timeout or when it gets to a certain number of
> elements?

How do you know that "certain number of elements"? How do you know what
timeout is just right? This all seems like it will introduce more
problems than it solves.

What if instead we place them on a temp list in L1's
decrease_reservation(), then just move them to another list as soon as
they are freed, and pass the whole list to L0 as soon as the first list
is empty?
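As a rough user-space model of that two-list scheme, with all names
invented for illustration:

```c
/* Two-list model: decrease_reservation() puts each frame on a "pending"
 * list; as the last reference to a frame is dropped it moves to a
 * "ready" list; once pending is empty, the whole ready list goes to L0
 * in a single hypercall. */
#include <assert.h>
#include <stddef.h>

#define NR_FRAMES 8

static int pending[NR_FRAMES];          /* frame still has references */
static unsigned long ready[NR_FRAMES];  /* frames fully freed */
static size_t nr_pending, nr_ready;
static size_t hypercalls, pages_offlined;

/* L1 guest ballooned this frame out. */
static void decrease_reservation(unsigned long mfn)
{
    pending[mfn] = 1;
    nr_pending++;
}

static void maybe_flush(void)
{
    if (nr_pending == 0 && nr_ready) {
        hypercalls++;           /* one batched call to L0 for the list */
        pages_offlined += nr_ready;
        nr_ready = 0;
    }
}

/* Last reference to mfn dropped (e.g. final unmap). */
static void page_freed(unsigned long mfn)
{
    if (pending[mfn]) {
        pending[mfn] = 0;
        nr_pending--;
        ready[nr_ready++] = mfn;
        maybe_flush();
    }
}
```

The batch size then falls out naturally from the guest's own balloon
request, rather than from a tunable threshold or timeout.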

Igor

