All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Futher work after XSA-150
@ 2015-10-30 18:33 Andrew Cooper
  2015-10-30 18:33 ` [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Andrew Cooper @ 2015-10-30 18:33 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

These 4 patches were developed during the analysis and fixing of XSA-150

Andrew Cooper (4):
  x86/PoD: Make p2m_pod_empty_cache() restartable
  x86/PoD: Identify when a domain has already been killed from PoD
    exhaustion
  x86/mm: Return -ESRCH for an invalid foreign domid
  x86/PoD: Command line option to prohibit any PoD operations

 docs/misc/xen-command-line.markdown |  8 ++++++++
 xen/arch/x86/hvm/hvm.c              |  3 +++
 xen/arch/x86/mm.c                   |  8 +++++++-
 xen/arch/x86/mm/p2m-pod.c           | 19 +++++++++++++------
 xen/arch/x86/mm/paging.c            |  2 +-
 xen/common/memory.c                 |  4 ++++
 xen/include/asm-x86/hvm/hvm.h       |  2 ++
 xen/include/asm-x86/p2m.h           |  4 +++-
 8 files changed, 41 insertions(+), 9 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable
  2015-10-30 18:33 [PATCH 0/4] Futher work after XSA-150 Andrew Cooper
@ 2015-10-30 18:33 ` Andrew Cooper
  2015-11-02 12:17   ` George Dunlap
  2015-10-30 18:33 ` [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2015-10-30 18:33 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

This avoids a long running operation when destroying a domain with a
large PoD cache.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/arch/x86/mm/p2m-pod.c | 16 +++++++++++-----
 xen/arch/x86/mm/paging.c  |  2 +-
 xen/include/asm-x86/p2m.h |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 2a6f125..be15cf3 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -374,11 +374,11 @@ p2m_pod_set_mem_target(struct domain *d, unsigned long target)
     return ret;
 }
 
-void
-p2m_pod_empty_cache(struct domain *d)
+int p2m_pod_empty_cache(struct domain *d)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     struct page_info *page;
+    unsigned int i;
 
     /* After this barrier no new PoD activities can happen. */
     BUG_ON(!d->is_dying);
@@ -388,8 +388,6 @@ p2m_pod_empty_cache(struct domain *d)
 
     while ( (page = page_list_remove_head(&p2m->pod.super)) )
     {
-        int i;
-            
         for ( i = 0 ; i < SUPERPAGE_PAGES ; i++ )
         {
             BUG_ON(page_get_owner(page + i) != d);
@@ -397,19 +395,27 @@ p2m_pod_empty_cache(struct domain *d)
         }
 
         p2m->pod.count -= SUPERPAGE_PAGES;
+
+        if ( hypercall_preempt_check() )
+            goto out;
     }
 
-    while ( (page = page_list_remove_head(&p2m->pod.single)) )
+    for ( i = 0; (page = page_list_remove_head(&p2m->pod.single)); ++i )
     {
         BUG_ON(page_get_owner(page) != d);
         page_list_add_tail(page, &d->page_list);
 
         p2m->pod.count -= 1;
+
+        if ( i && !(i & 511) && hypercall_preempt_check() )
+            goto out;
     }
 
     BUG_ON(p2m->pod.count != 0);
 
+ out:
     unlock_page_alloc(p2m);
+    return p2m->pod.count ? -ERESTART : 0;
 }
 
 int
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 0bfb2be..c7d1943 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -824,7 +824,7 @@ int paging_teardown(struct domain *d)
         return rc;
 
     /* Move populate-on-demand cache back to domain_list for destruction */
-    p2m_pod_empty_cache(d);
+    rc = p2m_pod_empty_cache(d);
 
     return rc;
 }
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b710b2c..d748557 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -588,7 +588,7 @@ void p2m_pod_dump_data(struct domain *d);
 
 /* Move all pages from the populate-on-demand cache to the domain page_list
  * (usually in preparation for domain destruction) */
-void p2m_pod_empty_cache(struct domain *d);
+int p2m_pod_empty_cache(struct domain *d);
 
 /* Set populate-on-demand cache size so that the total memory allocated to a
  * domain matches target */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion
  2015-10-30 18:33 [PATCH 0/4] Futher work after XSA-150 Andrew Cooper
  2015-10-30 18:33 ` [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable Andrew Cooper
@ 2015-10-30 18:33 ` Andrew Cooper
  2015-11-02 14:07   ` George Dunlap
  2015-10-30 18:33 ` [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid Andrew Cooper
  2015-10-30 18:33 ` [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations Andrew Cooper
  3 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2015-10-30 18:33 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

p2m_pod_demand_populate() can be entered repeatedly during a single path
through the hypervisor, e.g. on a toolstack batch map operation.

The domain might be crashed, but the interface currently lacks a way of
passing an error back through the generic p2m layer.

Longterm the p2m layer needs reworking to allow errors to be returned, but in
the short term, avoid repeatedly re-sweeping the domain after it has already
been crashed from PoD exhaustion.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/arch/x86/mm/p2m-pod.c | 3 ++-
 xen/include/asm-x86/p2m.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index be15cf3..6fb054f 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
     /* This check is done with the pod lock held.  This will make sure that
      * even if d->is_dying changes under our feet, p2m_pod_empty_cache() 
      * won't start until we're done. */
-    if ( unlikely(d->is_dying) )
+    if ( unlikely(d->is_dying) || p2m->pod.dead )
         goto out_fail;
 
     
@@ -1129,6 +1129,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
     pod_unlock(p2m);
     return 0;
 out_of_memory:
+    p2m->pod.dead = 1;
     pod_unlock(p2m);
 
     printk("%s: Dom%d out of PoD memory! (tot=%"PRIu32" ents=%ld dom%d)\n",
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index d748557..6faf9d6 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -308,6 +308,8 @@ struct p2m_domain {
         } mrp;
         mm_lock_t        lock;         /* Locking of private pod structs,   *
                                         * not relying on the p2m lock.      */
+        bool_t           dead;         /* Emergency sweep has failed and    *
+                                        * the domain has been crashed.      */
     } pod;
     union {
         struct ept_data ept;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid
  2015-10-30 18:33 [PATCH 0/4] Futher work after XSA-150 Andrew Cooper
  2015-10-30 18:33 ` [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable Andrew Cooper
  2015-10-30 18:33 ` [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion Andrew Cooper
@ 2015-10-30 18:33 ` Andrew Cooper
  2015-11-02 14:10   ` George Dunlap
  2015-10-30 18:33 ` [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations Andrew Cooper
  3 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2015-10-30 18:33 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

For consistency with all other invalid domid handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>

A positive effect this will have is causing Linux to break early from
batched foreign mappings, rather than attempting to complete the
mappings, recording -EINVAL failures for each map request.
---
 xen/arch/x86/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b65c3a5..92df36f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3544,7 +3544,7 @@ long do_mmu_update(
     {
         /* Pagetables belong to a foreign domain (PFD). */
         if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -EINVAL;
+            return -ESRCH;
 
         if ( pt_owner == d )
             rcu_unlock_domain(pt_owner);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-10-30 18:33 [PATCH 0/4] Futher work after XSA-150 Andrew Cooper
                   ` (2 preceding siblings ...)
  2015-10-30 18:33 ` [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid Andrew Cooper
@ 2015-10-30 18:33 ` Andrew Cooper
  2015-11-02 13:53   ` Jan Beulich
  2015-11-02 15:19   ` George Dunlap
  3 siblings, 2 replies; 16+ messages in thread
From: Andrew Cooper @ 2015-10-30 18:33 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

PoD is only needed to cover a corner case with memory overcommit
(rebooting a ballooned down VM with insufficient host RAM available).

Its use comes with a performance hit, and inoperability with other
technologies such as PCI Passthrough.

Offer a command line option for administrators who want to be certain
that PoD is not in use for their VMs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
---
 docs/misc/xen-command-line.markdown | 8 ++++++++
 xen/arch/x86/hvm/hvm.c              | 3 +++
 xen/arch/x86/mm.c                   | 6 ++++++
 xen/common/memory.c                 | 4 ++++
 xen/include/asm-x86/hvm/hvm.h       | 2 ++
 5 files changed, 23 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 416e559..aaf8a4f 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1173,6 +1173,14 @@ This option can be specified more than once (up to 8 times at present).
 ### ple\_window
 > `= <integer>`
 
+### pod
+> `= <boolean>`
+
+Default: `true`
+
+Permit or deny the use of Populate on Demand with x86 HVM guests.  If
+disabled, attempts to create VMs with `memory < maxmem` will fail.
+
 ### psr (Intel)
 > `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )`
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 21f42a7..cf895a7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -92,6 +92,9 @@ unsigned long __section(".bss.page_aligned")
 static bool_t __initdata opt_hap_enabled = 1;
 boolean_param("hap", opt_hap_enabled);
 
+bool_t opt_pod_enabled = 1;
+boolean_param("pod", opt_pod_enabled);
+
 #ifndef opt_hvm_fep
 bool_t opt_hvm_fep;
 boolean_param("hvm_fep", opt_hvm_fep);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 92df36f..90270ba 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4929,6 +4929,12 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         if ( cmd == XENMEM_set_pod_target )
         {
+            if ( unlikely(!opt_pod_enabled) )
+            {
+                rc = -EOPNOTSUPP;
+                goto pod_target_out_unlock;
+            }
+
             if ( target.target_pages > d->max_pages )
             {
                 rc = -EINVAL;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index a3bffb7..f5ed66e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -818,6 +818,10 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( unlikely(start_extent >= reservation.nr_extents) )
             return start_extent;
 
+        if ( unlikely(!opt_pod_enabled) &&
+             (reservation.mem_flags & XENMEMF_populate_on_demand) )
+            return start_extent;
+
         d = rcu_lock_domain_by_any_id(reservation.domid);
         if ( d == NULL )
             return start_extent;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 0cf7da1..0341ba6 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -555,6 +555,8 @@ extern bool_t opt_hvm_fep;
 #define opt_hvm_fep 0
 #endif
 
+extern bool_t opt_pod_enabled;
+
 /* updates the current hardware p2m */
 void altp2m_vcpu_update_p2m(struct vcpu *v);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable
  2015-10-30 18:33 ` [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable Andrew Cooper
@ 2015-11-02 12:17   ` George Dunlap
  0 siblings, 0 replies; 16+ messages in thread
From: George Dunlap @ 2015-11-02 12:17 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: George Dunlap, Jan Beulich

On 30/10/15 18:33, Andrew Cooper wrote:
> This avoids a long running operation when destroying a domain with a
> large PoD cache.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-10-30 18:33 ` [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations Andrew Cooper
@ 2015-11-02 13:53   ` Jan Beulich
  2015-11-02 14:32     ` Ian Campbell
  2015-11-02 15:19   ` George Dunlap
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2015-11-02 13:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel

>>> On 30.10.15 at 19:33, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -92,6 +92,9 @@ unsigned long __section(".bss.page_aligned")
>  static bool_t __initdata opt_hap_enabled = 1;
>  boolean_param("hap", opt_hap_enabled);
>  
> +bool_t opt_pod_enabled = 1;

__read_mostly?

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -818,6 +818,10 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( unlikely(start_extent >= reservation.nr_extents) )
>              return start_extent;
>  
> +        if ( unlikely(!opt_pod_enabled) &&
> +             (reservation.mem_flags & XENMEMF_populate_on_demand) )
> +            return start_extent;

A few lines down we can see that XENMEMF_populate_on_demand
gets honored only for XENMEM_populate_physmap. Perhaps you
shouldn't fail the other two which ignore the flag anyway? And
perhaps you should also fold this into the existing check?

Also I don't think this is going to build on ARM.

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion
  2015-10-30 18:33 ` [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion Andrew Cooper
@ 2015-11-02 14:07   ` George Dunlap
  2015-11-02 14:32     ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: George Dunlap @ 2015-11-02 14:07 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: George Dunlap, Jan Beulich

On 30/10/15 18:33, Andrew Cooper wrote:
> p2m_pod_demand_populate() can be entered repeatedly during a single path
> through the hypervisor, e.g. on a toolstack batch map operation.
> 
> The domain might be crashed, but the interface currently lacks a way of
> passing an error back through the generic p2m layer.
> 
> Longterm the p2m layer needs reworking to allow errors to be returned, but in
> the short term, avoid repeatedly re-sweeping the domain after it has already
> been crashed from PoD exhaustion.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>  xen/arch/x86/mm/p2m-pod.c | 3 ++-
>  xen/include/asm-x86/p2m.h | 2 ++
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
> index be15cf3..6fb054f 100644
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
>      /* This check is done with the pod lock held.  This will make sure that
>       * even if d->is_dying changes under our feet, p2m_pod_empty_cache() 
>       * won't start until we're done. */
> -    if ( unlikely(d->is_dying) )
> +    if ( unlikely(d->is_dying) || p2m->pod.dead )

So after getting lost in a maze of twisty passages, it looks like
"d->is_dying" might be the wrong thing to check here.  d->is_dying is
*only* set, AFAICT, in two places:
 - in domain_kill(), which is only called for XEN_DOMCTL_destroydomain
 - in domain_create(), if the creation failed for some reason.

Would it make more sense to check d->is_shutting_down instead?

Having some sort of pod-specific flag seems like the wrong solution.

 -George

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid
  2015-10-30 18:33 ` [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid Andrew Cooper
@ 2015-11-02 14:10   ` George Dunlap
  0 siblings, 0 replies; 16+ messages in thread
From: George Dunlap @ 2015-11-02 14:10 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: George Dunlap, Jan Beulich

On 30/10/15 18:33, Andrew Cooper wrote:
> For consistency with all other invalid domid handling.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-11-02 13:53   ` Jan Beulich
@ 2015-11-02 14:32     ` Ian Campbell
  2015-11-02 14:37       ` Andrew Cooper
  2015-11-02 14:38       ` Jan Beulich
  0 siblings, 2 replies; 16+ messages in thread
From: Ian Campbell @ 2015-11-02 14:32 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: George Dunlap, Xen-devel

On Mon, 2015-11-02 at 06:53 -0700, Jan Beulich wrote:
> > > > On 30.10.15 at 19:33, <andrew.cooper3@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -92,6 +92,9 @@ unsigned long __section(".bss.page_aligned")
> >  static bool_t __initdata opt_hap_enabled = 1;
> >  boolean_param("hap", opt_hap_enabled);
> >  
> > +bool_t opt_pod_enabled = 1;
> 
> __read_mostly?
> 
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -818,6 +818,10 @@ long do_memory_op(unsigned long cmd,
> > XEN_GUEST_HANDLE_PARAM(void) arg)
> >          if ( unlikely(start_extent >= reservation.nr_extents) )
> >              return start_extent;
> >  
> > +        if ( unlikely(!opt_pod_enabled) &&
> > +             (reservation.mem_flags & XENMEMF_populate_on_demand) )
> > +            return start_extent;
> 
> A few lines down we can see that XENMEMF_populate_on_demand
> gets honored only for XENMEM_populate_physmap. Perhaps you
> shouldn't fail the other two which ignore the flag anyway?

Setting an unexpected flag surely ought to be an error? Admittedly that
particular ship may have sailed WRT this public ABI.

>  And
> perhaps you should also fold this into the existing check?
> 
> Also I don't think this is going to build on ARM.

Because opt_pod_enabled is on x86 only, I suppose.

I wouldn't be averse to a "const bool_t opt_pod_enabled = 0" in some
convenient place, seems better than the alternative of ifdefs around here.

Ian.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion
  2015-11-02 14:07   ` George Dunlap
@ 2015-11-02 14:32     ` Andrew Cooper
  2015-11-23 14:50       ` Jan Beulich
  2015-11-24 16:51       ` George Dunlap
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Cooper @ 2015-11-02 14:32 UTC (permalink / raw)
  To: George Dunlap, Xen-devel; +Cc: George Dunlap, Jan Beulich

On 02/11/15 14:07, George Dunlap wrote:
> On 30/10/15 18:33, Andrew Cooper wrote:
>> p2m_pod_demand_populate() can be entered repeatedly during a single path
>> through the hypervisor, e.g. on a toolstack batch map operation.
>>
>> The domain might be crashed, but the interface currently lacks a way of
>> passing an error back through the generic p2m layer.
>>
>> Longterm the p2m layer needs reworking to allow errors to be returned, but in
>> the short term, avoid repeatedly re-sweeping the domain after it has already
>> been crashed from PoD exhaustion.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>> ---
>>  xen/arch/x86/mm/p2m-pod.c | 3 ++-
>>  xen/include/asm-x86/p2m.h | 2 ++
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
>> index be15cf3..6fb054f 100644
>> --- a/xen/arch/x86/mm/p2m-pod.c
>> +++ b/xen/arch/x86/mm/p2m-pod.c
>> @@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
>>      /* This check is done with the pod lock held.  This will make sure that
>>       * even if d->is_dying changes under our feet, p2m_pod_empty_cache() 
>>       * won't start until we're done. */
>> -    if ( unlikely(d->is_dying) )
>> +    if ( unlikely(d->is_dying) || p2m->pod.dead )
> So after getting lost in a maze of twisty passages, it looks like
> "d->is_dying" might be the wrong thing to check here.  d->is_dying is
> *only* set, AFAICT, in two places:
>  - in domain_kill(), which is only called for XEN_DOMCTL_destroydomain
>  - in domain_create(), if the creation failed for some reason.

d->is_dying indicates that the domains resources are starting to be torn
down.  It is the correct check here.

>
> Would it make more sense to check d->is_shutting_down instead?

A domain resume hypercall can clear d->is_shutting_down, making it an
inappropriate check.

>
> Having some sort of pod-specific flag seems like the wrong solution.

All of this patch is the wrong solution, but is necessary until the
generic p2m interface can properly represent errors.

~Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-11-02 14:32     ` Ian Campbell
@ 2015-11-02 14:37       ` Andrew Cooper
  2015-11-02 14:38       ` Jan Beulich
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2015-11-02 14:37 UTC (permalink / raw)
  To: Ian Campbell, Jan Beulich; +Cc: George Dunlap, Xen-devel

On 02/11/15 14:32, Ian Campbell wrote:
> On Mon, 2015-11-02 at 06:53 -0700, Jan Beulich wrote:
>>>>> On 30.10.15 at 19:33, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -92,6 +92,9 @@ unsigned long __section(".bss.page_aligned")
>>>  static bool_t __initdata opt_hap_enabled = 1;
>>>  boolean_param("hap", opt_hap_enabled);
>>>  
>>> +bool_t opt_pod_enabled = 1;
>> __read_mostly?
>>
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -818,6 +818,10 @@ long do_memory_op(unsigned long cmd,
>>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>>          if ( unlikely(start_extent >= reservation.nr_extents) )
>>>              return start_extent;
>>>  
>>> +        if ( unlikely(!opt_pod_enabled) &&
>>> +             (reservation.mem_flags & XENMEMF_populate_on_demand) )
>>> +            return start_extent;
>> A few lines down we can see that XENMEMF_populate_on_demand
>> gets honored only for XENMEM_populate_physmap. Perhaps you
>> shouldn't fail the other two which ignore the flag anyway?
> Setting an unexpected flag surely ought to be an error? Admittedly that
> particular ship may have sailed WRT this public ABI.
>
>>  And
>> perhaps you should also fold this into the existing check?
>>
>> Also I don't think this is going to build on ARM.

Oops - I forgot to check for that, but you are correct.

> Because opt_pod_enabled is on x86 only, I suppose.
>
> I wouldn't be averse to a "const bool_t opt_pod_enabled = 0" in some
> convenient place, seems better than the alternative of ifdefs around here.

I will see what I can do.

~Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-11-02 14:32     ` Ian Campbell
  2015-11-02 14:37       ` Andrew Cooper
@ 2015-11-02 14:38       ` Jan Beulich
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2015-11-02 14:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 02.11.15 at 15:32, <ian.campbell@citrix.com> wrote:
> On Mon, 2015-11-02 at 06:53 -0700, Jan Beulich wrote:
>> > > > On 30.10.15 at 19:33, <andrew.cooper3@citrix.com> wrote:
>> > --- a/xen/common/memory.c
>> > +++ b/xen/common/memory.c
>> > @@ -818,6 +818,10 @@ long do_memory_op(unsigned long cmd,
>> > XEN_GUEST_HANDLE_PARAM(void) arg)
>> >          if ( unlikely(start_extent >= reservation.nr_extents) )
>> >              return start_extent;
>> >  
>> > +        if ( unlikely(!opt_pod_enabled) &&
>> > +             (reservation.mem_flags & XENMEMF_populate_on_demand) )
>> > +            return start_extent;
>> 
>> A few lines down we can see that XENMEMF_populate_on_demand
>> gets honored only for XENMEM_populate_physmap. Perhaps you
>> shouldn't fail the other two which ignore the flag anyway?
> 
> Setting an unexpected flag surely ought to be an error? Admittedly that
> particular ship may have sailed WRT this public ABI.

Without that latter aspect I would certainly answer the question
with "Yes".

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations
  2015-10-30 18:33 ` [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations Andrew Cooper
  2015-11-02 13:53   ` Jan Beulich
@ 2015-11-02 15:19   ` George Dunlap
  1 sibling, 0 replies; 16+ messages in thread
From: George Dunlap @ 2015-11-02 15:19 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: George Dunlap, Jan Beulich

On 30/10/15 18:33, Andrew Cooper wrote:
> PoD is only needed to cover a corner case with memory overcommit
> (rebooting a ballooned down VM with insufficient host RAM available).
> 
> Its use comes with a performance hit, and inoperability with other
> technologies such as PCI Passthrough.
> 
> Offer a command line option for administrators who want to be certain
> that PoD is not in use for their VMs.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

With the arm build addressed:

Acked-by: George Dunlap <george.dunlap@citrix.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion
  2015-11-02 14:32     ` Andrew Cooper
@ 2015-11-23 14:50       ` Jan Beulich
  2015-11-24 16:51       ` George Dunlap
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2015-11-23 14:50 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 02.11.15 at 15:32, <andrew.cooper3@citrix.com> wrote:
> On 02/11/15 14:07, George Dunlap wrote:
>> On 30/10/15 18:33, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/mm/p2m-pod.c
>>> +++ b/xen/arch/x86/mm/p2m-pod.c
>>> @@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
>>>      /* This check is done with the pod lock held.  This will make sure that
>>>       * even if d->is_dying changes under our feet, p2m_pod_empty_cache() 
>>>       * won't start until we're done. */
>>> -    if ( unlikely(d->is_dying) )
>>> +    if ( unlikely(d->is_dying) || p2m->pod.dead )
>> So after getting lost in a maze of twisty passages, it looks like
>> "d->is_dying" might be the wrong thing to check here.  d->is_dying is
>> *only* set, AFAICT, in two places:
>>  - in domain_kill(), which is only called for XEN_DOMCTL_destroydomain
>>  - in domain_create(), if the creation failed for some reason.
> 
> d->is_dying indicates that the domains resources are starting to be torn
> down.  It is the correct check here.
> 
>>
>> Would it make more sense to check d->is_shutting_down instead?
> 
> A domain resume hypercall can clear d->is_shutting_down, making it an
> inappropriate check.
> 
>>
>> Having some sort of pod-specific flag seems like the wrong solution.
> 
> All of this patch is the wrong solution, but is necessary until the
> generic p2m interface can properly represent errors.

George,

I still have the patch at the root of this thread in my to be applied
queue, but the discussion having abruptly ended here makes it hard
for me to tell whether I should keep or drop it.

Jan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion
  2015-11-02 14:32     ` Andrew Cooper
  2015-11-23 14:50       ` Jan Beulich
@ 2015-11-24 16:51       ` George Dunlap
  1 sibling, 0 replies; 16+ messages in thread
From: George Dunlap @ 2015-11-24 16:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Jan Beulich, Xen-devel

On Mon, Nov 2, 2015 at 2:32 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 02/11/15 14:07, George Dunlap wrote:
>> On 30/10/15 18:33, Andrew Cooper wrote:
>>> p2m_pod_demand_populate() can be entered repeatedly during a single path
>>> through the hypervisor, e.g. on a toolstack batch map operation.
>>>
>>> The domain might be crashed, but the interface currently lacks a way of
>>> passing an error back through the generic p2m layer.
>>>
>>> Longterm the p2m layer needs reworking to allow errors to be returned, but in
>>> the short term, avoid repeatedly re-sweeping the domain after it has already
>>> been crashed from PoD exhaustion.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: George Dunlap <george.dunlap@eu.citrix.com>
>>> ---
>>>  xen/arch/x86/mm/p2m-pod.c | 3 ++-
>>>  xen/include/asm-x86/p2m.h | 2 ++
>>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
>>> index be15cf3..6fb054f 100644
>>> --- a/xen/arch/x86/mm/p2m-pod.c
>>> +++ b/xen/arch/x86/mm/p2m-pod.c
>>> @@ -1048,7 +1048,7 @@ p2m_pod_demand_populate(struct p2m_domain *p2m, unsigned long gfn,
>>>      /* This check is done with the pod lock held.  This will make sure that
>>>       * even if d->is_dying changes under our feet, p2m_pod_empty_cache()
>>>       * won't start until we're done. */
>>> -    if ( unlikely(d->is_dying) )
>>> +    if ( unlikely(d->is_dying) || p2m->pod.dead )
>> So after getting lost in a maze of twisty passages, it looks like
>> "d->is_dying" might be the wrong thing to check here.  d->is_dying is
>> *only* set, AFAICT, in two places:
>>  - in domain_kill(), which is only called for XEN_DOMCTL_destroydomain
>>  - in domain_create(), if the creation failed for some reason.
>
> d->is_dying indicates that the domains resources are starting to be torn
> down.  It is the correct check here.
>
>>
>> Would it make more sense to check d->is_shutting_down instead?
>
> A domain resume hypercall can clear d->is_shutting_down, making it an
> inappropriate check.

Well if domain_resume() cleared it, then the sweep would be run one
more time and then crashed again, which wouldn't hurt too much.

The real reason d->is_shutting_down is an inappropriate check is that
there are lots of *other* reasons why a domain might shut down -- or
even crash -- which should not prevent p2m sweeps from happening on
the domain.  The only legitimate reason to do no more sweeps, other
than the final tear-down, is a failed PoD sweep.

So I think I've convinced myself that something like this is
(unfortunately) necessary.

I do think it should have a more descriptive name -- "out_of_memory"
or "sweep_failed" would be better than "dead".

In some imaginary future where p2m functions have proper error
handling, one could imagine the toolstack deciding to repopulate the
PoD cache and clear this flag, allowing the domain to continue rather
than crashing it.

Thanks,
 -George

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-11-24 16:51 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-30 18:33 [PATCH 0/4] Futher work after XSA-150 Andrew Cooper
2015-10-30 18:33 ` [PATCH 1/4] x86/PoD: Make p2m_pod_empty_cache() restartable Andrew Cooper
2015-11-02 12:17   ` George Dunlap
2015-10-30 18:33 ` [PATCH 2/4] x86/PoD: Identify when a domain has already been killed from PoD exhaustion Andrew Cooper
2015-11-02 14:07   ` George Dunlap
2015-11-02 14:32     ` Andrew Cooper
2015-11-23 14:50       ` Jan Beulich
2015-11-24 16:51       ` George Dunlap
2015-10-30 18:33 ` [PATCH 3/4] x86/mm: Return -ESRCH for an invalid foreign domid Andrew Cooper
2015-11-02 14:10   ` George Dunlap
2015-10-30 18:33 ` [PATCH 4/4] x86/PoD: Command line option to prohibit any PoD operations Andrew Cooper
2015-11-02 13:53   ` Jan Beulich
2015-11-02 14:32     ` Ian Campbell
2015-11-02 14:37       ` Andrew Cooper
2015-11-02 14:38       ` Jan Beulich
2015-11-02 15:19   ` George Dunlap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.