xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
@ 2016-05-26  3:55 Tamas K Lengyel
  2016-05-26 10:39 ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Tamas K Lengyel @ 2016-05-26  3:55 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Tamas K Lengyel, Jan Beulich

Move sharing locks above altp2m to avoid locking order violation. Allow
applying altp2m mem_access settings when the hostp2m entries are shared.
Also, do not trigger PoD for hostp2m when setting altp2m mem_access to be
in-line with non-altp2m mem_access path. Also allow gfn remapping with
p2m_ram_shared type gfns in altp2m views.

When using mem_sharing in combination with altp2m, unsharing events overwrite
altp2m entries with that of the hostp2m (both remapped entries and mem_access
settings). User should take precaution to not share pages where this behavior
is undesired.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v4: Resolve locking problem by moving sharing locks above altp2m locks
---
 xen/arch/x86/mm/mm-locks.h | 30 +++++++++++++++---------------
 xen/arch/x86/mm/p2m.c      | 10 +++++-----
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 086c8bb..74fdfc1 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -242,6 +242,21 @@ declare_mm_lock(nestedp2m)
 
 declare_mm_rwlock(p2m);
 
+/* Sharing per page lock
+ *
+ * This is an external lock, not represented by an mm_lock_t. The memory
+ * sharing lock uses it to protect addition and removal of (gfn,domain)
+ * tuples to a shared page. We enforce order here against the p2m lock,
+ * which is taken after the page_lock to change the gfn's p2m entry.
+ *
+ * The lock is recursive because during share we lock two pages. */
+
+declare_mm_order_constraint(per_page_sharing)
+#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing()
+#define page_sharing_mm_post_lock(l, r) \
+        mm_enforce_order_lock_post_per_page_sharing((l), (r))
+#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
+
 /* Alternate P2M list lock (per-domain)
  *
  * A per-domain lock that protects the list of alternate p2m's.
@@ -287,21 +302,6 @@ declare_mm_rwlock(altp2m);
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
-/* Sharing per page lock
- *
- * This is an external lock, not represented by an mm_lock_t. The memory
- * sharing lock uses it to protect addition and removal of (gfn,domain)
- * tuples to a shared page. We enforce order here against the p2m lock,
- * which is taken after the page_lock to change the gfn's p2m entry.
- *
- * The lock is recursive because during share we lock two pages. */
-
-declare_mm_order_constraint(per_page_sharing)
-#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing()
-#define page_sharing_mm_post_lock(l, r) \
-        mm_enforce_order_lock_post_per_page_sharing((l), (r))
-#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
-
 /* PoD lock (per-p2m-table)
  * 
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9b19769..dc97082 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1768,10 +1768,10 @@ int p2m_set_altp2m_mem_access(struct domain *d, struct p2m_domain *hp2m,
     if ( !mfn_valid(mfn) )
     {
         mfn = hp2m->get_entry(hp2m, gfn_l, &t, &old_a,
-                              P2M_ALLOC | P2M_UNSHARE, &page_order, NULL);
+                              0, &page_order, NULL);
 
         rc = -ESRCH;
-        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
             return rc;
 
         /* If this is a superpage, copy that first */
@@ -2542,9 +2542,9 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned int idx,
     if ( !mfn_valid(mfn) )
     {
         mfn = hp2m->get_entry(hp2m, gfn_x(old_gfn), &t, &a,
-                              P2M_ALLOC | P2M_UNSHARE, &page_order, NULL);
+                              P2M_ALLOC, &page_order, NULL);
 
-        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
+        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
             goto out;
 
         /* If this is a superpage, copy that first */
@@ -2567,7 +2567,7 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned int idx,
     if ( !mfn_valid(mfn) )
         mfn = hp2m->get_entry(hp2m, gfn_x(new_gfn), &t, &a, 0, NULL, NULL);
 
-    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
+    if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
         goto out;
 
     if ( !ap2m->set_entry(ap2m, gfn_x(old_gfn), mfn, PAGE_ORDER_4K, t, a,
-- 
2.8.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-05-26  3:55 [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared Tamas K Lengyel
@ 2016-05-26 10:39 ` George Dunlap
  2016-05-26 16:17   ` Tamas K Lengyel
  0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-05-26 10:39 UTC (permalink / raw)
  To: Tamas K Lengyel, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

On 26/05/16 04:55, Tamas K Lengyel wrote:
> Move sharing locks above altp2m to avoid locking order violation. Allow
> applying altp2m mem_access settings when the hostp2m entries are shared.
> Also, do not trigger PoD for hostp2m when setting altp2m mem_access to be
> in-line with non-altp2m mem_access path. Also allow gfn remapping with
> p2m_ram_shared type gfns in altp2m views.
> 
> When using mem_sharing in combination with altp2m, unsharing events overwrite
> altp2m entries with that of the hostp2m (both remapped entries and mem_access
> settings). User should take precaution to not share pages where this behavior
> is undesired.

I'm afraid this is not acceptable.  How could this ever be even remotely
usable?  If this is a necessary side effect of sharing, then I think the
original functionality, of un-sharing when setting an altp2m entry is
the only option (and not allowing sharing to happen when an altp2m is
present for a particular gfn).

Hmm, but actually this also brings up another tricky point: In an altp2m
you can change the mfn which backs a gfn.  This would need to be handled
properly in the reverse map, which it doesn't look like it is at the moment.

On the whole, I think if you're going to allow a single gfn to be
simultaneously shared and allow an altp2m for it, you're going to need
to do a lot more work.

(Sorry for not catching a lot of this before...)

 -George

> 
> Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> v4: Resolve locking problem by moving sharing locks above altp2m locks
> ---
>  xen/arch/x86/mm/mm-locks.h | 30 +++++++++++++++---------------
>  xen/arch/x86/mm/p2m.c      | 10 +++++-----
>  2 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
> index 086c8bb..74fdfc1 100644
> --- a/xen/arch/x86/mm/mm-locks.h
> +++ b/xen/arch/x86/mm/mm-locks.h
> @@ -242,6 +242,21 @@ declare_mm_lock(nestedp2m)
>  
>  declare_mm_rwlock(p2m);
>  
> +/* Sharing per page lock
> + *
> + * This is an external lock, not represented by an mm_lock_t. The memory
> + * sharing lock uses it to protect addition and removal of (gfn,domain)
> + * tuples to a shared page. We enforce order here against the p2m lock,
> + * which is taken after the page_lock to change the gfn's p2m entry.
> + *
> + * The lock is recursive because during share we lock two pages. */
> +
> +declare_mm_order_constraint(per_page_sharing)
> +#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing()
> +#define page_sharing_mm_post_lock(l, r) \
> +        mm_enforce_order_lock_post_per_page_sharing((l), (r))
> +#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
> +
>  /* Alternate P2M list lock (per-domain)
>   *
>   * A per-domain lock that protects the list of alternate p2m's.
> @@ -287,21 +302,6 @@ declare_mm_rwlock(altp2m);
>  #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
>  #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
>  
> -/* Sharing per page lock
> - *
> - * This is an external lock, not represented by an mm_lock_t. The memory
> - * sharing lock uses it to protect addition and removal of (gfn,domain)
> - * tuples to a shared page. We enforce order here against the p2m lock,
> - * which is taken after the page_lock to change the gfn's p2m entry.
> - *
> - * The lock is recursive because during share we lock two pages. */
> -
> -declare_mm_order_constraint(per_page_sharing)
> -#define page_sharing_mm_pre_lock()   mm_enforce_order_lock_pre_per_page_sharing()
> -#define page_sharing_mm_post_lock(l, r) \
> -        mm_enforce_order_lock_post_per_page_sharing((l), (r))
> -#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
> -
>  /* PoD lock (per-p2m-table)
>   * 
>   * Protects private PoD data structs: entry and cache
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 9b19769..dc97082 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1768,10 +1768,10 @@ int p2m_set_altp2m_mem_access(struct domain *d, struct p2m_domain *hp2m,
>      if ( !mfn_valid(mfn) )
>      {
>          mfn = hp2m->get_entry(hp2m, gfn_l, &t, &old_a,
> -                              P2M_ALLOC | P2M_UNSHARE, &page_order, NULL);
> +                              0, &page_order, NULL);
>  
>          rc = -ESRCH;
> -        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
>              return rc;
>  
>          /* If this is a superpage, copy that first */
> @@ -2542,9 +2542,9 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned int idx,
>      if ( !mfn_valid(mfn) )
>      {
>          mfn = hp2m->get_entry(hp2m, gfn_x(old_gfn), &t, &a,
> -                              P2M_ALLOC | P2M_UNSHARE, &page_order, NULL);
> +                              P2M_ALLOC, &page_order, NULL);
>  
> -        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> +        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
>              goto out;
>  
>          /* If this is a superpage, copy that first */
> @@ -2567,7 +2567,7 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned int idx,
>      if ( !mfn_valid(mfn) )
>          mfn = hp2m->get_entry(hp2m, gfn_x(new_gfn), &t, &a, 0, NULL, NULL);
>  
> -    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
> +    if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
>          goto out;
>  
>      if ( !ap2m->set_entry(ap2m, gfn_x(old_gfn), mfn, PAGE_ORDER_4K, t, a,
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-05-26 10:39 ` George Dunlap
@ 2016-05-26 16:17   ` Tamas K Lengyel
  2016-06-11 17:55     ` Tamas K Lengyel
  2016-07-15  8:57     ` George Dunlap
  0 siblings, 2 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-05-26 16:17 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Xen-devel, Jan Beulich, Andrew Cooper


[-- Attachment #1.1: Type: text/plain, Size: 6530 bytes --]

On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>
> On 26/05/16 04:55, Tamas K Lengyel wrote:
> > Move sharing locks above altp2m to avoid locking order violation. Allow
> > applying altp2m mem_access settings when the hostp2m entries are shared.
> > Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
be
> > in-line with non-altp2m mem_access path. Also allow gfn remapping with
> > p2m_ram_shared type gfns in altp2m views.
> >
> > When using mem_sharing in combination with altp2m, unsharing events
overwrite
> > altp2m entries with that of the hostp2m (both remapped entries and
mem_access
> > settings). User should take precaution to not share pages where this
behavior
> > is undesired.
>
> I'm afraid this is not acceptable.  How could this ever be even remotely
> usable?  If this is a necessary side effect of sharing, then I think the
> original functionality, of un-sharing when setting an altp2m entry is
> the only option (and not allowing sharing to happen when an altp2m is
> present for a particular gfn).
>
> Hmm, but actually this also brings up another tricky point: In an altp2m
> you can change the mfn which backs a gfn.  This would need to be handled
> properly in the reverse map, which it doesn't look like it is at the
moment.
>
> On the whole, I think if you're going to allow a single gfn to be
> simultaneously shared and allow an altp2m for it, you're going to need
> to do a lot more work.
>
> (Sorry for not catching a lot of this before...)
>

Well this patch resolves the locking order violation and allows the
xen-access tool's altp2m tests to pass, so it does improve on the current
situation which is a hypervisor crash. To help with the override issue the
user can apply W mem_access permission on the shared hostp2m entries. That
way they get notification through vm_event of the event that leads to
unsharing and can then reapply the altp2m changes. So IMHO this patch is
already quite workable and while it requires more setup from the userside,
the VMM side is OK with this change.

Tamas

>
> >
> > Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
> > ---
> > Cc: George Dunlap <george.dunlap@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> >
> > v4: Resolve locking problem by moving sharing locks above altp2m locks
> > ---
> >  xen/arch/x86/mm/mm-locks.h | 30 +++++++++++++++---------------
> >  xen/arch/x86/mm/p2m.c      | 10 +++++-----
> >  2 files changed, 20 insertions(+), 20 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
> > index 086c8bb..74fdfc1 100644
> > --- a/xen/arch/x86/mm/mm-locks.h
> > +++ b/xen/arch/x86/mm/mm-locks.h
> > @@ -242,6 +242,21 @@ declare_mm_lock(nestedp2m)
> >
> >  declare_mm_rwlock(p2m);
> >
> > +/* Sharing per page lock
> > + *
> > + * This is an external lock, not represented by an mm_lock_t. The
memory
> > + * sharing lock uses it to protect addition and removal of (gfn,domain)
> > + * tuples to a shared page. We enforce order here against the p2m lock,
> > + * which is taken after the page_lock to change the gfn's p2m entry.
> > + *
> > + * The lock is recursive because during share we lock two pages. */
> > +
> > +declare_mm_order_constraint(per_page_sharing)
> > +#define page_sharing_mm_pre_lock()
 mm_enforce_order_lock_pre_per_page_sharing()
> > +#define page_sharing_mm_post_lock(l, r) \
> > +        mm_enforce_order_lock_post_per_page_sharing((l), (r))
> > +#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
> > +
> >  /* Alternate P2M list lock (per-domain)
> >   *
> >   * A per-domain lock that protects the list of alternate p2m's.
> > @@ -287,21 +302,6 @@ declare_mm_rwlock(altp2m);
> >  #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
> >  #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
> >
> > -/* Sharing per page lock
> > - *
> > - * This is an external lock, not represented by an mm_lock_t. The
memory
> > - * sharing lock uses it to protect addition and removal of (gfn,domain)
> > - * tuples to a shared page. We enforce order here against the p2m lock,
> > - * which is taken after the page_lock to change the gfn's p2m entry.
> > - *
> > - * The lock is recursive because during share we lock two pages. */
> > -
> > -declare_mm_order_constraint(per_page_sharing)
> > -#define page_sharing_mm_pre_lock()
 mm_enforce_order_lock_pre_per_page_sharing()
> > -#define page_sharing_mm_post_lock(l, r) \
> > -        mm_enforce_order_lock_post_per_page_sharing((l), (r))
> > -#define page_sharing_mm_unlock(l, r) mm_enforce_order_unlock((l), (r))
> > -
> >  /* PoD lock (per-p2m-table)
> >   *
> >   * Protects private PoD data structs: entry and cache
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index 9b19769..dc97082 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1768,10 +1768,10 @@ int p2m_set_altp2m_mem_access(struct domain *d,
struct p2m_domain *hp2m,
> >      if ( !mfn_valid(mfn) )
> >      {
> >          mfn = hp2m->get_entry(hp2m, gfn_l, &t, &old_a,
> > -                              P2M_ALLOC | P2M_UNSHARE, &page_order,
NULL);
> > +                              0, &page_order, NULL);
> >
> >          rc = -ESRCH;
> > -        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> > +        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t !=
p2m_ram_shared) )
> >              return rc;
> >
> >          /* If this is a superpage, copy that first */
> > @@ -2542,9 +2542,9 @@ int p2m_change_altp2m_gfn(struct domain *d,
unsigned int idx,
> >      if ( !mfn_valid(mfn) )
> >      {
> >          mfn = hp2m->get_entry(hp2m, gfn_x(old_gfn), &t, &a,
> > -                              P2M_ALLOC | P2M_UNSHARE, &page_order,
NULL);
> > +                              P2M_ALLOC, &page_order, NULL);
> >
> > -        if ( !mfn_valid(mfn) || t != p2m_ram_rw )
> > +        if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t !=
p2m_ram_shared) )
> >              goto out;
> >
> >          /* If this is a superpage, copy that first */
> > @@ -2567,7 +2567,7 @@ int p2m_change_altp2m_gfn(struct domain *d,
unsigned int idx,
> >      if ( !mfn_valid(mfn) )
> >          mfn = hp2m->get_entry(hp2m, gfn_x(new_gfn), &t, &a, 0, NULL,
NULL);
> >
> > -    if ( !mfn_valid(mfn) || (t != p2m_ram_rw) )
> > +    if ( !mfn_valid(mfn) || (t != p2m_ram_rw && t != p2m_ram_shared) )
> >          goto out;
> >
> >      if ( !ap2m->set_entry(ap2m, gfn_x(old_gfn), mfn, PAGE_ORDER_4K, t,
a,
> >
>

[-- Attachment #1.2: Type: text/html, Size: 8546 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-05-26 16:17   ` Tamas K Lengyel
@ 2016-06-11 17:55     ` Tamas K Lengyel
  2016-06-13  9:28       ` George Dunlap
  2016-07-15  8:57     ` George Dunlap
  1 sibling, 1 reply; 22+ messages in thread
From: Tamas K Lengyel @ 2016-06-11 17:55 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Xen-devel, Jan Beulich, Andrew Cooper

On Thu, May 26, 2016 at 10:17 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>>
>> On 26/05/16 04:55, Tamas K Lengyel wrote:
>> > Move sharing locks above altp2m to avoid locking order violation. Allow
>> > applying altp2m mem_access settings when the hostp2m entries are shared.
>> > Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
>> > be
>> > in-line with non-altp2m mem_access path. Also allow gfn remapping with
>> > p2m_ram_shared type gfns in altp2m views.
>> >
>> > When using mem_sharing in combination with altp2m, unsharing events
>> > overwrite
>> > altp2m entries with that of the hostp2m (both remapped entries and
>> > mem_access
>> > settings). User should take precaution to not share pages where this
>> > behavior
>> > is undesired.
>>
>> I'm afraid this is not acceptable.  How could this ever be even remotely
>> usable?  If this is a necessary side effect of sharing, then I think the
>> original functionality, of un-sharing when setting an altp2m entry is
>> the only option (and not allowing sharing to happen when an altp2m is
>> present for a particular gfn).
>>
>> Hmm, but actually this also brings up another tricky point: In an altp2m
>> you can change the mfn which backs a gfn.  This would need to be handled
>> properly in the reverse map, which it doesn't look like it is at the
>> moment.
>>
>> On the whole, I think if you're going to allow a single gfn to be
>> simultaneously shared and allow an altp2m for it, you're going to need
>> to do a lot more work.
>>
>> (Sorry for not catching a lot of this before...)
>>
>
> Well this patch resolves the locking order violation and allows the
> xen-access tool's altp2m tests to pass, so it does improve on the current
> situation which is a hypervisor crash. To help with the override issue the
> user can apply W mem_access permission on the shared hostp2m entries. That
> way they get notification through vm_event of the event that leads to
> unsharing and can then reapply the altp2m changes. So IMHO this patch is
> already quite workable and while it requires more setup from the userside,
> the VMM side is OK with this change.

Ping. Can I get an update on what the verdict is on this patch?

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-06-11 17:55     ` Tamas K Lengyel
@ 2016-06-13  9:28       ` George Dunlap
  2016-06-13 17:20         ` Tamas K Lengyel
  0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-06-13  9:28 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: George Dunlap, Xen-devel, Jan Beulich, Andrew Cooper

On 11/06/16 18:55, Tamas K Lengyel wrote:
> On Thu, May 26, 2016 at 10:17 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>
>> On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>>>
>>> On 26/05/16 04:55, Tamas K Lengyel wrote:
>>>> Move sharing locks above altp2m to avoid locking order violation. Allow
>>>> applying altp2m mem_access settings when the hostp2m entries are shared.
>>>> Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
>>>> be
>>>> in-line with non-altp2m mem_access path. Also allow gfn remapping with
>>>> p2m_ram_shared type gfns in altp2m views.
>>>>
>>>> When using mem_sharing in combination with altp2m, unsharing events
>>>> overwrite
>>>> altp2m entries with that of the hostp2m (both remapped entries and
>>>> mem_access
>>>> settings). User should take precaution to not share pages where this
>>>> behavior
>>>> is undesired.
>>>
>>> I'm afraid this is not acceptable.  How could this ever be even remotely
>>> usable?  If this is a necessary side effect of sharing, then I think the
>>> original functionality, of un-sharing when setting an altp2m entry is
>>> the only option (and not allowing sharing to happen when an altp2m is
>>> present for a particular gfn).
>>>
>>> Hmm, but actually this also brings up another tricky point: In an altp2m
>>> you can change the mfn which backs a gfn.  This would need to be handled
>>> properly in the reverse map, which it doesn't look like it is at the
>>> moment.
>>>
>>> On the whole, I think if you're going to allow a single gfn to be
>>> simultaneously shared and allow an altp2m for it, you're going to need
>>> to do a lot more work.
>>>
>>> (Sorry for not catching a lot of this before...)
>>>
>>
>> Well this patch resolves the locking order violation and allows the
>> xen-access tool's altp2m tests to pass, so it does improve on the current
>> situation which is a hypervisor crash. To help with the override issue the
>> user can apply W mem_access permission on the shared hostp2m entries. That
>> way they get notification through vm_event of the event that leads to
>> unsharing and can then reapply the altp2m changes. So IMHO this patch is
>> already quite workable and while it requires more setup from the userside,
>> the VMM side is OK with this change.
> 
> Ping. Can I get an update on what the verdict is on this patch?

To get a proper verdict requires actually taking a deeper look and
thinking carefully about things :-)  Sorry for the delay -- hopefully I
can get to this this week.

Peace,
 -George



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-06-13  9:28       ` George Dunlap
@ 2016-06-13 17:20         ` Tamas K Lengyel
  2016-07-12 17:29           ` Tamas K Lengyel
  0 siblings, 1 reply; 22+ messages in thread
From: Tamas K Lengyel @ 2016-06-13 17:20 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Xen-devel, Jan Beulich, Andrew Cooper

On Mon, Jun 13, 2016 at 3:28 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On 11/06/16 18:55, Tamas K Lengyel wrote:
>> On Thu, May 26, 2016 at 10:17 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>
>>> On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>>>>
>>>> On 26/05/16 04:55, Tamas K Lengyel wrote:
>>>>> Move sharing locks above altp2m to avoid locking order violation. Allow
>>>>> applying altp2m mem_access settings when the hostp2m entries are shared.
>>>>> Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
>>>>> be
>>>>> in-line with non-altp2m mem_access path. Also allow gfn remapping with
>>>>> p2m_ram_shared type gfns in altp2m views.
>>>>>
>>>>> When using mem_sharing in combination with altp2m, unsharing events
>>>>> overwrite
>>>>> altp2m entries with that of the hostp2m (both remapped entries and
>>>>> mem_access
>>>>> settings). User should take precaution to not share pages where this
>>>>> behavior
>>>>> is undesired.
>>>>
>>>> I'm afraid this is not acceptable.  How could this ever be even remotely
>>>> usable?  If this is a necessary side effect of sharing, then I think the
>>>> original functionality, of un-sharing when setting an altp2m entry is
>>>> the only option (and not allowing sharing to happen when an altp2m is
>>>> present for a particular gfn).
>>>>
>>>> Hmm, but actually this also brings up another tricky point: In an altp2m
>>>> you can change the mfn which backs a gfn.  This would need to be handled
>>>> properly in the reverse map, which it doesn't look like it is at the
>>>> moment.
>>>>
>>>> On the whole, I think if you're going to allow a single gfn to be
>>>> simultaneously shared and allow an altp2m for it, you're going to need
>>>> to do a lot more work.
>>>>
>>>> (Sorry for not catching a lot of this before...)
>>>>
>>>
>>> Well this patch resolves the locking order violation and allows the
>>> xen-access tool's altp2m tests to pass, so it does improve on the current
>>> situation which is a hypervisor crash. To help with the override issue the
>>> user can apply W mem_access permission on the shared hostp2m entries. That
>>> way they get notification through vm_event of the event that leads to
>>> unsharing and can then reapply the altp2m changes. So IMHO this patch is
>>> already quite workable and while it requires more setup from the userside,
>>> the VMM side is OK with this change.
>>
>> Ping. Can I get an update on what the verdict is on this patch?
>
> To get a proper verdict requires actually taking a deeper look and
> thinking carefully about things :-)  Sorry for the delay -- hopefully I
> can get to this this week.

Thanks, no rush ;) Just wanted to keep the discussion on the radar.
The only caveat with what I proposed above with using mem_access to
get notified of unsharing is that the mem_access notification is
currently sent before the unsharing in hvm/hvm.c. This means the user
getting the mem_access notification has to also do the unsharing
before reapplying the altp2m changes as well. It can be done so it
would still work just fine, just has to keep in mind. We could also
swap around the order of things so unsharing happens before the
mem_access notification happens but I don't think it's necessary.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-06-13 17:20         ` Tamas K Lengyel
@ 2016-07-12 17:29           ` Tamas K Lengyel
  0 siblings, 0 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-12 17:29 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Xen-devel, Jan Beulich, Andrew Cooper

On Mon, Jun 13, 2016 at 11:20 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
> On Mon, Jun 13, 2016 at 3:28 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> On 11/06/16 18:55, Tamas K Lengyel wrote:
>>> On Thu, May 26, 2016 at 10:17 AM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>>
>>>> On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>>>>>
>>>>> On 26/05/16 04:55, Tamas K Lengyel wrote:
>>>>>> Move sharing locks above altp2m to avoid locking order violation. Allow
>>>>>> applying altp2m mem_access settings when the hostp2m entries are shared.
>>>>>> Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
>>>>>> be
>>>>>> in-line with non-altp2m mem_access path. Also allow gfn remapping with
>>>>>> p2m_ram_shared type gfns in altp2m views.
>>>>>>
>>>>>> When using mem_sharing in combination with altp2m, unsharing events
>>>>>> overwrite
>>>>>> altp2m entries with that of the hostp2m (both remapped entries and
>>>>>> mem_access
>>>>>> settings). User should take precaution to not share pages where this
>>>>>> behavior
>>>>>> is undesired.
>>>>>
>>>>> I'm afraid this is not acceptable.  How could this ever be even remotely
>>>>> usable?  If this is a necessary side effect of sharing, then I think the
>>>>> original functionality, of un-sharing when setting an altp2m entry is
>>>>> the only option (and not allowing sharing to happen when an altp2m is
>>>>> present for a particular gfn).
>>>>>
>>>>> Hmm, but actually this also brings up another tricky point: In an altp2m
>>>>> you can change the mfn which backs a gfn.  This would need to be handled
>>>>> properly in the reverse map, which it doesn't look like it is at the
>>>>> moment.
>>>>>
>>>>> On the whole, I think if you're going to allow a single gfn to be
>>>>> simultaneously shared and allow an altp2m for it, you're going to need
>>>>> to do a lot more work.
>>>>>
>>>>> (Sorry for not catching a lot of this before...)
>>>>>
>>>>
>>>> Well this patch resolves the locking order violation and allows the
>>>> xen-access tool's altp2m tests to pass, so it does improve on the current
>>>> situation which is a hypervisor crash. To help with the override issue the
>>>> user can apply W mem_access permission on the shared hostp2m entries. That
>>>> way they get notification through vm_event of the event that leads to
>>>> unsharing and can then reapply the altp2m changes. So IMHO this patch is
>>>> already quite workable and while it requires more setup from the userside,
>>>> the VMM side is OK with this change.
>>>
>>> Ping. Can I get an update on what the verdict is on this patch?
>>
>> To get a proper verdict requires actually taking a deeper look and
>> thinking carefully about things :-)  Sorry for the delay -- hopefully I
>> can get to this this week.
>
> Thanks, no rush ;) Just wanted to keep the discussion on the radar.
> The only caveat with what I proposed above with using mem_access to
> get notified of unsharing is that the mem_access notification is
> currently sent before the unsharing in hvm/hvm.c. This means the user
> getting the mem_access notification has to also do the unsharing
> before reapplying the altp2m changes as well. It can be done so it
> would still work just fine, just has to keep in mind. We could also
> swap around the order of things so unsharing happens before the
> mem_access notification happens but I don't think it's necessary.
>
> Tamas

PATCH ping.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-05-26 16:17   ` Tamas K Lengyel
  2016-06-11 17:55     ` Tamas K Lengyel
@ 2016-07-15  8:57     ` George Dunlap
  2016-07-15 14:59       ` Tamas K Lengyel
  1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-15  8:57 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Xen-devel, Jan Beulich, Andrew Cooper

On Thu, May 26, 2016 at 5:17 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>
> On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
>>
>> On 26/05/16 04:55, Tamas K Lengyel wrote:
>> > Move sharing locks above altp2m to avoid locking order violation. Allow
>> > applying altp2m mem_access settings when the hostp2m entries are shared.
>> > Also, do not trigger PoD for hostp2m when setting altp2m mem_access to
>> > be
>> > in-line with non-altp2m mem_access path. Also allow gfn remapping with
>> > p2m_ram_shared type gfns in altp2m views.
>> >
>> > When using mem_sharing in combination with altp2m, unsharing events
>> > overwrite
>> > altp2m entries with that of the hostp2m (both remapped entries and
>> > mem_access
>> > settings). User should take precaution to not share pages where this
>> > behavior
>> > is undesired.
>>
>> I'm afraid this is not acceptable.  How could this ever be even remotely
>> usable?  If this is a necessary side effect of sharing, then I think the
>> original functionality, of un-sharing when setting an altp2m entry is
>> the only option (and not allowing sharing to happen when an altp2m is
>> present for a particular gfn).
>>
>> Hmm, but actually this also brings up another tricky point: In an altp2m
>> you can change the mfn which backs a gfn.  This would need to be handled
>> properly in the reverse map, which it doesn't look like it is at the
>> moment.
>>
>> On the whole, I think if you're going to allow a single gfn to be
>> simultaneously shared and allow an altp2m for it, you're going to need
>> to do a lot more work.
>>
>> (Sorry for not catching a lot of this before...)
>>
>
> Well this patch resolves the locking order violation and allows the
> xen-access tool's altp2m tests to pass, so it does improve on the current
> situation which is a hypervisor crash. To help with the override issue the
> user can apply W mem_access permission on the shared hostp2m entries. That
> way they get notification through vm_event of the event that leads to
> unsharing and can then reapply the altp2m changes. So IMHO this patch is
> already quite workable and while it requires more setup from the userside,
> the VMM side is OK with this change.

So correct me if I'm wrong here.

The altp2m stuff was primarily designed for guest operating systems to
be able to make alternate "views" of their own P2M.  When enabled,
extra p2ms are allocated which allow a VM to remap a gfn to point to
the gfn of an mfn in its own address space.

For example, suppose domain 1 host p2m gfns A, B, and C point to mA,
mB, and mC respectively.  The guest can create an altp2m a1 such that
gfns O, P, and Q also point to mA, mB, and mC.  The guest can also
register to receive #VE for violations of an altp2m which are not
violations under the hostp2m.

mem sharing allows a process in domain 0 (or some other privileged
domain) to nominate two gfns in different domains to be shared; say,
domain 1 gfn A and domain 2 gfn X (d1gA and d2gX, respectively).  This
works by first "nominating" the respective gfns, then sharing them.
"Nominating" a gfn tells the sharing infrastructure to start tracking
mappings of that page in a reverse map; after nomination mA's page
structure will point to d1gA, and mX's page structure will point to
d2gX.  At that point they can be "shared", at which point (for
example) both d1gA and d2gX will point to mA, and mA's reverse map
will contain d1gA and d2gX.

However, the mem sharing code has no visibility into altp2ms.  There
are two cases we need to consider: the sharing of a gfn for which
there is another gfn in an altp2m pointing to it (as in the case of
domain 1 gfn A above -- both hostp2m gfn A and alt2m gfn O point to
mA), and the sharing of a gfn for which there is an altp2m if that gfn
pointing to a different gfn (as in the case of domain 1 gfn O above --
hostp2m gfn O points to mO, altp2m gfn O points to mA).  Then we also
have to consider the order in which things happen: altp2m then
sharing, sharing then altp2m.

Let's first consider the case of domain 1 gfn A being shared.  At the
moment, if the situation described above happens in that order (altp2m
set up first, then sharing) then the page nomination of d1gA will most
likely fail because the refcount on mA will be one more than expected.
If it happened in the reverse order (sharing set up first, then
altp2m), then setting the altp2m would unshare d1gA.  Both should be
safe.

Now what happens if we accept your patch as-is?  It looks like altp2m
first then sharing will behave the same way -- the refcount will be
too high, so the nomination will fail.  But if you share first and
then set the altp2m, then the altp2m gfn O will have a reference to mA
that isn't in your reverse map.  If you get an unshare on domain 1
altp2m gfn O, it looks to me like you'll get an unshare on *hostp2m*
gfn O, which points to mO, which is *not* shared -- at which point it
looks like you'll BUG().  If you get an unshare on domain 1 gfn A, it
looks to me like you'll get a new page, mN, at gfn A, but that altp2m
gfn O will still still point to mA -- effectively breaking whatever
the guest meant to be doing when it was using the altp2ms.

I could go on in the analysis, but the point is that there's a morass
of interactions here all of which need to be correct, which this patch
does not address.  You have a long way to go before sharing and altp2m
can be safely used on the same gfn ranges.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-15  8:57     ` George Dunlap
@ 2016-07-15 14:59       ` Tamas K Lengyel
  2016-07-18 11:04         ` George Dunlap
  0 siblings, 1 reply; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-15 14:59 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Jan Beulich, Xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6845 bytes --]

On Jul 15, 2016 02:57, "George Dunlap" <george.dunlap@citrix.com> wrote:
>
> On Thu, May 26, 2016 at 5:17 PM, Tamas K Lengyel <tamas@tklengyel.com>
wrote:
> >
> > On May 26, 2016 04:40, "George Dunlap" <george.dunlap@citrix.com> wrote:
> >>
> >> On 26/05/16 04:55, Tamas K Lengyel wrote:
> >> > Move sharing locks above altp2m to avoid locking order violation.
Allow
> >> > applying altp2m mem_access settings when the hostp2m entries are
shared.
> >> > Also, do not trigger PoD for hostp2m when setting altp2m mem_access
to
> >> > be
> >> > in-line with non-altp2m mem_access path. Also allow gfn remapping
with
> >> > p2m_ram_shared type gfns in altp2m views.
> >> >
> >> > When using mem_sharing in combination with altp2m, unsharing events
> >> > overwrite
> >> > altp2m entries with that of the hostp2m (both remapped entries and
> >> > mem_access
> >> > settings). User should take precaution to not share pages where this
> >> > behavior
> >> > is undesired.
> >>
> >> I'm afraid this is not acceptable.  How could this ever be even
remotely
> >> usable?  If this is a necessary side effect of sharing, then I think
the
> >> original functionality, of un-sharing when setting an altp2m entry is
> >> the only option (and not allowing sharing to happen when an altp2m is
> >> present for a particular gfn).
> >>
> >> Hmm, but actually this also brings up another tricky point: In an
altp2m
> >> you can change the mfn which backs a gfn.  This would need to be
handled
> >> properly in the reverse map, which it doesn't look like it is at the
> >> moment.
> >>
> >> On the whole, I think if you're going to allow a single gfn to be
> >> simultaneously shared and allow an altp2m for it, you're going to need
> >> to do a lot more work.
> >>
> >> (Sorry for not catching a lot of this before...)
> >>
> >
> > Well this patch resolves the locking order violation and allows the
> > xen-access tool's altp2m tests to pass, so it does improve on the
current
> > situation which is a hypervisor crash. To help with the override issue
the
> > user can apply W mem_access permission on the shared hostp2m entries.
That
> > way they get notification through vm_event of the event that leads to
> > unsharing and can then reapply the altp2m changes. So IMHO this patch is
> > already quite workable and while it requires more setup from the
userside,
> > the VMM side is OK with this change.
>
> So correct me if I'm wrong here.
>
> The altp2m stuff was primarily designed for guest operating systems to
> be able to make alternate "views" of their own P2M.  When enabled,
> extra p2ms are allocated which allow a VM to remap a gfn to point to
> the gfn of an mfn in its own address space.
>
> For example, suppose domain 1 host p2m gfns A, B, and C point to mA,
> mB, and mC respectively.  The guest can create an altp2m a1 such that
> gfns O, P, and Q also point to mA, mB, and mC.  The guest can also
> register to receive #VE for violations of an altp2m which are not
> violations under the hostp2m.
>
> mem sharing allows a process in domain 0 (or some other privileged
> domain) to nominate two gfns in different domains to be shared; say,
> domain 1 gfn A and domain 2 gfn X (d1gA and d2gX, respectively).  This
> works by first "nominating" the respective gfns, then sharing them.
> "Nominating" a gfn tells the sharing infrastructure to start tracking
> mappings of that page in a reverse map; after nomination mA's page
> structure will point to d1gA, and mX's page structure will point to
> d2gX.  At that point they can be "shared", at which point (for
> example) both d1gA and d2gX will point to mA, and mA's reverse map
> will contain d1gA and d2gX.
>
> However, the mem sharing code has no visibility into altp2ms.  There
> are two cases we need to consider: the sharing of a gfn for which
> there is another gfn in an altp2m pointing to it (as in the case of
> domain 1 gfn A above -- both hostp2m gfn A and alt2m gfn O point to
> mA), and the sharing of a gfn for which there is an altp2m if that gfn
> pointing to a different gfn (as in the case of domain 1 gfn O above --
> hostp2m gfn O points to mO, altp2m gfn O points to mA).  Then we also
> have to consider the order in which things happen: altp2m then
> sharing, sharing then altp2m.
>
> Let's first consider the case of domain 1 gfn A being shared.  At the
> moment, if the situation described above happens in that order (altp2m
> set up first, then sharing) then the page nomination of d1gA will most
> likely fail because the refcount on mA will be one more than expected.
> If it happened in the reverse order (sharing set up first, then
> altp2m), then setting the altp2m would unshare d1gA.  Both should be
> safe.
>
> Now what happens if we accept your patch as-is?  It looks like altp2m
> first then sharing will behave the same way -- the refcount will be
> too high, so the nomination will fail.  But if you share first and
> then set the altp2m, then the altp2m gfn O will have a reference to mA
> that isn't in your reverse map.  If you get an unshare on domain 1
> altp2m gfn O, it looks to me like you'll get an unshare on *hostp2m*
> gfn O, which points to mO, which is *not* shared -- at which point it
> looks like you'll BUG().  If you get an unshare on domain 1 gfn A, it
> looks to me like you'll get a new page, mN, at gfn A, but that altp2m
> gfn O will still still point to mA -- effectively breaking whatever
> the guest meant to be doing when it was using the altp2ms.
>
> I could go on in the analysis, but the point is that there's a morass
> of interactions here all of which need to be correct, which this patch
> does not address.  You have a long way to go before sharing and altp2m
> can be safely used on the same gfn ranges.
>

Hi George,
certainly there are cornercases where if the user does not setup things in
the right order things can still go out of whack. My goal with this patch
is not to address all of them. The goal of this patch is to not crash the
hypervisor when the user at least tries to experiment with it, which is the
current state. So this patch improves the status quo. Also, both
mem_sharing and altp2m is considered experimental/tech_preview, so the fact
that their combination is also experimental should be assumed as well. As I
explained, with this patch in place there is at least one way they can be
safely used together if the user tracks unsharing requirements through
mem_access and does unsharing and fixup of the altp2m views manually. There
are other ways where it would not be safe as after unsharing we don't know
how the user would want things to look in altp2ms. I don't think we want to
start guessing about that either so I will not be looking to implement
that. So I don't agree with this reasoning being grounds for rejecting this
patch that does incrementally improve the current state.

Tamas

[-- Attachment #1.2: Type: text/html, Size: 8356 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-15 14:59       ` Tamas K Lengyel
@ 2016-07-18 11:04         ` George Dunlap
  2016-07-18 15:18           ` Tamas K Lengyel
  0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-18 11:04 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>> I could go on in the analysis, but the point is that there's a morass
>> of interactions here all of which need to be correct, which this patch
>> does not address.  You have a long way to go before sharing and altp2m
>> can be safely used on the same gfn ranges.
>>
>
> Hi George,
> certainly there are cornercases where if the user does not setup things in
> the right order things can still go out of whack. My goal with this patch is
> not to address all of them. The goal of this patch is to not crash the
> hypervisor when the user at least tries to experiment with it, which is the
> current state. So this patch improves the status quo. Also, both mem_sharing
> and altp2m is considered experimental/tech_preview, so the fact that their
> combination is also experimental should be assumed as well. As I explained,
> with this patch in place there is at least one way they can be safely used
> together if the user tracks unsharing requirements through mem_access and
> does unsharing and fixup of the altp2m views manually. There are other ways
> where it would not be safe as after unsharing we don't know how the user
> would want things to look in altp2ms. I don't think we want to start
> guessing about that either so I will not be looking to implement that. So I
> don't agree with this reasoning being grounds for rejecting this patch that
> does incrementally improve the current state.

So you keep saying "user"; I assume you mean whatever program is
sitting in domain 0, which is going to be doing memsharing, altp2m,
and memaccess stuff all at once?

The altp2m code was not written for that purpose.  It was written for
*guest* administrators to use within the guest.  There's no problem
with finding additional uses for it, as long as the *original* purpose
doesn't get broken; and this patch most certainly does break things
for that purpose.  Guests using altp2m should not have to know that
sharing is happening behind their backs; nor should they be required
to use mem_access to manually fix up what the hypervisor has allowed
to be broken.

If at the moment altp2m + memsharing just plain crashes, then that
should be fixed; and if the lock-ordering parts of the patch fix that,
then they should be applied.  But the code which make sure that the
same gfn range cannot both be shared and subject to altp2m must remain
until they interact properly, with all the corner cases carefully
thought out.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 11:04         ` George Dunlap
@ 2016-07-18 15:18           ` Tamas K Lengyel
  2016-07-18 15:27             ` Andrew Cooper
  2016-07-18 16:10             ` George Dunlap
  0 siblings, 2 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-18 15:18 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>> I could go on in the analysis, but the point is that there's a morass
>>> of interactions here all of which need to be correct, which this patch
>>> does not address.  You have a long way to go before sharing and altp2m
>>> can be safely used on the same gfn ranges.
>>>
>>
>> Hi George,
>> certainly there are cornercases where if the user does not setup things in
>> the right order things can still go out of whack. My goal with this patch is
>> not to address all of them. The goal of this patch is to not crash the
>> hypervisor when the user at least tries to experiment with it, which is the
>> current state. So this patch improves the status quo. Also, both mem_sharing
>> and altp2m is considered experimental/tech_preview, so the fact that their
>> combination is also experimental should be assumed as well. As I explained,
>> with this patch in place there is at least one way they can be safely used
>> together if the user tracks unsharing requirements through mem_access and
>> does unsharing and fixup of the altp2m views manually. There are other ways
>> where it would not be safe as after unsharing we don't know how the user
>> would want things to look in altp2ms. I don't think we want to start
>> guessing about that either so I will not be looking to implement that. So I
>> don't agree with this reasoning being grounds for rejecting this patch that
>> does incrementally improve the current state.
>
> So you keep saying "user"; I assume you mean whatever program is
> sitting in domain 0, which is going to be doing memsharing, altp2m,
> and memaccess stuff all at once?
>
> The altp2m code was not written for that purpose.  It was written for
> *guest* administrators to use within the guest.

That's simply not true. It was written specifically to allow both
usecases - both internal *and* external. Mixing the use-cases was not
envisioned.

  There's no problem
> with finding additional uses for it, as long as the *original* purpose
> doesn't get broken; and this patch most certainly does break things
> for that purpose.

This patch most certainly *does not break* the in-guest usecase by
itself. If the in-guest usecase is used without any mem_sharing going
on, this patch does not have any effect on that usecase.

  Guests using altp2m should not have to know that
> sharing is happening behind their backs; nor should they be required
> to use mem_access to manually fix up what the hypervisor has allowed
> to be broken.

And they are not required. This requirement only applies if the user
mixes in in-guest and external usecases.

>
> If at the moment altp2m + memsharing just plain crashes, then that
> should be fixed; and if the lock-ordering parts of the patch fix that,
> then they should be applied.

Yes, that would be a start at least.

  But the code which make sure that the
> same gfn range cannot both be shared and subject to altp2m must remain
> until they interact properly, with all the corner cases carefully
> thought out.

Well, I don't see that what you suggest is going to happen if
incremental improvements are not allowed in. Anyhow, at this point I'm
going to start carrying out-of-tree patches for Xen in my project and
just resign from my mem_sharing maintainership as I feel like it's
pretty pointless.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 15:18           ` Tamas K Lengyel
@ 2016-07-18 15:27             ` Andrew Cooper
  2016-07-18 16:15               ` George Dunlap
  2016-07-18 16:10             ` George Dunlap
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2016-07-18 15:27 UTC (permalink / raw)
  To: Tamas K Lengyel, George Dunlap; +Cc: Xen-devel, Jan Beulich

On 18/07/16 16:18, Tamas K Lengyel wrote:
> On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>> I could go on in the analysis, but the point is that there's a morass
>>>> of interactions here all of which need to be correct, which this patch
>>>> does not address.  You have a long way to go before sharing and altp2m
>>>> can be safely used on the same gfn ranges.
>>>>
>>> Hi George,
>>> certainly there are cornercases where if the user does not setup things in
>>> the right order things can still go out of whack. My goal with this patch is
>>> not to address all of them. The goal of this patch is to not crash the
>>> hypervisor when the user at least tries to experiment with it, which is the
>>> current state. So this patch improves the status quo. Also, both mem_sharing
>>> and altp2m is considered experimental/tech_preview, so the fact that their
>>> combination is also experimental should be assumed as well. As I explained,
>>> with this patch in place there is at least one way they can be safely used
>>> together if the user tracks unsharing requirements through mem_access and
>>> does unsharing and fixup of the altp2m views manually. There are other ways
>>> where it would not be safe as after unsharing we don't know how the user
>>> would want things to look in altp2ms. I don't think we want to start
>>> guessing about that either so I will not be looking to implement that. So I
>>> don't agree with this reasoning being grounds for rejecting this patch that
>>> does incrementally improve the current state.
>> So you keep saying "user"; I assume you mean whatever program is
>> sitting in domain 0, which is going to be doing memsharing, altp2m,
>> and memaccess stuff all at once?
>>
>> The altp2m code was not written for that purpose.  It was written for
>> *guest* administrators to use within the guest.
> That's simply not true. It was written specifically to allow both
> usecases - both internal *and* external. Mixing the use-cases was not
> envisioned.

Tamas is correct here.  altp2m was specifically designed and implemented
usable by both internal and external entities, irrespective of hardware
support.

Any modifications to the subsystem should maintain this property.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 15:18           ` Tamas K Lengyel
  2016-07-18 15:27             ` Andrew Cooper
@ 2016-07-18 16:10             ` George Dunlap
  2016-07-18 17:06               ` Tamas K Lengyel
  1 sibling, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-18 16:10 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On 18/07/16 16:18, Tamas K Lengyel wrote:
> On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>> I could go on in the analysis, but the point is that there's a morass
>>>> of interactions here all of which need to be correct, which this patch
>>>> does not address.  You have a long way to go before sharing and altp2m
>>>> can be safely used on the same gfn ranges.
>>>>
>>>
>>> Hi George,
>>> certainly there are cornercases where if the user does not setup things in
>>> the right order things can still go out of whack. My goal with this patch is
>>> not to address all of them. The goal of this patch is to not crash the
>>> hypervisor when the user at least tries to experiment with it, which is the
>>> current state. So this patch improves the status quo. Also, both mem_sharing
>>> and altp2m is considered experimental/tech_preview, so the fact that their
>>> combination is also experimental should be assumed as well. As I explained,
>>> with this patch in place there is at least one way they can be safely used
>>> together if the user tracks unsharing requirements through mem_access and
>>> does unsharing and fixup of the altp2m views manually. There are other ways
>>> where it would not be safe as after unsharing we don't know how the user
>>> would want things to look in altp2ms. I don't think we want to start
>>> guessing about that either so I will not be looking to implement that. So I
>>> don't agree with this reasoning being grounds for rejecting this patch that
>>> does incrementally improve the current state.
>>
>> So you keep saying "user"; I assume you mean whatever program is
>> sitting in domain 0, which is going to be doing memsharing, altp2m,
>> and memaccess stuff all at once?
>>
>> The altp2m code was not written for that purpose.  It was written for
>> *guest* administrators to use within the guest.
> 
> That's simply not true. It was written specifically to allow both
> usecases - both internal *and* external. Mixing the use-cases was not
> envisioned.

Right, well in any case I certainly think that having external users of
the feature is a good thing to pursue.

> 
>   There's no problem
>> with finding additional uses for it, as long as the *original* purpose
>> doesn't get broken; and this patch most certainly does break things
>> for that purpose.
> 
> This patch most certainly *does not break* the in-guest usecase by
> itself. If the in-guest usecase is used without any mem_sharing going
> on, this patch does not have any effect on that usecase.
> 
>   Guests using altp2m should not have to know that
>> sharing is happening behind their backs; nor should they be required
>> to use mem_access to manually fix up what the hypervisor has allowed
>> to be broken.
> 
> And they are not required. This requirement only applies if the user
> mixes in in-guest and external usecases.

I keep saying "dom0" and "guest administrator", and you keep saying
"user", as though these are always going to be one and the same person.
That is a valid use case, but it is not the normal use case for Xen.  We
need to make it possible for host administrators to be able to decide to
enable sharing without having to know whether the guest administrator is
using altp2m; and the for the guest administrator to be able to decide
to use altp2m without having to know whether the host administrator is
going to enable sharing.

And even if they are the same person, I think code that Just Works is
better than code that has a lot of corner cases you have to avoid.

>> If at the moment altp2m + memsharing just plain crashes, then that
>> should be fixed; and if the lock-ordering parts of the patch fix that,
>> then they should be applied.
> 
> Yes, that would be a start at least.
> 
>   But the code which make sure that the
>> same gfn range cannot both be shared and subject to altp2m must remain
>> until they interact properly, with all the corner cases carefully
>> thought out.
> 
> Well, I don't see that what you suggest is going to happen if
> incremental improvements are not allowed in. 

Incremental improvements are welcome; but they must not cause
regressions in existing functionality.

The code as it is in the tree right now was intended to allow both
sharing and altp2m to be enabled on the same domain, just not over the
same gfn range.  And it was intended to be robust -- that is, the
sharing code and the altp2m code don't need to be aware of each other
and try not to step on each other's toes; each can just do its own thing
and Xen will make sure that nothing bad happens (by preventing pages
with an altp2m entry from being shared, and unsharing pages for which an
altp2m entry is created).

It sounds like that's broken right now; it should therefore be fixed.
When it is fixed, you'll be able to use both altp2m and sharing on the
same domain; Xen will simply prevent sharing from happening on gfn
ranges with altp2m entries.

An even bigger improvement would be to allow the same gfns to be subject
both to altp2m and sharing at the same time.  But this requires thinking
carefully about all the corner cases and making sure that they all work
correctly.

> Anyhow, at this point I'm
> going to start carrying out-of-tree patches for Xen in my project and
> just resign from my mem_sharing maintainership as I feel like it's
> pretty pointless.

I'm sorry that you're discouraged; all I can say is that I hope you
reconsider.  I'm not trying to block you, and I'm not ignoring your use
case; it's the job of a maintainer to look at *everyone's* use cases and
try to make sure that they are all accommodated in so far as it is
possible.

I'm also trying to make sure that the code you end up using in your
project is robust and reliable.  It seems to me like if the current
implementation was fixed, your life would be a lot easier than if we
accept your patch as it is -- your sharing code could just worry about
sharing, your altp2m code could just worry about whatever it's trying to
do, without having to carefully avoid corner cases or manually fix
things up when corner cases happen.  A bit less sharing would happen,
because fewer pages would be eligible to be shared, but overall you'd
have a lot less bugs and headache.

I invested a lot of my very limited time carefully going through both
sets of code before I answered your e-mail, and I spent a lot of time
trying to explain the kinds of interactions I think will be a problem.
I could have just acked the patch without doing that; but I think that
would have been both less good for you, and less good for the project as
a whole.

Peace,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 15:27             ` Andrew Cooper
@ 2016-07-18 16:15               ` George Dunlap
  2016-07-18 16:22                 ` Tamas K Lengyel
  0 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-18 16:15 UTC (permalink / raw)
  To: Andrew Cooper, Tamas K Lengyel; +Cc: Xen-devel, Jan Beulich

On 18/07/16 16:27, Andrew Cooper wrote:
> On 18/07/16 16:18, Tamas K Lengyel wrote:
>> On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>>> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>>> I could go on in the analysis, but the point is that there's a morass
>>>>> of interactions here all of which need to be correct, which this patch
>>>>> does not address.  You have a long way to go before sharing and altp2m
>>>>> can be safely used on the same gfn ranges.
>>>>>
>>>> Hi George,
>>>> certainly there are cornercases where if the user does not setup things in
>>>> the right order things can still go out of whack. My goal with this patch is
>>>> not to address all of them. The goal of this patch is to not crash the
>>>> hypervisor when the user at least tries to experiment with it, which is the
>>>> current state. So this patch improves the status quo. Also, both mem_sharing
>>>> and altp2m is considered experimental/tech_preview, so the fact that their
>>>> combination is also experimental should be assumed as well. As I explained,
>>>> with this patch in place there is at least one way they can be safely used
>>>> together if the user tracks unsharing requirements through mem_access and
>>>> does unsharing and fixup of the altp2m views manually. There are other ways
>>>> where it would not be safe as after unsharing we don't know how the user
>>>> would want things to look in altp2ms. I don't think we want to start
>>>> guessing about that either so I will not be looking to implement that. So I
>>>> don't agree with this reasoning being grounds for rejecting this patch that
>>>> does incrementally improve the current state.
>>> So you keep saying "user"; I assume you mean whatever program is
>>> sitting in domain 0, which is going to be doing memsharing, altp2m,
>>> and memaccess stuff all at once?
>>>
>>> The altp2m code was not written for that purpose.  It was written for
>>> *guest* administrators to use within the guest.
>> That's simply not true. It was written specifically to allow both
>> usecases - both internal *and* external. Mixing the use-cases was not
>> envisioned.
> 
> Tamas is correct here.  altp2m was specifically designed and implemented
> usable by both internal and external entities, irrespective of hardware
> support.
> 
> Any modifications to the subsystem should maintain this property.

Indeed; it was also designed to be robust when used with sharing
(although it was apparently never tested, because it's currently broken
in that respect).  And it is this property that I'm trying to maintain,
both for internal and external users.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 16:15               ` George Dunlap
@ 2016-07-18 16:22                 ` Tamas K Lengyel
  0 siblings, 0 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-18 16:22 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On Mon, Jul 18, 2016 at 10:15 AM, George Dunlap
<george.dunlap@citrix.com> wrote:
> On 18/07/16 16:27, Andrew Cooper wrote:
>> On 18/07/16 16:18, Tamas K Lengyel wrote:
>>> On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>>>> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>>>> I could go on in the analysis, but the point is that there's a morass
>>>>>> of interactions here all of which need to be correct, which this patch
>>>>>> does not address.  You have a long way to go before sharing and altp2m
>>>>>> can be safely used on the same gfn ranges.
>>>>>>
>>>>> Hi George,
>>>>> certainly there are cornercases where if the user does not setup things in
>>>>> the right order things can still go out of whack. My goal with this patch is
>>>>> not to address all of them. The goal of this patch is to not crash the
>>>>> hypervisor when the user at least tries to experiment with it, which is the
>>>>> current state. So this patch improves the status quo. Also, both mem_sharing
>>>>> and altp2m is considered experimental/tech_preview, so the fact that their
>>>>> combination is also experimental should be assumed as well. As I explained,
>>>>> with this patch in place there is at least one way they can be safely used
>>>>> together if the user tracks unsharing requirements through mem_access and
>>>>> does unsharing and fixup of the altp2m views manually. There are other ways
>>>>> where it would not be safe as after unsharing we don't know how the user
>>>>> would want things to look in altp2ms. I don't think we want to start
>>>>> guessing about that either so I will not be looking to implement that. So I
>>>>> don't agree with this reasoning being grounds for rejecting this patch that
>>>>> does incrementally improve the current state.
>>>> So you keep saying "user"; I assume you mean whatever program is
>>>> sitting in domain 0, which is going to be doing memsharing, altp2m,
>>>> and memaccess stuff all at once?
>>>>
>>>> The altp2m code was not written for that purpose.  It was written for
>>>> *guest* administrators to use within the guest.
>>> That's simply not true. It was written specifically to allow both
>>> usecases - both internal *and* external. Mixing the use-cases was not
>>> envisioned.
>>
>> Tamas is correct here.  altp2m was specifically designed and implemented
>> usable by both internal and external entities, irrespective of hardware
>> support.
>>
>> Any modifications to the subsystem should maintain this property.
>
> Indeed; it was also designed to be robust when used with sharing
> (although it was apparently never tested, because it's currently broken
> in that respect).  And it is this property that I'm trying to maintain,
> both for internal and external users.
>

It was tested and the hypervisor crash was pointed out during the
merging process of altp2m, but it was let in anyway as both
mem_sharing and altp2m was considered experimental and thus this crash
in a cornercase was deemed acceptable.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 16:10             ` George Dunlap
@ 2016-07-18 17:06               ` Tamas K Lengyel
  2016-07-19 11:39                 ` George Dunlap
                                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-18 17:06 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On Mon, Jul 18, 2016 at 10:10 AM, George Dunlap
<george.dunlap@citrix.com> wrote:
> On 18/07/16 16:18, Tamas K Lengyel wrote:
>> On Mon, Jul 18, 2016 at 5:04 AM, George Dunlap <george.dunlap@citrix.com> wrote:
>>> On Fri, Jul 15, 2016 at 3:59 PM, Tamas K Lengyel <tamas@tklengyel.com> wrote:
>>>>> I could go on in the analysis, but the point is that there's a morass
>>>>> of interactions here all of which need to be correct, which this patch
>>>>> does not address.  You have a long way to go before sharing and altp2m
>>>>> can be safely used on the same gfn ranges.
>>>>>
>>>>
>>>> Hi George,
>>>> certainly there are cornercases where if the user does not setup things in
>>>> the right order things can still go out of whack. My goal with this patch is
>>>> not to address all of them. The goal of this patch is to not crash the
>>>> hypervisor when the user at least tries to experiment with it, which is the
>>>> current state. So this patch improves the status quo. Also, both mem_sharing
>>>> and altp2m is considered experimental/tech_preview, so the fact that their
>>>> combination is also experimental should be assumed as well. As I explained,
>>>> with this patch in place there is at least one way they can be safely used
>>>> together if the user tracks unsharing requirements through mem_access and
>>>> does unsharing and fixup of the altp2m views manually. There are other ways
>>>> where it would not be safe as after unsharing we don't know how the user
>>>> would want things to look in altp2ms. I don't think we want to start
>>>> guessing about that either so I will not be looking to implement that. So I
>>>> don't agree with this reasoning being grounds for rejecting this patch that
>>>> does incrementally improve the current state.
>>>
>>> So you keep saying "user"; I assume you mean whatever program is
>>> sitting in domain 0, which is going to be doing memsharing, altp2m,
>>> and memaccess stuff all at once?
>>>
>>> The altp2m code was not written for that purpose.  It was written for
>>> *guest* administrators to use within the guest.
>>
>> That's simply not true. It was written specifically to allow both
>> usecases - both internal *and* external. Mixing the use-cases was not
>> envisioned.
>
> Right, well in any case I certainly think that having external users of
> the feature is a good thing to pursue.
>
>>
>>   There's no problem
>>> with finding additional uses for it, as long as the *original* purpose
>>> doesn't get broken; and this patch most certainly does break things
>>> for that purpose.
>>
>> This patch most certainly *does not break* the in-guest usecase by
>> itself. If the in-guest usecase is used without any mem_sharing going
>> on, this patch does not have any effect on that usecase.
>>
>>   Guests using altp2m should not have to know that
>>> sharing is happening behind their backs; nor should they be required
>>> to use mem_access to manually fix up what the hypervisor has allowed
>>> to be broken.
>>
>> And they are not required. This requirement only applies if the user
>> mixes in in-guest and external usecases.
>
> I keep saying "dom0" and "guest administrator", and you keep saying
> "user", as though these are always going to be one and the same person.
> That is a valid use case, but it is not the normal use case for Xen.  We
> need to make it possible for host administrators to be able to decide to
> enable sharing without having to know whether the guest administrator is
> using altp2m; and the for the guest administrator to be able to decide
> to use altp2m without having to know whether the host administrator is
> going to enable sharing.
>
> And even if they are the same person, I think code that Just Works is
> better than code that has a lot of corner cases you have to avoid.
>
>>> If at the moment altp2m + memsharing just plain crashes, then that
>>> should be fixed; and if the lock-ordering parts of the patch fix that,
>>> then they should be applied.
>>
>> Yes, that would be a start at least.
>>
>>   But the code which make sure that the
>>> same gfn range cannot both be shared and subject to altp2m must remain
>>> until they interact properly, with all the corner cases carefully
>>> thought out.
>>
>> Well, I don't see that what you suggest is going to happen if
>> incremental improvements are not allowed in.
>
> Incremental improvements are welcome; but they must not cause
> regressions in existing functionality.

Existing functionality does not get impaired by this as what happens
right now is a hypervisor crash. I don't see how things can get any
worst than that.

>
> The code as it is in the tree right now was intended to allow both
> sharing and altp2m to be enabled on the same domain, just not over the
> same gfn range.  And it was intended to be robust -- that is, the
> sharing code and the altp2m code don't need to be aware of each other
> and try not to step on each other's toes; each can just do its own thing
> and Xen will make sure that nothing bad happens (by preventing pages
> with an altp2m entry from being shared, and unsharing pages for which an
> altp2m entry is created).
>
> It sounds like that's broken right now; it should therefore be fixed.
> When it is fixed, you'll be able to use both altp2m and sharing on the
> same domain; Xen will simply prevent sharing from happening on gfn
> ranges with altp2m entries.

No, that's incorrect, it's the other way around. If you were to try to
share pages for which you have altp2m entries it will happily oblige.
It will just fail to do the altp2m actions for entries of shared
entries (copying the mapping to the altp2m view, mem_access, etc).

>
> An even bigger improvement would be to allow the same gfns to be subject
> both to altp2m and sharing at the same time.  But this requires thinking
> carefully about all the corner cases and making sure that they all work
> correctly.

And this is exactly what this patch allows you to do. An entry can now
be both shared, get properly copied to altp2m views, allow setting
mem_access in altp2m views, etc. The only situation you have to take
core of is when the type of the entry changes from shared to unshared
as that resets the altp2m views. This is a behavior altp2m already has
when for example you remove pages altogether from the hostp2m. The
altp2m system will just flush all affected altp2m views altogether
regardless if there were remapped gfns and/or mem_access settings,
without sending any notification to the user about it. In this regard
the unsharing event is actually better as we can catch that with
mem_access. So again, I don't see this as being a regression by any
means.

Also, just as the external user could subscribe to events that lead to
unsharing the internal user could as well and preventatively fix
things up. How the internal user becomes aware which entries are
shared and which are not is beyond the scope of what I'm interested in
achieving here but there certainly could be channels where they can
communicate that information to each other if needed.

>
>> Anyhow, at this point I'm
>> going to start carrying out-of-tree patches for Xen in my project and
>> just resign from my mem_sharing maintainership as I feel like it's
>> pretty pointless.
>
> I'm sorry that you're discouraged; all I can say is that I hope you
> reconsider.  I'm not trying to block you, and I'm not ignoring your use
> case; it's the job of a maintainer to look at *everyone's* use cases and
> try to make sure that they are all accommodated in so far as it is
> possible.
>
> I'm also trying to make sure that the code you end up using in your
> project is robust and reliable.  It seems to me like if the current
> implementation was fixed, your life would be a lot easier than if we
> accept your patch as it is -- your sharing code could just worry about
> sharing, your altp2m code could just worry about whatever it's trying to
> do, without having to carefully avoid corner cases or manually fix
> things up when corner cases happen.  A bit less sharing would happen,
> because fewer pages would be eligible to be shared, but overall you'd
> have a lot less bugs and headache.
>
> I invested a lot of my very limited time carefully going through both
> sets of code before I answered your e-mail, and I spent a lot of time
> trying to explain the kinds of interactions I think will be a problem.
> I could have just acked the patch without doing that; but I think that
> would have been both less good for you, and less good for the project as
> a whole.

I certainly appreciate your time spent on this. However, I don't see
the point of being maintainer if my opinion on what constitutes an
improvement of the system just gets overruled. As pretty much my
project is the only use-case where these two systems would be used
together at this point, and since I already require my users to
compile Xen from source it is just easier to go this route then what
you suggest and exploring and remedying all possible ways the two
systems could be misused when setup in ways they were not intended. If
these were considered stable features and not experimental I would
agree, but that's just not the case. So I think both of our time is
better spent doing other things then arguing. I would like to hear the
other maintainers opinion on this matter as well but I'm not
interested in arguing endlessly or initiating or vote, so if the patch
is not allowed in I will accept that decision but I will see no point
in continuing as maintainer of the system.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 17:06               ` Tamas K Lengyel
@ 2016-07-19 11:39                 ` George Dunlap
  2016-07-19 15:20                   ` Tamas K Lengyel
  2016-07-19 13:36                 ` Ian Jackson
  2016-07-19 17:11                 ` George Dunlap
  2 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-19 11:39 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On 18/07/16 18:06, Tamas K Lengyel wrote:
>>> Anyhow, at this point I'm
>>> going to start carrying out-of-tree patches for Xen in my project and
>>> just resign from my mem_sharing maintainership as I feel like it's
>>> pretty pointless.
>>
>> I'm sorry that you're discouraged; all I can say is that I hope you
>> reconsider.  I'm not trying to block you, and I'm not ignoring your use
>> case; it's the job of a maintainer to look at *everyone's* use cases and
>> try to make sure that they are all accommodated in so far as it is
>> possible.
>>
>> I'm also trying to make sure that the code you end up using in your
>> project is robust and reliable.  It seems to me like if the current
>> implementation was fixed, your life would be a lot easier than if we
>> accept your patch as it is -- your sharing code could just worry about
>> sharing, your altp2m code could just worry about whatever it's trying to
>> do, without having to carefully avoid corner cases or manually fix
>> things up when corner cases happen.  A bit less sharing would happen,
>> because fewer pages would be eligible to be shared, but overall you'd
>> have a lot less bugs and headache.
>>
>> I invested a lot of my very limited time carefully going through both
>> sets of code before I answered your e-mail, and I spent a lot of time
>> trying to explain the kinds of interactions I think will be a problem.
>> I could have just acked the patch without doing that; but I think that
>> would have been both less good for you, and less good for the project as
>> a whole.
> 
> I certainly appreciate your time spent on this. However, I don't see
> the point of being maintainer if my opinion on what constitutes an
> improvement of the system just gets overruled.

You're not being overruled; you're just being asked to make a case for a
change you want to make to an area of code that I maintain (the p2m
code).  And the discussion is by no means over; I started the most
recent discussion by saying "Correct me if I'm wrong", and it looks like
there are still a number of places where we have different views of the
facts of the matter.  Once we've established those we may end up with
closer opinions.

Working together means that sometimes you have to spend the time and
effort to understand where other people are coming from -- what they
think is important, what they think is true; and then working with that
-- correcting them on places where they have misconceptions (or
double-checking your own beliefs to make sure that you're not mistaken);
communicating what it is that you think is important, and then trying to
come up with a way forward that takes everyone's values into account, or
convincing someone that a particular way really is the best way forward
(which may mean convincing them to change their priorities somewhat).

I am sorry that the tone of this conversation has heated up.  But the
reason I've been "raising my voice" as it were is because I've been
trying to ask questions and raise potential issues, and I feel like
you've been just hand-waving them away.  You may be 100% right, but it
is my duty as the maintainer of the p2m code to not accept code until
I'm reasonably convinced that it's a good way forward.

> I would like to hear the
> other maintainers opinion on this matter as well but I'm not
> interested in arguing endlessly or initiating or vote, so if the patch
> is not allowed in I will accept that decision but I will see no point
> in continuing as maintainer of the system.

At a basic level, the other maintainers will agree that I shouldn't
accept code unless I am convinced it's for the good of the project.  But
since this is a technical issue, before anyone would express an opinion
to ask me to change my mind, they would want a more complete view of the
facts of the matter -- facts which you and I are still in the process of
sorting out.

> As pretty much my
> project is the only use-case where these two systems would be used
> together at this point, and since I already require my users to
> compile Xen from source it is just easier to go this route then what
> you suggest and exploring and remedying all possible ways the two
> systems could be misused when setup in ways they were not intended. If
> these were considered stable features and not experimental I would
> agree, but that's just not the case. So I think both of our time is
> better spent doing other things then arguing. 

So a lot of points here.

You have no idea what other projects are doing.  Lots of people take the
Xen code, do something with it internally, and and we never hear from
them.  Or maybe they're in a start-up in stealth mode and will announce
themselves in due course.  If you step down from being a maintainer and
stop engaging with the community you'll be in the same position.

There's a very obvious other use case which I've been talking about from
the beginning: A host administrator / cloud provider / user wants to
both 1) use page sharing to improve memory efficiency, and 2) allow
tools using the altp2m feature to run internally within the guest.
(Given that I have brought it up several times very explicitly, I am in
fact a bit at a loss to understand why you would say your project is the
"only use case where these two systems would be used together".)

Even it were true that your project is the only use case, making the
code so that it only works for your use case is a sure-fire way to
guarantee that there never are any other use cases.

I'm not insisting that you either leave the code broken or make sharing
and altp2m over the same gfn ranges fully robust.  What I've been
suggesting you do is to fix the code so that it works as it is
*intended* to work currently -- that is, that gfns refered to by altp2ms
will be prevented from being shared.  Maybe that's not possible, or
maybe that's not suitable for your use case; but you haven't yet argued
either of those points.

As I've said, it seems to me like the option you've chosen isn't even
the best option *for your project*.  You're giving *yourself* an
interface which is inherently fragile, and for which you as the consumer
of that interface need to think carefully about all the corner cases and
then either avoid them or use mem_sharing to work around them.  Is that
actually more difficult than just doing the work up front to make sure
that the corner cases are handled in a robust manner by Xen?

Finally, yes one of the benefits of being a maintainer is that you can
shepherd and develop the code that you maintain to make sure that it
doesn't degrade and continues to work for use cases that you care about.
 But one of the responsibilities of that is looking not only for your
own use case, but to try to make it a useful tool for the community as a
whole.

Working with the community certainly is a lot of work, but it's a *lot*
less work than writing your own hypervisor.  It's also less work, in the
end, than keeping a massive patchset out-of-tree and having to rebase
every release.  And it often ends up improving and sharpening the code
you wrote anyway.

Peace,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 17:06               ` Tamas K Lengyel
  2016-07-19 11:39                 ` George Dunlap
@ 2016-07-19 13:36                 ` Ian Jackson
  2016-07-19 15:31                   ` Tamas K Lengyel
  2016-07-19 17:11                 ` George Dunlap
  2 siblings, 1 reply; 22+ messages in thread
From: Ian Jackson @ 2016-07-19 13:36 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich, Xen-devel

Tamas: George brought this thread to my attention.  I'm sorry that you
feel blocked and/or overruled.  The hypervisor MM code is not my area
of expertise, but I have a keen interest in seeing a good, productive
and friendly Xen community.  I definitely don't want to see you pushed
away, and driven to maintain an out-of-tree patchset.

Reading through your mails there seem to still have unresolved
detailed technical disagreements between you and George about the
existing behaviours in Xen, and the effects of your proposed changes.

Right now I would like to ask both you and George to sort out those
factual disagreements.  I expect that you can do so.  I hope that then
the way forward will be clear: ie that you and George willbe in
agreement about the direction in which the code should be going.

I think that would be better than getting into a more abstract
conversation about which use cases exist or are important, or an
argument about areas of responsibility or authority.

If either of you feel that you aren't able to agree on the facts, or
that that conversation is not proceeding constructively, I'm be happy
to try to help.  You can contact me by email in public or private, or
find me as Diziet on irc.

(In a complex codebase like Xen there will always be overlap or
interference between different maintainers' bailiwicks, so we
definitely do need to be able to come to some kind of agreement,
rather than everyone insisting on their own authority in what they
regard as their own area.)

Regards,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-19 11:39                 ` George Dunlap
@ 2016-07-19 15:20                   ` Tamas K Lengyel
  0 siblings, 0 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-19 15:20 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On Tue, Jul 19, 2016 at 5:39 AM, George Dunlap <george.dunlap@citrix.com> wrote:
> On 18/07/16 18:06, Tamas K Lengyel wrote:
>>>> Anyhow, at this point I'm
>>>> going to start carrying out-of-tree patches for Xen in my project and
>>>> just resign from my mem_sharing maintainership as I feel like it's
>>>> pretty pointless.
>>>
>>> I'm sorry that you're discouraged; all I can say is that I hope you
>>> reconsider.  I'm not trying to block you, and I'm not ignoring your use
>>> case; it's the job of a maintainer to look at *everyone's* use cases and
>>> try to make sure that they are all accommodated in so far as it is
>>> possible.
>>>
>>> I'm also trying to make sure that the code you end up using in your
>>> project is robust and reliable.  It seems to me like if the current
>>> implementation was fixed, your life would be a lot easier than if we
>>> accept your patch as it is -- your sharing code could just worry about
>>> sharing, your altp2m code could just worry about whatever it's trying to
>>> do, without having to carefully avoid corner cases or manually fix
>>> things up when corner cases happen.  A bit less sharing would happen,
>>> because fewer pages would be eligible to be shared, but overall you'd
>>> have a lot less bugs and headache.
>>>
>>> I invested a lot of my very limited time carefully going through both
>>> sets of code before I answered your e-mail, and I spent a lot of time
>>> trying to explain the kinds of interactions I think will be a problem.
>>> I could have just acked the patch without doing that; but I think that
>>> would have been both less good for you, and less good for the project as
>>> a whole.
>>
>> I certainly appreciate your time spent on this. However, I don't see
>> the point of being maintainer if my opinion on what constitutes an
>> improvement of the system just gets overruled.
>
> You're not being overruled; you're just being asked to make a case for a
> change you want to make to an area of code that I maintain (the p2m
> code).  And the discussion is by no means over; I started the most
> recent discussion by saying "Correct me if I'm wrong", and it looks like
> there are still a number of places where we have different views of the
> facts of the matter.  Once we've established those we may end up with
> closer opinions.
>
> Working together means that sometimes you have to spend the time and
> effort to understand where other people are coming from -- what they
> think is important, what they think is true; and then working with that
> -- correcting them on places where they have misconceptions (or
> double-checking your own beliefs to make sure that you're not mistaken);
> communicating what it is that you think is important, and then trying to
> come up with a way forward that takes everyone's values into account, or
> convincing someone that a particular way really is the best way forward
> (which may mean convincing them to change their priorities somewhat).
>
> I am sorry that the tone of this conversation has heated up.  But the
> reason I've been "raising my voice" as it were is because I've been
> trying to ask questions and raise potential issues, and I feel like
> you've been just hand-waving them away.  You may be 100% right, but it
> is my duty as the maintainer of the p2m code to not accept code until
> I'm reasonably convinced that it's a good way forward.

By no means I meant to heat-up the conversation or hand-wave your
concerns away. I do understand what it takes to work with the
community and that it takes cooperation for that to happen. I was not
hand-waving your concerns away but describing how the two systems
could interact safely together while agreeing with you that yes, there
are still scenarios where it would not be wise to turn two
experimental systems on together.

>
>> I would like to hear the
>> other maintainers opinion on this matter as well but I'm not
>> interested in arguing endlessly or initiating or vote, so if the patch
>> is not allowed in I will accept that decision but I will see no point
>> in continuing as maintainer of the system.
>
> At a basic level, the other maintainers will agree that I shouldn't
> accept code unless I am convinced it's for the good of the project.  But
> since this is a technical issue, before anyone would express an opinion
> to ask me to change my mind, they would want a more complete view of the
> facts of the matter -- facts which you and I are still in the process of
> sorting out.

Both of these systems are fairly complicated and not many people have
been looking at them in-depth, so I most certainly appreciate the time
you spent on reviewing thus far. But your conclusion that there is a
"long way to go here" tells me that an arbitrary criteria is getting
pushed on me that I don't even know how to address. The altp2m system
got merged while it was known that it crashes the hypervisor when
mem_sharing is used, but now when an incremental fix introduces at
least one setup where they can be used together, all of a sudden that
is regression. I understand that it would be better to address all
cases and make the system as robust as possible. It's just way beyond
what I can do here.

>
>> As pretty much my
>> project is the only use-case where these two systems would be used
>> together at this point, and since I already require my users to
>> compile Xen from source it is just easier to go this route then what
>> you suggest and exploring and remedying all possible ways the two
>> systems could be misused when setup in ways they were not intended. If
>> these were considered stable features and not experimental I would
>> agree, but that's just not the case. So I think both of our time is
>> better spent doing other things then arguing.
>
> So a lot of points here.
>
> You have no idea what other projects are doing.  Lots of people take the
> Xen code, do something with it internally, and and we never hear from
> them.  Or maybe they're in a start-up in stealth mode and will announce
> themselves in due course.  If you step down from being a maintainer and
> stop engaging with the community you'll be in the same position.

Surely, I didn't imply there would or could be noone else who would be
interested in these two systems working together. All I meant was that
my project is the only known opensource project at this time that even
uses mem_sharing and altp2m. If anyone else would have been interested
in getting the two systems working together othen then me they
probably would have complained already that hey this crashes the
hypervisor. My point being that at this point the impact of this patch
is likely really low.

>
> There's a very obvious other use case which I've been talking about from
> the beginning: A host administrator / cloud provider / user wants to
> both 1) use page sharing to improve memory efficiency, and 2) allow
> tools using the altp2m feature to run internally within the guest.
> (Given that I have brought it up several times very explicitly, I am in
> fact a bit at a loss to understand why you would say your project is the
> "only use case where these two systems would be used together".)
>
> Even it were true that your project is the only use case, making the
> code so that it only works for your use case is a sure-fire way to
> guarantee that there never are any other use cases.

Well, I do make sure of that in my project by way of a custom XSM
policy that disables the guest from being able to do altp2m on itself.
I don't know what other way you could envision to enforce such
separation then documenting the known behaviors of the system and
trusting that the brave enough administrator who turns on experimental
features is ready to experiment with them.

> I'm not insisting that you either leave the code broken or make sharing
> and altp2m over the same gfn ranges fully robust.  What I've been
> suggesting you do is to fix the code so that it works as it is
> *intended* to work currently -- that is, that gfns refered to by altp2ms
> will be prevented from being shared.  Maybe that's not possible, or
> maybe that's not suitable for your use case; but you haven't yet argued
> either of those points.

OK, now this is a technical point that can be addressed and has not
been articulated as such. From my perspective this is fine, my usecase
assumes the domain is completely memshared before any altp2m
operations are done so I see no problem with this.

>
> As I've said, it seems to me like the option you've chosen isn't even
> the best option *for your project*.  You're giving *yourself* an
> interface which is inherently fragile, and for which you as the consumer
> of that interface need to think carefully about all the corner cases and
> then either avoid them or use mem_sharing to work around them.  Is that
> actually more difficult than just doing the work up front to make sure
> that the corner cases are handled in a robust manner by Xen?
>
> Finally, yes one of the benefits of being a maintainer is that you can
> shepherd and develop the code that you maintain to make sure that it
> doesn't degrade and continues to work for use cases that you care about.
>  But one of the responsibilities of that is looking not only for your
> own use case, but to try to make it a useful tool for the community as a
> whole.
>
> Working with the community certainly is a lot of work, but it's a *lot*
> less work than writing your own hypervisor.  It's also less work, in the
> end, than keeping a massive patchset out-of-tree and having to rebase
> every release.  And it often ends up improving and sharpening the code
> you wrote anyway.
>

Yes, I absolutely agree and that's why I've been working hard with the
community this past years to improve a lot of half - or completely -
abandoned and undocumented portions of the system. I certainly don't
like carrying out-of-tree patches and it's not my first choice.
However, it is unreasonable to expect complete reworks and/or
addressing all possible misuse cases on experimental parts that have
very low utility for most people.

Describing and fixing all possible ways how experiments can go wrong
with experimental systems is unreasonable and is not what I agreed to
when I accepted to maintain mem_sharing. I've accepted it with the
understanding that I can do "odd fixes" and incremental improvements
here and there. I understand if this may not be acceptable for your
components but since mem_sharing is inherently entangled with it, then
that means that doing odd fixes is not possible so keeping that hat as
is pointless. I certainly hope we can do incremental improvements as I
don't see anyone picking this up anytime in the near future.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-19 13:36                 ` Ian Jackson
@ 2016-07-19 15:31                   ` Tamas K Lengyel
  0 siblings, 0 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-19 15:31 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, George Dunlap, Jan Beulich, Xen-devel

On Tue, Jul 19, 2016 at 7:36 AM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> Tamas: George brought this thread to my attention.  I'm sorry that you
> feel blocked and/or overruled.  The hypervisor MM code is not my area
> of expertise, but I have a keen interest in seeing a good, productive
> and friendly Xen community.  I definitely don't want to see you pushed
> away, and driven to maintain an out-of-tree patchset.
>
> Reading through your mails there seem to still have unresolved
> detailed technical disagreements between you and George about the
> existing behaviours in Xen, and the effects of your proposed changes.
>
> Right now I would like to ask both you and George to sort out those
> factual disagreements.  I expect that you can do so.  I hope that then
> the way forward will be clear: ie that you and George willbe in
> agreement about the direction in which the code should be going.
>
> I think that would be better than getting into a more abstract
> conversation about which use cases exist or are important, or an
> argument about areas of responsibility or authority.
>
> If either of you feel that you aren't able to agree on the facts, or
> that that conversation is not proceeding constructively, I'm be happy
> to try to help.  You can contact me by email in public or private, or
> find me as Diziet on irc.
>
> (In a complex codebase like Xen there will always be overlap or
> interference between different maintainers' bailiwicks, so we
> definitely do need to be able to come to some kind of agreement,
> rather than everyone insisting on their own authority in what they
> regard as their own area.)
>
> Regards,
> Ian.

Thanks Ian, I agree and hope we can get back to technical issues as
well. I certainly didn't mean to escalate this. I do hope we can get
to the bottom of what concerns are applicable to this change and
discuss what we can do to address those in a reasonable fashion.

Best,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-18 17:06               ` Tamas K Lengyel
  2016-07-19 11:39                 ` George Dunlap
  2016-07-19 13:36                 ` Ian Jackson
@ 2016-07-19 17:11                 ` George Dunlap
  2016-07-19 19:12                   ` Tamas K Lengyel
  2 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-07-19 17:11 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: Andrew Cooper, Ian Jackson, Jan Beulich, Xen-devel

On 18/07/16 18:06, Tamas K Lengyel wrote:
>> Incremental improvements are welcome; but they must not cause
>> regressions in existing functionality.
> 
> Existing functionality does not get impaired by this as what happens
> right now is a hypervisor crash. I don't see how things can get any
> worst than that.

Also from another thread:
> If anyone else would have been interested in getting the two systems
> working together othen then me they probably would have complained
> already that hey this crashes the hypervisor. My point being that at
> this point the impact of this patch is likely really low.

From a user perspective, "failing intermittently in some strange and
unpredictable way" is definitely worse than a hypervisor crash. :-)

My concern is that someone will start using guests which use the altp2m
interface internally, and that will all work; and then maybe separately
they will start doing some sort of memory sharing between guests, and
that will all work; and then at some point they'll do memory sharing on
a guest using the altp2m functionality internally, and suddenly they'll
get strange intermittent errors where things don't behave the way they
expect and they don't know why.  A hypervisor crash that tells them
exactly what code has the problem is definitely preferable.

>> The code as it is in the tree right now was intended to allow both
>> sharing and altp2m to be enabled on the same domain, just not over the
>> same gfn range.  And it was intended to be robust -- that is, the
>> sharing code and the altp2m code don't need to be aware of each other
>> and try not to step on each other's toes; each can just do its own thing
>> and Xen will make sure that nothing bad happens (by preventing pages
>> with an altp2m entry from being shared, and unsharing pages for which an
>> altp2m entry is created).
>>
>> It sounds like that's broken right now; it should therefore be fixed.
>> When it is fixed, you'll be able to use both altp2m and sharing on the
>> same domain; Xen will simply prevent sharing from happening on gfn
>> ranges with altp2m entries.
> 
> No, that's incorrect, it's the other way around. If you were to try to
> share pages for which you have altp2m entries it will happily oblige.
> It will just fail to do the altp2m actions for entries of shared
> entries (copying the mapping to the altp2m view, mem_access, etc).

It's quite possible I missed something, but that's not how I read the
code.  Before sharing a page you have to have to call
mem_sharing_nominate_page(), which calls page_make_sharable().
page_make_sharable() will make sure that it has exactly the expected
number of references; which for gfns is 2 and for grant references is 4.

When you map an mfn into an altp2m of a different gfn, you'll increase
the reference count.  So it appears to me that if you attempt to share a
page which is mapped in an altp2m, then the nominate operation will fail
(with -E2BIG, of all things).

Am I mistaken about that?

(BTB this would probably still be the case even after your patch.)

Also, as far as I can tell, "It will just fail to do the altp2m actions
for entries of shared entries" is not true; instead, the page will be
un-shared and the altp2m action will then take place.  Is this not the case?

>> An even bigger improvement would be to allow the same gfns to be subject
>> both to altp2m and sharing at the same time.  But this requires thinking
>> carefully about all the corner cases and making sure that they all work
>> correctly.
> 
> And this is exactly what this patch allows you to do. An entry can now
> be both shared, get properly copied to altp2m views, allow setting
> mem_access in altp2m views, etc. The only situation you have to take
> core of is when the type of the entry changes from shared to unshared
> as that resets the altp2m views.

I described another situation you have to be careful of in an earlier
e-mail:
- host gfn A is marked "shared"
- altp2m gfn O is mapped to gfn A (thus also marked as 'shared')
- Guest writes to gfn O, Xen attempts to unshare the page.

In this circumstance, the fault will ends up in
__mem_sharing_unshare_page(), which will calls rmap_retrieve(d, O, mA).
This returns NULL because gfn O was never put in the reverse map, and
you BUG().

Again, am I misreading what would happen?

I'm pretty sure if I went looking I could find some more situations you
need to avoid to prevent problems.

So the next big missing piece of information in this discussion is
exactly what you do need from this system.  You're using altp2m and
mem_sharing (and mem_access) on the same domain, that's obvious.  Which
features of altp2m are you using -- are you mainly using the mem_access
changes, or are you also using the gfn mapping functionality?

Also, how important is it that pages using altp2m functionality not be
un-shared -- i.e., what proportion of a guest's pages do you expect to
be shared, and what proportion do you need to perform altp2m operations on?

So there are several things we could do here:

1. Allow mem_sharing and altp2m functionality co-existing by avoiding
each other: the same domain can use both, but the same gfn cannot be
both shared and have altp2m entries.  This is essentially fixing the bug
in the current code: Adding an altp2m to a gfn unshares it; and gfns
which have altp2m entries cannot be shared.

This should, I *think*, be fairly straightforward; it might even only
require the lock-reordering part of your patch.

If your use-case only uses altp2m functionality on a small number of
pages, then this might be a not-too-difficult option that would solve
your problem while not introducing any pitfalls for future users to all
into.  If your use case requires large amounts of sharing *and* large
amounts of altp2m entries, this is probably not going to work for you.

2. Allow mem_sharing on altp2m entries with limited functionality.
Specifically: Do not unshare if an altp2m entry only has mem_access
restrictions set; only unshare it if the gfn is remapped.

This would require some very careful thinking about the different corner
cases that might happen, but I don't think it would be quite the morass
that full altp2m support (with gfn remapping) would involve.

If your use-case only uses the mem_access altp2m functionality, this
might be a solution that's a reasonable amount of work.

3. Allow mem_sharing on altp2m entries with full functionality in a
robust manner, making sure all the corner cases are handled correctly.

This would obviously be quite a bit more work.  I don't think it would
actually be wasted, since it should allow you to have full sharing and
full altp2m access without your dom0 code having to worry about all the
corner cases and interactions.  But I can certainly see why you're not
keen on the prospect of doing so.

4. Allow mem_sharing on altp2m entries with full functionality, but with
corner cases the user of those interfaces must avoid or mitigate; and no
restrictions.

This is what your current patch does; and as I've said, I don't think
this is suitable because it lays a trap for future people to fall into
if they end up enabling both.

But after a discussion with IanJ, I've got a couple of variations of
this one.  The variations all involve different ways of making sure that
the "full functionality" is only available when there is a caller that
knows about the risks and will actively try to avoid them.

4a. Add a per-domain flag to decide whether to allow "unsafe" altp2m /
mem_sharing interactions.  If it's clear, then only "safe" operations
are allowed (i.e., it would start with #1, but if someone wanted they
could implement #2 or #3).  If it's set, then things behave as in #4.
(We probably should try to prevent hypervisor crashes due to unsharing
events, however.)

4b. In the altp2m code, as #4; but add code somewhere else to prevent
the entire features from being turned on in a domain unless the "unsafe"
option is requested.  That is, by default a guest with mem_sharing
enabled would not be able to use altp2m at all; and a guest with altp2m
enabled would not be able to use mem_sharing at all.  Only with the
"unsafe" flag set would be allowed to have both altp2m and mem_sharing
enabled at the same time.

Both 4a and 4b seem like they would be relatively straightforward to
implement; like they would fit your use case, but avoid laying a trap
for people in the future.

What do you think about those options?

If #1 is acceptable to you, I continue to think that's probably the
combination of most robust and simplest to achieve.  I would recommend 2
above 4a and 4b if feasible; and if not, I would probably choose 4a over
4b because it allows for incremental improvement.  But any of {1, 2, 4a,
4b} would work for me.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared
  2016-07-19 17:11                 ` George Dunlap
@ 2016-07-19 19:12                   ` Tamas K Lengyel
  0 siblings, 0 replies; 22+ messages in thread
From: Tamas K Lengyel @ 2016-07-19 19:12 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Ian Jackson, Jan Beulich, Xen-devel

On Tue, Jul 19, 2016 at 11:11 AM, George Dunlap
<george.dunlap@citrix.com> wrote:
> On 18/07/16 18:06, Tamas K Lengyel wrote:
>>> Incremental improvements are welcome; but they must not cause
>>> regressions in existing functionality.
>>
>> Existing functionality does not get impaired by this as what happens
>> right now is a hypervisor crash. I don't see how things can get any
>> worst than that.
>
> Also from another thread:
>> If anyone else would have been interested in getting the two systems
>> working together othen then me they probably would have complained
>> already that hey this crashes the hypervisor. My point being that at
>> this point the impact of this patch is likely really low.
>
> From a user perspective, "failing intermittently in some strange and
> unpredictable way" is definitely worse than a hypervisor crash. :-)
>
> My concern is that someone will start using guests which use the altp2m
> interface internally, and that will all work; and then maybe separately
> they will start doing some sort of memory sharing between guests, and
> that will all work; and then at some point they'll do memory sharing on
> a guest using the altp2m functionality internally, and suddenly they'll
> get strange intermittent errors where things don't behave the way they
> expect and they don't know why.  A hypervisor crash that tells them
> exactly what code has the problem is definitely preferable.

Well, IMHO that's where documenting the expected use-case and the
known corner-cases comes into play.

>
>>> The code as it is in the tree right now was intended to allow both
>>> sharing and altp2m to be enabled on the same domain, just not over the
>>> same gfn range.  And it was intended to be robust -- that is, the
>>> sharing code and the altp2m code don't need to be aware of each other
>>> and try not to step on each other's toes; each can just do its own thing
>>> and Xen will make sure that nothing bad happens (by preventing pages
>>> with an altp2m entry from being shared, and unsharing pages for which an
>>> altp2m entry is created).
>>>
>>> It sounds like that's broken right now; it should therefore be fixed.
>>> When it is fixed, you'll be able to use both altp2m and sharing on the
>>> same domain; Xen will simply prevent sharing from happening on gfn
>>> ranges with altp2m entries.
>>
>> No, that's incorrect, it's the other way around. If you were to try to
>> share pages for which you have altp2m entries it will happily oblige.
>> It will just fail to do the altp2m actions for entries of shared
>> entries (copying the mapping to the altp2m view, mem_access, etc).
>
> It's quite possible I missed something, but that's not how I read the
> code.  Before sharing a page you have to have to call
> mem_sharing_nominate_page(), which calls page_make_sharable().
> page_make_sharable() will make sure that it has exactly the expected
> number of references; which for gfns is 2 and for grant references is 4.
>
> When you map an mfn into an altp2m of a different gfn, you'll increase
> the reference count.  So it appears to me that if you attempt to share a
> page which is mapped in an altp2m, then the nominate operation will fail
> (with -E2BIG, of all things).
>
> Am I mistaken about that?

Hm, no you may be right. I was thinking of the type checking only. If
the reference count prevents pages with alt2pm entries from being
shared  - going from p2m_ram_rw -> p2m_ram_shared - then from my
perspective that is fine and I'm not planning on changing that. What
I'm trying to get working is if the type is already p2m_ram_shared and
is going from p2m_ram_shared -> p2m_ram_rw. I would also like to be
able to do mem_access settings for the p2m_ram_shared type pte in an
altp2m view.

>
> (BTB this would probably still be the case even after your patch.)
>
> Also, as far as I can tell, "It will just fail to do the altp2m actions
> for entries of shared entries" is not true; instead, the page will be
> un-shared and the altp2m action will then take place.  Is this not the case?

So right now when the entry is p2m_ram_shared it will crash the
hypervisor because of the lock ordering issue during unsharing. If the
lock ordering issue is fixed, the unsharing event will result in the
altp2m propagate change taking the p2m setting from the hostp2m and
copying it to all affected altp2m views, overwriting any setting that
may have been stored there. This is the situation that can be
monitored with mem_access so that the user can perform the unsharing
and recreating the necessary altp2m settings manually. What I mean in
the quoted sentence is that the altp2m ops do a type-check right now,
so if you shared a page before, the type check will make all altp2m
ops fail on that entry. So for example if you have a shared pte, and
then try to do altp2m mem_access setting on it, it will fail.

>
>>> An even bigger improvement would be to allow the same gfns to be subject
>>> both to altp2m and sharing at the same time.  But this requires thinking
>>> carefully about all the corner cases and making sure that they all work
>>> correctly.
>>
>> And this is exactly what this patch allows you to do. An entry can now
>> be both shared, get properly copied to altp2m views, allow setting
>> mem_access in altp2m views, etc. The only situation you have to take
>> core of is when the type of the entry changes from shared to unshared
>> as that resets the altp2m views.
>
> I described another situation you have to be careful of in an earlier
> e-mail:
> - host gfn A is marked "shared"
> - altp2m gfn O is mapped to gfn A (thus also marked as 'shared')
> - Guest writes to gfn O, Xen attempts to unshare the page.
>
> In this circumstance, the fault will ends up in
> __mem_sharing_unshare_page(), which will calls rmap_retrieve(d, O, mA).
> This returns NULL because gfn O was never put in the reverse map, and
> you BUG().
>
> Again, am I misreading what would happen?

Yes, remapping gfns to point to other shared entries can get you in
trouble because the default unsharing op in Xen doesn't have a way to
handle that. However, it can also be worked with as I suggested by
making gfn O in the altp2m view read/execute-only with mem_access. At
that point the external user has to option to switch to another view
where gfn O is not pointing to a shared entry, or change the mapping
again, or do a number of other things to prevent triggering the BUG().
So the question is, how much hand-holding and restrictions we want to
apply to prevent users from experimenting with exotic setups here?

>
> I'm pretty sure if I went looking I could find some more situations you
> need to avoid to prevent problems.
>
> So the next big missing piece of information in this discussion is
> exactly what you do need from this system.  You're using altp2m and
> mem_sharing (and mem_access) on the same domain, that's obvious.  Which
> features of altp2m are you using -- are you mainly using the mem_access
> changes, or are you also using the gfn mapping functionality?

I'm using both mem_access and remapping in altp2m views. However, in
my setup the remapped entries are always execute-only (have hidden
breakpoints injected). I already have to trap when the hostp2m entries
get write events as I need to update the shadow copy and reapply all
the injected breakpoints. So from my perspective the sharing will just
reduce the memory footprint of the domain as the hostp2m entry can
remain shared as long as possible, the unsharing itself have no
negative impact at all as I already do the copy/update of the shadow
page manually.

>
> Also, how important is it that pages using altp2m functionality not be
> un-shared -- i.e., what proportion of a guest's pages do you expect to
> be shared, and what proportion do you need to perform altp2m operations on?

I'm doing full VM deduplication so close to all pages start with
p2m_ram_shared. Thus pretty much all pages need to pass the altp2m
lazy copy operation with a shared page. For the pages that do have
mem_access/remapped entries, I already trap write events on the
hostp2m so I can do the unsharing manually (mmaping the hostp2m page
that will unshare + overwrite all affected altp2m views). From there I
can reapply any necessary mem_access/remapping without any problems.

>
> So there are several things we could do here:
>
> 1. Allow mem_sharing and altp2m functionality co-existing by avoiding
> each other: the same domain can use both, but the same gfn cannot be
> both shared and have altp2m entries.  This is essentially fixing the bug
> in the current code: Adding an altp2m to a gfn unshares it; and gfns
> which have altp2m entries cannot be shared.
>
> This should, I *think*, be fairly straightforward; it might even only
> require the lock-reordering part of your patch.
>
> If your use-case only uses altp2m functionality on a small number of
> pages, then this might be a not-too-difficult option that would solve
> your problem while not introducing any pitfalls for future users to all
> into.  If your use case requires large amounts of sharing *and* large
> amounts of altp2m entries, this is probably not going to work for you.

Well, this is not as simple as that. The altp2m lazy-copy operation
automatically populates the currently active altp2m view by copying
the pte from the hostp2m. This operation must succeed for my usecase
with shared pages. As for applying mem_access permissions in altp2m
views resulting in automatic unsharing, this is going to result in a
good chunk of pages being unshared for no reason other then the fact
that the external user has to be prepared to handle the unsharing
manually as I described, but I can live with that, it's still better
then nothing. Preventing remapping pointing to a shared page I can
live with, this is not part of my usecase.

> 2. Allow mem_sharing on altp2m entries with limited functionality.
> Specifically: Do not unshare if an altp2m entry only has mem_access
> restrictions set; only unshare it if the gfn is remapped.
>
> This would require some very careful thinking about the different corner
> cases that might happen, but I don't think it would be quite the morass
> that full altp2m support (with gfn remapping) would involve.
>
> If your use-case only uses the mem_access altp2m functionality, this
> might be a solution that's a reasonable amount of work.

This is not likely to be a workable setup for my usecase.

>
> 3. Allow mem_sharing on altp2m entries with full functionality in a
> robust manner, making sure all the corner cases are handled correctly.
>
> This would obviously be quite a bit more work.  I don't think it would
> actually be wasted, since it should allow you to have full sharing and
> full altp2m access without your dom0 code having to worry about all the
> corner cases and interactions.  But I can certainly see why you're not
> keen on the prospect of doing so.

Right, this would be the ultimate goal. However, realistically this is
not going to happen due to how much time and effort it involves.

>
> 4. Allow mem_sharing on altp2m entries with full functionality, but with
> corner cases the user of those interfaces must avoid or mitigate; and no
> restrictions.
>
> This is what your current patch does; and as I've said, I don't think
> this is suitable because it lays a trap for future people to fall into
> if they end up enabling both.
>
> But after a discussion with IanJ, I've got a couple of variations of
> this one.  The variations all involve different ways of making sure that
> the "full functionality" is only available when there is a caller that
> knows about the risks and will actively try to avoid them.
>
> 4a. Add a per-domain flag to decide whether to allow "unsafe" altp2m /
> mem_sharing interactions.  If it's clear, then only "safe" operations
> are allowed (i.e., it would start with #1, but if someone wanted they
> could implement #2 or #3).  If it's set, then things behave as in #4.
> (We probably should try to prevent hypervisor crashes due to unsharing
> events, however.)
>
> 4b. In the altp2m code, as #4; but add code somewhere else to prevent
> the entire features from being turned on in a domain unless the "unsafe"
> option is requested.  That is, by default a guest with mem_sharing
> enabled would not be able to use altp2m at all; and a guest with altp2m
> enabled would not be able to use mem_sharing at all.  Only with the
> "unsafe" flag set would be allowed to have both altp2m and mem_sharing
> enabled at the same time.
>
> Both 4a and 4b seem like they would be relatively straightforward to
> implement; like they would fit your use case, but avoid laying a trap
> for people in the future.
>
> What do you think about those options?
>
> If #1 is acceptable to you, I continue to think that's probably the
> combination of most robust and simplest to achieve.  I would recommend 2
> above 4a and 4b if feasible; and if not, I would probably choose 4a over
> 4b because it allows for incremental improvement.  But any of {1, 2, 4a,
> 4b} would work for me.
>

Let's try to figure out if we can get #1 in an acceptable shape. The
only point where completely avoiding each other is not possible is
during the lazy copy operation where altp2m copies the settings to the
currently active view. As long as there are no mem_access/remapping
going on for the pte, the default unsharing operation and copying to
the altp2m view "just works". As for the other altp2m cases, I can
live with unsharing the gfn's and also preventing sharing to be
performed on them subsequently.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-07-19 19:12 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26  3:55 [PATCH v4] altp2m: Allow the hostp2m entries to be of type p2m_ram_shared Tamas K Lengyel
2016-05-26 10:39 ` George Dunlap
2016-05-26 16:17   ` Tamas K Lengyel
2016-06-11 17:55     ` Tamas K Lengyel
2016-06-13  9:28       ` George Dunlap
2016-06-13 17:20         ` Tamas K Lengyel
2016-07-12 17:29           ` Tamas K Lengyel
2016-07-15  8:57     ` George Dunlap
2016-07-15 14:59       ` Tamas K Lengyel
2016-07-18 11:04         ` George Dunlap
2016-07-18 15:18           ` Tamas K Lengyel
2016-07-18 15:27             ` Andrew Cooper
2016-07-18 16:15               ` George Dunlap
2016-07-18 16:22                 ` Tamas K Lengyel
2016-07-18 16:10             ` George Dunlap
2016-07-18 17:06               ` Tamas K Lengyel
2016-07-19 11:39                 ` George Dunlap
2016-07-19 15:20                   ` Tamas K Lengyel
2016-07-19 13:36                 ` Ian Jackson
2016-07-19 15:31                   ` Tamas K Lengyel
2016-07-19 17:11                 ` George Dunlap
2016-07-19 19:12                   ` Tamas K Lengyel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).