* [Xen-devel] [PATCH v4 0/2] Refactor super page shattering
From: Hongyan Xia @ 2019-12-12 12:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings use almost exactly the same
page shattering logic, and that code is mingled with other PTE
manipulations, so it is less obvious that the intention is page
shattering. Factor the shattering logic out into dedicated functions to
make it reusable and the intention more obvious.

Of course, the shattering logic differs little between levels, so the
per-level functions could be folded further into a single macro. That is
not entirely straightforward, though, since per-level functions and
macros are used all over the place and there are slight differences
between levels, so keep it per-level for now.
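
Purely for illustration (untested, and not part of this series), such a
macro could look roughly like the below. It deliberately ignores the
per-level differences (e.g. the l2 path converting flags via lNf_to_l1f)
and omits the flush handling (virt is only kept for symmetry), which is
exactly why the series keeps the helpers per-level:

#define DEFINE_SHATTER(LVL, SUB, STEP)                                  \
    static bool shatter_l##LVL##e(l##LVL##_pgentry_t *pl##LVL##e,       \
                                  unsigned long virt, bool locking)     \
    {                                                                   \
        unsigned int i;                                                 \
        l##SUB##_pgentry_t e =                                          \
            l##SUB##e_from_intpte(l##LVL##e_get_intpte(*pl##LVL##e));   \
        l##SUB##_pgentry_t *t = alloc_xen_pagetable();                  \
                                                                        \
        if ( !t )                                                       \
            return false;                                               \
        for ( i = 0; i < L##SUB##_PAGETABLE_ENTRIES; i++ )              \
        {                                                               \
            l##SUB##e_write(t + i, e);                                  \
            e = l##SUB##e_from_intpte(l##SUB##e_get_intpte(e) + (STEP));\
        }                                                               \
        if ( locking )                                                  \
            spin_lock(&map_pgdir_lock);                                 \
        if ( (l##LVL##e_get_flags(*pl##LVL##e) & _PAGE_PRESENT) &&      \
             (l##LVL##e_get_flags(*pl##LVL##e) & _PAGE_PSE) )           \
        {                                                               \
            l##LVL##e_write_atomic(pl##LVL##e,                          \
                l##LVL##e_from_paddr(virt_to_maddr(t),                  \
                                     __PAGE_HYPERVISOR));               \
            t = NULL;                                                   \
        }                                                               \
        if ( locking )                                                  \
            spin_unlock(&map_pgdir_lock);                               \
        if ( t ) /* Raced on shattering: throw away our work. */        \
            free_xen_pagetable(t);                                      \
        return true;                                                    \
    }

DEFINE_SHATTER(3, 2, PAGE_SIZE << PAGETABLE_ORDER)
DEFINE_SHATTER(2, 1, PAGE_SIZE)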

tree:
https://xenbits.xen.org/git-http/people/hx242/xen.git
shatter-refactor_v4

---
Changes in v4:
- helper functions now return bool instead of an arbitrary int value.
- rebase

Changes in v3:
- style and indentation fixes.

Changes in v2:
- rebase.
- improve asm code.
- avoid stale values when taking the lock.
- move allocation of PTE tables inside the shatter function.

Hongyan Xia (2):
  x86/mm: factor out the code for shattering an l3 PTE
  x86/mm: factor out the code for shattering an l2 PTE

 xen/arch/x86/mm.c | 194 +++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 96 deletions(-)

-- 
2.17.1



* [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Hongyan Xia @ 2019-12-12 12:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l3 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changes in v4:
- use false/true instead of -1/0 to indicate failure/success.
- remove unnecessary cast.

Changes in v3:
- style and indentation changes.
- return -ENOMEM instead of -1.

Changes in v2:
- improve asm.
- re-read pl3e from memory when taking the lock.
- move the allocation of l2t inside the shatter function.
---
 xen/arch/x86/mm.c | 98 +++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..8def4fb8d8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l3_pgentry_t ol3e = *pl3e;
+    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
+    l2_pgentry_t *l2t = alloc_xen_pagetable();
+
+    if ( !l2t )
+        return false;
+
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        l2e_write(l2t + i, l2e);
+        l2e = l2e_from_intpte(
+                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
+    }
+
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+    {
+        l3e_write_atomic(pl3e,
+            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
+        l2t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
+            flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+
+    if ( l2t )
+        free_xen_pagetable(l2t);
+
+    return true;
+}
+
 int map_pages_to_xen(
     unsigned long virt,
     mfn_t mfn,
@@ -5244,9 +5290,6 @@ int map_pages_to_xen(
         if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
              (l3e_get_flags(ol3e) & _PAGE_PSE) )
         {
-            unsigned int flush_flags =
-                FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
                                          L1_PAGETABLE_ENTRIES - 1)) +
@@ -5267,33 +5310,9 @@ int map_pages_to_xen(
                 continue;
             }
 
-            pl2e = alloc_xen_pagetable();
-            if ( pl2e == NULL )
+            /* Pass virt to indicate we need to flush. */
+            if ( !shatter_l3e(pl3e, virt, locking) )
                 return -ENOMEM;
-
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(ol3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(ol3e)));
-
-            if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-                flush_flags |= FLUSH_TLB_GLOBAL;
-
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            flush_area(virt, flush_flags);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5578,27 +5597,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            pl2e = alloc_xen_pagetable();
-            if ( !pl2e )
+            if ( !shatter_l3e(pl3e, 0, locking) )
                 return -ENOMEM;
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(*pl3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
         }
 
         /*
-- 
2.17.1



* [Xen-devel] [PATCH v4 2/2] x86/mm: factor out the code for shattering an l2 PTE
From: Hongyan Xia @ 2019-12-12 12:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l2 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changes in v4:
- use false/true instead of -1/0 to indicate failure/success.
- remove unnecessary cast.

Changes in v3:
- style and indentation changes.
- return -ENOMEM instead of -1.

Changes in v2:
- improve asm.
- re-read pl2e from memory when taking the lock.
- move the allocation of l1t inside the shatter function.
---
 xen/arch/x86/mm.c | 96 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8def4fb8d8..4daf0ff0f0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l2 entry and populate l1. If virt is passed in, also do flush. */
+static bool shatter_l2e(l2_pgentry_t *pl2e, unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l2_pgentry_t ol2e = *pl2e;
+    l1_pgentry_t l1e = l1e_from_paddr(l2e_get_paddr(ol2e),
+                                      lNf_to_l1f(l2e_get_flags(ol2e)));
+    l1_pgentry_t *l1t = alloc_xen_pagetable();
+
+    if ( !l1t )
+        return false;
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+    {
+        l1e_write(l1t + i, l1e);
+        l1e = l1e_from_intpte(l1e_get_intpte(l1e) + PAGE_SIZE);
+    }
+
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
+         (l2e_get_flags(*pl2e) & _PAGE_PSE) )
+    {
+        l2e_write_atomic(pl2e,
+            l2e_from_paddr(virt_to_maddr(l1t), __PAGE_HYPERVISOR));
+        l1t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
+
+        if ( l2e_get_flags(ol2e) & _PAGE_GLOBAL )
+            flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+
+    if ( l1t )
+        free_xen_pagetable(l1t);
+
+    return true;
+}
+
 /* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
 static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
 {
@@ -5364,9 +5410,6 @@ int map_pages_to_xen(
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
-                unsigned int flush_flags =
-                    FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
-
                 /* Skip this PTE if there is no change. */
                 if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) +
                        l1_table_offset(virt)) == mfn_x(mfn)) &&
@@ -5385,32 +5428,9 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                pl1e = alloc_xen_pagetable();
-                if ( pl1e == NULL )
+                /* Pass virt to indicate we need to flush. */
+                if ( !shatter_l2e(pl2e, virt, locking) )
                     return -ENOMEM;
-
-                for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-                    l1e_write(&pl1e[i],
-                              l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-                                           lNf_to_l1f(l2e_get_flags(*pl2e))));
-
-                if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
-                    flush_flags |= FLUSH_TLB_GLOBAL;
-
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
-                if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
-                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e),
-                                                        __PAGE_HYPERVISOR));
-                    pl1e = NULL;
-                }
-                if ( locking )
-                    spin_unlock(&map_pgdir_lock);
-                flush_area(virt, flush_flags);
-                if ( pl1e )
-                    free_xen_pagetable(pl1e);
             }
 
             pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
@@ -5633,26 +5653,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             else
             {
                 /* PSE: shatter the superpage and try again. */
-                pl1e = alloc_xen_pagetable();
-                if ( !pl1e )
+                if ( !shatter_l2e(pl2e, 0, locking) )
                     return -ENOMEM;
-                for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-                    l1e_write(&pl1e[i],
-                              l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-                                           l2e_get_flags(*pl2e) & ~_PAGE_PSE));
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
-                if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
-                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e),
-                                                        __PAGE_HYPERVISOR));
-                    pl1e = NULL;
-                }
-                if ( locking )
-                    spin_unlock(&map_pgdir_lock);
-                if ( pl1e )
-                    free_xen_pagetable(pl1e);
             }
         }
         else
-- 
2.17.1



* Re: [Xen-devel] [PATCH v4 0/2] Refactor super page shattering
From: Jan Beulich @ 2019-12-12 13:16 UTC (permalink / raw)
  To: Hongyan Xia, Andrew Cooper
  Cc: xen-devel, jgrall, Wei Liu, Roger Pau Monné

On 12.12.2019 13:46, Hongyan Xia wrote:
> map_pages_to_xen and modify_xen_mappings use almost exactly the same
> page shattering logic, and the code is mingled with other PTE
> manipulations so it is less obvious that the intention is page
> shattering. Factor out the functions to make them reusable and to make
> the intention more obvious.
> 
> Of course, there is not much difference between the shattering logic of
> each level, so we could further turn the per-level functions into a
> single macro, although this is not that simple since we have per-level
> functions and macros all over the place and there are slight differences
> between levels. Keep it per-level for now.

FWIW these look okay to me now, and I would give them my R-b if it
wasn't for the type safety issue. Andrew?

Jan


* Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Andrew Cooper @ 2019-12-13 14:19 UTC (permalink / raw)
  To: Hongyan Xia, xen-devel; +Cc: jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

On 12/12/2019 12:46, Hongyan Xia wrote:
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..8def4fb8d8 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                           flush_area_local((const void *)v, f) : \
>                           flush_area_all((const void *)v, f))
>  
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> +
> +    if ( !l2t )
> +        return false;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +    {
> +        l2e_write(l2t + i, l2e);
> +        l2e = l2e_from_intpte(
> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
> +    }
> +
> +    if ( locking )
> +        spin_lock(&map_pgdir_lock);
> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> +    {
> +        l3e_write_atomic(pl3e,
> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
> +        l2t = NULL;
> +    }
> +    if ( locking )
> +        spin_unlock(&map_pgdir_lock);
> +
> +    if ( virt )
> +    {
> +        unsigned int flush_flags =
> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> +
> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> +            flush_flags |= FLUSH_TLB_GLOBAL;

Another problematic use of ol3e which is racy on conflict.  You need to
strictly use the content of *pl3e from within the locked region.

However, why have you moved the flushing in here?  Only one of the two
callers actually wanted it, and even then I'm not totally sure it is
necessary.

Both callers operate on an arbitrary range of addresses, and for
anything other than a singleton update, will want to issue a single
flush at the end, rather than a spate of flushes for sub-areas.

(Although someone really please check my reasoning here for the
map_pages_to_xen() case which currently does have sub-area flushing.)

Either the flush wants dropping (and best via a prereq patch altering
map_pages_to_xen()), or you need to cache ol3e in the locked region with
ACCESS_ONCE() or equivalent.
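
For example (sketch only, based on your code above; ACCESS_ONCE() here
stands for whatever single-read construct ends up being used), i.e. take
the snapshot while holding the lock and base the flush flags on that
snapshot:

    if ( locking )
        spin_lock(&map_pgdir_lock);
    ol3e = ACCESS_ONCE(*pl3e);      /* re-read the entry under the lock */
    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
         (l3e_get_flags(ol3e) & _PAGE_PSE) )
    {
        l3e_write_atomic(pl3e,
            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
        l2t = NULL;
    }
    if ( locking )
        spin_unlock(&map_pgdir_lock);

    ...

    if ( virt )
    {
        unsigned int flush_flags =
            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);

        /* Flush decision based on the snapshot taken inside the lock. */
        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
            flush_flags |= FLUSH_TLB_GLOBAL;
        flush_area(virt, flush_flags);
    }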

> +        flush_area(virt, flush_flags);
> +    }
> +
> +    if ( l2t )
> +        free_xen_pagetable(l2t);

Mind annotating this as:

    if ( l2t ) /* Raced on trying to shatter?  Throw away our work. */

to highlight that this is an error path, and there is no connection
between the TLB flushing and pagetable freeing.

~Andrew


* Re: [Xen-devel] [PATCH v4 0/2] Refactor super page shattering
From: Andrew Cooper @ 2019-12-13 14:35 UTC (permalink / raw)
  To: Jan Beulich, Hongyan Xia; +Cc: xen-devel, jgrall, Wei Liu, Roger Pau Monné

On 12/12/2019 13:16, Jan Beulich wrote:
> On 12.12.2019 13:46, Hongyan Xia wrote:
>> map_pages_to_xen and modify_xen_mappings use almost exactly the same
>> page shattering logic, and the code is mingled with other PTE
>> manipulations so it is less obvious that the intention is page
>> shattering. Factor out the functions to make them reusable and to make
>> the intention more obvious.
>>
>> Of course, there is not much difference between the shattering logic of
>> each level, so we could further turn the per-level functions into a
>> single macro, although this is not that simple since we have per-level
>> functions and macros all over the place and there are slight differences
>> between levels. Keep it per-level for now.
> FWIW these look okay to me now, and I would give them my R-b without
> if there wasn't the type safety issue. Andrew?

There are correctness issues which I've pointed out, but WRT type
safety, these look fine.

~Andrew


* Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Jan Beulich @ 2019-12-13 14:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, jgrall, Roger Pau Monné, Wei Liu, Hongyan Xia

On 13.12.2019 15:19, Andrew Cooper wrote:
> On 12/12/2019 12:46, Hongyan Xia wrote:
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 7d4dd80a85..8def4fb8d8 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>                           flush_area_local((const void *)v, f) : \
>>                           flush_area_all((const void *)v, f))
>>  
>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
>> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
>> +{
>> +    unsigned int i;
>> +    l3_pgentry_t ol3e = *pl3e;
>> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
>> +
>> +    if ( !l2t )
>> +        return false;
>> +
>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>> +    {
>> +        l2e_write(l2t + i, l2e);
>> +        l2e = l2e_from_intpte(
>> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
>> +    }
>> +
>> +    if ( locking )
>> +        spin_lock(&map_pgdir_lock);
>> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
>> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
>> +    {
>> +        l3e_write_atomic(pl3e,
>> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
>> +        l2t = NULL;
>> +    }
>> +    if ( locking )
>> +        spin_unlock(&map_pgdir_lock);
>> +
>> +    if ( virt )
>> +    {
>> +        unsigned int flush_flags =
>> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
>> +
>> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
>> +            flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Another problematic use of ol3e which is racy on conflict.  You need to
> strictly use the content of *pl3e from within the locked region.

But this isn't a problem introduced here, i.e. fixing of it doesn't
strictly fall under "re-factor". (I'm certainly not opposed to
getting this right at the same time.)

> However, why have you moved the flushing in here?  Only one of the two
> callers actually wanted it, and even then I'm not totally sure it is
> necessary.
> 
> Both callers operate on an arbitrary range of addresses, and for
> anything other than a singleton update, will want to issue a single
> flush at the end, rather than a spate of flushes for sub-areas.
> 
> (Although someone really please check my reasoning here for the
> map_pages_to_xen() case which currently does have sub-area flushing.)
> 
> Either the flush wants dropping (and best via a prereq patch altering
> map_pages_to_xen()), or you need to cache ol3e in the locked region with
> ACCESS_ONCE() or equivalent.

Well, at best replacing them by a single one at the end, but I guess
the current piecemeal behavior is to cope with error paths (see
Julien's report against modify_xen_mappings(), where it's exactly the
other way around). Especially considering speculative accesses, I
think it isn't the worst idea to keep the window between shatter and
flush small (short of us doing a proper break-then-make sequence).
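
Just to illustrate, a proper break-then-make update of the l3 entry
would be along the lines of the below (not something I'm asking for in
this series; the unconditional global flush is purely for simplicity):

    /* Break: withdraw the superpage mapping and flush it everywhere. */
    l3e_write_atomic(pl3e, l3e_empty());
    flush_area(virt, FLUSH_TLB | FLUSH_TLB_GLOBAL |
                     FLUSH_ORDER(2 * PAGETABLE_ORDER));

    /* Make: only now install the new L2 table. */
    l3e_write_atomic(pl3e,
        l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));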

Jan


* Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Andrew Cooper @ 2019-12-13 14:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, jgrall, Roger Pau Monné, Wei Liu, Hongyan Xia

On 13/12/2019 14:36, Jan Beulich wrote:
> On 13.12.2019 15:19, Andrew Cooper wrote:
>> On 12/12/2019 12:46, Hongyan Xia wrote:
>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>> index 7d4dd80a85..8def4fb8d8 100644
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>>                           flush_area_local((const void *)v, f) : \
>>>                           flush_area_all((const void *)v, f))
>>>  
>>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
>>> +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>>> +    l2_pgentry_t *l2t = alloc_xen_pagetable();
>>> +
>>> +    if ( !l2t )
>>> +        return false;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +    {
>>> +        l2e_write(l2t + i, l2e);
>>> +        l2e = l2e_from_intpte(
>>> +                  l2e_get_intpte(l2e) + (PAGE_SIZE << PAGETABLE_ORDER));
>>> +    }
>>> +
>>> +    if ( locking )
>>> +        spin_lock(&map_pgdir_lock);
>>> +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
>>> +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
>>> +    {
>>> +        l3e_write_atomic(pl3e,
>>> +            l3e_from_paddr(virt_to_maddr(l2t), __PAGE_HYPERVISOR));
>>> +        l2t = NULL;
>>> +    }
>>> +    if ( locking )
>>> +        spin_unlock(&map_pgdir_lock);
>>> +
>>> +    if ( virt )
>>> +    {
>>> +        unsigned int flush_flags =
>>> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
>>> +
>>> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
>>> +            flush_flags |= FLUSH_TLB_GLOBAL;
>> Another problematic use of ol3e which is racy on conflict.  You need to
>> strictly use the content of *pl3e from within the locked region.
> But this isn't a problem introduced here, i.e. fixing of it doesn't
> strictly fall under "re-factor". (I'm certainly not opposed to
> getting this right at the same time.)

It is brand new code which is racy.  It's either not necessary, or an
XSA-in-waiting.  (And not necessary, AFAICT).

>
>> However, why have you moved the flushing in here?  Only one of the two
>> callers actually wanted it, and even then I'm not totally sure it is
>> necessary.
>>
>> Both callers operate on an arbitrary range of addresses, and for
>> anything other than a singleton update, will want to issue a single
>> flush at the end, rather than a spate of flushes for sub-areas.
>>
>> (Although someone really please check my reasoning here for the
>> map_pages_to_xen() case which currently does have sub-area flushing.)
>>
>> Either the flush wants dropping (and best via a prereq patch altering
>> map_pages_to_xen()), or you need to cache ol3e in the locked region with
>> ACCESS_ONCE() or equivalent.
> Well, at best replacing by a single one at the end, but I guess
> the current piecemeal behavior is to cope with error paths (see
> Julien's report against modify_xen_mappings(), where it's
> exactly the other way around). Considering especially speculative
> accesses I think it isn't the worst idea to keep the window small
> between shatter and flush (short of us doing a proper break-then-
> make sequence).

Every sub-flush is a broadcast IPI, which is a scalability concern, and
at least needs considering.

x86 is designed not to need BBM, although the BBM sequence can be
helpful at times to simplify other reasoning.  It might actually be
necessary in some SMP cases.

Speculation can bite you at any point, including on the very next
instruction.  The logic is either correct or it isn't, and the distance
between the PTE update and the flush only affects how rarely the
incorrect case manifests in a noticeable way.

Fundamentally, we either need BBM, or the flush is safe to defer to the
end.  Everything in-between is racy, and dropping the sub-flushes would
make any incorrect cases more likely to manifest.
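
Very rough sketch of the deferred variant (illustrative only; I'm using
nr_mfns and get_order_from_pages() as in the current map_pages_to_xen()):
have the helpers only write PTEs, accumulate what flushing is needed,
and flush once at the end:

    unsigned int flush_flags = FLUSH_TLB;

    /*
     * ... walk the whole range as today, shattering / rewriting entries,
     * but only OR extra requirements (e.g. FLUSH_TLB_GLOBAL when a
     * global mapping was touched) into flush_flags instead of issuing a
     * flush per sub-area ...
     */

    /* One flush covering the entire range, at the very end. */
    flush_area(virt,
               flush_flags | FLUSH_ORDER(get_order_from_pages(nr_mfns)));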

~Andrew


* Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Xia, Hongyan @ 2019-12-13 15:55 UTC (permalink / raw)
  To: andrew.cooper3, xen-devel; +Cc: Grall, Julien, wl, jbeulich, roger.pau

Hi,

On Fri, 2019-12-13 at 14:19 +0000, Andrew Cooper wrote:
> On 12/12/2019 12:46, Hongyan Xia wrote:
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 7d4dd80a85..8def4fb8d8 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also
> > do flush. */
> > +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long virt,
> > bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e = *pl3e;
> > +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> > +
> > +    if ( !l2t )
> > +        return false;
> > +
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +    {
> > +        l2e_write(l2t + i, l2e);
> > +        l2e = l2e_from_intpte(
> > +                  l2e_get_intpte(l2e) + (PAGE_SIZE <<
> > PAGETABLE_ORDER));
> > +    }
> > +
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > +    {
> > +        l3e_write_atomic(pl3e,
> > +            l3e_from_paddr(virt_to_maddr(l2t),
> > __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > +            flush_flags |= FLUSH_TLB_GLOBAL;
> 
> Another problematic use of ol3e which is racy on conflict.  You need
> to
> strictly use the content of *pl3e from within the locked region.
> 

This is actually just refactoring, although if the original code is
wrong, it is also worth fixing.

In fact, over the last couple of days, the more I read the code in
map_pages_to_xen, the more worried I am about its race conditions and
correctness. The lock is mostly only taken around writes, so there are
many reads outside the locked region which can observe stale values.
One example I found is (after refactoring):

             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
                 ...
                 /* Pass virt to indicate we need to flush. */
                 if ( !shatter_l2e(pl2e, virt, locking) )
                     return -ENOMEM;
             }

             pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);

It tries to shatter an l2 superpage before accessing a pl1e, but is
there any guard between the shattering and the read of pl1e? If another
call comes in between the two and merges the page back into a superpage,
the pl1e read then points into the superpage memory instead of a PTE
page! (Please check my logic.) There are similar races elsewhere between
PTE modifications and flushes.

There could be more examples like this. Of course, removing the merging
code would avoid a lot of the problems, although Julien explained to me
that it can be useful during boot. If removing it is not an option,
would it be a big problem to extend the lock, e.g. to cover the whole
function? The function is mostly just used by vmap after boot, and a
larger lock would simplify it and its logic significantly. vmap already
takes other global locks before calling map_pages_to_xen anyway.
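
Roughly what I have in mind (sketch only; the inner lock/unlock pairs in
the helpers would be dropped, and the locking condition is meant to be
the one the function computes today):

    int map_pages_to_xen(unsigned long virt, mfn_t mfn,
                         unsigned long nr_mfns, unsigned int flags)
    {
        bool locking = system_state > SYS_STATE_boot; /* as today */

        if ( locking )
            spin_lock(&map_pgdir_lock);

        /*
         * ... existing body, minus the per-write lock/unlock pairs, so
         * nothing can merge *pl2e back into a superpage between the
         * shatter and the l2e_to_l1e(*pl2e) read ...
         */

        if ( locking )
            spin_unlock(&map_pgdir_lock);

        return 0;
    }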

Hongyan

* Re: [Xen-devel] [PATCH v4 1/2] x86/mm: factor out the code for shattering an l3 PTE
From: Xia, Hongyan @ 2019-12-13 16:02 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3; +Cc: xen-devel, Grall,  Julien, wl, roger.pau

On Fri, 2019-12-13 at 14:58 +0000, Andrew Cooper wrote:
> On 13/12/2019 14:36, Jan Beulich wrote:
> > On 13.12.2019 15:19, Andrew Cooper wrote:
> > > On 12/12/2019 12:46, Hongyan Xia wrote:
> > > > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > > > index 7d4dd80a85..8def4fb8d8 100644
> > > > --- a/xen/arch/x86/mm.c
> > > > +++ b/xen/arch/x86/mm.c
> > > > @@ -5151,6 +5151,52 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned
> > > > long v)
> > > >                           flush_area_local((const void *)v, f)
> > > > : \
> > > >                           flush_area_all((const void *)v, f))
> > > >  
> > > > +/* Shatter an l3 entry and populate l2. If virt is passed in,
> > > > also do flush. */
> > > > +static bool shatter_l3e(l3_pgentry_t *pl3e, unsigned long
> > > > virt, bool locking)
> > > > +{
> > > > +    unsigned int i;
> > > > +    l3_pgentry_t ol3e = *pl3e;
> > > > +    l2_pgentry_t l2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > > > +    l2_pgentry_t *l2t = alloc_xen_pagetable();
> > > > +
> > > > +    if ( !l2t )
> > > > +        return false;
> > > > +
> > > > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > > > +    {
> > > > +        l2e_write(l2t + i, l2e);
> > > > +        l2e = l2e_from_intpte(
> > > > +                  l2e_get_intpte(l2e) + (PAGE_SIZE <<
> > > > PAGETABLE_ORDER));
> > > > +    }
> > > > +
> > > > +    if ( locking )
> > > > +        spin_lock(&map_pgdir_lock);
> > > > +    if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
> > > > +         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
> > > > +    {
> > > > +        l3e_write_atomic(pl3e,
> > > > +            l3e_from_paddr(virt_to_maddr(l2t),
> > > > __PAGE_HYPERVISOR));
> > > > +        l2t = NULL;
> > > > +    }
> > > > +    if ( locking )
> > > > +        spin_unlock(&map_pgdir_lock);
> > > > +
> > > > +    if ( virt )
> > > > +    {
> > > > +        unsigned int flush_flags =
> > > > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > > > +
> > > > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > > > +            flush_flags |= FLUSH_TLB_GLOBAL;
> > > 
> > > Another problematic use of ol3e which is racy on conflict.  You
> > > need to
> > > strictly use the content of *pl3e from within the locked region.
> > 
> > But this isn't a problem introduced here, i.e. fixing of it doesn't
> > strictly fall under "re-factor". (I'm certainly not opposed to
> > getting this right at the same time.)
> 
> It is brand new code which is racy.  Its either not necessary, or an
> XSA-in-waiting.  (And not necessary, AFAICT).
> 

I am really confused. The original code already does sub-region
flushes, and it derives the flush flag from an ol3e that is even more
outdated than in the refactored version, so I am not quite getting your
point. I hope I am not missing something obvious.

Hongyan
