* [Xen-devel] [PATCH 0/2] Refactor super page shattering
@ 2019-12-06 15:53 Hongyan Xia
  2019-12-06 15:53 ` [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE Hongyan Xia
  2019-12-06 15:53 ` [Xen-devel] [PATCH 2/2] x86/mm: factor out the code for shattering an l2 PTE Hongyan Xia
  0 siblings, 2 replies; 11+ messages in thread
From: Hongyan Xia @ 2019-12-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings use almost exactly the same
page-shattering logic, and the code is intermixed with other PTE
manipulations, so it is less obvious that the intention is page
shattering. Factor the logic out into functions to make it reusable
and the intention more obvious.

Of course, the shattering logic differs little between levels, so the
per-level functions could be folded further into a single macro. That
is not entirely straightforward, however, since we have per-level
functions and macros all over the place and there are slight
differences between levels. Keep it per-level for now.

Hongyan Xia (2):
  x86/mm: factor out the code for shattering an l3 PTE
  x86/mm: factor out the code for shattering an l2 PTE

 xen/arch/x86/mm.c | 166 ++++++++++++++++++++++------------------------
 1 file changed, 78 insertions(+), 88 deletions(-)

-- 
2.17.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-06 15:53 [Xen-devel] [PATCH 0/2] Refactor super page shattering Hongyan Xia
@ 2019-12-06 15:53 ` Hongyan Xia
  2019-12-06 17:50   ` Andrew Cooper
  2019-12-07 19:04   ` Julien Grall
  2019-12-06 15:53 ` [Xen-devel] [PATCH 2/2] x86/mm: factor out the code for shattering an l2 PTE Hongyan Xia
  1 sibling, 2 replies; 11+ messages in thread
From: Hongyan Xia @ 2019-12-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l3 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7d4dd80a85..42aaaa1083 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
+static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
+        unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l3_pgentry_t ol3e = *pl3e;
+
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
+        l2e_write(l2t + i,
+                  l2e_from_pfn(l3e_get_pfn(ol3e) +
+                               (i << PAGETABLE_ORDER),
+                               l3e_get_flags(ol3e)));
+
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
+         (l3e_get_flags(ol3e) & _PAGE_PSE) )
+    {
+        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
+                                            __PAGE_HYPERVISOR));
+        l2t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
+
+        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
+            flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+    if ( l2t )
+        free_xen_pagetable(l2t);
+}
+
 int map_pages_to_xen(
     unsigned long virt,
     mfn_t mfn,
@@ -5244,9 +5281,6 @@ int map_pages_to_xen(
         if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
              (l3e_get_flags(ol3e) & _PAGE_PSE) )
         {
-            unsigned int flush_flags =
-                FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
-
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
                                          L1_PAGETABLE_ENTRIES - 1)) +
@@ -5270,30 +5304,8 @@ int map_pages_to_xen(
             pl2e = alloc_xen_pagetable();
             if ( pl2e == NULL )
                 return -ENOMEM;
-
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(ol3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(ol3e)));
-
-            if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
-                flush_flags |= FLUSH_TLB_GLOBAL;
-
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            flush_area(virt, flush_flags);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
+            /* Pass virt to indicate we need to flush. */
+            shatter_l3e(pl3e, pl2e, virt, locking);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5581,24 +5593,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             pl2e = alloc_xen_pagetable();
             if ( !pl2e )
                 return -ENOMEM;
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
-                l2e_write(pl2e + i,
-                          l2e_from_pfn(l3e_get_pfn(*pl3e) +
-                                       (i << PAGETABLE_ORDER),
-                                       l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
-            if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
-                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-            {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(pl2e),
-                                                    __PAGE_HYPERVISOR));
-                pl2e = NULL;
-            }
-            if ( locking )
-                spin_unlock(&map_pgdir_lock);
-            if ( pl2e )
-                free_xen_pagetable(pl2e);
+            shatter_l3e(pl3e, pl2e, 0, locking);
         }
 
         /*
-- 
2.17.1



* [Xen-devel] [PATCH 2/2] x86/mm: factor out the code for shattering an l2 PTE
  2019-12-06 15:53 [Xen-devel] [PATCH 0/2] Refactor super page shattering Hongyan Xia
  2019-12-06 15:53 ` [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE Hongyan Xia
@ 2019-12-06 15:53 ` Hongyan Xia
  1 sibling, 0 replies; 11+ messages in thread
From: Hongyan Xia @ 2019-12-06 15:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

map_pages_to_xen and modify_xen_mappings are performing almost exactly
the same operations when shattering an l2 PTE, the only difference
being whether we want to flush.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/mm.c | 81 ++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 43 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 42aaaa1083..65f0758a6f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5151,6 +5151,41 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
                          flush_area_local((const void *)v, f) : \
                          flush_area_all((const void *)v, f))
 
+/* Shatter an l2 entry and populate l1. If virt is passed in, also do flush. */
+static void shatter_l2e(l2_pgentry_t *pl2e, l1_pgentry_t *l1t,
+        unsigned long virt, bool locking)
+{
+    unsigned int i;
+    l2_pgentry_t ol2e = *pl2e;
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+        l1e_write(l1t + i,
+                  l1e_from_pfn(l2e_get_pfn(ol2e) + i,
+                               lNf_to_l1f(l2e_get_flags(ol2e))));
+    if ( locking )
+        spin_lock(&map_pgdir_lock);
+    if ( (l2e_get_flags(ol2e) & _PAGE_PRESENT) &&
+         (l2e_get_flags(ol2e) & _PAGE_PSE) )
+    {
+        l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(l1t),
+                                            __PAGE_HYPERVISOR));
+        l1t = NULL;
+    }
+    if ( locking )
+        spin_unlock(&map_pgdir_lock);
+    if ( virt )
+    {
+        unsigned int flush_flags =
+            FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
+
+        if ( l2e_get_flags(ol2e) & _PAGE_GLOBAL )
+            flush_flags |= FLUSH_TLB_GLOBAL;
+        flush_area(virt, flush_flags);
+    }
+    if ( l1t )
+        free_xen_pagetable(l1t);
+}
+
 /* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
 static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
         unsigned long virt, bool locking)
@@ -5357,9 +5392,6 @@ int map_pages_to_xen(
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
-                unsigned int flush_flags =
-                    FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
-
                 /* Skip this PTE if there is no change. */
                 if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) +
                        l1_table_offset(virt)) == mfn_x(mfn)) &&
@@ -5381,29 +5413,8 @@ int map_pages_to_xen(
                 pl1e = alloc_xen_pagetable();
                 if ( pl1e == NULL )
                     return -ENOMEM;
-
-                for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-                    l1e_write(&pl1e[i],
-                              l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-                                           lNf_to_l1f(l2e_get_flags(*pl2e))));
-
-                if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
-                    flush_flags |= FLUSH_TLB_GLOBAL;
-
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
-                if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
-                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e),
-                                                        __PAGE_HYPERVISOR));
-                    pl1e = NULL;
-                }
-                if ( locking )
-                    spin_unlock(&map_pgdir_lock);
-                flush_area(virt, flush_flags);
-                if ( pl1e )
-                    free_xen_pagetable(pl1e);
+                /* Pass virt to indicate we need to flush. */
+                shatter_l2e(pl2e, pl1e, virt, locking);
             }
 
             pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
@@ -5631,23 +5642,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 pl1e = alloc_xen_pagetable();
                 if ( !pl1e )
                     return -ENOMEM;
-                for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-                    l1e_write(&pl1e[i],
-                              l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
-                                           l2e_get_flags(*pl2e) & ~_PAGE_PSE));
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
-                if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
-                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(pl1e),
-                                                        __PAGE_HYPERVISOR));
-                    pl1e = NULL;
-                }
-                if ( locking )
-                    spin_unlock(&map_pgdir_lock);
-                if ( pl1e )
-                    free_xen_pagetable(pl1e);
+                shatter_l2e(pl2e, pl1e, 0, locking);
             }
         }
         else
-- 
2.17.1



* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-06 15:53 ` [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE Hongyan Xia
@ 2019-12-06 17:50   ` Andrew Cooper
  2019-12-07 18:20     ` Xia, Hongyan
  2019-12-07 19:04   ` Julien Grall
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-12-06 17:50 UTC (permalink / raw)
  To: Hongyan Xia, xen-devel; +Cc: jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

On 06/12/2019 15:53, Hongyan Xia wrote:
> map_pages_to_xen and modify_xen_mappings are performing almost exactly
> the same operations when shattering an l3 PTE, the only difference
> being whether we want to flush.
>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Just for reviewing purposes, how does this relate to your other posted
series?  Is it independent, a prerequisite, or does it depend on that
series?

> ---
>  xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>  1 file changed, 40 insertions(+), 45 deletions(-)
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..42aaaa1083 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                           flush_area_local((const void *)v, f) : \
>                           flush_area_all((const void *)v, f))
>  
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> +        unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +        l2e_write(l2t + i,
> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> +                               (i << PAGETABLE_ORDER),
> +                               l3e_get_flags(ol3e)));

The PTE macros are especially poor for generated asm, and in cases like
this, I'd like to improve things.

In particular, I believe the following code has identical behaviour:

l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));

for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 += PAGETABLE_ORDER )
    l2e_write(l2t + i, nl2e);

(although someone please double check my logic) and rather better asm
generation.  (I also expect there to be some discussion on whether using
n2le.l2 directly is something we'd want to start doing.)

> +
> +    if ( locking )
> +        spin_lock(&map_pgdir_lock);
> +    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
> +         (l3e_get_flags(ol3e) & _PAGE_PSE) )

There is a subtle difference between the original code, and the
refactored code, and it depends on the memory barrier from spin_lock().

Previously, it was re-read from memory after the lock, whereas now it is
likely the stale version from before map_pgdir was locked.

Either you can go back to the old version and use *pl3e, or
alternatively use:

    if ( locking )
        spin_lock(&map_pgdir_lock);
    ol3e = ACCESS_ONCE(*pl3e);
    if ( ...

to make it clear that a reread from memory is required.

> +    {
> +        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),

This would probably generate better asm by using the maddr variants so
we don't have a double shift.

> +                                            __PAGE_HYPERVISOR));
> +        l2t = NULL;
> +    }
> +    if ( locking )
> +        spin_unlock(&map_pgdir_lock);
> +    if ( virt )
> +    {
> +        unsigned int flush_flags =
> +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> +
> +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> +            flush_flags |= FLUSH_TLB_GLOBAL;
> +        flush_area(virt, flush_flags);
> +    }
> +    if ( l2t )
> +        free_xen_pagetable(l2t);

This surely needs to NULL out l2t, just so the caller doesn't get any
clever ideas and end up with a use-after-free?

~Andrew


* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-06 17:50   ` Andrew Cooper
@ 2019-12-07 18:20     ` Xia, Hongyan
  2019-12-07 18:53       ` Julien Grall
  2019-12-09 13:22       ` Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: Xia, Hongyan @ 2019-12-07 18:20 UTC (permalink / raw)
  To: andrew.cooper3, xen-devel; +Cc: Grall, Julien, wl, jbeulich, roger.pau

Hi Andrew,

On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
> On 06/12/2019 15:53, Hongyan Xia wrote:
> > map_pages_to_xen and modify_xen_mappings are performing almost
> > exactly
> > the same operations when shattering an l3 PTE, the only difference
> > being whether we want to flush.
> > 
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> 
> Just for reviewing purposes, how does this relate to your other
> posted
> series.  Is it independent, a prerequisite, or does it depend on that
> series?

This is independent. The other series I posted will touch a lot of PTE
functions, and as Jan suggested, it would be nice to refactor some of
the long and complicated ones before changing them, which could also
prepare us for 5-level paging in the future. If these refactoring
patches go in, though, I will need to rebase the series I posted
before.

> 
> > ---
> >  xen/arch/x86/mm.c | 85 ++++++++++++++++++++++---------------------
> > ----
> >  1 file changed, 40 insertions(+), 45 deletions(-)
> > 
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> > index 7d4dd80a85..42aaaa1083 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long
> > v)
> >                           flush_area_local((const void *)v, f) : \
> >                           flush_area_all((const void *)v, f))
> >  
> > +/* Shatter an l3 entry and populate l2. If virt is passed in, also
> > do flush. */
> > +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> > +        unsigned long virt, bool locking)
> > +{
> > +    unsigned int i;
> > +    l3_pgentry_t ol3e = *pl3e;
> > +
> > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > +        l2e_write(l2t + i,
> > +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> > +                               (i << PAGETABLE_ORDER),
> > +                               l3e_get_flags(ol3e)));
> 
> The PTE macros are especially poor for generated asm, and in cases
> like
> this, I'd like to improve things.
> 
> In particular, I believe the following code has identical behaviour:
> 
> l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> 
> for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
> PAGETABLE_ORDER )
>     l2e_write(l2t + i, nl2e);
> 
> (although someone please double check my logic) and rather better asm
> generation.  (I also expect there to be some discussion on whether
> using
> n2le.l2 directly is something we'd want to start doing.)
> 

I believe it should be nl2e.l2 += 1 << (PAGETABLE_ORDER + PAGE_SHIFT)?
The code rarely touches the field (.l2) directly, though, so maybe use
the macros (get_intpte -> add -> from_intpte) for consistency? They
should produce the same code if the compiler is not too stupid.

> > +
> > +    if ( locking )
> > +        spin_lock(&map_pgdir_lock);
> > +    if ( (l3e_get_flags(ol3e) & _PAGE_PRESENT) &&
> > +         (l3e_get_flags(ol3e) & _PAGE_PSE) )
> 
> There is a subtle difference between the original code, and the
> refactored code, and it depends on the memory barrier from
> spin_lock().
> 
> Previously, it was re-read from memory after the lock, whereas now it
> is
> likely the stale version from before map_pgdir was locked.
> 
> Either you can go back to the old version and use *pl3e, or
> alternatively use:
> 
>     if ( locking )
>         spin_lock(&map_pgdir_lock);
>     ol3e = ACCESS_ONCE(*pl3e);
>     if ( ...
> 
> to make it clear that a reread from memory is required.
> 

Good point. Thanks.

> > +    {
> > +        l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
> 
> This would probably generate better asm by using the maddr variants
> so
> we don't have a double shift.
> 

Right. I can change that.

> > +                                            __PAGE_HYPERVISOR));
> > +        l2t = NULL;
> > +    }
> > +    if ( locking )
> > +        spin_unlock(&map_pgdir_lock);
> > +    if ( virt )
> > +    {
> > +        unsigned int flush_flags =
> > +            FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
> > +
> > +        if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
> > +            flush_flags |= FLUSH_TLB_GLOBAL;
> > +        flush_area(virt, flush_flags);
> > +    }
> > +    if ( l2t )
> > +        free_xen_pagetable(l2t);
> 
> This surely needs to NULL out l2t, just so the caller doesn't get any
> clever ideas and ends up with a use-after-free?
> 
> ~Andrew

Hmm... if we want to make the nullification visible to the caller we
would need to pass a pointer to it. I wonder if it makes sense to
simply move the memory allocation of pl2e into shatter_l3e as well, so
that the caller cannot get any clever ideas in the first place.

Thanks,
Hongyan

* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-07 18:20     ` Xia, Hongyan
@ 2019-12-07 18:53       ` Julien Grall
  2019-12-09 13:22       ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Julien Grall @ 2019-12-07 18:53 UTC (permalink / raw)
  To: Xia, Hongyan, andrew.cooper3, xen-devel; +Cc: wl, jbeulich, roger.pau

Hi,

On 07/12/2019 18:20, Xia, Hongyan wrote:
> hmm... if we want to make the nullification visible to the caller we
> might need to pass &. I wonder if it makes sense to simply move the
> memory allocation of pl2e into shatter_l3e as well, so that the caller
> cannot have any ideas.

AFAICT, the allocation is done when shattering the page-tables. So it 
would make sense to move it within the shatter function. Note that you 
will need to propagate the error if there is any.


Cheers,


* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-06 15:53 ` [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE Hongyan Xia
  2019-12-06 17:50   ` Andrew Cooper
@ 2019-12-07 19:04   ` Julien Grall
  2019-12-07 19:37     ` Andrew Cooper
  1 sibling, 1 reply; 11+ messages in thread
From: Julien Grall @ 2019-12-07 19:04 UTC (permalink / raw)
  To: Hongyan Xia, xen-devel
  Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Hi Hongyan,

On 06/12/2019 15:53, Hongyan Xia wrote:
> map_pages_to_xen and modify_xen_mappings are performing almost exactly
> the same operations when shattering an l3 PTE, the only difference
> being whether we want to flush.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> ---
>   xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>   1 file changed, 40 insertions(+), 45 deletions(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 7d4dd80a85..42aaaa1083 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>                            flush_area_local((const void *)v, f) : \
>                            flush_area_all((const void *)v, f))
>   
> +/* Shatter an l3 entry and populate l2. If virt is passed in, also do flush. */
> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> +        unsigned long virt, bool locking)
> +{
> +    unsigned int i;
> +    l3_pgentry_t ol3e = *pl3e;
> +
> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> +        l2e_write(l2t + i,
> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> +                               (i << PAGETABLE_ORDER),
> +                               l3e_get_flags(ol3e)));

I understand this is just factoring out the current code, but the code 
feels wrong (at least in theory) and wasteful.

You would allocate the L2 table, prepare it and then free it if the L3 
entry has _PAGE_PRESENT or _PAGE_PSE cleared.

Also, in theory, there is nothing preventing the l3 flags from being 
modified behind your back. So you could end up generating the l2 
entries with the old l3 flags.

In a nutshell, it feels to me that the shattering/allocation should 
happen with the lock taken. This would also allow you to avoid 
allocating the l2 table when you are removing the page.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-07 19:04   ` Julien Grall
@ 2019-12-07 19:37     ` Andrew Cooper
  2019-12-07 23:39       ` Julien Grall
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2019-12-07 19:37 UTC (permalink / raw)
  To: Julien Grall, Hongyan Xia, xen-devel
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné

On 07/12/2019 19:04, Julien Grall wrote:
> Hi Hongyan,
>
> On 06/12/2019 15:53, Hongyan Xia wrote:
>> map_pages_to_xen and modify_xen_mappings are performing almost exactly
>> the same operations when shattering an l3 PTE, the only difference
>> being whether we want to flush.
>>
>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>> ---
>>   xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>>   1 file changed, 40 insertions(+), 45 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index 7d4dd80a85..42aaaa1083 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>                            flush_area_local((const void *)v, f) : \
>>                            flush_area_all((const void *)v, f))
>>   +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>> do flush. */
>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>> +        unsigned long virt, bool locking)
>> +{
>> +    unsigned int i;
>> +    l3_pgentry_t ol3e = *pl3e;
>> +
>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>> +        l2e_write(l2t + i,
>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>> +                               (i << PAGETABLE_ORDER),
>> +                               l3e_get_flags(ol3e)));
>
> I understand this is just factoring out the current code, but the
> code feels wrong (at least in theory) and wasteful.
>
> You would allocate the L2 table, prepare it and then free it if the L3
> entry has _PAGE_PRESENT or _PAGE_PSE cleared.
>
> Also, in theory, there is nothing preventing the l3 flags from being
> modified behind your back. So you could end up generating the l2
> entries with the old l3 flags.
>
> In a nutshell, it feels to me that the shattering/allocation should
> happen with the lock taken. This would also allow you to not allocate
> the l2 table when you are removing the page.

The existing behaviour is deliberate and most likely wants to remain.

It makes adjustments safe to concurrent modifications, while reducing
the critical section of the lock to only the alteration of the live PTEs.

We don't expect concurrent conflicting updates to these pagetables at
all, but extending the locked region around the memory allocation and
writing the new pagetable is a bottleneck to parallel updates of
independent addresses.

~Andrew


* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-07 19:37     ` Andrew Cooper
@ 2019-12-07 23:39       ` Julien Grall
  0 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2019-12-07 23:39 UTC (permalink / raw)
  To: Andrew Cooper, Hongyan Xia, xen-devel
  Cc: Wei Liu, Jan Beulich, Roger Pau Monné



On 07/12/2019 19:37, Andrew Cooper wrote:
> On 07/12/2019 19:04, Julien Grall wrote:
>> Hi Hongyan,
>>
>> On 06/12/2019 15:53, Hongyan Xia wrote:
>>> map_pages_to_xen and modify_xen_mappings are performing almost exactly
>>> the same operations when shattering an l3 PTE, the only difference
>>> being whether we want to flush.
>>>
>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>> ---
>>>    xen/arch/x86/mm.c | 85 ++++++++++++++++++++++-------------------------
>>>    1 file changed, 40 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>>> index 7d4dd80a85..42aaaa1083 100644
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5151,6 +5151,43 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
>>>                             flush_area_local((const void *)v, f) : \
>>>                             flush_area_all((const void *)v, f))
>>>    +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>>> do flush. */
>>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>>> +        unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +        l2e_write(l2t + i,
>>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>>> +                               (i << PAGETABLE_ORDER),
>>> +                               l3e_get_flags(ol3e)));
>>
>> I understand this is just factoring out the current code, but the
>> code feels wrong (at least in theory) and wasteful.
>>
>> You would allocate the L2 table, prepare it and then free it if the L3
>> entry has _PAGE_PRESENT or _PAGE_PSE cleared.
>>
>> Also, in theory, there is nothing preventing the l3 flags from being
>> modified behind your back. So you could end up generating the l2
>> entries with the old l3 flags.
>>
>> In a nutshell, it feels to me that the shattering/allocation should
>> happen with the lock taken. This would also allow you to not allocate
>> the l2 table when you are removing the page.
> 
> The existing behaviour is deliberate and most likely wants to remain.
> 
> It makes adjustments safe to concurrent modifications, while reducing
> the critical section of the lock to only the alteration of the live PTEs.
> 
> We don't expect concurrent conflicting updates to these pagetables at
> all, but extending the locked region around the memory allocation and
> writing the new pagetable is a bottleneck to parallel updates of
> independent addresses.

I am quite interested to know a bit more about the potential
bottlenecks. When I rewrote the Xen PT helpers for Arm I didn't see
many accesses to the Xen PT.

To make a comparison, I see many more updates to the P2M. So I would
expect such an optimization to be more of a concern there. Yet we take
the lock for the full update. Maybe this was an oversight when the P2M
was created?

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-07 18:20     ` Xia, Hongyan
  2019-12-07 18:53       ` Julien Grall
@ 2019-12-09 13:22       ` Jan Beulich
  2019-12-10 10:07         ` Xia, Hongyan
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2019-12-09 13:22 UTC (permalink / raw)
  To: Xia, Hongyan, andrew.cooper3; +Cc: xen-devel, Grall, Julien, wl, roger.pau

On 07.12.2019 19:20, Xia, Hongyan wrote:
> On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
>> On 06/12/2019 15:53, Hongyan Xia wrote:
>>> +/* Shatter an l3 entry and populate l2. If virt is passed in, also
>>> do flush. */
>>> +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
>>> +        unsigned long virt, bool locking)
>>> +{
>>> +    unsigned int i;
>>> +    l3_pgentry_t ol3e = *pl3e;
>>> +
>>> +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
>>> +        l2e_write(l2t + i,
>>> +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
>>> +                               (i << PAGETABLE_ORDER),
>>> +                               l3e_get_flags(ol3e)));
>>
>> The PTE macros are especially poor for generated asm, and in cases
>> like
>> this, I'd like to improve things.
>>
>> In particular, I believe the following code has identical behaviour:
>>
>> l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
>>
>> for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
>> PAGETABLE_ORDER )
>>     l2e_write(l2t + i, nl2e);
>>
>> (although someone please double check my logic) and rather better asm
>> generation.  (I also expect there to be some discussion on whether
>> using
>> n2le.l2 directly is something we'd want to start doing.)
>>
> 
> I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?

Indeed.

> Although the code rarely touches the field (.l2) directly, so maybe use
> the macros (get_intpte -> add -> from_intpte) for consistency? They
> should produce the same code if the compiler is not too stupid.

I think the to/from intpte transformations should be used sparingly
too (as being dangerous). How about we make PTEs proper unions, with
a full-field intpte_t as we have it now combined with a set of bit
fields? This would at least eliminate the need for using PAGE_SHIFT
in constructs like the above.

Jan


* Re: [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE
  2019-12-09 13:22       ` Jan Beulich
@ 2019-12-10 10:07         ` Xia, Hongyan
  0 siblings, 0 replies; 11+ messages in thread
From: Xia, Hongyan @ 2019-12-10 10:07 UTC (permalink / raw)
  To: jbeulich, andrew.cooper3; +Cc: xen-devel, Grall,  Julien, wl, roger.pau

On Mon, 2019-12-09 at 14:22 +0100, Jan Beulich wrote:
> On 07.12.2019 19:20, Xia, Hongyan wrote:
> > On Fri, 2019-12-06 at 17:50 +0000, Andrew Cooper wrote:
> > > On 06/12/2019 15:53, Hongyan Xia wrote:
> > > > +/* Shatter an l3 entry and populate l2. If virt is passed in,
> > > > also
> > > > do flush. */
> > > > +static void shatter_l3e(l3_pgentry_t *pl3e, l2_pgentry_t *l2t,
> > > > +        unsigned long virt, bool locking)
> > > > +{
> > > > +    unsigned int i;
> > > > +    l3_pgentry_t ol3e = *pl3e;
> > > > +
> > > > +    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
> > > > +        l2e_write(l2t + i,
> > > > +                  l2e_from_pfn(l3e_get_pfn(ol3e) +
> > > > +                               (i << PAGETABLE_ORDER),
> > > > +                               l3e_get_flags(ol3e)));
> > > 
> > > The PTE macros are especially poor for generated asm, and in
> > > cases
> > > like
> > > this, I'd like to improve things.
> > > 
> > > In particular, I believe the following code has identical
> > > behaviour:
> > > 
> > > l2_pgentry_t nl2e = l2e_from_intpte(l3e_get_intpte(ol3e));
> > > 
> > > for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, nl2e.l2 +=
> > > PAGETABLE_ORDER )
> > >     l2e_write(l2t + i, nl2e);
> > > 
> > > (although someone please double check my logic) and rather better
> > > asm
> > > generation.  (I also expect there to be some discussion on
> > > whether
> > > using
> > > n2le.l2 directly is something we'd want to start doing.)
> > > 
> > 
> > I believe it should be nl2e.l2 += 1<<(PAGETABLE_ORDER+PAGE_SHIFT) ?
> 
> Indeed.
> 
> > Although the code rarely touches the field (.l2) directly, so maybe
> > use
> > the macros (get_intpte -> add -> from_intpte) for consistency? They
> > should produce the same code if the compiler is not too stupid.
> 
> I think the to/from intpte transformations should be used sparingly
> too (as being dangerous). How about we make PTEs proper unions, with
> a full-field intpte_t as we have it now combined with a set of bit
> fields? This would at least eliminate the need for using PAGE_SHIFT
> in constructs like the above.

I can see this makes the code look much nicer. One concern I have is
that Andrew's suggestion was to improve the generated assembly code,
and using packed bit fields may generate even more masking and bit
shifting, which in the end might give us more assembly code than before
the refactoring.

Hongyan

end of thread, other threads:[~2019-12-10 10:07 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-06 15:53 [Xen-devel] [PATCH 0/2] Refactor super page shattering Hongyan Xia
2019-12-06 15:53 ` [Xen-devel] [PATCH 1/2] x86/mm: factor out the code for shattering an l3 PTE Hongyan Xia
2019-12-06 17:50   ` Andrew Cooper
2019-12-07 18:20     ` Xia, Hongyan
2019-12-07 18:53       ` Julien Grall
2019-12-09 13:22       ` Jan Beulich
2019-12-10 10:07         ` Xia, Hongyan
2019-12-07 19:04   ` Julien Grall
2019-12-07 19:37     ` Andrew Cooper
2019-12-07 23:39       ` Julien Grall
2019-12-06 15:53 ` [Xen-devel] [PATCH 2/2] x86/mm: factor out the code for shattering an l2 PTE Hongyan Xia
