* [PATCH 0/4] x86/p2m: use large pages for MMIO mappings
       [not found] <55F70C9A02000078000A2A58@prv-mh.provo.novell.com>
@ 2015-09-15  7:13 ` Jan Beulich
  2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
                     ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Malcolm Crossley, Keir Fraser, Tiejun Chen

... where possible. This is in response to
http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg01502.html,
albeit only the first three patches are really needed for that (the
fourth one just leverages what the first two do in another way).
The third (main) patch is RFC for a number of reasons - see there.

1: EPT: always return proper order value from ept_get_entry()
2: NPT: always return proper order value from p2m_pt_get_entry()
3: p2m: use large pages for MMIO mappings
4: PoD: shorten updates for certain operations on higher order ranges

Signed-off-by: Jan Beulich <jbeulich@suse.com>

* [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry()
  2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
@ 2015-09-15  7:30   ` Jan Beulich
  2015-09-16  7:15     ` Tian, Kevin
  2015-09-17 16:13     ` Andrew Cooper
  2015-09-15  7:31   ` [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry() Jan Beulich
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Andrew Cooper,
	Jun Nakajima, Tiejun Chen, Malcolm Crossley

This is so that callers can determine what range of address space would
get altered by a corresponding "set".

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -879,7 +879,13 @@ static mfn_t ept_get_entry(struct p2m_do
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
+    {
+        for ( i = ept_get_wl(ept); i > 0; --i )
+            if ( (gfn & ~((1UL << (i * EPT_TABLE_ORDER)) - 1)) >
+                 p2m->max_mapped_pfn )
+                break;
         goto out;
+    }
 
     /* Should check if gfn obeys GAW here. */
 
@@ -956,12 +962,12 @@ static mfn_t ept_get_entry(struct p2m_do
                  ((1 << (i * EPT_TABLE_ORDER)) - 1));
             mfn = _mfn(split_mfn);
         }
-
-        if ( page_order )
-            *page_order = i * EPT_TABLE_ORDER;
     }
 
-out:
+ out:
+    if ( page_order )
+        *page_order = i * EPT_TABLE_ORDER;
+
     unmap_domain_page(table);
     return mfn;
 }
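
For illustration, a minimal sketch (not part of the patch; the helper name
and locals are made up) of what the always-valid order enables: a caller
can derive the exact GFN range a subsequent same-order "set" would alter.

/* Hypothetical helper: compute the GFN range covered by the (possibly
 * absent or superpage) entry at 'gfn', i.e. the range a same-order "set"
 * would alter. */
static void gfn_entry_range(struct p2m_domain *p2m, unsigned long gfn,
                            unsigned long *start, unsigned long *count)
{
    p2m_type_t t;
    p2m_access_t a;
    unsigned int order = 0;

    /* Same call pattern as used elsewhere in this series. */
    p2m->get_entry(p2m, gfn, &t, &a, 0, &order, NULL);

    *start = gfn & ~((1UL << order) - 1);
    *count = 1UL << order;
}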





* [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry()
  2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
  2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
@ 2015-09-15  7:31   ` Jan Beulich
  2015-09-15  7:35     ` Jan Beulich
  2015-09-15  7:32   ` Jan Beulich
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:31 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Malcolm Crossley, Keir Fraser, Tiejun Chen

>>> On 15.09.15 at 09:13, <JBeulich@suse.com> wrote:
> ... where possible. This is in response to
> http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg01502.html,
> albeit only the first three patches are really needed for that (the
> fourth one just leverages what the first two do in another way).
> The third (main) patch is RFC for a number of reasons - see there.
> 
> 1: EPT: always return proper order value from ept_get_entry()
> 2: 
> 3: p2m: use large pages for MMIO mappings
> 4: PoD: shorten updates for certain operations on higher order ranges
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 

* [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry()
  2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
  2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
  2015-09-15  7:31   ` [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry() Jan Beulich
@ 2015-09-15  7:32   ` Jan Beulich
  2015-09-17 16:14     ` Andrew Cooper
  2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
  2015-09-15  7:37   ` [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges Jan Beulich
  4 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:32 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Malcolm Crossley, Keir Fraser, Tiejun Chen

This is so that callers can determine what range of address space would
get altered by a corresponding "set".

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -715,14 +715,26 @@ p2m_pt_get_entry(struct p2m_domain *p2m,
     *a = p2m_access_rwx; 
 
     if ( gfn > p2m->max_mapped_pfn )
+    {
         /* This pfn is higher than the highest the p2m map currently holds */
+        if ( page_order )
+        {
+            for ( *page_order = 3 * PAGETABLE_ORDER; *page_order;
+                  *page_order -= PAGETABLE_ORDER )
+                if ( (gfn & ~((1UL << *page_order) - 1)) >
+                     p2m->max_mapped_pfn )
+                    break;
+        }
         return _mfn(INVALID_MFN);
+    }
 
     mfn = pagetable_get_mfn(p2m_get_pagetable(p2m));
 
     {
         l4_pgentry_t *l4e = map_domain_page(mfn);
         l4e += l4_table_offset(addr);
+        if ( page_order )
+            *page_order = 3 * PAGETABLE_ORDER;
         if ( (l4e_get_flags(*l4e) & _PAGE_PRESENT) == 0 )
         {
             unmap_domain_page(l4e);
@@ -735,6 +747,9 @@ p2m_pt_get_entry(struct p2m_domain *p2m,
     {
         l3_pgentry_t *l3e = map_domain_page(mfn);
         l3e += l3_table_offset(addr);
+        if ( page_order )
+            *page_order = 2 * PAGETABLE_ORDER;
+
 pod_retry_l3:
         flags = l3e_get_flags(*l3e);
         if ( !(flags & _PAGE_PRESENT) )
@@ -763,8 +778,6 @@ pod_retry_l3:
             unmap_domain_page(l3e);
 
             ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
-            if ( page_order )
-                *page_order = PAGE_ORDER_1G;
             return (p2m_is_valid(*t)) ? mfn : _mfn(INVALID_MFN);
         }
 
@@ -776,6 +789,8 @@ pod_retry_l3:
 
     l2e = map_domain_page(mfn);
     l2e += l2_table_offset(addr);
+    if ( page_order )
+        *page_order = PAGETABLE_ORDER;
 
 pod_retry_l2:
     flags = l2e_get_flags(*l2e);
@@ -802,8 +817,6 @@ pod_retry_l2:
         unmap_domain_page(l2e);
         
         ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t));
-        if ( page_order )
-            *page_order = PAGE_ORDER_2M;
         return (p2m_is_valid(*t)) ? mfn : _mfn(INVALID_MFN);
     }
 
@@ -814,6 +827,9 @@ pod_retry_l2:
 
     l1e = map_domain_page(mfn);
     l1e += l1_table_offset(addr);
+    if ( page_order )
+        *page_order = 0;
+
 pod_retry_l1:
     flags = l1e_get_flags(*l1e);
     l1t = p2m_flags_to_type(flags);
@@ -837,8 +853,6 @@ pod_retry_l1:
     unmap_domain_page(l1e);
 
     ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
-    if ( page_order )
-        *page_order = PAGE_ORDER_4K;
     return (p2m_is_valid(*t) || p2m_is_grant(*t)) ? mfn : _mfn(INVALID_MFN);
 }
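
One note on the out-of-range branch above: it reports the largest order
whose order-aligned block containing the GFN lies entirely above
max_mapped_pfn, so the "hole" it describes can never overlap already
mapped space. A hypothetical standalone rendering of that loop (assuming
the usual x86 value PAGETABLE_ORDER == 9, i.e. candidate orders 27, 18, 9
and finally 0):

/* Hypothetical helper mirroring the loop above: largest page order such
 * that the whole order-aligned block containing 'gfn' sits above the
 * currently mapped range. */
static unsigned int hole_order(unsigned long gfn, unsigned long max_mapped_pfn)
{
    unsigned int order;

    for ( order = 3 * PAGETABLE_ORDER; order; order -= PAGETABLE_ORDER )
        if ( (gfn & ~((1UL << order) - 1)) > max_mapped_pfn )
            break;

    return order;
}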
 





* [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
                     ` (2 preceding siblings ...)
  2015-09-15  7:32   ` Jan Beulich
@ 2015-09-15  7:34   ` Jan Beulich
  2015-09-16 10:02     ` Julien Grall
                       ` (2 more replies)
  2015-09-15  7:37   ` [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges Jan Beulich
  4 siblings, 3 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, Tiejun Chen, Malcolm Crossley,
	Keir Fraser

When mapping large BARs (e.g. the frame buffer of a graphics card) the
overhead of establishing such mappings using only 4k pages has,
particularly after the XSA-125 fix, become unacceptable. Alter the
XEN_DOMCTL_memory_mapping semantics once again, so that there's no
longer a fixed amount of guest frames that represents the upper limit
of what a single invocation can map. Instead bound execution time by
limiting the number of iterations (regardless of page size).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC reasons:
- ARM side unimplemented (and hence libxc for now made to cope with both
  models), the main issue (besides my inability to test any change
  there) being the many internal uses of map_mmio_regions()
- error unmapping in map_mmio_regions() and error propagation to caller
  from unmap_mmio_regions() are not satisfactory (for the latter a
  possible model might be to have the function - and hence the domctl -
  return the [non-zero] number of completed entries upon error,
  requiring the caller to re-invoke the hypercall to then obtain the
  actual error for the failed slot)
- iommu_{,un}map_page() interfaces don't support "order" (hence
  mmio_order() for now returns zero when !iommu_hap_pt_share, which in
  particular means the AMD side isn't being taken care of just yet)
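
Viewed from the toolstack, the libxc interface is unchanged; partial
completion (a positive domctl return) is absorbed inside
xc_domain_memory_mapping() by advancing and re-invoking. A hypothetical
usage sketch (domid, addresses and size are purely illustrative):

/* Hypothetical toolstack snippet: map a 16M frame buffer BAR for an HVM
 * guest.  Requires xenctrl.h; all values are made up. */
static int map_fb_bar(xc_interface *xch, uint32_t domid)
{
    return xc_domain_memory_mapping(xch, domid,
                                    0xf0000,   /* first_gfn */
                                    0xe0000,   /* first_mfn */
                                    0x1000,    /* nr_mfns = 16M / 4K */
                                    DPCI_ADD_MAPPING);
}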

--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2215,7 +2215,7 @@ int xc_domain_memory_mapping(
 {
     DECLARE_DOMCTL;
     xc_dominfo_t info;
-    int ret = 0, err;
+    int ret = 0, rc;
     unsigned long done = 0, nr, max_batch_sz;
 
     if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
@@ -2240,19 +2240,24 @@ int xc_domain_memory_mapping(
         domctl.u.memory_mapping.nr_mfns = nr;
         domctl.u.memory_mapping.first_gfn = first_gfn + done;
         domctl.u.memory_mapping.first_mfn = first_mfn + done;
-        err = do_domctl(xch, &domctl);
-        if ( err && errno == E2BIG )
+        rc = do_domctl(xch, &domctl);
+        if ( rc < 0 && errno == E2BIG )
         {
             if ( max_batch_sz <= 1 )
                 break;
             max_batch_sz >>= 1;
             continue;
         }
+        if ( rc > 0 )
+        {
+            done += rc;
+            continue;
+        }
         /* Save the first error... */
         if ( !ret )
-            ret = err;
+            ret = rc;
         /* .. and ignore the rest of them when removing. */
-        if ( err && add_mapping != DPCI_REMOVE_MAPPING )
+        if ( rc && add_mapping != DPCI_REMOVE_MAPPING )
             break;
 
         done += nr;
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -436,7 +436,7 @@ static __init void pvh_add_mem_mapping(s
         else
             a = p2m_access_rwx;
 
-        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
+        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), 0, a)) )
             panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
                   gfn, mfn, i, rc);
         if ( !(i & 0xfffff) )
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2396,7 +2396,8 @@ static int vmx_alloc_vlapic_mapping(stru
     share_xen_page_with_guest(virt_to_page(apic_va), d, XENSHARE_writable);
     d->arch.hvm_domain.vmx.apic_access_mfn = virt_to_mfn(apic_va);
     set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE),
-        _mfn(virt_to_mfn(apic_va)), p2m_get_hostp2m(d)->default_access);
+                       _mfn(virt_to_mfn(apic_va)), 0,
+                       p2m_get_hostp2m(d)->default_access);
 
     return 0;
 }
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -897,39 +897,47 @@ void p2m_change_type_range(struct domain
 
 /* Returns: 0 for success, -errno for failure */
 static int set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                               p2m_type_t gfn_p2mt, p2m_access_t access)
+                               unsigned int order, p2m_type_t gfn_p2mt,
+                               p2m_access_t access)
 {
     int rc = 0;
     p2m_access_t a;
     p2m_type_t ot;
     mfn_t omfn;
+    unsigned int cur_order = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
         return -EIO;
 
-    gfn_lock(p2m, gfn, 0);
-    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
+    gfn_lock(p2m, gfn, order);
+    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, &cur_order, NULL);
+    if ( cur_order < order )
+    {
+        gfn_unlock(p2m, gfn, order);
+        return cur_order + 1;
+    }
     if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
     {
-        gfn_unlock(p2m, gfn, 0);
+        gfn_unlock(p2m, gfn, order);
         domain_crash(d);
         return -ENOENT;
     }
     else if ( p2m_is_ram(ot) )
     {
+        unsigned long i;
+
         ASSERT(mfn_valid(omfn));
-        set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
+        for ( i = 0; i < (1UL << order); ++i )
+            set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
     }
 
     P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
-    rc = p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
-                       access);
-    gfn_unlock(p2m, gfn, 0);
+    rc = p2m_set_entry(p2m, gfn, mfn, order, gfn_p2mt, access);
+    gfn_unlock(p2m, gfn, order);
     if ( rc )
-        gdprintk(XENLOG_ERR,
-                 "p2m_set_entry failed! mfn=%08lx rc:%d\n",
-                 mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);
+        gdprintk(XENLOG_ERR, "p2m_set_entry: %#lx:%u -> %d (%#lx)\n",
+                 gfn, order, rc, mfn_x(mfn));
     return rc;
 }
 
@@ -937,14 +945,14 @@ static int set_typed_p2m_entry(struct do
 static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
                                  mfn_t mfn)
 {
-    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign,
+    return set_typed_p2m_entry(d, gfn, mfn, 0, p2m_map_foreign,
                                p2m_get_hostp2m(d)->default_access);
 }
 
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                       p2m_access_t access)
+                       unsigned int order, p2m_access_t access)
 {
-    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
+    return set_typed_p2m_entry(d, gfn, mfn, order, p2m_mmio_direct, access);
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
@@ -989,19 +997,26 @@ int set_identity_p2m_entry(struct domain
 }
 
 /* Returns: 0 for success, -errno for failure */
-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                         unsigned int order)
 {
     int rc = -EINVAL;
     mfn_t actual_mfn;
     p2m_access_t a;
     p2m_type_t t;
+    unsigned int cur_order = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
         return -EIO;
 
-    gfn_lock(p2m, gfn, 0);
-    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL, NULL);
+    gfn_lock(p2m, gfn, order);
+    actual_mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, &cur_order, NULL);
+    if ( cur_order < order )
+    {
+        rc = cur_order + 1;
+        goto out;
+    }
 
     /* Do not use mfn_valid() here as it will usually fail for MMIO pages. */
     if ( (INVALID_MFN == mfn_x(actual_mfn)) || (t != p2m_mmio_direct) )
@@ -1014,11 +1029,11 @@ int clear_mmio_p2m_entry(struct domain *
         gdprintk(XENLOG_WARNING,
                  "no mapping between mfn %08lx and gfn %08lx\n",
                  mfn_x(mfn), gfn);
-    rc = p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), PAGE_ORDER_4K, p2m_invalid,
+    rc = p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), order, p2m_invalid,
                        p2m->default_access);
 
  out:
-    gfn_unlock(p2m, gfn, 0);
+    gfn_unlock(p2m, gfn, order);
 
     return rc;
 }
@@ -2037,6 +2052,25 @@ unsigned long paging_gva_to_gfn(struct v
     return hostmode->gva_to_gfn(v, hostp2m, va, pfec);
 }
 
+static unsigned int mmio_order(const struct domain *d,
+                               unsigned long start_fn, unsigned long nr)
+{
+    if ( !hap_enabled(d) || !need_iommu(d) || !iommu_hap_pt_share ||
+         (start_fn & ((1UL << PAGE_ORDER_2M) - 1)) || !(nr >> PAGE_ORDER_2M) )
+        return 0;
+
+    if ( !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
+         opt_hap_1gb && hvm_hap_has_1gb(d) )
+        return PAGE_ORDER_1G;
+
+    if ( opt_hap_2mb && hvm_hap_has_2mb(d) )
+        return PAGE_ORDER_2M;
+
+    return 0;
+}
+
+#define MAP_MMIO_MAX_ITER 64 /* pretty arbitrary */
+
 int map_mmio_regions(struct domain *d,
                      unsigned long start_gfn,
                      unsigned long nr,
@@ -2044,22 +2078,45 @@ int map_mmio_regions(struct domain *d,
 {
     int ret = 0;
     unsigned long i;
+    unsigned int iter, order;
 
     if ( !paging_mode_translate(d) )
         return 0;
 
-    for ( i = 0; !ret && i < nr; i++ )
+    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
+          i += 1UL << order, ++iter )
     {
-        ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i),
-                                 p2m_get_hostp2m(d)->default_access);
-        if ( ret )
+        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
+              order = ret - 1 )
         {
-            unmap_mmio_regions(d, start_gfn, i, mfn);
+            ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i), order,
+                                     p2m_get_hostp2m(d)->default_access);
+            if ( ret <= 0 )
+                break;
+        }
+        if ( ret < 0 )
+        {
+            for ( nr = i, iter = i = 0; i < nr ; i += 1UL << order, ++iter )
+            {
+                int rc;
+
+                WARN_ON(iter == MAP_MMIO_MAX_ITER);
+                for ( order = mmio_order(d, (start_gfn + i) | (mfn + i),
+                                         nr - i); ; order = rc - 1 )
+                {
+                    rc = clear_mmio_p2m_entry(d, start_gfn + i,
+                                              _mfn(mfn + i), order);
+                    if ( rc <= 0 )
+                        break;
+                }
+                if ( rc < 0 )
+                    order = 0;
+            }
             break;
         }
     }
 
-    return ret;
+    return ret < 0 ? ret : i == nr ? 0 : i;
 }
 
 int unmap_mmio_regions(struct domain *d,
@@ -2069,18 +2126,31 @@ int unmap_mmio_regions(struct domain *d,
 {
     int err = 0;
     unsigned long i;
+    unsigned int iter, order;
 
     if ( !paging_mode_translate(d) )
         return 0;
 
-    for ( i = 0; i < nr; i++ )
+    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
+          i += 1UL << order, ++iter )
     {
-        int ret = clear_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i));
-        if ( ret )
+        int ret;
+
+        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
+              order = ret - 1 )
+        {
+            ret = clear_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i), order);
+            if ( ret <= 0 )
+                break;
+        }
+        if ( ret < 0 )
+        {
             err = ret;
+            order = 0;
+        }
     }
 
-    return err;
+    return err ?: i == nr ? 0 : i;
 }
 
 unsigned int p2m_find_altp2m_by_eptp(struct domain *d, uint64_t eptp)
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1042,10 +1042,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
              (gfn + nr_mfns - 1) < gfn ) /* wrap? */
             break;
 
+#ifndef CONFIG_X86 /* XXX ARM!? */
         ret = -E2BIG;
         /* Must break hypercall up as this could take a while. */
         if ( nr_mfns > 64 )
             break;
+#endif
 
         ret = -EPERM;
         if ( !iomem_access_permitted(current->domain, mfn, mfn_end) ||
@@ -1063,7 +1065,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
                    d->domain_id, gfn, mfn, nr_mfns);
 
             ret = map_mmio_regions(d, gfn, nr_mfns, mfn);
-            if ( ret )
+            if ( ret < 0 )
                 printk(XENLOG_G_WARNING
                        "memory_map:fail: dom%d gfn=%lx mfn=%lx nr=%lx ret:%ld\n",
                        d->domain_id, gfn, mfn, nr_mfns, ret);
@@ -1075,7 +1077,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xe
                    d->domain_id, gfn, mfn, nr_mfns);
 
             ret = unmap_mmio_regions(d, gfn, nr_mfns, mfn);
-            if ( ret && is_hardware_domain(current->domain) )
+            if ( ret < 0 && is_hardware_domain(current->domain) )
                 printk(XENLOG_ERR
                        "memory_map: error %ld removing dom%d access to [%lx,%lx]\n",
                        ret, d->domain_id, mfn, mfn_end);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -209,7 +209,7 @@ int guest_remove_page(struct domain *d, 
     }
     if ( p2mt == p2m_mmio_direct )
     {
-        clear_mmio_p2m_entry(d, gmfn, _mfn(mfn));
+        clear_mmio_p2m_entry(d, gmfn, _mfn(mfn), 0);
         put_gfn(d, gmfn);
         return 1;
     }
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -557,8 +557,9 @@ int p2m_is_logdirty_range(struct p2m_dom
 
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
-                       p2m_access_t access);
-int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+                       unsigned int order, p2m_access_t access);
+int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                         unsigned int order);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
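
For clarity, the return convention now used by set_mmio_p2m_entry() and
clear_mmio_p2m_entry() is: negative means -errno, zero means success, and
a positive value N means "the existing entry only covers order N - 1, so
retry with that order". A hypothetical wrapper (not part of the patch)
capturing the retry loop used in map_mmio_regions() above:

/* Hypothetical wrapper around the order-retry convention; illustrative
 * only. */
static int try_set_mmio(struct domain *d, unsigned long gfn, mfn_t mfn,
                        unsigned int order)
{
    int ret;

    do {
        ret = set_mmio_p2m_entry(d, gfn, mfn, order,
                                 p2m_get_hostp2m(d)->default_access);
        if ( ret > 0 )
            order = ret - 1;    /* shrink to what the current entry allows */
    } while ( ret > 0 );

    return ret;                 /* 0 on success, -errno on failure */
}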




* Re: [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry()
  2015-09-15  7:31   ` [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry() Jan Beulich
@ 2015-09-15  7:35     ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:35 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Malcolm Crossley, Keir Fraser, Tiejun Chen

Please disregard this - hit "send" too early.

>>> On 15.09.15 at 09:31, <JBeulich@suse.com> wrote:
>>>> On 15.09.15 at 09:13, <JBeulich@suse.com> wrote:
>> ... where possible. This is in response to
>> http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg01502.html,
>> albeit only the first three patches are really needed for that (the
>> fourth one just leverages what the first two do in another way).
>> The third (main) patch is RFC for a number of reasons - see there.
>> 
>> 1: EPT: always return proper order value from ept_get_entry()
>> 2: 
>> 3: p2m: use large pages for MMIO mappings
>> 4: PoD: shorten updates for certain operations on higher order ranges
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> 

* [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges
  2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
                     ` (3 preceding siblings ...)
  2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
@ 2015-09-15  7:37   ` Jan Beulich
  2015-09-23 17:10     ` George Dunlap
  4 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-15  7:37 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Keir Fraser

Now that p2m->get_entry() always returns a valid order, utilize this
to accelerate some of the operations in PoD code. (There are two uses
of p2m->get_entry() left which don't easily lend themselves to this
optimization.)

Also adjust a few types as needed and remove stale comments from
p2m_pod_cache_add() (to avoid duplicating them yet another time).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -119,20 +119,23 @@ p2m_pod_cache_add(struct p2m_domain *p2m
 
     unlock_page_alloc(p2m);
 
-    /* Then add the first one to the appropriate populate-on-demand list */
-    switch(order)
+    /* Then add to the appropriate populate-on-demand list. */
+    switch ( order )
     {
+    case PAGE_ORDER_1G:
+        for ( i = 0; i < (1UL << PAGE_ORDER_1G); i += 1UL << PAGE_ORDER_2M )
+            page_list_add_tail(page + i, &p2m->pod.super);
+        break;
     case PAGE_ORDER_2M:
-        page_list_add_tail(page, &p2m->pod.super); /* lock: page_alloc */
-        p2m->pod.count += 1 << order;
+        page_list_add_tail(page, &p2m->pod.super);
         break;
     case PAGE_ORDER_4K:
-        page_list_add_tail(page, &p2m->pod.single); /* lock: page_alloc */
-        p2m->pod.count += 1;
+        page_list_add_tail(page, &p2m->pod.single);
         break;
     default:
         BUG();
     }
+    p2m->pod.count += 1 << order;
 
     return 0;
 }
@@ -502,11 +505,10 @@ p2m_pod_decrease_reservation(struct doma
                              unsigned int order)
 {
     int ret=0;
-    int i;
+    unsigned long i, n;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-    int steal_for_cache;
-    int pod, nonpod, ram;
+    bool_t steal_for_cache;
+    long pod, nonpod, ram;
 
     gfn_lock(p2m, gpfn, order);
     pod_lock(p2m);    
@@ -525,21 +527,21 @@ recount:
     /* Figure out if we need to steal some freed memory for our cache */
     steal_for_cache =  ( p2m->pod.entry_count > p2m->pod.count );
 
-    /* FIXME: Add contiguous; query for PSE entries? */
-    for ( i=0; i<(1<<order); i++)
+    for ( i = 0; i < (1UL << order); i += n )
     {
         p2m_access_t a;
         p2m_type_t t;
+        unsigned int cur_order;
 
-        (void)p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
-
+        p2m->get_entry(p2m, gpfn + i, &t, &a, 0, &cur_order, NULL);
+        n = 1UL << min(order, cur_order);
         if ( t == p2m_populate_on_demand )
-            pod++;
+            pod += n;
         else
         {
-            nonpod++;
+            nonpod += n;
             if ( p2m_is_ram(t) )
-                ram++;
+                ram += n;
         }
     }
 
@@ -574,41 +576,46 @@ recount:
      * + There are PoD entries to handle, or
      * + There is ram left, and we want to steal it
      */
-    for ( i=0;
-          i<(1<<order) && (pod>0 || (steal_for_cache && ram > 0));
-          i++)
+    for ( i = 0;
+          i < (1UL << order) && (pod > 0 || (steal_for_cache && ram > 0));
+          i += n )
     {
         mfn_t mfn;
         p2m_type_t t;
         p2m_access_t a;
+        unsigned int cur_order;
 
-        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
+        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, &cur_order, NULL);
+        if ( order < cur_order )
+            cur_order = order;
+        n = 1UL << cur_order;
         if ( t == p2m_populate_on_demand )
         {
-            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
-                          p2m->default_access);
-            p2m->pod.entry_count--;
+            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
+                          p2m_invalid, p2m->default_access);
+            p2m->pod.entry_count -= n;
             BUG_ON(p2m->pod.entry_count < 0);
-            pod--;
+            pod -= n;
         }
         else if ( steal_for_cache && p2m_is_ram(t) )
         {
             struct page_info *page;
+            unsigned int j;
 
             ASSERT(mfn_valid(mfn));
 
             page = mfn_to_page(mfn);
 
-            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
-                          p2m->default_access);
-            set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
-
-            p2m_pod_cache_add(p2m, page, 0);
+            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
+                          p2m_invalid, p2m->default_access);
+            for ( j = 0; j < n; ++j )
+                set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
+            p2m_pod_cache_add(p2m, page, cur_order);
 
             steal_for_cache =  ( p2m->pod.entry_count > p2m->pod.count );
 
-            nonpod--;
-            ram--;
+            nonpod -= n;
+            ram -= n;
         }
     }    
 
@@ -649,7 +656,8 @@ p2m_pod_zero_check_superpage(struct p2m_
     p2m_type_t type, type0 = 0;
     unsigned long * map = NULL;
     int ret=0, reset = 0;
-    int i, j;
+    unsigned long i, n;
+    unsigned int j;
     int max_ref = 1;
     struct domain *d = p2m->domain;
 
@@ -668,10 +676,13 @@ p2m_pod_zero_check_superpage(struct p2m_
 
     /* Look up the mfns, checking to make sure they're the same mfn
      * and aligned, and mapping them. */
-    for ( i=0; i<SUPERPAGE_PAGES; i++ )
+    for ( i = 0; i < SUPERPAGE_PAGES; i += n )
     {
         p2m_access_t a; 
-        mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, NULL, NULL);
+        unsigned int cur_order;
+
+        mfn = p2m->get_entry(p2m, gfn + i, &type, &a, 0, &cur_order, NULL);
+        n = 1UL << min(cur_order, SUPERPAGE_ORDER + 0U);
 
         if ( i == 0 )
         {
@@ -1114,7 +1125,7 @@ guest_physmap_mark_populate_on_demand(st
                                       unsigned int order)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    unsigned long i, pod_count = 0;
+    unsigned long i, n, pod_count = 0;
     p2m_type_t ot;
     mfn_t omfn;
     int rc = 0;
@@ -1127,10 +1138,13 @@ guest_physmap_mark_populate_on_demand(st
     P2M_DEBUG("mark pod gfn=%#lx\n", gfn);
 
     /* Make sure all gpfns are unused */
-    for ( i = 0; i < (1UL << order); i++ )
+    for ( i = 0; i < (1UL << order); i += n )
     {
         p2m_access_t a;
-        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, NULL, NULL);
+        unsigned int cur_order;
+
+        omfn = p2m->get_entry(p2m, gfn + i, &ot, &a, 0, &cur_order, NULL);
+        n = 1UL << min(order, cur_order);
         if ( p2m_is_ram(ot) )
         {
             P2M_DEBUG("gfn_to_mfn returned type %d!\n", ot);
@@ -1140,7 +1154,7 @@ guest_physmap_mark_populate_on_demand(st
         else if ( ot == p2m_populate_on_demand )
         {
             /* Count how man PoD entries we'll be replacing if successful */
-            pod_count++;
+            pod_count += n;
         }
     }
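
The recurring pattern in this patch is to advance the scan by the extent
of the entry just queried, clamped by the caller's own order so a larger
backing page cannot push the loop past the range being processed. A
hypothetical standalone sketch of that idiom:

/* Hypothetical scan over a (1UL << order)-frame, order-aligned range:
 * 'n' is how many frames the queried entry covers, clamped so a larger
 * backing entry never overshoots the caller's range. */
unsigned long i, n;

for ( i = 0; i < (1UL << order); i += n )
{
    p2m_type_t t;
    p2m_access_t a;
    unsigned int cur_order;

    p2m->get_entry(p2m, gpfn + i, &t, &a, 0, &cur_order, NULL);
    n = 1UL << min(order, cur_order);
    /* ... handle the n contiguous frames starting at gpfn + i ... */
}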
 




* Re: [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry()
  2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
@ 2015-09-16  7:15     ` Tian, Kevin
  2015-09-17 16:13     ` Andrew Cooper
  1 sibling, 0 replies; 26+ messages in thread
From: Tian, Kevin @ 2015-09-16  7:15 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Keir Fraser, George Dunlap, Andrew Cooper, Nakajima, Jun, Chen,
	Tiejun, Malcolm Crossley

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Tuesday, September 15, 2015 3:31 PM
> 
> This is so that callers can determine what range of address space would
> get altered by a corresponding "set".
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

Thanks
Kevin 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
@ 2015-09-16 10:02     ` Julien Grall
  2015-09-17 16:37     ` Andrew Cooper
  2015-09-29 11:33     ` Julien Grall
  2 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-09-16 10:02 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, Tiejun Chen, Malcolm Crossley,
	Keir Fraser

Hi Jan,

On 15/09/2015 08:34, Jan Beulich wrote:
> When mapping large BARs (e.g. the frame buffer of a graphics card) the
> overhead or establishing such mappings using onle 4k pages has,
> particularly after the XSA-125 fix, become unacceptable. Alter the
> XEN_DOMCTL_memory_mapping semantics once again, so that there's no
> longer a fixed amount of guest frames that represents the upper limit
> of what a single invocation can map. Instead bound execution time by
> limiting the number of iterations (regardless of page size).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC reasons:
> - ARM side unimplemented (and hence libxc for now made cope with both
>    models), the main issue (besides my inability to test any change
>    there) being the many internal uses of map_mmio_regions())

I will take a look at these bits.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry()
  2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
  2015-09-16  7:15     ` Tian, Kevin
@ 2015-09-17 16:13     ` Andrew Cooper
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-09-17 16:13 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Kevin Tian, Keir Fraser, George Dunlap, Jun Nakajima,
	Tiejun Chen, Malcolm Crossley

On 15/09/15 08:30, Jan Beulich wrote:
> This is so that callers can determine what range of address space would
> get altered by a corresponding "set".
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry()
  2015-09-15  7:32   ` Jan Beulich
@ 2015-09-17 16:14     ` Andrew Cooper
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-09-17 16:14 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: George Dunlap, Tiejun Chen, Malcolm Crossley, Keir Fraser

On 15/09/15 08:32, Jan Beulich wrote:
> This is so that callers can determine what range of address space would
> get altered by a corresponding "set".
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
  2015-09-16 10:02     ` Julien Grall
@ 2015-09-17 16:37     ` Andrew Cooper
  2015-09-17 17:59       ` Jan Beulich
  2015-09-22  8:32       ` Jan Beulich
  2015-09-29 11:33     ` Julien Grall
  2 siblings, 2 replies; 26+ messages in thread
From: Andrew Cooper @ 2015-09-17 16:37 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Ian Jackson,
	Ian Campbell, Tiejun Chen, Malcolm Crossley, Keir Fraser

On 15/09/15 08:34, Jan Beulich wrote:
> When mapping large BARs (e.g. the frame buffer of a graphics card) the
> overhead of establishing such mappings using only 4k pages has,
> particularly after the XSA-125 fix, become unacceptable. Alter the
> XEN_DOMCTL_memory_mapping semantics once again, so that there's no
> longer a fixed amount of guest frames that represents the upper limit
> of what a single invocation can map. Instead bound execution time by
> limiting the number of iterations (regardless of page size).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC reasons:
> - ARM side unimplemented (and hence libxc for now made cope with both
>   models), the main issue (besides my inability to test any change
>   there) being the many internal uses of map_mmio_regions())
> - error unmapping in map_mmio_regions() and error propagation to caller
>   from unmap_mmio_regions() are not satisfactory (for the latter a
>   possible model might be to have the function - and hence the domctl -
>   return the [non-zero] number of completed entries upon error,
>   requiring the caller to re-invoke the hypercall to then obtain the
>   actual error for the failed slot)

Doesn't this mean the caller must always make two hypercalls to confirm
success?

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -897,39 +897,47 @@ void p2m_change_type_range(struct domain
>  
>  /* Returns: 0 for success, -errno for failure */
>  static int set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
> -                               p2m_type_t gfn_p2mt, p2m_access_t access)
> +                               unsigned int order, p2m_type_t gfn_p2mt,
> +                               p2m_access_t access)
>  {
>      int rc = 0;
>      p2m_access_t a;
>      p2m_type_t ot;
>      mfn_t omfn;
> +    unsigned int cur_order = 0;
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>  
>      if ( !paging_mode_translate(d) )
>          return -EIO;
>  
> -    gfn_lock(p2m, gfn, 0);
> -    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
> +    gfn_lock(p2m, gfn, order);
> +    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, &cur_order, NULL);
> +    if ( cur_order < order )
> +    {
> +        gfn_unlock(p2m, gfn, order);
> +        return cur_order + 1;

This appears to change the error semantics, therefore warrants an update
to the function comment.
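
Perhaps something along these lines (just a suggestion):

    /* Returns: 0 for success, -errno for failure, or a positive value N
     * asking the caller to retry with order N - 1 (guaranteed to be
     * smaller than the order passed in). */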

> +    }
>      if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
>      {
> -        gfn_unlock(p2m, gfn, 0);
> +        gfn_unlock(p2m, gfn, order);
>          domain_crash(d);
>          return -ENOENT;
>      }
>      else if ( p2m_is_ram(ot) )
>      {
> +        unsigned long i;
> +
>          ASSERT(mfn_valid(omfn));
> -        set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
> +        for ( i = 0; i < (1UL << order); ++i )
> +            set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
>      }
>  
>      P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
> -    rc = p2m_set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
> -                       access);
> -    gfn_unlock(p2m, gfn, 0);
> +    rc = p2m_set_entry(p2m, gfn, mfn, order, gfn_p2mt, access);
> +    gfn_unlock(p2m, gfn, order);
>      if ( rc )
> -        gdprintk(XENLOG_ERR,
> -                 "p2m_set_entry failed! mfn=%08lx rc:%d\n",
> -                 mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);
> +        gdprintk(XENLOG_ERR, "p2m_set_entry: %#lx:%u -> %d (%#lx)\n",

PRI_mfn

> - iommu_{,un}map_page() interfaces don't support "order" (hence
>   mmio_order() for now returns zero when !iommu_hap_pt_share, which in
>   particular means the AMD side isn't being taken care of just yet)
>
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -2215,7 +2215,7 @@ int xc_domain_memory_mapping(
>  {
>      DECLARE_DOMCTL;
>      xc_dominfo_t info;
> -    int ret = 0, err;
> +    int ret = 0, rc;
>      unsigned long done = 0, nr, max_batch_sz;
>  
>      if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
> @@ -2240,19 +2240,24 @@ int xc_domain_memory_mapping(
>          domctl.u.memory_mapping.nr_mfns = nr;
>          domctl.u.memory_mapping.first_gfn = first_gfn + done;
>          domctl.u.memory_mapping.first_mfn = first_mfn + done;
> -        err = do_domctl(xch, &domctl);
> -        if ( err && errno == E2BIG )
> +        rc = do_domctl(xch, &domctl);
> +        if ( rc < 0 && errno == E2BIG )
>          {
>              if ( max_batch_sz <= 1 )
>                  break;
>              max_batch_sz >>= 1;
>              continue;
>          }
> +        if ( rc > 0 )
> +        {
> +            done += rc;
> +            continue;
> +        }
>          /* Save the first error... */
>          if ( !ret )
> -            ret = err;
> +            ret = rc;
>          /* .. and ignore the rest of them when removing. */
> -        if ( err && add_mapping != DPCI_REMOVE_MAPPING )
> +        if ( rc && add_mapping != DPCI_REMOVE_MAPPING )
>              break;
>  
>          done += nr;
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -436,7 +436,7 @@ static __init void pvh_add_mem_mapping(s
>          else
>              a = p2m_access_rwx;
>  
> -        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
> +        if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), 0, a)) )
>              panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
>                    gfn, mfn, i, rc);
>          if ( !(i & 0xfffff) )
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2396,7 +2396,8 @@ static int vmx_alloc_vlapic_mapping(stru
>      share_xen_page_with_guest(virt_to_page(apic_va), d, XENSHARE_writable);
>      d->arch.hvm_domain.vmx.apic_access_mfn = virt_to_mfn(apic_va);
>      set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE),
> -        _mfn(virt_to_mfn(apic_va)), p2m_get_hostp2m(d)->default_access);
> +                       _mfn(virt_to_mfn(apic_va)), 0,
> +                       p2m_get_hostp2m(d)->default_access);
>  
>      return 0;
>  }
>
> @@ -2037,6 +2052,25 @@ unsigned long paging_gva_to_gfn(struct v
>      return hostmode->gva_to_gfn(v, hostp2m, va, pfec);
>  }
>  
> +static unsigned int mmio_order(const struct domain *d,
> +                               unsigned long start_fn, unsigned long nr)

Do you mean "start_gfn" ?

> +{
> +    if ( !hap_enabled(d) || !need_iommu(d) || !iommu_hap_pt_share ||
> +         (start_fn & ((1UL << PAGE_ORDER_2M) - 1)) || !(nr >> PAGE_ORDER_2M) )
> +        return 0;
> +
> +    if ( !(start_fn & ((1UL << PAGE_ORDER_1G) - 1)) && (nr >> PAGE_ORDER_1G) &&
> +         opt_hap_1gb && hvm_hap_has_1gb(d) )

opt_hap_1gb should be made to be redundant with hvm_hap_has_1gb() to
avoid all the double checks.  The only place where it is interesting for
them being different is in hvm_enable().

I will throw together a patch.

> +        return PAGE_ORDER_1G;
> +
> +    if ( opt_hap_2mb && hvm_hap_has_2mb(d) )
> +        return PAGE_ORDER_2M;
> +
> +    return 0;
> +}
> +
> +#define MAP_MMIO_MAX_ITER 64 /* pretty arbitrary */
> +
>  int map_mmio_regions(struct domain *d,
>                       unsigned long start_gfn,
>                       unsigned long nr,
> @@ -2044,22 +2078,45 @@ int map_mmio_regions(struct domain *d,
>  {
>      int ret = 0;
>      unsigned long i;
> +    unsigned int iter, order;
>  
>      if ( !paging_mode_translate(d) )
>          return 0;
>  
> -    for ( i = 0; !ret && i < nr; i++ )
> +    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
> +          i += 1UL << order, ++iter )
>      {
> -        ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i),
> -                                 p2m_get_hostp2m(d)->default_access);
> -        if ( ret )
> +        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
> +              order = ret - 1 )

It is hard to reason as to whether this loop will terminate.  All it
would take is a bug in set_mmio_p2m_entry() which causes it to
unilaterally return 1 and this loop would never terminate.

Is there any other condition which can be used as a safety check?

~Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-17 16:37     ` Andrew Cooper
@ 2015-09-17 17:59       ` Jan Beulich
  2015-09-22  8:32       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-17 17:59 UTC (permalink / raw)
  To: andrew.cooper3
  Cc: wei.liu2, stefano.stabellini, George.Dunlap, Ian.Jackson,
	Ian.Campbell, xen-devel, tiejun.chen, malcolm.crossley, keir

>>> Andrew Cooper <andrew.cooper3@citrix.com> 09/17/15 6:38 PM >>>
>On 15/09/15 08:34, Jan Beulich wrote:
>> - error unmapping in map_mmio_regions() and error propagation to caller
>>   from unmap_mmio_regions() are not satisfactory (for the latter a
>>   possible model might be to have the function - and hence the domctl -
>>   return the [non-zero] number of completed entries upon error,
>>   requiring the caller to re-invoke the hypercall to then obtain the
>>   actual error for the failed slot)
>
>Doesn't this mean the caller must always make two hypercalls to confirm
>success?

No - non-zero would still only be returned upon error; it's just that to obtain
the error code for the failure you need to make a second call.
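
For the caller this boils down to the pattern in the libxc hunk (sketch
only):

    rc = do_domctl(xch, &domctl);
    if ( rc > 0 )
    {
        /* rc slots got handled before the failing one; advance past
         * them and re-issue, which then either succeeds or reports
         * the actual error for that slot. */
        done += rc;
        continue;
    }
    /* rc == 0: full success; rc < 0: errno holds the error. */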

>> +static unsigned int mmio_order(const struct domain *d,
>> +                               unsigned long start_fn, unsigned long nr)
>
>Do you mean "start_gfn" ?

No, intentionally not (the caller passes the OR of MFN and GFN).

>> @@ -2044,22 +2078,45 @@ int map_mmio_regions(struct domain *d,
>>  {
>>      int ret = 0;
>>      unsigned long i;
>> +    unsigned int iter, order;
>>  
>>      if ( !paging_mode_translate(d) )
>>          return 0;
>>  
>> -    for ( i = 0; !ret && i < nr; i++ )
>> +    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
>> +          i += 1UL << order, ++iter )
>>      {
>> -        ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i),
>> -                                 p2m_get_hostp2m(d)->default_access);
>> -        if ( ret )
>> +        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
>> +              order = ret - 1 )
>
>It is hard to reason as to whether this loop will terminate.  All it
>would take is a bug in set_mmio_p2m_entry() which causes it to
>unilaterally return 1 and this loop would never terminate.
>
>Is there any other condition which can be used as a safety check?

None that I could see.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-17 16:37     ` Andrew Cooper
  2015-09-17 17:59       ` Jan Beulich
@ 2015-09-22  8:32       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-22  8:32 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Ian Jackson,
	IanCampbell, xen-devel, Tiejun Chen, Malcolm Crossley,
	Keir Fraser

>>> On 17.09.15 at 18:37, <andrew.cooper3@citrix.com> wrote:
> On 15/09/15 08:34, Jan Beulich wrote:
>> @@ -2044,22 +2078,45 @@ int map_mmio_regions(struct domain *d,
>>  {
>>      int ret = 0;
>>      unsigned long i;
>> +    unsigned int iter, order;
>>  
>>      if ( !paging_mode_translate(d) )
>>          return 0;
>>  
>> -    for ( i = 0; !ret && i < nr; i++ )
>> +    for ( iter = i = 0; i < nr && iter < MAP_MMIO_MAX_ITER;
>> +          i += 1UL << order, ++iter )
>>      {
>> -        ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i),
>> -                                 p2m_get_hostp2m(d)->default_access);
>> -        if ( ret )
>> +        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
>> +              order = ret - 1 )
> 
> It is hard to reason as to whether this loop will terminate.  All it
> would take is a bug in set_mmio_p2m_entry() which causes it to
> unilaterally return 1 and this loop would never terminate.
> 
> Is there any other condition which can be used as a safety check?

I'm adding ASSERT()s as a minimal measure.
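
I.e. something along these lines (sketch only, to illustrate the idea):

        for ( order = mmio_order(d, (start_gfn + i) | (mfn + i), nr - i); ;
              order = ret - 1 )
        {
            ret = set_mmio_p2m_entry(d, start_gfn + i, _mfn(mfn + i), order,
                                     p2m_get_hostp2m(d)->default_access);
            if ( ret <= 0 )
                break;
            /* A positive return asks for a retry with order ret - 1;
             * assert that this is strictly smaller, so the loop must
             * terminate. */
            ASSERT(ret <= order);
        }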

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges
  2015-09-15  7:37   ` [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges Jan Beulich
@ 2015-09-23 17:10     ` George Dunlap
  2015-09-23 17:16       ` George Dunlap
  2015-09-24  8:42       ` Jan Beulich
  0 siblings, 2 replies; 26+ messages in thread
From: George Dunlap @ 2015-09-23 17:10 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: George Dunlap, Andrew Cooper, Keir Fraser

On 09/15/2015 08:37 AM, Jan Beulich wrote:
> @@ -574,41 +576,46 @@ recount:
>       * + There are PoD entries to handle, or
>       * + There is ram left, and we want to steal it
>       */
> -    for ( i=0;
> -          i<(1<<order) && (pod>0 || (steal_for_cache && ram > 0));
> -          i++)
> +    for ( i = 0;
> +          i < (1UL << order) && (pod > 0 || (steal_for_cache && ram > 0));
> +          i += n )
>      {
>          mfn_t mfn;
>          p2m_type_t t;
>          p2m_access_t a;
> +        unsigned int cur_order;
>  
> -        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
> +        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, &cur_order, NULL);
> +        if ( order < cur_order )
> +            cur_order = order;
> +        n = 1UL << cur_order;
>          if ( t == p2m_populate_on_demand )
>          {
> -            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
> -                          p2m->default_access);
> -            p2m->pod.entry_count--;
> +            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
> +                          p2m_invalid, p2m->default_access);
> +            p2m->pod.entry_count -= n;
>              BUG_ON(p2m->pod.entry_count < 0);
> -            pod--;
> +            pod -= n;
>          }
>          else if ( steal_for_cache && p2m_is_ram(t) )
>          {
>              struct page_info *page;
> +            unsigned int j;
>  
>              ASSERT(mfn_valid(mfn));
>  
>              page = mfn_to_page(mfn);
>  
> -            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
> -                          p2m->default_access);
> -            set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
> -
> -            p2m_pod_cache_add(p2m, page, 0);
> +            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
> +                          p2m_invalid, p2m->default_access);
> +            for ( j = 0; j < n; ++j )
> +                set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
> +            p2m_pod_cache_add(p2m, page, cur_order);
>  
>              steal_for_cache =  ( p2m->pod.entry_count > p2m->pod.count );

This code will now steal an entire 2MiB page even if we only need a few
individual pages, then free it below (calling p2m_pod_set_cache_target()).

Upon reflection, this is actually a feature -- if we have singleton
pages in the cache, we can free those first, hopefully keeping the 2MiB
page together.

It would be good to have a comment here to that effect, though; perhaps:

"If we need less than 1<<order, may end up stealing more memory here
than we actually need.  This will be rectified below, however; and
stealing too much and then freeing what we need may allow us to free
smaller pages from the cache, and avoid breaking up superpages."

It occurs to me that the steal_for_cache calculations are also wrong
here -- it should be (p2m->pod.entry_count - pod > p2m->pod.count) --
i.e., we should steal if liabilities would be greater than assets
*after* this function completed, not before.
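
I.e. (just spelling out what I mean):

    /* Steal RAM only if, once the PoD entries in this range are gone,
     * outstanding PoD entries would still exceed the cache size. */
    steal_for_cache = ( p2m->pod.entry_count - pod > p2m->pod.count );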

Otherwise, everything else looks good, thanks.

I assume you'll resubmit this when the patch it depends on leaves RFC?
In which case I'll wait until I see that version to Ack it.

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges
  2015-09-23 17:10     ` George Dunlap
@ 2015-09-23 17:16       ` George Dunlap
  2015-09-24  8:42       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: George Dunlap @ 2015-09-23 17:16 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Keir Fraser, Jan Beulich, Andrew Cooper

On Wed, Sep 23, 2015 at 6:10 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> It occurs to me that the steal_for_cache calculations are also wrong
> here --it should be (p2m->pod.entry_count - pod > p2m->pod.count) --
> i.e., we should steal if liabilities would be greater than assets
> *after* this function completed, not before.

Sorry, meant to make it clear that this had nothing to do with you --
just thinking out loud.

(Though if you wanted to adjust that, it would certainly be appreciated.)

 -George

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges
  2015-09-23 17:10     ` George Dunlap
  2015-09-23 17:16       ` George Dunlap
@ 2015-09-24  8:42       ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2015-09-24  8:42 UTC (permalink / raw)
  To: George Dunlap; +Cc: George Dunlap, Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.09.15 at 19:10, <george.dunlap@citrix.com> wrote:
> On 09/15/2015 08:37 AM, Jan Beulich wrote:
>> @@ -574,41 +576,46 @@ recount:
>>       * + There are PoD entries to handle, or
>>       * + There is ram left, and we want to steal it
>>       */
>> -    for ( i=0;
>> -          i<(1<<order) && (pod>0 || (steal_for_cache && ram > 0));
>> -          i++)
>> +    for ( i = 0;
>> +          i < (1UL << order) && (pod > 0 || (steal_for_cache && ram > 0));
>> +          i += n )
>>      {
>>          mfn_t mfn;
>>          p2m_type_t t;
>>          p2m_access_t a;
>> +        unsigned int cur_order;
>>  
>> -        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, NULL, NULL);
>> +        mfn = p2m->get_entry(p2m, gpfn + i, &t, &a, 0, &cur_order, NULL);
>> +        if ( order < cur_order )
>> +            cur_order = order;
>> +        n = 1UL << cur_order;
>>          if ( t == p2m_populate_on_demand )
>>          {
>> -            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
>> -                          p2m->default_access);
>> -            p2m->pod.entry_count--;
>> +            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
>> +                          p2m_invalid, p2m->default_access);
>> +            p2m->pod.entry_count -= n;
>>              BUG_ON(p2m->pod.entry_count < 0);
>> -            pod--;
>> +            pod -= n;
>>          }
>>          else if ( steal_for_cache && p2m_is_ram(t) )
>>          {
>>              struct page_info *page;
>> +            unsigned int j;
>>  
>>              ASSERT(mfn_valid(mfn));
>>  
>>              page = mfn_to_page(mfn);
>>  
>> -            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), 0, p2m_invalid,
>> -                          p2m->default_access);
>> -            set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
>> -
>> -            p2m_pod_cache_add(p2m, page, 0);
>> +            p2m_set_entry(p2m, gpfn + i, _mfn(INVALID_MFN), cur_order,
>> +                          p2m_invalid, p2m->default_access);
>> +            for ( j = 0; j < n; ++j )
>> +                set_gpfn_from_mfn(mfn_x(mfn), INVALID_M2P_ENTRY);
>> +            p2m_pod_cache_add(p2m, page, cur_order);
>>  
>>              steal_for_cache =  ( p2m->pod.entry_count > p2m->pod.count );
> 
> This code will now steal an entire 2MiB page even if we only need a few
> individual pages, then free it below (calling p2m_pod_set_cache_target()).
> 
> Upon reflection, this is actually a feature -- if we have singleton
> pages in the cache, we can free those first, hopefully keeping the 2MiB
> page together.
> 
> It would be good to have a comment here to that effect, though; perhaps:
> 
> "If we need less than 1<<order, may end up stealing more memory here
> than we actually need.  This will be rectified below, however; and
> stealing too much and then freeing what we need may allow us to free
> smaller pages from the cache, and avoid breaking up superpages."

Good point - I didn't really notice this change in behavior. Comment
added.

> It occurs to me that the steal_for_cache calculations are also wrong
> here --it should be (p2m->pod.entry_count - pod > p2m->pod.count) --
> i.e., we should steal if liabilities would be greater than assets
> *after* this function completed, not before.

In another patch (by you?) perhaps?

> Otherwise, everything else looks good, thanks.
> 
> I assume you'll resubmit this when the patch it depends on leaves RFC?
> In which case I'll wait until I see that version to Ack it.

The dependencies were patches 1 and 2 in this series, which
already went in. Patch 3 (the RFC one) was independent; in the
one here we just leverage what was needed as a prereq for that
other one.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
  2015-09-16 10:02     ` Julien Grall
  2015-09-17 16:37     ` Andrew Cooper
@ 2015-09-29 11:33     ` Julien Grall
  2015-09-29 11:44       ` Jan Beulich
  2 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-09-29 11:33 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, Tiejun Chen, Malcolm Crossley,
	Keir Fraser

Hi Jan,

Sorry, I didn't have much time to look at it until now.

On 15/09/15 08:34, Jan Beulich wrote:
> When mapping large BARs (e.g. the frame buffer of a graphics card) the
> overhead of establishing such mappings using only 4k pages has,
> particularly after the XSA-125 fix, become unacceptable. Alter the
> XEN_DOMCTL_memory_mapping semantics once again, so that there's no
> longer a fixed amount of guest frames that represents the upper limit
> of what a single invocation can map. Instead bound execution time by
> limiting the number of iterations (regardless of page size).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC reasons:
> - ARM side unimplemented (and hence libxc for now made cope with both
>   models), the main issue (besides my inability to test any change
>   there) being the many internal uses of map_mmio_regions())

map_mmio_regions is used in ARM to map all the device memory in a guest.
We expect this function to map everything at once when called during
DOM0 build and/or when a guest is created (used to map the interrupt
controller).

I would rather prefer to avoid introducing specific helpers with
slightly different behavior (i.e. one only maps N pages, the other
everything).

What about extending map_mmio_regions to take a parameter telling if we
want to limit the number of mappings in a single invocation?

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 11:33     ` Julien Grall
@ 2015-09-29 11:44       ` Jan Beulich
  2015-09-29 12:16         ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-29 11:44 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

>>> On 29.09.15 at 13:33, <julien.grall@citrix.com> wrote:
> On 15/09/15 08:34, Jan Beulich wrote:
>> RFC reasons:
>> - ARM side unimplemented (and hence libxc for now made cope with both
>>   models), the main issue (besides my inability to test any change
>>   there) being the many internal uses of map_mmio_regions())
> 
> map_mmio_regions is used in ARM to map all the device memory in a guest.
> We expect this function to map everything at once when called during
> DOM0 build and/or when a guest is created (used to map the interrupt
> controller).
> 
> I would rather prefer to avoid introducing specific helpers with
> slightly different behavior (i.e one is only mapping N page, the other
> everything).
> 
> What about extending map_mmio_regions to take a parameter telling if we
> want to limit the number of mapping in a single invocation?

Sure, that's an option, albeit something that would be sufficient to be
done in ARM-specific code - the only user using a variable length is
map_range_to_domain(). All the others, using fixed lengths of up to 32
pages, would implicitly get everything done at once as long as the
threshold is >= 32.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 11:44       ` Jan Beulich
@ 2015-09-29 12:16         ` Julien Grall
  2015-09-29 12:46           ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-09-29 12:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

On 29/09/15 12:44, Jan Beulich wrote:
>>>> On 29.09.15 at 13:33, <julien.grall@citrix.com> wrote:
>> On 15/09/15 08:34, Jan Beulich wrote:
>>> RFC reasons:
>>> - ARM side unimplemented (and hence libxc for now made cope with both
>>>   models), the main issue (besides my inability to test any change
>>>   there) being the many internal uses of map_mmio_regions())
>>
>> map_mmio_regions is used in ARM to map all the device memory in a guest.
>> We expect this function to map everything at once when called during
>> DOM0 build and/or when a guest is created (used to map the interrupt
>> controller).
>>
>> I would rather prefer to avoid introducing specific helpers with
>> slightly different behavior (i.e one is only mapping N page, the other
>> everything).
>>
>> What about extending map_mmio_regions to take a parameter telling if we
>> want to limit the number of mapping in a single invocation?
> 
> Sure an option, albeit something that would be sufficient to be
> done in ARM specific code, albeit the only user using variable
> length is map_range_to_domain(). All the others, using fixed
> lengths up to 32 pages, would implicitly get everything done at
> once as long as the threshold is >= 32.

While this is the case today, we have different patch series coming up
using variable length in different places within the ARM code (vGIC,
ACPI...).

It won't be possible to use map_range_to_domain because it's very
specific to the DOM0 build.

So, I would extend map_mmio_regions like this:

map_mmio_regions(struct domain *d,
		 unsigned long start_gfn,
		 unsigned long nr,
		 unsigned long mfn,
		 unsigned long limit);

The limit parameter would be 0 if there is no limit, otherwise the
maximum number of iterations.
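
Callers would then look e.g. like this (just a sketch; re-using
MAP_MMIO_MAX_ITER from your patch for the domctl path):

    /* ARM-internal user: map the whole region in one invocation. */
    rc = map_mmio_regions(d, gfn, nr, mfn, 0);

    /* Domctl path: bound the amount of work done per invocation. */
    rc = map_mmio_regions(d, gfn, nr, mfn, MAP_MMIO_MAX_ITER);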

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 12:16         ` Julien Grall
@ 2015-09-29 12:46           ` Jan Beulich
  2015-09-29 12:52             ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-29 12:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

>>> On 29.09.15 at 14:16, <julien.grall@citrix.com> wrote:
> On 29/09/15 12:44, Jan Beulich wrote:
>>>>> On 29.09.15 at 13:33, <julien.grall@citrix.com> wrote:
>>> On 15/09/15 08:34, Jan Beulich wrote:
>>>> RFC reasons:
>>>> - ARM side unimplemented (and hence libxc for now made cope with both
>>>>   models), the main issue (besides my inability to test any change
>>>>   there) being the many internal uses of map_mmio_regions())
>>>
>>> map_mmio_regions is used in ARM to map all the device memory in a guest.
>>> We expect this function to map everything at once when called during
>>> DOM0 build and/or when a guest is created (used to map the interrupt
>>> controller).
>>>
>>> I would rather prefer to avoid introducing specific helpers with
>>> slightly different behavior (i.e one is only mapping N page, the other
>>> everything).
>>>
>>> What about extending map_mmio_regions to take a parameter telling if we
>>> want to limit the number of mapping in a single invocation?
>> 
>> Sure an option, albeit something that would be sufficient to be
>> done in ARM specific code, albeit the only user using variable
>> length is map_range_to_domain(). All the others, using fixed
>> lengths up to 32 pages, would implicitly get everything done at
>> once as long as the threshold is >= 32.
> 
> While this is the case today, we have different patch series coming up
> using variable lenght in different place within the ARM code (vGIC,
> ACPI...).

Okay.

> It won't be possible to use map_range_to_domain because it's very
> specific to build DOM0.

Sure; I didn't even think of suggesting that.

> So, I would extend map_mmio_region like that
> 
> map_mmio_regions(struct domain *d,
> 		 unsigned long start_gfn,
> 		 unsigned long nr,
> 		 unsigned long mfn,
> 		 unsigned long limit);
> 
> The limit parameter would be 0 if there is no limit otherwise the
> maximum of iteration.

Again, make map_mmio_regions() a wrapper around an ARM-specific
function with the extra argument. No need to alter common or x86
code.
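
E.g. (sketch only; the names are invented, and whether the bound would
again be MAP_MMIO_MAX_ITER is up to the ARM side):

    /* ARM-internal worker taking the extra argument. */
    int arm_map_mmio_regions(struct domain *d, unsigned long start_gfn,
                             unsigned long nr, unsigned long mfn,
                             unsigned long limit);

    /* Common interface left untouched; only the domctl path gets bounded. */
    int map_mmio_regions(struct domain *d, unsigned long start_gfn,
                         unsigned long nr, unsigned long mfn)
    {
        return arm_map_mmio_regions(d, start_gfn, nr, mfn, MAP_MMIO_MAX_ITER);
    }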

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 12:46           ` Jan Beulich
@ 2015-09-29 12:52             ` Julien Grall
  2015-09-29 13:00               ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-09-29 12:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

On 29/09/15 13:46, Jan Beulich wrote:
>>>> On 29.09.15 at 14:16, <julien.grall@citrix.com> wrote:
>> On 29/09/15 12:44, Jan Beulich wrote:
>>>>>> On 29.09.15 at 13:33, <julien.grall@citrix.com> wrote:
>>>> On 15/09/15 08:34, Jan Beulich wrote:
>>>>> RFC reasons:
>>>>> - ARM side unimplemented (and hence libxc for now made cope with both
>>>>>   models), the main issue (besides my inability to test any change
>>>>>   there) being the many internal uses of map_mmio_regions())
>>>>
>>>> map_mmio_regions is used in ARM to map all the device memory in a guest.
>>>> We expect this function to map everything at once when called during
>>>> DOM0 build and/or when a guest is created (used to map the interrupt
>>>> controller).
>>>>
>>>> I would rather prefer to avoid introducing specific helpers with
>>>> slightly different behavior (i.e one is only mapping N page, the other
>>>> everything).
>>>>
>>>> What about extending map_mmio_regions to take a parameter telling if we
>>>> want to limit the number of mapping in a single invocation?
>>>
>>> Sure an option, albeit something that would be sufficient to be
>>> done in ARM specific code, albeit the only user using variable
>>> length is map_range_to_domain(). All the others, using fixed
>>> lengths up to 32 pages, would implicitly get everything done at
>>> once as long as the threshold is >= 32.
>>
>> While this is the case today, we have different patch series coming up
>> using variable lenght in different place within the ARM code (vGIC,
>> ACPI...).
> 
> Okay.
> 
>> It won't be possible to use map_range_to_domain because it's very
>> specific to build DOM0.
> 
> Sure; I didn't even think of suggesting that.
> 
>> So, I would extend map_mmio_region like that
>>
>> map_mmio_regions(struct domain *d,
>> 		 unsigned long start_gfn,
>> 		 unsigned long nr,
>> 		 unsigned long mfn,
>> 		 unsigned long limit);
>>
>> The limit parameter would be 0 if there is no limit otherwise the
>> maximum of iteration.
> 
> Again, make map_mmio_regions() a wrapper around an ARM-specific
> function with the extra argument. No need to alter common or x86
> code.

TBH, extending map_mmio_regions is the best solution.

The name map_mmio_region is very generic and there is no reason we can't
use it in the hypervisor. Adding yet another wrapper will confuse people
and it will be hard for both the reviewer and the developer to know
which one to use.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 12:52             ` Julien Grall
@ 2015-09-29 13:00               ` Jan Beulich
  2015-09-29 13:06                 ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-29 13:00 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

>>> On 29.09.15 at 14:52, <julien.grall@citrix.com> wrote:
> On 29/09/15 13:46, Jan Beulich wrote:
>> Again, make map_mmio_regions() a wrapper around an ARM-specific
>> function with the extra argument. No need to alter common or x86
>> code.
> 
> TBH, extending the mapp_mmio_region is the best solution.
> 
> The name map_mmio_region is very generic and there is no reason we can't
> use it in the hypervisor. Adding yet another wrapper will confuse people
> and it will be hard for both the reviewer and the developer to know
> which one to use.

I disagree - the function was originally meant to exclusively support
the respective domctl. The fact that ARM gained (many) more uses
should not impact common code or x86.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 13:00               ` Jan Beulich
@ 2015-09-29 13:06                 ` Julien Grall
  2015-09-29 13:27                   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2015-09-29 13:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

On 29/09/15 14:00, Jan Beulich wrote:
>>>> On 29.09.15 at 14:52, <julien.grall@citrix.com> wrote:
>> On 29/09/15 13:46, Jan Beulich wrote:
>>> Again, make map_mmio_regions() a wrapper around an ARM-specific
>>> function with the extra argument. No need to alter common or x86
>>> code.
>>
>> TBH, extending the mapp_mmio_region is the best solution.
>>
>> The name map_mmio_region is very generic and there is no reason we can't
>> use it in the hypervisor. Adding yet another wrapper will confuse people
>> and it will be hard for both the reviewer and the developer to know
>> which one to use.
> 
> I disagree - the function was originally meant to exclusively support
> the respective domctl. The fact that ARM gained (many) more uses
> should not impact common code or x86.

The expectation you described is neither documented nor explicit from
the name...

As the interface is not set in stone, we could decide to extend the
usage of the function to make a coherent interface rather than adding a
new wrapper just because we don't want to touch x86...

Anyway, I'm not a maintainer so I will let Ian and Stefano decide what's
best.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 13:06                 ` Julien Grall
@ 2015-09-29 13:27                   ` Jan Beulich
  2015-09-30 10:15                     ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2015-09-29 13:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	IanJackson, Ian Campbell, xen-devel, Tiejun Chen,
	MalcolmCrossley, Keir Fraser

>>> On 29.09.15 at 15:06, <julien.grall@citrix.com> wrote:
> On 29/09/15 14:00, Jan Beulich wrote:
>>>>> On 29.09.15 at 14:52, <julien.grall@citrix.com> wrote:
>>> On 29/09/15 13:46, Jan Beulich wrote:
>>>> Again, make map_mmio_regions() a wrapper around an ARM-specific
>>>> function with the extra argument. No need to alter common or x86
>>>> code.
>>>
>>> TBH, extending the mapp_mmio_region is the best solution.
>>>
>>> The name map_mmio_region is very generic and there is no reason we can't
>>> use it in the hypervisor. Adding yet another wrapper will confuse people
>>> and it will be hard for both the reviewer and the developer to know
>>> which one to use.
>> 
>> I disagree - the function was originally meant to exclusively support
>> the respective domctl. The fact that ARM gained (many) more uses
>> should not impact common code or x86.
> 
> The expectation you described is neither documented nor explicit from
> the name...

From history only, agreed.

> As the interface is not set in stone, we could decide to extend the
> usage of the function to make a coherent interface and not adding new
> wrapper because we don't want to touch x86...

One more thing - what meaning would you expect the new parameter
to carry? Number of frames mapped? Number of iterations done? In
the former case, the goal aimed at here won't be achievable. In the
latter case, would you mean to pass -1UL for it for the ARM callers
wanting the operation to run to completion?

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings
  2015-09-29 13:27                   ` Jan Beulich
@ 2015-09-30 10:15                     ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2015-09-30 10:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Andrew Cooper,
	IanJackson, Ian Campbell, Tiejun Chen, xen-devel,
	MalcolmCrossley, Keir Fraser

Hi Jan,

On 29/09/15 14:27, Jan Beulich wrote:
>>>> On 29.09.15 at 15:06, <julien.grall@citrix.com> wrote:
>> On 29/09/15 14:00, Jan Beulich wrote:
>>>>>> On 29.09.15 at 14:52, <julien.grall@citrix.com> wrote:
>>>> On 29/09/15 13:46, Jan Beulich wrote:
>>>>> Again, make map_mmio_regions() a wrapper around an ARM-specific
>>>>> function with the extra argument. No need to alter common or x86
>>>>> code.
>>>>
>>>> TBH, extending the mapp_mmio_region is the best solution.
>>>>
>>>> The name map_mmio_region is very generic and there is no reason we can't
>>>> use it in the hypervisor. Adding yet another wrapper will confuse people
>>>> and it will be hard for both the reviewer and the developer to know
>>>> which one to use.
>>>
>>> I disagree - the function was originally meant to exclusively support
>>> the respective domctl. The fact that ARM gained (many) more uses
>>> should not impact common code or x86.
>>
>> The expectation you described is neither documented nor explicit from
>> the name...
> 
> From history only, agreed.
> 
>> As the interface is not set in stone, we could decide to extend the
>> usage of the function to make a coherent interface and not adding new
>> wrapper because we don't want to touch x86...
> 
> One more thing - what meaning would you expect the new parameter
> to carry? Number of frames mapped? Number of iterations done? In
> the former case, the goal aimed at here won't be achievable. In the
> latter case, would you mean to pass -1UL for it for the ARM callers
> wanting the operation to run to completion?

Neither of the two. I was thinking of adding a boolean which indicates
whether the function is preemptible or not, a bit like
p2m_pod_set_cache_target.
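
I.e. something like (sketch):

    int map_mmio_regions(struct domain *d,
                         unsigned long start_gfn,
                         unsigned long nr,
                         unsigned long mfn,
                         int preemptible /* non-zero: may stop early */);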

Although, if I had to choose between your two suggestions, I would lean
toward the latter.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-09-30 10:16 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <55F70C9A02000078000A2A58@prv-mh.provo.novell.com>
2015-09-15  7:13 ` [PATCH 0/4] x86/p2m: use large pages for MMIO mappings Jan Beulich
2015-09-15  7:30   ` [PATCH 1/4] x86/EPT: always return proper order value from ept_get_entry() Jan Beulich
2015-09-16  7:15     ` Tian, Kevin
2015-09-17 16:13     ` Andrew Cooper
2015-09-15  7:31   ` [PATCH 2/4] x86/NPT: always return proper order value from p2m_pt_get_entry() Jan Beulich
2015-09-15  7:35     ` Jan Beulich
2015-09-15  7:32   ` Jan Beulich
2015-09-17 16:14     ` Andrew Cooper
2015-09-15  7:34   ` [PATCH 3/4 RFC] x86/p2m: use large pages for MMIO mappings Jan Beulich
2015-09-16 10:02     ` Julien Grall
2015-09-17 16:37     ` Andrew Cooper
2015-09-17 17:59       ` Jan Beulich
2015-09-22  8:32       ` Jan Beulich
2015-09-29 11:33     ` Julien Grall
2015-09-29 11:44       ` Jan Beulich
2015-09-29 12:16         ` Julien Grall
2015-09-29 12:46           ` Jan Beulich
2015-09-29 12:52             ` Julien Grall
2015-09-29 13:00               ` Jan Beulich
2015-09-29 13:06                 ` Julien Grall
2015-09-29 13:27                   ` Jan Beulich
2015-09-30 10:15                     ` Julien Grall
2015-09-15  7:37   ` [PATCH 4/4] x86/PoD: shorten certain operations on higher order ranges Jan Beulich
2015-09-23 17:10     ` George Dunlap
2015-09-23 17:16       ` George Dunlap
2015-09-24  8:42       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).