* [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM
@ 2017-12-06  7:50 Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions Chao Gao
                   ` (7 more replies)
  0 siblings, 8 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Paul Durrant, Jan Beulich, Chao Gao

This series is based on Paul Durrant's "x86: guest resource mapping"
(https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg01735.html)
and "add vIOMMU support with irq remapping function of virtual VT-d"
(https://lists.xenproject.org/archives/html/xen-devel/2017-11/msg01063.html).

To support more vcpus in an HVM guest, this series removes the vCPU number
constraints imposed by several components:
1. IOREQ server: currently only one IOREQ page is used, which limits
   the maximum number of vcpus to 128.
2. libacpi: no x2APIC entries are built in the MADT and SRAT.
3. Size of pre-allocated shadow memory.
4. The way we boot up APs.

This series is an RFC because:
1. I am not sure whether the changes in patch 2 are acceptable.
2. It depends on our vIOMMU patches, which are still under review.

Changes since v3:
	- Addressed Wei's and Roger's comments.
	- Support multiple IOREQ pages. See patches 1 and 2.
	- Boot APs via broadcast. See patch 4.
	- Unify the computation of lapic_id.
	- Add x2APIC entries in the SRAT.
	- Increase shadow memory according to the maximum number of vcpus of the HVM guest.

Changes since v2:
    1) Increase the page pool size when setting max vcpus
    2) Size the MADT according to the APIC ID of each vcpu
    3) Fix some code style issues.

Changes since v1:
    1) Increase the HAP page pool according to the vcpu number
    2) Use the "Processor" syntax to define vcpus with APIC IDs < 255 and
the "Device" syntax for other vcpus in the ACPI DSDT table.
    3) Use XAPIC structures for vcpus with APIC IDs < 255 and x2APIC
structures for other vcpus in the ACPI MADT table.

This patchset extends some resources (i.e. event channels, HAP memory,
and so on) to support more vcpus in a single VM.

Chao Gao (6):
  ioreq: remove most 'buf' parameter from static functions
  ioreq: bump the number of IOREQ page to 4 pages
  xl/acpi: unify the computation of lapic_id
  hvmloader: boot cpu through broadcast
  x86/hvm: bump the number of pages of shadow memory
  x86/hvm: bump the maximum number of vcpus to 512

Lan Tianyu (2):
  Tool/ACPI: DSDT extension to support more vcpus
  hvmload: Add x2apic entry support in the MADT and SRAT build

 tools/firmware/hvmloader/apic_regs.h    |   4 +
 tools/firmware/hvmloader/config.h       |   3 +-
 tools/firmware/hvmloader/smp.c          |  64 ++++++++++++--
 tools/libacpi/acpi2_0.h                 |  25 +++++-
 tools/libacpi/build.c                   |  57 +++++++++---
 tools/libacpi/libacpi.h                 |   9 ++
 tools/libacpi/mk_dsdt.c                 |  40 +++++++--
 tools/libxc/include/xc_dom.h            |   2 +-
 tools/libxc/xc_dom_x86.c                |   6 +-
 tools/libxl/libxl_x86_acpi.c            |   2 +-
 xen/arch/x86/hvm/hvm.c                  |   1 +
 xen/arch/x86/hvm/ioreq.c                | 150 ++++++++++++++++++++++----------
 xen/arch/x86/mm/hap/hap.c               |   2 +-
 xen/arch/x86/mm/shadow/common.c         |   2 +-
 xen/include/asm-x86/hvm/domain.h        |   6 +-
 xen/include/public/hvm/hvm_info_table.h |   2 +-
 xen/include/public/hvm/ioreq.h          |   2 +
 xen/include/public/hvm/params.h         |   8 +-
 18 files changed, 303 insertions(+), 82 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2017-12-06 14:44   ` Paul Durrant
  2017-12-06  7:50 ` [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages Chao Gao
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich, Chao Gao

This is a preparatory patch for supporting multiple IOREQ pages.
No functional change.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 -new
---
 xen/arch/x86/hvm/ioreq.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d991ac9..a879f20 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -237,10 +237,9 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
     set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask);
 }
 
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s,
+                                struct hvm_ioreq_page *iorp)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
     if ( gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
 
@@ -289,15 +288,15 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
                                  &iorp->va);
 
     if ( rc )
-        hvm_unmap_ioreq_gfn(s, buf);
+        hvm_unmap_ioreq_gfn(s, iorp);
 
     return rc;
 }
 
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s,
+                               struct hvm_ioreq_page *iorp)
 {
     struct domain *currd = current->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     if ( iorp->page )
     {
@@ -344,10 +343,9 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
     return 0;
 }
 
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s,
+                               struct hvm_ioreq_page *iorp)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
     if ( !iorp->page )
         return;
 
@@ -380,11 +378,11 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
     return found;
 }
 
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s,
+                                 struct hvm_ioreq_page *iorp)
 
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
@@ -395,10 +393,10 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     clear_page(iorp->va);
 }
 
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s,
+                             struct hvm_ioreq_page *iorp)
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     int rc;
 
     if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
@@ -550,36 +548,36 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
         rc = hvm_map_ioreq_gfn(s, true);
 
     if ( rc )
-        hvm_unmap_ioreq_gfn(s, false);
+        hvm_unmap_ioreq_gfn(s, &s->ioreq);
 
     return rc;
 }
 
 static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
 {
-    hvm_unmap_ioreq_gfn(s, true);
-    hvm_unmap_ioreq_gfn(s, false);
+    hvm_unmap_ioreq_gfn(s, &s->ioreq);
+    hvm_unmap_ioreq_gfn(s, &s->bufioreq);
 }
 
 static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
 {
     int rc;
 
-    rc = hvm_alloc_ioreq_mfn(s, false);
+    rc = hvm_alloc_ioreq_mfn(s, &s->ioreq);
 
     if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
+        rc = hvm_alloc_ioreq_mfn(s, &s->bufioreq);
 
     if ( rc )
-        hvm_free_ioreq_mfn(s, false);
+        hvm_free_ioreq_mfn(s, &s->ioreq);
 
     return rc;
 }
 
 static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
 {
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
+    hvm_free_ioreq_mfn(s, &s->bufioreq);
+    hvm_free_ioreq_mfn(s, &s->ioreq);
 }
 
 static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
@@ -646,8 +644,8 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
     if ( s->enabled )
         goto done;
 
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
+    hvm_remove_ioreq_gfn(s, &s->ioreq);
+    hvm_remove_ioreq_gfn(s, &s->bufioreq);
 
     s->enabled = true;
 
@@ -667,8 +665,8 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
     if ( !s->enabled )
         goto done;
 
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
+    hvm_add_ioreq_gfn(s, &s->bufioreq);
+    hvm_add_ioreq_gfn(s, &s->ioreq);
 
     s->enabled = false;
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2017-12-06 15:04   ` Paul Durrant
  2018-04-18  8:19   ` Jan Beulich
  2017-12-06  7:50 ` [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id Chao Gao
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Tim Deegan, Paul Durrant, Jan Beulich, Ian Jackson, Chao Gao

One 4K-byte page can contain at most 128 'ioreq_t' structures. In order to
remove the vcpu number constraint imposed by a single IOREQ page, bump the
number of IOREQ pages to 4. With this patch, multiple pages can be used as
IOREQ pages.

Basically, this patch extends the 'ioreq' field in struct hvm_ioreq_server to
an array. All accesses to the 'ioreq' field, such as 's->ioreq', are replaced
with the FOR_EACH_IOREQ_PAGE macro.

In order to access an IOREQ page, QEMU should get its gmfn and map that gmfn
into its virtual address space. Now that there are several pages, to remain
compatible with existing QEMU, the interface to get the gmfn doesn't change.
Newer QEMU instead needs to query the gmfn repeatedly until the same gmfn is
returned again. To implement this, an internal index is introduced: when QEMU
queries the gmfn, the gmfn of the IOREQ page referenced by the index is
returned. After each query, the index increases by 1 and wraps around when it
overflows.
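The vcpu-to-page mapping and the rotating query index described above can be
sketched in plain C (a minimal sketch, assuming PAGE_SIZE is 4096 and
sizeof(ioreq_t) is 32 bytes, which gives the 128 slots per page mentioned
above; the helper names are illustrative, not the patch's own):

```c
#include <assert.h>

#define PAGE_SIZE          4096
#define IOREQ_T_SIZE       32  /* assumed sizeof(ioreq_t): 128 fit in 4K */
#define IOREQ_NUM_PER_PAGE (PAGE_SIZE / IOREQ_T_SIZE)

/* Which IOREQ page holds a given vcpu's slot, and at what offset. */
static unsigned int ioreq_page(unsigned int vcpu_id)
{
    return vcpu_id / IOREQ_NUM_PER_PAGE;
}

static unsigned int ioreq_slot(unsigned int vcpu_id)
{
    return vcpu_id % IOREQ_NUM_PER_PAGE;
}

/* After each gmfn query the internal index advances and wraps around. */
static unsigned int next_ioreq_idx(unsigned int idx, unsigned int page_nr)
{
    return (idx + 1) % page_nr;
}
```

QEMU stops querying once a previously seen gmfn comes back, which happens
exactly after page_nr queries.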

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 - new
---
 tools/libxc/include/xc_dom.h     |   2 +-
 tools/libxc/xc_dom_x86.c         |   6 +-
 xen/arch/x86/hvm/hvm.c           |   1 +
 xen/arch/x86/hvm/ioreq.c         | 116 ++++++++++++++++++++++++++++++---------
 xen/include/asm-x86/hvm/domain.h |   6 +-
 xen/include/public/hvm/ioreq.h   |   2 +
 xen/include/public/hvm/params.h  |   8 ++-
 7 files changed, 110 insertions(+), 31 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 45c9d67..2f8b412 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -20,7 +20,7 @@
 #include <xenguest.h>
 
 #define INVALID_PFN ((xen_pfn_t)-1)
-#define X86_HVM_NR_SPECIAL_PAGES    8
+#define X86_HVM_NR_SPECIAL_PAGES    11
 #define X86_HVM_END_SPECIAL_REGION  0xff000u
 
 /* --- typedefs and structs ---------------------------------------- */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bff68a0..b316ebc 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -32,6 +32,7 @@
 #include <xen/foreign/x86_32.h>
 #include <xen/foreign/x86_64.h>
 #include <xen/hvm/hvm_info_table.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/arch-x86/hvm/start_info.h>
 #include <xen/io/protocols.h>
 
@@ -57,8 +58,8 @@
 #define SPECIALPAGE_BUFIOREQ 3
 #define SPECIALPAGE_XENSTORE 4
 #define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
+#define SPECIALPAGE_IDENT_PT (5 + MAX_IOREQ_PAGE)
+#define SPECIALPAGE_CONSOLE  (SPECIALPAGE_IDENT_PT + 1)
 #define special_pfn(x) \
     (X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES + (x))
 
@@ -612,6 +613,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
                                X86_HVM_NR_SPECIAL_PAGES) )
             goto error_out;
 
+    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_PAGES, MAX_IOREQ_PAGE);
     xc_hvm_param_set(xch, domid, HVM_PARAM_STORE_PFN,
                      special_pfn(SPECIALPAGE_XENSTORE));
     xc_hvm_param_set(xch, domid, HVM_PARAM_BUFIOREQ_PFN,
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5d06767..0b3bd04 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4077,6 +4077,7 @@ static int hvm_allow_set_param(struct domain *d,
     case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
     case HVM_PARAM_ALTP2M:
     case HVM_PARAM_MCA_CAP:
+    case HVM_PARAM_IOREQ_PAGES:
         if ( value != 0 && a->value != value )
             rc = -EEXIST;
         break;
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index a879f20..0a36001 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
             continue; \
         else
 
+/* Iterate over all ioreq pages */
+#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \
+    for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ )
+
 static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    shared_iopage_t *p = s->ioreq.va;
+    shared_iopage_t *p = s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va;
 
     ASSERT((v == current) || !vcpu_runnable(v));
     ASSERT(p != NULL);
 
-    return &p->vcpu_ioreq[v->vcpu_id];
+    return &p->vcpu_ioreq[v->vcpu_id % IOREQ_NUM_PER_PAGE];
+}
+
+static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    return s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va ?
+           get_ioreq(s, v) : NULL;
 }
 
 bool hvm_io_pending(struct vcpu *v)
@@ -252,10 +262,10 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s,
     iorp->gfn = INVALID_GFN;
 }
 
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf, uint8_t i)
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[i];
     int rc;
 
     if ( iorp->page )
@@ -277,7 +287,7 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     if ( IS_DEFAULT(s) )
         iorp->gfn = _gfn(buf ?
                          d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] :
-                         d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN]);
+                         d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] + i);
     else
         iorp->gfn = hvm_alloc_ioreq_gfn(s);
 
@@ -366,7 +376,22 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        int i;
+        const struct hvm_ioreq_page *iorp;
+
+        FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        {
+            if ( iorp->page == page )
+            {
+                found = true;
+                break;
+            }
+        }
+
+        if ( found )
+            break;
+
+        if ( s->bufioreq.page == page )
         {
             found = true;
             break;
@@ -415,14 +440,12 @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s,
 static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
                                     struct hvm_ioreq_vcpu *sv)
 {
-    ASSERT(spin_is_locked(&s->lock));
+    ioreq_t *p = get_ioreq_fallible(s, sv->vcpu);
 
-    if ( s->ioreq.va != NULL )
-    {
-        ioreq_t *p = get_ioreq(s, sv->vcpu);
+    ASSERT(spin_is_locked(&s->lock));
 
+    if ( p )
         p->vp_eport = sv->ioreq_evtchn;
-    }
 }
 
 #define HANDLE_BUFIOREQ(s) \
@@ -540,44 +563,66 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
 
 static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
 {
-    int rc;
+    int i, rc = -EINVAL;
+    struct hvm_ioreq_page *iorp;
 
-    rc = hvm_map_ioreq_gfn(s, false);
+    for ( i = 0; i < s->ioreq_page_nr; i++ )
+    {
+        rc = hvm_map_ioreq_gfn(s, false, i);
+        if ( rc )
+            break;
+    }
 
     if ( !rc && HANDLE_BUFIOREQ(s) )
-        rc = hvm_map_ioreq_gfn(s, true);
+        rc = hvm_map_ioreq_gfn(s, true, 0);
 
     if ( rc )
-        hvm_unmap_ioreq_gfn(s, &s->ioreq);
+        FOR_EACH_IOREQ_PAGE(s, i, iorp)
+            hvm_unmap_ioreq_gfn(s, iorp);
 
     return rc;
 }
 
 static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
 {
-    hvm_unmap_ioreq_gfn(s, &s->ioreq);
+    int i;
+    struct hvm_ioreq_page *iorp;
+
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        hvm_unmap_ioreq_gfn(s, iorp);
     hvm_unmap_ioreq_gfn(s, &s->bufioreq);
 }
 
 static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
 {
-    int rc;
+    int i, rc = -EINVAL;
+    struct hvm_ioreq_page *iorp;
 
-    rc = hvm_alloc_ioreq_mfn(s, &s->ioreq);
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+    {
+        rc = hvm_alloc_ioreq_mfn(s, iorp);
+        if ( rc )
+            break;
+    }
 
     if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
         rc = hvm_alloc_ioreq_mfn(s, &s->bufioreq);
 
     if ( rc )
-        hvm_free_ioreq_mfn(s, &s->ioreq);
+        FOR_EACH_IOREQ_PAGE(s, i, iorp)
+            hvm_free_ioreq_mfn(s, iorp);
 
     return rc;
 }
 
 static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
 {
+    int i;
+    struct hvm_ioreq_page *iorp;
+
     hvm_free_ioreq_mfn(s, &s->bufioreq);
-    hvm_free_ioreq_mfn(s, &s->ioreq);
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        hvm_free_ioreq_mfn(s, iorp);
 }
 
 static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
@@ -638,13 +683,16 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
 {
     struct hvm_ioreq_vcpu *sv;
+    struct hvm_ioreq_page *iorp;
+    int i;
 
     spin_lock(&s->lock);
 
     if ( s->enabled )
         goto done;
 
-    hvm_remove_ioreq_gfn(s, &s->ioreq);
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        hvm_remove_ioreq_gfn(s, iorp);
     hvm_remove_ioreq_gfn(s, &s->bufioreq);
 
     s->enabled = true;
@@ -660,13 +708,17 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
 
 static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
 {
+    struct hvm_ioreq_page *iorp;
+    int i;
+
     spin_lock(&s->lock);
 
     if ( !s->enabled )
         goto done;
 
     hvm_add_ioreq_gfn(s, &s->bufioreq);
-    hvm_add_ioreq_gfn(s, &s->ioreq);
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        hvm_add_ioreq_gfn(s, iorp);
 
     s->enabled = false;
 
@@ -679,7 +731,8 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
                                  int bufioreq_handling, ioservid_t id)
 {
     struct vcpu *v;
-    int rc;
+    int rc, i;
+    struct hvm_ioreq_page *iorp;
 
     s->domain = d;
     s->domid = domid;
@@ -688,8 +741,15 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    s->ioreq.gfn = INVALID_GFN;
+    FOR_EACH_IOREQ_PAGE(s, i, iorp)
+        iorp->gfn = INVALID_GFN;
     s->bufioreq.gfn = INVALID_GFN;
+    s->ioreq_page_nr = (d->max_vcpus + IOREQ_NUM_PER_PAGE - 1) /
+                       IOREQ_NUM_PER_PAGE;
+    if ( s->ioreq_page_nr > d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PAGES] )
+        return -EINVAL;
+
+    s->ioreq_idx = 0;
 
     rc = hvm_ioreq_server_alloc_rangesets(s, id);
     if ( rc )
@@ -866,7 +926,10 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     }
 
     if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+    {
+        *ioreq_gfn = gfn_x(s->ioreq[s->ioreq_idx].gfn);
+        s->ioreq_idx = (s->ioreq_idx + 1) % s->ioreq_page_nr;
+    }
 
     if ( HANDLE_BUFIOREQ(s) )
     {
@@ -916,7 +979,8 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
         break;
 
     case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = _mfn(page_to_mfn(s->ioreq.page));
+        *mfn = _mfn(page_to_mfn(s->ioreq[s->ioreq_idx].page));
+        s->ioreq_idx = (s->ioreq_idx + 1) % s->ioreq_page_nr;
         rc = 0;
         break;
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 87f7994..ff8c44d 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -51,6 +51,7 @@ struct hvm_ioreq_vcpu {
 
 #define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
 #define MAX_NR_IO_RANGES  256
+#define IOREQ_NUM_PER_PAGE (PAGE_SIZE / sizeof(ioreq_t))
 
 struct hvm_ioreq_server {
     struct list_head       list_entry;
@@ -61,7 +62,10 @@ struct hvm_ioreq_server {
 
     /* Domain id of emulating domain */
     domid_t                domid;
-    struct hvm_ioreq_page  ioreq;
+    /* Index and size of ioreq page array */
+    uint8_t                ioreq_idx;
+    uint8_t                ioreq_page_nr;
+    struct hvm_ioreq_page  ioreq[MAX_IOREQ_PAGE];
     struct list_head       ioreq_vcpu_list;
     struct hvm_ioreq_page  bufioreq;
 
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index d309d12..d628303 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -71,6 +71,8 @@ struct shared_iopage {
 };
 typedef struct shared_iopage shared_iopage_t;
 
+#define MAX_IOREQ_PAGE 4
+
 struct buf_ioreq {
     uint8_t  type;   /* I/O type                    */
     uint8_t  pad:1;
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 2ec2e7c..537799d 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -279,6 +279,12 @@
 #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
 #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
 
-#define HVM_NR_PARAMS 39
+/*
+ * Number of pages that are reserved for default IOREQ server. The base PFN
+ * is set via HVM_PARAM_IOREQ_PFN.
+ */
+#define HVM_PARAM_IOREQ_PAGES 39
+
+#define HVM_NR_PARAMS 40
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2018-02-22 18:05   ` Wei Liu
  2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, Jan Beulich, Chao Gao

There are two places where the lapic_id is computed, one in hvmloader and one
in libacpi. Unify them by defining LAPIC_ID in a header file and including it
in both places.

To address a compilation issue and make libacpi.h self-contained, include
stdint.h in libacpi.h.
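The unified encoding is trivial but has a consequence worth noting (a sketch;
the macro body is the one the patch introduces):

```c
#include <assert.h>

/* Shared LAPIC ID encoding: each vcpu's APIC ID is twice its vcpu_id,
 * so APIC IDs exceed 255 once there are more than 128 vcpus, which is
 * where the x2APIC handling in the rest of this series kicks in. */
#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
```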

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 - new
---
 tools/firmware/hvmloader/config.h | 3 +--
 tools/libacpi/libacpi.h           | 3 +++
 tools/libxl/libxl_x86_acpi.c      | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6e00413..55e3a27 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -1,7 +1,7 @@
 #ifndef __HVMLOADER_CONFIG_H__
 #define __HVMLOADER_CONFIG_H__
 
-#include <stdint.h>
+#include <libacpi.h>
 
 enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
 extern enum virtual_vga virtual_vga;
@@ -48,7 +48,6 @@ extern uint8_t ioapic_version;
 #define IOAPIC_ID           0x01
 
 #define LAPIC_BASE_ADDRESS  0xfee00000
-#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
 
 #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index 46a819d..b89fdb5 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -21,6 +21,9 @@
 #define __LIBACPI_H__
 
 #include <stdbool.h>
+#include <stdint.h>  /* uintXX_t */
+
+#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
 
 #define ACPI_HAS_COM1              (1<<0)
 #define ACPI_HAS_COM2              (1<<1)
diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
index bbe9219..0b7507d 100644
--- a/tools/libxl/libxl_x86_acpi.c
+++ b/tools/libxl/libxl_x86_acpi.c
@@ -87,7 +87,7 @@ static void acpi_mem_free(struct acpi_ctxt *ctxt,
 
 static uint32_t acpi_lapic_id(unsigned cpu)
 {
-    return cpu * 2;
+    return LAPIC_ID(cpu);
 }
 
 static int init_acpi_config(libxl__gc *gc, 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
                   ` (2 preceding siblings ...)
  2017-12-06  7:50 ` [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2018-02-22 18:44   ` Wei Liu
                     ` (2 more replies)
  2017-12-06  7:50 ` [RFC Patch v4 5/8] Tool/ACPI: DSDT extension to support more vcpus Chao Gao
                   ` (3 subsequent siblings)
  7 siblings, 3 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, Jan Beulich, Chao Gao

Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
has the following description:

"The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
Description Tables,” of the Advanced Configuration and Power Interface
Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
behavior for BIOS is to pass the control to the operating system with the
local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
than 255, and in x2APIC mode if there are any logical processor reporting an
APIC ID of 255 or greater."

In this patch, hvmloader enables x2APIC mode for all vcpus if there are cpus
with an APIC ID > 255. To wake up processors whose APIC IDs are greater than
255, the SIPI is broadcast to all APs. This is the way SeaBIOS wakes up APs.
APs may compete for the stack, so a lock is introduced to protect the stack.
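The stack lock in the assembly below is a plain xchg-based spinlock; the same
idea can be sketched in C (hypothetical names, using a GCC atomic builtin in
place of the inline assembly's xchg instruction):

```c
#include <assert.h>

/* The lock starts held (1); the previous holder releases it by storing 0,
 * and the next AP spins exchanging 1 into it until it reads back 0. */
static volatile unsigned char stack_lock = 1;

static unsigned char xchg_u8(volatile unsigned char *p, unsigned char v)
{
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

static void stack_lock_acquire(void)
{
    while ( xchg_u8(&stack_lock, 1) != 0 )
        ; /* lock was held; keep spinning (the real code executes pause) */
}

static void stack_lock_release(void)
{
    xchg_u8(&stack_lock, 0);
}
```

Because xchg atomically returns the old value while storing the new one,
exactly one spinning AP observes the 0 written by the release and proceeds
onto the shared stack.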

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 - new
---
 tools/firmware/hvmloader/apic_regs.h |  4 +++
 tools/firmware/hvmloader/smp.c       | 64 ++++++++++++++++++++++++++++++++----
 2 files changed, 61 insertions(+), 7 deletions(-)

diff --git a/tools/firmware/hvmloader/apic_regs.h b/tools/firmware/hvmloader/apic_regs.h
index f737b47..bc39ecd 100644
--- a/tools/firmware/hvmloader/apic_regs.h
+++ b/tools/firmware/hvmloader/apic_regs.h
@@ -105,6 +105,10 @@
 #define     APIC_TDR_DIV_64          0x9
 #define     APIC_TDR_DIV_128         0xA
 
+#define MSR_IA32_APICBASE            0x1b
+#define MSR_IA32_APICBASE_EXTD       (1<<10)
+#define MSR_IA32_APICBASE_MSR        0x800
+
 #endif
 
 /*
diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
index 082b17f..e3dade4 100644
--- a/tools/firmware/hvmloader/smp.c
+++ b/tools/firmware/hvmloader/smp.c
@@ -26,7 +26,9 @@
 #define AP_BOOT_EIP 0x1000
 extern char ap_boot_start[], ap_boot_end[];
 
-static int ap_callin, ap_cpuid;
+static int ap_callin;
+static int enable_x2apic;
+static bool lock = 1;
 
 asm (
     "    .text                       \n"
@@ -47,7 +49,15 @@ asm (
     "    mov   %eax,%ds              \n"
     "    mov   %eax,%es              \n"
     "    mov   %eax,%ss              \n"
-    "    movl  $stack_top,%esp       \n"
+    "3:  movb  $1, %bl               \n"
+    "    mov   $lock,%edx            \n"
+    "    movzbl %bl,%eax             \n"
+    "    xchg  %al, (%edx)           \n"
+    "    test  %al,%al               \n"
+    "    je    2f                    \n"
+    "    pause                       \n"
+    "    jmp   3b                    \n"
+    "2:  movl  $stack_top,%esp       \n"
     "    movl  %esp,%ebp             \n"
     "    call  ap_start              \n"
     "1:  hlt                         \n"
@@ -68,14 +78,34 @@ asm (
     "    .text                       \n"
     );
 
+unsigned int ap_cpuid(void)
+{
+    if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
+    {
+        uint32_t eax, ebx, ecx, edx;
+
+        cpuid(1, &eax, &ebx, &ecx, &edx);
+        return ebx >> 24;
+    }
+    else
+        return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
+}
+
 void ap_start(void); /* non-static avoids unused-function compiler warning */
 /*static*/ void ap_start(void)
 {
-    printf(" - CPU%d ... ", ap_cpuid);
+    printf(" - CPU%d ... ", ap_cpuid());
     cacheattr_init();
     printf("done.\n");
     wmb();
-    ap_callin = 1;
+    ap_callin++;
+
+    if ( enable_x2apic )
+        wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
+                                 MSR_IA32_APICBASE_EXTD);
+
+    /* Release the lock */
+    asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
 }
 
 static void lapic_wait_ready(void)
@@ -89,7 +119,6 @@ static void boot_cpu(unsigned int cpu)
     unsigned int icr2 = SET_APIC_DEST_FIELD(LAPIC_ID(cpu));
 
     /* Initialise shared variables. */
-    ap_cpuid = cpu;
     ap_callin = 0;
     wmb();
 
@@ -118,6 +147,21 @@ static void boot_cpu(unsigned int cpu)
     lapic_wait_ready();    
 }
 
+static void boot_cpu_broadcast_x2apic(unsigned int nr_cpus)
+{
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_INIT);
+
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
+
+    while ( ap_callin != nr_cpus )
+        cpu_relax();
+
+    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
+          APIC_DEST_ALLBUT | APIC_DM_INIT);
+}
+
 void smp_initialise(void)
 {
     unsigned int i, nr_cpus = hvm_info->nr_vcpus;
@@ -125,9 +169,15 @@ void smp_initialise(void)
     memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
 
     printf("Multiprocessor initialisation:\n");
+    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
+        enable_x2apic = 1;
+
     ap_start();
-    for ( i = 1; i < nr_cpus; i++ )
-        boot_cpu(i);
+    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
+        boot_cpu_broadcast_x2apic(nr_cpus);
+    else
+        for ( i = 1; i < nr_cpus; i++ )
+            boot_cpu(i);
 }
 
 /*
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 5/8] Tool/ACPI: DSDT extension to support more vcpus
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
                   ` (3 preceding siblings ...)
  2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build Chao Gao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Lan Tianyu, Ian Jackson, Wei Liu, Jan Beulich, Chao Gao

From: Lan Tianyu <tianyu.lan@intel.com>

This patch changes the DSDT table's processor objects to support 4096 vcpus,
according to ACPI spec section 8.4, Declaring Processors.

This patch contains two changes:
1. Declare processors whose local APIC is enumerated as an x2APIC via the
   ASL Device statement.
2. Bump the size of the CPU ID used to compose the processor name to 12
   bits, raising the processor count limit imposed here to 4096.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 tools/libacpi/libacpi.h |  6 ++++++
 tools/libacpi/mk_dsdt.c | 40 +++++++++++++++++++++++++++++++++-------
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index b89fdb5..7db4d92 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -24,6 +24,12 @@
 #include <stdint.h>  /* uintXX_t */
 
 #define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
+/*
+ * For x86, the APIC ID is twice the vcpu ID. In the MADT, only APICs
+ * with APIC ID <= 254 can be declared as local APICs; any APIC with a
+ * larger ID must be declared as a local x2APIC instead.
+ */
+#define MADT_MAX_LOCAL_APIC 128U
 
 #define ACPI_HAS_COM1              (1<<0)
 #define ACPI_HAS_COM2              (1<<1)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 2daf32c..27e5d1b 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -20,10 +20,13 @@
 #if defined(CONFIG_X86)
 #include <xen/arch-x86/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "libacpi.h"
 #elif defined(CONFIG_ARM_64)
 #include <xen/arch-arm.h>
 #endif
 
+#define CPU_NAME_FMT      "P%.03X"
+
 static unsigned int indent_level;
 static bool debug = false;
 
@@ -194,12 +197,35 @@ int main(int argc, char **argv)
 #endif
 
     /* Define processor objects and control methods. */
-    for ( cpu = 0; cpu < max_cpus; cpu++)
+    for ( cpu = 0; cpu < max_cpus; cpu++ )
     {
-        push_block("Processor", "PR%02X, %d, 0x0000b010, 0x06", cpu, cpu);
 
-        stmt("Name", "_HID, \"ACPI0007\"");
+#ifdef CONFIG_X86
+        /*
+         * According to the Processor Local x2APIC Structure of ACPI SPEC
+         * Revision 5.0, "OSPM associates the X2APIC Structure with a
+         * processor object declared in the namespace using the Device
+         * statement, when the _UID child object of the processor device
+         * evaluates to a numeric value, by matching the numeric value with
+         * this field".
+         *
+         * Here a numeric value is assigned to the _UID object. Thus,
+         * for each x2apic structure in MADT, instead of declaring the
+         * corresponding processor via the ASL Processor statement, declare
+         * it via the ASL Device statement.
+         *
+         * Note that if the CPU ID is equal to or greater than MADT_MAX_LOCAL_APIC,
+         * the lapic of this CPU should be enumerated as a local x2apic
+         * structure.
+         */
+        if ( cpu >= MADT_MAX_LOCAL_APIC )
+            push_block("Device", CPU_NAME_FMT, cpu);
+        else
+#endif
+            push_block("Processor", CPU_NAME_FMT ", %d, 0x0000b010, 0x06",
+                       cpu, cpu);
 
+        stmt("Name", "_HID, \"ACPI0007\"");
         stmt("Name", "_UID, %d", cpu);
 #ifdef CONFIG_ARM_64
         pop_block();
@@ -268,15 +294,15 @@ int main(int argc, char **argv)
         /* Extract current CPU's status: 0=offline; 1=online. */
         stmt("And", "Local1, 1, Local2");
         /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
-        push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+        push_block("If", "LNotEqual(Local2, \\_SB." CPU_NAME_FMT ".FLG)", cpu);
         /* ...If not, update it and the MADT checksum, and notify OSPM. */
-        stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
+        stmt("Store", "Local2, \\_SB." CPU_NAME_FMT ".FLG", cpu);
         push_block("If", "LEqual(Local2, 1)");
-        stmt("Notify", "PR%02X, 1", cpu); /* Notify: Device Check */
+        stmt("Notify", CPU_NAME_FMT ", 1", cpu); /* Notify: Device Check */
         stmt("Subtract", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
         pop_block();
         push_block("Else", NULL);
-        stmt("Notify", "PR%02X, 3", cpu); /* Notify: Eject Request */
+        stmt("Notify", CPU_NAME_FMT ", 3", cpu); /* Notify: Eject Request */
         stmt("Add", "\\_SB.MSU, 1, \\_SB.MSU"); /* Adjust MADT csum */
         pop_block();
         pop_block();
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 6/8] hvmloader: Add x2apic entry support in the MADT and SRAT build
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
                   ` (4 preceding siblings ...)
  2017-12-06  7:50 ` [RFC Patch v4 5/8] Tool/ACPI: DSDT extension to support more vcpus Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2018-04-18  8:48   ` Jan Beulich
  2017-12-06  7:50 ` [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory Chao Gao
  2017-12-06  7:50 ` [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512 Chao Gao
  7 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Lan Tianyu, Ian Jackson, Wei Liu, Jan Beulich, Chao Gao

From: Lan Tianyu <tianyu.lan@intel.com>

This patch contains the following changes:
1. Add x2APIC entry support to the ACPI MADT table, according to
   ACPI spec section 5.2.12.12, Processor Local x2APIC Structure.
2. Add x2APIC entry support to the ACPI SRAT table, according to
   ACPI spec section 5.2.16.3, Processor Local x2APIC Affinity Structure.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v4:
 - also add x2apic entry in SRAT
---
 tools/libacpi/acpi2_0.h | 25 ++++++++++++++++++++--
 tools/libacpi/build.c   | 57 +++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 6081417..7eb983d 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -322,6 +322,7 @@ struct acpi_20_waet {
 #define ACPI_IO_SAPIC                       0x06
 #define ACPI_PROCESSOR_LOCAL_SAPIC          0x07
 #define ACPI_PLATFORM_INTERRUPT_SOURCES     0x08
+#define ACPI_PROCESSOR_LOCAL_X2APIC         0x09
 
 /*
  * APIC Structure Definitions.
@@ -338,6 +339,15 @@ struct acpi_20_madt_lapic {
     uint32_t flags;
 };
 
+struct acpi_20_madt_x2apic {
+    uint8_t  type;
+    uint8_t  length;
+    uint16_t reserved;
+    uint32_t x2apic_id;
+    uint32_t flags;
+    uint32_t acpi_processor_uid;
+};
+
 /*
  * Local APIC Flags.  All other bits are reserved and must be 0.
  */
@@ -378,8 +388,9 @@ struct acpi_20_srat {
 /*
  * System Resource Affinity Table structure types.
  */
-#define ACPI_PROCESSOR_AFFINITY 0x0
-#define ACPI_MEMORY_AFFINITY    0x1
+#define ACPI_PROCESSOR_AFFINITY         0x0
+#define ACPI_MEMORY_AFFINITY            0x1
+#define ACPI_PROCESSOR_X2APIC_AFFINITY  0x2
 struct acpi_20_srat_processor {
     uint8_t type;
     uint8_t length;
@@ -391,6 +402,16 @@ struct acpi_20_srat_processor {
     uint32_t reserved;
 };
 
+struct acpi_20_srat_processor_x2apic {
+    uint8_t type;
+    uint8_t length;
+    uint16_t reserved;
+    uint32_t domain;
+    uint32_t x2apic_id;
+    uint32_t flags;
+    uint32_t reserved2[2];
+};
+
 /*
  * Local APIC Affinity Flags.  All other bits are reserved and must be 0.
  */
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index df0a67c..5cbf6a9 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -30,6 +30,11 @@
 
 #define align16(sz)        (((sz) + 15) & ~15)
 #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d))
+#define min(X, Y) ({                             \
+            const typeof (X) _x = (X);           \
+            const typeof (Y) _y = (Y);           \
+            (void) (&_x == &_y);                 \
+            (_x < _y) ? _x : _y; })
 
 extern struct acpi_20_rsdp Rsdp;
 extern struct acpi_20_rsdt Rsdt;
@@ -79,16 +84,19 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
     struct acpi_20_madt_intsrcovr *intsrcovr;
     struct acpi_20_madt_ioapic    *io_apic;
     struct acpi_20_madt_lapic     *lapic;
+    struct acpi_20_madt_x2apic    *x2apic;
     const struct hvm_info_table   *hvminfo = config->hvminfo;
-    int i, sz;
+    int i, sz, nr_apic;
 
     if ( config->lapic_id == NULL )
         return NULL;
 
+    nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC);
     sz  = sizeof(struct acpi_20_madt);
     sz += sizeof(struct acpi_20_madt_intsrcovr) * 16;
     sz += sizeof(struct acpi_20_madt_ioapic);
-    sz += sizeof(struct acpi_20_madt_lapic) * hvminfo->nr_vcpus;
+    sz += sizeof(struct acpi_20_madt_lapic) * nr_apic;
+    sz += sizeof(struct acpi_20_madt_x2apic) * (hvminfo->nr_vcpus - nr_apic);
 
     madt = ctxt->mem_ops.alloc(ctxt, sz, 16);
     if (!madt) return NULL;
@@ -149,7 +157,7 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
 
     info->nr_cpus = hvminfo->nr_vcpus;
     info->madt_lapic0_addr = ctxt->mem_ops.v2p(ctxt, lapic);
-    for ( i = 0; i < hvminfo->nr_vcpus; i++ )
+    for ( i = 0; i < nr_apic; i++ )
     {
         memset(lapic, 0, sizeof(*lapic));
         lapic->type    = ACPI_PROCESSOR_LOCAL_APIC;
@@ -157,12 +165,26 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
         /* Processor ID must match processor-object IDs in the DSDT. */
         lapic->acpi_processor_id = i;
         lapic->apic_id = config->lapic_id(i);
-        lapic->flags = (test_bit(i, hvminfo->vcpu_online)
-                        ? ACPI_LOCAL_APIC_ENABLED : 0);
+        lapic->flags = test_bit(i, hvminfo->vcpu_online)
+                           ? ACPI_LOCAL_APIC_ENABLED : 0;
         lapic++;
     }
 
-    madt->header.length = (unsigned char *)lapic - (unsigned char *)madt;
+    x2apic = (void *)lapic;
+    for ( ; i < hvminfo->nr_vcpus; i++ )
+    {
+        memset(x2apic, 0, sizeof(*x2apic));
+        x2apic->type    = ACPI_PROCESSOR_LOCAL_X2APIC;
+        x2apic->length  = sizeof(*x2apic);
+        /* Processor UID must match processor-object UIDs in the DSDT. */
+        x2apic->acpi_processor_uid = i;
+        x2apic->x2apic_id = config->lapic_id(i);
+        x2apic->flags =  test_bit(i, hvminfo->vcpu_online)
+                             ? ACPI_LOCAL_APIC_ENABLED : 0;
+        x2apic++;
+    }
+
+    madt->header.length = (unsigned char *)x2apic - (unsigned char *)madt;
     set_checksum(madt, offsetof(struct acpi_header, checksum),
                  madt->header.length);
     info->madt_csum_addr =
@@ -216,12 +238,14 @@ static struct acpi_20_srat *construct_srat(struct acpi_ctxt *ctxt,
 {
     struct acpi_20_srat *srat;
     struct acpi_20_srat_processor *processor;
+    struct acpi_20_srat_processor_x2apic *x2apic;
     struct acpi_20_srat_memory *memory;
-    unsigned int size;
+    unsigned int size, i, nr_apic;
     void *p;
-    unsigned int i;
 
-    size = sizeof(*srat) + sizeof(*processor) * config->hvminfo->nr_vcpus +
+    nr_apic = min(MADT_MAX_LOCAL_APIC, config->hvminfo->nr_vcpus);
+    size = sizeof(*srat) + sizeof(*processor) * nr_apic +
+           sizeof(*x2apic) * (config->hvminfo->nr_vcpus - nr_apic) +
            sizeof(*memory) * config->numa.nr_vmemranges;
 
     p = ctxt->mem_ops.alloc(ctxt, size, 16);
@@ -239,7 +263,7 @@ static struct acpi_20_srat *construct_srat(struct acpi_ctxt *ctxt,
     srat->table_revision      = ACPI_SRAT_TABLE_REVISION;
 
     processor = (struct acpi_20_srat_processor *)(srat + 1);
-    for ( i = 0; i < config->hvminfo->nr_vcpus; i++ )
+    for ( i = 0; i < nr_apic; i++ )
     {
         processor->type     = ACPI_PROCESSOR_AFFINITY;
         processor->length   = sizeof(*processor);
@@ -249,7 +273,18 @@ static struct acpi_20_srat *construct_srat(struct acpi_ctxt *ctxt,
         processor++;
     }
 
-    memory = (struct acpi_20_srat_memory *)processor;
+    x2apic = (struct acpi_20_srat_processor_x2apic *)processor;
+    for ( ; i < config->hvminfo->nr_vcpus; i++ )
+    {
+        x2apic->type       = ACPI_PROCESSOR_X2APIC_AFFINITY;
+        x2apic->length     = sizeof(*x2apic);
+        x2apic->domain     = config->numa.vcpu_to_vnode[i];
+        x2apic->x2apic_id  = config->lapic_id(i);
+        x2apic->flags      = ACPI_LOCAL_APIC_AFFIN_ENABLED;
+        x2apic++;
+    }
+
+    memory = (struct acpi_20_srat_memory *)x2apic;
     for ( i = 0; i < config->numa.nr_vmemranges; i++ )
     {
         memory->type          = ACPI_MEMORY_AFFINITY;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
                   ` (5 preceding siblings ...)
  2017-12-06  7:50 ` [RFC Patch v4 6/8] hvmloader: Add x2apic entry support in the MADT and SRAT build Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2018-02-27 14:17   ` George Dunlap
  2018-04-18  8:53   ` Jan Beulich
  2017-12-06  7:50 ` [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512 Chao Gao
  7 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich, Chao Gao

Each vcpu of an HVM guest consumes at least one shadow page. Currently, only
256 pages (in the HAP case) are pre-allocated as shadow memory at the
beginning, so the pool would run out and guest creation would fail for a
guest with more than 256 vcpus. Bump the number of shadow pages to
2 * HVM_MAX_VCPUS for the HAP case and 8 * HVM_MAX_VCPUS for the shadow case.

This patch won't lead to more memory consumption, because the size of shadow
memory will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to
the size of guest memory and the number of vcpus.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/mm/hap/hap.c       | 2 +-
 xen/arch/x86/mm/shadow/common.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 41deb90..f4cf578 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -455,7 +455,7 @@ int hap_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = hap_set_allocation(d, 256, NULL);
+        rv = hap_set_allocation(d, 2 * HVM_MAX_VCPUS, NULL);
         if ( rv != 0 )
         {
             hap_set_allocation(d, 0, NULL);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 72c674e..5e66603 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3093,7 +3093,7 @@ int shadow_enable(struct domain *d, u32 mode)
     if ( old_pages == 0 )
     {
         paging_lock(d);
-        rv = shadow_set_allocation(d, 1024, NULL); /* Use at least 4MB */
+        rv = shadow_set_allocation(d, 8 * HVM_MAX_VCPUS, NULL);
         if ( rv != 0 )
         {
             shadow_set_allocation(d, 0, NULL);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
                   ` (6 preceding siblings ...)
  2017-12-06  7:50 ` [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory Chao Gao
@ 2017-12-06  7:50 ` Chao Gao
  2018-02-22 18:46   ` Wei Liu
  2018-02-23 18:11   ` Roger Pau Monné
  7 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  7:50 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Ian Jackson,
	Tim Deegan, Jan Beulich, Andrew Cooper, Chao Gao

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/include/public/hvm/hvm_info_table.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/public/hvm/hvm_info_table.h b/xen/include/public/hvm/hvm_info_table.h
index 08c252e..6833a4c 100644
--- a/xen/include/public/hvm/hvm_info_table.h
+++ b/xen/include/public/hvm/hvm_info_table.h
@@ -32,7 +32,7 @@
 #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
 
 /* Maximum we can support with current vLAPIC ID mapping. */
-#define HVM_MAX_VCPUS        128
+#define HVM_MAX_VCPUS        512
 
 /*
  * In some cases SMP HVM guests may require knowledge of Xen's idea of vCPU ids
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions
  2017-12-06 14:44   ` Paul Durrant
@ 2017-12-06  8:37     ` Chao Gao
  0 siblings, 0 replies; 56+ messages in thread
From: Chao Gao @ 2017-12-06  8:37 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Wed, Dec 06, 2017 at 02:44:52PM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> From: Chao Gao [mailto:chao.gao@intel.com]
>> Sent: 06 December 2017 07:50
>> To: xen-devel@lists.xen.org
>> Cc: Chao Gao <chao.gao@intel.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>; Paul
>> Durrant <Paul.Durrant@citrix.com>
>> Subject: [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static
>> functions
>> 
>> This is preparation for supporting multiple IOREQ pages.
>> No functional change.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> v4:
>>  -new
>> ---
>>  xen/arch/x86/hvm/ioreq.c | 48 +++++++++++++++++++++++------------------
>> -------
>>  1 file changed, 23 insertions(+), 25 deletions(-)
>> 
>> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
>> index d991ac9..a879f20 100644
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -237,10 +237,9 @@ static void hvm_free_ioreq_gfn(struct
>> hvm_ioreq_server *s, gfn_t gfn)
>>      set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask);
>>  }
>> 
>> -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
>> +static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s,
>> +                                struct hvm_ioreq_page *iorp)
>>  {
>> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>> -
>
>I don't really like this approach. I'd prefer swapping the bool for an unsigned page index, where we follow the convention adopted in hvm_get_ioreq_server_frame() for which macros exist: 0 equating to the bufioreq page, 1+ for the struct-per-cpu pages.

OK. I have no preference between the two, but I will take your advice.

Thanks
Chao


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06 15:04   ` Paul Durrant
@ 2017-12-06  9:02     ` Chao Gao
  2017-12-06 16:10       ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-06  9:02 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> From: Chao Gao [mailto:chao.gao@intel.com]
>> Sent: 06 December 2017 07:50
>> To: xen-devel@lists.xen.org
>> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
>> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano Stabellini
>> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
>> <Ian.Jackson@citrix.com>
>> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> pages
>> 
>> One 4K-byte page contains at most 128 'ioreq_t' structures. In order to
>> remove the vcpu number constraint imposed by a single IOREQ page, bump
>> the number of IOREQ pages to 4. With this patch, multiple pages can be
>> used as IOREQ pages.
>> 
>> Basically, this patch extends the 'ioreq' field in struct hvm_ioreq_server
>> to an array. All accesses to the 'ioreq' field, such as 's->ioreq', are
>> replaced with the FOR_EACH_IOREQ_PAGE macro.
>> 
>> In order to access an IOREQ page, QEMU should get the gmfn and map it
>> into its virtual address space.
>
>No. There's no need to extend the 'legacy' mechanism of using magic page gfns. You should only handle the case where the mfns are allocated on demand (see the call to hvm_ioreq_server_alloc_pages() in hvm_get_ioreq_server_frame()). The number of guest vcpus is known at this point so the correct number of pages can be allocated. If the creator of the ioreq server attempts to use the legacy hvm_get_ioreq_server_info() and the guest has >128 vcpus then the call should fail.

Great suggestion. I will introduce a new dmop, a variant of
hvm_get_ioreq_server_frame(), for the creator to get an array of gfns and
the size of the array. The legacy interface will report an error if more
than one IOREQ page is needed.

Thanks
Chao


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions
  2017-12-06  7:50 ` [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions Chao Gao
@ 2017-12-06 14:44   ` Paul Durrant
  2017-12-06  8:37     ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-06 14:44 UTC (permalink / raw)
  To: 'Chao Gao', xen-devel; +Cc: Andrew Cooper, Jan Beulich

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 06 December 2017 07:50
> To: xen-devel@lists.xen.org
> Cc: Chao Gao <chao.gao@intel.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>; Paul
> Durrant <Paul.Durrant@citrix.com>
> Subject: [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static
> functions
> 
> This is preparation for supporting multiple IOREQ pages.
> No functional change.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v4:
>  -new
> ---
>  xen/arch/x86/hvm/ioreq.c | 48 +++++++++++++++++++++++------------------
> -------
>  1 file changed, 23 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index d991ac9..a879f20 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -237,10 +237,9 @@ static void hvm_free_ioreq_gfn(struct
> hvm_ioreq_server *s, gfn_t gfn)
>      set_bit(i, &d->arch.hvm_domain.ioreq_gfn.mask);
>  }
> 
> -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> +static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s,
> +                                struct hvm_ioreq_page *iorp)
>  {
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> -

I don't really like this approach. I'd prefer swapping the bool for an unsigned page index, where we follow the convention adopted in hvm_get_ioreq_server_frame() for which macros exist: 0 equating to the bufioreq page, 1+ for the struct-per-cpu pages.

  Paul

>      if ( gfn_eq(iorp->gfn, INVALID_GFN) )
>          return;
> 
> @@ -289,15 +288,15 @@ static int hvm_map_ioreq_gfn(struct
> hvm_ioreq_server *s, bool buf)
>                                   &iorp->va);
> 
>      if ( rc )
> -        hvm_unmap_ioreq_gfn(s, buf);
> +        hvm_unmap_ioreq_gfn(s, iorp);
> 
>      return rc;
>  }
> 
> -static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s,
> +                               struct hvm_ioreq_page *iorp)
>  {
>      struct domain *currd = current->domain;
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> 
>      if ( iorp->page )
>      {
> @@ -344,10 +343,9 @@ static int hvm_alloc_ioreq_mfn(struct
> hvm_ioreq_server *s, bool buf)
>      return 0;
>  }
> 
> -static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
> +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s,
> +                               struct hvm_ioreq_page *iorp)
>  {
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> -
>      if ( !iorp->page )
>          return;
> 
> @@ -380,11 +378,11 @@ bool is_ioreq_server_page(struct domain *d, const
> struct page_info *page)
>      return found;
>  }
> 
> -static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> +static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s,
> +                                 struct hvm_ioreq_page *iorp)
> 
>  {
>      struct domain *d = s->domain;
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> 
>      if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
>          return;
> @@ -395,10 +393,10 @@ static void hvm_remove_ioreq_gfn(struct
> hvm_ioreq_server *s, bool buf)
>      clear_page(iorp->va);
>  }
> 
> -static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> +static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s,
> +                             struct hvm_ioreq_page *iorp)
>  {
>      struct domain *d = s->domain;
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
>      int rc;
> 
>      if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
> @@ -550,36 +548,36 @@ static int hvm_ioreq_server_map_pages(struct
> hvm_ioreq_server *s)
>          rc = hvm_map_ioreq_gfn(s, true);
> 
>      if ( rc )
> -        hvm_unmap_ioreq_gfn(s, false);
> +        hvm_unmap_ioreq_gfn(s, &s->ioreq);
> 
>      return rc;
>  }
> 
>  static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>  {
> -    hvm_unmap_ioreq_gfn(s, true);
> -    hvm_unmap_ioreq_gfn(s, false);
> +    hvm_unmap_ioreq_gfn(s, &s->ioreq);
> +    hvm_unmap_ioreq_gfn(s, &s->bufioreq);
>  }
> 
>  static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
>  {
>      int rc;
> 
> -    rc = hvm_alloc_ioreq_mfn(s, false);
> +    rc = hvm_alloc_ioreq_mfn(s, &s->ioreq);
> 
>      if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
> -        rc = hvm_alloc_ioreq_mfn(s, true);
> +        rc = hvm_alloc_ioreq_mfn(s, &s->bufioreq);
> 
>      if ( rc )
> -        hvm_free_ioreq_mfn(s, false);
> +        hvm_free_ioreq_mfn(s, &s->ioreq);
> 
>      return rc;
>  }
> 
>  static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
>  {
> -    hvm_free_ioreq_mfn(s, true);
> -    hvm_free_ioreq_mfn(s, false);
> +    hvm_free_ioreq_mfn(s, &s->bufioreq);
> +    hvm_free_ioreq_mfn(s, &s->ioreq);
>  }
> 
>  static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
> @@ -646,8 +644,8 @@ static void hvm_ioreq_server_enable(struct
> hvm_ioreq_server *s)
>      if ( s->enabled )
>          goto done;
> 
> -    hvm_remove_ioreq_gfn(s, false);
> -    hvm_remove_ioreq_gfn(s, true);
> +    hvm_remove_ioreq_gfn(s, &s->ioreq);
> +    hvm_remove_ioreq_gfn(s, &s->bufioreq);
> 
>      s->enabled = true;
> 
> @@ -667,8 +665,8 @@ static void hvm_ioreq_server_disable(struct
> hvm_ioreq_server *s)
>      if ( !s->enabled )
>          goto done;
> 
> -    hvm_add_ioreq_gfn(s, true);
> -    hvm_add_ioreq_gfn(s, false);
> +    hvm_add_ioreq_gfn(s, &s->bufioreq);
> +    hvm_add_ioreq_gfn(s, &s->ioreq);
> 
>      s->enabled = false;
> 
> --
> 1.8.3.1



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06  7:50 ` [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages Chao Gao
@ 2017-12-06 15:04   ` Paul Durrant
  2017-12-06  9:02     ` Chao Gao
  2018-04-18  8:19   ` Jan Beulich
  1 sibling, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-06 15:04 UTC (permalink / raw)
  To: 'Chao Gao', xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 06 December 2017 07:50
> To: xen-devel@lists.xen.org
> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> pages
> 
> One 4K-byte page contains at most 128 'ioreq_t' structures. In order to
> remove the vcpu number constraint imposed by a single IOREQ page, bump
> the number of IOREQ pages to 4. With this patch, multiple pages can be
> used as IOREQ pages.
> 
> Basically, this patch extends the 'ioreq' field in struct hvm_ioreq_server
> to an array. All accesses to the 'ioreq' field, such as 's->ioreq', are
> replaced with the FOR_EACH_IOREQ_PAGE macro.
> 
> In order to access an IOREQ page, QEMU should get the gmfn and map it
> into its virtual address space.

No. There's no need to extend the 'legacy' mechanism of using magic page gfns. You should only handle the case where the mfns are allocated on demand (see the call to hvm_ioreq_server_alloc_pages() in hvm_get_ioreq_server_frame()). The number of guest vcpus is known at this point so the correct number of pages can be allocated. If the creator of the ioreq server attempts to use the legacy hvm_get_ioreq_server_info() and the guest has >128 vcpus then the call should fail.

  Paul

> Now that there are several pages, the interface to get the gmfn doesn't
> change, to stay compatible with previous QEMU. But newer QEMU needs to
> query the gmfn repeatedly until the same gmfn is seen again. To implement
> this, an internal index is introduced: when QEMU queries the gmfn, the
> gmfn of the IOREQ page referenced by the index is returned. After each
> query, the index increases by 1 and wraps around when it overflows.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v4:
>  - new
> ---
>  tools/libxc/include/xc_dom.h     |   2 +-
>  tools/libxc/xc_dom_x86.c         |   6 +-
>  xen/arch/x86/hvm/hvm.c           |   1 +
>  xen/arch/x86/hvm/ioreq.c         | 116 ++++++++++++++++++++++++++++++---------
>  xen/include/asm-x86/hvm/domain.h |   6 +-
>  xen/include/public/hvm/ioreq.h   |   2 +
>  xen/include/public/hvm/params.h  |   8 ++-
>  7 files changed, 110 insertions(+), 31 deletions(-)
> 
> diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
> index 45c9d67..2f8b412 100644
> --- a/tools/libxc/include/xc_dom.h
> +++ b/tools/libxc/include/xc_dom.h
> @@ -20,7 +20,7 @@
>  #include <xenguest.h>
> 
>  #define INVALID_PFN ((xen_pfn_t)-1)
> -#define X86_HVM_NR_SPECIAL_PAGES    8
> +#define X86_HVM_NR_SPECIAL_PAGES    11
>  #define X86_HVM_END_SPECIAL_REGION  0xff000u
> 
>  /* --- typedefs and structs ---------------------------------------- */
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index bff68a0..b316ebc 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -32,6 +32,7 @@
>  #include <xen/foreign/x86_32.h>
>  #include <xen/foreign/x86_64.h>
>  #include <xen/hvm/hvm_info_table.h>
> +#include <xen/hvm/ioreq.h>
>  #include <xen/arch-x86/hvm/start_info.h>
>  #include <xen/io/protocols.h>
> 
> @@ -57,8 +58,8 @@
>  #define SPECIALPAGE_BUFIOREQ 3
>  #define SPECIALPAGE_XENSTORE 4
>  #define SPECIALPAGE_IOREQ    5
> -#define SPECIALPAGE_IDENT_PT 6
> -#define SPECIALPAGE_CONSOLE  7
> +#define SPECIALPAGE_IDENT_PT (5 + MAX_IOREQ_PAGE)
> +#define SPECIALPAGE_CONSOLE  (SPECIALPAGE_IDENT_PT + 1)
>  #define special_pfn(x) \
>      (X86_HVM_END_SPECIAL_REGION - X86_HVM_NR_SPECIAL_PAGES + (x))
> 
> @@ -612,6 +613,7 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
>                                 X86_HVM_NR_SPECIAL_PAGES) )
>              goto error_out;
> 
> +    xc_hvm_param_set(xch, domid, HVM_PARAM_IOREQ_PAGES, MAX_IOREQ_PAGE);
>      xc_hvm_param_set(xch, domid, HVM_PARAM_STORE_PFN,
>                       special_pfn(SPECIALPAGE_XENSTORE));
>      xc_hvm_param_set(xch, domid, HVM_PARAM_BUFIOREQ_PFN,
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 5d06767..0b3bd04 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4077,6 +4077,7 @@ static int hvm_allow_set_param(struct domain *d,
>      case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>      case HVM_PARAM_ALTP2M:
>      case HVM_PARAM_MCA_CAP:
> +    case HVM_PARAM_IOREQ_PAGES:
>          if ( value != 0 && a->value != value )
>              rc = -EEXIST;
>          break;
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index a879f20..0a36001 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>              continue; \
>          else
> 
> +/* Iterate over all ioreq pages */
> +#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \
> +    for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ )
> +
>  static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
>  {
> -    shared_iopage_t *p = s->ioreq.va;
> +    shared_iopage_t *p = s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va;
> 
>      ASSERT((v == current) || !vcpu_runnable(v));
>      ASSERT(p != NULL);
> 
> -    return &p->vcpu_ioreq[v->vcpu_id];
> +    return &p->vcpu_ioreq[v->vcpu_id % IOREQ_NUM_PER_PAGE];
> +}
> +
> +static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu *v)
> +{
> +    return s->ioreq[v->vcpu_id / IOREQ_NUM_PER_PAGE].va ?
> +           get_ioreq(s, v) : NULL;
>  }
> 
>  bool hvm_io_pending(struct vcpu *v)
> @@ -252,10 +262,10 @@ static void hvm_unmap_ioreq_gfn(struct
> hvm_ioreq_server *s,
>      iorp->gfn = INVALID_GFN;
>  }
> 
> -static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
> +static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf, uint8_t i)
>  {
>      struct domain *d = s->domain;
> -    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
> +    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[i];
>      int rc;
> 
>      if ( iorp->page )
> @@ -277,7 +287,7 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
>      if ( IS_DEFAULT(s) )
>          iorp->gfn = _gfn(buf ?
>                           d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] :
> -                         d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN]);
> +                         d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PFN] + i);
>      else
>          iorp->gfn = hvm_alloc_ioreq_gfn(s);
> 
> @@ -366,7 +376,22 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
> 
>      FOR_EACH_IOREQ_SERVER(d, id, s)
>      {
> -        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
> +        int i;
> +        const struct hvm_ioreq_page *iorp;
> +
> +        FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        {
> +            if ( iorp->page == page )
> +            {
> +                found = true;
> +                break;
> +            }
> +        }
> +
> +        if ( found )
> +            break;
> +
> +        if ( s->bufioreq.page == page )
>          {
>              found = true;
>              break;
> @@ -415,14 +440,12 @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s,
>  static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
>                                      struct hvm_ioreq_vcpu *sv)
>  {
> -    ASSERT(spin_is_locked(&s->lock));
> +    ioreq_t *p = get_ioreq_fallible(s, sv->vcpu);
> 
> -    if ( s->ioreq.va != NULL )
> -    {
> -        ioreq_t *p = get_ioreq(s, sv->vcpu);
> +    ASSERT(spin_is_locked(&s->lock));
> 
> +    if ( p )
>          p->vp_eport = sv->ioreq_evtchn;
> -    }
>  }
> 
>  #define HANDLE_BUFIOREQ(s) \
> @@ -540,44 +563,66 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
> 
>  static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
>  {
> -    int rc;
> +    int i, rc = -EINVAL;
> +    struct hvm_ioreq_page *iorp;
> 
> -    rc = hvm_map_ioreq_gfn(s, false);
> +    for ( i = 0; i < s->ioreq_page_nr; i++ )
> +    {
> +        rc = hvm_map_ioreq_gfn(s, false, i);
> +        if ( rc )
> +            break;
> +    }
> 
>      if ( !rc && HANDLE_BUFIOREQ(s) )
> -        rc = hvm_map_ioreq_gfn(s, true);
> +        rc = hvm_map_ioreq_gfn(s, true, 0);
> 
>      if ( rc )
> -        hvm_unmap_ioreq_gfn(s, &s->ioreq);
> +        FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +            hvm_unmap_ioreq_gfn(s, iorp);
> 
>      return rc;
>  }
> 
>  static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>  {
> -    hvm_unmap_ioreq_gfn(s, &s->ioreq);
> +    int i;
> +    struct hvm_ioreq_page *iorp;
> +
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        hvm_unmap_ioreq_gfn(s, iorp);
>      hvm_unmap_ioreq_gfn(s, &s->bufioreq);
>  }
> 
>  static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
>  {
> -    int rc;
> +    int i, rc = -EINVAL;
> +    struct hvm_ioreq_page *iorp;
> 
> -    rc = hvm_alloc_ioreq_mfn(s, &s->ioreq);
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +    {
> +        rc = hvm_alloc_ioreq_mfn(s, iorp);
> +        if ( rc )
> +            break;
> +    }
> 
>      if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
>          rc = hvm_alloc_ioreq_mfn(s, &s->bufioreq);
> 
>      if ( rc )
> -        hvm_free_ioreq_mfn(s, &s->ioreq);
> +        FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +            hvm_free_ioreq_mfn(s, iorp);
> 
>      return rc;
>  }
> 
>  static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
>  {
> +    int i;
> +    struct hvm_ioreq_page *iorp;
> +
>      hvm_free_ioreq_mfn(s, &s->bufioreq);
> -    hvm_free_ioreq_mfn(s, &s->ioreq);
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        hvm_free_ioreq_mfn(s, iorp);
>  }
> 
>  static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
> @@ -638,13 +683,16 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>  static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
>  {
>      struct hvm_ioreq_vcpu *sv;
> +    struct hvm_ioreq_page *iorp;
> +    int i;
> 
>      spin_lock(&s->lock);
> 
>      if ( s->enabled )
>          goto done;
> 
> -    hvm_remove_ioreq_gfn(s, &s->ioreq);
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        hvm_remove_ioreq_gfn(s, iorp);
>      hvm_remove_ioreq_gfn(s, &s->bufioreq);
> 
>      s->enabled = true;
> @@ -660,13 +708,17 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
> 
>  static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
>  {
> +    struct hvm_ioreq_page *iorp;
> +    int i;
> +
>      spin_lock(&s->lock);
> 
>      if ( !s->enabled )
>          goto done;
> 
>      hvm_add_ioreq_gfn(s, &s->bufioreq);
> -    hvm_add_ioreq_gfn(s, &s->ioreq);
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        hvm_add_ioreq_gfn(s, iorp);
> 
>      s->enabled = false;
> 
> @@ -679,7 +731,8 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
>                                   int bufioreq_handling, ioservid_t id)
>  {
>      struct vcpu *v;
> -    int rc;
> +    int rc, i;
> +    struct hvm_ioreq_page *iorp;
> 
>      s->domain = d;
>      s->domid = domid;
> @@ -688,8 +741,15 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
>      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
>      spin_lock_init(&s->bufioreq_lock);
> 
> -    s->ioreq.gfn = INVALID_GFN;
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        iorp->gfn = INVALID_GFN;
>      s->bufioreq.gfn = INVALID_GFN;
> +    s->ioreq_page_nr = (d->max_vcpus + IOREQ_NUM_PER_PAGE - 1) /
> +                       IOREQ_NUM_PER_PAGE;
> +    if ( s->ioreq_page_nr > d->arch.hvm_domain.params[HVM_PARAM_IOREQ_PAGES] )
> +        return -EINVAL;
> +
> +    s->ioreq_idx = 0;
> 
>      rc = hvm_ioreq_server_alloc_rangesets(s, id);
>      if ( rc )
> @@ -866,7 +926,10 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
>      }
> 
>      if ( ioreq_gfn )
> -        *ioreq_gfn = gfn_x(s->ioreq.gfn);
> +    {
> +        *ioreq_gfn = gfn_x(s->ioreq[s->ioreq_idx].gfn);
> +        s->ioreq_idx = (s->ioreq_idx + 1) % s->ioreq_page_nr;
> +    }
> 
>      if ( HANDLE_BUFIOREQ(s) )
>      {
> @@ -916,7 +979,8 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
>          break;
> 
>      case XENMEM_resource_ioreq_server_frame_ioreq(0):
> -        *mfn = _mfn(page_to_mfn(s->ioreq.page));
> +        *mfn = _mfn(page_to_mfn(s->ioreq[s->ioreq_idx].page));
> +        s->ioreq_idx = (s->ioreq_idx + 1) % s->ioreq_page_nr;
>          rc = 0;
>          break;
> 
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 87f7994..ff8c44d 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -51,6 +51,7 @@ struct hvm_ioreq_vcpu {
> 
>  #define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
>  #define MAX_NR_IO_RANGES  256
> +#define IOREQ_NUM_PER_PAGE (PAGE_SIZE / sizeof(ioreq_t))
> 
>  struct hvm_ioreq_server {
>      struct list_head       list_entry;
> @@ -61,7 +62,10 @@ struct hvm_ioreq_server {
> 
>      /* Domain id of emulating domain */
>      domid_t                domid;
> -    struct hvm_ioreq_page  ioreq;
> +    /* Index and size of ioreq page array */
> +    uint8_t                ioreq_idx;
> +    uint8_t                ioreq_page_nr;
> +    struct hvm_ioreq_page  ioreq[MAX_IOREQ_PAGE];
>      struct list_head       ioreq_vcpu_list;
>      struct hvm_ioreq_page  bufioreq;
> 
> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index d309d12..d628303 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -71,6 +71,8 @@ struct shared_iopage {
>  };
>  typedef struct shared_iopage shared_iopage_t;
> 
> +#define MAX_IOREQ_PAGE 4
> +
>  struct buf_ioreq {
>      uint8_t  type;   /* I/O type                    */
>      uint8_t  pad:1;
> diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
> index 2ec2e7c..537799d 100644
> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -279,6 +279,12 @@
>  #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
>  #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
> 
> -#define HVM_NR_PARAMS 39
> +/*
> + * Number of pages that are reserved for default IOREQ server. The base PFN
> + * is set via HVM_PARAM_IOREQ_PFN.
> + */
> +#define HVM_PARAM_IOREQ_PAGES 39
> +
> +#define HVM_NR_PARAMS 40
> 
>  #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
> --
> 1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06  9:02     ` Chao Gao
@ 2017-12-06 16:10       ` Paul Durrant
  2017-12-07  8:41         ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-06 16:10 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 06 December 2017 09:02
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
> Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> pages
> 
> On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Chao Gao [mailto:chao.gao@intel.com]
> >> Sent: 06 December 2017 07:50
> >> To: xen-devel@lists.xen.org
> >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
> >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
> Stabellini
> >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
> Jackson
> >> <Ian.Jackson@citrix.com>
> >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> >> pages
> >>
> >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the
> vcpu
> >> number constraint imposed by one IOREQ page, bump the number of
> IOREQ
> >> page to
> >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
> >>
> >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server to an
> >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
> >> FOR_EACH_IOREQ_PAGE macro.
> >>
> >> In order to access an IOREQ page, QEMU should get the gmfn and map
> this
> >> gmfn
> >> to its virtual address space.
> >
> >No. There's no need to extend the 'legacy' mechanism of using magic page
> gfns. You should only handle the case where the mfns are allocated on
> demand (see the call to hvm_ioreq_server_alloc_pages() in
> hvm_get_ioreq_server_frame()). The number of guest vcpus is known at
> this point so the correct number of pages can be allocated. If the creator of
> the ioreq server attempts to use the legacy hvm_get_ioreq_server_info()
> and the guest has >128 vcpus then the call should fail.
> 
> Great suggestion. I will introduce a new dmop, a variant of
> hvm_get_ioreq_server_frame(), for the creator to get an array of gfns and
> the size of the array. And the legacy interface will report an error if
> more than one IOREQ page is needed.

You don't need a new dmop for mapping I think. The mem op to map ioreq server frames should work. All you should need to do is update hvm_get_ioreq_server_frame() to deal with an index > 1, and provide some means for the ioreq server creator to convert the number of guest vcpus into the correct number of pages to map. (That might need a new dm op).

  Paul

> 
> Thanks
> Chao



* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-07  8:41         ` Paul Durrant
@ 2017-12-07  6:56           ` Chao Gao
  2017-12-08 11:06             ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-07  6:56 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

On Thu, Dec 07, 2017 at 08:41:14AM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
>> Of Paul Durrant
>> Sent: 06 December 2017 16:10
>> To: 'Chao Gao' <chao.gao@intel.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
>> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
>> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
>> <Ian.Jackson@citrix.com>
>> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
>> IOREQ page to 4 pages
>> 
>> > -----Original Message-----
>> > From: Chao Gao [mailto:chao.gao@intel.com]
>> > Sent: 06 December 2017 09:02
>> > To: Paul Durrant <Paul.Durrant@citrix.com>
>> > Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
>> > Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> > <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> > Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
>> > <Ian.Jackson@citrix.com>
>> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> > pages
>> >
>> > On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
>> > >> -----Original Message-----
>> > >> From: Chao Gao [mailto:chao.gao@intel.com]
>> > >> Sent: 06 December 2017 07:50
>> > >> To: xen-devel@lists.xen.org
>> > >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
>> > >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
>> > Stabellini
>> > >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> > >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> > >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> > >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
>> > Jackson
>> > >> <Ian.Jackson@citrix.com>
>> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> > >> pages
>> > >>
>> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the
>> > vcpu
>> > >> number constraint imposed by one IOREQ page, bump the number of
>> > IOREQ
>> > >> page to
>> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
>> > >>
>> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server to
>> an
>> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
>> > >> FOR_EACH_IOREQ_PAGE macro.
>> > >>
>> > >> In order to access an IOREQ page, QEMU should get the gmfn and map
>> > this
>> > >> gmfn
>> > >> to its virtual address space.
>> > >
>> > >No. There's no need to extend the 'legacy' mechanism of using magic
>> page
>> > gfns. You should only handle the case where the mfns are allocated on
>> > demand (see the call to hvm_ioreq_server_alloc_pages() in
>> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known at
>> > this point so the correct number of pages can be allocated. If the creator of
>> > the ioreq server attempts to use the legacy hvm_get_ioreq_server_info()
>> > and the guest has >128 vcpus then the call should fail.
>> >
>> > Great suggestion. I will introduce a new dmop, a variant of
>> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and the
>> > size of array. And the legacy interface will report an error if more
>> > than one IOREQ PAGES are needed.
>> 
>> You don't need a new dmop for mapping I think. The mem op to map ioreq
>> server frames should work. All you should need to do is update
>> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide some
>> means for the ioreq server creator to convert the number of guest vcpus into
>> the correct number of pages to map. (That might need a new dm op).
>
>I realise after saying this that an emulator already knows the size of the ioreq structure and so can easily calculate the correct number of pages to map, given the number of guest vcpus.

How about the patch at the bottom? Is it going in the right direction?
Do you have the QEMU patch that replaces the old method with the new one
for setting up the mapping? I want to integrate that patch and do some tests.

Thanks
Chao

From 44919e1e80f36981d6e213f74302c8c89cc9f828 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Tue, 5 Dec 2017 14:20:24 +0800
Subject: [PATCH] ioreq: add support of multiple ioreq pages

Each vcpu should have a corresponding 'ioreq_t' structure in the ioreq page.
Currently, only one 4K-byte page is used as the ioreq page, which limits the
number of vcpus to 128 if a device model is in use.

This patch changes the 'ioreq' field to an array. At most, 4 pages can be
used. When creating an IOREQ server, the actual number of ioreq pages is
calculated according to the number of vcpus. All ioreq pages are allocated
on demand. The creator should provide enough gfns to set up the mapping.

For compatibility, all legacy operations take effect on ioreq[0].

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 xen/arch/x86/hvm/ioreq.c         | 86 ++++++++++++++++++++++++++++------------
 xen/include/asm-x86/hvm/domain.h |  6 ++-
 2 files changed, 65 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d991ac9..598aedb 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -66,12 +66,12 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
 
 static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
 {
-    shared_iopage_t *p = s->ioreq.va;
+    shared_iopage_t *p = s->ioreq[v->vcpu_id / NR_IOREQ_PER_PAGE].va;
 
     ASSERT((v == current) || !vcpu_runnable(v));
     ASSERT(p != NULL);
 
-    return &p->vcpu_ioreq[v->vcpu_id];
+    return &p->vcpu_ioreq[v->vcpu_id % NR_IOREQ_PER_PAGE];
 }
 
 bool hvm_io_pending(struct vcpu *v)
@@ -239,7 +239,7 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
 
 static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
 
     if ( gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
@@ -256,7 +256,7 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
     int rc;
 
     if ( iorp->page )
@@ -294,10 +294,10 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, uint8_t idx)
 {
     struct domain *currd = current->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = idx ? &s->ioreq[idx - 1] : &s->bufioreq;
 
     if ( iorp->page )
     {
@@ -344,9 +344,9 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
     return 0;
 }
 
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, uint8_t idx)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = idx ? &s->ioreq[idx - 1] : &s->bufioreq;
 
     if ( !iorp->page )
         return;
@@ -368,7 +368,18 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        int i;
+
+        for ( i = 0; i < s->nr_ioreq_page; i++ )
+        {
+            if ( s->ioreq[i].page == page )
+            {
+                found = true;
+                break;
+            }
+        }
+
+        if ( !found && s->bufioreq.page == page )
         {
             found = true;
             break;
@@ -384,7 +395,7 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
 
     if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
@@ -398,7 +409,7 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 {
     struct domain *d = s->domain;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
     int rc;
 
     if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
@@ -419,7 +430,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
 {
     ASSERT(spin_is_locked(&s->lock));
 
-    if ( s->ioreq.va != NULL )
+    if ( s->ioreq[sv->vcpu->vcpu_id / NR_IOREQ_PER_PAGE].va )
     {
         ioreq_t *p = get_ioreq(s, sv->vcpu);
 
@@ -563,23 +574,27 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
 
 static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
 {
-    int rc;
-
-    rc = hvm_alloc_ioreq_mfn(s, false);
+    int rc = 0, i;
 
-    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
+    for ( i = !HANDLE_BUFIOREQ(s); i <= s->nr_ioreq_page; i++ )
+    {
+        rc = hvm_alloc_ioreq_mfn(s, i);
+        if ( rc ) break;
+    }
 
     if ( rc )
-        hvm_free_ioreq_mfn(s, false);
+        for ( ; i >= 0; i-- )
+            hvm_free_ioreq_mfn(s, i);
 
     return rc;
 }
 
 static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
 {
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
+    int i;
+
+    for ( i = 0; i <= s->nr_ioreq_page; i++ )
+        hvm_free_ioreq_mfn(s, i);
 }
 
 static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
@@ -681,7 +696,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
                                  int bufioreq_handling, ioservid_t id)
 {
     struct vcpu *v;
-    int rc;
+    int rc, i;
 
     s->domain = d;
     s->domid = domid;
@@ -690,7 +705,10 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
     INIT_LIST_HEAD(&s->ioreq_vcpu_list);
     spin_lock_init(&s->bufioreq_lock);
 
-    s->ioreq.gfn = INVALID_GFN;
+    s->nr_ioreq_page = (d->max_vcpus + NR_IOREQ_PER_PAGE - 1) /
+                       NR_IOREQ_PER_PAGE;
+    for ( i = 0; i < MAX_NR_IOREQ_PAGE; i++ )
+        s->ioreq[i].gfn = INVALID_GFN;
     s->bufioreq.gfn = INVALID_GFN;
 
     rc = hvm_ioreq_server_alloc_rangesets(s, id);
@@ -760,6 +778,10 @@ int hvm_create_ioreq_server(struct domain *d, domid_t domid,
         rc = -EEXIST;
         if ( GET_IOREQ_SERVER(d, i) )
             goto fail;
+
+        /* Don't create a default IOREQ server if more than one page is needed */
+        if ( d->max_vcpus > NR_IOREQ_PER_PAGE )
+            goto fail;
     }
     else
     {
@@ -858,6 +880,12 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     if ( !s )
         goto out;
 
+    if ( s->nr_ioreq_page > 1 )
+    {
+        rc = -EINVAL;
+        goto out;
+    }
+
     ASSERT(!IS_DEFAULT(s));
 
     if ( ioreq_gfn || bufioreq_gfn )
@@ -868,7 +893,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     }
 
     if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+        *ioreq_gfn = gfn_x(s->ioreq[0].gfn);
 
     if ( HANDLE_BUFIOREQ(s) )
     {
@@ -917,10 +942,19 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
         rc = 0;
         break;
 
-    case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = _mfn(page_to_mfn(s->ioreq.page));
-        rc = 0;
+    case XENMEM_resource_ioreq_server_frame_ioreq(0) ...
+         XENMEM_resource_ioreq_server_frame_ioreq(MAX_NR_IOREQ_PAGE - 1):
+    {
+        int i = idx - XENMEM_resource_ioreq_server_frame_ioreq(0);
+
+        rc = -EINVAL;
+        if ( i < s->nr_ioreq_page )
+        {
+            *mfn = _mfn(page_to_mfn(s->ioreq[i].page));
+            rc = 0;
+        }
         break;
+    }
 
     default:
         rc = -EINVAL;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 87f7994..3202f74 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -51,6 +51,8 @@ struct hvm_ioreq_vcpu {
 
 #define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
 #define MAX_NR_IO_RANGES  256
+#define MAX_NR_IOREQ_PAGE 4
+#define NR_IOREQ_PER_PAGE (PAGE_SIZE / sizeof(ioreq_t))
 
 struct hvm_ioreq_server {
     struct list_head       list_entry;
@@ -61,7 +63,9 @@ struct hvm_ioreq_server {
 
     /* Domain id of emulating domain */
     domid_t                domid;
-    struct hvm_ioreq_page  ioreq;
+    /* Per-IOserver limitation on the size of 'ioreq' array */
+    uint8_t                nr_ioreq_page;
+    struct hvm_ioreq_page  ioreq[MAX_NR_IOREQ_PAGE];
     struct list_head       ioreq_vcpu_list;
     struct hvm_ioreq_page  bufioreq;
 
-- 
1.8.3.1




* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06 16:10       ` Paul Durrant
@ 2017-12-07  8:41         ` Paul Durrant
  2017-12-07  6:56           ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-07  8:41 UTC (permalink / raw)
  To: Paul Durrant, 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Paul Durrant
> Sent: 06 December 2017 16:10
> To: 'Chao Gao' <chao.gao@intel.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
> IOREQ page to 4 pages
> 
> > -----Original Message-----
> > From: Chao Gao [mailto:chao.gao@intel.com]
> > Sent: 06 December 2017 09:02
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
> > Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> > Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>
> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> > pages
> >
> > On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Chao Gao [mailto:chao.gao@intel.com]
> > >> Sent: 06 December 2017 07:50
> > >> To: xen-devel@lists.xen.org
> > >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
> > >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
> > Stabellini
> > >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> > >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> > >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> > >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
> > Jackson
> > >> <Ian.Jackson@citrix.com>
> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> > >> pages
> > >>
> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the
> > vcpu
> > >> number constraint imposed by one IOREQ page, bump the number of
> > IOREQ
> > >> page to
> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
> > >>
> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server to
> an
> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
> > >> FOR_EACH_IOREQ_PAGE macro.
> > >>
> > >> In order to access an IOREQ page, QEMU should get the gmfn and map
> > this
> > >> gmfn
> > >> to its virtual address space.
> > >
> > >No. There's no need to extend the 'legacy' mechanism of using magic
> page
> > gfns. You should only handle the case where the mfns are allocated on
> > demand (see the call to hvm_ioreq_server_alloc_pages() in
> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known at
> > this point so the correct number of pages can be allocated. If the creator of
> > the ioreq server attempts to use the legacy hvm_get_ioreq_server_info()
> > and the guest has >128 vcpus then the call should fail.
> >
> > Great suggestion. I will introduce a new dmop, a variant of
> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and the
> > size of array. And the legacy interface will report an error if more
> > than one IOREQ PAGES are needed.
> 
> You don't need a new dmop for mapping I think. The mem op to map ioreq
> server frames should work. All you should need to do is update
> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide some
> means for the ioreq server creator to convert the number of guest vcpus into
> the correct number of pages to map. (That might need a new dm op).

I realise after saying this that an emulator already knows the size of the ioreq structure and so can easily calculate the correct number of pages to map, given the number of guest vcpus.

  Paul

> 
>   Paul
> 
> >
> > Thanks
> > Chao
> 


* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-07  6:56           ` Chao Gao
@ 2017-12-08 11:06             ` Paul Durrant
  2017-12-12  1:03               ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-08 11:06 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson


> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 07 December 2017 06:57
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> pages
> 
> On Thu, Dec 07, 2017 at 08:41:14AM +0000, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
> Behalf
> >> Of Paul Durrant
> >> Sent: 06 December 2017 16:10
> >> To: 'Chao Gao' <chao.gao@intel.com>
> >> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> >> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>;
> Tim
> >> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> >> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> >> <Ian.Jackson@citrix.com>
> >> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
> >> IOREQ page to 4 pages
> >>
> >> > -----Original Message-----
> >> > From: Chao Gao [mailto:chao.gao@intel.com]
> >> > Sent: 06 December 2017 09:02
> >> > To: Paul Durrant <Paul.Durrant@citrix.com>
> >> > Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
> >> > Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> >> > <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
> >> > Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> >> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
> Jackson
> >> > <Ian.Jackson@citrix.com>
> >> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page
> to 4
> >> > pages
> >> >
> >> > On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
> >> > >> -----Original Message-----
> >> > >> From: Chao Gao [mailto:chao.gao@intel.com]
> >> > >> Sent: 06 December 2017 07:50
> >> > >> To: xen-devel@lists.xen.org
> >> > >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
> >> > >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
> >> > Stabellini
> >> > >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
> >> > >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>;
> George
> >> > >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
> >> > >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
> >> > Jackson
> >> > >> <Ian.Jackson@citrix.com>
> >> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page
> to 4
> >> > >> pages
> >> > >>
> >> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove
> the
> >> > vcpu
> >> > >> number constraint imposed by one IOREQ page, bump the number
> of
> >> > IOREQ
> >> > >> page to
> >> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
> >> > >>
> >> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server
> to
> >> an
> >> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
> >> > >> FOR_EACH_IOREQ_PAGE macro.
> >> > >>
> >> > >> In order to access an IOREQ page, QEMU should get the gmfn and
> map
> >> > this
> >> > >> gmfn
> >> > >> to its virtual address space.
> >> > >
> >> > >No. There's no need to extend the 'legacy' mechanism of using magic
> >> page
> >> > gfns. You should only handle the case where the mfns are allocated on
> >> > demand (see the call to hvm_ioreq_server_alloc_pages() in
> >> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known
> at
> >> > this point so the correct number of pages can be allocated. If the creator
> of
> >> > the ioreq server attempts to use the legacy
> hvm_get_ioreq_server_info()
> >> > and the guest has >128 vcpus then the call should fail.
> >> >
> >> > Great suggestion. I will introduce a new dmop, a variant of
> >> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and
> the
> >> > size of array. And the legacy interface will report an error if more
> >> > than one IOREQ PAGES are needed.
> >>
> >> You don't need a new dmop for mapping I think. The mem op to map
> ioreq
> >> server frames should work. All you should need to do is update
> >> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide
> some
> >> means for the ioreq server creator to convert the number of guest vcpus
> into
> >> the correct number of pages to map. (That might need a new dm op).
> >
> >I realise after saying this that an emulator already knows the size of the
> ioreq structure and so can easily calculate the correct number of pages to
> map, given the number of guest vcpus.
> 
> How about the patch at the bottom? Is it in the right direction?

Yes, certainly along the right lines. I would probably do away with MAX_NR_IOREQ_PAGE though. You should just dynamically allocate the correct number of ioreq pages when the ioreq server is created (since you already calculate nr_ioreq_page there anyway).

> Do you have the QEMU patch, which replaces the old method with the new
> method
> to set up mapping? I want to integrate that patch and do some tests.

Sure. There are a couple of patches, attached below. I have not tested them against recent rebases of my series, so you may find some issues.

Cheers,

  Paul

[-- Attachment #2: 0001-Separate-ioreq-server-mapping-code-from-general-init.patch --]
[-- Type: application/octet-stream, Size: 5266 bytes --]

From b162cc6d92bffe28efac38f5b9501a9c28b5be79 Mon Sep 17 00:00:00 2001
From: Paul Durrant <paul.durrant@citrix.com>
Date: Thu, 10 Aug 2017 11:37:22 +0100
Subject: [PATCH 1/2] Separate ioreq server mapping code from general init

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 hw/i386/xen/xen-hvm.c | 81 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 48 insertions(+), 33 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index d9ccd5d0d6..59e3122daf 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -95,7 +95,8 @@ typedef struct XenIOState {
     CPUState **cpu_by_vcpu_id;
     /* the evtchn port for polling the notification, */
     evtchn_port_t *ioreq_local_port;
-    /* evtchn local port for buffered io */
+    /* evtchn remote and local ports for buffered io */
+    evtchn_port_t bufioreq_remote_port;
     evtchn_port_t bufioreq_local_port;
     /* the evtchn fd for polling */
     xenevtchn_handle *xce_handle;
@@ -1232,12 +1233,52 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
     xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
-void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
+static int xen_map_ioreq_server(XenIOState *state)
 {
-    int i, rc;
     xen_pfn_t ioreq_pfn;
     xen_pfn_t bufioreq_pfn;
     evtchn_port_t bufioreq_evtchn;
+    int rc;
+    
+    rc = xen_get_ioreq_server_info(xen_domid, state->ioservid,
+                                   &ioreq_pfn, &bufioreq_pfn,
+                                   &bufioreq_evtchn);
+    if (rc < 0) {
+        error_report("failed to get ioreq server info: error %d handle=%p",
+                     errno, xen_xc);
+        return rc;
+    }
+
+    DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
+    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
+
+    state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
+                                              PROT_READ|PROT_WRITE,
+                                              1, &ioreq_pfn, NULL);
+    if (state->shared_page == NULL) {
+        error_report("map shared IO page returned error %d handle=%p",
+                     errno, xen_xc);
+        return -1;
+    }
+
+    state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
+                                                   PROT_READ|PROT_WRITE,
+                                                   1, &bufioreq_pfn, NULL);
+    if (state->buffered_io_page == NULL) {
+        error_report("map buffered IO page returned error %d", errno);
+        return -1;
+    }
+
+    state->bufioreq_remote_port = bufioreq_evtchn;
+    
+    return 0;
+}
+
+void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
+{
+    int i, rc;
+    xen_pfn_t ioreq_pfn;
     XenIOState *state;
 
     state = g_malloc0(sizeof (XenIOState));
@@ -1273,28 +1314,10 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     state->wakeup.notify = xen_wakeup_notifier;
     qemu_register_wakeup_notifier(&state->wakeup);
 
-    rc = xen_get_ioreq_server_info(xen_domid, state->ioservid,
-                                   &ioreq_pfn, &bufioreq_pfn,
-                                   &bufioreq_evtchn);
-    if (rc < 0) {
-        error_report("failed to get ioreq server info: error %d handle=%p",
-                     errno, xen_xc);
+    rc = xen_map_ioreq_server(state);
+    if (rc < 0)
         goto err;
-    }
-
-    DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
-    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
-    DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
-
-    state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
-                                              PROT_READ|PROT_WRITE,
-                                              1, &ioreq_pfn, NULL);
-    if (state->shared_page == NULL) {
-        error_report("map shared IO page returned error %d handle=%p",
-                     errno, xen_xc);
-        goto err;
-    }
-
+    
     rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
     if (!rc) {
         DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
@@ -1312,14 +1335,6 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
         goto err;
     }
 
-    state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
-                                                   PROT_READ|PROT_WRITE,
-                                                   1, &bufioreq_pfn, NULL);
-    if (state->buffered_io_page == NULL) {
-        error_report("map buffered IO page returned error %d", errno);
-        goto err;
-    }
-
     /* Note: cpus is empty at this point in init */
     state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
 
@@ -1344,7 +1359,7 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     }
 
     rc = xenevtchn_bind_interdomain(state->xce_handle, xen_domid,
-                                    bufioreq_evtchn);
+                                    state->bufioreq_remote_port);
     if (rc == -1) {
         error_report("buffered evtchn bind error %d", errno);
         goto err;
-- 
2.11.0


[-- Attachment #3: 0002-use-new-interface.patch --]
[-- Type: application/octet-stream, Size: 5641 bytes --]

From 0afb65e74d4ba5b1fae47cd7fef65a6bdc859b57 Mon Sep 17 00:00:00 2001
From: Paul Durrant <paul.durrant@citrix.com>
Date: Thu, 10 Aug 2017 11:38:01 +0100
Subject: [PATCH 2/2] use new interface

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
 configure                   |  1 +
 hw/i386/xen/xen-hvm.c       | 62 +++++++++++++++++++++++++++++++++++----------
 include/hw/xen/xen_common.h | 16 ++++++++++++
 3 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index dd73cce62f..55b67f9845 100755
--- a/configure
+++ b/configure
@@ -2114,6 +2114,7 @@ int main(void) {
 
   xfmem = xenforeignmemory_open(0, 0);
   xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+  xenforeignmemory_map_resource(xfmem, 0, 0, 0, 0, 0, NULL, 0, 0);
 
   return 0;
 }
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 59e3122daf..0a8d1f6574 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1235,13 +1235,37 @@ static void xen_wakeup_notifier(Notifier *notifier, void *data)
 
 static int xen_map_ioreq_server(XenIOState *state)
 {
+    void *addr = NULL;
+    xenforeignmemory_resource_handle *fres;
     xen_pfn_t ioreq_pfn;
     xen_pfn_t bufioreq_pfn;
     evtchn_port_t bufioreq_evtchn;
     int rc;
+
+    /*
+     * Attempt to map using the resource API and fall back to normal
+     * foreign mapping if this is not supported.
+     */
+    fres = xenforeignmemory_map_resource(xen_fmem, xen_domid,
+                                         XENMEM_resource_ioreq_server,
+                                         state->ioservid, 0, 2,
+                                         &addr,
+                                         PROT_READ|PROT_WRITE, 0);
+    if (fres != NULL) {
+        state->buffered_io_page = addr;
+        state->shared_page = addr + TARGET_PAGE_SIZE;
+    } else {
+        error_report("failed to map ioreq server resources: error %d handle=%p",
+                     errno, xen_xc);
+        if (errno != EOPNOTSUPP)
+            return -1;
+    }
     
     rc = xen_get_ioreq_server_info(xen_domid, state->ioservid,
-                                   &ioreq_pfn, &bufioreq_pfn,
+                                   (state->shared_page == NULL) ?
+                                   &ioreq_pfn : NULL,
+                                   (state->buffered_io_page == NULL) ?
+                                   &bufioreq_pfn : NULL,
                                    &bufioreq_evtchn);
     if (rc < 0) {
         error_report("failed to get ioreq server info: error %d handle=%p",
@@ -1249,24 +1273,36 @@ static int xen_map_ioreq_server(XenIOState *state)
         return rc;
     }
 
-    DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
-    DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+    if (state->shared_page == NULL)
+        DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
+    
+    if (state->buffered_io_page == NULL)
+        DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+    
     DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
 
-    state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
-                                              PROT_READ|PROT_WRITE,
-                                              1, &ioreq_pfn, NULL);
     if (state->shared_page == NULL) {
-        error_report("map shared IO page returned error %d handle=%p",
-                     errno, xen_xc);
-        return -1;
+        state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
+                                                  PROT_READ|PROT_WRITE,
+                                                  1, &ioreq_pfn, NULL);
+        if (state->shared_page == NULL) {
+            error_report("map shared IO page returned error %d handle=%p",
+                         errno, xen_xc);
+        }
     }
 
-    state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
-                                                   PROT_READ|PROT_WRITE,
-                                                   1, &bufioreq_pfn, NULL);
     if (state->buffered_io_page == NULL) {
-        error_report("map buffered IO page returned error %d", errno);
+        state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
+                                                       PROT_READ|PROT_WRITE,
+                                                       1, &bufioreq_pfn,
+                                                       NULL);
+        if (state->buffered_io_page == NULL) {
+            error_report("map buffered IO page returned error %d", errno);
+            return -1;
+        }
+    }
+
+    if (state->shared_page == NULL || state->buffered_io_page == NULL) {
         return -1;
     }
 
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 86c7f26106..6c59ab56e9 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -91,6 +91,22 @@ static inline void *xenforeignmemory_map2(xenforeignmemory_handle *h,
     return xenforeignmemory_map(h, dom, prot, pages, arr, err);
 }
 
+typedef void xenforeignmemory_resource_handle;
+
+static inline xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
+    xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+    unsigned int id, unsigned long frame, unsigned long nr_frames,
+    void **paddr, int prot, int flags)
+{
+    errno = EOPNOTSUPP;
+    return -1;
+}
+
+static inline void xenforeignmemory_unmap_resource(
+    xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres)
+{
+}
+
 #endif
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40900
-- 
2.11.0




* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-08 11:06             ` Paul Durrant
@ 2017-12-12  1:03               ` Chao Gao
  2017-12-12  9:07                 ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-12  1:03 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

On Fri, Dec 08, 2017 at 11:06:43AM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> From: Chao Gao [mailto:chao.gao@intel.com]
>> Sent: 07 December 2017 06:57
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
>> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
>> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
>> <Ian.Jackson@citrix.com>
>> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> pages
>> 
>> On Thu, Dec 07, 2017 at 08:41:14AM +0000, Paul Durrant wrote:
>> >> -----Original Message-----
>> >> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On
>> Behalf
>> >> Of Paul Durrant
>> >> Sent: 06 December 2017 16:10
>> >> To: 'Chao Gao' <chao.gao@intel.com>
>> >> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> >> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>;
>> Tim
>> >> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
>> >> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
>> >> <Ian.Jackson@citrix.com>
>> >> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
>> >> IOREQ page to 4 pages
>> >>
>> >> > -----Original Message-----
>> >> > From: Chao Gao [mailto:chao.gao@intel.com]
>> >> > Sent: 06 December 2017 09:02
>> >> > To: Paul Durrant <Paul.Durrant@citrix.com>
>> >> > Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
>> >> > Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> >> > <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> >> > Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> >> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
>> Jackson
>> >> > <Ian.Jackson@citrix.com>
>> >> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page
>> to 4
>> >> > pages
>> >> >
>> >> > On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
>> >> > >> -----Original Message-----
>> >> > >> From: Chao Gao [mailto:chao.gao@intel.com]
>> >> > >> Sent: 06 December 2017 07:50
>> >> > >> To: xen-devel@lists.xen.org
>> >> > >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
>> >> > >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
>> >> > Stabellini
>> >> > >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> >> > >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>;
>> George
>> >> > >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> >> > >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
>> >> > Jackson
>> >> > >> <Ian.Jackson@citrix.com>
>> >> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page
>> to 4
>> >> > >> pages
>> >> > >>
>> >> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove
>> the
>> >> > vcpu
>> >> > >> number constraint imposed by one IOREQ page, bump the number
>> of
>> >> > IOREQ
>> >> > >> page to
>> >> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
>> >> > >>
>> >> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server
>> to
>> >> an
>> >> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
>> >> > >> FOR_EACH_IOREQ_PAGE macro.
>> >> > >>
>> >> > >> In order to access an IOREQ page, QEMU should get the gmfn and
>> map
>> >> > this
>> >> > >> gmfn
>> >> > >> to its virtual address space.
>> >> > >
>> >> > >No. There's no need to extend the 'legacy' mechanism of using magic
>> >> page
>> >> > gfns. You should only handle the case where the mfns are allocated on
>> >> > demand (see the call to hvm_ioreq_server_alloc_pages() in
>> >> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known
>> at
>> >> > this point so the correct number of pages can be allocated. If the creator
>> of
>> >> > the ioreq server attempts to use the legacy
>> hvm_get_ioreq_server_info()
>> >> > and the guest has >128 vcpus then the call should fail.
>> >> >
>> >> > Great suggestion. I will introduce a new dmop, a variant of
>> >> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and
>> the
>> >> > size of array. And the legacy interface will report an error if more
>> >> > than one IOREQ PAGES are needed.
>> >>
>> >> You don't need a new dmop for mapping I think. The mem op to map
>> ioreq
>> >> server frames should work. All you should need to do is update
>> >> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide
>> some
>> >> means for the ioreq server creator to convert the number of guest vcpus
>> into
>> >> the correct number of pages to map. (That might need a new dm op).
>> >
>> >I realise after saying this that an emulator already knows the size of the
>> ioreq structure and so can easily calculate the correct number of pages to
>> map, given the number of guest vcpus.
>> 
>> How about the patch at the bottom? Is it in the right direction?
>
>Yes, certainly along the right lines. I would probably do away with MAX_NR_IOREQ_PAGE though. You should just dynamically allocate the correct number of ioreq pages when the ioreq server is created (since you already calculate nr_ioreq_page there anyway).
>
>> Do you have the QEMU patch, which replaces the old method with the new
>> method
>> to set up mapping? I want to integrate that patch and do some tests.
>
>Sure. There are a couple of patches. I have not tested them against recent rebases of my series, so you may find some issues.

Hi, Paul.

I merged the two QEMU patches and the privcmd patch [1] and did some tests.
I encountered a small issue and am reporting it so that you can watch for
it in your own testing. The symptom is that using the new interface to map
the grant table in xc_dom_gnttab_seed() always fails. After adding some
printk()s in privcmd, I found it is xen_remap_domain_gfn_array() that
fails with error code -16 (-EBUSY). Mapping the ioreq server doesn't have
this issue.

[1] http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce59a05e6712

Thanks
Chao



* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-12  1:03               ` Chao Gao
@ 2017-12-12  9:07                 ` Paul Durrant
  2017-12-12 23:39                   ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-12  9:07 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
[snip]
> 
> Hi, Paul.
> 
> I merged the two qemu patches, the privcmd patch [1] and did some tests.
> I encountered a small issue and report it to you, so you can pay more
> attention to it when doing some tests. The symptom is that using the new
> interface to map grant table in xc_dom_gnttab_seed() always fails. After
> adding some printk in privcmd, I found it is
> xen_remap_domain_gfn_array() that fails with errcode -16. Mapping ioreq
> server doesn't have such an issue.
> 
> [1]
> http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce5
> 9a05e6712
> 

Chao,

  That privcmd patch is out of date. I've just pushed a new one:

http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=9f00199f5f12cef401c6370c94a1140de9b318fc

  Give that a try. I've been using it for a few weeks now.

  Cheers,

    Paul

> Thanks
> Chao



* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-12  9:07                 ` Paul Durrant
@ 2017-12-12 23:39                   ` Chao Gao
  2017-12-13 10:49                     ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-12 23:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

On Tue, Dec 12, 2017 at 09:07:46AM +0000, Paul Durrant wrote:
>> -----Original Message-----
>[snip]
>> 
>> Hi, Paul.
>> 
>> I merged the two qemu patches, the privcmd patch [1] and did some tests.
>> I encountered a small issue and report it to you, so you can pay more
>> attention to it when doing some tests. The symptom is that using the new
>> interface to map grant table in xc_dom_gnttab_seed() always fails. After
>> adding some printk in privcmd, I found it is
>> xen_remap_domain_gfn_array() that fails with errcode -16. Mapping ioreq
>> server doesn't have such an issue.
>> 
>> [1]
>> http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce5
>> 9a05e6712
>> 
>
>Chao,
>
>  That privcmd patch is out of date. I've just pushed a new one:
>
>http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=9f00199f5f12cef401c6370c94a1140de9b318fc
>
>  Give that a try. I've been using it for a few weeks now.

With that patch, mapping the ioreq server always fails, while mapping the grant table succeeds.

QEMU fails with following log:
xenforeignmemory: error: ioctl failed: Device or resource busy
qemu-system-i386: failed to map ioreq server resources: error 16
handle=0x5614a6df5e00
qemu-system-i386: xen hardware virtual machine initialisation failed

Xen encountered the following error:
(XEN) [13118.909787] mm.c:1003:d0v109 pg_owner d2 l1e_owner d0, but real_pg_owner d0
(XEN) [13118.918122] mm.c:1079:d0v109 Error getting mfn 5da5841 (pfn ffffffffffffffff) from L1 entry 8000005da5841227 for l1e_owner d0, pg_owner d2

I fixed only some obvious issues, with the following patch on top of your privcmd patch:
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -181,7 +181,7 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
        if (xen_feature(XENFEAT_auto_translated_physmap))
                return -EOPNOTSUPP;
 
-       return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
+       return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
 
@@ -200,8 +200,8 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
         * cause of "wrong memory was mapped in".
         */
        BUG_ON(err_ptr == NULL);
-        do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
-                    false, pages);
+       return do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
+                       false, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-12 23:39                   ` Chao Gao
@ 2017-12-13 10:49                     ` Paul Durrant
  2017-12-13 17:50                       ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-13 10:49 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 12 December 2017 23:39
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> pages
> 
> On Tue, Dec 12, 2017 at 09:07:46AM +0000, Paul Durrant wrote:
> >> -----Original Message-----
> >[snip]
> >>
> >> Hi, Paul.
> >>
> >> I merged the two qemu patches, the privcmd patch [1] and did some
> tests.
> >> I encountered a small issue and report it to you, so you can pay more
> >> attention to it when doing some tests. The symptom is that using the new
> >> interface to map grant table in xc_dom_gnttab_seed() always fails. After
> >> adding some printk in privcmd, I found it is
> >> xen_remap_domain_gfn_array() that fails with errcode -16. Mapping
> ioreq
> >> server doesn't have such an issue.
> >>
> >> [1]
> >>
> http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce5
> >> 9a05e6712
> >>
> >
> >Chao,
> >
> >  That privcmd patch is out of date. I've just pushed a new one:
> >
> >http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=9f
> 00199f5f12cef401c6370c94a1140de9b318fc
> >
> >  Give that a try. I've been using it for a few weeks now.
> 
> Mapping ioreq server always fails, while mapping grant table succeeds.
> 
> QEMU fails with following log:
> xenforeignmemory: error: ioctl failed: Device or resource busy
> qemu-system-i386: failed to map ioreq server resources: error 16
> handle=0x5614a6df5e00
> qemu-system-i386: xen hardware virtual machine initialisation failed
> 
> Xen encountered the following error:
> (XEN) [13118.909787] mm.c:1003:d0v109 pg_owner d2 l1e_owner d0, but
> real_pg_owner d0
> (XEN) [13118.918122] mm.c:1079:d0v109 Error getting mfn 5da5841 (pfn
> ffffffffffffffff) from L1 entry 8000005da5841227 for l1e_owner d0, pg_owner
> d2

Hmm. That looks like it is because the ioreq server pages are not owned by the correct domain. The Xen patch series underwent some changes later in review, and I did not re-test my QEMU patch afterwards, so I wonder whether mapping IOREQ pages has simply become broken. I'll investigate.

  Paul

> 
> I only fixed some obvious issues with a patch to your privcmd patch:
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -181,7 +181,7 @@ int xen_remap_domain_gfn_range(struct
> vm_area_struct *vma,
>         if (xen_feature(XENFEAT_auto_translated_physmap))
>                 return -EOPNOTSUPP;
> 
> -       return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
> +       return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false,
> pages
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> 
> @@ -200,8 +200,8 @@ int xen_remap_domain_gfn_array(struct
> vm_area_struct *vma,
>          * cause of "wrong memory was mapped in".
>          */
>         BUG_ON(err_ptr == NULL);
> -        do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> -                    false, pages);
> +       return do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> +                       false, pages);
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> 
> Thanks
> Chao


* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-13 10:49                     ` Paul Durrant
@ 2017-12-13 17:50                       ` Paul Durrant
  2017-12-14 14:50                         ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-13 17:50 UTC (permalink / raw)
  To: Paul Durrant, 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Paul Durrant
> Sent: 13 December 2017 10:49
> To: 'Chao Gao' <chao.gao@intel.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
> IOREQ page to 4 pages
> 
> > -----Original Message-----
> > From: Chao Gao [mailto:chao.gao@intel.com]
> > Sent: 12 December 2017 23:39
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>;
> Tim
> > (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> > xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>
> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> > pages
> >
> > On Tue, Dec 12, 2017 at 09:07:46AM +0000, Paul Durrant wrote:
> > >> -----Original Message-----
> > >[snip]
> > >>
> > >> Hi, Paul.
> > >>
> > >> I merged the two qemu patches, the privcmd patch [1] and did some
> > tests.
> > >> I encountered a small issue and report it to you, so you can pay more
> > >> attention to it when doing some tests. The symptom is that using the
> new
> > >> interface to map grant table in xc_dom_gnttab_seed() always fails.
> After
> > >> adding some printk in privcmd, I found it is
> > >> xen_remap_domain_gfn_array() that fails with errcode -16. Mapping
> > ioreq
> > >> server doesn't have such an issue.
> > >>
> > >> [1]
> > >>
> >
> http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=ce5
> > >> 9a05e6712
> > >>
> > >
> > >Chao,
> > >
> > >  That privcmd patch is out of date. I've just pushed a new one:
> > >
> >
> >http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=9f
> > 00199f5f12cef401c6370c94a1140de9b318fc
> > >
> > >  Give that a try. I've been using it for a few weeks now.
> >
> > Mapping ioreq server always fails, while mapping grant table succeeds.
> >
> > QEMU fails with following log:
> > xenforeignmemory: error: ioctl failed: Device or resource busy
> > qemu-system-i386: failed to map ioreq server resources: error 16
> > handle=0x5614a6df5e00
> > qemu-system-i386: xen hardware virtual machine initialisation failed
> >
> > Xen encountered the following error:
> > (XEN) [13118.909787] mm.c:1003:d0v109 pg_owner d2 l1e_owner d0, but
> > real_pg_owner d0
> > (XEN) [13118.918122] mm.c:1079:d0v109 Error getting mfn 5da5841 (pfn
> > ffffffffffffffff) from L1 entry 8000005da5841227 for l1e_owner d0,
> pg_owner
> > d2
> 
> Hmm. That looks like it is because the ioreq server pages are not owned by
> the correct domain. The Xen patch series underwent some changes later in
> review and I did not re-test my QEMU patch after that so I wonder if
> mapping IOREQ pages has simply become broken. I'll investigate.
> 

I have reproduced the problem locally now. Will try to figure out the bug tomorrow.

  Paul

>   Paul
> 
> >
> > I only fixed some obvious issues with a patch to your privcmd patch:
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -181,7 +181,7 @@ int xen_remap_domain_gfn_range(struct
> > vm_area_struct *vma,
> >         if (xen_feature(XENFEAT_auto_translated_physmap))
> >                 return -EOPNOTSUPP;
> >
> > -       return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
> > +       return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false,
> > pages
> >  }
> >  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> >
> > @@ -200,8 +200,8 @@ int xen_remap_domain_gfn_array(struct
> > vm_area_struct *vma,
> >          * cause of "wrong memory was mapped in".
> >          */
> >         BUG_ON(err_ptr == NULL);
> > -        do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> > -                    false, pages);
> > +       return do_remap_pfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> > +                       false, pages);
> >  }
> >  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> >
> > Thanks
> > Chao
> 

* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-13 17:50                       ` Paul Durrant
@ 2017-12-14 14:50                         ` Paul Durrant
  2017-12-15  0:35                           ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Paul Durrant @ 2017-12-14 14:50 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> >
> > Hmm. That looks like it is because the ioreq server pages are not owned by
> > the correct domain. The Xen patch series underwent some changes later in
> > review and I did not re-test my QEMU patch after that so I wonder if
> > mapping IOREQ pages has simply become broken. I'll investigate.
> >
> 
> I have reproduced the problem locally now. Will try to figure out the bug
> tomorrow.
> 

Chao,

  Can you try my new branch http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/heads/ioreq24?

  The problem was indeed that the ioreq pages were owned by the emulating domain rather than the target domain, which is no longer compatible with privcmd's use of HYPERVISOR_mmu_update.

  Cheers,

    Paul

* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-14 14:50                         ` Paul Durrant
@ 2017-12-15  0:35                           ` Chao Gao
  2017-12-15  9:40                             ` Paul Durrant
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2017-12-15  0:35 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

On Thu, Dec 14, 2017 at 02:50:17PM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> >
>> > Hmm. That looks like it is because the ioreq server pages are not owned by
>> > the correct domain. The Xen patch series underwent some changes later in
>> > review and I did not re-test my QEMU patch after that so I wonder if
>> > mapping IOREQ pages has simply become broken. I'll investigate.
>> >
>> 
>> I have reproduced the problem locally now. Will try to figure out the bug
>> tomorrow.
>> 
>
>Chao,
>
>  Can you try my new branch http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/heads/ioreq24?
>
>  The problem was indeed that the ioreq pages were owned by the emulating domain rather than the target domain, which is no longer compatible with privcmd's use of HYPERVISOR_mmu_update.

Of course. I tested this branch. It works well.

But I think your privcmd patch shouldn't pass a NULL 'err_ptr' when
calling xen_remap_domain_mfn_array(). It only works now because the
ioreq page is allocated right before the bufioreq page, so the two
happen to be contiguous.

Thanks
Chao


* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-15  0:35                           ` Chao Gao
@ 2017-12-15  9:40                             ` Paul Durrant
  0 siblings, 0 replies; 56+ messages in thread
From: Paul Durrant @ 2017-12-15  9:40 UTC (permalink / raw)
  To: 'Chao Gao'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Chao Gao [mailto:chao.gao@intel.com]
> Sent: 15 December 2017 00:36
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
> <Ian.Jackson@citrix.com>
> Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
> pages
> 
> On Thu, Dec 14, 2017 at 02:50:17PM +0000, Paul Durrant wrote:
> >> -----Original Message-----
> >> >
> >> > Hmm. That looks like it is because the ioreq server pages are not owned
> by
> >> > the correct domain. The Xen patch series underwent some changes
> later in
> >> > review and I did not re-test my QEMU patch after that so I wonder if
> >> > mapping IOREQ pages has simply become broken. I'll investigate.
> >> >
> >>
> >> I have reproduced the problem locally now. Will try to figure out the bug
> >> tomorrow.
> >>
> >
> >Chao,
> >
> >  Can you try my new branch
> http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs
> /heads/ioreq24?
> >
> >  The problem was indeed that the ioreq pages were owned by the
> emulating domain rather than the target domain, which is no longer
> compatible with privcmd's use of HYPERVISOR_mmu_update.
> 
> Of course. I tested this branch. It works well.
> 
> But, I think your privcmd patch couldn't set 'err_ptr' to NULL when
> calling xen_remap_domain_mfn_array(). It works for the ioreq page is
> allocated right before the bufioreq page, and then they happen to be
> continuous.
> 

I'll have a look at that. The pages should not need to be contiguous MFNs for things to work. They will, by design, be mapped so that they are virtually contiguous. That's just a convenient way of getting pointers to the buf and synchronous structures in QEMU using only a single IOCTL to privcmd.

  Paul

> Thanks
> Chao
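
The err_ptr contract discussed above can be sketched in C; the names and
the single-frame "mapping" here are purely illustrative stand-ins, not the
actual privcmd/do_remap_pfn() code:

```c
#include <stddef.h>
#include <assert.h>

#define INVALID_MFN (~0UL)

/* Hypothetical stand-in for mapping one frame: INVALID_MFN fails with
 * -EBUSY (-16), everything else succeeds. */
static int map_one(unsigned long mfn)
{
	return mfn == INVALID_MFN ? -16 : 0;
}

/* With a non-NULL err array, each frame gets its own error code and the
 * loop keeps going, so partially bad or non-contiguous frame lists still
 * map what they can.  With err == NULL there is nowhere to report a
 * per-frame failure, so the whole call must abort at the first one --
 * which is why passing NULL is only safe when every frame is known to
 * map. */
static int remap_frames(const unsigned long *mfn, unsigned int nr, int *err)
{
	int ret = 0;
	unsigned int i;

	for (i = 0; i < nr; i++) {
		int rc = map_one(mfn[i]);

		if (err)
			err[i] = rc;
		else if (rc)
			return rc;	/* no err array: fail wholesale */
		if (rc && !ret)
			ret = rc;	/* remember the first error */
	}

	return ret;
}
```

With an err array, a caller can inspect exactly which frames failed; with
NULL it only learns that something did.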


* Re: [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id
  2017-12-06  7:50 ` [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id Chao Gao
@ 2018-02-22 18:05   ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2018-02-22 18:05 UTC (permalink / raw)
  To: Chao Gao; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On Wed, Dec 06, 2017 at 03:50:09PM +0800, Chao Gao wrote:
> There were two places where the lapic_id is computed, one in hvmloader and one
> in libacpi. Unify them by defining LAPIC_ID in a header file and including it
> in both places.
> 
> To address a compilation issue and make libacpi.h self-contained, include
> stdint.h in libacpi.h.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>
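
For context, the mapping being unified here is, in current Xen, a simple
doubling of the vCPU ID, which is also what makes 128 the limit of the
existing xAPIC-only scheme:

```c
#include <assert.h>

/* The shared definition the patch moves into a common header: HVM APIC
 * IDs are twice the vCPU ID.  With 128 vCPUs the highest APIC ID is
 * LAPIC_ID(127) == 254, the last value that fits the 8-bit xAPIC ID;
 * vCPU 128 would need APIC ID 256, hence the need for x2APIC. */
#define LAPIC_ID(vcpu_id) ((vcpu_id) * 2)
```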


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
@ 2018-02-22 18:44   ` Wei Liu
  2018-02-23  8:41     ` Jan Beulich
  2018-02-23 16:42   ` Roger Pau Monné
  2018-04-18  8:38   ` Jan Beulich
  2 siblings, 1 reply; 56+ messages in thread
From: Wei Liu @ 2018-02-22 18:44 UTC (permalink / raw)
  To: Chao Gao; +Cc: Ian Jackson, Andrew Cooper, Wei Liu, Jan Beulich, xen-devel

On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
> has the following description:
> 
> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
> Description Tables,” of the Advanced Configuration and Power Interface
> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
> behavior for BIOS is to pass the control to the operating system with the
> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
> than 255, and in x2APIC mode if there are any logical processor reporting an
> APIC ID of 255 or greater."
> 
> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
> the SIPI is broadcast to all APs; this is how SeaBIOS wakes up APs.
> APs may compete for the stack, so a lock is introduced to protect it.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
> v4:
>  - new
> ---
>  tools/firmware/hvmloader/apic_regs.h |  4 +++
>  tools/firmware/hvmloader/smp.c       | 64 ++++++++++++++++++++++++++++++++----
>  2 files changed, 61 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/apic_regs.h b/tools/firmware/hvmloader/apic_regs.h
> index f737b47..bc39ecd 100644
> --- a/tools/firmware/hvmloader/apic_regs.h
> +++ b/tools/firmware/hvmloader/apic_regs.h
> @@ -105,6 +105,10 @@
>  #define     APIC_TDR_DIV_64          0x9
>  #define     APIC_TDR_DIV_128         0xA
>  
> +#define MSR_IA32_APICBASE            0x1b
> +#define MSR_IA32_APICBASE_EXTD       (1<<10)
> +#define MSR_IA32_APICBASE_MSR        0x800
> +
>  #endif
>  
>  /*
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index 082b17f..e3dade4 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -26,7 +26,9 @@
>  #define AP_BOOT_EIP 0x1000
>  extern char ap_boot_start[], ap_boot_end[];
>  
> -static int ap_callin, ap_cpuid;
> +static int ap_callin;
> +static int enable_x2apic;
> +static bool lock = 1;
>  
>  asm (
>      "    .text                       \n"
> @@ -47,7 +49,15 @@ asm (
>      "    mov   %eax,%ds              \n"
>      "    mov   %eax,%es              \n"
>      "    mov   %eax,%ss              \n"
> -    "    movl  $stack_top,%esp       \n"
> +    "3:  movb  $1, %bl               \n"

I would rather you label this 1 and increment the number as you go.

> +    "    mov   $lock,%edx            \n"

Not really an expert in x86 asm -- shouldn't this be lea instead?

> +    "    movzbl %bl,%eax             \n"
> +    "    xchg  %al, (%edx)           \n"

Why not just use %bl directly?

> +    "    test  %al,%al               \n"
> +    "    je    2f                    \n"
> +    "    pause                       \n"
> +    "    jmp   3b                    \n"
> +    "2:  movl  $stack_top,%esp       \n"
>      "    movl  %esp,%ebp             \n"
>      "    call  ap_start              \n"
>      "1:  hlt                         \n"
> @@ -68,14 +78,34 @@ asm (
>      "    .text                       \n"
>      );
>  
[...]
>  
> +static void boot_cpu_broadcast_x2apic(unsigned int nr_cpus)
> +{
> +    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
> +          APIC_DEST_ALLBUT | APIC_DM_INIT);
> +
> +    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
> +          APIC_DEST_ALLBUT | APIC_DM_STARTUP | (AP_BOOT_EIP >> 12));
> +
> +    while ( ap_callin != nr_cpus )
> +        cpu_relax();
> +
> +    wrmsr(MSR_IA32_APICBASE_MSR + (APIC_ICR >> 4),
> +          APIC_DEST_ALLBUT | APIC_DM_INIT);
> +}
> +
>  void smp_initialise(void)
>  {
>      unsigned int i, nr_cpus = hvm_info->nr_vcpus;
> @@ -125,9 +169,15 @@ void smp_initialise(void)
>      memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>  
>      printf("Multiprocessor initialisation:\n");
> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
> +        enable_x2apic = 1;
> +
>      ap_start();
> -    for ( i = 1; i < nr_cpus; i++ )
> -        boot_cpu(i);
> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )

Use enable_x2apic here instead.

Wei.
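
For reference, the xchg-based loop in the quoted assembly is a test-and-set
spinlock; a minimal C equivalent, using GCC atomic builtins instead of
inline asm and eliding the 'pause' hint, might look like this (names are
illustrative):

```c
#include <stdbool.h>
#include <assert.h>

/* The lock starts out held, matching 'static bool lock = 1' in the
 * patch: APs spin until the CPU currently using the shared stack
 * releases it. */
static bool lock = true;

/* Equivalent of the asm loop: atomically swap 1 in; reading back 0
 * means the lock was free and is now ours. */
static void lock_acquire(bool *l)
{
	while (__atomic_exchange_n(l, true, __ATOMIC_ACQUIRE))
		;	/* the asm executes 'pause' here */
}

static void lock_release(bool *l)
{
	__atomic_store_n(l, false, __ATOMIC_RELEASE);
}
```

Each AP would call lock_acquire() before loading %esp with the shared
stack top, and release the lock once it is done with the stack.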


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2017-12-06  7:50 ` [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512 Chao Gao
@ 2018-02-22 18:46   ` Wei Liu
  2018-02-23  8:50     ` Jan Beulich
  2018-02-23 18:11   ` Roger Pau Monné
  1 sibling, 1 reply; 56+ messages in thread
From: Wei Liu @ 2018-02-22 18:46 UTC (permalink / raw)
  To: Chao Gao
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Ian Jackson,
	Tim Deegan, xen-devel, Jan Beulich, Andrew Cooper

On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/public/hvm/hvm_info_table.h b/xen/include/public/hvm/hvm_info_table.h
> index 08c252e..6833a4c 100644
> --- a/xen/include/public/hvm/hvm_info_table.h
> +++ b/xen/include/public/hvm/hvm_info_table.h
> @@ -32,7 +32,7 @@
>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>  
>  /* Maximum we can support with current vLAPIC ID mapping. */
> -#define HVM_MAX_VCPUS        128
> +#define HVM_MAX_VCPUS        512

I checked a few places where this is used, bumping the number doesn't
seem to be harmful on the surface.

Of course there is the question of how many CPUs upstream can support;
I think that is still under discussion at the moment.

Wei.


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-02-22 18:44   ` Wei Liu
@ 2018-02-23  8:41     ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2018-02-23  8:41 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, xen-devel, Ian Jackson, Chao Gao

>>> On 22.02.18 at 19:44, <wei.liu2@citrix.com> wrote:
> On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
>> +    "    mov   $lock,%edx            \n"
> 
> Not really an expert in x86 asm -- shouldn't this be lea instead?

No, when MOV can be used it's preferable as having a 1 byte shorter
encoding. There are a few cases where MOV may not be appropriate,
but I don't think the one here is.

Jan



* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-22 18:46   ` Wei Liu
@ 2018-02-23  8:50     ` Jan Beulich
  2018-02-23 17:18       ` Wei Liu
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-02-23  8:50 UTC (permalink / raw)
  To: Wei Liu, Chao Gao
  Cc: Tim Deegan, Stefano Stabellini, George Dunlap, Andrew Cooper,
	Ian Jackson, xen-devel

>>> On 22.02.18 at 19:46, <wei.liu2@citrix.com> wrote:
> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>> --- a/xen/include/public/hvm/hvm_info_table.h
>> +++ b/xen/include/public/hvm/hvm_info_table.h
>> @@ -32,7 +32,7 @@
>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>  
>>  /* Maximum we can support with current vLAPIC ID mapping. */
>> -#define HVM_MAX_VCPUS        128
>> +#define HVM_MAX_VCPUS        512
> 
> I checked a few places where this is used, bumping the number doesn't
> seem to be harmful on the surface.
> 
> Of course there is the question how many CPUs can upstream support,
> I think it is still under argument at the moment.

Leaving the latter aside, it is never okay to simply change a #define
in the public interface. See e.g. fb442e2171 ("x86_64: allow more
vCPU-s per guest"), where we've already gone a somewhat harsh
route by dropping the original #define altogether (thus at least
causing build failures for consumers), taking the position that it
wasn't correct to expose the value in the first place.

Hence at the very least I can't see how struct hvm_info_table
(in the same public header) can be left alone with this change.

Jan



* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
  2018-02-22 18:44   ` Wei Liu
@ 2018-02-23 16:42   ` Roger Pau Monné
  2018-02-24  5:49     ` Chao Gao
  2018-04-18  8:38   ` Jan Beulich
  2 siblings, 1 reply; 56+ messages in thread
From: Roger Pau Monné @ 2018-02-23 16:42 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, xen-devel

On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
> has the following description:
> 
> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
> Description Tables,” of the Advanced Configuration and Power Interface
> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
> behavior for BIOS is to pass the control to the operating system with the
> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
> than 255, and in x2APIC mode if there are any logical processor reporting an
> APIC ID of 255 or greater."
> 
> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
> the SIPI is broadcast to all APs; this is how SeaBIOS wakes up APs.
> APs may compete for the stack, so a lock is introduced to protect it.

Hm, how are we going to deal with this on PVH? hvmloader doesn't run
for PVH guests, hence it seems like switching to x2APIC mode should be
done somewhere else that is shared between HVM and PVH.

Maybe the hypercall that sets the number of vCPUs should change the
APIC mode?

Thanks, Roger.


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-23  8:50     ` Jan Beulich
@ 2018-02-23 17:18       ` Wei Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Liu @ 2018-02-23 17:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Chao Gao

On Fri, Feb 23, 2018 at 01:50:05AM -0700, Jan Beulich wrote:
> >>> On 22.02.18 at 19:46, <wei.liu2@citrix.com> wrote:
> > On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
> >> --- a/xen/include/public/hvm/hvm_info_table.h
> >> +++ b/xen/include/public/hvm/hvm_info_table.h
> >> @@ -32,7 +32,7 @@
> >>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
> >>  
> >>  /* Maximum we can support with current vLAPIC ID mapping. */
> >> -#define HVM_MAX_VCPUS        128
> >> +#define HVM_MAX_VCPUS        512
> > 
> > I checked a few places where this is used, bumping the number doesn't
> > seem to be harmful on the surface.
> > 
> > Of course there is the question how many CPUs can upstream support,
> > I think it is still under argument at the moment.
> 
> Leaving the latter aside, it is never okay to simply change a #define
> in the public interface. See e.g. fb442e2171 ("x86_64: allow more
> vCPU-s per guest"), where we've already gone a somewhat harsh
> route by dropping the original #define altogether (thus at least
> causing build failures for consumers), taking the position that it
> wasn't correct to expose the value in the first place.
> 
> Hence at the very least I can't see how struct hvm_info_table
> (in the same public header) can be left alone with this change.

AIUI this change has made the structure larger by extending the trailing
array. The resulting structure still fits into one page. At least to me
the new ABI is still compatible with the old.

The commit you mentioned is different because it completely breaks ABI
compatibility.

Wei.


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2017-12-06  7:50 ` [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512 Chao Gao
  2018-02-22 18:46   ` Wei Liu
@ 2018-02-23 18:11   ` Roger Pau Monné
  2018-02-24  6:26     ` Chao Gao
  2018-02-26  8:26     ` Jan Beulich
  1 sibling, 2 replies; 56+ messages in thread
From: Roger Pau Monné @ 2018-02-23 18:11 UTC (permalink / raw)
  To: Chao Gao
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, xen-devel, Jan Beulich, Andrew Cooper

On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/public/hvm/hvm_info_table.h b/xen/include/public/hvm/hvm_info_table.h
> index 08c252e..6833a4c 100644
> --- a/xen/include/public/hvm/hvm_info_table.h
> +++ b/xen/include/public/hvm/hvm_info_table.h
> @@ -32,7 +32,7 @@
>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>  
>  /* Maximum we can support with current vLAPIC ID mapping. */
> -#define HVM_MAX_VCPUS        128
> +#define HVM_MAX_VCPUS        512

Wow, that looks like a pretty big jump. I certainly don't have access
to any box with this number of vCPUs, so that's going to be quite hard
to test. What is the reasoning behind this bump? Is hardware with 512
ways expected soon-ish?

Also osstest is not even able to test the current limit, so I would
maybe bump this to 256, but as I have said on other occasions I don't
feel comfortable with having a number of vCPUs that the current test
system doesn't have the hardware to test.

Thanks, Roger.


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-02-23 16:42   ` Roger Pau Monné
@ 2018-02-24  5:49     ` Chao Gao
  2018-02-26  8:28       ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2018-02-24  5:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Ian Jackson, Wei Liu, Jan Beulich, xen-devel

On Fri, Feb 23, 2018 at 04:42:10PM +0000, Roger Pau Monné wrote:
>On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
>> has the following description:
>> 
>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
>> Description Tables,” of the Advanced Configuration and Power Interface
>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
>> behavior for BIOS is to pass the control to the operating system with the
>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
>> than 255, and in x2APIC mode if there are any logical processor reporting an
>> APIC ID of 255 or greater."
>> 
>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
>> APs may compete for the stack, thus a lock is introduced to protect the stack.
>
>Hm, how are we going to deal with this on PVH? hvmloader doesn't run
>for PVH guests, hence it seems like switching to x2APIC mode should be
>done somewhere else that shared between HVM and PVH.
>
>Maybe the hypercall that sets the number of vCPUs should change the
>APIC mode?

Yes. A flag can be passed when setting the maximum number of vCPUs. Xen
will switch all local APICs to x2APIC mode or xAPIC mode according to
the flag.

Thanks
Chao


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-23 18:11   ` Roger Pau Monné
@ 2018-02-24  6:26     ` Chao Gao
  2018-02-26  8:26     ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Chao Gao @ 2018-02-24  6:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Tim Deegan,
	Ian Jackson, xen-devel, Jan Beulich, Andrew Cooper

On Fri, Feb 23, 2018 at 06:11:39PM +0000, Roger Pau Monné wrote:
>On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/xen/include/public/hvm/hvm_info_table.h b/xen/include/public/hvm/hvm_info_table.h
>> index 08c252e..6833a4c 100644
>> --- a/xen/include/public/hvm/hvm_info_table.h
>> +++ b/xen/include/public/hvm/hvm_info_table.h
>> @@ -32,7 +32,7 @@
>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>  
>>  /* Maximum we can support with current vLAPIC ID mapping. */
>> -#define HVM_MAX_VCPUS        128
>> +#define HVM_MAX_VCPUS        512
>
>Wow, that looks like a pretty big jump. I certainly don't have access
>to any box with this number of vCPUs, so that's going to be quite hard
>to test. What the reasoning behind this bump?

There are 288 CPUs on some Intel Xeon Phi platforms. Bumping this value to
288 would be odd, because I think it would raise some questions -- why it
should be 288, and why the number of vCPUs should be the same as the
number of pCPUs. I chose 512 simply because it is the next power of 2.

>Is hardware with 512 ways expected soon-ish?

No. 

>
>Also osstest is not even able to test the current limit, so I would

Do we have any plan to test a guest with 128 vCPUs on osstest?

>maybe bump this to 256, but as I expressed in other occasions I don't
>feel comfortable with have a number of vCPUs that the current test
>system doesn't have hardware to test with.

If we only test functionality (osstest doesn't test performance,
right?), the hardware needn't have that many pCPUs.

Could we declare that support for more than 128 vCPUs is only an
experimental feature? Once some regular tests are available, we can
announce that on x86, Xen supports HVM guests with, for example, 128 or
512 vCPUs. It seems George's work on SUPPORT.md serves this kind of
purpose.

Thanks
Chao


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-23 18:11   ` Roger Pau Monné
  2018-02-24  6:26     ` Chao Gao
@ 2018-02-26  8:26     ` Jan Beulich
  2018-02-26 13:11       ` Chao Gao
  1 sibling, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-02-26  8:26 UTC (permalink / raw)
  To: Roger Pau Monné, Chao Gao
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel

>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
> b/xen/include/public/hvm/hvm_info_table.h
>> index 08c252e..6833a4c 100644
>> --- a/xen/include/public/hvm/hvm_info_table.h
>> +++ b/xen/include/public/hvm/hvm_info_table.h
>> @@ -32,7 +32,7 @@
>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>  
>>  /* Maximum we can support with current vLAPIC ID mapping. */
>> -#define HVM_MAX_VCPUS        128
>> +#define HVM_MAX_VCPUS        512
> 
> Wow, that looks like a pretty big jump. I certainly don't have access
> to any box with this number of vCPUs, so that's going to be quite hard
> to test. What the reasoning behind this bump? Is hardware with 512
> ways expected soon-ish?
> 
> Also osstest is not even able to test the current limit, so I would
> maybe bump this to 256, but as I expressed in other occasions I don't
> feel comfortable with have a number of vCPUs that the current test
> system doesn't have hardware to test with.

I think implementation limit and supported limit need to be clearly
distinguished here. Therefore I'd put the question the other way
around: What's causing the limit to be 512, rather than 1024,
4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?

Jan



* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-02-24  5:49     ` Chao Gao
@ 2018-02-26  8:28       ` Jan Beulich
  2018-02-26 12:33         ` Chao Gao
  0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-02-26  8:28 UTC (permalink / raw)
  To: Chao Gao
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

>>> On 24.02.18 at 06:49, <chao.gao@intel.com> wrote:
> On Fri, Feb 23, 2018 at 04:42:10PM +0000, Roger Pau Monné wrote:
>>On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
>>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
>>> has the following description:
>>> 
>>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
>>> Description Tables,” of the Advanced Configuration and Power Interface
>>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
>>> behavior for BIOS is to pass the control to the operating system with the
>>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
>>> than 255, and in x2APIC mode if there are any logical processor reporting an
>>> APIC ID of 255 or greater."
>>> 
>>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
>>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
>>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
>>> APs may compete for the stack, thus a lock is introduced to protect the stack.
>>
>>Hm, how are we going to deal with this on PVH? hvmloader doesn't run
>>for PVH guests, hence it seems like switching to x2APIC mode should be
>>done somewhere else that shared between HVM and PVH.
>>
>>Maybe the hypercall that sets the number of vCPUs should change the
>>APIC mode?
> 
> Yes. A flag can be passed when setting the maximum number of vCPUs. Xen
> will switch all local APICs to x2APIC mode or xAPIC mode according to
> the flag.

A flag? Where? Why isn't 256+ vCPU-s on its own sufficient to tell
that the mode needs to be switched?

Jan


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-02-26  8:28       ` Jan Beulich
@ 2018-02-26 12:33         ` Chao Gao
  2018-02-26 14:19           ` Roger Pau Monné
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2018-02-26 12:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Jackson, Roger Pau Monné

On Mon, Feb 26, 2018 at 01:28:07AM -0700, Jan Beulich wrote:
>>>> On 24.02.18 at 06:49, <chao.gao@intel.com> wrote:
>> On Fri, Feb 23, 2018 at 04:42:10PM +0000, Roger Pau Monné wrote:
>>>On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
>>>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
>>>> has the following description:
>>>> 
>>>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
>>>> Description Tables,” of the Advanced Configuration and Power Interface
>>>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
>>>> behavior for BIOS is to pass the control to the operating system with the
>>>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
>>>> than 255, and in x2APIC mode if there are any logical processor reporting an
>>>> APIC ID of 255 or greater."
>>>> 
>>>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
>>>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
>>>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
>>>> APs may compete for the stack, thus a lock is introduced to protect the stack.
>>>
>>>Hm, how are we going to deal with this on PVH? hvmloader doesn't run
>>>for PVH guests, hence it seems like switching to x2APIC mode should be
>>>done somewhere else that shared between HVM and PVH.
>>>
>>>Maybe the hypercall that sets the number of vCPUs should change the
>>>APIC mode?
>> 
>> Yes. A flag can be passed when setting the maximum number of vCPUs. Xen
>> will switch all local APICs to x2APIC mode or xAPIC mode according to
>> the flag.
>
>A flag? Where? Why isn't 257+ vCPU-s on its own sufficient to tell
>that the mode needs to be switched?

In struct xen_domctl_max_vcpus, a flag such as SWITCH_TO_X2APIC_MODE
could be used to instruct Xen to initialize the vLAPICs and do this
switch.

Yes, that is another option: Xen can do this switch when needed. That
solution leads to a smaller code change compared with introducing a new
flag when setting the maximum number of vCPUs.

Thanks
Chao


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-26  8:26     ` Jan Beulich
@ 2018-02-26 13:11       ` Chao Gao
  2018-02-26 16:10         ` Jan Beulich
  2018-02-27 14:59         ` George Dunlap
  0 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2018-02-26 13:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Roger Pau Monné

On Mon, Feb 26, 2018 at 01:26:42AM -0700, Jan Beulich wrote:
>>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
>> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>> ---
>>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
>> b/xen/include/public/hvm/hvm_info_table.h
>>> index 08c252e..6833a4c 100644
>>> --- a/xen/include/public/hvm/hvm_info_table.h
>>> +++ b/xen/include/public/hvm/hvm_info_table.h
>>> @@ -32,7 +32,7 @@
>>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>>  
>>>  /* Maximum we can support with current vLAPIC ID mapping. */
>>> -#define HVM_MAX_VCPUS        128
>>> +#define HVM_MAX_VCPUS        512
>> 
>> Wow, that looks like a pretty big jump. I certainly don't have access
>> to any box with this number of vCPUs, so that's going to be quite hard
>> to test. What the reasoning behind this bump? Is hardware with 512
>> ways expected soon-ish?
>> 
>> Also osstest is not even able to test the current limit, so I would
>> maybe bump this to 256, but as I expressed in other occasions I don't
>> feel comfortable with have a number of vCPUs that the current test
>> system doesn't have hardware to test with.
>
>I think implementation limit and supported limit need to be clearly
>distinguished here. Therefore I'd put the question the other way
>around: What's causing the limit to be 512, rather than 1024,
>4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?

TBH, I have no idea. When choosing a value, what came to my mind is
that the value should be 288, because Intel's Xeon Phi platform has 288
physical threads, and some customers want to use this new platform for
HPC cloud. Furthermore, they request support for a big VM to which
almost all computing and device resources are assigned; they just use
virtualization technology to manage the machines. In this situation, I
chose 512 because I feel much better if the limit is a power of 2.

You are asking, given that these patches remove limitations imposed by
some components, which one is the next bottleneck and how many vCPUs it
allows. Maybe it would be the use case. No one is requesting support
for more than 288 at this moment. So which value do you prefer, 288 or
512? Or do you think I should find the next bottleneck in Xen's
implementation?

Thanks
Chao


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-02-26 12:33         ` Chao Gao
@ 2018-02-26 14:19           ` Roger Pau Monné
  0 siblings, 0 replies; 56+ messages in thread
From: Roger Pau Monné @ 2018-02-26 14:19 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, Jan Beulich, xen-devel

On Mon, Feb 26, 2018 at 08:33:23PM +0800, Chao Gao wrote:
> On Mon, Feb 26, 2018 at 01:28:07AM -0700, Jan Beulich wrote:
> >>>> On 24.02.18 at 06:49, <chao.gao@intel.com> wrote:
> >> On Fri, Feb 23, 2018 at 04:42:10PM +0000, Roger Pau Monné wrote:
> >>>On Wed, Dec 06, 2017 at 03:50:10PM +0800, Chao Gao wrote:
> >>>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
> >>>> has the following description:
> >>>> 
> >>>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
> >>>> Description Tables,” of the Advanced Configuration and Power Interface
> >>>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
> >>>> behavior for BIOS is to pass the control to the operating system with the
> >>>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
> >>>> than 255, and in x2APIC mode if there are any logical processor reporting an
> >>>> APIC ID of 255 or greater."
> >>>> 
> >>>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
> >>>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
> >>>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
> >>>> APs may compete for the stack, thus a lock is introduced to protect the stack.
> >>>
> >>>Hm, how are we going to deal with this on PVH? hvmloader doesn't run
> >>>for PVH guests, hence it seems like switching to x2APIC mode should be
> >>>done somewhere else that shared between HVM and PVH.
> >>>
> >>>Maybe the hypercall that sets the number of vCPUs should change the
> >>>APIC mode?
> >> 
> >> Yes. A flag can be passed when setting the maximum number of vCPUs. Xen
> >> will switch all local APICs to x2APIC mode or xAPIC mode according to
> >> the flag.
> >
> >A flag? Where? Why isn't 257+ vCPU-s on its own sufficient to tell
> >that the mode needs to be switched?
> 
> In struct xen_domctl_max_vcpus, a flag, like SWITCH_TO_X2APIC_MODE, can
> be used to instruct Xen to initialize vlapic and do this switch.
> 
> Yes, it is another option: Xen can do this switch when need. This
> solution leads to smaller code change compared with introducing a new
> flag when setting the maximum number of vCPUs.

Since the APIC ID is currently hardcoded in guest_cpuid as vcpu_id * 2,
IMO Xen should switch to x2APIC mode when it detects that vCPUs > 128,
as Jan has suggested. Then you won't need to modify hvmloader at all,
and the same would work for PVH, I assume?

Thanks, Roger.


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-26 13:11       ` Chao Gao
@ 2018-02-26 16:10         ` Jan Beulich
  2018-03-01  5:21           ` Chao Gao
  2018-02-27 14:59         ` George Dunlap
  1 sibling, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-02-26 16:10 UTC (permalink / raw)
  To: Chao Gao
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Roger Pau Monné

>>> On 26.02.18 at 14:11, <chao.gao@intel.com> wrote:
> On Mon, Feb 26, 2018 at 01:26:42AM -0700, Jan Beulich wrote:
>>>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
>>> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>>> ---
>>>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
>>> b/xen/include/public/hvm/hvm_info_table.h
>>>> index 08c252e..6833a4c 100644
>>>> --- a/xen/include/public/hvm/hvm_info_table.h
>>>> +++ b/xen/include/public/hvm/hvm_info_table.h
>>>> @@ -32,7 +32,7 @@
>>>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>>>  
>>>>  /* Maximum we can support with current vLAPIC ID mapping. */
>>>> -#define HVM_MAX_VCPUS        128
>>>> +#define HVM_MAX_VCPUS        512
>>> 
>>> Wow, that looks like a pretty big jump. I certainly don't have access
>>> to any box with this number of vCPUs, so that's going to be quite hard
>>> to test. What the reasoning behind this bump? Is hardware with 512
>>> ways expected soon-ish?
>>> 
>>> Also osstest is not even able to test the current limit, so I would
>>> maybe bump this to 256, but as I expressed in other occasions I don't
>>> feel comfortable with have a number of vCPUs that the current test
>>> system doesn't have hardware to test with.
>>
>>I think implementation limit and supported limit need to be clearly
>>distinguished here. Therefore I'd put the question the other way
>>around: What's causing the limit to be 512, rather than 1024,
>>4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?
> 
> TBH, I have no idea. When I choose a value, what comes up to my mind is
> that the value should be 288, because Intel has Xeon-phi platform which
> has 288 physical threads, and some customers wants to use this new platform
> for HPC cloud. Furthermore, they requests to support a big VM in which
> almost computing and device resources are assigned to the VM. They just
> use virtulization technology to manage the machines. In this situation,
> I choose 512 is because I feel much better if the limit is a power of 2.
> 
> You are asking that as these patches remove limitations imposed by some
> components, which one is the next bottleneck and how many vcpus does it
> limit.  Maybe it would be the use-case. No one is requesting to support
> more than 288 at this moment. So what is the value you prefer? 288 or
> 512? or you think I should find the next bottleneck in Xen's
> implementation.

Again - here we're talking about implementation limits, not
bottlenecks. So in this context all I'm interested in is whether
(and if so which) implementation limit remains. If an (almost)
arbitrary number is fine, perhaps we'll want to have a Kconfig
option.

I'm also curious - do Phis not come in multi-socket configs? It's
my understanding that 288 is the count for a single socket.

As to bottlenecks - you've been told they exist far below 128.

Jan



* Re: [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2017-12-06  7:50 ` [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory Chao Gao
@ 2018-02-27 14:17   ` George Dunlap
  2018-04-18  8:53   ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: George Dunlap @ 2018-02-27 14:17 UTC (permalink / raw)
  To: Chao Gao, xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich

On 12/06/2017 07:50 AM, Chao Gao wrote:
> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
> run out if guest has more than 256 vcpus and guest creation fails. Bump the
> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
> for shadow case.
> 
> This patch won't lead to more memory consumption for the size of shadow memory
> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
> of guest memory and the number of vcpus.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Acked-by: George Dunlap <george.dunlap@citrix.com>


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-26 13:11       ` Chao Gao
  2018-02-26 16:10         ` Jan Beulich
@ 2018-02-27 14:59         ` George Dunlap
  1 sibling, 0 replies; 56+ messages in thread
From: George Dunlap @ 2018-02-27 14:59 UTC (permalink / raw)
  To: Chao Gao, Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Roger Pau Monné

On 02/26/2018 01:11 PM, Chao Gao wrote:
> On Mon, Feb 26, 2018 at 01:26:42AM -0700, Jan Beulich wrote:
>>>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
>>> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>>> ---
>>>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
>>> b/xen/include/public/hvm/hvm_info_table.h
>>>> index 08c252e..6833a4c 100644
>>>> --- a/xen/include/public/hvm/hvm_info_table.h
>>>> +++ b/xen/include/public/hvm/hvm_info_table.h
>>>> @@ -32,7 +32,7 @@
>>>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>>>  
>>>>  /* Maximum we can support with current vLAPIC ID mapping. */
>>>> -#define HVM_MAX_VCPUS        128
>>>> +#define HVM_MAX_VCPUS        512
>>>
>>> Wow, that looks like a pretty big jump. I certainly don't have access
>>> to any box with this number of vCPUs, so that's going to be quite hard
>>> to test. What the reasoning behind this bump? Is hardware with 512
>>> ways expected soon-ish?
>>>
>>> Also osstest is not even able to test the current limit, so I would
>>> maybe bump this to 256, but as I expressed in other occasions I don't
>>> feel comfortable with have a number of vCPUs that the current test
>>> system doesn't have hardware to test with.
>>
>> I think implementation limit and supported limit need to be clearly
>> distinguished here. Therefore I'd put the question the other way
>> around: What's causing the limit to be 512, rather than 1024,
>> 4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?
> 
> TBH, I have no idea. When I choose a value, what comes up to my mind is
> that the value should be 288, because Intel has Xeon-phi platform which
> has 288 physical threads, and some customers wants to use this new platform
> for HPC cloud. Furthermore, they requests to support a big VM in which
> almost computing and device resources are assigned to the VM. They just
> use virtulization technology to manage the machines. In this situation,
> I choose 512 is because I feel much better if the limit is a power of 2.
> 
> You are asking that as these patches remove limitations imposed by some
> components, which one is the next bottleneck and how many vcpus does it
> limit.  Maybe it would be the use-case. No one is requesting to support
> more than 288 at this moment. So what is the value you prefer? 288 or
> 512? or you think I should find the next bottleneck in Xen's
> implementation.

I understood Jan to be responding to Roger -- Roger said he didn't want
to increase the limit beyond what osstest could reasonably test; Jan
said that what osstest can test should be factored into the *supported*
limit, not the *implementation* limit.

I agree with Jan: People should be allowed to run systems with 288
vcpus, understanding that they are at that point running essentially an
experimental system which may not work; and that if it doesn't work, it
may be that nobody will be able to reproduce and thus fix their issue.

 -George


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-02-26 16:10         ` Jan Beulich
@ 2018-03-01  5:21           ` Chao Gao
  2018-03-01  7:17             ` Juergen Gross
  2018-03-01  7:37             ` Jan Beulich
  0 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2018-03-01  5:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Roger Pau Monné

On Mon, Feb 26, 2018 at 09:10:33AM -0700, Jan Beulich wrote:
>>>> On 26.02.18 at 14:11, <chao.gao@intel.com> wrote:
>> On Mon, Feb 26, 2018 at 01:26:42AM -0700, Jan Beulich wrote:
>>>>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
>>>> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>>>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>>>> ---
>>>>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
>>>> b/xen/include/public/hvm/hvm_info_table.h
>>>>> index 08c252e..6833a4c 100644
>>>>> --- a/xen/include/public/hvm/hvm_info_table.h
>>>>> +++ b/xen/include/public/hvm/hvm_info_table.h
>>>>> @@ -32,7 +32,7 @@
>>>>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>>>>  
>>>>>  /* Maximum we can support with current vLAPIC ID mapping. */
>>>>> -#define HVM_MAX_VCPUS        128
>>>>> +#define HVM_MAX_VCPUS        512
>>>> 
>>>> Wow, that looks like a pretty big jump. I certainly don't have access
>>>> to any box with this number of vCPUs, so that's going to be quite hard
>>>> to test. What the reasoning behind this bump? Is hardware with 512
>>>> ways expected soon-ish?
>>>> 
>>>> Also osstest is not even able to test the current limit, so I would
>>>> maybe bump this to 256, but as I expressed in other occasions I don't
>>>> feel comfortable with have a number of vCPUs that the current test
>>>> system doesn't have hardware to test with.
>>>
>>>I think implementation limit and supported limit need to be clearly
>>>distinguished here. Therefore I'd put the question the other way
>>>around: What's causing the limit to be 512, rather than 1024,
>>>4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?
>> 
>> TBH, I have no idea. When I choose a value, what comes up to my mind is
>> that the value should be 288, because Intel has Xeon-phi platform which
>> has 288 physical threads, and some customers wants to use this new platform
>> for HPC cloud. Furthermore, they requests to support a big VM in which
>> almost computing and device resources are assigned to the VM. They just
>> use virtulization technology to manage the machines. In this situation,
>> I choose 512 is because I feel much better if the limit is a power of 2.
>> 
>> You are asking that as these patches remove limitations imposed by some
>> components, which one is the next bottleneck and how many vcpus does it
>> limit.  Maybe it would be the use-case. No one is requesting to support
>> more than 288 at this moment. So what is the value you prefer? 288 or
>> 512? or you think I should find the next bottleneck in Xen's
>> implementation.
>
>Again - here we're talking about implementation limits, not
>bottlenecks. So in this context all I'm interested in is whether
>(and if so which) implementation limit remains. If an (almost)
>arbitrary number is fine, perhaps we'll want to have a Kconfig
>option.

Do you think that struct hvm_info_table would be an implementation
limit? To keep this struct within a single page, HVM_MAX_VCPUS should
be smaller than a certain value, like (PAGE_SIZE * 8). Supposing it is
the only implementation limit, I don't think it is reasonable to set
HVM_MAX_VCPUS to that value, because we don't have the hardware to
perform tests; even Xeon Phi isn't capable. This value can be bumped
once some method verifies that a guest can work with more vCPUs. For
now I prefer 288 over 512 or other values.

>
>I'm also curious - do Phis not come in multi-socket configs? It's
>my understanding that 288 is the count for a single socket.

Currently we don't have. But it's hard to say for future products.

Thanks
Chao


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-03-01  7:37             ` Jan Beulich
@ 2018-03-01  7:11               ` Chao Gao
  0 siblings, 0 replies; 56+ messages in thread
From: Chao Gao @ 2018-03-01  7:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, roger.pau

On Thu, Mar 01, 2018 at 12:37:57AM -0700, Jan Beulich wrote:
>>>> Chao Gao <chao.gao@intel.com> 03/01/18 7:34 AM >>>
>>On Mon, Feb 26, 2018 at 09:10:33AM -0700, Jan Beulich wrote:
>>>Again - here we're talking about implementation limits, not
>>>bottlenecks. So in this context all I'm interested in is whether
>>>(and if so which) implementation limit remains. If an (almost)
>>>arbitrary number is fine, perhaps we'll want to have a Kconfig
>>>option.
>>
>>Do you think that struct hvm_info_table would be an implementation
>>limit? To keep this struct within a single page, HVM_MAX_VCPUS
>>must stay below some bound, such as (PAGE_SIZE * 8). Supposing
>>that is the only implementation limit, I don't think it is reasonable
>>to set HVM_MAX_VCPUS to that value, because we have no hardware to
>>test with; even Xeon Phi isn't capable. The value can be bumped
>>once some method verifies that a guest works with more vcpus. For
>>now I prefer 288 over 512 or any other value.
>
>Whether going beyond PAGE_SIZE with the structure size is a valid item
>to think about, but I don't think there's any implied limit from that. But -
>did you read my and George's subsequent reply at all? You continue to

Yes, I did, but somehow I didn't clearly understand the difference.
Sorry about that.

>mix up supported (because of being able to test) limits with implementation
>ones. Even Jürgen's suggestion to take NR_CPUS as the limit is not very
>reasonable - PV guests have an implementation limit of (iirc) 8192. Once
>again - if there's no sensible upper limit imposed by the implementation,
>consider introducing a Kconfig option to pick the limit.

Got it.

Thanks
Chao


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-03-01  5:21           ` Chao Gao
@ 2018-03-01  7:17             ` Juergen Gross
  2018-03-01  7:37             ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Juergen Gross @ 2018-03-01  7:17 UTC (permalink / raw)
  To: Chao Gao, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Ian Jackson,
	Tim Deegan, xen-devel, Andrew Cooper, Roger Pau Monné

On 01/03/18 06:21, Chao Gao wrote:
> On Mon, Feb 26, 2018 at 09:10:33AM -0700, Jan Beulich wrote:
>>>>> On 26.02.18 at 14:11, <chao.gao@intel.com> wrote:
>>> On Mon, Feb 26, 2018 at 01:26:42AM -0700, Jan Beulich wrote:
>>>>>>> On 23.02.18 at 19:11, <roger.pau@citrix.com> wrote:
>>>>> On Wed, Dec 06, 2017 at 03:50:14PM +0800, Chao Gao wrote:
>>>>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>>>>> ---
>>>>>>  xen/include/public/hvm/hvm_info_table.h | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/xen/include/public/hvm/hvm_info_table.h 
>>>>> b/xen/include/public/hvm/hvm_info_table.h
>>>>>> index 08c252e..6833a4c 100644
>>>>>> --- a/xen/include/public/hvm/hvm_info_table.h
>>>>>> +++ b/xen/include/public/hvm/hvm_info_table.h
>>>>>> @@ -32,7 +32,7 @@
>>>>>>  #define HVM_INFO_PADDR       ((HVM_INFO_PFN << 12) + HVM_INFO_OFFSET)
>>>>>>  
>>>>>>  /* Maximum we can support with current vLAPIC ID mapping. */
>>>>>> -#define HVM_MAX_VCPUS        128
>>>>>> +#define HVM_MAX_VCPUS        512
>>>>>
>>>>> Wow, that looks like a pretty big jump. I certainly don't have access
>>>>> to any box with this number of vCPUs, so that's going to be quite hard
>>>>> to test. What the reasoning behind this bump? Is hardware with 512
>>>>> ways expected soon-ish?
>>>>>
>>>>> Also osstest is not even able to test the current limit, so I would
>>>>> maybe bump this to 256, but as I expressed in other occasions I don't
>>>>> feel comfortable with have a number of vCPUs that the current test
>>>>> system doesn't have hardware to test with.
>>>>
>>>> I think implementation limit and supported limit need to be clearly
>>>> distinguished here. Therefore I'd put the question the other way
>>>> around: What's causing the limit to be 512, rather than 1024,
>>>> 4096, or even 4G-1 (x2APIC IDs are 32 bits wide, after all)?
>>>
>>> TBH, I have no idea. When I chose the value, what came to mind was
>>> that it should be 288, because Intel's Xeon Phi platform has 288
>>> physical threads, and some customers want to use this new platform
>>> for HPC cloud. Furthermore, they request support for a big VM to
>>> which almost all computing and device resources are assigned; they
>>> use virtualization technology only to manage the machines. Given
>>> that, I chose 512 because I feel much better if the limit is a power of 2.
>>>
>>> You are asking, now that these patches remove the limitations imposed
>>> by some components, which one is the next bottleneck and how many
>>> vcpus it allows. Maybe the answer is the use-case: no one is requesting
>>> support for more than 288 at this moment. So which value do you prefer,
>>> 288 or 512? Or do you think I should find the next bottleneck in Xen's
>>> implementation?
>>
>> Again - here we're talking about implementation limits, not
>> bottlenecks. So in this context all I'm interested in is whether
>> (and if so which) implementation limit remains. If an (almost)
>> arbitrary number is fine, perhaps we'll want to have a Kconfig
>> option.
> 
> Do you think that struct hvm_info_table would be an implementation
> limit? To keep this struct within a single page, HVM_MAX_VCPUS
> must stay below some bound, such as (PAGE_SIZE * 8). Supposing
> that is the only implementation limit, I don't think it is reasonable
> to set HVM_MAX_VCPUS to that value, because we have no hardware to
> test with; even Xeon Phi isn't capable. The value can be bumped
> once some method verifies that a guest works with more vcpus. For
> now I prefer 288 over 512 or any other value.
> 
>>
>> I'm also curious - do Phis not come in multi-socket configs? It's
>> my understanding that 288 is the count for a single socket.
> 
> Currently there aren't any. But it's hard to say for future products.

Is there any reason to set HVM_MAX_VCPUS to a lower limit than
CONFIG_NR_CPUS? This can be set to 4095, so why not use the same
limit for HVM_MAX_VCPUS?


Juergen


* Re: [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512
  2018-03-01  5:21           ` Chao Gao
  2018-03-01  7:17             ` Juergen Gross
@ 2018-03-01  7:37             ` Jan Beulich
  2018-03-01  7:11               ` Chao Gao
  1 sibling, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-03-01  7:37 UTC (permalink / raw)
  To: chao.gao
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, roger.pau

>>> Chao Gao <chao.gao@intel.com> 03/01/18 7:34 AM >>>
>On Mon, Feb 26, 2018 at 09:10:33AM -0700, Jan Beulich wrote:
>>Again - here we're talking about implementation limits, not
>>bottlenecks. So in this context all I'm interested in is whether
>>(and if so which) implementation limit remains. If an (almost)
>>arbitrary number is fine, perhaps we'll want to have a Kconfig
>>option.
>
>Do you think that struct hvm_info_table would be an implementation
>limit? To keep this struct within a single page, HVM_MAX_VCPUS
>must stay below some bound, such as (PAGE_SIZE * 8). Supposing
>that is the only implementation limit, I don't think it is reasonable
>to set HVM_MAX_VCPUS to that value, because we have no hardware to
>test with; even Xeon Phi isn't capable. The value can be bumped
>once some method verifies that a guest works with more vcpus. For
>now I prefer 288 over 512 or any other value.

Whether the structure should grow beyond PAGE_SIZE is a valid item to
think about, but I don't think there's any implied limit from that. But -
did you read my and George's subsequent reply at all? You continue to
mix up supported (because of being able to test) limits with implementation
ones. Even Jürgen's suggestion to take NR_CPUS as the limit is not very
reasonable - PV guests have an implementation limit of (iirc) 8192. Once
again - if there's no sensible upper limit imposed by the implementation,
consider introducing a Kconfig option to pick the limit.

Jan



* Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
  2017-12-06  7:50 ` [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages Chao Gao
  2017-12-06 15:04   ` Paul Durrant
@ 2018-04-18  8:19   ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2018-04-18  8:19 UTC (permalink / raw)
  To: Chao Gao
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, xen-devel, Paul Durrant

>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the vcpu
> number constraint imposed by one IOREQ page, bump the number of IOREQ page to
> 4 pages. With this patch, multiple pages can be used as IOREQ page.

In case I didn't say so before - I'm opposed to simply changing the upper limit
here. Please make sure any number of vCPU-s can be supported by the new
code (it looks like it mostly does already, so it may be mainly the description
which needs changing). If we want to impose an upper limit, this shouldn't
affect the ioreq page handling code at all.

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -64,14 +64,24 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>              continue; \
>          else
>  
> +/* Iterate over all ioreq pages */
> +#define FOR_EACH_IOREQ_PAGE(s, i, iorp) \
> +    for ( (i) = 0, iorp = s->ioreq; (i) < (s)->ioreq_page_nr; (i)++, iorp++ )

You're going too far with parenthesization here: Just like iorp, i is required to
be a simple identifier anyway. Otoh you parenthesize s only once.

> +static ioreq_t *get_ioreq_fallible(struct hvm_ioreq_server *s, struct vcpu *v)

What is "fallible"? And why such a separate wrapper anyway? But I guess a lot
of this will need to change anyway with Paul's recent changes. I'm therefore not
going to look in close detail at any of this.

>  static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>  {
> -    hvm_unmap_ioreq_gfn(s, &s->ioreq);
> +    int i;

At the example of this - unsigned int please in all cases where the value can't go
negative.

> @@ -688,8 +741,15 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
>      INIT_LIST_HEAD(&s->ioreq_vcpu_list);
>      spin_lock_init(&s->bufioreq_lock);
>  
> -    s->ioreq.gfn = INVALID_GFN;
> +    FOR_EACH_IOREQ_PAGE(s, i, iorp)
> +        iorp->gfn = INVALID_GFN;
>      s->bufioreq.gfn = INVALID_GFN;
> +    s->ioreq_page_nr = (d->max_vcpus + IOREQ_NUM_PER_PAGE - 1) /
> +                       IOREQ_NUM_PER_PAGE;

DIV_ROUND_UP() - please don't open-code things.

> --- a/xen/include/public/hvm/params.h
> +++ b/xen/include/public/hvm/params.h
> @@ -279,6 +279,12 @@
>  #define XEN_HVM_MCA_CAP_LMCE   (xen_mk_ullong(1) << 0)
>  #define XEN_HVM_MCA_CAP_MASK   XEN_HVM_MCA_CAP_LMCE
>  
> -#define HVM_NR_PARAMS 39
> +/*
> + * Number of pages that are reserved for default IOREQ server. The base PFN
> + * is set via HVM_PARAM_IOREQ_PFN.
> + */
> +#define HVM_PARAM_IOREQ_PAGES 39

Why is this needed? It can be derived from the vCPU count permitted for the
domain.

Jan




* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
  2018-02-22 18:44   ` Wei Liu
  2018-02-23 16:42   ` Roger Pau Monné
@ 2018-04-18  8:38   ` Jan Beulich
  2018-04-18 11:20     ` Chao Gao
  2 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-04-18  8:38 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
> has the following description:
> 
> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
> Description Tables,” of the Advanced Configuration and Power Interface
> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
> behavior for BIOS is to pass the control to the operating system with the
> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
> than 255, and in x2APIC mode if there are any logical processor reporting an
> APIC ID of 255 or greater."

With this you mean ...

> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,

">= 255" and "greater than 254" here respectively. Please be precise.

> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
> APs may compete for the stack, thus a lock is introduced to protect the 
> stack.



> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -26,7 +26,9 @@
>  #define AP_BOOT_EIP 0x1000
>  extern char ap_boot_start[], ap_boot_end[];
>  
> -static int ap_callin, ap_cpuid;
> +static int ap_callin;
> +static int enable_x2apic;
> +static bool lock = 1;
>  
>  asm (
>      "    .text                       \n"
> @@ -47,7 +49,15 @@ asm (
>      "    mov   %eax,%ds              \n"
>      "    mov   %eax,%es              \n"
>      "    mov   %eax,%ss              \n"
> -    "    movl  $stack_top,%esp       \n"
> +    "3:  movb  $1, %bl               \n"
> +    "    mov   $lock,%edx            \n"
> +    "    movzbl %bl,%eax             \n"
> +    "    xchg  %al, (%edx)           \n"
> +    "    test  %al,%al               \n"
> +    "    je    2f                    \n"
> +    "    pause                       \n"
> +    "    jmp   3b                    \n"
> +    "2:  movl  $stack_top,%esp       \n"

Please be consistent with suffixes: Either add them only when really needed
(preferred) or add them uniformly everywhere (when permitted of course).
I also don't understand why you need to use %bl here at all.

> @@ -68,14 +78,34 @@ asm (
>      "    .text                       \n"
>      );
>  
> +unsigned int ap_cpuid(void)

static

> +{
> +    if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
> +    {
> +        uint32_t eax, ebx, ecx, edx;
> +
> +        cpuid(1, &eax, &ebx, &ecx, &edx);
> +        return ebx >> 24;
> +    }
> +    else

pointless "else"

> +        return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
> +}
> +
>  void ap_start(void); /* non-static avoids unused-function compiler warning */
>  /*static*/ void ap_start(void)
>  {
> -    printf(" - CPU%d ... ", ap_cpuid);
> +    printf(" - CPU%d ... ", ap_cpuid());
>      cacheattr_init();
>      printf("done.\n");
>      wmb();
> -    ap_callin = 1;
> +    ap_callin++;
> +
> +    if ( enable_x2apic )
> +        wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
> +                                 MSR_IA32_APICBASE_EXTD);
> +
> +    /* Release the lock */
> +    asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
>  }

How can you release the lock here - you've not switched off the stack, and
you're about to actually use it (for returning from the function)?

> @@ -125,9 +169,15 @@ void smp_initialise(void)
>      memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>  
>      printf("Multiprocessor initialisation:\n");
> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
> +        enable_x2apic = 1;
> +
>      ap_start();
> -    for ( i = 1; i < nr_cpus; i++ )
> -        boot_cpu(i);
> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )

Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and
hence can't judge (along the lines of my remark on the description) whether this
is a correct comparison.

Jan



* Re: [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build
  2017-12-06  7:50 ` [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build Chao Gao
@ 2018-04-18  8:48   ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2018-04-18  8:48 UTC (permalink / raw)
  To: Chao Gao; +Cc: Tianyu Lan, Ian Jackson, Wei Liu, xen-devel

>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -30,6 +30,11 @@
>  
>  #define align16(sz)        (((sz) + 15) & ~15)
>  #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d))
> +#define min(X, Y) ({                             \
> +            const typeof (X) _x = (X);           \
> +            const typeof (Y) _y = (Y);           \
> +            (void) (&_x == &_y);                 \
> +            (_x < _y) ? _x : _y; })

No new name space violations please: Identifiers starting with a single underscore
and a lower case letter are reserved for file scope symbols. Use a trailing
underscore instead, for example.

> @@ -79,16 +84,19 @@ static struct acpi_20_madt *construct_madt(struct acpi_ctxt *ctxt,
>      struct acpi_20_madt_intsrcovr *intsrcovr;
>      struct acpi_20_madt_ioapic    *io_apic;
>      struct acpi_20_madt_lapic     *lapic;
> +    struct acpi_20_madt_x2apic    *x2apic;
>      const struct hvm_info_table   *hvminfo = config->hvminfo;
> -    int i, sz;
> +    int i, sz, nr_apic;

Apart from your own addition I think both i and sz also want to be unsigned.
Please make such corrections when you touch a line anyway.

>      if ( config->lapic_id == NULL )
>          return NULL;
>  
> +    nr_apic = min(hvminfo->nr_vcpus, MADT_MAX_LOCAL_APIC);

With this the variable name is not well chosen. nr_xapic perhaps?

Jan




* Re: [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2017-12-06  7:50 ` [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory Chao Gao
  2018-02-27 14:17   ` George Dunlap
@ 2018-04-18  8:53   ` Jan Beulich
  2018-04-18 11:39     ` Chao Gao
  1 sibling, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2018-04-18  8:53 UTC (permalink / raw)
  To: Chao Gao; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, xen-devel

>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
> run out if guest has more than 256 vcpus and guest creation fails. Bump the
> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
> for shadow case.
> 
> This patch won't lead to more memory consumption for the size of shadow memory
> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
> of guest memory and the number of vcpus.

I don't understand this: What's the purpose of bumping the values if it won't lead
to higher memory consumption? Afaict there'd be higher consumption at least
transiently. And I don't see why this would need doing independent of the intended
vCPU count in the guest. I guess you want to base your series on top of Andrew's
max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).

Jan




* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-04-18  8:38   ` Jan Beulich
@ 2018-04-18 11:20     ` Chao Gao
  2018-04-18 11:50       ` Jan Beulich
  0 siblings, 1 reply; 56+ messages in thread
From: Chao Gao @ 2018-04-18 11:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

On Wed, Apr 18, 2018 at 02:38:48AM -0600, Jan Beulich wrote:
>>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
>> Intel SDM Extended XAPIC (X2APIC) -> "Initialization by System Software"
>> has the following description:
>> 
>> "The ACPI interfaces for the x2APIC are described in Section 5.2, “ACPI System
>> Description Tables,” of the Advanced Configuration and Power Interface
>> Specification, Revision 4.0a (http://www.acpi.info/spec.htm). The default
>> behavior for BIOS is to pass the control to the operating system with the
>> local x2APICs in xAPIC mode if all APIC IDs reported by CPUID.0BH:EDX are less
>> than 255, and in x2APIC mode if there are any logical processor reporting an
>> APIC ID of 255 or greater."
>
>With this you mean ...
>
>> In this patch, hvmloader enables x2apic mode for all vcpus if there are cpus
>> with APIC ID > 255. To wake up processors whose APIC ID is greater than 255,
>
>">= 255" and "greater than 254" here respectively. Please be precise.

Will do.

>
>> the SIPI is broadcasted to all APs. It is the way how Seabios wakes up APs.
>> APs may compete for the stack, thus a lock is introduced to protect the 
>> stack.
>
>
>
>> --- a/tools/firmware/hvmloader/smp.c
>> +++ b/tools/firmware/hvmloader/smp.c
>> @@ -26,7 +26,9 @@
>>  #define AP_BOOT_EIP 0x1000
>>  extern char ap_boot_start[], ap_boot_end[];
>>  
>> -static int ap_callin, ap_cpuid;
>> +static int ap_callin;
>> +static int enable_x2apic;
>> +static bool lock = 1;
>>  
>>  asm (
>>      "    .text                       \n"
>> @@ -47,7 +49,15 @@ asm (
>>      "    mov   %eax,%ds              \n"
>>      "    mov   %eax,%es              \n"
>>      "    mov   %eax,%ss              \n"
>> -    "    movl  $stack_top,%esp       \n"
>> +    "3:  movb  $1, %bl               \n"
>> +    "    mov   $lock,%edx            \n"
>> +    "    movzbl %bl,%eax             \n"
>> +    "    xchg  %al, (%edx)           \n"
>> +    "    test  %al,%al               \n"
>> +    "    je    2f                    \n"
>> +    "    pause                       \n"
>> +    "    jmp   3b                    \n"
>> +    "2:  movl  $stack_top,%esp       \n"
>
>Please be consistent with suffixes: Either add them only when really needed
>(preferred) or add them uniformly everywhere (when permitted of course).
>I also don't understand why you need to use %bl here at all.

will remove %bl stuff.

>
>> @@ -68,14 +78,34 @@ asm (
>>      "    .text                       \n"
>>      );
>>  
>> +unsigned int ap_cpuid(void)
>
>static
>
>> +{
>> +    if ( !(rdmsr(MSR_IA32_APICBASE) & MSR_IA32_APICBASE_EXTD) )
>> +    {
>> +        uint32_t eax, ebx, ecx, edx;
>> +
>> +        cpuid(1, &eax, &ebx, &ecx, &edx);
>> +        return ebx >> 24;
>> +    }
>> +    else
>
>pointless "else"
>
>> +        return rdmsr(MSR_IA32_APICBASE_MSR + (APIC_ID >> 4));
>> +}
>> +
>>  void ap_start(void); /* non-static avoids unused-function compiler warning */
>>  /*static*/ void ap_start(void)
>>  {
>> -    printf(" - CPU%d ... ", ap_cpuid);
>> +    printf(" - CPU%d ... ", ap_cpuid());
>>      cacheattr_init();
>>      printf("done.\n");
>>      wmb();
>> -    ap_callin = 1;
>> +    ap_callin++;
>> +
>> +    if ( enable_x2apic )
>> +        wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
>> +                                 MSR_IA32_APICBASE_EXTD);
>> +
>> +    /* Release the lock */
>> +    asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
>>  }
>
>How can you release the lock here - you've not switched off the stack, and
>you're about to actually use it (for returning from the function)?

    "2:  movl  $stack_top,%esp       \n"
    "    movl  %esp,%ebp             \n"
    "    call  ap_start              \n"
    "1:  hlt                         \n"
    "    jmp  1b                     \n"

Yes. I think it would be right to release the lock following the
"call ap_start" here.
>
>> @@ -125,9 +169,15 @@ void smp_initialise(void)
>>      memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>>  
>>      printf("Multiprocessor initialisation:\n");
>> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>> +        enable_x2apic = 1;
>> +
>>      ap_start();
>> -    for ( i = 1; i < nr_cpus; i++ )
>> -        boot_cpu(i);
>> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>
>Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and
>hence can't judge (along the lines of my remark on the description) whether this
>is a correct comparison.

It is defined in patch 5/8. I know referencing it here is a mistake.

With regard to this point, I only boot vCPUs via a broadcast
INIT-SIPI sequence when the number of vCPUs is greater than a
given value; otherwise the previous method is used.

In general, before all APs are up, the BSP cannot know each AP's APIC ID
(currently we do know it, because the APIC ID can be inferred from vcpu_id)
or whether there is a vCPU with APIC ID >= 255. I plan to discard the
old method and always use broadcast to boot the APs. Then we don't need
MADT_MAX_LOCAL_APIC here; instead, a global variable can be set if
any vCPU finds its APIC ID is greater than 254, and all CPUs can
switch to x2apic mode if need be. Do you think that is reasonable?


* Re: [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2018-04-18  8:53   ` Jan Beulich
@ 2018-04-18 11:39     ` Chao Gao
  2018-04-18 11:50       ` Andrew Cooper
  2018-04-18 11:59       ` Jan Beulich
  0 siblings, 2 replies; 56+ messages in thread
From: Chao Gao @ 2018-04-18 11:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, xen-devel

On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
>>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
>> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
>> for shadow case.
>> 
>> This patch won't lead to more memory consumption for the size of shadow memory
>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
>> of guest memory and the number of vcpus.
>
>I don't understand this: What's the purpose of bumping the values if it won't lead
>to higher memory consumption? Afaict there'd be higher consumption at least
>transiently. And I don't see why this would need doing independent of the intended
>vCPU count in the guest. I guess you want to base your series on top of Andrew's
>max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).

The situation here is that some pages are pre-allocated as P2M pages
for domain initialization. After vCPU creation, the total number of
P2M pages is adjusted via the domctl interface. Before vCPU creation,
this domctl is unusable because of the check in paging_domctl():
 if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )

When a guest's vCPU count is small, the pre-allocated pages are
enough, but not once it exceeds 256. Each vCPU uses at least one P2M
page when it is created; see
construct_vmcs()->hap_update_paging_modes().

Thanks
Chao


* Re: [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast
  2018-04-18 11:20     ` Chao Gao
@ 2018-04-18 11:50       ` Jan Beulich
  0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2018-04-18 11:50 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, Wei Liu, Ian Jackson, xen-devel

>>> On 18.04.18 at 13:20, <chao.gao@intel.com> wrote:
> On Wed, Apr 18, 2018 at 02:38:48AM -0600, Jan Beulich wrote:
>>>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
>>>  /*static*/ void ap_start(void)
>>>  {
>>> -    printf(" - CPU%d ... ", ap_cpuid);
>>> +    printf(" - CPU%d ... ", ap_cpuid());
>>>      cacheattr_init();
>>>      printf("done.\n");
>>>      wmb();
>>> -    ap_callin = 1;
>>> +    ap_callin++;
>>> +
>>> +    if ( enable_x2apic )
>>> +        wrmsr(MSR_IA32_APICBASE, rdmsr(MSR_IA32_APICBASE) |
>>> +                                 MSR_IA32_APICBASE_EXTD);
>>> +
>>> +    /* Release the lock */
>>> +    asm volatile ( "xchgb %1, %b0" : : "m" (lock), "r" (0) : "memory" );
>>>  }
>>
>>How can you release the lock here - you've not switched off the stack, and
>>you're about to actually use it (for returning from the function)?
> 
>     "2:  movl  $stack_top,%esp       \n"
>     "    movl  %esp,%ebp             \n"
>     "    call  ap_start              \n"
>     "1:  hlt                         \n"
>     "    jmp  1b                     \n"
> 
> Yes. I think it would be right to release the lock following the
> "call ap_start" here.

Strictly speaking even that's too early, as an NMI might arrive at the HLT
(before actually executing it). Otoh that's going to be deadly anyway, so
perhaps with a suitable comment that might be acceptable.

>>> @@ -125,9 +169,15 @@ void smp_initialise(void)
>>>      memcpy((void *)AP_BOOT_EIP, ap_boot_start, ap_boot_end - ap_boot_start);
>>>  
>>>      printf("Multiprocessor initialisation:\n");
>>> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>>> +        enable_x2apic = 1;
>>> +
>>>      ap_start();
>>> -    for ( i = 1; i < nr_cpus; i++ )
>>> -        boot_cpu(i);
>>> +    if ( nr_cpus > MADT_MAX_LOCAL_APIC )
>>
>>Where does MADT_MAX_LOCAL_APIC come from? I can't find it anywhere, and
>>hence can't judge (along the lines of my remark on the description) whether this
>>is a correct comparison.
> 
> It is defined in patch 5/8. I know referencing it here is a mistake.
> 
> With regard to this point, I only boot vCPUs via a broadcast
> INIT-SIPI sequence when the number of vCPUs is greater than a
> given value; otherwise the previous method is used.
> 
> In general, before all APs are up, the BSP cannot know each AP's APIC ID
> (currently we do know it, because the APIC ID can be inferred from vcpu_id)
> or whether there is a vCPU with APIC ID >= 255. I plan to discard the
> old method and always use broadcast to boot the APs. Then we don't need
> MADT_MAX_LOCAL_APIC here; instead, a global variable can be set if
> any vCPU finds its APIC ID is greater than 254, and all CPUs can
> switch to x2apic mode if need be. Do you think that is reasonable?

At the first glance - why not.

Jan




* Re: [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2018-04-18 11:39     ` Chao Gao
@ 2018-04-18 11:50       ` Andrew Cooper
  2018-04-18 11:59       ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Andrew Cooper @ 2018-04-18 11:50 UTC (permalink / raw)
  To: Chao Gao, Jan Beulich; +Cc: George Dunlap, Tim Deegan, xen-devel

On 18/04/18 12:39, Chao Gao wrote:
> On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
>>>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
>>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
>>> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
>>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
>>> for shadow case.
>>>
>>> This patch won't lead to more memory consumption because the size of shadow memory
>>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
>>> of guest memory and the number of vcpus.
>> I don't understand this: What's the purpose of bumping the values if it won't lead
>> to higher memory consumption? Afaict there'd be higher consumption at least
>> transiently. And I don't see why this would need doing independent of the intended
>> vCPU count in the guest. I guess you want to base your series on top of Andrew's
>> max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).
> The situation here is that some pages are pre-allocated as P2M pages for domain
> initialization. After vCPU creation, the total number of P2M pages is
> adjusted via the domctl interface. Before vCPU creation, this domctl
> is unusable because of the check in paging_domctl():
>  if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )
>
> When the number of a guest's vCPUs is small, the pre-allocated pages are
> enough. But they won't be if the number of vCPUs is bigger than 256. Each
> vCPU will use at least one P2M page when it is created, see
> construct_vmcs()->hap_update_paging_modes().

I've also found with XTF tests that the current minimum shadow
calculations are insufficient for single-vcpu domains with 128M of ram.

I haven't had time to fix the algorithm, and XTF is using "shadow_memory=4"
as a bodge workaround.

~Andrew



* Re: [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory
  2018-04-18 11:39     ` Chao Gao
  2018-04-18 11:50       ` Andrew Cooper
@ 2018-04-18 11:59       ` Jan Beulich
  1 sibling, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2018-04-18 11:59 UTC (permalink / raw)
  To: Andrew Cooper, Chao Gao; +Cc: George Dunlap, Tim Deegan, xen-devel

>>> On 18.04.18 at 13:39, <chao.gao@intel.com> wrote:
> On Wed, Apr 18, 2018 at 02:53:03AM -0600, Jan Beulich wrote:
>>>>> On 06.12.17 at 08:50, <chao.gao@intel.com> wrote:
>>> Each vcpu of hvm guest consumes at least one shadow page. Currently, only 256
>>> (for hap case) pages are pre-allocated as shadow memory at beginning. It would
>>> run out if guest has more than 256 vcpus and guest creation fails. Bump the
>>> number of shadow pages to 2 * HVM_MAX_VCPUS for hap case and 8 * HVM_MAX_VCPUS
>>> for shadow case.
>>> 
>>> This patch won't lead to more memory consumption because the size of shadow memory
>>> will be adjusted via XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION according to the size
>>> of guest memory and the number of vcpus.
>>
>>I don't understand this: What's the purpose of bumping the values if it won't lead
>>to higher memory consumption? Afaict there'd be higher consumption at least
>>transiently. And I don't see why this would need doing independent of the intended
>>vCPU count in the guest. I guess you want to base your series on top of Andrew's
>>max-vCPU-s adjustments (which sadly didn't become ready in time for 4.11).
> 
> The situation here is that some pages are pre-allocated as P2M pages for domain
> initialization. After vCPU creation, the total number of P2M pages is
> adjusted via the domctl interface. Before vCPU creation, this domctl
> is unusable because of the check in paging_domctl():
>  if ( unlikely(d->vcpu == NULL) || unlikely(d->vcpu[0] == NULL) )

Hence my reference to Andrew's series.

> When the number of a guest's vCPUs is small, the pre-allocated pages are
> enough. But they won't be if the number of vCPUs is bigger than 256. Each
> vCPU will use at least one P2M page when it is created, see
> construct_vmcs()->hap_update_paging_modes().

Hmm, that might be a problem perhaps to be addressed in Andrew's series
then, as the implication would be that the amount of shadow/p2m memory
also needs to be set right by XEN_DOMCTL_createdomain (which iirc the
series doesn't do so far).

Jan





end of thread, other threads:[~2018-04-18 11:59 UTC | newest]

Thread overview: 56+ messages
2017-12-06  7:50 [RFC Patch v4 0/8] Extend resources to support more vcpus in single VM Chao Gao
2017-12-06  7:50 ` [RFC Patch v4 1/8] ioreq: remove most 'buf' parameter from static functions Chao Gao
2017-12-06 14:44   ` Paul Durrant
2017-12-06  8:37     ` Chao Gao
2017-12-06  7:50 ` [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages Chao Gao
2017-12-06 15:04   ` Paul Durrant
2017-12-06  9:02     ` Chao Gao
2017-12-06 16:10       ` Paul Durrant
2017-12-07  8:41         ` Paul Durrant
2017-12-07  6:56           ` Chao Gao
2017-12-08 11:06             ` Paul Durrant
2017-12-12  1:03               ` Chao Gao
2017-12-12  9:07                 ` Paul Durrant
2017-12-12 23:39                   ` Chao Gao
2017-12-13 10:49                     ` Paul Durrant
2017-12-13 17:50                       ` Paul Durrant
2017-12-14 14:50                         ` Paul Durrant
2017-12-15  0:35                           ` Chao Gao
2017-12-15  9:40                             ` Paul Durrant
2018-04-18  8:19   ` Jan Beulich
2017-12-06  7:50 ` [RFC Patch v4 3/8] xl/acpi: unify the computation of lapic_id Chao Gao
2018-02-22 18:05   ` Wei Liu
2017-12-06  7:50 ` [RFC Patch v4 4/8] hvmloader: boot cpu through broadcast Chao Gao
2018-02-22 18:44   ` Wei Liu
2018-02-23  8:41     ` Jan Beulich
2018-02-23 16:42   ` Roger Pau Monné
2018-02-24  5:49     ` Chao Gao
2018-02-26  8:28       ` Jan Beulich
2018-02-26 12:33         ` Chao Gao
2018-02-26 14:19           ` Roger Pau Monné
2018-04-18  8:38   ` Jan Beulich
2018-04-18 11:20     ` Chao Gao
2018-04-18 11:50       ` Jan Beulich
2017-12-06  7:50 ` [RFC Patch v4 5/8] Tool/ACPI: DSDT extension to support more vcpus Chao Gao
2017-12-06  7:50 ` [RFC Patch v4 6/8] hvmload: Add x2apic entry support in the MADT and SRAT build Chao Gao
2018-04-18  8:48   ` Jan Beulich
2017-12-06  7:50 ` [RFC Patch v4 7/8] x86/hvm: bump the number of pages of shadow memory Chao Gao
2018-02-27 14:17   ` George Dunlap
2018-04-18  8:53   ` Jan Beulich
2018-04-18 11:39     ` Chao Gao
2018-04-18 11:50       ` Andrew Cooper
2018-04-18 11:59       ` Jan Beulich
2017-12-06  7:50 ` [RFC Patch v4 8/8] x86/hvm: bump the maximum number of vcpus to 512 Chao Gao
2018-02-22 18:46   ` Wei Liu
2018-02-23  8:50     ` Jan Beulich
2018-02-23 17:18       ` Wei Liu
2018-02-23 18:11   ` Roger Pau Monné
2018-02-24  6:26     ` Chao Gao
2018-02-26  8:26     ` Jan Beulich
2018-02-26 13:11       ` Chao Gao
2018-02-26 16:10         ` Jan Beulich
2018-03-01  5:21           ` Chao Gao
2018-03-01  7:17             ` Juergen Gross
2018-03-01  7:37             ` Jan Beulich
2018-03-01  7:11               ` Chao Gao
2018-02-27 14:59         ` George Dunlap
