All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch
@ 2016-01-21 16:51 Roger Pau Monne
  2016-01-21 16:51 ` [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0 Roger Pau Monne
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel

Hello,

The series is sorted in the following way:
 - Patch 1 is a preparatory patch for Dom0 HVMlite support.
 - Patch 4 is a fix from a fallout introduced by the HVMlite series, which
   inadvertently disabled the emulated PIT that was added to PV(H) guests.
 - Patches 2, 3 and 5 expand the fields of the libxl_domain_build_info
   structure in order to store what emulated devices inside of Xen are
   enabled.
 - Patch 6 is new in this iteration and aims to provide a simple way to 
   report to the guest which emulated devices are enabled. I belive this is 
   the last part of the guest ABI that I wanted to do (if we leave 
   pci-passthrough out of the picture). IMHO, the DomU ABI is now complete.

The full series can also be found in my git tree:

http://xenbits.xen.org/gitweb/?p=people/royger/xen.git;a=shortlog;h=refs/heads/hvmlite_fixes_v4

I hope this will help review of patch 1/6, which is a significant rework,
and will probably be better reviewed by looking at the resulting code.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-21 17:29   ` Ian Jackson
  2016-01-21 16:51 ` [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN Roger Pau Monne
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Ian Jackson, Tim Deegan, Jan Beulich,
	Roger Pau Monne

Current implementation of elf_load_bsdsyms is broken when loading inside of
a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
memory space, which it is not.

Take the oportunity to do some cleanup and properly document how
elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
when dealing with data that needs to be copied to the guest memory space.
Also reduce the number of section headers copied to the minimum necessary.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/common/libelf/libelf-loader.c | 213 ++++++++++++++++++++++++++++----------
 1 file changed, 158 insertions(+), 55 deletions(-)

diff --git a/xen/common/libelf/libelf-loader.c b/xen/common/libelf/libelf-loader.c
index 6f42bea..9552d4c 100644
--- a/xen/common/libelf/libelf-loader.c
+++ b/xen/common/libelf/libelf-loader.c
@@ -153,7 +153,7 @@ void elf_parse_bsdsyms(struct elf_binary *elf, uint64_t pstart)
 {
     uint64_t sz;
     ELF_HANDLE_DECL(elf_shdr) shdr;
-    unsigned i, type;
+    unsigned int i;
 
     if ( !ELF_HANDLE_VALID(elf->sym_tab) )
         return;
@@ -164,20 +164,33 @@ void elf_parse_bsdsyms(struct elf_binary *elf, uint64_t pstart)
     sz = sizeof(uint32_t);
 
     /* Space for the elf and elf section headers */
-    sz += (elf_uval(elf, elf->ehdr, e_ehsize) +
-           elf_shdr_count(elf) * elf_uval(elf, elf->ehdr, e_shentsize));
+    sz += elf_uval(elf, elf->ehdr, e_ehsize) +
+          3 * elf_uval(elf, elf->ehdr, e_shentsize);
     sz = elf_round_up(elf, sz);
 
-    /* Space for the symbol and string tables. */
+    /* Space for the symbol and string table. */
     for ( i = 0; i < elf_shdr_count(elf); i++ )
     {
         shdr = elf_shdr_by_index(elf, i);
         if ( !elf_access_ok(elf, ELF_HANDLE_PTRVAL(shdr), 1) )
             /* input has an insane section header count field */
             break;
-        type = elf_uval(elf, shdr, sh_type);
-        if ( (type == SHT_STRTAB) || (type == SHT_SYMTAB) )
-            sz = elf_round_up(elf, sz + elf_uval(elf, shdr, sh_size));
+
+        if ( elf_uval(elf, shdr, sh_type) != SHT_SYMTAB )
+            continue;
+
+        sz = elf_round_up(elf, sz + elf_uval(elf, shdr, sh_size));
+        shdr = elf_shdr_by_index(elf, elf_uval(elf, shdr, sh_link));
+
+        if ( !elf_access_ok(elf, ELF_HANDLE_PTRVAL(shdr), 1) )
+            /* input has an insane section header count field */
+            break;
+
+        if ( elf_uval(elf, shdr, sh_type) != SHT_STRTAB )
+            /* Invalid symtab -> strtab link */
+            break;
+
+        sz = elf_round_up(elf, sz + elf_uval(elf, shdr, sh_size));
     }
 
     elf->bsd_symtab_pstart = pstart;
@@ -186,13 +199,31 @@ void elf_parse_bsdsyms(struct elf_binary *elf, uint64_t pstart)
 
 static void elf_load_bsdsyms(struct elf_binary *elf)
 {
-    ELF_HANDLE_DECL(elf_ehdr) sym_ehdr;
-    unsigned long sz;
-    elf_ptrval maxva;
-    elf_ptrval symbase;
-    elf_ptrval symtab_addr;
-    ELF_HANDLE_DECL(elf_shdr) shdr;
-    unsigned i, type;
+    /*
+     * Header that is placed at the end of the kernel and allows
+     * the OS to find where the symtab and strtab have been loaded.
+     * It mimics a valid ELF file header, although it only contains
+     * a symtab and a strtab section.
+     *
+     * NB: according to the ELF spec there's only ONE symtab per ELF
+     * file, and accordingly we will only load the corresponding
+     * strtab, so we only need three section headers in our fake ELF
+     * header (first section header is always a dummy).
+     */
+    struct __packed {
+        elf_ehdr header;
+        elf_shdr section[3];
+    } symbol_header;
+
+    ELF_HANDLE_DECL(elf_ehdr) header_handle;
+    unsigned long shdr_size;
+    uint32_t symsize;
+    ELF_HANDLE_DECL(elf_shdr) section_handle;
+    ELF_HANDLE_DECL(elf_shdr) image_handle;
+    unsigned int i, link;
+    elf_ptrval header_base;
+    elf_ptrval symtab_base;
+    elf_ptrval strtab_base;
 
     if ( !elf->bsd_symtab_pstart )
         return;
@@ -200,64 +231,136 @@ static void elf_load_bsdsyms(struct elf_binary *elf)
 #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
 do {                                            \
     if ( elf_64bit(_elf) )                      \
-        elf_store_field(_elf, _hdr, e64._elm, _val);  \
+        (_hdr).e64._elm = _val;                 \
     else                                        \
-        elf_store_field(_elf, _hdr, e32._elm, _val);  \
+        (_hdr).e32._elm = _val;                 \
 } while ( 0 )
 
-    symbase = elf_get_ptr(elf, elf->bsd_symtab_pstart);
-    symtab_addr = maxva = symbase + sizeof(uint32_t);
+#define SYMTAB_INDEX    1
+#define STRTAB_INDEX    2
 
-    /* Set up Elf header. */
-    sym_ehdr = ELF_MAKE_HANDLE(elf_ehdr, symtab_addr);
-    sz = elf_uval(elf, elf->ehdr, e_ehsize);
-    elf_memcpy_safe(elf, ELF_HANDLE_PTRVAL(sym_ehdr), ELF_HANDLE_PTRVAL(elf->ehdr), sz);
-    maxva += sz; /* no round up */
+    /* Allow elf_memcpy_safe to write to symbol_header. */
+    elf->caller_xdest_base = &symbol_header;
+    elf->caller_xdest_size = sizeof(symbol_header);
 
-    elf_hdr_elm(elf, sym_ehdr, e_phoff, 0);
-    elf_hdr_elm(elf, sym_ehdr, e_shoff, elf_uval(elf, elf->ehdr, e_ehsize));
-    elf_hdr_elm(elf, sym_ehdr, e_phentsize, 0);
-    elf_hdr_elm(elf, sym_ehdr, e_phnum, 0);
-
-    /* Copy Elf section headers. */
-    shdr = ELF_MAKE_HANDLE(elf_shdr, maxva);
-    sz = elf_shdr_count(elf) * elf_uval(elf, elf->ehdr, e_shentsize);
-    elf_memcpy_safe(elf, ELF_HANDLE_PTRVAL(shdr),
-                    ELF_IMAGE_BASE(elf) + elf_uval(elf, elf->ehdr, e_shoff),
-                    sz);
-    maxva = elf_round_up(elf, (unsigned long)maxva + sz);
+    /*
+     * Calculate the position of the various elements in GUEST MEMORY SPACE.
+     * This addresses MUST only be used with elf_load_image.
+     *
+     * NB: strtab_base cannot be calculated at this point because we don't
+     * know the size of the symtab yet, and the strtab will be placed after it.
+     */
+    header_base = elf_get_ptr(elf, elf->bsd_symtab_pstart) + sizeof(uint32_t);
+    symtab_base = elf_round_up(elf, header_base + sizeof(symbol_header));
+
+    /* Fill the ELF header, copied from the original ELF header. */
+    header_handle = ELF_MAKE_HANDLE(elf_ehdr,
+                                    ELF_REALPTR2PTRVAL(&symbol_header.header));
+    elf_memcpy_safe(elf, ELF_HANDLE_PTRVAL(header_handle),
+                    ELF_HANDLE_PTRVAL(elf->ehdr),
+                    elf_uval(elf, elf->ehdr, e_ehsize));
+
+    /* Set the offset to the shdr array. */
+    elf_hdr_elm(elf, symbol_header.header, e_shoff,
+                offsetof(typeof(symbol_header), section));
+
+    /* Set the right number of section headers. */
+    elf_hdr_elm(elf, symbol_header.header, e_shnum, 3);
+
+    /* Clear a couple of fields we don't use. */
+    elf_hdr_elm(elf, symbol_header.header, e_phoff, 0);
+    elf_hdr_elm(elf, symbol_header.header, e_phentsize, 0);
+    elf_hdr_elm(elf, symbol_header.header, e_phnum, 0);
+
+    /* Zero the dummy section. */
+    section_handle = ELF_MAKE_HANDLE(elf_shdr,
+                     ELF_REALPTR2PTRVAL(&symbol_header.section[SHN_UNDEF]));
+    shdr_size = elf_uval(elf, elf->ehdr, e_shentsize);
+    elf_memset_safe(elf, ELF_HANDLE_PTRVAL(section_handle), 0, shdr_size);
 
+    /*
+     * Find the actual symtab and strtab in the ELF.
+     *
+     * The symtab section header is going to reside in section[SYMTAB_INDEX],
+     * while the corresponding strtab is going to be placed in
+     * section[STRTAB_INDEX]. sh_offset is mangled so it points to the offset
+     * where the sections are actually loaded (relative to the ELF header
+     * location).
+     */
+    section_handle = ELF_MAKE_HANDLE(elf_shdr,
+                     ELF_REALPTR2PTRVAL(&symbol_header.section[SYMTAB_INDEX]));
     for ( i = 0; i < elf_shdr_count(elf); i++ )
     {
-        elf_ptrval old_shdr_p;
-        elf_ptrval new_shdr_p;
 
-        type = elf_uval(elf, shdr, sh_type);
-        if ( (type == SHT_STRTAB) || (type == SHT_SYMTAB) )
+        image_handle = elf_shdr_by_index(elf, i);
+        if ( elf_uval(elf, image_handle, sh_type) != SHT_SYMTAB )
+            continue;
+
+        elf_memcpy_safe(elf, ELF_HANDLE_PTRVAL(section_handle),
+                        ELF_HANDLE_PTRVAL(image_handle),
+                        shdr_size);
+
+        link = elf_uval(elf, section_handle, sh_link);
+        if ( link == SHN_UNDEF )
         {
-             elf_msg(elf, "%s: shdr %i at 0x%"ELF_PRPTRVAL" -> 0x%"ELF_PRPTRVAL"\n", __func__, i,
-                     elf_section_start(elf, shdr), maxva);
-             sz = elf_uval(elf, shdr, sh_size);
-             elf_memcpy_safe(elf, maxva, elf_section_start(elf, shdr), sz);
-             /* Mangled to be based on ELF header location. */
-             elf_hdr_elm(elf, shdr, sh_offset, maxva - symtab_addr);
-             maxva = elf_round_up(elf, (unsigned long)maxva + sz);
+            elf_mark_broken(elf, "bad link in symtab");
+            break;
         }
-        old_shdr_p = ELF_HANDLE_PTRVAL(shdr);
-        new_shdr_p = old_shdr_p + elf_uval(elf, elf->ehdr, e_shentsize);
-        if ( new_shdr_p <= old_shdr_p ) /* wrapped or stuck */
+
+        /* Load symtab into guest memory. */
+        elf_load_image(elf, symtab_base, elf_section_start(elf, section_handle),
+                       elf_uval(elf, section_handle, sh_size),
+                       elf_uval(elf, section_handle, sh_size));
+        elf_hdr_elm(elf, symbol_header.section[SYMTAB_INDEX], sh_offset,
+                    symtab_base - header_base);
+        elf_hdr_elm(elf, symbol_header.section[SYMTAB_INDEX], sh_link,
+                    STRTAB_INDEX);
+
+        /* Calculate the guest address where strtab is loaded. */
+        strtab_base = elf_round_up(elf, symtab_base +
+                                   elf_uval(elf, section_handle, sh_size));
+
+        /* Load strtab section header. */
+        section_handle = ELF_MAKE_HANDLE(elf_shdr,
+                    ELF_REALPTR2PTRVAL(&symbol_header.section[STRTAB_INDEX]));
+        elf_memcpy_safe(elf, ELF_HANDLE_PTRVAL(section_handle),
+                        ELF_HANDLE_PTRVAL(elf_shdr_by_index(elf, link)),
+                        shdr_size);
+
+        if ( elf_uval(elf, section_handle, sh_type) != SHT_STRTAB )
         {
-            elf_mark_broken(elf, "bad section header length");
+            elf_mark_broken(elf, "strtab not found");
             break;
         }
-        if ( !elf_access_ok(elf, new_shdr_p, 1) ) /* outside image */
-            break;
-        shdr = ELF_MAKE_HANDLE(elf_shdr, new_shdr_p);
+
+        /* Load strtab into guest memory. */
+        elf_load_image(elf, strtab_base, elf_section_start(elf, section_handle),
+                       elf_uval(elf, section_handle, sh_size),
+                       elf_uval(elf, section_handle, sh_size));
+        elf_hdr_elm(elf, symbol_header.section[STRTAB_INDEX], sh_offset,
+                    strtab_base - header_base);
+
+        /* Store the whole size (including headers and loaded sections). */
+        symsize = strtab_base + elf_uval(elf, section_handle, sh_size) -
+                  header_base;
+        break;
     }
 
-    /* Write down the actual sym size. */
-    elf_store_val(elf, uint32_t, symbase, maxva - symtab_addr);
+    /* Load the total size at symtab_pstart. */
+    elf_load_image(elf, elf_get_ptr(elf, elf->bsd_symtab_pstart),
+                   ELF_REALPTR2PTRVAL(&symsize), sizeof(symsize),
+                   sizeof(symsize));
+
+    /* Load the headers. */
+    elf_load_image(elf, header_base, ELF_REALPTR2PTRVAL(&symbol_header),
+                   sizeof(symbol_header), sizeof(symbol_header));
+
+    /* Remove permissions from elf_memcpy_safe. */
+    elf->caller_xdest_base = NULL;
+    elf->caller_xdest_size = 0;
 
+#undef SYMTAB_INDEX
+#undef STRTAB_INDEX
 #undef elf_ehdr_elm
 }
 
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
  2016-01-21 16:51 ` [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0 Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-22 10:59   ` Ian Campbell
  2016-01-21 16:51 ` [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config Roger Pau Monne
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Roger Pau Monne

And use it as the default value for the VGA kind. This allows libxl to set
it to the default value later on when the domain type is known. For HVM
guests the default value is LIBXL_VGA_INTERFACE_TYPE_CIRRUS while for
HVMlite the default value is LIBXL_VGA_INTERFACE_TYPE_NONE.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - s/UNDEF/UNKNOWN/.
 - Add a LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN.
---
 tools/libxl/libxl.h         | 10 ++++++++++
 tools/libxl/libxl_create.c  |  8 ++++++--
 tools/libxl/libxl_dm.c      |  6 ++++++
 tools/libxl/libxl_types.idl |  3 ++-
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 156c0d5..ab76d85 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -876,6 +876,16 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
  */
 #define LIBXL_HAVE_DEVICE_MODEL_VERSION_NONE 1
 
+/*
+ * LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN
+ *
+ * In the case that LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN is set the
+ * libxl_vga_interface_type enumeration type contains a
+ * LIBXL_VGA_INTERFACE_TYPE_UNKNOWN identifier. This is used to signal
+ * that a libxl_vga_interface_type type has not been initialized yet.
+ */
+#define LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e491d83..b669359 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -206,8 +206,12 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         if (b_info->u.hvm.mmio_hole_memkb == LIBXL_MEMKB_DEFAULT)
             b_info->u.hvm.mmio_hole_memkb = 0;
 
-        if (!b_info->u.hvm.vga.kind)
-            b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        if (b_info->u.hvm.vga.kind == LIBXL_VGA_INTERFACE_TYPE_UNKNOWN) {
+            if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
+            else
+                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
+        }
 
         if (!b_info->u.hvm.hdtype)
             b_info->u.hvm.hdtype = LIBXL_HDTYPE_IDE;
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index a088d71..9aa0cc8 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -531,6 +531,9 @@ static int libxl__build_device_model_args_old(libxl__gc *gc,
             break;
         case LIBXL_VGA_INTERFACE_TYPE_QXL:
             break;
+        default:
+            LOG(ERROR, "Invalid emulated video card specified");
+            return ERROR_INVAL;
         }
 
         if (b_info->u.hvm.boot) {
@@ -970,6 +973,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
                 GCSPRINTF("qxl-vga,vram_size_mb=%"PRIu64",ram_size_mb=%"PRIu64,
                 (b_info->video_memkb/2/1024), (b_info->video_memkb/2/1024) ) );
             break;
+        default:
+            LOG(ERROR, "Invalid emulated video card specified");
+            return ERROR_INVAL;
         }
 
         if (b_info->u.hvm.boot) {
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..e9e0da0 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -204,11 +204,12 @@ libxl_shutdown_reason = Enumeration("shutdown_reason", [
     ], init_val = "LIBXL_SHUTDOWN_REASON_UNKNOWN")
 
 libxl_vga_interface_type = Enumeration("vga_interface_type", [
+    (0, "UNKNOWN"),
     (1, "CIRRUS"),
     (2, "STD"),
     (3, "NONE"),
     (4, "QXL"),
-    ], init_val = "LIBXL_VGA_INTERFACE_TYPE_CIRRUS")
+    ], init_val = "LIBXL_VGA_INTERFACE_TYPE_UNKNOWN")
 
 libxl_vendor_device = Enumeration("vendor_device", [
     (0, "NONE"),
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
  2016-01-21 16:51 ` [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0 Roger Pau Monne
  2016-01-21 16:51 ` [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-22 11:00   ` Ian Campbell
  2016-01-21 16:51 ` [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT Roger Pau Monne
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Roger Pau Monne

libxl__arch_domain_prepare_config has access to the libxl_domain_build_info
struct, so make sure it's properly initialised. Note that prepare_config is
called from within libxl__domain_make.

This is not a bug at the moment, because prepare_config doesn't access
libxl_domain_build_info yet, but this is likely going to change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Reword commit message to make it clear that prepare_config is called from
   domain_make, and that the fact that build_info is not initialised is not
   a bug ATM because prepare_config doesn't make use of it.
---
 tools/libxl/libxl_create.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index b669359..c7700a7 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -892,6 +892,12 @@ static void initiate_domain_create(libxl__egc *egc,
         goto error_out;
     }
 
+    ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info);
+    if (ret) {
+        LOG(ERROR, "Unable to set domain build info defaults");
+        goto error_out;
+    }
+
     ret = libxl__domain_make(gc, d_config, &domid, &state->config);
     if (ret) {
         LOG(ERROR, "cannot make domain: %d", ret);
@@ -903,12 +909,6 @@ static void initiate_domain_create(libxl__egc *egc,
     dcs->guest_domid = domid;
     dcs->dmss.dm.guest_domid = 0; /* means we haven't spawned */
 
-    ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info);
-    if (ret) {
-        LOG(ERROR, "Unable to set domain build info defaults");
-        goto error_out;
-    }
-
     if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
         (libxl_defbool_val(d_config->b_info.u.hvm.nested_hvm) &&
          libxl_defbool_val(d_config->b_info.u.hvm.altp2m))) {
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
                   ` (2 preceding siblings ...)
  2016-01-21 16:51 ` [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-22 10:48   ` Jan Beulich
  2016-01-21 16:51 ` [PATCH v4 5/6] libxl: add options to enable/disable emulated devices Roger Pau Monne
  2016-01-21 16:51 ` [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID Roger Pau Monne
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

This fixes the fallout from the HVMlite series, that removed the emulated
PIT from PV(H) guests. Also, this patch forces the hardware domain to
always have an emulated PIT, regardless of whether the toolstack specified
one or not.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v3:
 - Allow PV guests to specify whether a PIT is needed or not.
 - Fix pv_pit_handler so it can deal with the fact that the PIT might not be
   present.

Changes since v2:
 - Force the emulated PIT to always be enabled for the hardware domain.
 - Change indentation of the valid set of emulation bitmaps check.
---
 xen/arch/x86/domain.c    | 5 ++++-
 xen/arch/x86/hvm/i8254.c | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e70c125..b005fe0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -542,8 +542,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
                    d->domain_id, config->emulation_flags);
             return -EINVAL;
         }
+        if ( is_hardware_domain(d) )
+            config->emulation_flags |= XEN_X86_EMU_PIT;
         if ( config->emulation_flags != 0 &&
-             (!is_hvm_domain(d) || config->emulation_flags != XEN_X86_EMU_ALL) )
+             (is_hvm_domain(d) ? (config->emulation_flags != XEN_X86_EMU_ALL) :
+                                 (config->emulation_flags != XEN_X86_EMU_PIT)) )
         {
             printk(XENLOG_G_ERR "d%d: Xen does not allow %s domain creation "
                    "with the current selection of emulators: %#x\n",
diff --git a/xen/arch/x86/hvm/i8254.c b/xen/arch/x86/hvm/i8254.c
index b517cd6..577b43c 100644
--- a/xen/arch/x86/hvm/i8254.c
+++ b/xen/arch/x86/hvm/i8254.c
@@ -568,6 +568,9 @@ int pv_pit_handler(int port, int data, int write)
         .data = data
     };
 
+    if ( !has_vpit(current->domain) )
+        return ~0;
+
     if ( is_hardware_domain(current->domain) && hwdom_pit_access(&ioreq) )
     {
         /* nothing to do */;
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 5/6] libxl: add options to enable/disable emulated devices
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
                   ` (3 preceding siblings ...)
  2016-01-21 16:51 ` [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-22 17:04   ` Roger Pau Monné
  2016-01-21 16:51 ` [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID Roger Pau Monne
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell, Roger Pau Monne

Allow enabling or disabling emulated devices from the libxl domain
configuration file. For HVM guests with a device model all the emulated
devices are enabled. For HVM guests without a device model no devices are
enabled by default, although they can be enabled using the options provided.
The arbiter of whether a combination is posible or not is always Xen, libxl
doesn't do any kind of check.

This set of options is also propagated inside of the libxl migration record
as part of the contents of the libxl_domain_build_info struct, so that when
the other end (restore) creates the domain the same set of devices are
enabled. This is important for future compatibility, in case we decide to
enable some emulated devices by default for HVMlite guests, old HVMlite
guests migrated to newer versions should continue to see the same set of
emulated devices.

It has been discussed that it would be better to avoid having this
information inside of the libxl stream, and to instead rely on which devices
get their context loaded inside of Xen on resume. This of course requires
more work and it also has certain issues, like the fact that some devices
don't restore a context at all (like VGA). The consensus is that the
solution presented in this patch is not going to prevent further
developments in this direction, and can always be used as a check to make
sure the Xen context and the libxl context are in sync.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Changes since v3:
 - Fix spelling mistake in xl.cfg man page.
 - Error out if a classic HVM guest tries to use some of the HVMlite
   options.
 - Modify libxl__update_domain_configuration so the fields relevant to the
   set of enabled devices are stored.
 - Expand commit message to clarify the approach taken.
---
 docs/man/xl.cfg.pod.5        | 43 +++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h          | 11 +++++++++++
 tools/libxl/libxl_create.c   | 26 +++++++++++++++++++++++++-
 tools/libxl/libxl_internal.c | 25 +++++++++++++++++++++++++
 tools/libxl/libxl_types.idl  |  6 ++++++
 tools/libxl/libxl_x86.c      | 33 ++++++++++++++++++++++++++++-----
 tools/libxl/xl_cmdimpl.c     |  7 +++++++
 7 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 8899f75..dc47888 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1762,6 +1762,49 @@ See F<docs/misc/pci-device-reservations.txt> for more information.
 
 =back
 
+=head3 HVM without a device model options
+
+These options can be used to change the set of emulated devices provided
+to guests without a device model. Note that Xen might not support all
+possible combinations. By default HVM guests without a device model
+don't have any of them enabled.
+
+It is important to notice that these options (except the hpet one) are
+not available to HVM guests with a device model, and trying to set them
+will cause xl to exit with an error.
+
+=over 4
+
+=item B<lapic=BOOLEAN>
+
+Enables or disables the Local APIC.
+
+=item B<ioapic=BOOLEAN>
+
+Enables or disables the IO APIC.
+
+=item B<rtc=BOOLEAN>
+
+Enables or disables the RTC.
+
+=item B<power_management=BOOLEAN>
+
+Enables or disables the ACPI power management timer and control interfaces.
+
+=item B<pic=BOOLEAN>
+
+Enables or disables the PIC.
+
+=item B<pit=BOOLEAN>
+
+Enables or disables the PIT.
+
+=item B<hpet=BOOLEAN>
+
+Enables or disables the HPET.
+
+=back
+
 =head2 Device-Model Options
 
 The following options control the selection of the device-model.  This
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index ab76d85..228f192 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -886,6 +886,17 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, libxl_mac *src);
  */
 #define LIBXL_HAVE_VGA_INTERFACE_TYPE_UNKNOWN 1
 
+/*
+ * LIBXL_HAVE_EMULATED_DEVS_OPTIONS
+ *
+ * In the case that LIBXL_HAVE_EMULATED_DEVS_OPTIONS is set libxl
+ * allows enabling or disabling emulated devices for HVM guests
+ * without a device model. The following fields are added to the
+ * hvm structure inside of libxl_domain_build_info: lapic, ioapic,
+ * rtc, power_management, pic, pit.
+ */
+#define LIBXL_HAVE_EMULATED_DEVS_OPTIONS 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c7700a7..7d0f570 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -295,13 +295,37 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.acpi_s4,            true);
         libxl_defbool_setdefault(&b_info->u.hvm.nx,                 true);
         libxl_defbool_setdefault(&b_info->u.hvm.viridian,           false);
-        libxl_defbool_setdefault(&b_info->u.hvm.hpet,               true);
         libxl_defbool_setdefault(&b_info->u.hvm.vpt_align,          true);
         libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm,         false);
         libxl_defbool_setdefault(&b_info->u.hvm.altp2m,             false);
         libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
         libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
 
+        if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)
+        {
+            libxl_defbool_setdefault(&b_info->u.hvm.hpet,               false);
+            libxl_defbool_setdefault(&b_info->u.hvm.lapic,              false);
+            libxl_defbool_setdefault(&b_info->u.hvm.ioapic,             false);
+            libxl_defbool_setdefault(&b_info->u.hvm.rtc,                false);
+            libxl_defbool_setdefault(&b_info->u.hvm.power_management,   false);
+            libxl_defbool_setdefault(&b_info->u.hvm.pic,                false);
+            libxl_defbool_setdefault(&b_info->u.hvm.pit,                false);
+        } else {
+            libxl_defbool_setdefault(&b_info->u.hvm.hpet,               true);
+
+            if (!libxl_defbool_is_default(b_info->u.hvm.lapic) ||
+                !libxl_defbool_is_default(b_info->u.hvm.ioapic) ||
+                !libxl_defbool_is_default(b_info->u.hvm.rtc) ||
+                !libxl_defbool_is_default(b_info->u.hvm.power_management) ||
+                !libxl_defbool_is_default(b_info->u.hvm.pic) ||
+                !libxl_defbool_is_default(b_info->u.hvm.pit)) {
+                LOG(ERROR, "the lapic, ioapic, rtc, power_management, pic and "
+                           "pit options are only available to HVM guests "
+                           "without a device model");
+                return ERROR_INVAL;
+            }
+        }
+
         libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
         if (!libxl_defbool_val(b_info->u.hvm.spice.enable) &&
             (b_info->u.hvm.spice.usbredirection > 0) ){
diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
index 328046b..791eae5 100644
--- a/tools/libxl/libxl_internal.c
+++ b/tools/libxl/libxl_internal.c
@@ -551,6 +551,31 @@ void libxl__update_domain_configuration(libxl__gc *gc,
 
     /* video ram */
     dst->b_info.video_memkb = src->b_info.video_memkb;
+
+    if (src->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
+#define check_and_save_hvm_field(src, dst, field)                           \
+    if (!libxl_defbool_is_default((src)->u.hvm.field))                      \
+        libxl_defbool_setdefault(&(dst)->u.hvm.field,                       \
+                                 libxl_defbool_val((src)->u.hvm.field))
+
+        /*
+         * Save the status of emulated devices.
+         *
+         * For HVMlite guests all those fields are forced to a specific
+         * value, so they are always saved. For HVM guests only the HPET
+         * one is used and thus set.
+         */
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, hpet);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, lapic);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, ioapic);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, rtc);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, power_management);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, pic);
+        check_and_save_hvm_field(&src->b_info, &dst->b_info, pit);
+#undef check_and_save_hvm_field
+
+        dst->b_info.u.hvm.vga.kind = src->b_info.u.hvm.vga.kind;
+    }
 }
 
 char *libxl__device_model_xs_path(libxl__gc *gc, uint32_t dm_domid,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index e9e0da0..fdec23c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -519,6 +519,12 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("serial_list",      libxl_string_list),
                                        ("rdm", libxl_rdm_reserve),
                                        ("rdm_mem_boundary_memkb", MemKB),
+                                       ("lapic",            libxl_defbool),
+                                       ("ioapic",           libxl_defbool),
+                                       ("rtc",              libxl_defbool),
+                                       ("power_management", libxl_defbool),
+                                       ("pic",              libxl_defbool),
+                                       ("pit",              libxl_defbool),
                                        ])),
                  ("pv", Struct(None, [("kernel", string),
                                       ("slack_memkb", MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 46cfafb..92f25fd 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -7,15 +7,38 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       xc_domain_configuration_t *xc_config)
 {
+    struct libxl_domain_build_info *info = &d_config->b_info;
 
-    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_HVM &&
-        d_config->b_info.device_model_version !=
-        LIBXL_DEVICE_MODEL_VERSION_NONE) {
-        /* HVM domains with a device model. */
+    if (d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {
+        /* PV guests. */
+        xc_config->emulation_flags = XEN_X86_EMU_PIT;
+    } else if (info->device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE) {
+        /* HVM guests with a device model. */
         xc_config->emulation_flags = XEN_X86_EMU_ALL;
     } else {
-        /* PV or HVM domains without a device model. */
+        /* HVM guests without a device model. */
         xc_config->emulation_flags = 0;
+
+        if (libxl_defbool_val(info->u.hvm.lapic))
+            xc_config->emulation_flags |= XEN_X86_EMU_LAPIC;
+        if (libxl_defbool_val(info->u.hvm.ioapic))
+            xc_config->emulation_flags |= XEN_X86_EMU_IOAPIC;
+        if (libxl_defbool_val(info->u.hvm.rtc))
+            xc_config->emulation_flags |= XEN_X86_EMU_RTC;
+        if (libxl_defbool_val(info->u.hvm.power_management))
+            xc_config->emulation_flags |= XEN_X86_EMU_PM;
+        if (libxl_defbool_val(info->u.hvm.pic))
+            xc_config->emulation_flags |= XEN_X86_EMU_PIC;
+        if (libxl_defbool_val(info->u.hvm.pit))
+            xc_config->emulation_flags |= XEN_X86_EMU_PIT;
+        if (libxl_defbool_val(info->u.hvm.hpet))
+            xc_config->emulation_flags |= XEN_X86_EMU_HPET;
+
+        if (info->u.hvm.vga.kind != LIBXL_VGA_INTERFACE_TYPE_NONE)
+            xc_config->emulation_flags |= XEN_X86_EMU_VGA;
+
+        if (d_config->num_pcidevs != 0)
+            xc_config->emulation_flags |= XEN_X86_EMU_IOMMU;
     }
 
     return 0;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..3654097 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1514,6 +1514,13 @@ static void parse_config_data(const char *config_source,
         xlu_cfg_get_defbool(config, "acpi_s4", &b_info->u.hvm.acpi_s4, 0);
         xlu_cfg_get_defbool(config, "nx", &b_info->u.hvm.nx, 0);
         xlu_cfg_get_defbool(config, "hpet", &b_info->u.hvm.hpet, 0);
+        xlu_cfg_get_defbool(config, "lapic", &b_info->u.hvm.lapic, 0);
+        xlu_cfg_get_defbool(config, "ioapic", &b_info->u.hvm.ioapic, 0);
+        xlu_cfg_get_defbool(config, "rtc", &b_info->u.hvm.rtc, 0);
+        xlu_cfg_get_defbool(config, "power_management",
+                            &b_info->u.hvm.power_management, 0);
+        xlu_cfg_get_defbool(config, "pic", &b_info->u.hvm.pic, 0);
+        xlu_cfg_get_defbool(config, "pit", &b_info->u.hvm.pit, 0);
         xlu_cfg_get_defbool(config, "vpt_align", &b_info->u.hvm.vpt_align, 0);
 
         switch (xlu_cfg_get_list(config, "viridian",
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
                   ` (4 preceding siblings ...)
  2016-01-21 16:51 ` [PATCH v4 5/6] libxl: add options to enable/disable emulated devices Roger Pau Monne
@ 2016-01-21 16:51 ` Roger Pau Monne
  2016-01-22 10:57   ` Jan Beulich
  5 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monne @ 2016-01-21 16:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Add a new HVM-specific feature flag that signals the presence of a bitmap
that contains the current set of enabled emulated devices. The bitmap is
placed in the ecx register. The bit fields used in the bitmap are the same
as the ones used in the xen_arch_domainconfig emulation_flags field, and
their meaning can be found at arch-x86/xen.h.

This will allow Xen to enable emulated devices for HVMlite guests in the
future, by having a proper ABI for reporting which devices are enabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/hvm.c              | 4 ++++
 xen/include/public/arch-x86/cpuid.h | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 674feea..f0145f6 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4536,6 +4536,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
         /* Indicate presence of vcpu id and set it in ebx */
         *eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT;
         *ebx = current->vcpu_id;
+
+        /* Indicate the presence of the devices bitmap in ecx. */
+        *eax |= XEN_HVM_CPUID_DEVICES_BITMAP;
+        *ecx = current->domain->arch.emulation_flags;
     }
 }
 
diff --git a/xen/include/public/arch-x86/cpuid.h b/xen/include/public/arch-x86/cpuid.h
index d709340..7222483 100644
--- a/xen/include/public/arch-x86/cpuid.h
+++ b/xen/include/public/arch-x86/cpuid.h
@@ -78,12 +78,17 @@
  * HVM-specific features
  * EAX: Features
  * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
+ * ECX: bitmap of enabled devices, according to the bit fields defined in
+ *      arch-x86/xen.h. All unused bits have undefined values. The contents
+ *      of this register is only valid if EAX has the
+ *      XEN_HVM_CPUID_DEVICES_BITMAP flag set.
  */
 #define XEN_HVM_CPUID_APIC_ACCESS_VIRT (1u << 0) /* Virtualized APIC registers */
 #define XEN_HVM_CPUID_X2APIC_VIRT      (1u << 1) /* Virtualized x2APIC accesses */
 /* Memory mapped from other domains has valid IOMMU entries */
 #define XEN_HVM_CPUID_IOMMU_MAPPINGS   (1u << 2)
 #define XEN_HVM_CPUID_VCPU_ID_PRESENT  (1u << 3) /* vcpu id is present in EBX */
+#define XEN_HVM_CPUID_DEVICES_BITMAP   (1u << 4) /* device bitmap in ECX */
 
 #define XEN_CPUID_MAX_NUM_LEAVES 4
 
-- 
1.9.5 (Apple Git-50.3)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-21 16:51 ` [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0 Roger Pau Monne
@ 2016-01-21 17:29   ` Ian Jackson
  2016-01-21 17:55     ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2016-01-21 17:29 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: xen-devel, Keir Fraser, Ian Campbell, Jan Beulich, Tim Deegan

Roger Pau Monne writes ("[PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0"):
> Current implementation of elf_load_bsdsyms is broken when loading inside of
> a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
> memory space, which it is not.
> 
> Take the oportunity to do some cleanup and properly document how
> elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
> when dealing with data that needs to be copied to the guest memory space.
> Also reduce the number of section headers copied to the minimum necessary.
...
>  #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
>  do {                                            \
>      if ( elf_64bit(_elf) )                      \
> -        elf_store_field(_elf, _hdr, e64._elm, _val);  \
> +        (_hdr).e64._elm = _val;                 \

This seems to bypass the safe store mechanism which was introduced to
fix XSA-55.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-21 17:29   ` Ian Jackson
@ 2016-01-21 17:55     ` Roger Pau Monné
  2016-01-21 18:44       ` Ian Jackson
  2016-01-22  8:11       ` Jan Beulich
  0 siblings, 2 replies; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-21 17:55 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Keir Fraser, Ian Campbell, Jan Beulich, Tim Deegan

El 21/01/16 a les 18.29, Ian Jackson ha escrit:
> Roger Pau Monne writes ("[PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0"):
>> Current implementation of elf_load_bsdsyms is broken when loading inside of
>> a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
>> memory space, which it is not.
>>
>> Take the oportunity to do some cleanup and properly document how
>> elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
>> when dealing with data that needs to be copied to the guest memory space.
>> Also reduce the number of section headers copied to the minimum necessary.
> ...
>>  #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
>>  do {                                            \
>>      if ( elf_64bit(_elf) )                      \
>> -        elf_store_field(_elf, _hdr, e64._elm, _val);  \
>> +        (_hdr).e64._elm = _val;                 \
> 
> This seems to bypass the safe store mechanism which was introduced to
> fix XSA-55.

This macro is only used to store fields inside of a structure that's
allocated on the stack, and it doesn't involve any kind of pointer
magic/arithmetic. The way it was used previously in this function indeed
required the use of the _safe mechanism in order to prevent writing into
arbitrary memory places, because we were actually modifying guest memory
IIRC.

I could restore the previous behaviour, but that would mean adding some
handlers in order to access the structure. Since this is only used for
Dom0, which already makes use of the elf_memcpy_unchecked function as
called by elf_store_val which is in turn called from elf_store_field, so
it's not like we were protected previously anyway.

Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-21 17:55     ` Roger Pau Monné
@ 2016-01-21 18:44       ` Ian Jackson
  2016-01-22  8:11       ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2016-01-21 18:44 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Keir Fraser, Ian Campbell, Jan Beulich, Tim Deegan

Roger Pau Monné writes ("Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0"):
> El 21/01/16 a les 18.29, Ian Jackson ha escrit:
> > Roger Pau Monne writes ("[PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0"):
> >>  #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
> >>  do {                                            \
> >>      if ( elf_64bit(_elf) )                      \
> >> -        elf_store_field(_elf, _hdr, e64._elm, _val);  \
> >> +        (_hdr).e64._elm = _val;                 \
> > 
> > This seems to bypass the safe store mechanism which was introduced to
> > fix XSA-55.
> 
> This macro is only used to store fields inside of a structure that's
> allocated on the stack, and it doesn't involve any kind of pointer
> magic/arithmetic. The way it was used previously in this function indeed
> required the use of the _safe mechanism in order to prevent writing into
> arbitrary memory places, because we were actually modifying guest memory
> IIRC.

Aha.  Hmm.

The problem is that someone might use this macro to access a structure
that was _not_ stored on the stack.

...

Oh, I see, you have changed the type of _hdr from an ELF_MAKE_HANDLE
to an actual literal structure.  So actually this change is correct.
I think this ought to be (have been) mentioned in the commit message.

> I could restore the previous behaviour, but that would mean adding some
> handlers in order to access the structure. Since this is only used for
> Dom0, which already makes use of the elf_memcpy_unchecked function as
> called by elf_store_val which is in turn called from elf_store_field, so
> it's not like we were protected previously anyway.

elf_store_val calls elf_access_ok before calling elf_memcpy_unchecked.

It wouldn't need `handlers', it would need the struct being registered
as the xdest.  But there is no need if it never needs to be accessed
through any kind of guest-influenceable pointers.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-21 17:55     ` Roger Pau Monné
  2016-01-21 18:44       ` Ian Jackson
@ 2016-01-22  8:11       ` Jan Beulich
  2016-01-22  9:58         ` Roger Pau Monné
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22  8:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Keir Fraser, Ian Jackson, Ian Campbell, Tim Deegan

>>> On 21.01.16 at 18:55, <roger.pau@citrix.com> wrote:
> El 21/01/16 a les 18.29, Ian Jackson ha escrit:
>> Roger Pau Monne writes ("[PATCH v4 1/6] libelf: rewrite symtab/strtab 
> loading for Dom0"):
>>> Current implementation of elf_load_bsdsyms is broken when loading inside of
>>> a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
>>> memory space, which it is not.
>>>
>>> Take the oportunity to do some cleanup and properly document how
>>> elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
>>> when dealing with data that needs to be copied to the guest memory space.
>>> Also reduce the number of section headers copied to the minimum necessary.
>> ...
>>>  #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
>>>  do {                                            \
>>>      if ( elf_64bit(_elf) )                      \
>>> -        elf_store_field(_elf, _hdr, e64._elm, _val);  \
>>> +        (_hdr).e64._elm = _val;                 \
>> 
>> This seems to bypass the safe store mechanism which was introduced to
>> fix XSA-55.
> 
> This macro is only used to store fields inside of a structure that's
> allocated on the stack, and it doesn't involve any kind of pointer
> magic/arithmetic. The way it was used previously in this function indeed
> required the use of the _safe mechanism in order to prevent writing into
> arbitrary memory places, because we were actually modifying guest memory
> IIRC.
> 
> I could restore the previous behaviour, but that would mean adding some
> handlers in order to access the structure. Since this is only used for
> Dom0, which already makes use of the elf_memcpy_unchecked function as
> called by elf_store_val which is in turn called from elf_store_field, so
> it's not like we were protected previously anyway.

If this is indeed Dom0-only code, could (and perhaps should) it be
guarded suitably by an #ifdef to make obvious it's not used for
DomU loading, and hence not security sensitive? From looking at
the call sites of elf_{parse,load}_bsdsyms() I can't, btw., tell that
this is Dom0-only ...

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0
  2016-01-22  8:11       ` Jan Beulich
@ 2016-01-22  9:58         ` Roger Pau Monné
  0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22  9:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Ian Jackson, Ian Campbell, Tim Deegan

El 22/01/16 a les 9.11, Jan Beulich ha escrit:
>>>> On 21.01.16 at 18:55, <roger.pau@citrix.com> wrote:
>> El 21/01/16 a les 18.29, Ian Jackson ha escrit:
>>> Roger Pau Monne writes ("[PATCH v4 1/6] libelf: rewrite symtab/strtab 
>> loading for Dom0"):
>>>> Current implementation of elf_load_bsdsyms is broken when loading inside of
>>>> a HVM guest, because it assumes elf_memcpy_safe is able to write into guest
>>>> memory space, which it is not.
>>>>
>>>> Take the oportunity to do some cleanup and properly document how
>>>> elf_{parse/load}_bsdsyms works. The new implementation uses elf_load_image
>>>> when dealing with data that needs to be copied to the guest memory space.
>>>> Also reduce the number of section headers copied to the minimum necessary.
>>> ...
>>>>  #define elf_hdr_elm(_elf, _hdr, _elm, _val)     \
>>>>  do {                                            \
>>>>      if ( elf_64bit(_elf) )                      \
>>>> -        elf_store_field(_elf, _hdr, e64._elm, _val);  \
>>>> +        (_hdr).e64._elm = _val;                 \
>>>
>>> This seems to bypass the safe store mechanism which was introduced to
>>> fix XSA-55.
>>
>> This macro is only used to store fields inside of a structure that's
>> allocated on the stack, and it doesn't involve any kind of pointer
>> magic/arithmetic. The way it was used previously in this function indeed
>> required the use of the _safe mechanism in order to prevent writing into
>> arbitrary memory places, because we were actually modifying guest memory
>> IIRC.
>>
>> I could restore the previous behaviour, but that would mean adding some
>> handlers in order to access the structure. Since this is only used for
>> Dom0, which already makes use of the elf_memcpy_unchecked function as
>> called by elf_store_val which is in turn called from elf_store_field, so
>> it's not like we were protected previously anyway.
> 
> If this is indeed Dom0-only code, could (and perhaps should) it be
> guarded suitably by an #ifdef to make obvious it's not used for
> DomU loading, and hence not security sensitive? From looking at
> the call sites of elf_{parse,load}_bsdsyms() I can't, btw., tell that
> this is Dom0-only ...

You are completely right, TBH I'm not sure what's going on with this.
xc_dom_elfloader.c has it's own functions to load the strtab/symtab, but
it's also calling elf_load_binary which, AFAICT, will call
elf_load_bsdsyms. Am I missing something, or are we loading the
symtab/strtab twice from libxc?

Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT
  2016-01-21 16:51 ` [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT Roger Pau Monne
@ 2016-01-22 10:48   ` Jan Beulich
  2016-01-22 11:03     ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22 10:48 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
> This fixes the fallout from the HVMlite series, that removed the emulated
> PIT from PV(H) guests. Also, this patch forces the hardware domain to
> always have an emulated PIT, regardless of whether the toolstack specified
> one or not.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit ...

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -542,8 +542,11 @@ int arch_domain_create(struct domain *d, unsigned int 
> domcr_flags,
>                     d->domain_id, config->emulation_flags);
>              return -EINVAL;
>          }
> +        if ( is_hardware_domain(d) )
> +            config->emulation_flags |= XEN_X86_EMU_PIT;
>          if ( config->emulation_flags != 0 &&
> -             (!is_hvm_domain(d) || config->emulation_flags != XEN_X86_EMU_ALL) )
> +             (is_hvm_domain(d) ? (config->emulation_flags != XEN_X86_EMU_ALL) :
> +                                 (config->emulation_flags != XEN_X86_EMU_PIT)) )
>          {

... you having chosen the != route instead of the suggested &
one, I guess I'll take the liberty to further simplify this while
committing (as the ?: is now only needed on the right side of the
!= and I'm generally of the opinion that redundancy like this is
hampering readability).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-21 16:51 ` [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID Roger Pau Monne
@ 2016-01-22 10:57   ` Jan Beulich
  2016-01-22 12:43     ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22 10:57 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
> Add a new HVM-specific feature flag that signals the presence of a bitmap
> that contains the current set of enabled emulated devices. The bitmap is
> placed in the ecx register. The bit fields used in the bitmap are the same
> as the ones used in the xen_arch_domainconfig emulation_flags field, and
> their meaning can be found at arch-x86/xen.h.
> 
> This will allow Xen to enable emulated devices for HVMlite guests in the
> future, by having a proper ABI for reporting which devices are enabled.

The idea is certainly nice and appreciated, but ...

> --- a/xen/include/public/arch-x86/cpuid.h
> +++ b/xen/include/public/arch-x86/cpuid.h
> @@ -78,12 +78,17 @@
>   * HVM-specific features
>   * EAX: Features
>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
> + * ECX: bitmap of enabled devices, according to the bit fields defined in
> + *      arch-x86/xen.h.

... this set of definitions is not currently a stable ABI (limited to
hypervisor and tool stack), and if we wanted to make it stable
we'd first need to think a little about the complications that may
arise if the granularity chosen (think about the PM bit and the
discussion around it before your changes went in) turns out to
be a problem later on.

Also at least some of the features can be determined by other
means (CPUID, ACPI tables), so I'm not even sure we need all
of this, and I'd really prefer to avoid multiple distinct ways to
learn of a certain feature, as it's too easy for the two (or more)
mechanisms to get out of sync.

> All unused bits have undefined values.

Nor is this an option, but maybe this is just a wording issue:
Perhaps you mean to say that they're reserved for future use?
Since truly unused bits have are guaranteed to have the value
zero, just that the set of bits varies.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN
  2016-01-21 16:51 ` [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN Roger Pau Monne
@ 2016-01-22 10:59   ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2016-01-22 10:59 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Wei Liu, Ian Jackson

On Thu, 2016-01-21 at 17:51 +0100, Roger Pau Monne wrote:
> And use it as the default value for the VGA kind. This allows libxl to
> set
> it to the default value later on when the domain type is known. For HVM
> guests the default value is LIBXL_VGA_INTERFACE_TYPE_CIRRUS while for
> HVMlite the default value is LIBXL_VGA_INTERFACE_TYPE_NONE.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

There a behavioural change from an xl users PoV, WRT the dmlite/qemu=none,
which warrants a documentation change I think. That might need to be part
of a bigger overhaul of the xl manpages to incorporate discussion of
pvh/dmlite though? and in any case can come in a followup.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config
  2016-01-21 16:51 ` [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config Roger Pau Monne
@ 2016-01-22 11:00   ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2016-01-22 11:00 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Wei Liu, Ian Jackson

On Thu, 2016-01-21 at 17:51 +0100, Roger Pau Monne wrote:
> libxl__arch_domain_prepare_config has access to the
> libxl_domain_build_info
> struct, so make sure it's properly initialised. Note that prepare_config
> is
> called from within libxl__domain_make.
> 
> This is not a bug at the moment, because prepare_config doesn't access
> libxl_domain_build_info yet, but this is likely going to change.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ia.campbell@citri.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT
  2016-01-22 10:48   ` Jan Beulich
@ 2016-01-22 11:03     ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2016-01-22 11:03 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

On Fri, 2016-01-22 at 03:48 -0700, Jan Beulich wrote:
> > > > On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
> > This fixes the fallout from the HVMlite series, that removed the
> > emulated
> > PIT from PV(H) guests. Also, this patch forces the hardware domain to
> > always have an emulated PIT, regardless of whether the toolstack
> > specified
> > one or not.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit ...
> 
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -542,8 +542,11 @@ int arch_domain_create(struct domain *d, unsigned
> > int 
> > domcr_flags,
> >                     d->domain_id, config->emulation_flags);
> >              return -EINVAL;
> >          }
> > +        if ( is_hardware_domain(d) )
> > +            config->emulation_flags |= XEN_X86_EMU_PIT;
> >          if ( config->emulation_flags != 0 &&
> > -             (!is_hvm_domain(d) || config->emulation_flags !=
> > XEN_X86_EMU_ALL) )
> > +             (is_hvm_domain(d) ? (config->emulation_flags !=
> > XEN_X86_EMU_ALL) :
> > +                                 (config->emulation_flags !=
> > XEN_X86_EMU_PIT)) )
> >          {
> 
> ... you having chosen the != route instead of the suggested &
> one, I guess I'll take the liberty to further simplify this while
> committing (as the ?: is now only needed on the right side of the
> != and I'm generally of the opinion that redundancy like this is
> hampering readability).

Readability would be even greater IMHO with the use of a
"required_emulation_flags" variable suitably initialised and then checked,
compared with the use of bitops etc.

> 
> Jan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 10:57   ` Jan Beulich
@ 2016-01-22 12:43     ` Roger Pau Monné
  2016-01-22 13:24       ` Jan Beulich
  2016-01-22 13:34       ` Andrew Cooper
  0 siblings, 2 replies; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 12:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 22/01/16 a les 11.57, Jan Beulich ha escrit:
>>>> On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
>> Add a new HVM-specific feature flag that signals the presence of a bitmap
>> that contains the current set of enabled emulated devices. The bitmap is
>> placed in the ecx register. The bit fields used in the bitmap are the same
>> as the ones used in the xen_arch_domainconfig emulation_flags field, and
>> their meaning can be found at arch-x86/xen.h.
>>
>> This will allow Xen to enable emulated devices for HVMlite guests in the
>> future, by having a proper ABI for reporting which devices are enabled.
> 
> The idea is certainly nice and appreciated, but ...
> 
>> --- a/xen/include/public/arch-x86/cpuid.h
>> +++ b/xen/include/public/arch-x86/cpuid.h
>> @@ -78,12 +78,17 @@
>>   * HVM-specific features
>>   * EAX: Features
>>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
>> + * ECX: bitmap of enabled devices, according to the bit fields defined in
>> + *      arch-x86/xen.h.
> 
> ... this set of definitions is not currently a stable ABI (limited to
> hypervisor and tool stack), and if we wanted to make it stable
> we'd first need to think a little about the complications that may
> arise if the granularity chosen (think about the PM bit and the
> discussion around it before your changes went in) turns out to
> be a problem later on.

Yes, in fact I'm having second thoughts on the PM flag, and I think I
should have split it into ACPI_PM and ACPI_TIMER instead.

> Also at least some of the features can be determined by other
> means (CPUID, ACPI tables), so I'm not even sure we need all
> of this, and I'd really prefer to avoid multiple distinct ways to
> learn of a certain feature, as it's too easy for the two (or more)
> mechanisms to get out of sync.

So let's look at the flags and whether there's an existing way to signal
it's presence:

LAPIC: CPUID.01h:EDX[bit 9]
IOAPIC: tied to LAPIC (so either both enabled or none).

HPET: can only be enabled from/with ACPI, since it's base memory address
is not fixed, and we would need to find a way to pass it's address to
the OS in the absence of ACPI.

RTC: I don't know of any way to signal the RTC presence, AFAICT it's
always assumed to be there in the PC architecture. Could maybe return ~0
when reading from IO port 0x71, but that's meh..., not the best way IMHO.

PIC: same as RTC, I don't know of any way to signal it's presence since
it's assumed to be there.

VGA: again I don't think there's an easy way to signal it's presence,
apart from returning ~0 from the multiple IO ports it uses. The fact
that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
map in HVMlite DomUs should also trigger OSes into disabling VGA due to
the lack of proper MMIO range, but sadly I think most OSes just assume
it's there.

PIT: assumed to be always present in the PC architecture.

PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
before. ACPI_TIMER presence it's contained inside of ACPI tables, and
the availability of ACPI_PM (power management) can be inferred from the
presence of ACPI itself.

AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
change the base address ~0ULL to something else, but it doesn't seem to
be reachable from any path. In any case, I guess the presence of this
device will be reported from ACPI.

So, we have the following devices that are assumed to be there: RTC,
PIC, PIT. Everything else I think can be signalled by other means
already available.

IMHO, I think we could say that the PIC is never going to be available
to HVMlite guests (in any case we would enable the lapic/ioapic), and
maybe enable the RTC and PIT by default?

Then I think we could get away without any Xen-specific way of reporting
enabled devices.

>> All unused bits have undefined values.
> 
> Nor is this an option, but maybe this is just a wording issue:
> Perhaps you mean to say that they're reserved for future use?
> Since truly unused bits have are guaranteed to have the value
> zero, just that the set of bits varies.

Yes, that's exactly what I meant to say, thanks.

Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 12:43     ` Roger Pau Monné
@ 2016-01-22 13:24       ` Jan Beulich
  2016-01-22 14:41         ` Roger Pau Monné
  2016-01-22 13:34       ` Andrew Cooper
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22 13:24 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel

>>> On 22.01.16 at 13:43, <roger.pau@citrix.com> wrote:
> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
> always assumed to be there in the PC architecture. Could maybe return ~0
> when reading from IO port 0x71, but that's meh..., not the best way IMHO.

There actually is an RTC-absent flag in the FADT, which the
hypervisor itself actually looks at (ACPI_FADT_NO_CMOS_RTC).

> PIC: same as RTC, I don't know of any way to signal it's presence since
> it's assumed to be there.

I think PIC absence can also be gathered from ACPI
(ACPI_FADT_HW_REDUCED).

> VGA: again I don't think there's an easy way to signal it's presence,
> apart from returning ~0 from the multiple IO ports it uses. The fact
> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
> the lack of proper MMIO range, but sadly I think most OSes just assume
> it's there.

Yes, VGA is kind of more difficult. Looking at all PCI devices'
command words may provide a hint, as may looking at all PCI
bridges' bridge control words.

> PIT: assumed to be always present in the PC architecture.

See PIC above.

> PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
> before. ACPI_TIMER presence it's contained inside of ACPI tables, and
> the availability of ACPI_PM (power management) can be inferred from the
> presence of ACPI itself.

As indicated in the original discussion, I don't think these should
have been separate flags anyway - either you have ACPI (then
you have indication of both in FADT), or there is no ACPI.

> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
> change the base address ~0ULL to something else, but it doesn't seem to
> be reachable from any path. In any case, I guess the presence of this
> device will be reported from ACPI.

Yes, the implementation is incomplete (abandoned?), but IOMMU
presence can always be determined by the guest through
inspecting its ACPI tables.

> So, we have the following devices that are assumed to be there: RTC,
> PIC, PIT. Everything else I think can be signalled by other means
> already available.
> 
> IMHO, I think we could say that the PIC is never going to be available
> to HVMlite guests (in any case we would enable the lapic/ioapic), and
> maybe enable the RTC and PIT by default?

That may be a sane initial setup, but with the ACPI flags named
above we may be able to expressed even their absence.

> Then I think we could get away without any Xen-specific way of reporting
> enabled devices.

Indeed - that should be the preferred goal.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 12:43     ` Roger Pau Monné
  2016-01-22 13:24       ` Jan Beulich
@ 2016-01-22 13:34       ` Andrew Cooper
  2016-01-22 14:59         ` Roger Pau Monné
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2016-01-22 13:34 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich; +Cc: xen-devel

On 22/01/16 12:43, Roger Pau Monné wrote:
> El 22/01/16 a les 11.57, Jan Beulich ha escrit:
>>>>> On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
>>> Add a new HVM-specific feature flag that signals the presence of a bitmap
>>> that contains the current set of enabled emulated devices. The bitmap is
>>> placed in the ecx register. The bit fields used in the bitmap are the same
>>> as the ones used in the xen_arch_domainconfig emulation_flags field, and
>>> their meaning can be found at arch-x86/xen.h.
>>>
>>> This will allow Xen to enable emulated devices for HVMlite guests in the
>>> future, by having a proper ABI for reporting which devices are enabled.
>> The idea is certainly nice and appreciated, but ...
>>
>>> --- a/xen/include/public/arch-x86/cpuid.h
>>> +++ b/xen/include/public/arch-x86/cpuid.h
>>> @@ -78,12 +78,17 @@
>>>   * HVM-specific features
>>>   * EAX: Features
>>>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
>>> + * ECX: bitmap of enabled devices, according to the bit fields defined in
>>> + *      arch-x86/xen.h.
>> ... this set of definitions is not currently a stable ABI (limited to
>> hypervisor and tool stack), and if we wanted to make it stable
>> we'd first need to think a little about the complications that may
>> arise if the granularity chosen (think about the PM bit and the
>> discussion around it before your changes went in) turns out to
>> be a problem later on.
> Yes, in fact I'm having second thoughts on the PM flag, and I think I
> should have split it into ACPI_PM and ACPI_TIMER instead.
>
>> Also at least some of the features can be determined by other
>> means (CPUID, ACPI tables), so I'm not even sure we need all
>> of this, and I'd really prefer to avoid multiple distinct ways to
>> learn of a certain feature, as it's too easy for the two (or more)
>> mechanisms to get out of sync.
> So let's look at the flags and whether there's an existing way to signal
> it's presence:
>
> LAPIC: CPUID.01h:EDX[bit 9]
> IOAPIC: tied to LAPIC (so either both enabled or none).

An IOAPIC is by no means required - they are only for turning legacy
interrupts into MSIs.  It would be perfectly fine for a PVH domain to
have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.

The presence of LAPICs and IOAPICs reside in the MADT ACPI table.

Note also that the cpuid bit is a fastforward of the hardware enable bit
in the APIC_BASE MSR.  The cpuid bit will disappear from view if you
hardware-disable the LAPIC.

>
> HPET: can only be enabled from/with ACPI, since it's base memory address
> is not fixed, and we would need to find a way to pass it's address to
> the OS in the absence of ACPI.

In reality, there are heuristics to guess if an HPET is present.  The
legacy HPET traditionally always resides at pfn fed000.  Linux even has
heuristics to find the legacy HPET based on the IOH, for when the BIOS
doesn't present the HPET properly in ACPI.

This leads to an awkward bug where Linux is able to turn off legacy
timer interrupts behinds Xen's back, and cause carnage for kdump
environment, as Xen didn't know to re-enable legacy interrupts on the
crash path.

>
> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
> always assumed to be there in the PC architecture. Could maybe return ~0
> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
>
> PIC: same as RTC, I don't know of any way to signal it's presence since
> it's assumed to be there.
>
> VGA: again I don't think there's an easy way to signal it's presence,
> apart from returning ~0 from the multiple IO ports it uses. The fact
> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
> the lack of proper MMIO range, but sadly I think most OSes just assume
> it's there.

VGA can be found by following the VGA routing bit in PCI config space. 
This is how real hardware makes the legacy IO ranges reach the graphics
card configured as the primary vga device.

>
> PIT: assumed to be always present in the PC architecture.

PIT, RTC and PIC have their presence always assumed, but returning ~0 on
reads is completely fine.  A DMLite OS knows it is booting in a
virtualised environment.

>
> PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
> before. ACPI_TIMER presence it's contained inside of ACPI tables, and
> the availability of ACPI_PM (power management) can be inferred from the
> presence of ACPI itself.
>
> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
> change the base address ~0ULL to something else, but it doesn't seem to
> be reachable from any path. In any case, I guess the presence of this
> device will be reported from ACPI.

It is indeed currently disabled  (See
https://bugs.xenserver.org/browse/XSO-132 if you want to see why.  It
manifested as a very curious bug).

It will be available via an IVRS ACPI table when implemented.

>
> So, we have the following devices that are assumed to be there: RTC,
> PIC, PIT. Everything else I think can be signalled by other means
> already available.
>
> IMHO, I think we could say that the PIC is never going to be available
> to HVMlite guests (in any case we would enable the lapic/ioapic), and
> maybe enable the RTC and PIT by default?
>
> Then I think we could get away without any Xen-specific way of reporting
> enabled devices.

DMLite is a new container type.  I would far rather it was assumed that
there was no legacy hardware at all.

~Andrew

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 13:24       ` Jan Beulich
@ 2016-01-22 14:41         ` Roger Pau Monné
  2016-01-22 15:02           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 14:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 22/01/16 a les 14.24, Jan Beulich ha escrit:
>>>> On 22.01.16 at 13:43, <roger.pau@citrix.com> wrote:
>> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
>> always assumed to be there in the PC architecture. Could maybe return ~0
>> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
> 
> There actually is an RTC-absent flag in the FADT, which the
> hypervisor itself actually looks at (ACPI_FADT_NO_CMOS_RTC).

So most of this assumes that if we ever want to enable any of those
devices we will provide ACPI tables to the guest?

The RTC can also be used as an interrupt source, which I think it's not
covered by the ACPI_FADT_NO_CMOS_RTC flag.

> 
>> PIC: same as RTC, I don't know of any way to signal it's presence since
>> it's assumed to be there.
> 
> I think PIC absence can also be gathered from ACPI
> (ACPI_FADT_HW_REDUCED).
> 
>> VGA: again I don't think there's an easy way to signal it's presence,
>> apart from returning ~0 from the multiple IO ports it uses. The fact
>> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
>> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
>> the lack of proper MMIO range, but sadly I think most OSes just assume
>> it's there.
> 
> Yes, VGA is kind of more difficult. Looking at all PCI devices'
> command words may provide a hint, as may looking at all PCI
> bridges' bridge control words.

Hm that seems like a rather convoluted procedure, and this needs to be
available very early on during the boot process usually.

>> PIT: assumed to be always present in the PC architecture.
> 
> See PIC above.

At least on FreeBSD PIT is used much earlier than parsing any ACPI
tables (it's used to implement a busy-wait DELAY routine), so I don't
think it's sensible to tie this device to ACPI. Also see my note above
about requiring ACPI in order to signal all of this.

> 
>> PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
>> before. ACPI_TIMER presence it's contained inside of ACPI tables, and
>> the availability of ACPI_PM (power management) can be inferred from the
>> presence of ACPI itself.
> 
> As indicated in the original discussion, I don't think these should
> have been separate flags anyway - either you have ACPI (then
> you have indication of both in FADT), or there is no ACPI.

Right. I think this is something internal that's used between the
toolstack and Xen in order to tell Xen whether it should add those
handlers or not, because AFAIK Xen doesn't know if a certain domain has
ACPI or not.

It should not be exposed in any way to the guest except when using ACPI
tables, that will contain the appropriate value.

>> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
>> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
>> change the base address ~0ULL to something else, but it doesn't seem to
>> be reachable from any path. In any case, I guess the presence of this
>> device will be reported from ACPI.
> 
> Yes, the implementation is incomplete (abandoned?), but IOMMU
> presence can always be determined by the guest through
> inspecting its ACPI tables.
> 
>> So, we have the following devices that are assumed to be there: RTC,
>> PIC, PIT. Everything else I think can be signalled by other means
>> already available.
>>
>> IMHO, I think we could say that the PIC is never going to be available
>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>> maybe enable the RTC and PIT by default?
> 
> That may be a sane initial setup, but with the ACPI flags named
> above we may be able to expressed even their absence.

I still think we should probably enable those, because they tend to be
used very early on boot, before parsing ACPI tables, and in general are
considered to be always there on PCs.

Also, in order to be able to express their absence we would have to
provide ACPI tables to DomUs, which IMHO, is not something we wish to do
right now.

>> Then I think we could get away without any Xen-specific way of reporting
>> enabled devices.
> 
> Indeed - that should be the preferred goal.

Thanks for the feedback, Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 13:34       ` Andrew Cooper
@ 2016-01-22 14:59         ` Roger Pau Monné
  2016-01-22 15:31           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 14:59 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel

El 22/01/16 a les 14.34, Andrew Cooper ha escrit:
> On 22/01/16 12:43, Roger Pau Monné wrote:
>> El 22/01/16 a les 11.57, Jan Beulich ha escrit:
>>>>>> On 21.01.16 at 17:51, <roger.pau@citrix.com> wrote:
>>>> Add a new HVM-specific feature flag that signals the presence of a bitmap
>>>> that contains the current set of enabled emulated devices. The bitmap is
>>>> placed in the ecx register. The bit fields used in the bitmap are the same
>>>> as the ones used in the xen_arch_domainconfig emulation_flags field, and
>>>> their meaning can be found at arch-x86/xen.h.
>>>>
>>>> This will allow Xen to enable emulated devices for HVMlite guests in the
>>>> future, by having a proper ABI for reporting which devices are enabled.
>>> The idea is certainly nice and appreciated, but ...
>>>
>>>> --- a/xen/include/public/arch-x86/cpuid.h
>>>> +++ b/xen/include/public/arch-x86/cpuid.h
>>>> @@ -78,12 +78,17 @@
>>>>   * HVM-specific features
>>>>   * EAX: Features
>>>>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
>>>> + * ECX: bitmap of enabled devices, according to the bit fields defined in
>>>> + *      arch-x86/xen.h.
>>> ... this set of definitions is not currently a stable ABI (limited to
>>> hypervisor and tool stack), and if we wanted to make it stable
>>> we'd first need to think a little about the complications that may
>>> arise if the granularity chosen (think about the PM bit and the
>>> discussion around it before your changes went in) turns out to
>>> be a problem later on.
>> Yes, in fact I'm having second thoughts on the PM flag, and I think I
>> should have split it into ACPI_PM and ACPI_TIMER instead.
>>
>>> Also at least some of the features can be determined by other
>>> means (CPUID, ACPI tables), so I'm not even sure we need all
>>> of this, and I'd really prefer to avoid multiple distinct ways to
>>> learn of a certain feature, as it's too easy for the two (or more)
>>> mechanisms to get out of sync.
>> So let's look at the flags and whether there's an existing way to signal
>> it's presence:
>>
>> LAPIC: CPUID.01h:EDX[bit 9]
>> IOAPIC: tied to LAPIC (so either both enabled or none).
> 
> An IOAPIC is by no means required - they are only for turning legacy
> interrupts into MSIs.  It would be perfectly fine for a PVH domain to
> have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.
> 
> The presence of LAPICs and IOAPICs reside in the MADT ACPI table.

Right, so as said in the reply to Jan, we will require ACPI in order to
enable any of this pieces. I don't have a problem with that, just wasn't
sure if this requirement was desired.

If that's the plan, then I think we would also need to fixup the tables
provided to Dom0 in order to match what's available, but that can be
discussed later.

> Note also that the cpuid bit is a fastforward of the hardware enable bit
> in the APIC_BASE MSR.  The cpuid bit will disappear from view if you
> hardware-disable the LAPIC.

Right, it looks like ACPI is the best way to decide.

>>
>> HPET: can only be enabled from/with ACPI, since it's base memory address
>> is not fixed, and we would need to find a way to pass it's address to
>> the OS in the absence of ACPI.
> 
> In reality, there are heuristics to guess if an HPET is present.  The
> legacy HPET traditionally always resides at pfn fed000.  Linux even has
> heuristics to find the legacy HPET based on the IOH, for when the BIOS
> doesn't present the HPET properly in ACPI.

Heh, if we already require ACPI in order to discover local APICs and IO
APICs, I don't think it hurts to also require it on order to discover HPET.

>> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
>> always assumed to be there in the PC architecture. Could maybe return ~0
>> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
>>
>> PIC: same as RTC, I don't know of any way to signal it's presence since
>> it's assumed to be there.
>>
>> VGA: again I don't think there's an easy way to signal it's presence,
>> apart from returning ~0 from the multiple IO ports it uses. The fact
>> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
>> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
>> the lack of proper MMIO range, but sadly I think most OSes just assume
>> it's there.
> 
> VGA can be found by following the VGA routing bit in PCI config space. 
> This is how real hardware makes the legacy IO ranges reach the graphics
> card configured as the primary vga device.

Hm, I have to look into this, are there any examples of this mechanism
out there?

>>
>> PIT: assumed to be always present in the PC architecture.
> 
> PIT, RTC and PIC have their presence always assumed, but returning ~0 on
> reads is completely fine.  A DMLite OS knows it is booting in a
> virtualised environment.

Yes, that's fine, I'm completely disabling the attachment of those
devices when entering from the Xen entry point ATM on FreeBSD, but how
are we going to notify the OS when they are actually available?

Just by returning something different from ~0 when poking at their IO
ports? Doesn't look like a very robust way IMHO.

> 
>>
>> PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
>> before. ACPI_TIMER presence it's contained inside of ACPI tables, and
>> the availability of ACPI_PM (power management) can be inferred from the
>> presence of ACPI itself.
>>
>> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
>> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
>> change the base address ~0ULL to something else, but it doesn't seem to
>> be reachable from any path. In any case, I guess the presence of this
>> device will be reported from ACPI.
> 
> It is indeed currently disabled  (See
> https://bugs.xenserver.org/browse/XSO-132 if you want to see why.  It
> manifested as a very curious bug).
> 
> It will be available via an IVRS ACPI table when implemented.
> 
>>
>> So, we have the following devices that are assumed to be there: RTC,
>> PIC, PIT. Everything else I think can be signalled by other means
>> already available.
>>
>> IMHO, I think we could say that the PIC is never going to be available
>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>> maybe enable the RTC and PIT by default?
>>
>> Then I think we could get away without any Xen-specific way of reporting
>> enabled devices.
> 
> DMLite is a new container type.  I would far rather it was assumed that
> there was no legacy hardware at all.

So I take that you are in favour of only considering enabling the local
APIC and IO APIC maybe for HVMlite, because of the performance benefits,
while the other devices are _never_ going to be available to HVMlite
guests/hosts at all? (Dom0 already gets the hw VGA)

IMHO, I would like to be able to eventually enable them in order to
provide an environment that's as close as possible to a compatible PC,
in order to reduce the amount of changes required in order to port an OS
to run in this mode.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 14:41         ` Roger Pau Monné
@ 2016-01-22 15:02           ` Jan Beulich
  2016-01-22 15:41             ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22 15:02 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel

>>> On 22.01.16 at 15:41, <roger.pau@citrix.com> wrote:
> El 22/01/16 a les 14.24, Jan Beulich ha escrit:
>>>>> On 22.01.16 at 13:43, <roger.pau@citrix.com> wrote:
>>> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
>>> always assumed to be there in the PC architecture. Could maybe return ~0
>>> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
>> 
>> There actually is an RTC-absent flag in the FADT, which the
>> hypervisor itself actually looks at (ACPI_FADT_NO_CMOS_RTC).
> 
> So most of this assumes that if we ever want to enable any of those
> devices we will provide ACPI tables to the guest?

We could check whether exposing SFI tables to the guest would
be a simpler mechanism allowing enough control.

In the absence of ACPI we need to settle on defaults: As Andrew
has said, contemporary logic would imply no legacy devices for an
environment that can be (made) aware of such.

> The RTC can also be used as an interrupt source, which I think it's not
> covered by the ACPI_FADT_NO_CMOS_RTC flag.

Certainly if there's no RTC device, then there's also no legacy
IRQ8.

>>> VGA: again I don't think there's an easy way to signal it's presence,
>>> apart from returning ~0 from the multiple IO ports it uses. The fact
>>> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
>>> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
>>> the lack of proper MMIO range, but sadly I think most OSes just assume
>>> it's there.
>> 
>> Yes, VGA is kind of more difficult. Looking at all PCI devices'
>> command words may provide a hint, as may looking at all PCI
>> bridges' bridge control words.
> 
> Hm that seems like a rather convoluted procedure, and this needs to be
> available very early on during the boot process usually.

As long as the legacy MMIO address range isn't re-used by some
other device, having an OS blindly write to that range is quite okay
(and common practice) I think. Iiuc you think about getting log
messages out early?

>>> PIT: assumed to be always present in the PC architecture.
>> 
>> See PIC above.
> 
> At least on FreeBSD PIT is used much earlier than parsing any ACPI
> tables (it's used to implement a busy-wait DELAY routine), so I don't
> think it's sensible to tie this device to ACPI. Also see my note above
> about requiring ACPI in order to signal all of this.

See above: Quite likely you will need to do away with using PIT when
run as HVMlite guest.

>>> So, we have the following devices that are assumed to be there: RTC,
>>> PIC, PIT. Everything else I think can be signalled by other means
>>> already available.
>>>
>>> IMHO, I think we could say that the PIC is never going to be available
>>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>>> maybe enable the RTC and PIT by default?
>> 
>> That may be a sane initial setup, but with the ACPI flags named
>> above we may be able to expressed even their absence.
> 
> I still think we should probably enable those, because they tend to be
> used very early on boot, before parsing ACPI tables, and in general are
> considered to be always there on PCs.

No if you think the modern legacy free way.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 14:59         ` Roger Pau Monné
@ 2016-01-22 15:31           ` Jan Beulich
  2016-01-22 15:51             ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2016-01-22 15:31 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel

>>> On 22.01.16 at 15:59, <roger.pau@citrix.com> wrote:
> El 22/01/16 a les 14.34, Andrew Cooper ha escrit:
>> On 22/01/16 12:43, Roger Pau Monné wrote:
>>> IOAPIC: tied to LAPIC (so either both enabled or none).
>> 
>> An IOAPIC is by no means required - they are only for turning legacy
>> interrupts into MSIs.  It would be perfectly fine for a PVH domain to
>> have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.
>> 
>> The presence of LAPICs and IOAPICs reside in the MADT ACPI table.
> 
> Right, so as said in the reply to Jan, we will require ACPI in order to
> enable any of this pieces. I don't have a problem with that, just wasn't
> sure if this requirement was desired.

The question is whether a non-Dom0 HVMlite guest needs any
of these in the first place. Because if it doesn't (and that's the
mode we provide right now), making them dependent on ACPI
availability should be quite fine: If we need them for some
purpose in the guest, we'd need to make ACPI tables available.

> If that's the plan, then I think we would also need to fixup the tables
> provided to Dom0 in order to match what's available, but that can be
> discussed later.

I don't see what you're getting at here: The IO-APIC information
should be usable as is for Dom0 (as is the case for PV).

>>> So, we have the following devices that are assumed to be there: RTC,
>>> PIC, PIT. Everything else I think can be signalled by other means
>>> already available.
>>>
>>> IMHO, I think we could say that the PIC is never going to be available
>>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>>> maybe enable the RTC and PIT by default?
>>>
>>> Then I think we could get away without any Xen-specific way of reporting
>>> enabled devices.
>> 
>> DMLite is a new container type.  I would far rather it was assumed that
>> there was no legacy hardware at all.
> 
> So I take that you are in favour of only considering enabling the local
> APIC and IO APIC maybe for HVMlite, because of the performance benefits,
> while the other devices are _never_ going to be available to HVMlite
> guests/hosts at all? (Dom0 already gets the hw VGA)
> 
> IMHO, I would like to be able to eventually enable them in order to
> provide an environment that's as close as possible to a compatible PC,
> in order to reduce the amount of changes required in order to port an OS
> to run in this mode.

I didn't think that's among the goals for HVMlite. Just like PV and PVH,
HVMlite requires OS awareness.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 15:02           ` Jan Beulich
@ 2016-01-22 15:41             ` Roger Pau Monné
  0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 15:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 22/01/16 a les 16.02, Jan Beulich ha escrit:
>>>> On 22.01.16 at 15:41, <roger.pau@citrix.com> wrote:
>> El 22/01/16 a les 14.24, Jan Beulich ha escrit:
>>>>>> On 22.01.16 at 13:43, <roger.pau@citrix.com> wrote:
>>>> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
>>>> always assumed to be there in the PC architecture. Could maybe return ~0
>>>> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
>>>
>>> There actually is an RTC-absent flag in the FADT, which the
>>> hypervisor itself actually looks at (ACPI_FADT_NO_CMOS_RTC).
>>
>> So most of this assumes that if we ever want to enable any of those
>> devices we will provide ACPI tables to the guest?
> 
> We could check whether exposing SFI tables to the guest would
> be a simpler mechanism allowing enough control.
> 
> In the absence of ACPI we need to settle on defaults: As Andrew
> has said, contemporary logic would imply no legacy devices for an
> environment that can be (made) aware of such.
> 
>> The RTC can also be used as an interrupt source, which I think it's not
>> covered by the ACPI_FADT_NO_CMOS_RTC flag.
> 
> Certainly if there's no RTC device, then there's also no legacy
> IRQ8.
> 
>>>> VGA: again I don't think there's an easy way to signal it's presence,
>>>> apart from returning ~0 from the multiple IO ports it uses. The fact
>>>> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
>>>> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
>>>> the lack of proper MMIO range, but sadly I think most OSes just assume
>>>> it's there.
>>>
>>> Yes, VGA is kind of more difficult. Looking at all PCI devices'
>>> command words may provide a hint, as may looking at all PCI
>>> bridges' bridge control words.
>>
>> Hm that seems like a rather convoluted procedure, and this needs to be
>> available very early on during the boot process usually.
> 
> As long as the legacy MMIO address range isn't re-used by some
> other device, having an OS blindly write to that range is quite okay
> (and common practice) I think. Iiuc you think about getting log
> messages out early?
> 
>>>> PIT: assumed to be always present in the PC architecture.
>>>
>>> See PIC above.
>>
>> At least on FreeBSD PIT is used much earlier than parsing any ACPI
>> tables (it's used to implement a busy-wait DELAY routine), so I don't
>> think it's sensible to tie this device to ACPI. Also see my note above
>> about requiring ACPI in order to signal all of this.
> 
> See above: Quite likely you will need to do away with using PIT when
> run as HVMlite guest.

Oh yes, that's certainly not a problem for FreeBSD. I've already
replaced the usage of the PIT during early boot with the PV timer. I was
just pointing this out in order to make it easier for other OSes to
adopt HVMlite, I wouldn't be surprised to find out that other OSes also
use the PIT as an early source for delay available universally.

>>>> So, we have the following devices that are assumed to be there: RTC,
>>>> PIC, PIT. Everything else I think can be signalled by other means
>>>> already available.
>>>>
>>>> IMHO, I think we could say that the PIC is never going to be available
>>>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>>>> maybe enable the RTC and PIT by default?
>>>
>>> That may be a sane initial setup, but with the ACPI flags named
>>> above we may be able to expressed even their absence.
>>
>> I still think we should probably enable those, because they tend to be
>> used very early on boot, before parsing ACPI tables, and in general are
>> considered to be always there on PCs.
> 
> No if you think the modern legacy free way.

Ack.

Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 15:31           ` Jan Beulich
@ 2016-01-22 15:51             ` Roger Pau Monné
  2016-01-25 11:23               ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

El 22/01/16 a les 16.31, Jan Beulich ha escrit:
>>>> On 22.01.16 at 15:59, <roger.pau@citrix.com> wrote:
>> El 22/01/16 a les 14.34, Andrew Cooper ha escrit:
>>> On 22/01/16 12:43, Roger Pau Monné wrote:
>>>> IOAPIC: tied to LAPIC (so either both enabled or none).
>>>
>>> An IOAPIC is by no means required - they are only for turning legacy
>>> interrupts into MSIs.  It would be perfectly fine for a PVH domain to
>>> have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.
>>>
>>> The presence of LAPICs and IOAPICs reside in the MADT ACPI table.
>>
>> Right, so as said in the reply to Jan, we will require ACPI in order to
>> enable any of this pieces. I don't have a problem with that, just wasn't
>> sure if this requirement was desired.
> 
> The question is whether a non-Dom0 HVMlite guest needs any
> of these in the first place. Because if it doesn't (and that's the
> mode we provide right now), making them dependent on ACPI
> availability should be quite fine: If we need them for some
> purpose in the guest, we'd need to make ACPI tables available.

I only foresee non-Dom0 HVMlite guests with pci-passthrough to ever
require any of those, but that's still a long shot...

>> If that's the plan, then I think we would also need to fixup the tables
>> provided to Dom0 in order to match what's available, but that can be
>> discussed later.
> 
> I don't see what you're getting at here: The IO-APIC information
> should be usable as is for Dom0 (as is the case for PV).

Well, the information it's usable, but the IO-APIC is not accessible by
Dom0, you need to use PHYSDEV ops in order to manage it. One of my aims
would be to get rid of quite of the PHYSDEV/PIRQ ops for Dom0, and
instead provide an emulated local and IO APICs. That would also allow us
to make use of posted-interrupts I assume (although I certainly need to
look more deeply into this).

In this case, how are we going to signal Dom0 that the local and IO
APICs reported by ACPI are usable using the bare mental interfaces?

>>>> So, we have the following devices that are assumed to be there: RTC,
>>>> PIC, PIT. Everything else I think can be signalled by other means
>>>> already available.
>>>>
>>>> IMHO, I think we could say that the PIC is never going to be available
>>>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>>>> maybe enable the RTC and PIT by default?
>>>>
>>>> Then I think we could get away without any Xen-specific way of reporting
>>>> enabled devices.
>>>
>>> DMLite is a new container type.  I would far rather it was assumed that
>>> there was no legacy hardware at all.
>>
>> So I take that you are in favour of only considering enabling the local
>> APIC and IO APIC maybe for HVMlite, because of the performance benefits,
>> while the other devices are _never_ going to be available to HVMlite
>> guests/hosts at all? (Dom0 already gets the hw VGA)
>>
>> IMHO, I would like to be able to eventually enable them in order to
>> provide an environment that's as close as possible to a compatible PC,
>> in order to reduce the amount of changes required in order to port an OS
>> to run in this mode.
> 
> I didn't think that's among the goals for HVMlite. Just like PV and PVH,
> HVMlite requires OS awareness.

Right, just getting rid of the PHYSDEV/PIRQ operations would be a big
win IMHO.

Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 5/6] libxl: add options to enable/disable emulated devices
  2016-01-21 16:51 ` [PATCH v4 5/6] libxl: add options to enable/disable emulated devices Roger Pau Monne
@ 2016-01-22 17:04   ` Roger Pau Monné
  2016-01-25  9:33     ` Ian Campbell
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2016-01-22 17:04 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, Ian Jackson, Ian Campbell

El 21/01/16 a les 17.51, Roger Pau Monne ha escrit:
> Allow enabling or disabling emulated devices from the libxl domain
> configuration file. For HVM guests with a device model all the emulated
> devices are enabled. For HVM guests without a device model no devices are
> enabled by default, although they can be enabled using the options provided.
> The arbiter of whether a combination is posible or not is always Xen, libxl
> doesn't do any kind of check.
> 
> This set of options is also propagated inside of the libxl migration record
> as part of the contents of the libxl_domain_build_info struct, so that when
> the other end (restore) creates the domain the same set of devices are
> enabled. This is important for future compatibility, in case we decide to
> enable some emulated devices by default for HVMlite guests, old HVMlite
> guests migrated to newer versions should continue to see the same set of
> emulated devices.
> 
> It has been discussed that it would be better to avoid having this
> information inside of the libxl stream, and to instead rely on which devices
> get their context loaded inside of Xen on resume. This of course requires
> more work and it also has certain issues, like the fact that some devices
> don't restore a context at all (like VGA). The consensus is that the
> solution presented in this patch is not going to prevent further
> developments in this direction, and can always be used as a check to make
> sure the Xen context and the libxl context are in sync.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>

To tools committers: please refrain from pushing this until we have a
clear view on which devices we might want to enable in the future.
Adding a bunch of libxl options that are never going to be allowed
doesn't make any sense.

Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 5/6] libxl: add options to enable/disable emulated devices
  2016-01-22 17:04   ` Roger Pau Monné
@ 2016-01-25  9:33     ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2016-01-25  9:33 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel; +Cc: Wei Liu, Ian Jackson

On Fri, 2016-01-22 at 18:04 +0100, Roger Pau Monné wrote:
> El 21/01/16 a les 17.51, Roger Pau Monne ha escrit:
> > Allow enabling or disabling emulated devices from the libxl domain
> > configuration file. For HVM guests with a device model all the emulated
> > devices are enabled. For HVM guests without a device model no devices
> > are
> > enabled by default, although they can be enabled using the options
> > provided.
> > The arbiter of whether a combination is posible or not is always Xen,
> > libxl
> > doesn't do any kind of check.
> > 
> > This set of options is also propagated inside of the libxl migration
> > record
> > as part of the contents of the libxl_domain_build_info struct, so that
> > when
> > the other end (restore) creates the domain the same set of devices are
> > enabled. This is important for future compatibility, in case we decide
> > to
> > enable some emulated devices by default for HVMlite guests, old HVMlite
> > guests migrated to newer versions should continue to see the same set
> > of
> > emulated devices.
> > 
> > It has been discussed that it would be better to avoid having this
> > information inside of the libxl stream, and to instead rely on which
> > devices
> > get their context loaded inside of Xen on resume. This of course
> > requires
> > more work and it also has certain issues, like the fact that some
> > devices
> > don't restore a context at all (like VGA). The consensus is that the
> > solution presented in this patch is not going to prevent further
> > developments in this direction, and can always be used as a check to
> > make
> > sure the Xen context and the libxl context are in sync.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> 
> To tools committers:

Ack.

>  please refrain from pushing this until we have a
> clear view on which devices we might want to enable in the future.
> Adding a bunch of libxl options that are never going to be allowed
> doesn't make any sense.
> 
> Roger.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID
  2016-01-22 15:51             ` Roger Pau Monné
@ 2016-01-25 11:23               ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2016-01-25 11:23 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel

>>> On 22.01.16 at 16:51, <roger.pau@citrix.com> wrote:
> El 22/01/16 a les 16.31, Jan Beulich ha escrit:
>>>>> On 22.01.16 at 15:59, <roger.pau@citrix.com> wrote:
>>> If that's the plan, then I think we would also need to fixup the tables
>>> provided to Dom0 in order to match what's available, but that can be
>>> discussed later.
>> 
>> I don't see what you're getting at here: The IO-APIC information
>> should be usable as is for Dom0 (as is the case for PV).
> 
> Well, the information it's usable, but the IO-APIC is not accessible by
> Dom0, you need to use PHYSDEV ops in order to manage it. One of my aims
> would be to get rid of quite of the PHYSDEV/PIRQ ops for Dom0, and
> instead provide an emulated local and IO APICs. That would also allow us
> to make use of posted-interrupts I assume (although I certainly need to
> look more deeply into this).

Wait - you're aware that some of the IO-APIC setup in the
hypervisor requires Dom0's help? Presenting it with other than
the real ACPI tables would seem to make this impossible.

> In this case, how are we going to signal Dom0 that the local and IO
> APICs reported by ACPI are usable using the bare mental interfaces?

With the above I'm not sure the answer to this would really matter.

Jan

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-25 11:24 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21 16:51 [PATCH v4 0/6] HVMlite: DomU fixes and a Dom0 preparatory patch Roger Pau Monne
2016-01-21 16:51 ` [PATCH v4 1/6] libelf: rewrite symtab/strtab loading for Dom0 Roger Pau Monne
2016-01-21 17:29   ` Ian Jackson
2016-01-21 17:55     ` Roger Pau Monné
2016-01-21 18:44       ` Ian Jackson
2016-01-22  8:11       ` Jan Beulich
2016-01-22  9:58         ` Roger Pau Monné
2016-01-21 16:51 ` [PATCH v4 2/6] libxl: introduce LIBXL_VGA_INTERFACE_TYPE_UNKNOWN Roger Pau Monne
2016-01-22 10:59   ` Ian Campbell
2016-01-21 16:51 ` [PATCH v4 3/6] libxl: initialise the build info before calling prepare_config Roger Pau Monne
2016-01-22 11:00   ` Ian Campbell
2016-01-21 16:51 ` [PATCH v4 4/6] x86/PV: allow PV guests to have an emulated PIT Roger Pau Monne
2016-01-22 10:48   ` Jan Beulich
2016-01-22 11:03     ` Ian Campbell
2016-01-21 16:51 ` [PATCH v4 5/6] libxl: add options to enable/disable emulated devices Roger Pau Monne
2016-01-22 17:04   ` Roger Pau Monné
2016-01-25  9:33     ` Ian Campbell
2016-01-21 16:51 ` [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID Roger Pau Monne
2016-01-22 10:57   ` Jan Beulich
2016-01-22 12:43     ` Roger Pau Monné
2016-01-22 13:24       ` Jan Beulich
2016-01-22 14:41         ` Roger Pau Monné
2016-01-22 15:02           ` Jan Beulich
2016-01-22 15:41             ` Roger Pau Monné
2016-01-22 13:34       ` Andrew Cooper
2016-01-22 14:59         ` Roger Pau Monné
2016-01-22 15:31           ` Jan Beulich
2016-01-22 15:51             ` Roger Pau Monné
2016-01-25 11:23               ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.