xen-devel.lists.xenproject.org archive mirror
* [PATCH 9] xSplice v1 design and implementation.
@ 2016-04-25 15:34 Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
                   ` (27 more replies)
  0 siblings, 28 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack

Hey!

Changelog:
v8.1: http://lists.xen.org/archives/html/xen-devel/2016-04/msg01903.html
 - Worked on Jan's comments.
v8: since http://lists.xen.org/archives/html/xen-devel/2016-04/msg01873.html
 - Posting the _RIGHT_ set of patches.

v7: http://lists.xen.org/archives/html/xen-devel/2016-04/msg01476.html
 - Ingested newer version of x86/mm: Introduce modify_xen_mappings()
 - Implemented faster symbol table lookup (NEW)
 - Carried out tests on a large machine (240 CPUs).
 - Made the struct xsplice_patch_func work on ARM32 and changed its size
   (on 64-bit it is 64 bytes, on 32-bit it is 52 bytes, and the field offsets differ too).
 - Changed a bunch of printk to dprintk, adjusted XENLOG_ levels.
 - Resurrected the XENVER_build_id.
 - Reverted VERSION_op.
v6: http://lists.xen.org/archives/html/xen-devel/2016-04/msg00871.html
 - Acted on all comments from Andrew, Julien, and Jan.
 - Got help from Andrew on one of them over the weekend.
 - Dropped: xsplice: Add .xsplice.hooks functions and test-case,
   xsplice: Add support for shadow variables.
v5: http://lists.xen.org/archives/html/xen-devel/2016-03/msg03286.html
 - Acted on all comments from Jan.
v4: http://lists.xen.org/archives/html/xen-devel/2016-03/msg01776.html
 - Lots of review. Lots of rework. Some patches checked in.
v3: http://www.gossamer-threads.com/lists/xen/devel/418262
    and 
    http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg04106.html
 - Acted on all reviews.
 - Redid the flow of patches.
v2: http://lists.xen.org/archives/html/xen-devel/2016-01/msg01597.html
 - Updated code/docs/design with review comments.
 - Made Xen also have a PT_NOTE.
 - Added more of Ross's patches
 - Combined build-id patchset with this.
(since the RFC and the Seattle Xen presentation)
 - Finished off some of the work around the build-id.
 - Settled on the preemption mechanism.
 - Cleaned up the patches a lot and broke them up for easier
   review by maintainers.
v1: http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg02116.html
  - Put all the design comments in the code
Prototype: http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html
[Posting by Ross]
 - Took all reviews into account.
 - Redid the patches
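The v7 entry quotes `struct xsplice_patch_func` as 64 bytes on 64-bit and 52 bytes on 32-bit, with different offsets. The sketch below is a hypothetical layout (the name `example_patch_func`, the field order, and the 31-byte `opaque` area are assumptions, not copied from the series) that reproduces both sizes and shows why the offsets move: the three pointer fields shrink from 8 to 4 bytes each on a 32-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout chosen only to reproduce the quoted sizes:
 * 3 * 8 + 4 + 4 + 1 + 31 = 64 on LP64, 3 * 4 + 4 + 4 + 1 + 31 = 52 on ILP32. */
struct example_patch_func {
    const char *name;   /* symbol name of the function being replaced */
    void *new_addr;     /* address of the replacement function */
    void *old_addr;     /* address of the function being patched */
    uint32_t new_size;  /* offset 24 on 64-bit, 12 on 32-bit */
    uint32_t old_size;
    uint8_t version;
    uint8_t opaque[31]; /* scratch space, e.g. for saved original bytes */
};
```

Because every field after the pointers shifts when pointer width changes, a payload built for one word size cannot be consumed by a hypervisor of the other.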


*Maintainers*

Since v8.1, all patches except:
 xsplice, symbols: Implement fast symbol names -> virtual addresses lookup

have a Reviewed-by tag.

Legend:
 *    - See below
 R    - Reviewed 
 R+   - Reviewed by two folks
 A    - Acked by maintainer of the area (hypervisor or toolstack)

      Revert "libxc/libxl/python/xenstat/ocaml: Use new
  A   Revert "HYPERCALL_version_op. New hypercall mirroring
  A   xsplice: Design document
   R  xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  AR  libxc: Implementation of XEN_XSPLICE_op in libxc
  A   xen-xsplice: Tool to manipulate xsplice payloads
  AR  arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.
        [Julien Acked the ARM part]
  AR  arm/x86/vmap: Add v[z|m]alloc_xen, vfree_xen and vm_init_type
        [Julien Acked the ARM part]
   R+  x86/mm: Introduce modify_xen_mappings()
   R  xsplice: Add helper elf routines
   R  xsplice: Implement payload loading
   R  xsplice: Implement support for applying/reverting/replacing patches.
   R  x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
   R  xsplice,symbols: Implement symbol name resolution on address.
*     xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
   R  x86, xsplice: Print payload's symbol name and payload name in backtraces
   R  xsplice: Add support for bug frames.
   R  xsplice: Add support for exception tables.
   R  xsplice: Add support for alternatives
  AR  build_id: Provide ld-embedded build-ids
  AR  xsplice: Print build_id in keyhandler and on bootup.
  A   XENVER_build_id/libxc: Provide ld-embedded build-id
  A   libxl: info: Display build_id of the hypervisor.
   R  xsplice: Stacking build-id dependency checking.
   R  xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE
   R  xsplice: Prevent duplicate payloads from being loaded.
   R  MAINTAINERS/xsplice: Add myself and Ross as the maintainers.


*Are there any TODOs left from the v5, v6, v7, and v8.1 reviews?*

None.

*What is xSplice?*

A mechanism to binary-patch the running hypervisor with new code,
needed primarily to deliver security updates.

*What will this patchset do once I have it?*

Patch the hypervisor.

*Why are you emailing me?*

Please please review as many patches as possible.

*OK, what do you have?*

They are located in a git tree:
  git://xenbits.xen.org/people/konradwilk/xen.git   xsplice.v9

(Copying from Ross's email):

Much of the work is implementing a basic version of the Linux kernel module
loader. The code covers:
* Loading of xSplice ELF payloads.
* Copying allocated sections into a new executable region of memory.
* Resolving symbols.
* Applying relocations.
* Patching of altinstructions.
* Special handling of bug frames and exception tables.
* Unloading of xSplice ELF payloads.
* Compiling a sample xSplice ELF payload
* Resolving symbols
* Using build-id dependencies
* Support for shadow variable framework
* Support for executing ELF payload functions on load/unload.
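As an illustration of the "Applying relocations" step, here is a minimal user-space sketch of one x86-64 relocation type, `R_X86_64_PC32`, whose value the x86-64 psABI defines as S + A - P (symbol value plus addend minus the address of the field being patched), stored as a signed 32-bit quantity. `apply_pc32` and its parameters are hypothetical, not the functions in the series; the section is modelled as a buffer plus the virtual address it will run at.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Apply one R_X86_64_PC32 relocation to a loaded section.
 * section       - buffer holding the section contents
 * section_vaddr - address the section will execute at
 * r_offset      - offset of the 32-bit field inside the section
 * sym_value     - resolved symbol address (S)
 * addend        - relocation addend (A) */
static int apply_pc32(uint8_t *section, uint64_t section_vaddr,
                      uint64_t r_offset, uint64_t sym_value, int64_t addend)
{
    uint64_t place = section_vaddr + r_offset;                      /* P */
    int64_t val = (int64_t)(sym_value + (uint64_t)addend - place);  /* S + A - P */
    int32_t val32 = (int32_t)val;

    if ( (int64_t)val32 != val )
        return -EOVERFLOW;  /* target is more than +/-2 GiB away */

    /* Little-endian store, as on x86. */
    memcpy(section + r_offset, &val32, sizeof(val32));
    return 0;
}
```

The overflow check matters: a PC-relative field can only reach targets within about 2 GiB of the patched location, so a loader has to reject relocations that do not fit rather than silently truncate them.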

The other main part of this work is applying and reverting the patches safely.
As implemented, the code is patched while every CPU is waiting in the
return-to-guest path (i.e. with no stack in use) or on the cpu-idle path,
which appears to be the safest way to patch. While this is safe, the next
wave of patches should still verify that certain critical sections (say,
the code doing the patching) are not patched.
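A minimal user-space model of this rendezvous, assuming POSIX threads stand in for CPUs and a plain `int` stands in for the code being patched. `patch_with_rendezvous` and `secondary_cpu` are hypothetical names; the hypervisor's real implementation also has to deal with timeouts and interrupts, which this sketch omits.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model: threads stand in for CPUs, an int for patched code. */
static atomic_int cpus_arrived;  /* secondaries parked at the safe point */
static atomic_int patch_done;    /* set by the master after patching */
static atomic_int saw_new_code;  /* secondaries that observed the patch */
static int patched_word;         /* the "code" being patched */

static void *secondary_cpu(void *arg)
{
    (void)arg;
    /* Reached the safe point (e.g. return-to-guest): check in ... */
    atomic_fetch_add(&cpus_arrived, 1);
    /* ... then spin without touching the patched code until released. */
    while ( !atomic_load(&patch_done) )
        ;
    if ( patched_word == 1 )
        atomic_fetch_add(&saw_new_code, 1);
    return NULL;
}

/* Master CPU: wait until all others are parked, patch, then release them.
 * Returns how many secondaries saw the new code, or -1 on allocation failure. */
static int patch_with_rendezvous(int ncpus)
{
    int others = ncpus > 1 ? ncpus - 1 : 0, i;
    pthread_t *t = calloc(others ? others : 1, sizeof(*t));

    if ( !t )
        return -1;
    atomic_store(&cpus_arrived, 0);
    atomic_store(&patch_done, 0);
    atomic_store(&saw_new_code, 0);
    patched_word = 0;

    for ( i = 0; i < others; i++ )
        if ( pthread_create(&t[i], NULL, secondary_cpu, NULL) != 0 )
            break;
    others = i;                    /* only wait for threads that started */

    while ( atomic_load(&cpus_arrived) < others )
        ;                          /* wait until every other CPU is parked */

    patched_word = 1;              /* apply the "patch" */
    atomic_store(&patch_done, 1);  /* release; seq_cst store publishes the write */

    for ( i = 0; i < others; i++ )
        pthread_join(t[i], NULL);
    free(t);
    return atomic_load(&saw_new_code);
}
```

With `ncpus = 4` the function should return 3: each parked thread only reads the patched value after the master has released it, so every one of them observes the new code.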

All of the following should work:
* Applying patches safely.
* Reverting patches safely.
* Replacing patches safely (e.g. reverting any applied patches and applying
   a new patch).
* Bug frames as part of modules. This means that adding or
  changing WARN, ASSERT, BUG, and run_in_exception_handler works correctly.
  Changes that only alter line numbers _are ignored_.
* Exception tables as part of modules. E.g. wrmsr_safe and copy_to_user work
  correctly when used in a patch module.
* Stacking of patches on top of each other
* Resolving symbols (even of patches)

*Limitations*

The above is enough to fully implement an update system where multiple source
patches are combined (using combinediff) and built into a single binary
which then atomically replaces any existing loaded patches
(this is why Ross added a REPLACE operation). This is the approach used
by kPatch and kGraft.
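The stacking and REPLACE semantics can be pictured as a stack of applied payloads. The sketch below is a hypothetical model (`struct applied_stack` and the `stack_*` helpers are not from the series), and it assumes, as a simplification, that only the most recently applied payload may be reverted; the hypervisor's actual rules may differ.

```c
#include <string.h>

#define MAX_APPLIED 8

/* Hypothetical model: names of applied payloads, oldest first. */
struct applied_stack {
    const char *names[MAX_APPLIED];
    int depth;
};

/* Apply a payload on top of whatever is already applied. */
static int stack_apply(struct applied_stack *s, const char *name)
{
    if ( s->depth == MAX_APPLIED )
        return -1;
    s->names[s->depth++] = name;
    return 0;
}

/* Assumed LIFO rule: only the top payload may be reverted. */
static int stack_revert(struct applied_stack *s, const char *name)
{
    if ( s->depth == 0 || strcmp(s->names[s->depth - 1], name) != 0 )
        return -1;
    s->depth--;
    return 0;
}

/* REPLACE: revert everything (newest first), then apply the new payload. */
static int stack_replace(struct applied_stack *s, const char *name)
{
    while ( s->depth > 0 )
        s->depth--;  /* each pop models reverting one payload */
    return stack_apply(s, name);
}
```

In this model a combined payload built with combinediff is simply passed to `stack_replace`, which leaves it as the only applied payload.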

Multiple completely independent patches can also be loaded but unexpected
interactions may occur.

As it stands, the patches are statically linked which means that independent
patches cannot be linked against one another (e.g. if one introduces a
new symbol). Using the combinediff approach above fixes this.

Backtraces containing functions from a patch module do not show the symbol name.

There is no checking that a patch which is loaded is built for the
correct hypervisor (need to use build-id).
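Such a build-id check can, at its core, be a byte comparison. GNU ld's `--build-id` option embeds an identifier in an `NT_GNU_BUILD_ID` ELF note; this sketch assumes both IDs have already been extracted from their notes. `struct build_id` and `build_id_matches` are hypothetical helpers, not the checking added later in this series ("xsplice: Stacking build-id dependency checking").

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of a build-id note's descriptor bytes. */
struct build_id {
    const unsigned char *bytes;
    size_t len;
};

/* A payload should only be applied if the build-id it was built against
 * matches the build-id of the image it is being applied on top of. */
static bool build_id_matches(const struct build_id *running,
                             const struct build_id *depends)
{
    return running->len == depends->len &&
           memcmp(running->bytes, depends->bytes, running->len) == 0;
}
```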

Binary patching works at the function level.
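On x86, function-level patching can be done by overwriting the first bytes of the old function with a 5-byte relative jump (opcode `0xe9` followed by a signed 32-bit displacement measured from the end of the instruction) to the replacement function. The sketch below only computes those bytes; `make_jmp_rel32` is a hypothetical helper, and a real implementation must also save the original bytes for revert and write the jump while all CPUs are quiescent.

```c
#include <stdint.h>
#include <string.h>

#define JMP_REL32_SIZE 5  /* opcode 0xe9 + 32-bit displacement */

/* Build the jump that redirects old_addr to new_addr.
 * Returns -1 if the displacement does not fit in a signed 32-bit field. */
static int make_jmp_rel32(uint8_t insn[JMP_REL32_SIZE],
                          uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the end of the jmp instruction. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_SIZE));
    int32_t disp32 = (int32_t)disp;

    if ( (int64_t)disp32 != disp )
        return -1;
    insn[0] = 0xe9;
    memcpy(&insn[1], &disp32, sizeof(disp32));  /* x86 is little-endian */
    return 0;
}
```

For example, patching a function at 0x1000 to jump to 0x2000 yields a displacement of 0x2000 - 0x1005 = 0xffb.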

*Testing*

You can use the example code included in this patchset:

# xl info | grep extra
xen_extra              : -unstable
# xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
Uploading /usr/lib/debug/xen_hello_world.xsplice (2071 bytes)
Performing check: completed
Performing apply:. completed
# xl info | grep extra
xen_extra              : Hello World
# xen-xsplice revert xen_hello_world
Performing revert:. completed
# xen-xsplice unload xen_hello_world
Performing unload: completed
# xl info | grep extra
xen_extra              : -unstable

Or you can use git://xenbits.xen.org/people/konradwilk/xsplice-build-tools.git
which generates the ELF payloads.

This link has a nice description of how to use the tool:
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html
 .gitignore                                   |    5 +
 Config.mk                                    |   12 +
 MAINTAINERS                                  |   10 +
 docs/misc/xsplice.markdown                   | 1113 ++++++++++++++++++
 tools/flask/policy/policy/modules/xen/xen.te |    9 +-
 tools/libxc/include/xenctrl.h                |   94 +-
 tools/libxc/xc_core.c                        |   35 +-
 tools/libxc/xc_dom_boot.c                    |   12 +-
 tools/libxc/xc_domain.c                      |    3 +-
 tools/libxc/xc_misc.c                        |  355 ++++++
 tools/libxc/xc_private.c                     |   60 +-
 tools/libxc/xc_private.h                     |   18 +-
 tools/libxc/xc_resume.c                      |    3 +-
 tools/libxc/xc_sr_save.c                     |    9 +-
 tools/libxc/xg_save_restore.h                |    6 +-
 tools/libxl/libxl.c                          |  107 +-
 tools/libxl/libxl.h                          |    6 +
 tools/libxl/libxl_types.idl                  |    1 +
 tools/libxl/xl_cmdimpl.c                     |    1 +
 tools/misc/Makefile                          |    4 +
 tools/misc/xen-xsplice.c                     |  463 ++++++++
 tools/ocaml/libs/xc/xenctrl_stubs.c          |   39 +-
 tools/python/xen/lowlevel/xc/xc.c            |   30 +-
 tools/xenstat/libxenstat/src/xenstat.c       |   12 +-
 tools/xentrace/xenctx.c                      |    3 +-
 xen/Makefile                                 |    8 +-
 xen/arch/arm/Makefile                        |    6 +-
 xen/arch/arm/kernel.c                        |    2 +-
 xen/arch/arm/mm.c                            |    2 +-
 xen/arch/arm/setup.c                         |    4 +
 xen/arch/arm/traps.c                         |   40 +-
 xen/arch/arm/xen.lds.S                       |   15 +-
 xen/arch/arm/xsplice.c                       |   78 ++
 xen/arch/x86/Makefile                        |   58 +-
 xen/arch/x86/alternative.c                   |   46 +-
 xen/arch/x86/boot/mkelf32.c                  |  142 ++-
 xen/arch/x86/domain.c                        |    6 +
 xen/arch/x86/extable.c                       |   41 +-
 xen/arch/x86/hvm/hvm.c                       |    1 -
 xen/arch/x86/mm.c                            |   77 +-
 xen/arch/x86/platform_hypercall.c            |    5 +-
 xen/arch/x86/setup.c                         |   28 +-
 xen/arch/x86/test/Makefile                   |   88 ++
 xen/arch/x86/test/xen_bye_world.c            |   34 +
 xen/arch/x86/test/xen_bye_world_func.c       |   22 +
 xen/arch/x86/test/xen_hello_world.c          |   33 +
 xen/arch/x86/test/xen_hello_world_func.c     |   39 +
 xen/arch/x86/test/xen_replace_world.c        |   32 +
 xen/arch/x86/test/xen_replace_world_func.c   |   22 +
 xen/arch/x86/traps.c                         |   45 +-
 xen/arch/x86/x86_64/compat/entry.S           |    2 -
 xen/arch/x86/x86_64/entry.S                  |    2 -
 xen/arch/x86/xen.lds.S                       |   30 +-
 xen/arch/x86/xsplice.c                       |  241 ++++
 xen/common/Kconfig                           |   28 +
 xen/common/Makefile                          |    3 +
 xen/common/compat/kernel.c                   |    2 -
 xen/common/kernel.c                          |  226 +---
 xen/common/symbols-dummy.c                   |    5 +
 xen/common/symbols.c                         |   75 +-
 xen/common/sysctl.c                          |    7 +
 xen/common/version.c                         |   75 ++
 xen/common/virtual_region.c                  |  147 +++
 xen/common/vmap.c                            |  205 ++--
 xen/common/vsprintf.c                        |   12 +
 xen/common/xsplice.c                         | 1564 ++++++++++++++++++++++++++
 xen/common/xsplice_elf.c                     |  495 ++++++++
 xen/drivers/acpi/osl.c                       |    2 +-
 xen/include/asm-x86/alternative.h            |    4 +
 xen/include/asm-x86/current.h                |   10 +-
 xen/include/asm-x86/uaccess.h                |    2 +
 xen/include/public/arch-arm.h                |    2 -
 xen/include/public/sysctl.h                  |  186 +++
 xen/include/public/version.h                 |   82 +-
 xen/include/public/xen.h                     |    1 -
 xen/include/xen/elfstructs.h                 |    8 +
 xen/include/xen/hypercall.h                  |    4 -
 xen/include/xen/mm.h                         |    2 +
 xen/include/xen/symbols.h                    |   21 +-
 xen/include/xen/version.h                    |    1 +
 xen/include/xen/virtual_region.h             |   47 +
 xen/include/xen/vmap.h                       |   22 +-
 xen/include/xen/xsplice.h                    |  119 ++
 xen/include/xen/xsplice_elf.h                |   58 +
 xen/include/xsm/dummy.h                      |   21 -
 xen/include/xsm/xsm.h                        |    6 -
 xen/tools/symbols.c                          |   50 +-
 xen/xsm/dummy.c                              |    1 -
 xen/xsm/flask/hooks.c                        |   40 +-
 xen/xsm/flask/policy/access_vectors          |   25 +-
 90 files changed, 6399 insertions(+), 718 deletions(-)


Andrew Cooper (1):
      x86/mm: Introduce modify_xen_mappings()

Konrad Rzeszutek Wilk (17):
      Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"
      Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane."
      xsplice: Design document
      xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
      libxc: Implementation of XEN_XSPLICE_op in libxc
      xen-xsplice: Tool to manipulate xsplice payloads
      arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.
      arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
      x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
      xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
      build_id: Provide ld-embedded build-ids
      xsplice: Print build_id in keyhandler and on bootup.
      XENVER_build_id/libxc: Provide ld-embedded build-id
      libxl: info: Display build_id of the hypervisor.
      xsplice: Stacking build-id dependency checking.
      xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE
      MAINTAINERS/xsplice: Add myself and Ross as the maintainers.

Ross Lagerwall (9):
      xsplice: Add helper elf routines
      xsplice: Implement payload loading
      xsplice: Implement support for applying/reverting/replacing patches.
      xsplice,symbols: Implement symbol name resolution on address.
      x86, xsplice: Print payload's symbol name and payload name in backtraces
      xsplice: Add support for bug frames.
      xsplice: Add support for exception tables.
      xsplice: Add support for alternatives
      xsplice: Prevent duplicate payloads from being loaded.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-25 15:48   ` Jan Beulich
  2016-04-25 15:34 ` [PATCH v9 02/27] Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane." Konrad Rzeszutek Wilk
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Konrad Rzeszutek Wilk

This reverts commit d275ec9ca8a86f7c9c213f3551194d471ce90fbd.

This is because we prefer to keep using the old XENVER_ hypercall.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Requested-and-acked-by: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/include/xenctrl.h          | 32 +-------------
 tools/libxc/xc_core.c                  | 35 ++++++++--------
 tools/libxc/xc_dom_boot.c              | 12 +-----
 tools/libxc/xc_domain.c                |  3 +-
 tools/libxc/xc_private.c               | 53 +++++++++++++++++++----
 tools/libxc/xc_private.h               |  7 ++--
 tools/libxc/xc_resume.c                |  3 +-
 tools/libxc/xc_sr_save.c               |  9 ++--
 tools/libxc/xg_save_restore.h          |  6 +--
 tools/libxl/libxl.c                    | 77 +++++++++++++---------------------
 tools/ocaml/libs/xc/xenctrl_stubs.c    | 39 ++++++++++-------
 tools/python/xen/lowlevel/xc/xc.c      | 30 ++++++-------
 tools/xenstat/libxenstat/src/xenstat.c | 12 +++---
 tools/xentrace/xenctx.c                |  3 +-
 14 files changed, 146 insertions(+), 175 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index f5a034a..42f201b 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1536,37 +1536,7 @@ int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
-/**
- * This function returns the size of buffer to be allocated for
- * the cmd. The cmd are XEN_VERSION_*.
- */
-ssize_t xc_version_len(xc_interface *xch, unsigned int cmd);
-
-/**
- * This function retrieves the information from the version_op hypercall.
- * The len is the size of the arg buffer. If arg is NULL, will not
- * perform hypercall - instead will just return the size of arg
- * buffer that is needed.
- *
- * Note that prior to Xen 4.7 this would return 0 for success and
- * negative value (-1) for error (with the error in errno). In Xen 4.7
- * and later for success it will return an positive value which is the
- * number of bytes copied in arg.
- *
- * It can also return -1 with various errno values:
- *  - EPERM - not permitted.
- *  - ENOBUFS - the len was to short, output in arg truncated.
- *  - ENOSYS - not implemented.
- *
- * @parm xch a handle to an open hypervisor interface
- * @parm cmd XEN_VERSION_* value
- * @param arg Pointer to xen_version_op_buf_t or xen_version_op_val_t
- * @param len Size of arg
- * @return size of bytes copied in arg on success, -1 on failure (and
- * errno will contain the error)
- *
- */
-int xc_version(xc_interface *xch, unsigned int cmd, void *arg, size_t len);
+int xc_version(xc_interface *xch, int cmd, void *arg);
 
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
diff --git a/tools/libxc/xc_core.c b/tools/libxc/xc_core.c
index cfeba6b..d792566 100644
--- a/tools/libxc/xc_core.c
+++ b/tools/libxc/xc_core.c
@@ -270,43 +270,42 @@ elfnote_fill_xen_version(xc_interface *xch,
                          *xen_version)
 {
     int rc;
-    xen_version_op_val_t val = 0;
     memset(xen_version, 0, sizeof(*xen_version));
 
-    rc = xc_version(xch, XEN_VERSION_version, &val, sizeof(val));
+    rc = xc_version(xch, XENVER_version, NULL);
     if ( rc < 0 )
         return rc;
-    xen_version->major_version = val >> 16;
-    xen_version->minor_version = val & ((1 << 16) - 1);
+    xen_version->major_version = rc >> 16;
+    xen_version->minor_version = rc & ((1 << 16) - 1);
 
-    rc = xc_version(xch, XEN_VERSION_extraversion,
-                    xen_version->extra_version,
-                    sizeof(xen_version->extra_version));
+    rc = xc_version(xch, XENVER_extraversion,
+                    &xen_version->extra_version);
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XEN_VERSION_capabilities,
-                    xen_version->capabilities,
-                    sizeof(xen_version->capabilities));
+    rc = xc_version(xch, XENVER_compile_info,
+                    &xen_version->compile_info);
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XEN_VERSION_changeset, xen_version->changeset,
-                    sizeof(xen_version->changeset));
+    rc = xc_version(xch,
+                    XENVER_capabilities, &xen_version->capabilities);
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XEN_VERSION_platform_parameters,
-                    &xen_version->platform_parameters,
-                    sizeof(xen_version->platform_parameters));
+    rc = xc_version(xch, XENVER_changeset, &xen_version->changeset);
     if ( rc < 0 )
         return rc;
 
-    val = 0;
-    rc = xc_version(xch, XEN_VERSION_pagesize, &val, sizeof(val));
+    rc = xc_version(xch, XENVER_platform_parameters,
+                    &xen_version->platform_parameters);
     if ( rc < 0 )
         return rc;
-    xen_version->pagesize = val;
+
+    rc = xc_version(xch, XENVER_pagesize, NULL);
+    if ( rc < 0 )
+        return rc;
+    xen_version->pagesize = rc;
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index bbff72e..791041b 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -112,19 +112,11 @@ int xc_dom_compat_check(struct xc_dom_image *dom)
 
 int xc_dom_boot_xen_init(struct xc_dom_image *dom, xc_interface *xch, domid_t domid)
 {
-    xen_version_op_val_t val = 0;
-
-    if ( xc_version(xch, XEN_VERSION_version, &val, sizeof(val)) < 0 )
-    {
-        xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get Xen version!");
-        return -1;
-    }
-    dom->xen_version = val;
     dom->xch = xch;
     dom->guest_domid = domid;
 
-    if ( xc_version(xch, XEN_VERSION_capabilities, dom->xen_caps,
-                    sizeof(dom->xen_caps)) < 0 )
+    dom->xen_version = xc_version(xch, XENVER_version, NULL);
+    if ( xc_version(xch, XENVER_capabilities, &dom->xen_caps) < 0 )
     {
         xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get xen capabilities");
         return -1;
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 9ebd1d6..050216e 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2084,8 +2084,7 @@ int xc_map_domain_meminfo(xc_interface *xch, int domid,
     _di.guest_width = minfo->guest_width;
 
     /* Get page table levels (see get_platform_info() in xg_save_restore.h */
-    if ( xc_version(xch, XEN_VERSION_capabilities, xen_caps,
-                    sizeof(xen_caps)) < 0 )
+    if ( xc_version(xch, XENVER_capabilities, &xen_caps) )
     {
         PERROR("Could not get Xen capabilities (for page table levels)");
         return -1;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index 631ad91..c41e433 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -457,23 +457,58 @@ int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl)
     return do_sysctl(xch, sysctl);
 }
 
-ssize_t xc_version_len(xc_interface *xch, unsigned int cmd)
+int xc_version(xc_interface *xch, int cmd, void *arg)
 {
-    return do_version_op(xch, cmd, NULL, 0);
-}
-
-int xc_version(xc_interface *xch, unsigned int cmd, void *arg, size_t sz)
-{
-    DECLARE_HYPERCALL_BOUNCE(arg, sz, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(arg, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT); /* Size unknown until cmd decoded */
+    size_t sz;
     int rc;
 
-    if ( xc_hypercall_bounce_pre(xch, arg) )
+    switch ( cmd )
+    {
+    case XENVER_version:
+        sz = 0;
+        break;
+    case XENVER_extraversion:
+        sz = sizeof(xen_extraversion_t);
+        break;
+    case XENVER_compile_info:
+        sz = sizeof(xen_compile_info_t);
+        break;
+    case XENVER_capabilities:
+        sz = sizeof(xen_capabilities_info_t);
+        break;
+    case XENVER_changeset:
+        sz = sizeof(xen_changeset_info_t);
+        break;
+    case XENVER_platform_parameters:
+        sz = sizeof(xen_platform_parameters_t);
+        break;
+    case XENVER_get_features:
+        sz = sizeof(xen_feature_info_t);
+        break;
+    case XENVER_pagesize:
+        sz = 0;
+        break;
+    case XENVER_guest_handle:
+        sz = sizeof(xen_domain_handle_t);
+        break;
+    case XENVER_commandline:
+        sz = sizeof(xen_commandline_t);
+        break;
+    default:
+        ERROR("xc_version: unknown command %d\n", cmd);
+        return -EINVAL;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(arg, sz);
+
+    if ( (sz != 0) && xc_hypercall_bounce_pre(xch, arg) )
     {
         PERROR("Could not bounce buffer for version hypercall");
         return -ENOMEM;
     }
 
-    rc = do_version_op(xch, cmd, HYPERCALL_BUFFER(arg), sz);
+    rc = do_xen_version(xch, cmd, HYPERCALL_BUFFER(arg));
 
     if ( sz != 0 )
         xc_hypercall_bounce_post(xch, arg);
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 5be8fdd..aa8daf1 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -214,12 +214,11 @@ void xc__hypercall_buffer_cache_release(xc_interface *xch);
  * Hypercall interfaces.
  */
 
-static inline long do_version_op(xc_interface *xch, int cmd,
-                                 xc_hypercall_buffer_t *dest, ssize_t len)
+static inline int do_xen_version(xc_interface *xch, int cmd, xc_hypercall_buffer_t *dest)
 {
     DECLARE_HYPERCALL_BUFFER_ARGUMENT(dest);
-    return xencall3(xch->xcall, __HYPERVISOR_version_op,
-                    cmd, HYPERCALL_BUFFER_AS_ARG(dest), len);
+    return xencall2(xch->xcall, __HYPERVISOR_xen_version,
+                    cmd, HYPERCALL_BUFFER_AS_ARG(dest));
 }
 
 static inline int do_physdev_op(xc_interface *xch, int cmd, void *op, size_t len)
diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 2b6c308..c169204 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -56,8 +56,7 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
             return 0;
 
         /* HVM guests have host address width. */
-        if ( xc_version(xch, XEN_VERSION_capabilities, caps,
-                        sizeof(caps)) < 0 )
+        if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
         {
             PERROR("Could not get Xen capabilities");
             return -1;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 291fe9f..f98c827 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -9,7 +9,7 @@
 static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
 {
     xc_interface *xch = ctx->xch;
-    xen_version_op_val_t xen_version;
+    int32_t xen_version = xc_version(xch, XENVER_version, NULL);
     struct xc_sr_ihdr ihdr =
         {
             .marker  = IHDR_MARKER,
@@ -21,16 +21,15 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
         {
             .type       = guest_type,
             .page_shift = XC_PAGE_SHIFT,
+            .xen_major  = (xen_version >> 16) & 0xffff,
+            .xen_minor  = (xen_version)       & 0xffff,
         };
 
-    if ( xc_version(xch, XEN_VERSION_version, &xen_version,
-                    sizeof(xen_version)) < 0 )
+    if ( xen_version < 0 )
     {
         PERROR("Unable to obtain Xen Version");
         return -1;
     }
-    dhdr.xen_major = (xen_version >> 16) & 0xffff;
-    dhdr.xen_minor = (xen_version)       & 0xffff;
 
     if ( write_exact(ctx->fd, &ihdr, sizeof(ihdr)) )
     {
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 007875f..303081d 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -57,12 +57,10 @@ static inline int get_platform_info(xc_interface *xch, uint32_t dom,
     xen_capabilities_info_t xen_caps = "";
     xen_platform_parameters_t xen_params;
 
-    if (xc_version(xch, XEN_VERSION_platform_parameters, &xen_params,
-                   sizeof(xen_params)) < 0)
+    if (xc_version(xch, XENVER_platform_parameters, &xen_params) != 0)
         return 0;
 
-    if (xc_version(xch, XEN_VERSION_capabilities, xen_caps,
-                   sizeof(xen_caps)) < 0)
+    if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
         return 0;
 
     if (xc_maximum_ram_page(xch, max_mfn))
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index d232473..eec899d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5353,71 +5353,50 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
-
-static int libxl__xc_version_wrapper(libxl__gc *gc, unsigned int cmd,
-                                     char *buf, ssize_t len, char **dst)
-{
-    int r;
-
-    r = xc_version(CTX->xch, cmd, buf, len);
-    if ( r == -EPERM ) {
-        buf[0] = '\0';
-    } else if ( r < 0 ) {
-        return r;
-    }
-    *dst = libxl__strdup(NOGC, buf);
-    return 0;
-}
-
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     GC_INIT(ctx);
-    char *buf;
-    xen_version_op_val_t val = 0;
+    union {
+        xen_extraversion_t xen_extra;
+        xen_compile_info_t xen_cc;
+        xen_changeset_info_t xen_chgset;
+        xen_capabilities_info_t xen_caps;
+        xen_platform_parameters_t p_parms;
+        xen_commandline_t xen_commandline;
+    } u;
+    long xen_version;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
         goto out;
 
-    if (xc_version(CTX->xch, XEN_VERSION_pagesize, &val, sizeof(val)) < 0)
-        goto out;
+    xen_version = xc_version(ctx->xch, XENVER_version, NULL);
+    info->xen_version_major = xen_version >> 16;
+    info->xen_version_minor = xen_version & 0xFF;
 
-    info->pagesize = val;
-    /* 4K buffer. */
-    buf = libxl__zalloc(gc, info->pagesize);
+    xc_version(ctx->xch, XENVER_extraversion, &u.xen_extra);
+    info->xen_version_extra = libxl__strdup(NOGC, u.xen_extra);
 
-    val = 0;
-    if (xc_version(CTX->xch, XEN_VERSION_version, &val, sizeof(val)) < 0)
-        goto out;
-    info->xen_version_major = val >> 16;
-    info->xen_version_minor = val & 0xFF;
+    xc_version(ctx->xch, XENVER_compile_info, &u.xen_cc);
+    info->compiler = libxl__strdup(NOGC, u.xen_cc.compiler);
+    info->compile_by = libxl__strdup(NOGC, u.xen_cc.compile_by);
+    info->compile_domain = libxl__strdup(NOGC, u.xen_cc.compile_domain);
+    info->compile_date = libxl__strdup(NOGC, u.xen_cc.compile_date);
 
-    if (libxl__xc_version_wrapper(gc, XEN_VERSION_extraversion, buf,
-                                  info->pagesize, &info->xen_version_extra) < 0)
-        goto out;
+    xc_version(ctx->xch, XENVER_capabilities, &u.xen_caps);
+    info->capabilities = libxl__strdup(NOGC, u.xen_caps);
 
-    info->compiler = libxl__strdup(NOGC, "");
-    info->compile_by = libxl__strdup(NOGC, "");
-    info->compile_domain = libxl__strdup(NOGC, "");
-    info->compile_date = libxl__strdup(NOGC, "");
+    xc_version(ctx->xch, XENVER_changeset, &u.xen_chgset);
+    info->changeset = libxl__strdup(NOGC, u.xen_chgset);
 
-    if (libxl__xc_version_wrapper(gc, XEN_VERSION_capabilities, buf,
-                                  info->pagesize, &info->capabilities) < 0)
-        goto out;
+    xc_version(ctx->xch, XENVER_platform_parameters, &u.p_parms);
+    info->virt_start = u.p_parms.virt_start;
 
-    if (libxl__xc_version_wrapper(gc, XEN_VERSION_changeset, buf,
-                                  info->pagesize, &info->changeset) < 0)
-        goto out;
-
-    val = 0;
-    if (xc_version(CTX->xch, XEN_VERSION_platform_parameters, &val,
-                   sizeof(val)) < 0)
-        goto out;
+    info->pagesize = xc_version(ctx->xch, XENVER_pagesize, NULL);
 
-    info->virt_start = val;
+    xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
+    info->commandline = libxl__strdup(NOGC, u.xen_commandline);
 
-    (void)libxl__xc_version_wrapper(gc, XEN_VERSION_commandline, buf,
-                                    info->pagesize, &info->commandline);
  out:
     GC_FREE;
     return info;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 22741d5..5477df3 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -853,21 +853,21 @@ CAMLprim value stub_xc_version_version(value xch)
 	CAMLparam1(xch);
 	CAMLlocal1(result);
 	xen_extraversion_t extra;
-	xen_version_op_val_t packed;
+	long packed;
 	int retval;
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XEN_VERSION_version, &packed, sizeof(packed));
+	packed = xc_version(_H(xch), XENVER_version, NULL);
 	caml_leave_blocking_section();
 
-	if (retval < 0)
+	if (packed < 0)
 		failwith_xc(_H(xch));
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XEN_VERSION_extraversion, &extra, sizeof(extra));
+	retval = xc_version(_H(xch), XENVER_extraversion, &extra);
 	caml_leave_blocking_section();
 
-	if (retval < 0)
+	if (retval)
 		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(3);
@@ -884,28 +884,37 @@ CAMLprim value stub_xc_version_compile_info(value xch)
 {
 	CAMLparam1(xch);
 	CAMLlocal1(result);
+	xen_compile_info_t ci;
+	int retval;
+
+	caml_enter_blocking_section();
+	retval = xc_version(_H(xch), XENVER_compile_info, &ci);
+	caml_leave_blocking_section();
+
+	if (retval)
+		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(4);
 
-	Store_field(result, 0, caml_copy_string(""));
-	Store_field(result, 1, caml_copy_string(""));
-	Store_field(result, 2, caml_copy_string(""));
-	Store_field(result, 3, caml_copy_string(""));
+	Store_field(result, 0, caml_copy_string(ci.compiler));
+	Store_field(result, 1, caml_copy_string(ci.compile_by));
+	Store_field(result, 2, caml_copy_string(ci.compile_domain));
+	Store_field(result, 3, caml_copy_string(ci.compile_date));
 
 	CAMLreturn(result);
 }
 
 
-static value xc_version_single_string(value xch, int code, void *info, ssize_t len)
+static value xc_version_single_string(value xch, int code, void *info)
 {
 	CAMLparam1(xch);
 	int retval;
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), code, info, len);
+	retval = xc_version(_H(xch), code, info);
 	caml_leave_blocking_section();
 
-	if (retval < 0)
+	if (retval)
 		failwith_xc(_H(xch));
 
 	CAMLreturn(caml_copy_string((char *)info));
@@ -916,8 +925,7 @@ CAMLprim value stub_xc_version_changeset(value xch)
 {
 	xen_changeset_info_t ci;
 
-	return xc_version_single_string(xch, XEN_VERSION_changeset,
-					&ci, sizeof(ci));
+	return xc_version_single_string(xch, XENVER_changeset, &ci);
 }
 
 
@@ -925,8 +933,7 @@ CAMLprim value stub_xc_version_capabilities(value xch)
 {
 	xen_capabilities_info_t ci;
 
-	return xc_version_single_string(xch, XEN_VERSION_capabilities,
-					&ci, sizeof(ci));
+	return xc_version_single_string(xch, XENVER_capabilities, &ci);
 }
 
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 812a905..8411789 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1204,40 +1204,34 @@ static PyObject *pyxc_xeninfo(XcObject *self)
     xen_capabilities_info_t xen_caps;
     xen_platform_parameters_t p_parms;
     xen_commandline_t xen_commandline;
-    xen_version_op_val_t xen_version;
-    xen_version_op_val_t xen_pagesize;
+    long xen_version;
+    long xen_pagesize;
     char str[128];
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_version, &xen_version,
-                    sizeof(xen_version)) < 0 )
-        return pyxc_error_to_exception(self->xc_handle);
+    xen_version = xc_version(self->xc_handle, XENVER_version, NULL);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_extraversion, &xen_extra,
-                    sizeof(xen_extra)) < 0 )
+    if ( xc_version(self->xc_handle, XENVER_extraversion, &xen_extra) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    memset(&xen_cc, 0, sizeof(xen_cc));
+    if ( xc_version(self->xc_handle, XENVER_compile_info, &xen_cc) != 0 )
+        return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_changeset, &xen_chgset,
-                    sizeof(xen_chgset)) < 0 )
+    if ( xc_version(self->xc_handle, XENVER_changeset, &xen_chgset) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_capabilities, &xen_caps,
-                    sizeof(xen_caps)) < 0 )
+    if ( xc_version(self->xc_handle, XENVER_capabilities, &xen_caps) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_platform_parameters,
-                    &p_parms, sizeof(p_parms)) < 0 )
+    if ( xc_version(self->xc_handle, XENVER_platform_parameters, &p_parms) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_commandline,
-                    &xen_commandline, sizeof(xen_commandline)) < 0 )
+    if ( xc_version(self->xc_handle, XENVER_commandline, &xen_commandline) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
     snprintf(str, sizeof(str), "virt_start=0x%"PRI_xen_ulong, p_parms.virt_start);
 
-    if ( xc_version(self->xc_handle, XEN_VERSION_pagesize, &xen_pagesize,
-                    sizeof(xen_pagesize)) < 0 )
+    xen_pagesize = xc_version(self->xc_handle, XENVER_pagesize, NULL);
+    if (xen_pagesize < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
     return Py_BuildValue("{s:i,s:i,s:s,s:s,s:i,s:s,s:s,s:s,s:s,s:s,s:s,s:s}",
diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c
index efb68b5..3495f3f 100644
--- a/tools/xenstat/libxenstat/src/xenstat.c
+++ b/tools/xenstat/libxenstat/src/xenstat.c
@@ -621,18 +621,20 @@ unsigned long long xenstat_network_tdrop(xenstat_network * network)
 /* Collect Xen version information */
 static int xenstat_collect_xen_version(xenstat_node * node)
 {
-	xen_version_op_val_t vnum = 0;
+	long vnum = 0;
 	xen_extraversion_t version;
 
 	/* Collect Xen version information if not already collected */
 	if (node->handle->xen_version[0] == '\0') {
 		/* Get the Xen version number and extraversion string */
-		if (xc_version(node->handle->xc_handle,
-			       XEN_VERSION_version, &vnum, sizeof(vnum)) < 0)
+		vnum = xc_version(node->handle->xc_handle,
+			XENVER_version, NULL);
+
+		if (vnum < 0)
 			return 0;
 
-		if (xc_version(node->handle->xc_handle, XEN_VERSION_extraversion,
-			       &version, sizeof(version)) < 0)
+		if (xc_version(node->handle->xc_handle, XENVER_extraversion,
+			&version) < 0)
 			return 0;
 		/* Format the version information as a string and store it */
 		snprintf(node->handle->xen_version, VERSION_SIZE, "%ld.%ld%s",
diff --git a/tools/xentrace/xenctx.c b/tools/xentrace/xenctx.c
index cd280fc..e647179 100644
--- a/tools/xentrace/xenctx.c
+++ b/tools/xentrace/xenctx.c
@@ -1000,8 +1000,7 @@ static void dump_ctx(int vcpu)
             guest_word_size = (cpuctx.msr_efer & 0x400) ? 8 :
                 guest_protected_mode ? 4 : 2;
             /* HVM guest context records are always host-sized */
-            if (xc_version(xenctx.xc_handle, XEN_VERSION_capabilities,
-                           &xen_caps, sizeof(xen_caps)) < 0) {
+            if (xc_version(xenctx.xc_handle, XENVER_capabilities, &xen_caps) != 0) {
                 perror("xc_version");
                 return;
             }
-- 
2.5.0
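
The comment kept in public/version.h says that XENVER_version and XENVER_pagesize return their result directly, while every other XENVER_ sub-op returns zero on success and fills the caller's buffer. That is why the converted callers test `packed < 0` for the version but `!= 0` for the string sub-ops. A minimal sketch of decoding the packed major:minor (16:16) value follows. The helper names are hypothetical and are not part of libxc.

```c
#include <stdio.h>

/*
 * Hypothetical helpers (not libxc API): split the value returned by
 * xc_version(xch, XENVER_version, NULL), documented as major:minor (16:16).
 */
static long xenver_major(long packed)
{
    return packed >> 16;
}

static long xenver_minor(long packed)
{
    /* Keep all 16 minor bits; a 0xFF mask would drop minors >= 256. */
    return packed & 0xFFFF;
}

/* Build "major.minor<extra>", similar to xenstat's version string. */
static int xenver_format(char *buf, size_t len, long packed, const char *extra)
{
    if (packed < 0)             /* a negative value means the call failed */
        return -1;
    return snprintf(buf, len, "%ld.%ld%s",
                    xenver_major(packed), xenver_minor(packed), extra);
}
```

For example, a packed value of `(4L << 16) | 7` with extraversion `"-rc"` formats as `4.7-rc`.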


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 02/27] Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane."
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 03/27] xsplice: Design document Konrad Rzeszutek Wilk
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Konrad Rzeszutek Wilk

This reverts commit 2716d875379d538c1dfccad78a99ca7db2e09f90.

It was decided that the existing XENVER hypercall, although it has
grown organically over the years, can still be expanded.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
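
After this revert, the FLASK policy comments describe the default access rules for the XENVER_ sub-ops. XENVER_version, XENVER_platform_parameters and XENVER_get_features must always be answered, and normal guests may use every other sub-op except XENVER_commandline. The sketch below models that decision. The function and enum names are hypothetical, and the real checks are made through xsm_xen_version() and flask_xen_version(). Unknown sub-ops are denied here, in the same way that the removed flask_version_op() defaulted to -EPERM.

```c
#include <errno.h>
#include <stdbool.h>

/* Sub-op numbers as defined for the XENVER_* constants in public/version.h. */
enum {
    XV_version = 0, XV_extraversion = 1, XV_compile_info = 2,
    XV_capabilities = 3, XV_changeset = 4, XV_platform_parameters = 5,
    XV_get_features = 6, XV_pagesize = 7, XV_guest_handle = 8,
    XV_commandline = 9,
};

/* Hypothetical default policy: returns 0 if allowed, -EPERM if denied. */
static int default_version_policy(unsigned int cmd, bool privileged)
{
    switch (cmd) {
    case XV_version:
    case XV_platform_parameters:
    case XV_get_features:
        return 0;                   /* guests cannot run without these */
    case XV_extraversion:
    case XV_compile_info:
    case XV_capabilities:
    case XV_changeset:
    case XV_pagesize:
    case XV_guest_handle:
        return 0;                   /* allowed for normal guests by default */
    case XV_commandline:
        return privileged ? 0 : -EPERM;   /* dom0 only */
    default:
        return -EPERM;              /* unknown sub-op */
    }
}
```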
 tools/flask/policy/policy/modules/xen/xen.te |   7 +-
 xen/arch/arm/traps.c                         |   1 -
 xen/arch/x86/hvm/hvm.c                       |   1 -
 xen/arch/x86/x86_64/compat/entry.S           |   2 -
 xen/arch/x86/x86_64/entry.S                  |   2 -
 xen/common/compat/kernel.c                   |   2 -
 xen/common/kernel.c                          | 212 +++++----------------------
 xen/include/public/arch-arm.h                |   2 -
 xen/include/public/version.h                 |  72 +--------
 xen/include/public/xen.h                     |   1 -
 xen/include/xen/hypercall.h                  |   4 -
 xen/include/xsm/dummy.h                      |  21 ---
 xen/include/xsm/xsm.h                        |   6 -
 xen/xsm/dummy.c                              |   1 -
 xen/xsm/flask/hooks.c                        |  35 -----
 xen/xsm/flask/policy/access_vectors          |  21 +--
 16 files changed, 43 insertions(+), 347 deletions(-)
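
This patch inlines get_features() back into XENVER_get_features. The sub-op fills the 32-bit `submap` for the requested `submap_idx`, setting bit N when feature N is present, and rejects any index other than 0 with -EINVAL. The following sketch shows how a caller might test a single bit. The bit positions in the enum are placeholders chosen for illustration and are not the XENFEAT_* values from public/features.h. fill_features() is a hypothetical stand-in for the hypervisor side.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as xen_feature_info_t: an index plus a 32-bit bitmap. */
struct feature_info {
    unsigned int submap_idx;
    uint32_t submap;
};

/* Placeholder bit numbers, NOT the real XENFEAT_* values. */
enum { FEAT_A = 0, FEAT_B = 3, FEAT_C = 9 };

/* Hypothetical hypervisor-side fill: only submap 0 exists. */
static int fill_features(struct feature_info *fi, bool translated)
{
    switch (fi->submap_idx) {
    case 0:
        fi->submap = 1U << FEAT_A;
        if (translated)
            fi->submap |= (1U << FEAT_B) | (1U << FEAT_C);
        return 0;
    default:
        return -EINVAL;
    }
}

/* Caller-side check for one feature bit. */
static bool has_feature(const struct feature_info *fi, unsigned int bit)
{
    return bit < 32 && (fi->submap & (1U << bit)) != 0;
}
```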

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index a551756..2a2630d 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -76,12 +76,11 @@ allow dom0_t xen_t:xen2 {
     get_cpu_featureset
 };
 
-# Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
+# Allow dom0 to use all XENVER_ subops that have checks.
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle xen_commandline
-    extraversion capabilities changeset pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -148,12 +147,10 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all possible except XENVER_commandline, VERSION_changeset,
-# and VERSION_commandline
+# For normal guests all possible except XENVER_commandline.
 allow domain_type xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle
-    extraversion capabilities pagesize guest_handle
 };
 
 ###############################################################################
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 1516abd..9abfc3c 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1274,7 +1274,6 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
-    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e9d4c6b..8cb6e9e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4053,7 +4053,6 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
-    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 0ff6818..6ca4a54 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -399,7 +399,6 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
-        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -451,7 +450,6 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
-        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 6866e8f..d0f3259 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -735,7 +735,6 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
-        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -787,7 +786,6 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
-        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index 7a7ca53..df93fdd 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -39,8 +39,6 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
-CHECK_TYPE(version_op_val);
-
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index af2674d..a4a3c36 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,47 +221,6 @@ void __init do_initcalls(void)
 
 #endif
 
-static int get_features(struct domain *d, xen_feature_info_t *fi)
-{
-    switch ( fi->submap_idx )
-    {
-    case 0:
-        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
-        if ( paging_mode_translate(d) )
-            fi->submap |=
-                (1U << XENFEAT_writable_page_tables) |
-                (1U << XENFEAT_auto_translated_physmap);
-        if ( is_hardware_domain(d) )
-            fi->submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-        if ( VM_ASSIST(d, pae_extended_cr3) )
-            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-        switch ( d->guest_type )
-        {
-        case guest_type_pv:
-            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                          (1U << XENFEAT_highmem_assist) |
-                          (1U << XENFEAT_gnttab_map_avail_bits);
-            break;
-        case guest_type_pvh:
-            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                          (1U << XENFEAT_supervisor_mode_kernel) |
-                          (1U << XENFEAT_hvm_callback_vector);
-            break;
-        case guest_type_hvm:
-            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                          (1U << XENFEAT_hvm_callback_vector) |
-                          (1U << XENFEAT_hvm_pirqs);
-           break;
-        }
-#endif
-        break;
-    default:
-        return -EINVAL;
-    }
-    return 0;
-}
-
 /*
  * Simple hypercalls.
  */
@@ -339,14 +298,47 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        int rc;
+        struct domain *d = current->domain;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        rc = get_features(current->domain, &fi);
-        if ( rc )
-            return rc;
+        switch ( fi.submap_idx )
+        {
+        case 0:
+            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
+            if ( VM_ASSIST(d, pae_extended_cr3) )
+                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+            if ( paging_mode_translate(d) )
+                fi.submap |= 
+                    (1U << XENFEAT_writable_page_tables) |
+                    (1U << XENFEAT_auto_translated_physmap);
+            if ( is_hardware_domain(d) )
+                fi.submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+            switch ( d->guest_type )
+            {
+            case guest_type_pv:
+                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                             (1U << XENFEAT_highmem_assist) |
+                             (1U << XENFEAT_gnttab_map_avail_bits);
+                break;
+            case guest_type_pvh:
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_supervisor_mode_kernel) |
+                             (1U << XENFEAT_hvm_callback_vector);
+                break;
+            case guest_type_hvm:
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_hvm_callback_vector) |
+                             (1U << XENFEAT_hvm_pirqs);
+                break;
+            }
+#endif
+            break;
+        default:
+            return -EINVAL;
+        }
 
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
@@ -389,122 +381,6 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
-/* Computed by capabilities_cache_init. */
-static xen_capabilities_info_t __read_mostly cached_cap;
-static unsigned int __read_mostly cached_cap_len;
-
-/*
- * Similar to HYPERVISOR_xen_version but with a sane interface
- * (has a length, one can probe for the length) and with one less sub-ops:
- * missing XENVER_compile_info.
- */
-DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
-               unsigned int len)
-{
-    union {
-        xen_version_op_val_t val;
-        xen_feature_info_t fi;
-    } u = {};
-    unsigned int sz = 0;
-    const void *ptr = NULL;
-    int rc = xsm_version_op(XSM_OTHER, cmd);
-
-    if ( rc )
-        return rc;
-
-    /*
-     * The HYPERVISOR_xen_version sub-ops differ in that some return the value,
-     * and some copy it on back on argument. We follow the same rule for all
-     * sub-ops: return the number of bytes written, or negative errno on
-     * failure, and always copy the result in arg. Yeey sanity!
-     */
-    switch ( cmd )
-    {
-    case XEN_VERSION_version:
-        sz = sizeof(xen_version_op_val_t);
-        u.val = (xen_major_version() << 16) | xen_minor_version();
-        break;
-
-    case XEN_VERSION_extraversion:
-        sz = strlen(xen_extra_version()) + 1;
-        ptr = xen_extra_version();
-        break;
-
-    case XEN_VERSION_capabilities:
-        sz = cached_cap_len;
-        ptr = cached_cap;
-        break;
-
-    case XEN_VERSION_changeset:
-        sz = strlen(xen_changeset()) + 1;
-        ptr = xen_changeset();
-        break;
-
-    case XEN_VERSION_platform_parameters:
-        sz = sizeof(xen_version_op_val_t);
-        u.val = HYPERVISOR_VIRT_START;
-        break;
-
-    case XEN_VERSION_get_features:
-        sz = sizeof(xen_feature_info_t);
-
-        if ( guest_handle_is_null(arg) )
-            break;
-
-        if ( copy_from_guest(&u.fi, arg, 1) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-        rc = get_features(current->domain, &u.fi);
-        break;
-
-    case XEN_VERSION_pagesize:
-        sz = sizeof(xen_version_op_val_t);
-        u.val = PAGE_SIZE;
-        break;
-
-    case XEN_VERSION_guest_handle:
-        sz = ARRAY_SIZE(current->domain->handle);
-        ptr = current->domain->handle;
-        break;
-
-    case XEN_VERSION_commandline:
-        sz = strlen(saved_cmdline) + 1;
-        ptr = saved_cmdline;
-        break;
-
-    default:
-        rc = -ENOSYS;
-    }
-
-    if ( rc )
-        return rc;
-
-    /*
-     * This hypercall also allows the client to probe. If it provides
-     * a NULL arg we will return the size of the space it has to
-     * allocate for the specific sub-op.
-     */
-    ASSERT(sz);
-    if ( guest_handle_is_null(arg) )
-        return sz;
-
-    if ( !rc )
-    {
-        unsigned int bytes = min(sz, len);
-
-        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
-            rc = -EFAULT;
-
-        /* We return len (truncate) worth of data even if we fail. */
-        if ( !rc && sz > len )
-            rc = -ENOBUFS;
-    }
-
-    return rc == 0 ? sz : rc;
-}
-
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
@@ -542,20 +418,6 @@ DO(ni_hypercall)(void)
     return -ENOSYS;
 }
 
-static int __init capabilities_cache_init(void)
-{
-    /*
-     * Pre-populate the cache so we do not have to worry about
-     * simultaneous invocations on safe_strcat by guests and the cache
-     * data becoming garbage.
-     */
-    arch_get_xen_caps(&cached_cap);
-    cached_cap_len = strlen(cached_cap) + 1;
-
-    return 0;
-}
-__initcall(capabilities_cache_init);
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 5f90718..870bc3b 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,8 +128,6 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
- *  HYPERVISOR_version_op
- *   All generic sub-operations
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 78961c9..24a582f 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,16 +30,7 @@
 
 #include "xen.h"
 
-/*
- * There are two hypercalls mentioned in here. The XENVER_ are for
- * HYPERCALL_xen_version (17), while VERSION_ are for the
- * HYPERCALL_version_op (41).
- *
- * The subops are very similar except that the later hypercall has a
- * sane interface.
- *
- * NB. All XENVER_ ops return zero on success, except XENVER_{version,pagesize}
- */
+/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -96,67 +87,6 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
-/*
- * The HYPERCALL_version_op has a set of sub-ops which mirror the
- * sub-ops of HYPERCALL_xen_version. However this hypercall differs
- * radically from the former:
- *  - It returns the amount of bytes copied, or
- *  - It will return -XEN_EPERM if the sub-op is denied to the guest.
- *    (Albeit XEN_VERSION_version, XEN_VERSION_platform_parameters, and
- *    XEN_VERSION_get_features will always return an value as guest cannot
- *    survive without this information).
- *  - It will return the requested data in arg.
- *  - It requires an third argument (len) for the length of the
- *    arg. Naturally the arg has to fit the requested data otherwise
- *    -XEN_ENOBUFS is returned.
- *
- * It also offers a mechanism to probe for the amount of bytes an
- * sub-op will require. Having the arg have a NULL handle will
- * return the number of bytes requested for the operation.
- * Or a negative value if an error is encountered.
- */
-
-typedef uint64_t xen_version_op_val_t;
-DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
-
-/*
- * arg == xen_version_op_val_t. Encoded as major:minor (31..16:15..0), while
- * 63..32 are zero.
- */
-#define XEN_VERSION_version             0
-
-/* arg == char[]. Contains NUL terminated utf-8 string. */
-#define XEN_VERSION_extraversion        1
-
-/* arg == char[]. Contains NUL terminated utf-8 string. */
-#define XEN_VERSION_capabilities        3
-
-/* arg == char[]. Contains NUL terminated utf-8 string. */
-#define XEN_VERSION_changeset           4
-
-/* arg == xen_version_op_val_t. */
-#define XEN_VERSION_platform_parameters 5
-
-/*
- * arg = xen_feature_info_t - shares the same structure
- * as the XENVER_get_features.
- */
-#define XEN_VERSION_get_features        6
-
-/* arg == xen_version_op_val_t. */
-#define XEN_VERSION_pagesize            7
-
-/*
- * arg == void.
- *
- * The toolstack fills it out for guest consumption. It is intended to hold
- * the UUID of the guest.
- */
-#define XEN_VERSION_guest_handle        8
-
-/* arg = char[]. Contains NUL terminated utf-8 string. */
-#define XEN_VERSION_commandline         9
-
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 6ed74ef..37bbb22 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,7 +115,6 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
-#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index e8d2b81..0c8ae0e 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -147,10 +147,6 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
-extern long
-do_version_op(unsigned int cmd,
-    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
-
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index e5dad35..abbe282 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -751,24 +751,3 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
-
-static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
-{
-    XSM_ASSERT_ACTION(XSM_OTHER);
-    switch ( op )
-    {
-    case XEN_VERSION_version:
-    case XEN_VERSION_platform_parameters:
-    case XEN_VERSION_get_features:
-        /* These MUST always be accessible to any guest by default. */
-        return 0;
-    case XEN_VERSION_extraversion:
-    case XEN_VERSION_capabilities:
-    case XEN_VERSION_pagesize:
-    case XEN_VERSION_guest_handle:
-        /* These can be accessible to a guest. */
-        return xsm_default_action(XSM_HOOK, current->domain, NULL);
-    default:
-        return xsm_default_action(XSM_PRIV, current->domain, NULL);
-    }
-}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3cfd953..8ed8ee5 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -197,7 +197,6 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
-    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -741,11 +740,6 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
-static inline int xsm_version_op (xsm_default_t def, uint32_t op)
-{
-    return xsm_ops->version_op(op);
-}
-
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 776dd09..9791ad4 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,5 +163,4 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
-    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 233612e..6295768 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1664,40 +1664,6 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
-static int flask_version_op (uint32_t op)
-{
-    u32 dsid = domain_sid(current->domain);
-
-    switch ( op )
-    {
-    case XEN_VERSION_version:
-    case XEN_VERSION_platform_parameters:
-    case XEN_VERSION_get_features:
-        /* These MUST always be accessible to any guest by default. */
-        return 0;
-    case XEN_VERSION_extraversion:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__EXTRAVERSION, NULL);
-    case XEN_VERSION_capabilities:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__CAPABILITIES, NULL);
-    case XEN_VERSION_changeset:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__CHANGESET, NULL);
-    case XEN_VERSION_pagesize:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__PAGESIZE, NULL);
-    case XEN_VERSION_guest_handle:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__GUEST_HANDLE, NULL);
-    case XEN_VERSION_commandline:
-        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
-                            VERSION__COMMANDLINE, NULL);
-    default:
-        return -EPERM;
-    }
-}
-
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1837,7 +1803,6 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
-    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 0ebb56b..bdb7b89 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -500,14 +500,12 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ and VERSION hypercall.
+# Class version is used to describe the XENVER_ hypercall.
 # Almost all sub-ops are described here - in the default case all of them should
-# be allowed except the XENVER_commandline, VERSION_commandline, and
-# VERSION_changeset.
+# be allowed except the XENVER_commandline.
 #
 # The ones that are omitted are XENVER_version, XENVER_platform_parameters,
-# XENVER_get_features, XEN_VERSION_version, XEN_VERSION_platform_parameters,
-# and XEN_VERSION_get_features - as they MUST always be returned to a guest.
+# and XENVER_get_features  - as they MUST always be returned to a guest.
 #
 class version
 {
@@ -525,17 +523,4 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
-# --- VERSION hypercall ---
-# Extra informations (-unstable).
-    extraversion
-# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
-    capabilities
-# Source code changeset.
-    changeset
-# Page size the hypervisor uses.
-    pagesize
-# An value that the control stack can choose.
-    guest_handle
-# Xen command line.
-    commandline
 }
-- 
2.5.0
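
The reverted HYPERVISOR_version_op differed from XENVER_ mainly in its buffer protocol:

- A NULL handle probes for the required size.
- A short buffer receives a truncated copy, and the call returns -ENOBUFS.
- On success the call returns the number of bytes copied.

The sketch below is a user-space model of that copy-out step from the removed DO(version_op). memcpy() stands in for copy_to_guest(), and the function name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the removed version_op copy-out logic.
 *   src/sz : data produced by the sub-op and its size in bytes
 *   dst/len: caller's buffer; dst == NULL means "probe for size"
 * Returns sz on success, or a negative errno.
 */
static long version_op_copy(void *dst, size_t len, const void *src, size_t sz)
{
    size_t bytes;

    if (dst == NULL)                /* probe: report the required size */
        return (long)sz;

    bytes = sz < len ? sz : len;
    memcpy(dst, src, bytes);        /* truncated copy when len < sz */

    if (sz > len)
        return -ENOBUFS;            /* caller's buffer was too small */

    return (long)sz;
}
```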



* [PATCH v9 03/27] xsplice: Design document
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 02/27] Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane." Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

A mechanism is required to binarily patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v1-2: review
v3: Split document in v1 and v2 (todo) to simplify implementation goals.
 - Add const on some structures. Truncate size to uint16_t where it makes sense.
 - Convert 'id' to 'name', Add Ross's comments about what is implemented.
 - Wei's and Ross's reviews.
 - Jan's review comments.
 - Jan's review comments.
    s/int32_t state/uint32_t state/ now that return code is in separate
    field (rc). Add various other types, such as R_X86_64_PC64 in the list.
    Mention the need for compiler check.
v4:
 - Drop the LOADED->CHECKED state and go directly to CHECKED state. Drop
    LOADED.
v5: Julien mentioned ARM 32-bit would not use ELF64, so make the .xsplice.func
    use uintXX_t types instead of ELF ones. Remove the OUT on idx subfield.
    Mention that 'nr' being zero can be used for probing the number of payloads.
    Update what 'idx' means.
v6: Update what 'idx' means again!
    Move the "Interdependencies section" to make it easier to in the design
    doc the movement of text (when the patch implements it).
    Add also 'version' field to payload.
v9:
   uint64_t to void in struct xsplice_patch_func. Also mention the size
   on 32 and 64 bit hypervisors.
   Make the padding be called opaque.
   Add symbol+offset to Todo
---
 docs/misc/xsplice.markdown | 1047 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1047 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..99711bf
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,1047 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binarily patch the running hypervisor with new
+opcodes that have come about primarily due to security updates.
+
+This document describes the design of the API that would allow us to
+upload binary patches to the hypervisor.
+
+The document is split into four sections:
+
+ * Detailed descriptions of the problem statement.
+ * Design of the data structures.
+ * Design of the hypercalls.
+ * Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## History
+
+The document has gone through various reviews and only covers the v1 design.
+
+The end of the document has a section titled `Not Yet Done` which
+outlines ideas and design for the future version of this work.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible enough to patch the hypervisor in multiple
+ways and be as simple as possible. The compiled code is contiguous in memory
+with no gaps, so we do not have the luxury of 'moving' existing code: we must
+either insert a trampoline to the new code to be executed, or modify the code
+in place if there is sufficient space. The placement of new code has to be
+done by the hypervisor and the virtual address for the new code is allocated
+dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in the hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampolines, etc.
+have to be supported. The API should allow reading and writing any memory
+within the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches when they are available.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+
+## Patching code
+
+The first mechanism to patch that comes to mind is in-place replacement,
+that is, replacing the affected code with new code. Unfortunately x86
+instructions are of variable length, which limits how much space we have
+available to replace the instructions. That is not a problem if the change is
+smaller than the original opcode and we can fill the rest with nops. Problems
+will appear if the replacement code is longer.
+
+The second mechanism is to replace the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function. N.B. The Xen hypervisor implements the third
+mechanism. See `Trampoline (e9 opcode)` section for more details.
+
+### Example of trampoline and in-place splicing
+
+As an example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks like this:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside `arch_do_domctl`. The change adds 24 extra bytes of code
+(three 8-byte `movq` instructions), which alters all the offsets inside the
+function. Even if we adjusted these offsets, we might not have enough space
+in .text to squeeze in the extra 24 bytes.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+do_domctl:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`,
+which is not called directly by the hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+Patching the offset in `hypercall_table` for `do_domctl`
+(`ffff82d080103079 <do_domctl>:`)
+
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+
+with the address of the new `do_domctl` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_domctl`.
+
+In summary this example patched the callers of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+However it has drawbacks: the safety checks which have to make sure
+the function is not on the stack must also check every caller. For some
+patches this could mean, if there were a sufficiently large number of
+callers, that we would never be able to apply the update.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+reduces the number of locations to be patched to one, but it puts pressure
+on the CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
+in `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). Patching the offset in `hypercall_table` for the old
+`do_xen_version` (`ffff82d080112f9e <do_xen_version>`)
+
+<pre>
+ ffff82d08024b270 <hypercall_table>:   
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+
+with the address of the new `do_xen_version` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to insert a trampoline in the
+old `do_xen_version` function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
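The five-byte `jmpq` above (opcode `e9` followed by a 32-bit displacement) can be encoded as follows. This is a minimal sketch, assuming the standard x86-64 `jmp rel32` encoding in which the displacement is relative to the address of the next instruction; the helper name `build_trampoline` is hypothetical and not part of the xSplice interface.

```c
#include <assert.h>
#include <stdint.h>

#define TRAMPOLINE_SIZE 5   /* e9 opcode + 4-byte displacement */

/*
 * Hypothetical helper: encode "jmpq new_addr" for placement at old_addr.
 * The rel32 displacement is measured from the end of the 5-byte jump.
 * Returns 0 on success, -1 if new_addr is out of rel32 range.
 */
static int build_trampoline(uint8_t insn[TRAMPOLINE_SIZE],
                            uint64_t old_addr, uint64_t new_addr)
{
    /* Unsigned wrap-around, then reinterpret as a signed distance. */
    int64_t rel = (int64_t)(new_addr - (old_addr + TRAMPOLINE_SIZE));
    uint32_t u;

    if ( rel < INT32_MIN || rel > INT32_MAX )
        return -1;

    u = (uint32_t)(int32_t)rel;
    insn[0] = 0xe9;                  /* jmp rel32 */
    insn[1] = u & 0xff;              /* displacement, little-endian */
    insn[2] = (u >> 8) & 0xff;
    insn[3] = (u >> 16) & 0xff;
    insn[4] = (u >> 24) & 0xff;
    return 0;
}
```

For example, with `old_addr` at `ffff82d080112f9e` and the new `do_xen_version` 0x1000 bytes higher, the displacement is 0x1000 - 5 = 0xffb, giving the bytes `e9 fb 0f 00 00`. Targets more than 2GiB away cannot be reached with a single `e9` jump, which is why the sketch rejects them.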
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to a function
+   providing a BUG_ON to catch errant code.
+
+The disadvantage of this is that the unconditional jump incurs a small
+I-cache penalty. However the simplicity of the patching and higher chance
+of passing safety checks make this a worthwhile option.
+
+This patching has a similar drawback as inline patching - the safety
+checks have to make sure the function is not on the stack. However
+since we are replacing at a higher level (a full function as opposed
+to various offsets within functions) the checks are simpler.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem as well.
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and that its
+integrity is intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking, the
+signature checking will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code that will be patched in.
+
+The payload could be constructed using a custom binary format, but that
+has severe disadvantages, since such a format:
+
+ * might need to be changed, and we would need a mechanism to accommodate
+   that.
+ * has to be platform agnostic.
+ * has to be easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+An earlier design attempted to explain the relations of the ELF sections
+to each other without using proper ELF mechanisms (sh_info, sh_link, data
+structures using Elf types, etc). This design will explain the structures
+and how they are used together and not dig into the ELF format, except to
+mention that the section names should match the structure names.
+
+The xSplice payload is a relocatable ELF binary. A typical binary would have:
+
+ * One or more .text sections.
+ * Zero or more read-only data sections.
+ * Zero or more data sections.
+ * Relocations for each of these sections.
+
+It may also have some architecture-specific sections. For example:
+
+ * Alternatives instructions.
+ * Bug frames.
+ * Exception tables.
+ * Relocations for each of these sections.
+
+The xSplice core code loads the payload as a standard ELF binary, relocates it
+and handles the architecture-specific sections as needed. This process is much
+like what the Linux kernel module loader does.
+
+The payload contains a section (xsplice_patch_func) with an array of structures
+describing the functions to be patched:
+
+<pre>
+struct xsplice_patch_func {  
+    const char *name;  
+    void *new_addr;  
+    void *old_addr;  
+    uint32_t new_size;  
+    uint32_t old_size;  
+    uint8_t version;  
+    uint8_t opaque[31];  
+};  
+</pre>
+
+The size of the structure is 64 bytes on 64-bit hypervisors. It will be
+52 on 32-bit hypervisors.
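The two sizes follow from the field types. A minimal compile-time check, assuming a C11 compiler and the usual 64-bit (LP64) or 32-bit (ILP32) data model, restates the structure and asserts that it has no hidden padding; it is an illustration, not code from the hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The structure exactly as given in the design above. */
struct xsplice_patch_func {
    const char *name;
    void *new_addr;
    void *old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t version;
    uint8_t opaque[31];
};

/*
 * Three pointers, two uint32_t, then 1 + 31 bytes of version/opaque:
 * 3 * 8 + 8 + 32 = 64 on a 64-bit build, 3 * 4 + 8 + 32 = 52 on 32-bit.
 * Both totals are multiples of the pointer alignment, so no tail padding
 * is expected.
 */
#define XSPLICE_PATCH_FUNC_SIZE (3 * sizeof(void *) + 2 * sizeof(uint32_t) + 32)

_Static_assert(sizeof(struct xsplice_patch_func) == XSPLICE_PATCH_FUNC_SIZE,
               "unexpected padding in struct xsplice_patch_func");
_Static_assert(offsetof(struct xsplice_patch_func, version) ==
               3 * sizeof(void *) + 2 * sizeof(uint32_t),
               "version must directly follow the two size fields");
```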
+
+* `name` is the symbol name of the old function. It is only used if `old_addr`
+   is zero, in which case it is used during dynamic linking (when the
+   hypervisor loads the payload) to resolve the address.
+
+* `old_addr` is the address of the function to be patched and is filled in at
+  payload generation time if hypervisor function address is known. If unknown,
+  the value *MUST* be zero and the hypervisor will attempt to resolve the address.
+
+* `new_addr` is the address of the function that is replacing the old
+  function. The address is filled in during relocation. The value **MUST** be
+  the address of the new function in the file.
+
+* `old_size` and `new_size` contain the sizes of the respective functions in bytes.
+   The value of `old_size` **MUST** not be zero.
+
+* `version` is to be one.
+
+* `opaque` **MUST** be zero.
+
+The size of the `xsplice_patch_func` array is determined from the ELF section
+size.
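The per-field rules above, together with the section-size rule, could be checked along these lines. This is a sketch: `xsplice_nfuncs` and `xsplice_check_func` are hypothetical names, and treating a section size that is not a whole number of entries as malformed is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xsplice_patch_func {
    const char *name;
    void *new_addr;
    void *old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t version;
    uint8_t opaque[31];
};

/*
 * Number of entries in the array, derived from the ELF section size
 * (sh_size).  Returns 0 for an empty section or one whose size is not a
 * whole number of entries; treating that as malformed is an assumption.
 */
static size_t xsplice_nfuncs(uint64_t sh_size)
{
    if ( sh_size == 0 || sh_size % sizeof(struct xsplice_patch_func) )
        return 0;
    return sh_size / sizeof(struct xsplice_patch_func);
}

/*
 * Per-entry rules from the field descriptions: version is one, old_size
 * is non-zero and opaque is all zeroes.  Requiring a name when old_addr
 * is zero is inferred from the description of `name`.
 * Returns 0 if the entry is acceptable, -1 otherwise.
 */
static int xsplice_check_func(const struct xsplice_patch_func *f)
{
    size_t i;

    if ( f->version != 1 || f->old_size == 0 )
        return -1;
    if ( f->old_addr == NULL && f->name == NULL )
        return -1;
    for ( i = 0; i < sizeof(f->opaque); i++ )
        if ( f->opaque[i] )
            return -1;
    return 0;
}
```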
+
+When applying the patch the hypervisor iterates over each `xsplice_patch_func`
+structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
+The `new_addr` is altered when the ELF payload is loaded.
+
+When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
+and the core code copies the data from the undo buffer (private internal copy)
+to `old_addr`.
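Applying and reverting one entry can be sketched with an undo buffer as follows. The sketch is hypothetical (`applied_func`, `apply_func` and `revert_func` are made-up names) and deliberately ignores what the real hypervisor must also handle: making .text temporarily writable and performing the write while all CPUs are quiesced. It also assumes `new_addr` is within the +/-2GiB reach of a `jmp rel32`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5   /* size of the e9 rel32 trampoline */

/* Hypothetical bookkeeping the core code could keep per patched function. */
struct applied_func {
    uint8_t *old_code;                 /* writable view of old_addr */
    uint8_t undo[PATCH_INSN_SIZE];     /* original bytes, for revert */
};

/*
 * Save the bytes at old_code into the undo buffer, then overwrite them
 * with "jmp new_addr".  old_addr is the address the code runs at; it is
 * kept separate from old_code so the sketch can run on a plain buffer.
 * Assumes new_addr is within rel32 range of old_addr.
 */
static void apply_func(struct applied_func *a, uint8_t *old_code,
                       uint64_t old_addr, uint64_t new_addr)
{
    uint32_t rel = (uint32_t)(new_addr - (old_addr + PATCH_INSN_SIZE));

    a->old_code = old_code;
    memcpy(a->undo, old_code, PATCH_INSN_SIZE);
    old_code[0] = 0xe9;
    old_code[1] = rel & 0xff;
    old_code[2] = (rel >> 8) & 0xff;
    old_code[3] = (rel >> 16) & 0xff;
    old_code[4] = (rel >> 24) & 0xff;
}

/* Revert: copy the saved bytes back over the trampoline. */
static void revert_func(const struct applied_func *a)
{
    memcpy(a->old_code, a->undo, PATCH_INSN_SIZE);
}
```

Only the first five bytes of the old function are touched, so the undo buffer never needs to be larger than the trampoline itself.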
+
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload the payloads.
+ * listing of payloads summary uploaded and their state.
+ * getting a particular payload summary and its state.
+ * command to apply, delete, or revert the payload.
+
+Most of the actions are asynchronous, therefore the caller is responsible
+for verifying that an action has been applied properly by retrieving the
+summary and verifying that there are no error codes associated with the payload.
+
+We **MUST** make some of them asynchronous because patching requires
+every physical CPU to be in lock-step with the others.
+The patching mechanism, while an implementation detail, is not a short
+operation and as such the design **MUST** assume it will be a long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such a unique name per payload has to be visible to allow proper manipulation.
+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations and one padding field which
+*MUST* always be zero.
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;                   /* IN: XEN_SYSCTL_XSPLICE_*. */  
+    uint32_t pad;                   /* IN: Always zero. */  
+    union {  
+        ... see below ...  
+    } u;  
+};  
+
+</pre>
+The rest of the hypercall-specific structures are part of this structure (in the union `u`).
+
+### Basic type: struct xen_xsplice_name
+
+Most of the hypercalls employ a shared structure called `struct xen_xsplice_name`
+which contains:
+
+ * `name` - pointer where the string for the name is located.
+ * `size` - the size of the string
+ * `pad` - padding - to be zero.
+
+The structure is as follows:
+
+<pre>
+/*  
+ *  Uniquely identifies the payload.  Should be human readable.  
+ * Includes the NUL terminator  
+ */  
+#define XEN_XSPLICE_NAME_SIZE 128  
+struct xen_xsplice_name {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint16_t size;                          /* IN, size of name. May be up to   
+                                               XEN_XSPLICE_NAME_SIZE. */  
+    uint16_t pad[3];                        /* IN: MUST be zero. */ 
+};  
+</pre>
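A hypervisor-side check of this structure might look like the sketch below. It is illustrative only: the guest handle is replaced by a plain pointer, the names `xsplice_name_sketch` and `xsplice_name_check` are hypothetical, and it assumes that `size` counts the NUL terminator, in line with the comment on **XEN_XSPLICE_NAME_SIZE**.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XEN_XSPLICE_NAME_SIZE 128

/*
 * Simplified stand-in for struct xen_xsplice_name: the guest handle is
 * replaced by a plain pointer, since XEN_GUEST_HANDLE_64 is only
 * meaningful inside the Xen headers.
 */
struct xsplice_name_sketch {
    const char *name;
    uint16_t size;
    uint16_t pad[3];
};

/*
 * Hypothetical check.  Assumes `size` counts the NUL terminator, so the
 * last byte must be '\0' and there must be no earlier NUL.
 * Returns 0 if valid, -1 otherwise.
 */
static int xsplice_name_check(const struct xsplice_name_sketch *n)
{
    if ( n->pad[0] || n->pad[1] || n->pad[2] )
        return -1;
    /* At least one character plus the NUL, and no more than the maximum. */
    if ( n->size < 2 || n->size > XEN_XSPLICE_NAME_SIZE )
        return -1;
    if ( n->name[n->size - 1] != '\0' )
        return -1;
    if ( strlen(n->name) != (size_t)n->size - 1 )
        return -1;
    return 0;
}
```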
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `name` could be a UUID that stays fixed forever for a given
+payload. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was successfully uploaded.
+Otherwise a -XEN_EXX return value is provided. Duplicate names are not supported.
+
+The `payload` is the ELF payload as described in the `Design of payload format` section.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_name_t name;            /* IN, name of the patch. */  
+    uint64_t size;                      /* IN, size of the ELF file. */  
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+};  
+</pre>
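The "basic checks" are not spelled out here. As one plausible first pass, and purely as an assumption, a hypervisor could reject anything that is not a relocatable 64-bit ELF object whose section header table fits inside `size`. The sketch below uses the `<elf.h>` definitions found on Linux/glibc systems; `xsplice_payload_basic_check` is a hypothetical name, and checks on the machine type and section contents are omitted.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical first-pass checks on an uploaded payload.  Only the
 * 64-bit case is shown.  Returns 0 if the checks pass, -1 otherwise.
 */
static int xsplice_payload_basic_check(const uint8_t *payload, uint64_t size)
{
    Elf64_Ehdr hdr;

    if ( size < sizeof(hdr) )
        return -1;
    memcpy(&hdr, payload, sizeof(hdr));    /* avoid unaligned access */

    if ( memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 )
        return -1;                         /* not an ELF file */
    if ( hdr.e_ident[EI_CLASS] != ELFCLASS64 )
        return -1;
    if ( hdr.e_type != ET_REL )
        return -1;                         /* payload must be relocatable */
    if ( hdr.e_shentsize != sizeof(Elf64_Shdr) )
        return -1;
    /* The section header table must lie entirely inside the payload. */
    if ( hdr.e_shoff > size ||
         (uint64_t)hdr.e_shnum * hdr.e_shentsize > size - hdr.e_shoff )
        return -1;
    return 0;
}
```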
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve the status of a specific payload. The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * A `struct xen_xsplice_status` structure. The member values will
+   be over-written upon completion.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The return value of the hypercall is zero on success and -XEN_EXX on failure.
+(Note that the `rc` value can be different from the return value, as in
+rc=-XEN_EAGAIN and return value can be 0).
+
+For example, suppose there is a payload:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: 0
+</pre>
+
+We apply an action, XSPLICE_ACTION_REVERT, to revert it (which won't work
+as we have not even applied it). Afterwards we will have:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: -XEN_EINVAL
+</pre>
+
+It has failed but it remains loaded.
+
+This operation is synchronous and does not require preemption.
+
+The structure is as follows:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_CHECKED      1  
+#define XSPLICE_STATUS_APPLIED      2  
+    uint32_t state;                 /* OUT: XSPLICE_STATUS_*. */  
+    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
+};  
+
+struct xen_sysctl_xsplice_get {  
+    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version` - version of the payload list. The caller should re-use the value
+    provided by the hypervisor. If the value differs, the data is stale.
+ * `idx` - index iterator into the hypervisor's list of payloads. It is
+    recommended that zero be used on the first invocation, so that the
+    hypervisor updates `nr` with the remaining payload count and `version`
+    with the most current value.
+ * `nr` the max number of entries to populate. Can be zero which will result
+    in the hypercall being a probing one and return the number of payloads
+    (and update the `version`).
+ * `pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. Caller *MUST* allocate up to `nr` of them.
+ * `name` - virtual address of where to write the unique name of the payload.
+   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size. Note that **XEN_XSPLICE_NAME_SIZE** includes
+   the NUL terminator.
+ * `len` - virtual address of where to write the length of each unique name
+   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns a positive number, it is the number (up to `nr`
+provided to the hypercall) of the payloads returned, along with `nr` updated
+with the number of remaining payloads and `version` updated (it may be the same
+across hypercalls; if it varies the data is stale and further calls could
+fail). The `status`, `name`, and `len` are updated at their designated index
+value (`idx`) with the returned data.
+
+If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
+lowered.
+
+If the hypercall returns zero there are no more payloads.
+
+Note that due to the asynchronous nature of hypercalls the control domain might
+have added or removed a number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. If the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
+The `struct xen_xsplice_status` structure contains the status of a payload, which includes:
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* OUT: Hypervisor stamps value.
+                                               If varies between calls, we are  
+                                               getting stale data. */  
+    uint32_t idx;                           /* IN: Index into hypervisor list. */
+    uint32_t nr;                            /* IN: How many status, names, and len  
+                                               should be filled out. Can be zero to get  
+                                               amount of payloads and version.  
+                                               OUT: How many payloads left. */  
+    uint32_t pad;                           /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocated for nr of them. */  
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of names. Each member  
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.  
+                                               Must have nr of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.  
+                                               Must have nr of them. */  
+};  
+</pre>
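The toolstack side of the `version`/`idx`/`nr` protocol can be sketched against a simulated hypervisor. Everything here is hypothetical: `fake_hv`, `fake_list` and `list_all` stand in for the real list and hypercall, guest handles become plain pointers, `status` is left out for brevity, and it is assumed that each call writes its entries from the start of the supplied buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XEN_XSPLICE_NAME_SIZE 128
#define FAKE_MAX 8

/*
 * Hypothetical stand-in for the hypervisor's payload list (names are
 * assumed shorter than XEN_XSPLICE_NAME_SIZE).  If mutate_at is non-zero,
 * `extra` is added and `version` bumped at the start of that call,
 * simulating another toolstack uploading a payload.
 */
struct fake_hv {
    uint32_t version;
    uint32_t count;
    const char *names[FAKE_MAX];
    uint32_t calls;
    uint32_t mutate_at;
    const char *extra;
};

/*
 * Mock of XEN_SYSCTL_XSPLICE_LIST.  Writes up to *nr entries starting at
 * idx, sets *nr to the number of payloads left after them and *version
 * to the current stamp.  Returns the number of entries written.
 */
static int fake_list(struct fake_hv *hv, uint32_t idx, uint32_t *nr,
                     uint32_t *version, char (*name)[XEN_XSPLICE_NAME_SIZE],
                     uint32_t *len)
{
    uint32_t n = 0;

    if ( ++hv->calls == hv->mutate_at && hv->count < FAKE_MAX )
    {
        hv->names[hv->count++] = hv->extra;
        hv->version++;
    }

    while ( idx + n < hv->count && n < *nr )
    {
        const char *s = hv->names[idx + n];

        len[n] = (uint32_t)strlen(s) + 1;   /* length includes the NUL */
        memcpy(name[n], s, len[n]);
        n++;
    }

    *nr = idx + n < hv->count ? hv->count - (idx + n) : 0;
    *version = hv->version;
    return (int)n;
}

/*
 * Toolstack loop: fetch names in batches of `batch`, restarting from
 * index 0 whenever `version` changes.  Returns the number of names
 * stored in `out`.  A real toolstack would also bound the restarts.
 */
static uint32_t list_all(struct fake_hv *hv, uint32_t batch,
                         char (*out)[XEN_XSPLICE_NAME_SIZE], uint32_t *lens,
                         uint32_t max)
{
    uint32_t done = 0, version = 0;
    int have_version = 0;

    while ( done < max )
    {
        uint32_t nr = max - done < batch ? max - done : batch;
        uint32_t v;
        int rc = fake_list(hv, done, &nr, &v, &out[done], &lens[done]);

        if ( have_version && v != version )
        {
            done = 0;            /* discard stale data and start over */
            have_version = 0;
            continue;
        }
        version = v;
        have_version = 1;

        if ( rc <= 0 )           /* no more payloads (or an error) */
            break;
        done += (uint32_t)rc;
        if ( nr == 0 )           /* hypervisor reports nothing left */
            break;
    }
    return done;
}
```

If another toolstack adds a payload between two calls, the later call reports a new `version`, so the loop discards what it has collected and starts again from `idx` zero.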
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `name` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` `name` containing the unique name.
+ * `cmd` the command requested:
+  * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+    This also verifies the payload, which may require SecureBoot firmware
+    calls. This is the initial state a payload is in.
+  * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+   Any further hypercalls against the `name` will result in failure unless
+   **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is performed with same `name`.
+  * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+  more time than the upper bound of time the `rc` in `xen_xsplice_status`
+  retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+  more time than the upper bound of time the `rc` in `xen_xsplice_status`
+  retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+  payload. If the operation takes more time than the upper bound of time
+  the `rc` in `xen_xsplice_status` retrieved via **XEN_SYSCTL_XSPLICE_GET**
+  will be -XEN_EBUSY.
+ * `time` the upper bound of time (ms) the cmd should take. Zero means infinite.
+   If the operation does not succeed within that time, it goes into an
+   error state.
+ * `pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follows:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1  
+#define XSPLICE_ACTION_UNLOAD  2  
+#define XSPLICE_ACTION_REVERT  3  
+#define XSPLICE_ACTION_APPLY   4  
+#define XSPLICE_ACTION_REPLACE 5  
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_name_t name;                /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t time;                          /* IN: Zero if no timeout. */   
+                                            /* Or upper bound of time (ms) */   
+                                            /* for operation to take. */  
+};  
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION commands.
+
+There is a strict ordering of which commands can follow one another.
+The XSPLICE_ACTION prefix has been dropped to ease reading, and the diagram
+does not include the XSPLICE_STATUS values:
+
+<pre>
+              /->\  
+              \  /  
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\  
+                \                                  |  
+                 \-------------------<-------------/  
+
+</pre>
+## State transition table of XSPLICE_ACTION commands and XSPLICE_STATUS.
+
+Note that:
+
+ - The CHECKED state is the starting one achieved with *XEN_SYSCTL_XSPLICE_UPLOAD* hypercall.
+ - The REVERT operation on success will automatically move to the CHECKED state.
+ - There are two STATES: CHECKED and APPLIED.
+ - There are five actions (aka commands): CHECK, APPLY, REPLACE, REVERT, and UNLOAD.
+
+The state transition table of valid states and action states:
+
+<pre>
+
++---------+---------+--------------------------------+-------+--------+
+| ACTION  | Current | Result                         | Next STATE:    |
+|         | STATE   |                                |CHECKED|APPLIED |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, no   |   x   |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, with |       |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| UNLOAD  | CHECKED | Unload payload. Always works.  |       |        |
+|         |         | No next states.                |       |        |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (success).       |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (error/timeout)  |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |   x    |
+|         |         | payload with success.          |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |   x   |        |
+|         |         | payload with error.            |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (success).      |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (error/timeout) |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+</pre>
+
+All the other state transitions are invalid.
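The table can also be read as a small function, which a toolstack might use to reject invalid requests before issuing the hypercall. This sketch is hypothetical (`xsplice_next_status` is a made-up name); it assumes that a failed action leaves the status unchanged, as stated for `rc` earlier, and it does not model the effect of REPLACE on the other payloads it reverts.

```c
#include <assert.h>

#define XSPLICE_STATUS_CHECKED 1
#define XSPLICE_STATUS_APPLIED 2

#define XSPLICE_ACTION_CHECK   1
#define XSPLICE_ACTION_UNLOAD  2
#define XSPLICE_ACTION_REVERT  3
#define XSPLICE_ACTION_APPLY   4
#define XSPLICE_ACTION_REPLACE 5

/*
 * Transition table as a function.  `ok` is non-zero if the action
 * succeeded (no error, no timeout).  Returns the next status, 0 once the
 * payload has been unloaded, or -1 for a transition the table forbids.
 */
static int xsplice_next_status(int status, int action, int ok)
{
    switch ( action )
    {
    case XSPLICE_ACTION_UNLOAD:
        /* Only valid from CHECKED; always works. */
        return status == XSPLICE_STATUS_CHECKED ? 0 : -1;
    case XSPLICE_ACTION_CHECK:
        /* Stays CHECKED whether or not the check passes. */
        return status == XSPLICE_STATUS_CHECKED ? XSPLICE_STATUS_CHECKED : -1;
    case XSPLICE_ACTION_APPLY:
    case XSPLICE_ACTION_REPLACE:
        if ( status != XSPLICE_STATUS_CHECKED )
            return -1;
        return ok ? XSPLICE_STATUS_APPLIED : XSPLICE_STATUS_CHECKED;
    case XSPLICE_ACTION_REVERT:
        if ( status != XSPLICE_STATUS_APPLIED )
            return -1;
        return ok ? XSPLICE_STATUS_CHECKED : XSPLICE_STATUS_APPLIED;
    default:
        return -1;
    }
}
```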
+
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero exit with success.
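Steps 2 and 4 amount to polling `rc` until it is no longer *-XEN_EAGAIN*. The sketch below bounds the number of polls instead of spinning forever, which is a design choice of the example rather than part of this design; `fake_payload`, `fake_get_rc` and `wait_for_rc` are hypothetical names, with `fake_get_rc` standing in for **XEN_SYSCTL_XSPLICE_GET** and errno's `EAGAIN` standing in for *XEN_EAGAIN*.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical mock of one payload.  `pending` is how many further GET
 * calls will still report -EAGAIN before the last action completes with
 * `final_rc`.
 */
struct fake_payload {
    int pending;
    int final_rc;
};

/* Mock XEN_SYSCTL_XSPLICE_GET: returns the payload's current `rc`. */
static int fake_get_rc(struct fake_payload *p)
{
    if ( p->pending > 0 )
    {
        p->pending--;
        return -EAGAIN;
    }
    return p->final_rc;
}

/*
 * Poll until `rc` is no longer -EAGAIN, giving up after `max_polls`.
 * Returns the final rc, or -ETIMEDOUT.  A real toolstack would sleep
 * between polls instead of spinning.
 */
static int wait_for_rc(struct fake_payload *p, int max_polls)
{
    int i;

    for ( i = 0; i < max_polls; i++ )
    {
        int rc = fake_get_rc(p);

        if ( rc != -EAGAIN )
            return rc;
    }
    return -ETIMEDOUT;
}
```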
+
+
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However, these observations can help when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or, if the CPU does not support them, padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+place the most generic code in line and to emit a special ELF .section
+header that tags this location. At run-time the hypervisor can leave these
+areas alone or patch them with better-suited opcodes.
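
A minimal sketch of the patching step follows. It operates on an ordinary byte buffer rather than on hypervisor text. Several details are assumptions for illustration:

- the `struct alt_site` layout;
- the feature bitmask;
- the use of single-byte `0x90` nops.

Real implementations typically pad with longer multi-byte nop forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one patch site; not Xen's actual layout. */
struct alt_site {
    uint8_t *orig;            /* Location of the generic code. */
    size_t orig_len;          /* Bytes reserved at that location. */
    const uint8_t *repl;      /* Replacement instructions. */
    size_t repl_len;
    unsigned int feature;     /* CPU feature bit the replacement needs. */
};

#define NOP 0x90              /* Single-byte x86 nop. */

/* Patch sites whose feature is present; leave the generic code otherwise. */
static void apply_alternatives(struct alt_site *sites, size_t n,
                               uint32_t cpu_features)
{
    for ( size_t i = 0; i < n; i++ )
    {
        struct alt_site *s = &sites[i];

        assert(s->repl_len <= s->orig_len);   /* Replacement must fit. */
        if ( !(cpu_features & (1u << s->feature)) )
            continue;
        memcpy(s->orig, s->repl, s->repl_len);
        /* Pad the remainder of the reserved area with nops. */
        memset(s->orig + s->repl_len, NOP, s->orig_len - s->repl_len);
    }
}
```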
+
+Note that patching functions that copy to or from guest memory requires
+xSplice to support alternatives. For example, this can be due to SMAP
+(specifically the *stac* and *clac* instructions), which is enabled on Broadwell
+and later architectures. It may also apply to other alternative instructions.
+
+### When to patch
+
+During the discussion on the design, two candidate points emerged at which
+the call stack of each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing. One such safety check is refusing to patch code which
+is on the stack, since patching it can lead to corruption.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using stop_machine to patch can stall the time rendezvous
+code and result in an NMI. As such, doing the patching at the tail
+of the rendezvous code should avoid this problem.
+
+However, the entry point for that code is
+`do_softirq->timer_softirq_action->time_calibration`,
+which ends up calling `on_selected_cpus` on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To rendezvous all the CPUs, a barrier with a maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to execute lockstep instructions
+on all CPUs.
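
The counting half of such a barrier can be sketched with C11 atomics. This illustration rests on several assumptions:

- `struct rendezvous` is hypothetical.
- The IPI that makes other CPUs call `rendezvous_arrive()` is not modelled.
- The timeout is approximated by a spin budget rather than a clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendezvous state shared by all CPUs. */
struct rendezvous {
    atomic_uint arrived;      /* CPUs that have reached the barrier. */
    unsigned int ncpus;       /* CPUs expected, including the initiator. */
};

/* Each CPU, once kicked by an IPI, announces its arrival. */
static void rendezvous_arrive(struct rendezvous *r)
{
    atomic_fetch_add(&r->arrived, 1);
}

/*
 * The initiating CPU waits until everyone has arrived or the budget is
 * spent. 'budget' stands in for a time-based deadline.
 */
static bool rendezvous_wait(struct rendezvous *r, unsigned long budget)
{
    assert(r->ncpus > 0);
    while ( budget-- )
        if ( atomic_load(&r->arrived) == r->ncpus )
            return true;      /* Everyone is here: patch in lockstep. */
    return false;             /* Timed out: abandon this attempt. */
}
```

Bounding the wait is what makes this approach time-bound. If a CPU never shows up, the patch attempt fails instead of stalling the whole machine.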
+
+The approach is similar in concept to stop_machine and the time rendezvous,
+but is time-bound. However, the local CPU stack is much shorter and
+a lot more deterministic.
+
+This is implemented in the Xen Project hypervisor.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+### Generation of xSplice ELF payloads
+
+The design of the payload generation tool is not discussed in this document.
+
+This is implemented in a separate tool which lives in a separate
+GIT repo.
+
+Currently it resides at https://github.com/rosslagerwall/xsplice-build
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+In practice, almost every patch (XSA) to a non-trivial function requires
+additional entries in the exception table and/or the bug frames.
+
+This is implemented in the Xen Project hypervisor.
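
A lookup could consult both the hypervisor's table and the small tables that a payload brings along, in the spirit of Linux module exception tables. The sketch below shows one way to do that. The entry layout and the helper names are hypothetical. Each table is kept sorted, and re-sorted after in-place edits, so that a binary search applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: faulting instruction address -> fixup address. */
struct ex_entry { uintptr_t insn, fixup; };

/* Hypothetical table descriptor (core table or one payload's table). */
struct ex_table { const struct ex_entry *start; size_t n; };

static int cmp_ex(const void *a, const void *b)
{
    const struct ex_entry *x = a, *y = b;

    return (x->insn > y->insn) - (x->insn < y->insn);
}

/* Re-sort a table, e.g. after entries were changed in place. */
static void sort_ex_table(struct ex_entry *t, size_t n)
{
    qsort(t, n, sizeof(*t), cmp_ex);
}

/* Binary search one sorted table; returns the fixup or 0 if absent. */
static uintptr_t search_one(const struct ex_entry *t, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;

    assert(t || n == 0);
    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( t[mid].insn == addr )
            return t[mid].fixup;
        if ( t[mid].insn < addr )
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Search the core table first, then each loaded payload's own table. */
static uintptr_t search_exception_tables(const struct ex_table *core,
                                         const struct ex_table *payloads,
                                         size_t npayloads, uintptr_t addr)
{
    uintptr_t fix = search_one(core->start, core->n, addr);

    for ( size_t i = 0; !fix && i < npayloads; i++ )
        fix = search_one(payloads[i].start, payloads[i].n, addr);
    return fix;
}
```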
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such, we must
+also be able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing identical strings - which means that if we alter the
+strings in-place, other users will be inadvertently affected as well.
+
+This is also where function pointers live - and we may need to patch them
+as well, along with switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
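
The in-line steps map onto page protection changes. The userspace analogue below uses POSIX `mprotect()` on a page-aligned buffer that stands in for `.rodata`. The hypervisor would change its own page-table permissions instead, and `patch_ro_inline()` is a hypothetical name. Note that the write is visible to every user of that address, which is exactly the string-coalescing hazard described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogue of in-line .rodata patching: 'page' is a page-aligned
 * region that is normally read-only. Returns 0 on success, -1 on failure.
 */
static int patch_ro_inline(void *page, size_t page_size, size_t off,
                           const void *bytes, size_t len)
{
    assert(off + len <= page_size);
    if ( mprotect(page, page_size, PROT_READ | PROT_WRITE) )  /* Writable. */
        return -1;
    memcpy((char *)page + off, bytes, len);                   /* Patch. */
    return mprotect(page, page_size, PROT_READ);              /* Read-only. */
}
```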
+
+If we are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
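
For the trampoline flavour, the redirection step can be sketched as follows. The new string gets fresh memory, and only the listed references are updated. Other users of a coalesced copy therefore keep the old text. The `refs` list, which a payload would derive from its relocations, and `redirect_string()` are hypothetical. Making the new region read-only is left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy 'new_str' to fresh memory and point every listed reference at it.
 * The old string is left untouched. Returns the new copy, or NULL.
 */
static const char *redirect_string(const char **refs[], size_t nrefs,
                                   const char *new_str)
{
    char *copy = malloc(strlen(new_str) + 1);

    if ( !copy )
        return NULL;
    strcpy(copy, new_str);
    for ( size_t i = 0; i < nrefs; i++ )
    {
        assert(refs[i]);
        *refs[i] = copy;
    }
    /* A hypervisor would now mark the new region read-only. */
    return copy;
}
```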
+
+The trampoline patching is implemented in the Xen Project hypervisor.
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of the data. As such it should not be attempted.
+
+However, functions which are being patched can bring in changes to strings
+(.data or .rodata section changes), or even to .bss sections.
+
+As such the ELF payload can introduce new .rodata, .bss, and .data sections.
+Patching in the new function will end up also patching in the new .rodata
+section and the new function will reference the new string in the new
+.rodata section.
+
+This is implemented in the Xen Project hypervisor.
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## TODO Goals
+
+The implementation must also have a mechanism for (in no particular order):
+
+ * Be able to look up, in the Xen hypervisor, the symbol names of functions from the
+   ELF payload. (Either as `symbol` or `symbol`+`offset`).
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Deal with NMI/MCE checks during patching instead of ignoring them.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, make sure the payload is built with same compiler as hypervisor).
+   Specifically we want to make sure that xSplice codepaths cannot be patched.
+ * NOP out the code sequence if `new_size` is zero.
+ * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64]
+   in payload file.
+ * A dependency mechanism for the payloads, and the use of that information to:
+    - Load the appropriate payload, verifying that it was built against the
+      running hypervisor. This can be done via the `build-id`
+      or by providing a copy of the old code, so that the hypervisor can
+      verify it against the code in memory.
+    - Construct an appropriate order in which to load payloads that
+      depend on each other.
+
+### Handle inlined __LINE__
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Generated backtraces and
+   existing debug information may refer to the original build and do
+   not reflect the hotpatching state, except for the actually hotpatched
+   functions, but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonably be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch-by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
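
Option 1 can be illustrated with a plain `#line` directive. In this sketch, the function names, the stand-in macro and the line number 120 are all hypothetical. f1 grew by two lines, and the directive pins f2 back to the numbering assumed for it before the patch, so the line number f2 embeds does not change.

```c
#include <assert.h>

/* Hypothetical stand-in for a macro such as BUG() that records __LINE__. */
#define RECORD_LINE() (__LINE__)

/* f1 was hotpatched and grew by two lines. */
static int f1(void)
{
    int x = 1;
    x += 1;   /* hotpatch: added line */
    x += 1;   /* hotpatch: added line */
    return x;
}

/* Reset numbering so f2 sees the same lines as in the unpatched build. */
#line 120
static int f2(void)
{
    return RECORD_LINE();   /* Line 122, as assumed before the patch. */
}
```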
+
+For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
+the function itself.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
+### xSplice interdependencies
+
+Interdependencies between xSplice patches are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or,
+   if the old code also contains previously patched code, even if they
+   overlap.
+
+The disadvantage of the single large patch is that it can grow over
+time and does not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
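
The third option boils down to one step: before applying a payload, compare the bytes it expects against what is currently in memory. A minimal sketch, assuming a hypothetical `struct expected_code` record per replaced function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function record a payload might carry. */
struct expected_code {
    const unsigned char *addr;   /* Where the function lives in memory. */
    const unsigned char *old;    /* Bytes the payload was built against. */
    size_t len;
};

/* Accept the payload only if every replaced function still matches. */
static bool payload_matches_memory(const struct expected_code *e, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
    {
        assert(e[i].addr && e[i].old);
        if ( memcmp(e[i].addr, e[i].old, e[i].len) != 0 )
            return false;
    }
    return true;
}
```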
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be the same for the signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is computed over the whole payload, laid out contiguously
+in memory. The signature is to be appended at the end of the ELF payload,
+prefixed with the string '~Module signature appended~\n', followed by
+a signature header, then followed by the signature, key identifier, and signer's
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from the Linux module signature code.)
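
The trailer can be located and decoded as sketched below. The sketch follows the ordering described above: marker, header, signature, key identifier, signer's name. The header is decoded byte by byte, so the big-endian `sig_len` is handled regardless of host endianness. `struct sig_info` and `parse_sig_trailer()` are hypothetical, and no cryptographic verification is attempted. Note that Linux itself places the marker string at the very end of the module, after the signature data, so an implementation would need to settle on one layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_MAGIC     "~Module signature appended~\n"
#define SIG_MAGIC_LEN (sizeof(SIG_MAGIC) - 1)
#define SIG_HDR_LEN   12   /* Five u8 fields, 3 pad bytes, __be32 sig_len. */

struct sig_info {
    uint8_t algo, hash, id_type, signer_len, key_id_len;
    uint32_t sig_len;                    /* Converted from big-endian. */
    const uint8_t *sig, *key_id, *signer;
};

/*
 * Parse the trailer following 'payload_len' bytes of ELF data.
 * Returns 0 on success, -1 if it is missing or truncated.
 */
static int parse_sig_trailer(const uint8_t *buf, size_t len,
                             size_t payload_len, struct sig_info *si)
{
    const uint8_t *p = buf + payload_len;
    size_t need;

    assert(buf && si);
    if ( payload_len > len || len - payload_len < SIG_MAGIC_LEN + SIG_HDR_LEN )
        return -1;
    if ( memcmp(p, SIG_MAGIC, SIG_MAGIC_LEN) )
        return -1;
    p += SIG_MAGIC_LEN;

    si->algo = p[0];
    si->hash = p[1];
    si->id_type = p[2];
    si->signer_len = p[3];
    si->key_id_len = p[4];
    /* p[5..7] are __pad; p[8..11] hold sig_len, most significant first. */
    si->sig_len = (uint32_t)p[8] << 24 | (uint32_t)p[9] << 16 |
                  (uint32_t)p[10] << 8 | p[11];
    p += SIG_HDR_LEN;

    need = (size_t)si->sig_len + si->key_id_len + si->signer_len;
    if ( (size_t)(buf + len - p) < need )
        return -1;
    si->sig = p;
    si->key_id = p + si->sig_len;
    si->signer = si->key_id + si->key_id_len;
    return 0;
}
```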
+
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of the data. As such it should not be attempted.
+
+That said, we should provide hook functions so that the existing data
+can be changed during payload application.
+
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+the new code must be placed within +/-2GB of virtual address space of the
+old code. That should not be a problem since the Xen hypervisor has
+a very small footprint.
+
+However, if needed, we can always add two trampolines: one at the 2GB
+limit that jumps to the next trampoline.
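
The reach limit follows from how the displacement is computed. It is relative to the end of the 5-byte instruction and must fit in a signed 32-bit field. The sketch below uses a hypothetical `encode_jmp_rel32()` helper. The helper reports when the target is out of range, which is the case where an intermediate trampoline would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP_REL32_LEN 5   /* e9 opcode + 32-bit displacement. */

/*
 * Encode "jmp rel32" located at 'src' and targeting 'dst'.
 * Returns false if the displacement does not fit in a signed 32-bit value.
 */
static bool encode_jmp_rel32(uint8_t insn[JMP_REL32_LEN], uint64_t src,
                             uint64_t dst)
{
    /* Displacement is relative to the next instruction (src + 5). */
    int64_t disp = (int64_t)(dst - (src + JMP_REL32_LEN));
    uint32_t u;

    assert(insn);
    if ( disp < INT32_MIN || disp > INT32_MAX )
        return false;                     /* Needs another trampoline. */
    u = (uint32_t)(int32_t)disp;
    insn[0] = 0xe9;
    for ( int i = 0; i < 4; i++ )
        insn[1 + i] = (uint8_t)(u >> (8 * i));   /* Little-endian. */
    return true;
}
```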
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accommodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are required.
+
+Depending on compiler settings, there are several functions in Xen that
+are smaller than that (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions, or a
+runtime check in the hypervisor that verifies the symbol size (+ padding
+to the next symbol), is advised.
+
+The tool for generating payloads currently does perform a compile-time
+check to ensure that the function to be replaced is large enough.
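
A runtime check of the kind suggested above could measure the room from a function's start to the next symbol, which covers its size plus any trailing padding. The sketch below assumes a hypothetical address-sorted `struct sym` array rather than Xen's actual symbol table interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; the array is sorted by address. */
struct sym { uint64_t addr, size; };

/*
 * Can a 5-byte jmp be written at the start of symbol i? The usable room
 * is the gap to the next symbol; the last symbol only has its own size.
 */
static bool fits_trampoline(const struct sym *syms, size_t n, size_t i)
{
    uint64_t room;

    assert(i < n);
    room = (i + 1 < n) ? syms[i + 1].addr - syms[i].addr : syms[i].size;
    return room >= 5;
}
```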
+
-- 
2.5.0



* [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson,
	Konrad Rzeszutek Wilk

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.

The functionality is disabled on ARM until all arch
components are implemented.

Also by default it is disabled until the implementation
is in place.

We also use recursive spinlocks so that the find_payload
function does not need to have a 'lock' and 'non-lock' variant.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Rebased on keyhandler: rework keyhandler infrastructure
v3: Fixed XSM.
 - Removed REVERTED state.
    Split status and error code.
    Add REPLACE action.
    Separate payload data from the payload structure.
    s/XSPLICE_ID_../XSPLICE_NAME_../
 - Add xsplice and CONFIG_XSPLICE build option.
    Fix code per Jan's review.
    Update the sysctl.h (change bits to enum like)
 - Rebase on Kconfig changes.
 - Add missing pad checks. Re-order keyhandler.h to build on ARM.
 - Rebase on build: hook the schedulers into Kconfig
 - s/id/name/; s/payload_list_lock/payload_lock/
 - Put #ifdef CONFIG_XSPLICE in header file per Doug review.
 - Andrew review:
    - use recursive spinlocks, change name to xsplice_op,
      sprinkle new-lines, add local variable block, include
      state diagram, squash two goto labels, use vzalloc instead of
      alloc_xenheap_pages.
    - change 'state' from int32 to uint32_t
    - remove the err label out of xsplice_upload
    - use void* instead of uint8_t
    - move code around to make it easier to read.
    - Add vmap.h to compile under ARM.
 - Add missing Copyright in header file
 - Dropped LOADED state, make the payload go in CHECKED.
v4: Made it only work on x86 per Julien's (ARM) maintainer request.
v5: Dropped the load->check state example in sysctl.h
    Made the ->nr=0 call work. Remove rc=0 in lots of cases. Update
    header from design doc.
v6: Update what 'idx' means. Don't drop lock in find_payload. Make
    find_name copy data.
v7: Don't print -EINVAL when payload_cnt is zero (and toolstack provides
    idx as zero). Change return code to -ENOSYS, so change callback setting
    based on -ENOSYS and -EOPNOTSUPP.
    Add extra printk in keyhandler.
    Use if (..) else in  xsplice_upload instead of two 'if'.
    Remove #ifdef in XSM machinery.
    Add Andrew's Reviewed-by.
    Rebase on x86/cpu: Sysctl and common infrastructure for levelling context switching
    and xen+tools: Export maximum host and guest cpu featuresets via SYSCTL
v9:
    s/find_name/get_name/, drop locks when allocating data.
    Drop conditional expression on copyback
    Move the allocation on upload outside the spinlock.
    Add (TECH PREVIEW) to the Kconfig help
    Return -EINVAL if the CHECK or UNLOAD action is to be performed and the payload
    state is not in expected state.
    Print 'c' not 'u' when invoking the keyhandler.
---
 tools/flask/policy/policy/modules/xen/xen.te |   1 +
 xen/common/Kconfig                           |  12 +
 xen/common/Makefile                          |   1 +
 xen/common/sysctl.c                          |   7 +
 xen/common/xsplice.c                         | 407 +++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 166 +++++++++++
 xen/include/xen/xsplice.h                    |  35 +++
 xen/xsm/flask/hooks.c                        |   4 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 9 files changed, 635 insertions(+)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 2a2630d..daa1315 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,6 +74,7 @@ allow dom0_t xen_t:xen2 {
     get_symbol
     get_cpu_levelling_caps
     get_cpu_featureset
+    xsplice_op
 };
 
 # Allow dom0 to use all XENVER_ subops that have checks.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index ad9f7bf..692ef51 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -188,4 +188,16 @@ config SCHED_DEFAULT
 
 endmenu
 
+# Enable/Disable xsplice support
+config XSPLICE
+	bool "xSplice live patching support (TECH PREVIEW)"
+	default n
+	depends on X86
+	---help---
+	  Allows a running Xen hypervisor to be dynamically patched using
+	  binary patches without rebooting. This is primarily used to binarily
+	  patch in the field a hypervisor with XSA fixes.
+
+	  If unsure, say Y.
+
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 77de27e..910ac69 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -57,6 +57,7 @@ obj-y += vsprintf.o
 obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 253b7c8..9a4cc1f 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_op(&op->u.xsplice);
+        if ( ret != -ENOSYS && ret != -EOPNOTSUPP )
+            copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..4878a57
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,407 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/err.h>
+#include <xen/guest_access.h>
+#include <xen/keyhandler.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/smp.h>
+#include <xen/spinlock.h>
+#include <xen/vmap.h>
+#include <xen/xsplice.h>
+
+#include <asm/event.h>
+#include <public/sysctl.h>
+
+/* Protects against payload_list operations. */
+static DEFINE_SPINLOCK(payload_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    uint32_t state;                      /* One of the XSPLICE_STATE_*. */
+    int32_t rc;                          /* 0 or -XEN_EXX. */
+    struct list_head list;               /* Linked to 'payload_list'. */
+    char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
+};
+
+static int get_name(const xen_xsplice_name_t *name, char *n)
+{
+    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
+        return -EINVAL;
+
+    if ( name->pad[0] || name->pad[1] || name->pad[2] )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(name->name, name->size) )
+        return -EINVAL;
+
+    if ( __copy_from_guest(n, name->name, name->size) )
+        return -EFAULT;
+
+    if ( n[name->size - 1] )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int verify_payload(const xen_sysctl_xsplice_upload_t *upload, char *n)
+{
+    if ( get_name(&upload->name, n) )
+        return -EINVAL;
+
+    if ( !upload->size )
+        return -EINVAL;
+
+    if ( upload->size > MB(2) )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static struct payload *find_payload(const char *name)
+{
+    struct payload *data, *found = NULL;
+
+    ASSERT(spin_is_locked(&payload_lock));
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strcmp(data->name, name) )
+        {
+            found = data;
+            break;
+        }
+    }
+
+    return found;
+}
+
+static void free_payload(struct payload *data)
+{
+    ASSERT(spin_is_locked(&payload_lock));
+    list_del(&data->list);
+    payload_cnt--;
+    payload_version++;
+    xfree(data);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data, *found;
+    char n[XEN_XSPLICE_NAME_SIZE];
+    int rc;
+
+    rc = verify_payload(upload, n);
+    if ( rc )
+        return rc;
+
+    data = xzalloc(struct payload);
+
+    spin_lock(&payload_lock);
+
+    found = find_payload(n);
+    if ( IS_ERR(found) )
+    {
+        rc = PTR_ERR(found);
+        goto out;
+    }
+    else if ( found )
+    {
+        rc = -EEXIST;
+        goto out;
+    }
+
+    if ( !data )
+    {
+        rc = -ENOMEM;
+        goto out;
+    }
+
+    rc = 0;
+
+    memcpy(data->name, n, strlen(n));
+    data->state = XSPLICE_STATE_CHECKED;
+    INIT_LIST_HEAD(&data->list);
+
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt++;
+    payload_version++;
+
+ out:
+    spin_unlock(&payload_lock);
+
+    if ( rc )
+        xfree(data);
+
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_get_t *get)
+{
+    struct payload *data;
+    int rc;
+    char n[XEN_XSPLICE_NAME_SIZE];
+
+    rc = get_name(&get->name, n);
+    if ( rc )
+        return rc;
+
+    spin_lock(&payload_lock);
+
+    data = find_payload(n);
+    if ( IS_ERR_OR_NULL(data) )
+    {
+        spin_unlock(&payload_lock);
+
+        if ( !data )
+            return -ENOENT;
+
+        return PTR_ERR(data);
+    }
+
+    get->status.state = data->state;
+    get->status.rc = data->rc;
+
+    spin_unlock(&payload_lock);
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+
+    if ( list->nr > 1024 )
+        return -E2BIG;
+
+    if ( list->pad )
+        return -EINVAL;
+
+    if ( list->nr &&
+         (!guest_handle_okay(list->status, list->nr) ||
+          !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
+          !guest_handle_okay(list->len, list->nr)) )
+        return -EINVAL;
+
+    spin_lock(&payload_lock);
+    if ( list->idx >= payload_cnt && payload_cnt )
+    {
+        spin_unlock(&payload_lock);
+        return -EINVAL;
+    }
+
+    if ( list->nr )
+    {
+        list_for_each_entry( data, &payload_list, list )
+        {
+            uint32_t len;
+
+            if ( list->idx > i++ )
+                continue;
+
+            status.state = data->state;
+            status.rc = data->rc;
+            len = strlen(data->name) + 1;
+
+            /* N.B. 'idx' != 'i'. */
+            if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
+                                        data->name, len) ||
+                 __copy_to_guest_offset(list->len, idx, &len, 1) ||
+                 __copy_to_guest_offset(list->status, idx, &status, 1) )
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            idx++;
+
+            if ( (idx >= list->nr) || hypercall_preempt_check() )
+                break;
+        }
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    list->version = payload_version;
+    spin_unlock(&payload_lock);
+
+    /* And how many we have processed. */
+    return rc ? : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    char n[XEN_XSPLICE_NAME_SIZE];
+    int rc;
+
+    rc = get_name(&action->name, n);
+    if ( rc )
+        return rc;
+
+    spin_lock(&payload_lock);
+
+    data = find_payload(n);
+    if ( IS_ERR_OR_NULL(data) )
+    {
+        spin_unlock(&payload_lock);
+
+        if ( !data )
+            return -ENOENT;
+
+        return PTR_ERR(data);
+    }
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        } else
+            rc = -EINVAL;
+        break;
+
+    case XSPLICE_ACTION_UNLOAD:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            free_payload(data);
+            /* No touching 'data' from here on! */
+            data = NULL;
+        } else
+            rc = -EINVAL;
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        if ( data->state == XSPLICE_STATE_APPLIED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_APPLY:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_APPLIED;
+            data->rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_REPLACE:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        }
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    spin_unlock(&payload_lock);
+
+    return rc;
+}
+
+int xsplice_op(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    if ( xsplice->pad )
+        return -EINVAL;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
+}
+
+static const char *state2str(uint32_t state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+
+    if ( state >= ARRAY_SIZE(names) || !names[state] )
+        return "unknown";
+
+    return names[state];
+}
+
+static void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    printk("'%c' pressed - Dumping all xsplice patches\n", key);
+
+    if ( !spin_trylock(&payload_lock) )
+    {
+        printk("Lock held. Try again.\n");
+        return;
+    }
+
+    list_for_each_entry ( data, &payload_list, list )
+        printk(" name=%s state=%s(%d)\n", data->name,
+               state2str(data->state), data->state);
+
+    spin_unlock(&payload_lock);
+}
+
+static int __init xsplice_init(void)
+{
+    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    return 0;
+}
+__initcall(xsplice_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 82a2a3e..416e39a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -848,6 +848,170 @@ struct xen_sysctl_cpu_featureset {
 typedef struct xen_sysctl_featureset xen_sysctl_featureset_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_featureset_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to the docs/unstable/misc/xsplice.markdown
+ * for the design details of this hypercall.
+ *
+ * There are four sub-ops:
+ *  XEN_SYSCTL_XSPLICE_UPLOAD (0)
+ *  XEN_SYSCTL_XSPLICE_GET (1)
+ *  XEN_SYSCTL_XSPLICE_LIST (2)
+ *  XEN_SYSCTL_XSPLICE_ACTION (3)
+ *
+ * The normal sequence of sub-ops is to:
+ *  1) XEN_SYSCTL_XSPLICE_UPLOAD to upload the payload. If errors STOP.
+ *  2) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN, spin.
+ *     If zero, go to the next step.
+ *  3) XEN_SYSCTL_XSPLICE_ACTION with XSPLICE_ACTION_APPLY to apply the patch.
+ *  4) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN, spin.
+ *     If zero, exit with success.
+ */
+
+/*
+ * Structure describing the name of an ELF payload. The name uniquely
+ * identifies the payload and should be human readable.
+ * Its length may be up to XEN_XSPLICE_NAME_SIZE, including
+ * the NUL terminator.
+ */
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_name {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
+    uint16_t size;                          /* IN: size of name. May be up to
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint16_t pad[3];                        /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_name xen_xsplice_name_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_name_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was successfully uploaded.
+ * Otherwise a -XEN_EXX error value is returned. Duplicate `name`s are
+ * not supported.
+ *
+ * The payload at this point is verified against basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+    uint64_t size;                          /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve the status of a specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and XEN_EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATE_CHECKED      1
+#define XSPLICE_STATE_APPLIED      2
+    uint32_t state;                /* OUT: XSPLICE_STATE_*. */
+    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_get {
+    xen_xsplice_name_t name;                /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
+};
+typedef struct xen_sysctl_xsplice_get xen_sysctl_xsplice_get_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_get_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns a positive number, it is the number (up to `nr`)
+ * of payloads returned, with `nr` updated to the number of remaining
+ * payloads and `version` updated (it may stay the same across hypercalls;
+ * if it varies, the data is stale and further calls could fail). The
+ * `status`, `name`, and `len` arrays are filled with the data of the
+ * payloads starting at hypervisor list index `idx`.
+ *
+ * If the hypercall returns E2BIG, `nr` is too big and should be
+ * lowered. The upper limit of `nr` is left to the implementation.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed payloads, making this information stale. It is
+ * the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. If the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* OUT: Hypervisor stamps value.
+                                               If it varies between calls, we
+                                               are getting stale data. */
+    uint32_t idx;                           /* IN: Index into hypervisor list. */
+    uint32_t nr;                            /* IN: How many status, name, and len
+                                               entries to fill out. Can be zero to
+                                               get the number of payloads and version.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocated for nr of them. */
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.
+                                               Must have nr of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.
+                                               Must have nr of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `name` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+#define XSPLICE_ACTION_REPLACE      5
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
+    uint32_t timeout;                       /* IN: Zero means use the hypervisor */
+                                            /* default; otherwise an upper bound */
+                                            /* (ms) for the operation to take. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_*. */
+    uint32_t pad;                           /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_get_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -875,6 +1039,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_tmem_op                       24
 #define XEN_SYSCTL_get_cpu_levelling_caps        25
 #define XEN_SYSCTL_get_cpu_featureset            26
+#define XEN_SYSCTL_xsplice_op                    27
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -902,6 +1067,7 @@ struct xen_sysctl {
         struct xen_sysctl_tmem_op           tmem_op;
         struct xen_sysctl_cpu_levelling_caps cpu_levelling_caps;
         struct xen_sysctl_cpu_featureset    cpu_featureset;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..b9f08cd
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+
+#ifdef CONFIG_XSPLICE
+
+int xsplice_op(struct xen_sysctl_xsplice_op *);
+
+#else
+
+#include <xen/errno.h> /* For -ENOSYS */
+static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
+{
+    return -ENOSYS;
+}
+
+#endif /* CONFIG_XSPLICE */
+
+#endif /* __XEN_XSPLICE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 6295768..c2df48f 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -814,6 +814,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_get_cpu_featureset:
         return domain_has_xen(current->domain, XEN2__GET_CPU_FEATURESET);
 
+    case XEN_SYSCTL_xsplice_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__XSPLICE_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index bdb7b89..e9ab149 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -97,6 +97,8 @@ class xen2
     get_cpu_levelling_caps
 # XEN_SYSCTL_get_cpu_featureset
     get_cpu_featureset
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 90+ messages in thread
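The sysctl.h comment in the patch above describes the normal sequence as UPLOAD, then GET until `->rc` is no longer -XEN_EAGAIN, then ACTION, then GET again. The polling in steps 2) and 4) amounts to a bounded loop. The following is a minimal, self-contained sketch, not code from this series: `get_rc_fn`, `xsplice_wait`, and the `mock_payload` stub are hypothetical names, and it assumes the toolstack's EAGAIN has the same value as XEN_EAGAIN, as is the case on Linux.

```c
#include <errno.h>

/*
 * Hypothetical callback performing one XEN_SYSCTL_XSPLICE_GET and
 * returning the payload's `->rc` (or a negative errno if the call
 * itself failed).
 */
typedef int (*get_rc_fn)(void *ctx);

/*
 * Steps 2) and 4): spin while `->rc` is -EAGAIN (assumed equal to
 * -XEN_EAGAIN). Returns the final rc (0 on success), or -ETIMEDOUT if
 * the payload is still busy after `max_polls` GETs. `*polls` reports
 * how many GETs were issued.
 */
static int xsplice_wait(get_rc_fn get_rc, void *ctx,
                        unsigned int max_polls, unsigned int *polls)
{
    int rc = -EAGAIN;

    *polls = 0;
    while ( rc == -EAGAIN && *polls < max_polls )
    {
        rc = get_rc(ctx);
        (*polls)++;
    }

    return rc == -EAGAIN ? -ETIMEDOUT : rc;
}

/* Illustrative mock: reports -EAGAIN for `busy` polls, then `final`. */
struct mock_payload {
    unsigned int busy;
    int final;
};

static int mock_get_rc(void *ctx)
{
    struct mock_payload *p = ctx;

    if ( p->busy )
    {
        p->busy--;
        return -EAGAIN;
    }

    return p->final;
}
```

In a real tool the callback would wrap xc_xsplice_get() and return `status.rc` (or -errno if the call itself failed), ideally sleeping briefly between polls. `max_polls` is a toolstack-side bound, separate from the `timeout` passed with XEN_SYSCTL_XSPLICE_ACTION.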

* [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26  7:51   ` Ross Lagerwall
  2016-04-25 15:34 ` [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

The underlying toolstack code to do the basic
operations when using the XEN_SYSCTL_xsplice_op sysctl:
 - upload the payload,
 - get the status of a payload,
 - list all the payloads,
 - apply, check, replace, revert, and unload the payload.
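
The xc_xsplice_list() implementation in this patch has to cope with two hypervisor behaviours: an E2BIG error when `nr` is too large, and a changed `version` when the payload list is modified between calls. The sketch below models that strategy against an in-memory mock of XEN_SYSCTL_XSPLICE_LIST. `mock_hv`, `mock_list`, and `list_all` are hypothetical names, not libxc APIs, and only payload indices are returned instead of status/name/len. It halves the batch size on -E2BIG, discards everything and restarts from `start` when the version changes (giving up with -EBUSY after a few retries, as libxc does), and otherwise resumes at `start + done`.

```c
#include <errno.h>

/* Hypothetical mock of the hypervisor side of XEN_SYSCTL_XSPLICE_LIST. */
struct mock_hv {
    unsigned int nr_payloads;   /* payloads "loaded" */
    unsigned int max_nr;        /* requests with nr > max_nr get -E2BIG */
    unsigned int version;       /* bumped whenever the list changes */
    unsigned int bump_after;    /* if non-zero, bump version on this call */
    unsigned int calls;         /* successful-size calls seen so far */
};

/*
 * Fill ids[0..nr) with payload indices starting at idx. Returns how many
 * were filled, or -E2BIG; *left gets the number of payloads after them.
 */
static int mock_list(struct mock_hv *hv, unsigned int idx, unsigned int nr,
                     unsigned int *ids, unsigned int *version,
                     unsigned int *left)
{
    unsigned int i;

    if ( nr > hv->max_nr )
        return -E2BIG;
    if ( hv->bump_after && ++hv->calls == hv->bump_after )
        hv->version++;          /* simulate a concurrent upload/unload */
    for ( i = 0; i < nr && idx + i < hv->nr_payloads; i++ )
        ids[i] = idx + i;
    *version = hv->version;
    *left = idx + i < hv->nr_payloads ? hv->nr_payloads - (idx + i) : 0;
    return i;
}

/* Fetch up to `max` entries starting at list index `start`. */
static int list_all(struct mock_hv *hv, unsigned int start, unsigned int max,
                    unsigned int *ids, unsigned int *done)
{
    unsigned int batch = max, version = 0, retries = 0, left = 0, v;
    int have_version = 0;

    *done = 0;
    while ( *done < max )
    {
        unsigned int nr = max - *done < batch ? max - *done : batch;
        int rc = mock_list(hv, start + *done, nr, ids + *done, &v, &left);

        if ( rc == -E2BIG )
        {
            if ( batch <= 1 )
                return rc;
            batch >>= 1;        /* ask for fewer entries next time */
            continue;
        }
        if ( rc < 0 )
            return rc;
        if ( have_version && v != version )
        {
            if ( retries++ > 3 )
                return -EBUSY;  /* list keeps changing under us */
            *done = 0;          /* stale data: start over from 'start' */
            version = v;
            continue;
        }
        version = v;
        have_version = 1;
        *done += rc;
        if ( !left || !rc )
            break;
    }

    return 0;
}
```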

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Actually set zero for the _pad entries.
v3: Split status into state and error code.
    Add REPLACE action.
 - Use timeout and utilize pads.
 - Update per Wei's review.
 - Extra space slipped in, remove it
v4: Add Wei's review, update comment and Ack.
v7: Sprinkle errno=-EINVAL on all the 'if (!len)', etc checks.
    Added Reviewed-by from Andrew.
---
---
 tools/libxc/include/xenctrl.h |  62 ++++++++
 tools/libxc/xc_misc.c         | 355 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 417 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 42f201b..54431de 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2610,6 +2610,68 @@ const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
 
 #endif
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name, unsigned char *payload, uint32_t size);
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status);
+
+/*
+ * The heart of this function is to get an array of xen_xsplice_status_t.
+ *
+ * However it is complex because it has to deal with the hypervisor
+ * returning some of the requested data or data being stale
+ * (another hypercall might alter the list).
+ *
+ * The parameters that the function expects to contain data from
+ * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
+ * 'left' are also updated with the number of entries filled out
+ * and respectively the number of entries left to get from hypervisor.
+ *
+ * It is expected that the caller of this function will advance
+ * 'start' by 'done' between calls. This way we have a cursor into
+ * the hypervisor's list. Note that 'info', 'name', and 'len' will
+ * be overwritten by subsequent calls.
+ *
+ * The 'max' is to be provided by the caller with the maximum
+ * number of entries that 'info', 'name', and 'len' arrays can
+ * be filled up with.
+ *
+ * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
+ * length.
+ *
+ * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
+ * structure size.
+ *
+ * Each entry in the 'len' array is expected to be of uint32_t size.
+ *
+ * The return value is zero if the hypercall completed successfully.
+ * Note that the return value is _not_ the amount of entries filled
+ * out - that is saved in 'done'.
+ *
+ * If there was an error performing the operation, the return value
+ * is negative and errno is set to an EXX value. 'done' and 'left'
+ * reflect the entries that had been successfully retrieved so far
+ * (if any).
+ */
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info, char *name,
+                    uint32_t *len, unsigned int *done,
+                    unsigned int *left);
+
+/*
+ * The operations are asynchronous and the hypervisor may take a while
+ * to complete them. The `timeout` offers an option to expire the
+ * operation if it could not be completed within the specified time
+ * (in ms). A value of 0 lets the hypervisor pick a default timeout.
+ */
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 7d997d9..8cd398b 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -696,6 +696,361 @@ int xc_hvm_inject_trap(
     return rc;
 }
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name,
+                      unsigned char *payload,
+                      uint32_t size)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BUFFER(char, local);
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /* later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name || !payload )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    def_name.size = strlen(name) + 1;
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    local = xc_hypercall_buffer_alloc(xch, local, size);
+    if ( !local )
+    {
+        xc_hypercall_bounce_post(xch, name);
+        return -1;
+    }
+    memcpy(local, payload, size);
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_UPLOAD;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.upload.size = size;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.payload, local);
+
+    sysctl.u.xsplice.u.upload.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_buffer_free(xch, local);
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /*adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name || !status )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    def_name.size = strlen(name) + 1;
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_GET;
+    sysctl.u.xsplice.pad = 0;
+
+    sysctl.u.xsplice.u.get.status.state = 0;
+    sysctl.u.xsplice.u.get.status.rc = 0;
+
+    sysctl.u.xsplice.u.get.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.get.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    memcpy(status, &sysctl.u.xsplice.u.get.status, sizeof(*status));
+
+    return rc;
+}
+
+/*
+ * The heart of this function is to get an array of xen_xsplice_status_t.
+ *
+ * However it is complex because it has to deal with the hypervisor
+ * returning some of the requested data or data being stale
+ * (another hypercall might alter the list).
+ *
+ * The parameters that the function expects to contain data from
+ * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
+ * 'left' are also updated with the number of entries filled out
+ * and respectively the number of entries left to get from hypervisor.
+ *
+ * It is expected that the caller of this function will advance
+ * 'start' by 'done' between calls. This way we have a cursor into
+ * the hypervisor's list. Note that 'info', 'name', and 'len' will
+ * be overwritten by subsequent calls.
+ *
+ * The 'max' is to be provided by the caller with the maximum
+ * number of entries that 'info', 'name', and 'len' arrays can
+ * be filled up with.
+ *
+ * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
+ * length.
+ *
+ * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
+ * structure size.
+ *
+ * Each entry in the 'len' array is expected to be of uint32_t size.
+ *
+ * The return value is zero if the hypercall completed successfully.
+ * Note that the return value is _not_ the amount of entries filled
+ * out - that is saved in 'done'.
+ *
+ * If there was an error performing the operation, the return value
+ * is negative and errno is set to an EXX value. 'done' and 'left'
+ * reflect the entries that had been successfully retrieved so far
+ * (if any).
+ */
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info,
+                    char *name, uint32_t *len,
+                    unsigned int *done,
+                    unsigned int *left)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The sizes are adjusted later - hence zero. */
+    DECLARE_HYPERCALL_BOUNCE(info, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(len, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    uint32_t max_batch_sz, nr;
+    uint32_t version = 0, retries = 0;
+    uint32_t adjust = 0;
+    ssize_t sz;
+
+    if ( !max || !info || !name || !len )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.list.version = 0;
+    sysctl.u.xsplice.u.list.idx = start;
+    sysctl.u.xsplice.u.list.pad = 0;
+
+    max_batch_sz = max;
+    /* Convenience value. */
+    sz = sizeof(*name) * XEN_XSPLICE_NAME_SIZE;
+    *done = 0;
+    *left = 0;
+    do {
+        /*
+         * The first time we go in this loop our 'max' may be bigger
+         * than what the hypervisor is comfortable with - hence the first
+         * couple of loops may adjust the number of entries we will
+         * want filled (tracked by 'nr').
+         *
+         * N.B. This is a do { } while loop and the right hand side of
+         * the conditional when adjusting will evaluate to false (as
+         * *left is set to zero before the loop). Hence we need this
+         * adjust flag - even if we reset it at the start of the loop.
+         */
+        if ( adjust )
+            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
+
+        nr = min(max - *done, max_batch_sz);
+
+        sysctl.u.xsplice.u.list.nr = nr;
+        /* Fix the size (may vary between hypercalls). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
+        HYPERCALL_BOUNCE_SET_SIZE(name, nr * sz);
+        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
+        /* Point the bounce buffers at the proper offsets. */
+        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
+        (HYPERCALL_BUFFER(name))->ubuf = name + (sz * *done);
+        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
+        /* Allocate memory. */
+        rc = xc_hypercall_bounce_pre(xch, info);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, name);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, len);
+        if ( rc )
+            break;
+
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.name, name);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
+
+        rc = do_sysctl(xch, &sysctl);
+        /*
+         * From here on we MUST call xc_hypercall_bounce_post. If rc < 0 we
+         * end up doing it (outside the loop), so using a break is OK.
+         */
+        if ( rc < 0 && errno == E2BIG )
+        {
+            if ( max_batch_sz <= 1 )
+                break;
+            max_batch_sz >>= 1;
+            adjust = 1; /* For the loop conditional to let us loop again. */
+            /* No memory leaks! */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+        else if ( rc < 0 ) /* For all other errors we bail out. */
+            break;
+
+        if ( !version )
+            version = sysctl.u.xsplice.u.list.version;
+
+        if ( sysctl.u.xsplice.u.list.version != version )
+        {
+            /* We could make this configurable as parameter? */
+            if ( retries++ > 3 )
+            {
+                rc = -1;
+                errno = EBUSY;
+                break;
+            }
+            *done = 0; /* Retry from scratch, beginning at 'start'. */
+            sysctl.u.xsplice.u.list.idx = start;
+            version = sysctl.u.xsplice.u.list.version;
+            adjust = 1; /* Keep looping; release the bounce buffers first. */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+
+        /* We should never hit this, but just in case. */
+        if ( rc > nr )
+        {
+            errno = EOVERFLOW; /* Overflow! */
+            rc = -1;
+            break;
+        }
+        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
+        /* Copy back only 'rc' entries (could use min(rc, nr) if desired). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
+        HYPERCALL_BOUNCE_SET_SIZE(name, (rc * sz));
+        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
+        /* Bounce the data and free the bounce buffer. */
+        xc_hypercall_bounce_post(xch, info);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, len);
+        /* And update how many elements of info we have copied into. */
+        *done += rc;
+        /* Update idx: 'start' plus the entries copied so far. */
+        sysctl.u.xsplice.u.list.idx = start + *done;
+    } while ( adjust || (*done < max && *left != 0) );
+
+    if ( rc < 0 )
+    {
+        xc_hypercall_bounce_post(xch, len);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, info);
+    }
+
+    return rc > 0 ? 0 : rc;
+}
+
+static int _xc_xsplice_action(xc_interface *xch,
+                              char *name,
+                              unsigned int action,
+                              uint32_t timeout)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The size is figured out when we strlen(name) */
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    def_name.size = name ? strlen(name) + 1 : 0;
+
+    if ( !name || def_name.size > XEN_XSPLICE_NAME_SIZE )
+    {
+        errno = EINVAL;
+        return -1;
+    }
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_ACTION;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.action.cmd = action;
+    sysctl.u.xsplice.u.action.timeout = timeout;
+
+    sysctl.u.xsplice.u.action.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.action.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_APPLY, timeout);
+}
+
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REVERT, timeout);
+}
+
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_UNLOAD, timeout);
+}
+
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_CHECK, timeout);
+}
+
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REPLACE, timeout);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.5.0



* [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26  7:49   ` Ross Lagerwall
  2016-04-25 15:34 ` [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup Konrad Rzeszutek Wilk
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

A simple tool that allows a system administrator to perform
basic xsplice operations:

 - Upload an xsplice file (with a unique name).
 - List all the xsplice payloads loaded.
 - Apply, revert, replace, or unload the payload using the
   unique name.
 - Upload, check, and apply the payload in one go (load),
   using the name of the file as the <name>.
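
The `load` command takes only a <file>, so the <name> has to be derived from the file name, and libxc rejects names whose length including the NUL terminator exceeds XEN_XSPLICE_NAME_SIZE. A minimal sketch of that derivation follows. `name_from_path` is a hypothetical helper, and stripping the last `.ext` suffix is an assumption about how the tool forms the name, not something this message specifies.

```c
#include <errno.h>
#include <libgen.h>
#include <string.h>

/* Value taken from xen/include/public/sysctl.h. */
#define XEN_XSPLICE_NAME_SIZE 128

/*
 * Hypothetical helper: copy the basename of `path` into `out`, dropping
 * the last ".ext" suffix unless the name starts with the dot. Returns 0,
 * or -EINVAL if the result plus its NUL would not fit in
 * XEN_XSPLICE_NAME_SIZE bytes.
 */
static int name_from_path(const char *path, char out[XEN_XSPLICE_NAME_SIZE])
{
    char buf[4096];
    char *base, *dot;

    if ( strlen(path) >= sizeof(buf) )
        return -EINVAL;
    strcpy(buf, path);          /* POSIX basename() may modify its argument. */

    base = basename(buf);
    dot = strrchr(base, '.');
    if ( dot && dot != base )
        *dot = '\0';

    if ( strlen(base) + 1 > XEN_XSPLICE_NAME_SIZE )
        return -EINVAL;

    strcpy(out, base);
    return 0;
}
```

With this scheme `/tmp/xsplice_hello_world.xsplice` would be uploaded under the name `xsplice_hello_world`, and the longest accepted name is XEN_XSPLICE_NAME_SIZE - 1 characters.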

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wei.liu2@citrix.com>

v2:
 - Removed REVERTED state.
 - Fixed bugs handling XSPLICE_STATUS_PROGRESS.
 - Split status into state and error.
   Add REPLACE action.
v3:
 - Utilize the timeout and use the default one (let the hypervisor
   pick it).
 - Change the s/all/load and infer the <id> from name of file.
 - s/id/name/
 - Don't use hypercall buffer in upload_func, instead do it in libxc
 - Remove the debug printk.
 - Remove goto's (per Wei's review)
 - Use fprintf(stderr in error paths.
 - Add local variable block.
 - Syntax, expand comment, and don't overwrite rc if xc_xsplice_upload failed.
v4:
 - Remove LOADED state. Only have CHECKED state.
---
---
 .gitignore               |   1 +
 tools/misc/Makefile      |   4 +
 tools/misc/xen-xsplice.c | 463 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 468 insertions(+)
 create mode 100644 tools/misc/xen-xsplice.c

diff --git a/.gitignore b/.gitignore
index 20ffa2d..39eb779 100644
--- a/.gitignore
+++ b/.gitignore
@@ -182,6 +182,7 @@ tools/misc/xen_cpuperf
 tools/misc/xen-cpuid
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
+tools/misc/xen-xsplice
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index a94dad9..3a5f842 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -32,6 +32,7 @@ INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
 INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
+INSTALL_SBIN                   += xen-xsplice
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -103,6 +104,9 @@ xen-mfndump: xen-mfndump.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-xsplice: xen-xsplice.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
new file mode 100644
index 0000000..fb9228e
--- /dev/null
+++ b/tools/misc/xen-xsplice.c
@@ -0,0 +1,463 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#include <fcntl.h>
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <xenctrl.h>
+#include <xenstore.h>
+
+static xc_interface *xch;
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-xsplice: Xsplice test tool\n"
+            "Usage: xen-xsplice <command> [args]\n"
+            " <name> A unique name of the payload. Up to %d characters.\n"
+            "Commands:\n"
+            "  help                   display this help\n"
+            "  upload <name> <file>   upload file <file> with <name> name\n"
+            "  list                   list payloads uploaded.\n"
+            "  apply <name>           apply <name> patch.\n"
+            "  revert <name>          revert <name> patch.\n"
+            "  replace <name>         apply <name> patch and revert all others.\n"
+            "  unload <name>          unload <name> patch.\n"
+            "  load  <file>           upload, check and apply <file>.\n"
+            "                         The <file> name is used as <name>.\n",
+            XEN_XSPLICE_NAME_SIZE - 1);
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+static const char *state2str(unsigned int state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+    if (state >= ARRAY_SIZE(names) || !names[state])
+        return "unknown";
+
+    return names[state];
+}
+
+/* This value was chosen ad hoc; it could just as well be 42. */
+#define MAX_LEN 11
+static int list_func(int argc, char *argv[])
+{
+    unsigned int idx, done, left, i;
+    xen_xsplice_status_t *info = NULL;
+    char *name = NULL;
+    uint32_t *len = NULL;
+    int rc = ENOMEM;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+    idx = left = 0;
+    info = malloc(sizeof(*info) * MAX_LEN);
+    if ( !info )
+        return rc;
+    name = malloc(sizeof(*name) * XEN_XSPLICE_NAME_SIZE * MAX_LEN);
+    if ( !name )
+    {
+        free(info);
+        return rc;
+    }
+    len = malloc(sizeof(*len) * MAX_LEN);
+    if ( !len ) {
+        free(name);
+        free(info);
+        return rc;
+    }
+
+    fprintf(stdout," ID                                     | status\n"
+                   "----------------------------------------+------------\n");
+    do {
+        done = 0;
+        /* The memset is done to catch errors. */
+        memset(info, 'A', sizeof(*info) * MAX_LEN);
+        memset(name, 'B', sizeof(*name) * MAX_LEN * XEN_XSPLICE_NAME_SIZE);
+        memset(len, 'C', sizeof(*len) * MAX_LEN);
+        rc = xc_xsplice_list(xch, MAX_LEN, idx, info, name, len, &done, &left);
+        if ( rc )
+        {
+            fprintf(stderr, "Failed to list %u/%u: %d(%s)!\n",
+                    idx, left, errno, strerror(errno));
+            break;
+        }
+        for ( i = 0; i < done; i++ )
+        {
+            unsigned int j;
+            uint32_t sz;
+            char *str;
+
+            sz = len[i];
+            str = name + (i * XEN_XSPLICE_NAME_SIZE);
+            for ( j = sz; j < XEN_XSPLICE_NAME_SIZE; j++ )
+                str[j] = '\0';
+
+            printf("%-40s| %s", str, state2str(info[i].state));
+            if ( info[i].rc )
+                printf(" (%d, %s)\n", -info[i].rc, strerror(-info[i].rc));
+            else
+                puts("");
+        }
+        idx += done;
+    } while ( left );
+
+    free(name);
+    free(info);
+    free(len);
+    return rc;
+}
+#undef MAX_LEN
+
+static int get_name(int argc, char *argv[], char *name)
+{
+    ssize_t len = strlen(argv[0]);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+    {
+        fprintf(stderr, "Name must be at most %d characters!\n", XEN_XSPLICE_NAME_SIZE);
+        errno = EINVAL;
+        return errno;
+    }
+    /* Don't want any funny strings from the stack. */
+    memset(name, 0, XEN_XSPLICE_NAME_SIZE);
+    strncpy(name, argv[0], len);
+    return 0;
+}
+
+static int upload_func(int argc, char *argv[])
+{
+    char *filename;
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int fd = 0, rc;
+    struct stat buf;
+    unsigned char *fbuf;
+    ssize_t len;
+
+    if ( argc != 2 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    filename = argv[1];
+    fd = open(filename, O_RDONLY);
+    if ( fd < 0 )
+    {
+        fprintf(stderr, "Could not open %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        return errno;
+    }
+    if ( stat(filename, &buf) != 0 )
+    {
+        fprintf(stderr, "Could not get right size %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close(fd);
+        return errno;
+    }
+
+    len = buf.st_size;
+    fbuf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
+    if ( fbuf == MAP_FAILED )
+    {
+        fprintf(stderr,"Could not map: %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close (fd);
+        return errno;
+    }
+    printf("Uploading %s (%zd bytes)\n", filename, len);
+    rc = xc_xsplice_upload(xch, name, fbuf, len);
+    if ( rc )
+        fprintf(stderr, "Upload failed: %s, error: %d(%s)!\n",
+                filename, errno, strerror(errno));
+
+    if ( munmap( fbuf, len) )
+    {
+        fprintf(stderr, "Could not unmap!? error: %d(%s)!\n",
+                errno, strerror(errno));
+        if ( !rc )
+            rc = errno;
+    }
+    close(fd);
+
+    return rc;
+}
+
+/* These MUST match the 'action_options[]' array slots. */
+enum {
+    ACTION_APPLY = 0,
+    ACTION_REVERT = 1,
+    ACTION_UNLOAD = 2,
+    ACTION_REPLACE = 3,
+};
+
+struct {
+    int allow; /* State it must be in to call function. */
+    int expected; /* The state to be in after the function. */
+    const char *name;
+    int (*function)(xc_interface *xch, char *name, uint32_t timeout);
+    unsigned int executed; /* Has the function been called? */
+} action_options[] = {
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "apply",
+        .function = xc_xsplice_apply,
+    },
+    {   .allow = XSPLICE_STATE_APPLIED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "revert",
+        .function = xc_xsplice_revert,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = -ENOENT,
+        .name = "unload",
+        .function = xc_xsplice_unload,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "replace",
+        .function = xc_xsplice_replace,
+    },
+};
+
+/* Go around 300 * 0.1 seconds = 30 seconds. */
+#define RETRIES 300
+/* aka 0.1 second */
+#define DELAY 100000
+
+int action_func(int argc, char *argv[], unsigned int idx)
+{
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int rc, original_state;
+    xen_xsplice_status_t status;
+    unsigned int retry = 0;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( idx >= ARRAY_SIZE(action_options) )
+        return -1;
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    /* Check initial status. */
+    rc = xc_xsplice_get(xch, name, &status);
+    if ( rc )
+    {
+        fprintf(stderr, "%s failed to get status: %d(%s)!\n",
+                name, errno, strerror(errno));
+        return -1;
+    }
+    if ( status.rc == -EAGAIN )
+    {
+        fprintf(stderr, "%s failed. Operation already in progress\n", name);
+        return -1;
+    }
+
+    if ( status.state == action_options[idx].expected )
+    {
+        printf("No action needed\n");
+        return 0;
+    }
+
+    /* Perform action. */
+    if ( action_options[idx].allow & status.state )
+    {
+        printf("Performing %s:", action_options[idx].name);
+        rc = action_options[idx].function(xch, name, 0);
+        if ( rc )
+        {
+            fprintf(stderr, "%s failed with %d(%s)\n", name, errno, strerror(errno));
+            return -1;
+        }
+    }
+    else
+    {
+        fprintf(stderr, "%s: in wrong state (%s), expected (%s)\n",
+                name, state2str(status.state),
+                state2str(action_options[idx].allow));
+        return -1;
+    }
+
+    original_state = status.state;
+    do {
+        rc = xc_xsplice_get(xch, name, &status);
+        if ( rc )
+        {
+            rc = -errno;
+            break;
+        }
+
+        if ( status.state != original_state )
+            break;
+        if ( status.rc && status.rc != -EAGAIN )
+        {
+            rc = status.rc;
+            break;
+        }
+
+        printf(".");
+        fflush(stdout);
+        usleep(DELAY);
+    } while ( ++retry < RETRIES );
+
+    if ( retry >= RETRIES )
+    {
+        fprintf(stderr, "%s: Operation didn't complete after 30 seconds.\n", name);
+        return -1;
+    }
+    else
+    {
+        if ( rc == 0 )
+            rc = status.state;
+
+        if ( action_options[idx].expected == rc )
+            printf(" completed\n");
+        else if ( rc < 0 )
+        {
+            fprintf(stderr, "%s failed with %d(%s)\n", name, -rc, strerror(-rc));
+            return -1;
+        }
+        else
+        {
+            fprintf(stderr, "%s: in wrong state (%s), expected (%s)\n",
+                    name, state2str(rc),
+                    state2str(action_options[idx].expected));
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static int load_func(int argc, char *argv[])
+{
+    int rc;
+    char *new_argv[2];
+    char *path, *name, *lastdot;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+    /* <file> */
+    new_argv[1] = argv[0];
+
+    /* Synthesize the <id> from the file's basename, minus its extension. */
+    if ( !(path = strdup(argv[0])) )
+        return ENOMEM;
+    name = basename(path);
+    lastdot = strrchr(name, '.');
+    if ( lastdot != NULL )
+        *lastdot = '\0';
+    new_argv[0] = name;
+
+    rc = upload_func(2 /* <id> <file> */, new_argv);
+    if ( !rc )
+    {
+        rc = action_func(1 /* only <id> */, new_argv, ACTION_APPLY);
+        /* Apply failed: best-effort removal of the uploaded payload. */
+        if ( rc )
+            action_func(1, new_argv, ACTION_UNLOAD);
+    }
+    free(path);
+    return rc;
+}
+
+/*
+ * Commands that do not match any entry here are looked up in
+ * action_options[] instead.
+ */
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "list", list_func },
+    { "upload", upload_func },
+    { "load", load_func },
+};
+
+int main(int argc, char *argv[])
+{
+    int i, j, ret;
+
+    if ( argc  <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
+            if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+                break;
+
+        if ( j == ARRAY_SIZE(action_options) )
+        {
+            fprintf(stderr, "Unrecognised command '%s' -- try "
+                   "'xen-xsplice help'\n", argv[1]);
+            return 1;
+        }
+    } else
+        j = ARRAY_SIZE(action_options);
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "Failed to open the xc interface handle!\n");
+        return 1;
+    }
+
+    if ( i == ARRAY_SIZE(main_options) )
+        ret = action_func(argc - 2, argv + 2, j);
+    else
+        ret = main_options[i].function(argc - 2, argv + 2);
+
+    xc_interface_close(xch);
+
+    return !!ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26 10:31   ` Jan Beulich
  2016-04-25 15:34 ` [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type Konrad Rzeszutek Wilk
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Jan Beulich,
	Konrad Rzeszutek Wilk

During execution of the hypervisor we have two regions of
executable code - _stext -> _etext, and _sinittext -> _einittext.

The latter is not needed after boot.

We also have various built-in macros and functions to search
within those two regions, depending on the state of the system.

These searches are for bug_frames, exception tables (x86), or
the symbol name of an instruction.

With xSplice in the picture, we need a mechanism for the new
payloads to be searched for all of these as well.

Originally we had extra 'if (xsplice)...' checks, but that gets
a bit tiresome and does not hook up nicely.

This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol name entries without having to add calls into
other sub-components of the system.

Code which wishes to participate in the bug_frames and exception
table entry search only has to use two public APIs:
 - register_virtual_region
 - unregister_virtual_region

to let the core code know.

If ->symbols_lookup is NULL, the default internal symbol lookup
mechanism is used.
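A minimal user-space sketch of the lookup scheme described above, assuming a plain singly linked list in place of Xen's RCU-protected list_head (single-threaded, no locking). The names struct region, register_region, unregister_region, find_region and lookup are hypothetical stand-ins for struct virtual_region, register_virtual_region, unregister_virtual_region and find_text_region; the optional per-region callback mirrors the rule that a NULL ->symbols_lookup falls back to the default lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Optional per-region resolver, loosely modelled on symbols_lookup_t. */
typedef const char *lookup_fn(unsigned long addr);

struct region {
    struct region *next;       /* simplified list: no RCU, no locking */
    unsigned long start, end;  /* text range [start, end) */
    lookup_fn *symbols_lookup; /* NULL => fall back to default_lookup() */
};

static struct region *region_list;

/* Append at the tail, in the spirit of list_add_tail_rcu(). */
static void register_region(struct region *r)
{
    struct region **pp = &region_list;

    assert(r->start < r->end);
    while ( *pp )
        pp = &(*pp)->next;
    r->next = NULL;
    *pp = r;
}

static void unregister_region(struct region *r)
{
    struct region **pp;

    for ( pp = &region_list; *pp; pp = &(*pp)->next )
        if ( *pp == r )
        {
            *pp = r->next;
            return;
        }
}

/* Counterpart of find_text_region(): which region holds addr, if any? */
static const struct region *find_region(unsigned long addr)
{
    const struct region *r;

    for ( r = region_list; r; r = r->next )
        if ( addr >= r->start && addr < r->end )
            return r;
    return NULL;
}

static const char *default_lookup(unsigned long addr)
{
    (void)addr;
    return "core_symbol"; /* placeholder for the built-in symbol table */
}

static const char *payload_lookup(unsigned long addr)
{
    (void)addr;
    return "payload_symbol"; /* placeholder for a payload's own table */
}

/* Resolve addr: NULL if it is not in any registered text region. */
static const char *lookup(unsigned long addr)
{
    const struct region *r = find_region(addr);

    if ( !r )
        return NULL;
    return r->symbols_lookup ? r->symbols_lookup(addr) : default_lookup(addr);
}
```

Registering a payload region makes its addresses resolvable through its own callback, while addresses in the core region fall back to the default lookup; once the payload region is unregistered, its addresses are no longer found at all.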

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v4: New patch.
v5:
 - Rename to virtual_region.
 - Ditch the 'skip' function.
 - Remove the _stext.
 - Use RCU lists.
 - Add a search function.
 - Remove extern, add rcu_read_lock. remove __ from name.
v6: s/search_for_text/find_text_region/.
 - Drop the uaccess.h need. Make setup_virtual_regions accept two parameters.
   Remove #ifdef.
 - Constify struct exception_table_entry.
 - Remove some newlines.
 - Change header file guard #define to proper name.
v9:
 - Proper #define (was missing _H)
 - s/big/bit/
 - Change 'start' and 'end' to void* to not do casting in xSplice code.
 - Make 'start' and 'end' be const void *.
---
 xen/arch/arm/setup.c             |   4 ++
 xen/arch/arm/traps.c             |  39 +++++++----
 xen/arch/x86/extable.c           |  12 +++-
 xen/arch/x86/setup.c             |   6 ++
 xen/arch/x86/traps.c             |  40 ++++++-----
 xen/common/Makefile              |   1 +
 xen/common/symbols.c             |  11 ++-
 xen/common/virtual_region.c      | 148 +++++++++++++++++++++++++++++++++++++++
 xen/include/xen/symbols.h        |   9 +++
 xen/include/xen/virtual_region.h |  47 +++++++++++++
 10 files changed, 280 insertions(+), 37 deletions(-)
 create mode 100644 xen/common/virtual_region.c
 create mode 100644 xen/include/xen/virtual_region.h

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 6d205a9..09ff1ea 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -34,6 +34,7 @@
 #include <xen/keyhandler.h>
 #include <xen/cpu.h>
 #include <xen/pfn.h>
+#include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/acpi.h>
@@ -860,6 +861,9 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     system_state = SYS_STATE_active;
 
+    /* Must be done past setting system_state. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(dom0);
 
     /* Switch on to the dynamically allocated stack for the idle vcpu
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 9abfc3c..d9ffcc3 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -31,6 +31,7 @@
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
 #include <xen/perfc.h>
+#include <xen/virtual_region.h>
 #include <public/sched.h>
 #include <public/xen.h>
 #include <asm/debugger.h>
@@ -101,6 +102,8 @@ integer_param("debug_stack_lines", debug_stack_lines);
 
 void init_traps(void)
 {
+    setup_virtual_regions(NULL, NULL);
+
     /* Setup Hyp vector base */
     WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
 
@@ -1116,27 +1119,33 @@ void do_unexpected_trap(const char *msg, struct cpu_user_regs *regs)
 
 int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     const char *prefix = "", *filename, *predicate;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        NULL
-    };
+    int id = -1, lineno;
+    const struct virtual_region *region;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = find_text_region(pc);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
 
-        if ( ((vaddr_t)bug_loc(bug)) == pc )
-            break;
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( ((vaddr_t)bug_loc(b)) == pc )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-
-    if ( !stop_frames[id] )
+ found:
+    if ( !bug )
         return -ENOENT;
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..2a06cca 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -1,10 +1,12 @@
 
-#include <xen/config.h>
 #include <xen/init.h>
+#include <xen/list.h>
 #include <xen/perfc.h>
+#include <xen/rcupdate.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
 #include <asm/uaccess.h>
+#include <xen/virtual_region.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
 
@@ -80,8 +82,12 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    const struct virtual_region *region = find_text_region(addr);
+
+    if ( region && region->ex )
+        return search_one_table(region->ex, region->ex_end - 1, addr);
+
+    return 0;
 }
 
 unsigned long
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3696c31..22dc148 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -26,6 +26,7 @@
 #include <xen/pfn.h>
 #include <xen/nodemask.h>
 #include <xen/tmem_xen.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <public/version.h>
 #include <compat/platform.h>
@@ -515,6 +516,9 @@ static void noinline init_done(void)
 
     system_state = SYS_STATE_active;
 
+    /* MUST be done prior to removing .init data. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(hardware_domain);
 
     /* Zero the .init code and data. */
@@ -617,6 +621,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     smp_prepare_boot_cpu();
     sort_exception_tables();
 
+    setup_virtual_regions(__start___ex_table, __stop___ex_table);
+
     /* Full exception support from here on in. */
 
     loader = (mbi->flags & MBI_LOADERNAME)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index c6c9227..f73f7f3 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -48,6 +48,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1229,18 +1230,12 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 
 void do_invalid_op(struct cpu_user_regs *regs)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     u8 bug_insn[2];
     const char *prefix = "", *filename, *predicate, *eip = (char *)regs->eip;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        __stop_bug_frames_3,
-        NULL
-    };
+    int id = -1, lineno;
+    const struct virtual_region *region;
 
     DEBUGGER_trap_entry(TRAP_invalid_op, regs);
 
@@ -1257,16 +1252,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = find_text_region(regs->eip);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
+
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( bug_loc(b) == eip )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-    if ( !stop_frames[id] )
-        goto die;
 
+ found:
+    if ( !bug )
+        goto die;
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
     {
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 910ac69..1e4bc70 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -51,6 +51,7 @@ obj-y += time.o
 obj-y += timer.o
 obj-y += trace.o
 obj-y += version.o
+obj-y += virtual_region.o
 obj-y += vm_event.o
 obj-y += vmap.o
 obj-y += vsprintf.o
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..b18ddcd1 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,7 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <xen/virtual_region.h>
 #include <public/platform.h>
 #include <xen/guest_access.h>
 
@@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
 
 bool_t is_active_kernel_text(unsigned long addr)
 {
-    return (is_kernel_text(addr) ||
-            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
+    return !!find_text_region(addr);
 }
 
 const char *symbols_lookup(unsigned long addr,
@@ -108,13 +108,18 @@ const char *symbols_lookup(unsigned long addr,
 {
     unsigned long i, low, high, mid;
     unsigned long symbol_end = 0;
+    const struct virtual_region *region;
 
     namebuf[KSYM_NAME_LEN] = 0;
     namebuf[0] = 0;
 
-    if (!is_active_kernel_text(addr))
+    region = find_text_region(addr);
+    if (!region)
         return NULL;
 
+    if (region->symbols_lookup)
+        return region->symbols_lookup(addr, symbolsize, offset, namebuf);
+
         /* do a binary search on the sorted symbols_addresses array */
     low = 0;
     high = symbols_num_syms;
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
new file mode 100644
index 0000000..d75b75d
--- /dev/null
+++ b/xen/common/virtual_region.c
@@ -0,0 +1,148 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/rcupdate.h>
+#include <xen/spinlock.h>
+#include <xen/virtual_region.h>
+
+static struct virtual_region core = {
+    .list = LIST_HEAD_INIT(core.list),
+    .start = _stext,
+    .end = _etext,
+};
+
+/* Becomes irrelevant when __init sections are cleared. */
+static struct virtual_region core_init __initdata = {
+    .list = LIST_HEAD_INIT(core_init.list),
+    .start = _sinittext,
+    .end = _einittext,
+};
+
+/*
+ * RCU locking. Additions are done either at startup (when there is only
+ * one CPU) or when all CPUs are running without IRQs.
+ *
+ * Deletions are a bit tricky. We do them when xSplicing (all CPUs running
+ * without IRQs) or during bootup (when clearing the init sections).
+ *
+ * Hence we use list_del_rcu (which sports a memory fence) and a spinlock
+ * on deletion.
+ *
+ * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
+ *
+ */
+static LIST_HEAD(virtual_region_list);
+static DEFINE_SPINLOCK(virtual_region_lock);
+static DEFINE_RCU_READ_LOCK(rcu_virtual_region_lock);
+
+const struct virtual_region *find_text_region(unsigned long addr)
+{
+    const struct virtual_region *region;
+
+    rcu_read_lock(&rcu_virtual_region_lock);
+    list_for_each_entry_rcu( region, &virtual_region_list, list )
+    {
+        if ( (void *)addr >= region->start && (void *)addr < region->end )
+        {
+            rcu_read_unlock(&rcu_virtual_region_lock);
+            return region;
+        }
+    }
+    rcu_read_unlock(&rcu_virtual_region_lock);
+
+    return NULL;
+}
+
+void register_virtual_region(struct virtual_region *r)
+{
+    ASSERT(!local_irq_is_enabled());
+
+    list_add_tail_rcu(&r->list, &virtual_region_list);
+}
+
+static void remove_virtual_region(struct virtual_region *r)
+{
+    unsigned long flags;
+
+    spin_lock_irqsave(&virtual_region_lock, flags);
+    list_del_rcu(&r->list);
+    spin_unlock_irqrestore(&virtual_region_lock, flags);
+    /*
+     * We do not need to invoke call_rcu.
+     *
+     * This is due to the fact that on the deletion we have made sure
+     * to use spinlocks (to guard against somebody else calling
+     * unregister_virtual_region) and list_deletion spiced with
+     * memory barrier.
+     *
+     * That protects us from corrupting the list as the readers all
+     * use list_for_each_entry_rcu which is safe against concurrent
+     * deletions.
+     */
+}
+
+void unregister_virtual_region(struct virtual_region *r)
+{
+    /* Expected to be called from xSplice - which has IRQs disabled. */
+    ASSERT(!local_irq_is_enabled());
+
+    remove_virtual_region(r);
+}
+
+void __init unregister_init_virtual_region(void)
+{
+    BUG_ON(system_state != SYS_STATE_active);
+
+    remove_virtual_region(&core_init);
+}
+
+void __init setup_virtual_regions(const struct exception_table_entry *start,
+                                  const struct exception_table_entry *end)
+{
+    size_t sz;
+    unsigned int i;
+    static const struct bug_frame *const __initconstrel bug_frames[] = {
+        __start_bug_frames,
+        __stop_bug_frames_0,
+        __stop_bug_frames_1,
+        __stop_bug_frames_2,
+#ifdef CONFIG_X86
+        __stop_bug_frames_3,
+#endif
+        NULL
+    };
+
+    for ( i = 1; bug_frames[i]; i++ )
+    {
+        const struct bug_frame *s;
+
+        s = bug_frames[i - 1];
+        sz = bug_frames[i] - s;
+
+        core.frame[i - 1].n_bugs = sz;
+        core.frame[i - 1].bugs = s;
+
+        core_init.frame[i - 1].n_bugs = sz;
+        core_init.frame[i - 1].bugs = s;
+    }
+
+    core_init.ex = core.ex = start;
+    core_init.ex_end = core.ex_end = end;
+
+    register_virtual_region(&core_init);
+    register_virtual_region(&core);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 1fa0537..f58e611 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -5,6 +5,15 @@
 
 #define KSYM_NAME_LEN 127
 
+/*
+ * Typedef for the callback functions that symbols_lookup
+ * can call if a virtual_region_list entry has a callback for it.
+ */
+typedef const char *symbols_lookup_t(unsigned long addr,
+                                     unsigned long *symbolsize,
+                                     unsigned long *offset,
+                                     char *namebuf);
+
 /* Lookup an address. */
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
new file mode 100644
index 0000000..e5e58ed
--- /dev/null
+++ b/xen/include/xen/virtual_region.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __XEN_VIRTUAL_REGION_H__
+#define __XEN_VIRTUAL_REGION_H__
+
+#include <xen/list.h>
+#include <xen/symbols.h>
+
+struct virtual_region
+{
+    struct list_head list;
+    const void *start;                /* Virtual address start. */
+    const void *end;                  /* Virtual address end. */
+
+    /* If this is NULL the default lookup mechanism is used. */
+    symbols_lookup_t *symbols_lookup;
+
+    struct {
+        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
+        size_t n_bugs;          /* The number of them. */
+    } frame[BUGFRAME_NR];
+
+    const struct exception_table_entry *ex;
+    const struct exception_table_entry *ex_end;
+};
+
+const struct virtual_region *find_text_region(unsigned long addr);
+void setup_virtual_regions(const struct exception_table_entry *start,
+                           const struct exception_table_entry *end);
+void unregister_init_virtual_region(void);
+void register_virtual_region(struct virtual_region *r);
+void unregister_virtual_region(struct virtual_region *r);
+
+#endif /* __XEN_VIRTUAL_REGION_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


* [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26 10:47   ` Jan Beulich
  2016-04-25 15:34 ` [PATCH v9 09/27] x86/mm: Introduce modify_xen_mappings() Konrad Rzeszutek Wilk
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Stefano Stabellini, Keir Fraser, Konrad Rzeszutek Wilk,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich

For users who want to use virtual addresses that are in the
hypervisor's code/data region address space, these three new
functions allow that.

Implementation-wise, the vmap API now keeps track of two virtual
address regions:
 a) VMAP_VIRT_START
 b) Any caller-provided virtual address space (start and end are required).

Region a) is the default one, and the behavior for existing
users of vmalloc, vmap, etc. is unchanged.

To use region b), one only has to call vm_init_type to
initialize it and v[m|z]alloc_xen to allocate from it
(vfree is capable of searching both address spaces).
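The two-region bookkeeping can be sketched in user space as follows. This is a hypothetical model, not the vmap.c implementation: enum region_type, range_init, range_alloc and range_owner are made-up analogues of the region enum, vm_init_type, v[m|z]alloc_xen and the search vfree performs across both spaces, and a simple bump pointer stands in for the real page tracking (freed addresses are never reused here).

```c
#include <assert.h>

/* Hypothetical identifiers for the two address spaces. */
enum region_type { REGION_DEFAULT, REGION_XEN, REGION_NR };

struct va_range {
    unsigned long start, end; /* [start, end); start == 0 => uninitialized */
    unsigned long next;       /* bump pointer: next free address */
};

static struct va_range ranges[REGION_NR];

/* Analogue of vm_init_type(): make range t usable. */
static void range_init(enum region_type t, unsigned long start,
                       unsigned long end)
{
    assert(t < REGION_NR && start != 0 && start < end);
    ranges[t].start = ranges[t].next = start;
    ranges[t].end = end;
}

/* Allocate size bytes from range t; returns 0 on failure. */
static unsigned long range_alloc(enum region_type t, unsigned long size)
{
    struct va_range *r;
    unsigned long va;

    assert(t < REGION_NR);
    r = &ranges[t];
    if ( !r->start )                /* range never initialized */
        return 0;
    if ( size == 0 || size > r->end - r->next )
        return 0;                   /* nothing requested or no space left */
    va = r->next;
    r->next += size;
    return va;
}

/* What a vfree()-style caller needs: which range owns va (-1 if none)? */
static int range_owner(unsigned long va)
{
    unsigned int t;

    for ( t = 0; t < REGION_NR; t++ )
        if ( ranges[t].start && va >= ranges[t].start && va < ranges[t].end )
            return (int)t;
    return -1;
}
```

Allocating from the second range before it has been initialized fails, which corresponds to the extra "not initialized" checks the patch adds; once both ranges are set up, an address can be attributed to its owning range without the caller having to say which one it came from.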

This allows users (such as xSplice) to provide their own
mechanism to change the page flags, and also to use virtual
addresses closer to the hypervisor virtual addresses (at least
on x86) without having to deal with the allocation of pages.

For an example user, see the patch titled "xsplice: Implement payload
loading", where we parse the payload's ELF relocations, which are
defined to be signed 32-bit on x86 (hence the maximum displacement
is 2GB of virtual space; on ARM32 it is 128MB). On x86 the
displacement between the hypervisor virtual addresses and the
vmalloc area needs more than 32 bits, which means that ELF
relocations would truncate bits 33 and 34. Hence this alternate API.
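To make the 2GB limit concrete: for the x86-64 R_X86_64_PC32 relocation the value written is S + A - P (symbol, addend, place), and it must fit in a signed 32-bit field. The helper below, fits_rel32, is a hypothetical illustration of that check, not code taken from the xSplice loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does a PC-relative signed 32-bit relocation
 * (value = S + A - P) fit?  sym = S, addend = A, place = P.
 */
static bool fits_rel32(uint64_t sym, int64_t addend, uint64_t place)
{
    /* Wrap-around unsigned arithmetic, then reinterpret as signed. */
    int64_t val = (int64_t)(sym + (uint64_t)addend - place);

    return val >= INT32_MIN && val <= INT32_MAX;
}
```

For example, a target 4KB away from the place fits, as does one exactly INT32_MIN bytes behind it, whereas a target 2GB ahead or 64GB away does not; the latter is the kind of gap that placing payloads in the default vmap area would produce on x86.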

We also add extra checks in case region b) has not been
initialized.

This patch also removes the 'vm_alloc' declaration, as we do
not have any users of it; if there ever are any, we will have
to expose a vm_alloc_xen variant as well.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>

v4: New patch.
v5: Update per Jan's comments.
v6: Drop the stray parentheses on typedefs.
    Ditch the vunmap callback. Stash away the virtual addresses in lists.
    Ditch the vmap callback. Just provide virtual address.
    Ditch the vmalloc_range. Require users of alternative virtual address
    to call vmap_init_type first.
v7: Don't expose the vmalloc_type and such. Instead provide a wrapper
    called vmalloc_xen for those.
    Rename the enum, change one of the names.
    Moved the vunmap_type around in the .c file so we don't have to declare
    it in the header.
v9: Remove the vunmap_xen, removed vm_alloc from header.
    Add vzalloc_xen
---
 xen/arch/arm/kernel.c       |   2 +-
 xen/arch/arm/mm.c           |   2 +-
 xen/arch/x86/mm.c           |   2 +-
 xen/common/virtual_region.c |   1 -
 xen/common/vmap.c           | 205 +++++++++++++++++++++++++++-----------------
 xen/drivers/acpi/osl.c      |   2 +-
 xen/include/xen/vmap.h      |  22 ++++-
 7 files changed, 150 insertions(+), 86 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 61808ac..9871bd9 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -299,7 +299,7 @@ static __init int kernel_decompress(struct bootmodule *mod)
         return -ENOMEM;
     }
     mfn = _mfn(page_to_mfn(pages));
-    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR);
+    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 
     rc = perform_gunzip(output, input, size);
     clean_dcache_va_range(output, output_size);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7065c3e..94ea054 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -807,7 +807,7 @@ void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
     mfn_t mfn = _mfn(PFN_DOWN(pa));
     unsigned int offs = pa & (PAGE_SIZE - 1);
     unsigned int nr = PFN_UP(offs + len);
-    void *ptr = __vmap(&mfn, nr, 1, 1, attributes);
+    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
 
     if ( ptr == NULL )
         return NULL;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ca26f49..633f0dc 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6124,7 +6124,7 @@ void __iomem *ioremap(paddr_t pa, size_t len)
         unsigned int offs = pa & (PAGE_SIZE - 1);
         unsigned int nr = PFN_UP(offs + len);
 
-        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE) + offs;
+        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, VMAP_DEFAULT) + offs;
     }
 
     return (void __force __iomem *)va;
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
index d75b75d..41f8c28 100644
--- a/xen/common/virtual_region.c
+++ b/xen/common/virtual_region.c
@@ -33,7 +33,6 @@ static struct virtual_region core_init __initdata = {
  * on deletion.
  *
  * All readers of virtual_region_list MUST use list list_for_each_entry_rcu.
- *
  */
 static LIST_HEAD(virtual_region_list);
 static DEFINE_SPINLOCK(virtual_region_lock);
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 134eda0..9931482 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -10,40 +10,43 @@
 #include <asm/page.h>
 
 static DEFINE_SPINLOCK(vm_lock);
-static void *__read_mostly vm_base;
-#define vm_bitmap ((unsigned long *)vm_base)
+static void *__read_mostly vm_base[VMAP_REGION_NR];
+#define vm_bitmap(x) ((unsigned long *)vm_base[x])
 /* highest allocated bit in the bitmap */
-static unsigned int __read_mostly vm_top;
+static unsigned int __read_mostly vm_top[VMAP_REGION_NR];
 /* total number of bits in the bitmap */
-static unsigned int __read_mostly vm_end;
+static unsigned int __read_mostly vm_end[VMAP_REGION_NR];
 /* lowest known clear bit in the bitmap */
-static unsigned int vm_low;
+static unsigned int vm_low[VMAP_REGION_NR];
 
-void __init vm_init(void)
+void __init vm_init_type(enum vmap_region type, void *start, void *end)
 {
     unsigned int i, nr;
     unsigned long va;
 
-    vm_base = (void *)VMAP_VIRT_START;
-    vm_end = PFN_DOWN(arch_vmap_virt_end() - vm_base);
-    vm_low = PFN_UP((vm_end + 7) / 8);
-    nr = PFN_UP((vm_low + 7) / 8);
-    vm_top = nr * PAGE_SIZE * 8;
+    ASSERT(!vm_base[type]);
 
-    for ( i = 0, va = (unsigned long)vm_bitmap; i < nr; ++i, va += PAGE_SIZE )
+    vm_base[type] = start;
+    vm_end[type] = PFN_DOWN(end - start);
+    vm_low[type]= PFN_UP((vm_end[type] + 7) / 8);
+    nr = PFN_UP((vm_low[type] + 7) / 8);
+    vm_top[type] = nr * PAGE_SIZE * 8;
+
+    for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
         struct page_info *pg = alloc_domheap_page(NULL, 0);
 
         map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
         clear_page((void *)va);
     }
-    bitmap_fill(vm_bitmap, vm_low);
+    bitmap_fill(vm_bitmap(type), vm_low[type]);
 
     /* Populate page tables for the bitmap if necessary. */
-    populate_pt_range(va, 0, vm_low - nr);
+    populate_pt_range(va, 0, vm_low[type] - nr);
 }
 
-void *vm_alloc(unsigned int nr, unsigned int align)
+static void *vm_alloc_type(unsigned int nr, unsigned int align,
+                           enum vmap_region t)
 {
     unsigned int start, bit;
 
@@ -52,27 +55,31 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     else if ( align & (align - 1) )
         align &= -align;
 
+    ASSERT(t != VMAP_REGION_NR);
+    if ( !vm_base[t] )
+        return NULL;
+
     spin_lock(&vm_lock);
     for ( ; ; )
     {
         struct page_info *pg;
 
-        ASSERT(vm_low == vm_top || !test_bit(vm_low, vm_bitmap));
-        for ( start = vm_low; start < vm_top; )
+        ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
+        for ( start = vm_low[t]; start < vm_top[t]; )
         {
-            bit = find_next_bit(vm_bitmap, vm_top, start + 1);
-            if ( bit > vm_top )
-                bit = vm_top;
+            bit = find_next_bit(vm_bitmap(t), vm_top[t], start + 1);
+            if ( bit > vm_top[t] )
+                bit = vm_top[t];
             /*
              * Note that this skips the first bit, making the
              * corresponding page a guard one.
              */
             start = (start + align) & ~(align - 1);
-            if ( bit < vm_top )
+            if ( bit < vm_top[t] )
             {
                 if ( start + nr < bit )
                     break;
-                start = find_next_zero_bit(vm_bitmap, vm_top, bit + 1);
+                start = find_next_zero_bit(vm_bitmap(t), vm_top[t], bit + 1);
             }
             else
             {
@@ -82,12 +89,12 @@ void *vm_alloc(unsigned int nr, unsigned int align)
             }
         }
 
-        if ( start < vm_top )
+        if ( start < vm_top[t] )
             break;
 
         spin_unlock(&vm_lock);
 
-        if ( vm_top >= vm_end )
+        if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
         pg = alloc_domheap_page(NULL, 0);
@@ -96,23 +103,23 @@ void *vm_alloc(unsigned int nr, unsigned int align)
 
         spin_lock(&vm_lock);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
-            unsigned long va = (unsigned long)vm_bitmap + vm_top / 8;
+            unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
             if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
-                vm_top += PAGE_SIZE * 8;
-                if ( vm_top > vm_end )
-                    vm_top = vm_end;
+                vm_top[t] += PAGE_SIZE * 8;
+                if ( vm_top[t] > vm_end[t] )
+                    vm_top[t] = vm_end[t];
                 continue;
             }
         }
 
         free_domheap_page(pg);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
             spin_unlock(&vm_lock);
             return NULL;
@@ -120,47 +127,51 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     }
 
     for ( bit = start; bit < start + nr; ++bit )
-        __set_bit(bit, vm_bitmap);
-    if ( bit < vm_top )
-        ASSERT(!test_bit(bit, vm_bitmap));
+        __set_bit(bit, vm_bitmap(t));
+    if ( bit < vm_top[t] )
+        ASSERT(!test_bit(bit, vm_bitmap(t)));
     else
-        ASSERT(bit == vm_top);
-    if ( start <= vm_low + 2 )
-        vm_low = bit;
+        ASSERT(bit == vm_top[t]);
+    if ( start <= vm_low[t] + 2 )
+        vm_low[t] = bit;
     spin_unlock(&vm_lock);
 
-    return vm_base + start * PAGE_SIZE;
+    return vm_base[t] + start * PAGE_SIZE;
 }
 
-static unsigned int vm_index(const void *va)
+static unsigned int vm_index(const void *va, enum vmap_region type)
 {
     unsigned long addr = (unsigned long)va & ~(PAGE_SIZE - 1);
     unsigned int idx;
+    unsigned long start = (unsigned long)vm_base[type];
 
-    if ( addr < VMAP_VIRT_START + (vm_end / 8) ||
-         addr >= VMAP_VIRT_START + vm_top * PAGE_SIZE )
+    if ( !start )
         return 0;
 
-    idx = PFN_DOWN(va - vm_base);
-    return !test_bit(idx - 1, vm_bitmap) &&
-           test_bit(idx, vm_bitmap) ? idx : 0;
+    if ( addr < start + (vm_end[type] / 8) ||
+         addr >= start + vm_top[type] * PAGE_SIZE )
+        return 0;
+
+    idx = PFN_DOWN(va - vm_base[type]);
+    return !test_bit(idx - 1, vm_bitmap(type)) &&
+           test_bit(idx, vm_bitmap(type)) ? idx : 0;
 }
 
-static unsigned int vm_size(const void *va)
+static unsigned int vm_size(const void *va, enum vmap_region type)
 {
-    unsigned int start = vm_index(va), end;
+    unsigned int start = vm_index(va, type), end;
 
     if ( !start )
         return 0;
 
-    end = find_next_zero_bit(vm_bitmap, vm_top, start + 1);
+    end = find_next_zero_bit(vm_bitmap(type), vm_top[type], start + 1);
 
-    return min(end, vm_top) - start;
+    return min(end, vm_top[type]) - start;
 }
 
-void vm_free(const void *va)
+static void vm_free_type(const void *va, enum vmap_region type)
 {
-    unsigned int bit = vm_index(va);
+    unsigned int bit = vm_index(va, type);
 
     if ( !bit )
     {
@@ -169,29 +180,54 @@ void vm_free(const void *va)
     }
 
     spin_lock(&vm_lock);
-    if ( bit < vm_low )
+    if ( bit < vm_low[type] )
     {
-        vm_low = bit - 1;
-        while ( !test_bit(vm_low - 1, vm_bitmap) )
-            --vm_low;
+        vm_low[type] = bit - 1;
+        while ( !test_bit(vm_low[type] - 1, vm_bitmap(type)) )
+            --vm_low[type];
     }
-    while ( __test_and_clear_bit(bit, vm_bitmap) )
-        if ( ++bit == vm_top )
+    while ( __test_and_clear_bit(bit, vm_bitmap(type)) )
+        if ( ++bit == vm_top[type] )
             break;
     spin_unlock(&vm_lock);
 }
 
+void vm_free(const void *va)
+{
+    vm_free_type(va, VMAP_DEFAULT);
+}
+
+static void vunmap_type(const void *va, enum vmap_region type)
+{
+    unsigned int size = vm_size(va, type);
+#ifndef _PAGE_NONE
+    unsigned long addr = (unsigned long)va;
+
+    destroy_xen_mappings(addr, addr + PAGE_SIZE * size);
+#else /* Avoid tearing down intermediate page tables. */
+    map_pages_to_xen((unsigned long)va, 0, size, _PAGE_NONE);
+#endif
+    vm_free_type(va, type);
+}
+
+void vunmap(const void *va)
+{
+    vunmap_type(va, VMAP_DEFAULT);
+}
+
+
 void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags)
+             unsigned int nr, unsigned int align, unsigned int flags,
+             enum vmap_region type)
 {
-    void *va = vm_alloc(nr * granularity, align);
+    void *va = vm_alloc_type(nr * granularity, align, type);
     unsigned long cur = (unsigned long)va;
 
     for ( ; va && nr--; ++mfn, cur += PAGE_SIZE * granularity )
     {
         if ( map_pages_to_xen(cur, mfn_x(*mfn), granularity, flags) )
         {
-            vunmap(va);
+            vunmap_type(va, type);
             va = NULL;
         }
     }
@@ -201,22 +237,10 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
 
 void *vmap(const mfn_t *mfn, unsigned int nr)
 {
-    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR);
-}
-
-void vunmap(const void *va)
-{
-#ifndef _PAGE_NONE
-    unsigned long addr = (unsigned long)va;
-
-    destroy_xen_mappings(addr, addr + PAGE_SIZE * vm_size(va));
-#else /* Avoid tearing down intermediate page tables. */
-    map_pages_to_xen((unsigned long)va, 0, vm_size(va), _PAGE_NONE);
-#endif
-    vm_free(va);
+    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
-void *vmalloc(size_t size)
+static void *vmalloc_type(size_t size, enum vmap_region type)
 {
     mfn_t *mfn;
     size_t pages, i;
@@ -238,11 +262,12 @@ void *vmalloc(size_t size)
         mfn[i] = _mfn(page_to_mfn(pg));
     }
 
-    va = vmap(mfn, pages);
+    va = __vmap(mfn, 1, pages, 1, PAGE_HYPERVISOR, type);
     if ( va == NULL )
         goto error;
 
     xfree(mfn);
+
     return va;
 
  error:
@@ -252,9 +277,19 @@ void *vmalloc(size_t size)
     return NULL;
 }
 
-void *vzalloc(size_t size)
+void *vmalloc(size_t size)
+{
+    return vmalloc_type(size, VMAP_DEFAULT);
+}
+
+void *vmalloc_xen(size_t size)
+{
+    return vmalloc_type(size, VMAP_XEN);
+}
+
+static void *vzalloc_type(size_t size, enum vmap_region type)
 {
-    void *p = vmalloc(size);
+    void *p = vmalloc_type(size, type);
     int i;
 
     if ( p == NULL )
@@ -266,16 +301,32 @@ void *vzalloc(size_t size)
     return p;
 }
 
+void *vzalloc(size_t size)
+{
+    return vzalloc_type(size, VMAP_DEFAULT);
+}
+
+void *vzalloc_xen(size_t size)
+{
+    return vzalloc_type(size, VMAP_XEN);
+}
+
 void vfree(void *va)
 {
     unsigned int i, pages;
     struct page_info *pg;
     PAGE_LIST_HEAD(pg_list);
+    enum vmap_region type = VMAP_DEFAULT;
 
     if ( !va )
         return;
 
-    pages = vm_size(va);
+    pages = vm_size(va, type);
+    if ( !pages )
+    {
+        type = VMAP_XEN;
+        pages = vm_size(va, type);
+    }
     ASSERT(pages);
 
     for ( i = 0; i < pages; i++ )
@@ -285,7 +336,7 @@ void vfree(void *va)
         ASSERT(page);
         page_list_add(page, &pg_list);
     }
-    vunmap(va);
+    vunmap_type(va, type);
 
     while ( (pg = page_list_remove_head(&pg_list)) != NULL )
         free_domheap_page(pg);
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 8a28d87..9a49029 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -97,7 +97,7 @@ acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 		if (IS_ENABLED(CONFIG_X86) && !((phys + size - 1) >> 20))
 			return __va(phys);
 		return __vmap(&mfn, PFN_UP(offs + size), 1, 1,
-			      ACPI_MAP_MEM_ATTR) + offs;
+			      ACPI_MAP_MEM_ATTR, VMAP_DEFAULT) + offs;
 	}
 	return __acpi_map_table(phys, size);
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 5671ac8..765c06e 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -4,15 +4,26 @@
 #include <xen/mm.h>
 #include <asm/page.h>
 
-void *vm_alloc(unsigned int nr, unsigned int align);
+enum vmap_region {
+    VMAP_DEFAULT,
+    VMAP_XEN,
+    VMAP_REGION_NR,
+};
+
+void vm_init_type(enum vmap_region type, void *start, void *end);
+
 void vm_free(const void *);
 
-void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags);
+void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
+             unsigned int align, unsigned int flags, enum vmap_region);
 void *vmap(const mfn_t *mfn, unsigned int nr);
 void vunmap(const void *);
+
 void *vmalloc(size_t size);
+void *vmalloc_xen(size_t size);
+
 void *vzalloc(size_t size);
+void *vzalloc_xen(size_t size);
 void vfree(void *va);
 
 void __iomem *ioremap(paddr_t, size_t);
@@ -24,7 +35,10 @@ static inline void iounmap(void __iomem *va)
     vunmap((void *)(addr & PAGE_MASK));
 }
 
-void vm_init(void);
 void *arch_vmap_virt_end(void);
+static inline void vm_init(void)
+{
+    vm_init_type(VMAP_DEFAULT, (void *)VMAP_VIRT_START, arch_vmap_virt_end());
+}
 
 #endif /* __XEN_VMAP_H__ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 09/27] x86/mm: Introduce modify_xen_mappings()
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: George Dunlap, Jan Beulich, Konrad Rzeszutek Wilk

From: Andrew Cooper <andrew.cooper3@citrix.com>

Introduce modify_xen_mappings() to change the permissions on existing Xen
mappings.  The existing destroy_xen_mappings() is altered to support changing
the PTE permissions.

A new destroy_xen_mappings() is introduced, as the special case of not passing
_PAGE_PRESENT to modify_xen_mappings().
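The flag rule can be illustrated on a single entry. This sketch uses
made-up bit values (PG_*) rather than Xen's _PAGE_* constants and
operates on a bare flags word instead of a page-table entry;
merge_pte_flags is hypothetical. It follows the logic described in the
patch: the request is masked to NX/RW/PRESENT, other bits of the old
entry are preserved, omitting PRESENT empties the entry, and asking for
PRESENT on a non-present entry is treated as a caller error.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit positions for illustration; not Xen's _PAGE_* values. */
#define PG_PRESENT  (1u << 0)
#define PG_RW       (1u << 1)
#define PG_NX       (1u << 23)
#define PG_GLOBAL   (1u << 8)   /* any bit outside the mask; must survive */

/* Only these bits may be altered, as with FLAGS_MASK in the patch. */
#define FLAGS_MASK  (PG_NX | PG_RW | PG_PRESENT)

static uint32_t merge_pte_flags(uint32_t old_flags, uint32_t nf)
{
    nf &= FLAGS_MASK;                       /* other requested bits are ignored */

    /* Modifying must not create a mapping where none exists. */
    assert(!(nf & PG_PRESENT) || (old_flags & PG_PRESENT));

    if ( !(nf & PG_PRESENT) )
        return 0;                           /* destroy: the whole entry is emptied */

    return (old_flags & ~FLAGS_MASK) | nf;  /* keep non-mask bits, apply new ones */
}
```

Passing 0 as the new flags gives the destroy_xen_mappings() behaviour
for that entry, while PG_PRESENT alone turns a writable entry into a
read-only one.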

As cleanup (and an ideal functional test), the boot logic which remaps Xen's
code and data with reduced permissions is altered to use
modify_xen_mappings(), rather than map_pages_to_xen() passing the same mfns
back in.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>

v7: New
v8:
 * Allow callers to pass PAGE_HYPERVISOR_xxx constants, for consistency with
   similar functions.
 * Replace one opencoded 0 with its logical equivalent, _PAGE_NONE.
 * Add check for creating new mappings of 4K entries.
 * Correct the continue logic.  Invert the sense of the comment to match the
   code.
 * Added Reviewed-by George and Jan. Fix missing "are" in comment.
v9: Add missing second 'are'.
---
---
 xen/arch/x86/mm.c    | 75 +++++++++++++++++++++++++++++++++++++++++++++-------
 xen/arch/x86/setup.c | 22 +++++++--------
 xen/include/xen/mm.h |  2 ++
 3 files changed, 76 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 633f0dc..2bb920b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5956,7 +5956,19 @@ int populate_pt_range(unsigned long virt, unsigned long mfn,
     return map_pages_to_xen(virt, mfn, nr_mfns, MAP_SMALL_PAGES);
 }
 
-void destroy_xen_mappings(unsigned long s, unsigned long e)
+/*
+ * Alter the permissions of a range of Xen virtual address space.
+ *
+ * Does not create new mappings, and does not modify the mfn in existing
+ * mappings, but will shatter superpages if necessary, and will destroy
+ * mappings if not passed _PAGE_PRESENT.
+ *
+ * The only flags considered are NX, RW and PRESENT.  All other input flags
+ * are ignored.
+ *
+ * It is an error to call with present flags over an unpopulated range.
+ */
+void modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool_t locking = system_state > SYS_STATE_boot;
     l2_pgentry_t *pl2e;
@@ -5964,6 +5976,10 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
     unsigned int  i;
     unsigned long v = s;
 
+    /* Set of valid PTE bits which may be altered. */
+#define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT)
+    nf &= FLAGS_MASK;
+
     ASSERT(IS_ALIGNED(s, PAGE_SIZE));
     ASSERT(IS_ALIGNED(e, PAGE_SIZE));
 
@@ -5973,6 +5989,9 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
 
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
+            /* Confirm the caller isn't trying to create new mappings. */
+            ASSERT(!(nf & _PAGE_PRESENT));
+
             v += 1UL << L3_PAGETABLE_SHIFT;
             v &= ~((1UL << L3_PAGETABLE_SHIFT) - 1);
             continue;
@@ -5984,8 +6003,12 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
                  l1_table_offset(v) == 0 &&
                  ((e - v) >= (1UL << L3_PAGETABLE_SHIFT)) )
             {
-                /* PAGE1GB: whole superpage is destroyed. */
-                l3e_write_atomic(pl3e, l3e_empty());
+                /* PAGE1GB: whole superpage is modified. */
+                l3_pgentry_t nl3e = !(nf & _PAGE_PRESENT) ? l3e_empty()
+                    : l3e_from_pfn(l3e_get_pfn(*pl3e),
+                                   (l3e_get_flags(*pl3e) & ~FLAGS_MASK) | nf);
+
+                l3e_write_atomic(pl3e, nl3e);
                 v += 1UL << L3_PAGETABLE_SHIFT;
                 continue;
             }
@@ -6016,6 +6039,9 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
 
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
+            /* Confirm the caller isn't trying to create new mappings. */
+            ASSERT(!(nf & _PAGE_PRESENT));
+
             v += 1UL << L2_PAGETABLE_SHIFT;
             v &= ~((1UL << L2_PAGETABLE_SHIFT) - 1);
             continue;
@@ -6026,8 +6052,12 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
             if ( (l1_table_offset(v) == 0) &&
                  ((e-v) >= (1UL << L2_PAGETABLE_SHIFT)) )
             {
-                /* PSE: whole superpage is destroyed. */
-                l2e_write_atomic(pl2e, l2e_empty());
+                /* PSE: whole superpage is modified. */
+                l2_pgentry_t nl2e = !(nf & _PAGE_PRESENT) ? l2e_empty()
+                    : l2e_from_pfn(l2e_get_pfn(*pl2e),
+                                   (l2e_get_flags(*pl2e) & ~FLAGS_MASK) | nf);
+
+                l2e_write_atomic(pl2e, nl2e);
                 v += 1UL << L2_PAGETABLE_SHIFT;
             }
             else
@@ -6055,13 +6085,27 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
         }
         else
         {
+            l1_pgentry_t nl1e;
+
             /* Ordinary 4kB mapping. */
             pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(v);
-            l1e_write_atomic(pl1e, l1e_empty());
+
+            /* Confirm the caller isn't trying to create new mappings. */
+            if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
+                ASSERT(!(nf & _PAGE_PRESENT));
+
+            nl1e = !(nf & _PAGE_PRESENT) ? l1e_empty()
+                : l1e_from_pfn(l1e_get_pfn(*pl1e),
+                               (l1e_get_flags(*pl1e) & ~FLAGS_MASK) | nf);
+
+            l1e_write_atomic(pl1e, nl1e);
             v += PAGE_SIZE;
 
-            /* If we are done with the L2E, check if it is now empty. */
-            if ( (v != e) && (l1_table_offset(v) != 0) )
+            /*
+             * If we are not destroying mappings, or not done with the L2E,
+             * skip the empty&free check.
+             */
+            if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
             pl1e = l2e_to_l1e(*pl2e);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
@@ -6076,8 +6120,12 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
             }
         }
 
-        /* If we are done with the L3E, check if it is now empty. */
-        if ( (v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0) )
+        /*
+         * If we are not destroying mappings, or not done with the L3E,
+         * skip the empty&free check.
+         */
+        if ( (nf & _PAGE_PRESENT) ||
+             ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
         pl2e = l3e_to_l2e(*pl3e);
         for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
@@ -6093,10 +6141,17 @@ void destroy_xen_mappings(unsigned long s, unsigned long e)
     }
 
     flush_area(NULL, FLUSH_TLB_GLOBAL);
+
+#undef FLAGS_MASK
 }
 
 #undef flush_area
 
+void destroy_xen_mappings(unsigned long s, unsigned long e)
+{
+    modify_xen_mappings(s, e, _PAGE_NONE);
+}
+
 void __set_fixmap(
     enum fixed_addresses idx, unsigned long mfn, unsigned long flags)
 {
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 22dc148..5029568 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1220,23 +1220,19 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( !using_2M_mapping() )
     {
         /* Mark .text as RX (avoiding the first 2M superpage). */
-        map_pages_to_xen(XEN_VIRT_START + MB(2),
-                         PFN_DOWN(__pa(XEN_VIRT_START + MB(2))),
-                         PFN_DOWN(__2M_text_end -
-                                  (const char *)(XEN_VIRT_START + MB(2))),
-                         PAGE_HYPERVISOR_RX);
+        modify_xen_mappings(XEN_VIRT_START + MB(2),
+                            (unsigned long)&__2M_text_end,
+                            PAGE_HYPERVISOR_RX);
 
         /* Mark .rodata as RO. */
-        map_pages_to_xen((unsigned long)&__2M_rodata_start,
-                         PFN_DOWN(__pa(__2M_rodata_start)),
-                         PFN_DOWN(__2M_rodata_end - __2M_rodata_start),
-                         PAGE_HYPERVISOR_RO);
+        modify_xen_mappings((unsigned long)&__2M_rodata_start,
+                            (unsigned long)&__2M_rodata_end,
+                            PAGE_HYPERVISOR_RO);
 
         /* Mark .data and .bss as RW. */
-        map_pages_to_xen((unsigned long)&__2M_rwdata_start,
-                         PFN_DOWN(__pa(__2M_rwdata_start)),
-                         PFN_DOWN(__2M_rwdata_end - __2M_rwdata_start),
-                         PAGE_HYPERVISOR_RW);
+        modify_xen_mappings((unsigned long)&__2M_rwdata_start,
+                            (unsigned long)&__2M_rwdata_end,
+                            PAGE_HYPERVISOR_RW);
 
         /* Drop the remaining mappings in the shattered superpage. */
         destroy_xen_mappings((unsigned long)&__2M_rwdata_end,
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index d62394f..d4721fc 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -103,6 +103,8 @@ int map_pages_to_xen(
     unsigned long mfn,
     unsigned long nr_mfns,
     unsigned int flags);
+/* Alter the permissions of a range of Xen virtual address space. */
+void modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags);
 void destroy_xen_mappings(unsigned long v, unsigned long e);
 /*
  * Create only non-leaf page table entries for the
-- 
2.5.0



* [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 09/27] x86/mm: Introduce modify_xen_mappings() Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26 10:05   ` Ross Lagerwall
  2016-04-26 12:37   ` Jan Beulich
  2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
                   ` (17 subsequent siblings)
  27 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add ELF routines and data structures in preparation for loading an
xSplice payload.

We assume that an ELF payload can have at most 64 sections. In the
future we could make this depend on the names of the sections,
verified against a list, but for now this suffices.

We also add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that no offsets point past the end of the file.
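The offset checks reduce to a range test per section plus a format
test for string tables. A minimal sketch with hypothetical helper
names (section_in_bounds, strtab_ok), where hdr_size stands in for
sizeof(Elf_Ehdr); it is not the code in xsplice_elf.c. The range test
is written so that off + size is never computed, so it cannot wrap
around even for a hostile, very large sh_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: true if section data [off, off + size) lies wholly
 * inside a payload of len bytes and does not start inside the ELF
 * header (hdr_size bytes).
 */
static bool section_in_bounds(uint64_t off, uint64_t size,
                              uint64_t len, uint64_t hdr_size)
{
    if ( off < hdr_size || off > len )
        return false;
    return size <= len - off;   /* cannot underflow: off <= len here */
}

/* String table sanity, like elf_verify_strtab(): non-empty, NUL at both ends. */
static bool strtab_ok(const char *data, size_t size)
{
    return size && data[0] == '\0' && data[size - 1] == '\0';
}
```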

For most of the checks we print a message if the hypervisor is built
with debug enabled.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: - With the #define ELFSIZE in the ARM file we can use the common
     #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
    patch.
    - Add checks for ELF file.
    - Add name to be printed.
    - Add len for easier ELF checks.
    - Expand on the checks. Add macro.
v3: Remove the return_ macro
  - Add return_ macro back but make it depend on debug=y
  - Per Andrew review: add local variable. Fix memory leak in
    elf_resolve_sections, Remove macro and use dprintk. Fix alignment.
    Use void* instead of uint8_t to handle raw payload.
v4 - Fix memory leak in elf_get_sym
  - Add XSPLICE to printk/dprintk
v5: Sprinkle newlines.
v6: Squash the ELF header checks from 'xsplice: Implement payload loading' here,
    Do better job at checking string sections and the users of them (sh_size),
    Use XSPLICE as a string literal,
    Move some checks outside the loop,
    Make sure that SHT_STRTAB are really what they say
    Sprinkle consts.
v7:
    Check sh_entsize and sh_offset.
    Added Andrew's Reviewed-by and Ian's Acked-by
    Redo check on sh_entsize to not be !=
v8: Make all the dprintk(XENLOG_DEBUG, ...) be XENLOG_ERR
v9: Changed elf_verify_strtab to use const char and return EINVAL.
    Remove 'if ( !delta )' check in elf_resolve_sections
    Remove stale comments.
    Fixed off-by-one check against sh_link.
    Document boundary checks against shstrtab and symtab.
    Fixed return codes in xsplice_header_check.
    Add check for sections to not be within ELF header.
    Added overflow check for e_shoff in xsplice_header_check.
    Moved XSPLICE macro by four tabs.
    Make ->sym be const.
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 363 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h     |   3 +
 xen/include/xen/xsplice_elf.h |  51 ++++++
 4 files changed, 418 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1e4bc70..afd84b6 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -59,6 +59,7 @@ obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
 obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += xsplice_elf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..2fd48fb
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,363 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name)
+{
+    unsigned int i;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_verify_strtab(const struct xsplice_elf_sec *sec)
+{
+    const Elf_Shdr *s;
+    const char *contents;
+
+    s = sec->sec;
+
+    if ( s->sh_type != SHT_STRTAB )
+        return -EINVAL;
+
+    if ( !s->sh_size )
+        return -EINVAL;
+
+    contents = sec->data;
+
+    if ( contents[0] || contents[s->sh_size - 1] )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+    Elf_Off delta;
+    int rc;
+
+    /* xsplice_elf_load sanity checked e_shnum. */
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for section table!\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    elf->sec = sec;
+
+    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
+    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
+    if ( delta > elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section table is past end of payload!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
+
+        sec[i].sec = data + delta;
+
+        delta = sec[i].sec->sh_offset;
+        /*
+         * N.B. elf_resolve_section_names, elf_get_sym skip this check as
+         * we do it here.
+         */
+        if ( delta < sizeof(Elf_Ehdr) ||
+             (delta + sec[i].sec->sh_size > elf->len) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Section [%u] data %s of payload!\n",
+                    elf->name, i,
+                    delta < sizeof(Elf_Ehdr) ? "at ELF header" : "is past end");
+            return -EINVAL;
+        }
+
+        sec[i].data = data + delta;
+        /* Name is populated in elf_resolve_section_names. */
+        sec[i].name = NULL;
+
+        if ( sec[i].sec->sh_type == SHT_SYMTAB )
+        {
+            if ( elf->symtab )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported multiple symbol tables!\n",
+                        elf->name);
+                return -EOPNOTSUPP;
+            }
+
+            elf->symtab = &sec[i];
+
+            /*
+             * elf->symtab->sec->sh_link should point to the string table,
+             * but we haven't finished parsing all the sections yet.
+             */
+            if ( elf->symtab->sec->sh_link >= elf->hdr->e_shnum )
+            {
+                dprintk(XENLOG_ERR, XSPLICE
+                        "%s: Symbol table idx (%u) to strtab past end (%u)\n",
+                        elf->name, elf->symtab->sec->sh_link,
+                        elf->hdr->e_shnum);
+                return -EINVAL;
+            }
+        }
+    }
+
+    if ( !elf->symtab )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: No symbol table found!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !elf->symtab->sec->sh_size ||
+         elf->symtab->sec->sh_entsize < sizeof(Elf_Sym) ||
+         elf->symtab->sec->sh_size % elf->symtab->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Symbol table header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /*
+     * There can be multiple SHT_STRTAB (.shstrtab, .strtab) so pick the one
+     * associated with the symbol table.
+     */
+    elf->strtab = &sec[elf->symtab->sec->sh_link];
+
+    rc = elf_verify_strtab(elf->strtab);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: String table section is corrupted\n",
+                elf->name);
+    }
+
+    return rc;
+}
+
+static int elf_resolve_section_names(struct xsplice_elf *elf, const void *data)
+{
+    const char *shstrtab;
+    unsigned int i;
+    Elf_Off offset, delta;
+    struct xsplice_elf_sec *sec;
+    int rc;
+
+    /*
+     * The elf->sec[0 -> e_shnum] structures have been verified by
+     * elf_resolve_sections. Find file offset for section string table
+     * (normally called .shstrtab)
+     */
+    sec = &elf->sec[elf->hdr->e_shstrndx];
+
+    rc = elf_verify_strtab(sec);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section string table is corrupted\n",
+                elf->name);
+        return rc;
+    }
+
+    /* Verified in elf_resolve_sections but just in case. */
+    offset = sec->sec->sh_offset;
+    ASSERT(offset < elf->len && (offset + sec->sec->sh_size <= elf->len));
+
+    shstrtab = data + offset;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->sec[i].sec->sh_name;
+
+        /* Boundary check on offset of name within the .shstrtab. */
+        if ( delta >= sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: shstrtab [%u] data is past end of payload!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        elf->sec[i].name = shstrtab + delta;
+    }
+
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, const void *data)
+{
+    const struct xsplice_elf_sec *symtab_sec, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    unsigned int i, delta, offset, nsym;
+
+    symtab_sec = elf->symtab;
+    strtab_sec = elf->strtab;
+
+    /* Pointer arithmetic to get file offset. */
+    offset = strtab_sec->data - data;
+
+    /* Checked already in elf_resolve_sections, but just in case. */
+    ASSERT(offset == strtab_sec->sec->sh_offset);
+    ASSERT(offset < elf->len && (offset + strtab_sec->sec->sh_size <= elf->len));
+
+    /* symtab_sec->data was computed in elf_resolve_sections. */
+    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data);
+
+    /* No need to check values as elf_resolve_sections did it. */
+    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
+    if ( !sym )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for symbols\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    /* So we don't leak memory. */
+    elf->sym = sym;
+
+    for ( i = 1; i < nsym; i++ )
+    {
+        Elf_Sym *s = &((Elf_Sym *)symtab_sec->data)[i];
+
+        delta = s->st_name;
+        /* Boundary check within the .strtab. */
+        if ( delta >= strtab_sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Symbol [%u] data is past end of payload!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        sym[i].sym = s;
+        sym[i].name = data + (delta + offset);
+    }
+    elf->nsym = nsym;
+
+    return 0;
+}
+
+static int xsplice_header_check(const struct xsplice_elf *elf)
+{
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( sizeof(*elf->hdr) > elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: ELF header is bigger than payload!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !IS_ELF(*hdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
+        return -EINVAL;
+    }
+
+    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
+         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
+         hdr->e_type != ET_REL ||
+         hdr->e_phnum != 0 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /* Check that section name index is within the sections. */
+    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
+                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shnum > 64 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
+                elf->name, elf->hdr->e_shnum);
+        return -EOPNOTSUPP;
+    }
+
+    if ( elf->hdr->e_shoff > ULONG_MAX )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Bogus e_shoff!\n", elf->name);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shentsize < sizeof(Elf_Shdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header size is %u! Expected %zu!?\n",
+                elf->name, elf->hdr->e_shentsize, sizeof(Elf_Shdr));
+        return -EINVAL;
+    }
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data)
+{
+    int rc;
+
+    elf->hdr = data;
+
+    rc = xsplice_header_check(elf);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_section_names(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    elf->sec = NULL;
+    xfree(elf->sym);
+    elf->sym = NULL;
+    elf->nsym = 0;
+    elf->name = NULL;
+    elf->len = 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index b9f08cd..7559877 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -10,6 +10,9 @@ struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
 
+/* Convenience define for printk. */
+#define XSPLICE             "xsplice: "
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
 #else
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..686aaf0
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an ELF file as consumed by xSplice. */
+struct xsplice_elf_sec {
+    const Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
+    const char *name;                    /* Human readable name hooked in
+                                            elf_resolve_section_names. */
+    const void *data;                    /* Pointer to the section (done by
+                                            elf_resolve_sections). */
+};
+
+struct xsplice_elf_sym {
+    const Elf_Sym *sym;
+    const char *name;
+};
+
+struct xsplice_elf {
+    const char *name;                    /* Pointer to payload->name. */
+    size_t len;                          /* Length of the ELF file. */
+    const Elf_Ehdr *hdr;                 /* ELF file. */
+    struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
+    struct xsplice_elf_sym *sym;         /* Array of symbols, allocated by us. */
+    unsigned int nsym;
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+};
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26 10:48   ` Ross Lagerwall
  2016-04-26 13:39   ` Jan Beulich
  2016-04-25 15:34 ` [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
                   ` (16 subsequent siblings)
  27 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Jan Beulich,
	Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the ELF file.
- Parse the ELF file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region, splitting them into three
  regions: .text, .data, and .rodata. There MUST be at least a .text region.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that the patch titled "xsplice,symbols: Implement symbol name
  resolution on address" implements that.)
- Perform relocations.
- Secure the regions (.text, .data, .rodata) with the proper permissions.
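The copy-and-split step can be sketched as follows. This is a simplified, hypothetical model of what calc_section and move_payload do, not the patch's code: the `sec_info` struct and the `classify`, `place` and `total_pages` helpers are illustrative names, and 4 KiB pages are assumed. Each SHF_ALLOC section is assigned to a region by its flags and placed at the next aligned offset within that region; .text and .data are then page aligned so that every region starts on its own page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF section flag values. */
#define SHF_WRITE     0x1
#define SHF_ALLOC     0x2
#define SHF_EXECINSTR 0x4

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

enum region { REGION_NONE, REGION_TEXT, REGION_RW, REGION_RO };

/* Hypothetical stand-in for the Elf_Shdr fields that matter here. */
struct sec_info {
    uint64_t flags;
    size_t size;
    size_t align;   /* power of two; 0 or 1 means no alignment */
};

/* Pick the destination region from the section flags. */
static enum region classify(uint64_t flags)
{
    if ( !(flags & SHF_ALLOC) )
        return REGION_NONE;          /* e.g. .comment: not loaded */
    if ( flags & SHF_EXECINSTR )
        return REGION_TEXT;
    return (flags & SHF_WRITE) ? REGION_RW : REGION_RO;
}

static size_t round_up(size_t x, size_t a)
{
    return a > 1 ? (x + a - 1) & ~(a - 1) : x;
}

/*
 * Place the section at the next aligned offset in its region, grow the
 * region, and return the section's offset inside that region.
 */
static size_t place(const struct sec_info *s, size_t *region_size)
{
    size_t off = round_up(*region_size, s->align);

    *region_size = off + s->size;
    return off;
}

/* Text and RW are page aligned, so each region starts on a new page. */
static size_t total_pages(size_t text, size_t rw, size_t ro)
{
    size_t bytes = round_up(text, PAGE_SIZE) + round_up(rw, PAGE_SIZE) + ro;

    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

The ELF specification allows sh_addralign to be 0 or 1 to mean "no alignment constraint"; the sketch treats both values as "no rounding".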

We capitalize on the vmalloc callback API (see the patch titled
"arm/x86/vmap: Add v[z|m]alloc_xen, and vm_init_type") to allocate
a region of memory within [xen_virt_end, XEN_VIRT_END] for the code.

We also use modify_xen_mappings() (from the patch titled "x86/mm: Introduce
modify_xen_mappings()") to change the page-table permissions of the
virtual addresses.
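A sketch of how the three regions map to page ranges and permissions once relocations are done, loosely following secure_payload. The `perm_range` struct and the `plan_permissions` helper are hypothetical and 4 KiB pages are assumed; in the patch, each range is instead handed to arch_xsplice_secure, which calls modify_xen_mappings() on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Permission a region should end up with once relocations are applied. */
enum va_perm { PERM_RX, PERM_RW, PERM_RO };

/* Hypothetical record of one permission change request. */
struct perm_range {
    uintptr_t start;
    size_t pages;
    enum va_perm perm;
};

static size_t pfn_up(size_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/*
 * Compute the page ranges to re-map for a payload laid out as
 * [text][rw][ro] starting at 'base', where text and rw are page aligned.
 * Fills 'out' (at least 3 entries) and returns the number of ranges.
 * Empty rw/ro regions produce no range, as in secure_payload.
 */
static size_t plan_permissions(uintptr_t base, size_t text_size,
                               size_t rw_size, size_t ro_size,
                               struct perm_range *out)
{
    size_t n = 0;
    uintptr_t rw = base + pfn_up(text_size) * PAGE_SIZE;
    uintptr_t ro = rw + pfn_up(rw_size) * PAGE_SIZE;

    out[n++] = (struct perm_range){ base, pfn_up(text_size), PERM_RX };
    if ( rw_size )
        out[n++] = (struct perm_range){ rw, pfn_up(rw_size), PERM_RW };
    if ( ro_size )
        out[n++] = (struct perm_range){ ro, pfn_up(ro_size), PERM_RO };
    return n;
}
```

Because .text and .data are page aligned, the per-region page counts add up to the total allocation, which is the property the ASSERT at the end of secure_payload checks.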

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
    - Add the _return macro back - but only use it during debug builds.
    - Remove the macro, prefix arch_ on arch specific calls.
v4:
    - Move alloc_payload to arch specific file.
    - Use void* instead of uint8_t, use const
    - Add copyrights
    - Unroll the vmap code to add ASSERT. Change while to not incur
      potential long error loop
    - Use vmalloc/vfree cb APIs
    - Secure .text pages to be RX instead of RWX.
v5:
  - Fix allocation of virtual addresses only allowing one page to be allocated.
  - Create .text, .data, and .rodata regions with different permissions.
  - Make the find_space_t not a typedef to pointer to a function.
  - Allocate memory in here.
v6: Drop parentheses on typedefs.
  - s/an xSplice/a xSplice/
  - Rebase on "vmap: Add vmalloc_cb"
  - Rebase on "vmap: Add vmalloc_type and vm_init_type"
  - s/uint8_t/void/ on load_addr
  - Set xsplice_elf on stack without using memset.
v7:
  - Changed the check on delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
    The sections can be right at the back of the file (different linker!), so the failing conditional
    for 'if (delta >= elf->len)' is incorrect and should have been '>'.
  - Changed dprintk(XENLOG_DEBUG to XENLOG_ERR, then back to DEBUG. Converted
    some of the printk to dprintk.
  - Rebase on " arm/x86/vmap: Add vmalloc_xen, vfree_xen and vm_init_type"
  - Changed some of the printk XENLOG_ERR to XENLOG_DEBUG
  - Check the idx in the relocation to make sure it is within bounds and
    implemented.
  - Use "x86/mm: Introduce modify_xen_mappings()"
  - Introduce PRIxElfAddr
  - Check for overflow in R_X86_64_PC32
  - Return -EOPNOTSUPP if we don't support types in ELF64_R_TYPE
v8:
  - Change dprintk and printk XENLOG_DEBUG to XENLOG_ERR
  - Convert four of the printks into dprintk.
v9:
  - Rebase on different spinlock usage in xsplice_upload.
  - Do proper bound and overflow checking.
  - Added 'const' on [text,ro,rw]_addr.
  - Made 'calc_section' and 'move_payload' use a dynamically
    allocated array for computed offsets instead of modifying sh_entsize.
  - Remove arch_xsplice_[alloc_payload|free] and use vzalloc_xen and
    vfree.
  - Collapse for loop in move_payload.
  - Move xsplice.o in Makefile
  - Add more checks in arch_xsplice_perform_rela (r_offset and
     sh_size % sh_entsize)
  - Use int32_t and int64_t in arch_xsplice_perform_rela.
  - Tighten the list of sh_flags we check
  - Use intermediate on 'buf' so that we can do 'const void *'
  - Use intermediate in xsplice_elf_resolve_symbols for 'const' of elf->sym.
  - Fail if (and only if) a SHF_ALLOC and SHT_NOBITS section is seen.
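For reference, a standalone sketch of the R_X86_64_PC32 computation and its overflow check: the value S + A - P (symbol value plus addend minus the address being patched) must fit in a signed 32-bit field. `apply_pc32` is a hypothetical helper, not the patch's arch_xsplice_perform_rela. Unlike the patch, it checks the range before writing rather than writing first and comparing afterwards, and it stores through memcpy so the destination need not be aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Apply an R_X86_64_PC32 relocation (PLT32 can be treated the same way):
 * write the 32-bit value S + A - P at 'dest'. 'place' is P, 'sym_value'
 * is S and 'addend' is A. Returns false, leaving 'dest' untouched, if the
 * result does not fit in a signed 32-bit field.
 */
static bool apply_pc32(uint8_t *dest, uint64_t place, uint64_t sym_value,
                       int64_t addend)
{
    int64_t val = (int64_t)(sym_value + (uint64_t)addend - place);
    int32_t field;

    if ( val < INT32_MIN || val > INT32_MAX )
        return false;

    field = (int32_t)val;
    memcpy(dest, &field, sizeof(field)); /* alignment-safe store */
    return true;
}
```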
---
 xen/arch/arm/Makefile         |   1 +
 xen/arch/arm/xsplice.c        |  45 ++++++++
 xen/arch/x86/Makefile         |   1 +
 xen/arch/x86/xsplice.c        | 165 +++++++++++++++++++++++++++++
 xen/common/xsplice.c          | 234 ++++++++++++++++++++++++++++++++++++++++--
 xen/common/xsplice_elf.c      | 120 +++++++++++++++++++++-
 xen/include/xen/elfstructs.h  |   4 +
 xen/include/xen/xsplice.h     |  24 +++++
 xen/include/xen/xsplice_elf.h |  11 +-
 9 files changed, 595 insertions(+), 10 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 0328b50..eae5cb3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,6 +40,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..c7bcc8e
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,45 @@
+/*
+ *  Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    return -ENOSYS;
+}
+
+void arch_xsplice_init(void)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 729065b..f74fd2c 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -61,6 +61,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..f95463e
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,165 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( hdr->e_machine != EM_X86_64 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported ELF Machine type!\n",
+                elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    dprintk(XENLOG_ERR, XSPLICE "%s: SHT_REL relocation unsupported\n",
+            elf->name);
+    return -EOPNOTSUPP;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    const Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    /* Nothing to do. */
+    if ( !rela->sec->sh_size )
+        return 0;
+
+    if ( !rela->sec->sh_entsize ||
+         rela->sec->sh_entsize < sizeof(Elf_RelA) ||
+         rela->sec->sh_size % rela->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = rela->data + i * rela->sec->sh_entsize;
+
+        symndx = ELF64_R_SYM(r->r_info);
+
+        if ( symndx >= elf->nsym )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
+                    elf->name, symndx);
+            return -EINVAL;
+        }
+
+        if ( r->r_offset > base->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation offset is past %s section!\n",
+                    elf->name, base->name);
+            return -EINVAL;
+        }
+
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+        case R_X86_64_NONE:
+            break;
+
+        case R_X86_64_64:
+            *(uint64_t *)dest = val;
+            break;
+
+        case R_X86_64_PLT32:
+            /*
+             * Xen uses -fpic which normally uses PLT relocations
+             * except that it sets visibility to hidden which means
+             * that they are not used.  However, when gcc cannot
+             * inline memcpy it emits memcpy with default visibility
+             * which then creates a PLT relocation.  It can just be
+             * treated the same as R_X86_64_PC32.
+             */
+        case R_X86_64_PC32:
+            val -= (uint64_t)dest;
+            *(int32_t *)dest = val;
+            if ( (int64_t)val != *(int32_t *)dest )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Overflow in relocation %u in %s for %s!\n",
+                        elf->name, i, rela->name, base->name);
+                return -EOVERFLOW;
+            }
+            break;
+
+        default:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unhandled relocation %lu\n",
+                    elf->name, ELF64_R_TYPE(r->r_info));
+            return -EOPNOTSUPP;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Once resolving symbols, performing relocations, etc. is complete,
+ * we secure the memory by setting the proper page table attributes
+ * for the desired type.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    unsigned long start = (unsigned long)va;
+    unsigned int flag;
+
+    ASSERT(va);
+    ASSERT(pages);
+
+    if ( type == XSPLICE_VA_RX )
+        flag = PAGE_HYPERVISOR_RX;
+    else if ( type == XSPLICE_VA_RW )
+        flag = PAGE_HYPERVISOR_RW;
+    else
+        flag = PAGE_HYPERVISOR_RO;
+
+    modify_xen_mappings(start, start + pages * PAGE_SIZE, flag);
+
+    return 0;
+}
+
+void arch_xsplice_init(void)
+{
+    void *start, *end;
+
+    start = (void *)xen_virt_end;
+    end = (void *)(XEN_VIRT_END - NR_CPUS * PAGE_SIZE);
+
+    BUG_ON(end <= start);
+
+    vm_init_type(VMAP_XEN, start, end);
+}
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 4878a57..fd33a53 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -29,6 +30,13 @@ struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    const void *text_addr;               /* Virtual address of .text. */
+    size_t text_size;                    /* .. and its size. */
+    const void *rw_addr;                 /* Virtual address of .data. */
+    size_t rw_size;                      /* .. and its size (if any). */
+    const void *ro_addr;                 /* Virtual address of .rodata. */
+    size_t ro_size;                      /* .. and its size (if any). */
+    unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -86,19 +94,220 @@ static struct payload *find_payload(const char *name)
     return found;
 }
 
+/*
+ * Functions related to XEN_SYSCTL_XSPLICE_UPLOAD (see xsplice_upload), and
+ * freeing payload (XEN_SYSCTL_XSPLICE_ACTION:XSPLICE_ACTION_UNLOAD).
+ */
+
+static void free_payload_data(struct payload *payload)
+{
+    /* Set to zero until "move_payload". */
+    if ( !payload->text_addr )
+        return;
+
+    vfree((void *)payload->text_addr);
+
+    payload->pages = 0;
+}
+
+/*
+ * calc_section grows *size by the section's size, taking its alignment
+ * into account. *offset is set to where the section starts relative to
+ * the beginning of its region (the passed-in *size, rounded up to the
+ * section alignment). move_payload uses this offset to compute the
+ * destination location (load_addr).
+ */
+static void calc_section(const struct xsplice_elf_sec *sec, size_t *size,
+                         unsigned int *offset)
+{
+    const Elf_Shdr *s = sec->sec;
+    size_t align_size;
+
+    align_size = ROUNDUP(*size, s->sh_addralign);
+    *offset = align_size;
+    *size = s->sh_size + align_size;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    uint8_t *text_buf, *ro_buf, *rw_buf;
+    unsigned int i;
+    size_t size = 0;
+    unsigned int *offset;
+    int rc = 0;
+
+    offset = xzalloc_array(unsigned int, elf->hdr->e_shnum);
+    if ( !offset )
+        return -ENOMEM;
+
+    /* Compute size of different regions. */
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
+             (SHF_ALLOC|SHF_EXECINSTR) )
+            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
+        else if ( !elf->sec[i].sec->sh_flags ||
+                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
+                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
+            /* Do nothing.*/;
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
+                    elf->name, elf->sec[i].name);
+            rc = -EOPNOTSUPP;
+            goto out;
+        }
+        else /* Such as .comment. */
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
+                    elf->name, elf->sec[i].name);
+    }
+
+    /*
+     * Total of all three regions - RX, RW, and RO. We have to keep them
+     * in separate pages, so we PAGE_ALIGN the RX and RW regions to put
+     * them on separate pages. The last one will by default fall on its
+     * own page.
+     */
+    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
+           payload->ro_size;
+
+    size = PFN_UP(size); /* Nr of pages. */
+    text_buf = vzalloc_xen(size * PAGE_SIZE);
+    if ( !text_buf )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
+                elf->name);
+        rc = -ENOMEM;
+        goto out;
+    }
+    rw_buf = text_buf + PAGE_ALIGN(payload->text_size);
+    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
+
+    payload->pages = size;
+    payload->text_addr = text_buf;
+    payload->rw_addr = rw_buf;
+    payload->ro_addr = ro_buf;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            uint8_t *buf;
+            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
+                buf = text_buf;
+            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+                buf = rw_buf;
+            else
+                buf = ro_buf;
+
+            elf->sec[i].load_addr = buf + offset[i];
+
+            /*
+             * Don't copy NOBITS - such as BSS. We don't memset BSS as
+             * vzalloc_xen has zeroed it out for us.
+             */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Loaded %s at %p\n",
+                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+        }
+    }
+
+ out:
+    xfree(offset);
+
+    return rc;
+}
+
+static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    int rc;
+    unsigned int text_pages, rw_pages, ro_pages;
+
+    text_pages = PFN_UP(payload->text_size);
+    ASSERT(text_pages);
+
+    rc = arch_xsplice_secure(payload->text_addr, text_pages, XSPLICE_VA_RX);
+    if ( rc )
+        return rc;
+
+    rw_pages = PFN_UP(payload->rw_size);
+    if ( rw_pages )
+    {
+        rc = arch_xsplice_secure(payload->rw_addr, rw_pages, XSPLICE_VA_RW);
+        if ( rc )
+            return rc;
+    }
+
+    ro_pages = PFN_UP(payload->ro_size);
+    if ( ro_pages )
+        rc = arch_xsplice_secure(payload->ro_addr, ro_pages, XSPLICE_VA_RO);
+
+    ASSERT(ro_pages + rw_pages + text_pages == payload->pages);
+
+    return rc;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
+static int load_payload_data(struct payload *payload, void *raw, size_t len)
+{
+    struct xsplice_elf elf = { .name = payload->name, .len = len };
+    int rc = 0;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto out;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
+
+ out:
+    if ( rc )
+        free_payload_data(payload);
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 {
     struct payload *data, *found;
     char n[XEN_XSPLICE_NAME_SIZE];
+    void *raw_data;
     int rc;
 
     rc = verify_payload(upload, n);
@@ -106,6 +315,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         return rc;
 
     data = xzalloc(struct payload);
+    raw_data = vmalloc(upload->size);
 
     spin_lock(&payload_lock);
 
@@ -121,15 +331,20 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         goto out;
     }
 
-    if ( !data )
-    {
-        rc = -ENOMEM;
+    rc = -ENOMEM;
+    if ( !data || !raw_data )
         goto out;
-    }
 
-    rc = 0;
+    rc = -EFAULT;
+    if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
+        goto out;
 
     memcpy(data->name, n, strlen(n));
+
+    rc = load_payload_data(data, raw_data, upload->size);
+    if ( rc )
+        goto out;
+
     data->state = XSPLICE_STATE_CHECKED;
     INIT_LIST_HEAD(&data->list);
 
@@ -140,6 +355,8 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
  out:
     spin_unlock(&payload_lock);
 
+    vfree(raw_data);
+
     if ( rc )
         xfree(data);
 
@@ -383,8 +600,9 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
+               data->name, state2str(data->state), data->state, data->text_addr,
+               data->rw_addr, data->ro_addr, data->pages);
 
     spin_unlock(&payload_lock);
 }
@@ -392,6 +610,8 @@ static void xsplice_printall(unsigned char key)
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+
+    arch_xsplice_init();
     return 0;
 }
 __initcall(xsplice_init);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index 2fd48fb..8501138 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -103,7 +103,7 @@ static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
             }
 
             elf->symtab = &sec[i];
-
+            elf->symtab_idx = i;
             /*
              * elf->symtab->sec->sh_link would point to the right section
              * but we hadn't finished parsing all the sections.
@@ -252,9 +252,123 @@ static int elf_get_sym(struct xsplice_elf *elf, const void *data)
     return 0;
 }
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        unsigned int idx = elf->sym[i].sym->st_shndx;
+        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;
+
+        switch ( idx )
+        {
+        case SHN_COMMON:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -EINVAL;
+            break;
+
+        case SHN_UNDEF:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -ENOENT;
+            break;
+
+        case SHN_ABS:
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
+                    elf->name, elf->sym[i].name, sym->st_value);
+            break;
+
+        default:
+            if ( idx < elf->hdr->e_shnum &&
+                 !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) )
+                break;
+
+            /* SHN_COMMON and SHN_ABS are above. */
+            if ( idx >= SHN_LORESERVE )
+                rc = -EOPNOTSUPP;
+            else if ( idx >= elf->hdr->e_shnum )
+                rc = -EINVAL;
+
+            if ( rc )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Unknown type=%#"PRIx16"\n",
+                        elf->name, idx);
+                break;
+            }
+
+            sym->st_value += (unsigned long)elf->sec[idx].load_addr;
+            if ( elf->sym[i].name )
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Symbol resolved: %s => %#"PRIxElfAddr"(%s)\n",
+                        elf->name, elf->sym[i].name,
+                        sym->st_value, elf->sec[idx].name);
+        }
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *r, *base;
+    unsigned int i;
+    int rc = 0;
+
+    /*
+     * The first entry of an ELF symbol table is the "undefined symbol index".
+     * aka reserved so we skip it.
+     */
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        r = &elf->sec[i];
+
+        if ( (r->sec->sh_type != SHT_RELA) &&
+             (r->sec->sh_type != SHT_REL) )
+            continue;
+
+        /* Is it a valid relocation section? */
+        if ( r->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[r->sec->sh_info];
+
+        /* Don't relocate non-allocated sections. */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( r->sec->sh_link != elf->symtab_idx )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative link of %s is incorrect (%u, expected=%u)\n",
+                    elf->name, r->name, r->sec->sh_link, elf->symtab_idx);
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( r->sec->sh_type == SHT_RELA )
+            rc = arch_xsplice_perform_rela(elf, base, r);
+        else /* SHT_REL */
+            rc = arch_xsplice_perform_rel(elf, base, r);
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
 static int xsplice_header_check(const struct xsplice_elf *elf)
 {
     const Elf_Ehdr *hdr = elf->hdr;
+    int rc;
 
     if ( sizeof(*elf->hdr) > elf->len )
     {
@@ -279,6 +393,10 @@ static int xsplice_header_check(const struct xsplice_elf *elf)
         return -EOPNOTSUPP;
     }
 
+    rc = arch_xsplice_verify_elf(elf);
+    if ( rc )
+        return rc;
+
     if ( elf->hdr->e_shstrndx == SHN_UNDEF )
     {
         dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 85f35ed..2b9bd3f 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -472,6 +472,8 @@ typedef struct {
 #endif
 
 #if defined(ELFSIZE) && (ELFSIZE == 32)
+#define PRIxElfAddr	"08x"
+
 #define Elf_Ehdr	Elf32_Ehdr
 #define Elf_Phdr	Elf32_Phdr
 #define Elf_Shdr	Elf32_Shdr
@@ -497,6 +499,8 @@ typedef struct {
 
 #define AuxInfo		Aux32Info
 #elif defined(ELFSIZE) && (ELFSIZE == 64)
+#define PRIxElfAddr	PRIx64
+
 #define Elf_Ehdr	Elf64_Ehdr
 #define Elf_Phdr	Elf64_Phdr
 #define Elf_Shdr	Elf64_Shdr
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 7559877..857c264 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,6 +6,9 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
@@ -15,6 +18,27 @@ struct xen_sysctl_xsplice_op;
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
+/* Arch hooks. */
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela);
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela);
+enum va_type {
+    XSPLICE_VA_RX, /* .text */
+    XSPLICE_VA_RW, /* .data */
+    XSPLICE_VA_RO, /* .rodata */
+};
+
+/*
+ * Function to secure the allocated pages (from arch_xsplice_alloc_payload)
+ * with the right page permissions.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types);
+
+void arch_xsplice_init(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index 686aaf0..750dc94 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -15,6 +15,8 @@ struct xsplice_elf_sec {
                                             elf_resolve_section_names. */
     const void *data;                    /* Pointer to the section (done by
                                             elf_resolve_sections). */
+    void *load_addr;                     /* A pointer to the allocated destination.
+                                            Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -29,8 +31,10 @@ struct xsplice_elf {
     struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
     struct xsplice_elf_sym *sym;         /* Array of symbols , allocated by us. */
     unsigned int nsym;
-    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
-    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to
+                                            sec[symtab_idx]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section. */
+    unsigned int symtab_idx;
 };
 
 const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
@@ -38,6 +42,9 @@ const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *
 int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
 
 /*
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
@ 2016-04-25 15:34 ` Konrad Rzeszutek Wilk
  2016-04-26 15:21   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:34 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser,
	Suravee Suthikulpanit, Konrad Rzeszutek Wilk, Jun Nakajima,
	Julien Grall, Jan Beulich, Boris Ostrovsky

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in reset_stack_and_jump
to check for work and execute it if needed (specifically, we check a
per-cpu flag to make this check as quick as possible).

In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to run check_for_xsplice_work() becomes the
master and IPIs the other CPUs to raise a reschedule softirq, so that
they too enter check_for_xsplice_work() with no stack.  Once all CPUs
have rendezvoused, all CPUs disable their IRQs and NMIs are ignored.
The system is then quiescent and the master performs the action.
After this, all CPUs re-enable IRQs and NMIs are re-enabled.
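The master election can be illustrated with a small user-space sketch. It is hedged and single-threaded, not the hypervisor code: `rendezvous_arrive()` and `rendezvous_complete()` are hypothetical names, and C11 `atomic_int` stands in for Xen's `atomic_t` and `atomic_inc_and_test()`. As in the patch, the semaphore starts at -1, so only the CPU whose increment brings it to 0 becomes the master; each later arrival pushes it towards `num_online_cpus() - 1`, the value the master waits for.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for xsplice_work.semaphore; starts at -1. */
static atomic_int semaphore = -1;

/*
 * Called by each CPU that notices pending work. Mirrors
 * atomic_inc_and_test(): returns true only for the caller whose
 * increment takes the counter from -1 to 0, i.e. the master.
 */
static bool rendezvous_arrive(atomic_int *sem)
{
    /* atomic_fetch_add() returns the value *before* the increment. */
    return atomic_fetch_add(sem, 1) + 1 == 0;
}

/*
 * The master's exit condition: every other CPU has checked in once the
 * counter equals (online CPUs - 1), because the master itself only
 * moved it from -1 to 0.
 */
static bool rendezvous_complete(atomic_int *sem, int online_cpus)
{
    return atomic_load(sem) == online_cpus - 1;
}
```

With four online CPUs, the first caller of `rendezvous_arrive()` is the master, and `rendezvous_complete()` becomes true only after the remaining three have arrived.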

Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on the NMI/MCE path is liable to end in disaster on x86.
This is not addressed in this patch and is mentioned in the
design doc as a further TODO.

The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
  number of bytes of the old function and replace them with a jump to the
  new function (on x86 it is 5 bytes; on ARM it will likely be 4 bytes).
- REVERT: Copy the previously stored bytes into the first arch-specific
  number of bytes of the old function (again, 5 bytes on x86).
- REPLACE: Revert each applied module (most recently applied first) and
  then apply the new module.
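For x86, the APPLY/REVERT pair can be sketched against an ordinary byte buffer. This is an illustration only: `encode_rel_jmp()`, `apply_jmp()` and `revert_jmp()` are hypothetical names, and addresses are plain integers rather than pointers into Xen's text. As in the patch, the 5-byte instruction is opcode 0xE9 followed by a rel32 displacement equal to `new - old - 5`, because the displacement is relative to the end of the jump. Here the displacement bytes are written explicitly in little-endian order instead of being copied with `memcpy()` as the patch does. The original bytes are saved so that REVERT can restore them.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5

/*
 * Encode "jmp rel32" located at old_addr and targeting new_addr.
 * Assumes the target is within +/-2 GiB; this sketch does not check it.
 */
static void encode_rel_jmp(uint8_t insn[PATCH_INSN_SIZE],
                           uint64_t old_addr, uint64_t new_addr)
{
    /* Unsigned truncation gives the two's-complement rel32 bit pattern. */
    uint32_t rel = (uint32_t)(new_addr - old_addr - PATCH_INSN_SIZE);

    insn[0] = 0xe9;                      /* JMP rel32 opcode */
    insn[1] = (uint8_t)(rel & 0xff);     /* little-endian displacement */
    insn[2] = (uint8_t)((rel >> 8) & 0xff);
    insn[3] = (uint8_t)((rel >> 16) & 0xff);
    insn[4] = (uint8_t)((rel >> 24) & 0xff);
}

/* APPLY: save the first PATCH_INSN_SIZE bytes, then overwrite them. */
static void apply_jmp(uint8_t *old_code, uint64_t old_addr, uint64_t new_addr,
                      uint8_t saved[PATCH_INSN_SIZE])
{
    memcpy(saved, old_code, PATCH_INSN_SIZE);
    encode_rel_jmp(old_code, old_addr, new_addr);
}

/* REVERT: restore the bytes saved by apply_jmp(). */
static void revert_jmp(uint8_t *old_code, const uint8_t saved[PATCH_INSN_SIZE])
{
    memcpy(old_code, saved, PATCH_INSN_SIZE);
}
```

For example, a jump placed at 0x1000 that targets 0x2000 encodes as `e9 fb 0f 00 00`, since 0x2000 - 0x1000 - 5 = 0xffb.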

To prevent a deadlock with any other barrier in the system, the master
will by default wait for up to 30ms before timing out.
Measurements found that patch application takes about 100 μs on a
72-CPU system, whether idle or fully loaded.
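The bounded wait can be sketched as a deadline loop in the spirit of xsplice_spin(). Assumptions: this is user-space C11, `spin_until()` and `now_ns()` are hypothetical names, and `timespec_get(..., TIME_UTC)` stands in for Xen's `NOW()` (unlike `NOW()`, it is wall-clock rather than monotonic time). The point is that the master gives up with -EBUSY instead of waiting forever, so a CPU held in some other barrier cannot deadlock the system.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/*
 * Current time in nanoseconds. Stand-in for Xen's NOW(); note that
 * TIME_UTC is wall-clock time, whereas NOW() is monotonic system time.
 */
static int64_t now_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Spin until *counter equals 'target' or 'timeout_ns' nanoseconds pass.
 * Returns 0 if every CPU checked in, -EBUSY on timeout.
 */
static int spin_until(atomic_int *counter, int target, int64_t timeout_ns)
{
    int64_t deadline = now_ns() + timeout_ns;

    while ( atomic_load(counter) != target && now_ns() < deadline )
        ; /* The hypervisor would call cpu_relax() here. */

    /* Decide on the final value, as xsplice_spin() does after its loop. */
    return atomic_load(counter) == target ? 0 : -EBUSY;
}
```

To match the patch's default, a caller would pass 30 * 1000000 ns as the timeout.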

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

--
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch
    - Remove const and use ELF types for xsplice_patch_func
    - Add check routine to do simple sanity checks for various
      sections.
    - s/%p/PRIx64/ as ARM builds complain.
    - Move code around. Add more dprintk. Add XSPLICE in front of all
      printks/dprintk.
      Put the NMIs back if we fail patching.
      Add per-cpu to lessen contention for global structure.
      Extract from xsplice_do_single patching code into xsplice_do_action
      Squash xsplice_do_single and check_for_xsplice_work together to
      have all rendezvous in one place.
      Made XSPLICE_ACTION_REPLACE work again (wrong list iterator)
      s/find_special_sections/prepare_payload/
      Use list_del_init and INIT_LIST_HEAD for applied_list
v4:
   - Add comment, adjust spacing for "Timed out on CPU semaphore"
   - Added CR0.WP manipulations when altering the .text of hypervisor.
   - Added fix from Andrew for CR0.WP manipulation.
v5: - Made xsplice_patch_func use uintXX_t instead of ELF_ types to ease
      making it work under ARM (32bit). Add more BUILD_BUG_ON checks.
    - Add more BUILD_BUG_ON checks. Sprinkle newlines.
v6: Rebase on "arm/x86: Alter nmi_callback_t typedef"
   - Drop the recursive spinlock usage.
   - Move NMI callbacks in arch specific.
   - Fold the 'check_for_xsplice_work' in reset_stack_and_jump
   - Add arch specific check for .xsplice.funcs.
   - Separate external and internal structure of .xsplice.funcs.
   - Changed per Jan's review
   - Modified the .xsplice.funcs checks
v7:
   - Modified old_ptr to void* instead of uint8_t*
   - Modified the xsplice_patch_func_internal for ARM32 to have padding.
   - Used #if BITS_PER_LONG == 64 for the xsplice_patch_func_internal along
     with ifndef CONFIG_ARM for the undo (which may be different size on ARM64)
v8:
  - Add "is empty" if special sections are in fact empty.
  - Added Andrew's Reviewed-by:
  - Rebase on v7.2 of  x86/mm: Introduce modify_xen_mappings()
  - Change some of printk to dprintk and some of the dprintk to printk.
  - Make the xsplice_patch_func (and the internal) structure have uint32_t
    (instead of uint64_t) if BITS_PER_LONG==32. This makes the size and
    offset different so note that in the design and common code.
  - Add #undef ACTION
  - Guard struct xsplice_patch_func in sysctl.h with __XEN__ as toolstacks
    will fail to compile. We do have BITS_PER_LONG defined in xc_bitops.h but
    that will go away (and also that macro uses sizeof and the pre-processor
    will choke on that).
  - Dropped Julien's Acked as I replaced BITS_PER_LONG/CONFIG_ARM_32.
    (Stefano is OK with it, but would prefer BITS_PER_LONG, Jan does not want
    BITS_PER_LONG).
v9: Expose the struct xsplice_patch_func old_addr and new_addr as void
    instead of uint32_t or uint64_t.
  - Added Julien's Ack back.
  - Rename pad to opaque.
  - Added comment in idle_loop.
  - Squash internal and public of 'xsplice_patch_func'
  - Fixed remaining sizeof use.
  - Removed reference to MCE
  - Fixed comment styles.
  - Use bool_t in check_special_sections
  - Add a #define for .xsplice.funcs.
  - Remove full stops from printk
  - Fix xsplice_do_action per Jan's punchlist
  - Use spin_lock_try in keyhandler
  - Remove leading underscores from __CHECK_FOR_XSPLICE_WORK
  - Don't fail compilation on GCC5 - we MUST have rc set.
  - Don't bail out if finding !sh_type as those are for .rela or .debug
    and while we don't need to allocate it (as we had already done
    the relocation), do continue.
  - Make applied_list be an RCU type to guard against infinite loops
    when searching the applied_list.
  - Dropped the irq_semaphore and re-use the semaphore atomic once
    CPUs have rendezvoused and are ready to enter the IRQ-disabled phase.
---
 xen/arch/arm/xsplice.c        |  33 +++
 xen/arch/x86/domain.c         |   6 +
 xen/arch/x86/xsplice.c        |  76 +++++++
 xen/common/xsplice.c          | 469 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/current.h |  10 +-
 xen/include/public/sysctl.h   |  20 ++
 xen/include/xen/xsplice.h     |  21 ++
 7 files changed, 623 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
index c7bcc8e..d5735c6 100644
--- a/xen/arch/arm/xsplice.c
+++ b/xen/arch/arm/xsplice.c
@@ -6,6 +6,39 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+void arch_xsplice_patching_enter(void)
+{
+}
+
+void arch_xsplice_patching_leave(void)
+{
+}
+
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func)
+{
+    return -ENOSYS;
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_post_action(void)
+{
+}
+
+void arch_xsplice_mask(void)
+{
+}
+
+void arch_xsplice_unmask(void)
+{
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
 {
     return -ENOSYS;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e93ff20..d13b272 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -36,6 +36,7 @@
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/guest_access.h>
+#include <xen/xsplice.h>
 #include <public/sysctl.h>
 #include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
@@ -120,6 +121,11 @@ static void idle_loop(void)
         (*pm_idle)();
         do_tasklet();
         do_softirq();
+        /*
+         * We MUST be last (or before pm_idle). Otherwise after we get the
+         * softirq we would execute pm_idle (and sleep) and not patch.
+         */
+        check_for_xsplice_work();
     }
 }
 
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
index f95463e..493fefe 100644
--- a/xen/arch/x86/xsplice.c
+++ b/xen/arch/x86/xsplice.c
@@ -10,6 +10,82 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+#include <asm/nmi.h>
+
+#define PATCH_INSN_SIZE 5
+
+void arch_xsplice_patching_enter(void)
+{
+    /* Disable WP to allow changes to read-only pages. */
+    write_cr0(read_cr0() & ~X86_CR0_WP);
+}
+
+void arch_xsplice_patching_leave(void)
+{
+    /* Reinstate WP. */
+    write_cr0(read_cr0() | X86_CR0_WP);
+}
+
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func)
+{
+    /* No NOP patching yet. */
+    if ( !func->new_size )
+        return -EOPNOTSUPP;
+
+    if ( func->old_size < PATCH_INSN_SIZE )
+        return -EINVAL;
+
+    return 0;
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+    int32_t val;
+    uint8_t *old_ptr;
+
+    BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->opaque));
+    BUILD_BUG_ON(PATCH_INSN_SIZE != (1 + sizeof(val)));
+
+    old_ptr = func->old_addr;
+    memcpy(func->opaque, old_ptr, PATCH_INSN_SIZE);
+
+    *old_ptr++ = 0xe9; /* Relative jump */
+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
+    memcpy(old_ptr, &val, sizeof(val));
+}
+
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func)
+{
+    memcpy(func->old_addr, func->opaque, PATCH_INSN_SIZE);
+}
+
+/* Serialise the CPU pipeline. */
+void arch_xsplice_post_action(void)
+{
+    cpuid_eax(0);
+}
+
+static nmi_callback_t *saved_nmi_callback;
+/*
+ * Note that because of this NOP code the do_nmi is not safely patchable.
+ * Also if we do receive 'real' NMIs we have lost them.
+ */
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    /* TODO: Handle missing NMI/MCE.*/
+    return 1;
+}
+
+void arch_xsplice_mask(void)
+{
+    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+}
+
+void arch_xsplice_unmask(void)
+{
+    set_nmi_callback(saved_nmi_callback);
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
 {
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index fd33a53..efb396a 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -3,6 +3,7 @@
  *
  */
 
+#include <xen/cpu.h>
 #include <xen/err.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
@@ -11,18 +12,30 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/smp.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
-#include <public/sysctl.h>
 
-/* Protects against payload_list operations. */
+/*
+ * Protects against payload_list operations and also allows only one
+ * caller in schedule_work.
+ */
 static DEFINE_SPINLOCK(payload_lock);
 static LIST_HEAD(payload_list);
 
+/*
+ * Patches which have been applied. Need RCU in case we crash (and then
+ * traps code would iterate via applied_list) when adding entries on the list.
+ *
+ * Note: There are no 'rcu_applied_lock' as we don't iterate yet the list.
+ */
+static LIST_HEAD(applied_list);
+
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
@@ -37,9 +50,35 @@ struct payload {
     const void *ro_addr;                 /* Virtual address of .rodata. */
     size_t ro_size;                      /* .. and its size (if any). */
     unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
+    struct list_head applied_list;       /* Linked to 'applied_list'. */
+    struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
+    unsigned int nfuncs;                 /* Nr of functions to patch. */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
+/* Defines an outstanding patching action. */
+struct xsplice_work
+{
+    atomic_t semaphore;          /* Used to rendezvous CPUs in
+                                    check_for_xsplice_work. */
+    uint32_t timeout;            /* Timeout to do the operation. */
+    struct payload *data;        /* The payload on which to act. */
+    volatile bool_t do_work;     /* Signals work to do. */
+    volatile bool_t ready;       /* Signals all CPUs synchronized. */
+    unsigned int cmd;            /* Action request: XSPLICE_ACTION_* */
+};
+
+/* There can be only one outstanding patching action. */
+static struct xsplice_work xsplice_work;
+
+/*
+ * Indicate whether the CPU needs to consult the xsplice_work structure.
+ * We want a per-cpu data structure, otherwise check_for_xsplice_work
+ * would hammer the global xsplice_work structure on every guest VMEXIT.
+ * Having a per-cpu flag lessens the load.
+ */
+static DEFINE_PER_CPU(bool_t, work_to_do);
+
 static int get_name(const xen_xsplice_name_t *name, char *n)
 {
     if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
@@ -260,6 +299,88 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
     return rc;
 }
 
+static int check_special_sections(const struct xsplice_elf *elf)
+{
+    unsigned int i;
+    static const char *const names[] = { ELF_XSPLICE_FUNC };
+    bool_t count[ARRAY_SIZE(names)] = { 0 };
+
+    for ( i = 0; i < ARRAY_SIZE(names); i++ )
+    {
+        const struct xsplice_elf_sec *sec;
+
+        sec = xsplice_elf_sec_by_name(elf, names[i]);
+        if ( !sec )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s is missing!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+
+        if ( !sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s is empty!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+        if ( ++count[i] > 1 )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s was seen more than once!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
+static int prepare_payload(struct payload *payload,
+                           struct xsplice_elf *elf)
+{
+    const struct xsplice_elf_sec *sec;
+    unsigned int i;
+    struct xsplice_patch_func *f;
+
+    sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
+    ASSERT(sec);
+    if ( sec->sec->sh_size % sizeof(*payload->funcs) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Wrong size of "ELF_XSPLICE_FUNC"!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    payload->funcs = sec->load_addr;
+    payload->nfuncs = sec->sec->sh_size / sizeof(*payload->funcs);
+
+    for ( i = 0; i < payload->nfuncs; i++ )
+    {
+        int rc;
+
+        f = &(payload->funcs[i]);
+
+        if ( f->version != XSPLICE_PAYLOAD_VERSION )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Wrong version (%u). Expected %d!\n",
+                    elf->name, f->version, XSPLICE_PAYLOAD_VERSION);
+            return -EOPNOTSUPP;
+        }
+
+        if ( !f->new_addr || !f->new_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Address or size fields are zero!\n",
+                    elf->name);
+            return -EINVAL;
+        }
+
+        rc = arch_xsplice_verify_func(f);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
@@ -291,6 +412,14 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
     if ( rc )
         goto out;
 
+    rc = check_special_sections(&elf);
+    if ( rc )
+        goto out;
+
+    rc = prepare_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
     rc = secure_payload(payload, &elf);
 
  out:
@@ -347,6 +476,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 
     data->state = XSPLICE_STATE_CHECKED;
     INIT_LIST_HEAD(&data->list);
+    INIT_LIST_HEAD(&data->applied_list);
 
     list_add_tail(&data->list, &payload_list);
     payload_cnt++;
@@ -457,6 +587,305 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list)
     return rc ? : idx;
 }
 
+/*
+ * The following functions get the CPUs into an appropriate state and
+ * apply (or revert) each of the payload's functions. This is needed
+ * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action).
+ */
+
+static int apply_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n",
+            data->name, data->nfuncs);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_apply_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    list_add_tail_rcu(&data->applied_list, &applied_list);
+
+    return 0;
+}
+
+static int revert_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_INFO XSPLICE "%s: Reverting\n", data->name);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_revert_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    list_del_rcu(&data->applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed with all other CPUs having no deep stack (we may
+ * have cpu_idle on it) and with IRQs disabled.
+ */
+static void xsplice_do_action(void)
+{
+    int rc;
+    struct payload *data, *other, *tmp;
+
+    data = xsplice_work.data;
+    /*
+     * This function and the transition from asm to C code should be the only
+     * one on any stack. No need to lock the payload list or applied list.
+     */
+    switch ( xsplice_work.cmd )
+    {
+    case XSPLICE_ACTION_APPLY:
+        rc = apply_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_APPLIED;
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        rc = revert_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_CHECKED;
+        break;
+
+    case XSPLICE_ACTION_REPLACE:
+        rc = 0;
+        /*
+         * N.B: Use 'applied_list' member, not 'list'. We also abuse the
+         * 'normal' list iterator as the list is an RCU one.
+         */
+        list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list )
+        {
+            other->rc = revert_payload(other);
+            if ( other->rc == 0 )
+                other->state = XSPLICE_STATE_CHECKED;
+            else
+            {
+                rc = -EINVAL;
+                break;
+            }
+        }
+
+        if ( rc == 0 )
+        {
+            rc = apply_payload(data);
+            if ( rc == 0 )
+                data->state = XSPLICE_STATE_APPLIED;
+        }
+        break;
+
+    default:
+        rc = -EINVAL; /* Make GCC5 happy. */
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    /* We must set rc as xsplice_action sets it to -EAGAIN when kicking off. */
+    data->rc = rc;
+}
+
+static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
+{
+    ASSERT(spin_is_locked(&payload_lock));
+
+    /* Fail if an operation is already scheduled. */
+    if ( xsplice_work.do_work )
+        return -EBUSY;
+
+    if ( !get_cpu_maps() )
+    {
+        printk(XENLOG_ERR XSPLICE "%s: unable to get cpu_maps lock!\n",
+               data->name);
+        return -EBUSY;
+    }
+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    xsplice_work.timeout = timeout ?: MILLISECS(30);
+
+    dprintk(XENLOG_DEBUG, XSPLICE "%s: timeout is %"PRI_stime"ms\n",
+            data->name, xsplice_work.timeout / MILLISECS(1));
+
+    atomic_set(&xsplice_work.semaphore, -1);
+
+    xsplice_work.ready = 0;
+
+    smp_wmb();
+
+    xsplice_work.do_work = 1;
+    this_cpu(work_to_do) = 1;
+
+    put_cpu_maps();
+
+    return 0;
+}
+
+static void reschedule_fn(void *unused)
+{
+    this_cpu(work_to_do) = 1;
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+static int xsplice_spin(atomic_t *counter, s_time_t timeout,
+                        unsigned int cpus, const char *s)
+{
+    int rc = 0;
+
+    while ( atomic_read(counter) != cpus && NOW() < timeout )
+        cpu_relax();
+
+    /* Log & abort. */
+    if ( atomic_read(counter) != cpus )
+    {
+        printk(XENLOG_ERR XSPLICE "%s: Timed out on semaphore in %s quiesce phase %u/%u\n",
+               xsplice_work.data->name, s, atomic_read(counter), cpus);
+        rc = -EBUSY;
+        xsplice_work.data->rc = rc;
+        smp_wmb();
+        xsplice_work.do_work = 0;
+    }
+
+    return rc;
+}
+
+/*
+ * The main function which manages the work of quiescing the system and
+ * patching code.
+ */
+void check_for_xsplice_work(void)
+{
+#define ACTION(x) [XSPLICE_ACTION_##x] = #x
+    static const char *const names[] = {
+            ACTION(APPLY),
+            ACTION(REVERT),
+            ACTION(REPLACE),
+    };
+#undef ACTION
+    unsigned int cpu = smp_processor_id();
+    s_time_t timeout;
+    unsigned long flags;
+
+    /* Fast path: no work to do. */
+    if ( !per_cpu(work_to_do, cpu) )
+        return;
+
+    smp_rmb();
+    /* In case we aborted, other CPUs can skip right away. */
+    if ( !xsplice_work.do_work )
+    {
+        per_cpu(work_to_do, cpu) = 0;
+        return;
+    }
+
+    ASSERT(local_irq_is_enabled());
+
+    /* Set at -1, so will go up to num_online_cpus - 1. */
+    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
+    {
+        struct payload *p;
+        unsigned int cpus;
+
+        p = xsplice_work.data;
+        if ( !get_cpu_maps() )
+        {
+            printk(XENLOG_ERR XSPLICE "%s: CPU%u - unable to get cpu_maps lock!\n",
+                   p->name, cpu);
+            per_cpu(work_to_do, cpu) = 0;
+            xsplice_work.data->rc = -EBUSY;
+            smp_wmb();
+            xsplice_work.do_work = 0;
+            /*
+             * Do NOT decrement xsplice_work.semaphore - that may cause
+             * another CPU (which may at this point be ready to increment it)
+             * to assume the role of master and then needlessly time out
+             * (as do_work is zero).
+             */
+            return;
+        }
+        /* "Mask" NMIs. */
+        arch_xsplice_mask();
+
+        barrier(); /* MUST do it after get_cpu_maps. */
+        cpus = num_online_cpus() - 1;
+
+        if ( cpus )
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: CPU%u - IPIing the other %u CPUs\n",
+                    p->name, cpu, cpus);
+            smp_call_function(reschedule_fn, NULL, 0);
+        }
+
+        timeout = xsplice_work.timeout + NOW();
+        if ( xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "CPU") )
+            goto abort;
+
+        /* All CPUs are waiting, now signal to disable IRQs. */
+        atomic_set(&xsplice_work.semaphore, 0);
+        /*
+         * MUST have a barrier after semaphore so that the other CPUs don't
+         * leak out of the 'Wait for all CPUs to rendezvous' loop and increment
+         * 'semaphore' before we set it to zero.
+         */
+        smp_wmb();
+        xsplice_work.ready = 1;
+
+        if ( !xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "IRQ") )
+        {
+            local_irq_save(flags);
+            /* Do the patching. */
+            xsplice_do_action();
+            /* Serialize and flush out the CPU via CPUID instruction (on x86). */
+            arch_xsplice_post_action();
+            local_irq_restore(flags);
+        }
+        arch_xsplice_unmask();
+
+ abort:
+        per_cpu(work_to_do, cpu) = 0;
+        xsplice_work.do_work = 0;
+
+        /* put_cpu_maps has a barrier(). */
+        put_cpu_maps();
+
+        printk(XENLOG_INFO XSPLICE "%s finished %s with rc=%d\n",
+               p->name, names[xsplice_work.cmd], p->rc);
+    }
+    else
+    {
+        /* Wait for all CPUs to rendezvous. */
+        while ( xsplice_work.do_work && !xsplice_work.ready )
+            cpu_relax();
+
+        /* Disable IRQs and signal. */
+        local_irq_save(flags);
+        /*
+         * We re-use the semaphore, so it MUST have been reset by the master
+         * before we exit the loop above.
+         */
+        atomic_inc(&xsplice_work.semaphore);
+
+        /* Wait for patching to complete. */
+        while ( xsplice_work.do_work )
+            cpu_relax();
+
+        /* To flush out pipeline. */
+        arch_xsplice_post_action();
+        local_irq_restore(flags);
+
+        per_cpu(work_to_do, cpu) = 0;
+    }
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -505,27 +934,24 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_APPLY:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_APPLIED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
@@ -590,6 +1016,7 @@ static const char *state2str(uint32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    unsigned int i;
 
     printk("'%c' pressed - Dumping all xsplice patches\n", key);
 
@@ -600,10 +1027,30 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
+    {
         printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
                data->name, state2str(data->state), data->state, data->text_addr,
                data->rw_addr, data->ro_addr, data->pages);
 
+        for ( i = 0; i < data->nfuncs; i++ )
+        {
+            struct xsplice_patch_func *f = &(data->funcs[i]);
+            printk("    %s patch %p(%u) with %p (%u)\n",
+                   f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
+
+            if ( i && !(i % 64) )
+            {
+                spin_unlock(&payload_lock);
+                process_pending_softirqs();
+                if ( !spin_trylock(&payload_lock) )
+                {
+                    printk("Couldn't reacquire lock. Try again.\n");
+                    return;
+                }
+            }
+        }
+    }
+
     spin_unlock(&payload_lock);
 }
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 4083261..73a7209 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -86,10 +86,18 @@ static inline struct cpu_info *get_cpu_info(void)
 unsigned long get_stack_trace_bottom(unsigned long sp);
 unsigned long get_stack_dump_bottom (unsigned long sp);
 
+#ifdef CONFIG_XSPLICE
+# define CHECK_FOR_XSPLICE_WORK "call check_for_xsplice_work;"
+#else
+# define CHECK_FOR_XSPLICE_WORK ""
+#endif
+
 #define reset_stack_and_jump(__fn)                                      \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
-            "mov %0,%%"__OP"sp; jmp %c1"                                \
+            "mov %0,%%"__OP"sp;"                                        \
+            CHECK_FOR_XSPLICE_WORK                                      \
+             "jmp %c1"                                                  \
             : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" );   \
         unreachable();                                                  \
     })
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 416e39a..36a37ef 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -869,6 +869,26 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_featureset_t);
  *     If zero exit with success.
  */
 
+#define XSPLICE_PAYLOAD_VERSION 1
+/*
+ * .xsplice.funcs structure layout defined in the `Payload format`
+ * section in the xSplice design document.
+ *
+ * We guard this with __XEN__ as toolstacks SHOULD not use it.
+ */
+#ifdef __XEN__
+struct xsplice_patch_func {
+    const char *name;       /* Name of function to be patched. */
+    void *new_addr;
+    void *old_addr;
+    uint32_t new_size;
+    uint32_t old_size;
+    uint8_t version;        /* MUST be XSPLICE_PAYLOAD_VERSION. */
+    uint8_t opaque[31];     /* MUST be zero filled. */
+};
+typedef struct xsplice_patch_func xsplice_patch_func_t;
+#endif
+
 /*
  * Structure describing an ELF payload. Uniquely identifies the
  * payload. Should be human readable.
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 857c264..c9723e4 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -11,12 +11,16 @@ struct xsplice_elf_sec;
 struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
+#include <xen/elfstructs.h>
 #ifdef CONFIG_XSPLICE
 
 /* Convenience define for printk. */
 #define XSPLICE             "xsplice: "
+/* ELF payload special section names. */
+#define ELF_XSPLICE_FUNC    ".xsplice.funcs"
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
+void check_for_xsplice_work(void);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
@@ -39,6 +43,22 @@ enum va_type {
 int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types);
 
 void arch_xsplice_init(void);
+
+#include <public/sysctl.h> /* For struct xsplice_patch_func. */
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func);
+/*
+ * These functions are called around the critical region patching live code,
+ * for an architecture to make appropriate global state adjustments.
+ */
+void arch_xsplice_patching_enter(void);
+void arch_xsplice_patching_leave(void);
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func);
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func);
+void arch_xsplice_post_action(void);
+
+void arch_xsplice_mask(void);
+void arch_xsplice_unmask(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
@@ -47,6 +67,7 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
     return -ENOSYS;
 }
 
+static inline void check_for_xsplice_work(void) { }
 #endif /* CONFIG_XSPLICE */
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.5.0

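The rendezvous in check_for_xsplice_work() hinges on one counter: schedule_work() arms it at -1, the first CPU whose atomic_inc_and_test() takes it to 0 becomes the master, and the master waits until the other num_online_cpus() - 1 CPUs have checked in before resetting it to 0 and reusing it for the IRQ-disable phase. The following is a minimal single-threaded model of just that counting, assuming C11 atomics as a stand-in for Xen's atomic_t. All sketch_* names are hypothetical; timeouts (xsplice_spin()), the ready flag, barriers and NMI masking are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Sketch only: single-threaded model of the rendezvous counter.
 * Real CPUs would call these concurrently and spin in between.
 */
static atomic_int sketch_semaphore;

/* Like schedule_work(): arm the rendezvous with the counter at -1. */
static void sketch_schedule(void)
{
    atomic_store(&sketch_semaphore, -1);
}

/*
 * Phase 1 arrival, modelled on atomic_inc_and_test(): only the CPU whose
 * increment takes the counter from -1 to 0 becomes the master.
 */
static bool sketch_arrive(void)
{
    return atomic_fetch_add(&sketch_semaphore, 1) + 1 == 0;
}

/* Master: have all 'cpus' other CPUs checked in? */
static bool sketch_all_arrived(unsigned int cpus)
{
    return atomic_load(&sketch_semaphore) == (int)cpus;
}

/* Master: reset the counter so it can be reused for the IRQ phase. */
static void sketch_start_irq_phase(void)
{
    atomic_store(&sketch_semaphore, 0);
}

/* Phase 2: a non-master CPU has disabled IRQs and signals (atomic_inc). */
static void sketch_irqs_disabled(void)
{
    atomic_fetch_add(&sketch_semaphore, 1);
}
```

Because phase 2 reuses the same counter, the master must finish the reset before any other CPU can leave its wait loop; in the patch that ordering comes from the smp_wmb() between atomic_set() and setting ready.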

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

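The v7 changelog at the top of this thread states that struct xsplice_patch_func is 64 bytes on 64-bit and 52 bytes on 32-bit builds, with different field offsets. The following sketch checks that arithmetic at compile time against a local copy of the layout. It assumes an ABI that inserts no padding between these members (true for x86-64, i386 and ARM32). sketch_check_func() is a hypothetical check of the two MUST comments in the header; it is not the hypervisor's arch_xsplice_verify_func().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the public/sysctl.h layout, for illustration only. */
struct xsplice_patch_func_sketch {
    const char *name;
    void *new_addr;
    void *old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t version;
    uint8_t opaque[31];
};

/* Three pointers, two uint32_t, then version + opaque (32 bytes). */
#define SKETCH_EXPECTED_SIZE (3 * sizeof(void *) + 2 * sizeof(uint32_t) + 32)

_Static_assert(sizeof(struct xsplice_patch_func_sketch) == SKETCH_EXPECTED_SIZE,
               "unexpected padding in xsplice_patch_func layout");

#define SKETCH_PAYLOAD_VERSION 1 /* mirrors XSPLICE_PAYLOAD_VERSION */

/* Hypothetical check: version must match and opaque must be all zero. */
static int sketch_check_func(const struct xsplice_patch_func_sketch *f)
{
    size_t i;

    if ( f->version != SKETCH_PAYLOAD_VERSION )
        return -1;

    for ( i = 0; i < sizeof(f->opaque); i++ )
        if ( f->opaque[i] )
            return -1;

    return 0;
}
```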

* [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (11 preceding siblings ...)
  2016-04-25 15:34 ` [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 15:31   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Jan Beulich,
	Konrad Rzeszutek Wilk

This change demonstrates how to generate an xSplice ELF payload.

The idea is to patch the hypervisor's 'xen_extra_version'
function with a function that returns 'Hello World'.
After patching, 'xl info | grep extra' will reflect the
new value.

To generate this ELF payload file we need:
 - C code of the new code (xen_hello_world_func.c).
 - C code generating the .xsplice.funcs structure
   (xen_hello_world.c)
 - The address of the old code (xen_extra_version). We
   retrieve it by using 'nm --defined' on xen-syms.
 - The size of the new and old code, for which we use
   'nm --defined -S' on our object file and on xen-syms, respectively.

Two C files and one generated header file are used
during the build. They could be merged into a single C file
if the size of the new function were known in advance
(or if an arbitrary value were chosen).

There is also a strict order of compiling:
 1) xen_hello_world_func.c
 2) config.h - extract the size of the new function,
    the old function and the old function address.
 3) xen_hello_world.c - which contains the .xsplice.funcs
    structure.
 4) Link the object files into the xen_hello_world.xsplice file.
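Step 2 only emits three `#define` lines, which xen_hello_world.c then uses in its .xsplice.funcs initializer. The following sketch renders the same text in C. It assumes 64-bit `nm` output, which zero-pads values to 16 hex digits, and it reports truncation instead of silently cutting the header short. sketch_render_config_h() is hypothetical; the real build uses the echo commands in xen/arch/x86/test/Makefile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render config.h contents the way the test Makefile's echo lines do. */
static int sketch_render_config_h(char *buf, size_t len,
                                  unsigned long long new_sz,
                                  unsigned long long old_sz,
                                  unsigned long long old_addr)
{
    int n = snprintf(buf, len,
                     "#define NEW_CODE_SZ 0x%016llx\n"
                     "#define OLD_CODE_SZ 0x%016llx\n"
                     "#define OLD_CODE 0x%016llx\n",
                     new_sz, old_sz, old_addr);

    /* Fail on encoding errors or if the output did not fit. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```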

The use-case is simple:

$xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
$xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                           APPLIED
$xl info | grep extra
xen_extra              : Hello World
$xen-xsplice revert xen_hello_world
Performing revert: completed
$xen-xsplice unload xen_hello_world
Performing unload: completed
$xl info | grep extra
xen_extra              : -unstable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Do it using hypervisor Makefiles
v3: Remove the stale linker file.
    Add Copyright and local definition block
    s/name/xen_hello_world_name/
v6: Remove the 'install', and 'uninstall' destinations.
    Remove xen/config.h from files.
v7: Made the build target be called 'tests'.
    Changed the .name to have 'xen_extra_version' to be consistent
    with the spec.
    Add Julien's Ack and Andrew's Reviewed-by.
v9: old_code and new_code are void, so drop the unsigned long cast
    and add void* - in both test-cases and document.
    Make tests target on ARM phony
    Add build dependencies on x86 build
    Include public/sysctl.h as CONFIG_XSPLICE may not be exposed.
---
 .gitignore                               |  2 ++
 docs/misc/xsplice.markdown               | 37 +++++++++++++++++++++++++++
 xen/Makefile                             |  8 ++++--
 xen/arch/arm/Makefile                    |  3 +++
 xen/arch/x86/Makefile                    |  4 +++
 xen/arch/x86/test/Makefile               | 43 ++++++++++++++++++++++++++++++++
 xen/arch/x86/test/xen_hello_world.c      | 32 ++++++++++++++++++++++++
 xen/arch/x86/test/xen_hello_world_func.c | 22 ++++++++++++++++
 8 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/test/Makefile
 create mode 100644 xen/arch/x86/test/xen_hello_world.c
 create mode 100644 xen/arch/x86/test/xen_hello_world_func.c

diff --git a/.gitignore b/.gitignore
index 39eb779..4a81f43 100644
--- a/.gitignore
+++ b/.gitignore
@@ -246,6 +246,8 @@ xen/arch/x86/efi.lds
 xen/arch/x86/efi/check.efi
 xen/arch/x86/efi/disabled
 xen/arch/x86/efi/mkreloc
+xen/arch/x86/test/config.h
+xen/arch/x86/test/xen_hello_world.xsplice
 xen/arch/*/efi/boot.c
 xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 99711bf..62f143e 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -331,6 +331,43 @@ When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
 and the core code copies the data from the undo buffer (private internal copy)
 to `old_addr`.
 
+### Example of .xsplice.funcs
+
+A simple example of a payload file:
+
+<pre>
+/* MUST be in sync with hypervisor. */  
+struct xsplice_patch_func {  
+    const char *name;  
+    void *new_addr;  
+    void *old_addr;  
+    uint32_t new_size;  
+    uint32_t old_size;  
+    uint8_t version;
+    uint8_t pad[31];  
+};  
+
+/* Our replacement function for xen_extra_version. */  
+const char *xen_hello_world(void)  
+{  
+    return "Hello World";  
+}  
+
+static unsigned char patch_this_fnc[] = "xen_extra_version";  
+
+struct xsplice_patch_func xsplice_hello_world = {  
+    .version = XSPLICE_PAYLOAD_VERSION,
+    .name = patch_this_fnc,  
+    .new_addr = xen_hello_world,  
+    .old_addr = (void *)0xffff82d08013963c, /* Extracted from xen-syms. */  
+    .new_size = 13, /* To be computed by scripts. */  
+    .old_size = 13, /* -----------""---------------  */  
+} __attribute__((__section__(".xsplice.funcs")));  
+
+</pre>
+
+Code must be compiled with -fPIC.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
diff --git a/xen/Makefile b/xen/Makefile
index c908544..eb0482e 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -39,8 +39,8 @@ dist: install
 
 build install:: include/config/auto.conf
 
-.PHONY: build install uninstall clean distclean cscope TAGS tags MAP gtags
-build install uninstall debug clean distclean cscope TAGS tags MAP gtags::
+.PHONY: build install uninstall clean distclean cscope TAGS tags MAP gtags tests
+build install uninstall debug clean distclean cscope TAGS tags MAP gtags tests::
 ifneq ($(XEN_TARGET_ARCH),x86_32)
 	$(MAKE) -f Rules.mk _$@
 else
@@ -76,6 +76,10 @@ _install: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 		fi; \
 	fi
 
+.PHONY: _tests
+_tests:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) tests
+
 .PHONY: _uninstall
 _uninstall: D=$(DESTDIR)
 _uninstall: T=$(notdir $(TARGET))
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index eae5cb3..f77f8db 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -57,6 +57,9 @@ ifeq ($(CONFIG_ARM_64),y)
 	ln -sf $(notdir $@)  ../../$(notdir $@).efi
 endif
 
+.PHONY: tests
+tests:
+
 $(TARGET).axf: $(TARGET)-syms
 	# XXX: VE model loads by VMA so instead of
 	# making a proper ELF we link with LMA == VMA and adjust crudely
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index f74fd2c..f8f6eeb 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -76,6 +76,9 @@ $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
 	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
 
+.PHONY: tests
+tests:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test xsplice
 
 ALL_OBJS := $(BASEDIR)/arch/x86/boot/built_in.o $(BASEDIR)/arch/x86/efi/built_in.o $(ALL_OBJS)
 
@@ -179,3 +182,4 @@ clean::
 	rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
 	rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.o efi/.*.d efi/*.efi efi/disabled efi/mkreloc
 	rm -f boot/reloc.S boot/reloc.lnk boot/reloc.bin
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test clean
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
new file mode 100644
index 0000000..b6af07c
--- /dev/null
+++ b/xen/arch/x86/test/Makefile
@@ -0,0 +1,43 @@
+include $(XEN_ROOT)/Config.mk
+
+CODE_ADDR=$(shell nm --defined $(1) | grep $(2) | awk '{print "0x"$$1}')
+CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
+
+.PHONY: default
+
+XSPLICE := xen_hello_world.xsplice
+
+default: xsplice
+
+install: xsplice
+	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+uninstall:
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+
+.PHONY: clean
+clean::
+	rm -f *.o .*.o.d $(XSPLICE) config.h
+
+#
+# To compute these values we need the binary files: xen-syms
+# and xen_hello_world_func.o to be already compiled.
+#
+.PHONY: config.h
+config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
+config.h: xen_hello_world_func.o
+	(set -e; \
+	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
+	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+
+xen_hello_world.o: xen_hello_world_func.o
+
+.PHONY: $(XSPLICE)
+$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
+	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o \
+		xen_hello_world.o
+
+.PHONY: xsplice
+xsplice: $(XSPLICE)
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
new file mode 100644
index 0000000..f42b25c
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include "config.h"
+#include <xen/types.h>
+#include <xen/xsplice.h>
+
+#include <public/sysctl.h>
+
+static char hello_world_patch_this_fnc[] = "xen_extra_version";
+extern const char *xen_hello_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
+    .version = XSPLICE_PAYLOAD_VERSION,
+    .name = hello_world_patch_this_fnc,
+    .new_addr = xen_hello_world,
+    .old_addr = (void *)OLD_CODE,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
new file mode 100644
index 0000000..1ad002a
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/types.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_hello_world(void)
+{
+    return "Hello World";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0




* [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (12 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 15:48   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup Konrad Rzeszutek Wilk
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

If the payload does not provide old_addr, we can resolve
the virtual address based on the UNDEFined symbols.

We also use a boolean flag, new_symbol, to track symbols. It is
needed because:

* A payload may introduce a new symbol
* A payload may override an existing symbol (introduced in Xen or another
  payload)
* Overriding symbols must exist in the symtab for backtraces.
* A payload must always link against the object which defines the new symbol.

Since payloads may be loaded in any order, it would be incorrect to
link against a payload which merely overrides a symbol: you could end
up with a chain of jumps, which is inefficient and may result in the
expected function not being executed.
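The resolution order in prepare_payload() is: first the hypervisor's own symbols (symbols_lookup_by_name()), then only those payload symbols flagged new_symbol (xsplice_symbols_lookup_by_name()). The following sketch reproduces that order with simplified, hypothetical tables. It shows that a symbol a payload merely overrides is never returned as a link target. The sketch_* types and functions are assumptions for illustration and do not match Xen's struct xsplice_symbol exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the symbol tables. */
struct sketch_sym {
    const char *name;
    void *value;
    bool new_symbol;   /* true if this payload introduced (defines) it */
};

struct sketch_payload {
    const struct sketch_sym *symtab;
    unsigned int nsyms;
};

static void *sketch_lookup_table(const struct sketch_sym *tab, unsigned int n,
                                 const char *name, bool only_new)
{
    for ( unsigned int i = 0; i < n; i++ )
    {
        if ( only_new && !tab[i].new_symbol )
            continue;   /* overriding symbols are never link targets */
        if ( strcmp(tab[i].name, name) == 0 )
            return tab[i].value;
    }
    return NULL;
}

/* Core symbols first, then symbols newly introduced by loaded payloads. */
static void *sketch_resolve(const struct sketch_sym *core, unsigned int ncore,
                            const struct sketch_payload *payloads,
                            unsigned int npayloads, const char *name)
{
    void *addr = sketch_lookup_table(core, ncore, name, false);

    for ( unsigned int p = 0; !addr && p < npayloads; p++ )
        addr = sketch_lookup_table(payloads[p].symtab, payloads[p].nsyms,
                                   name, true);
    return addr;
}
```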

Since the payload we get is a relocatable image (a partially linked ELF
file), we have to match up the symbols ourselves. We follow the ELF
visibility rules for that and, for local symbols, do what binutils ld does.
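The first filter in is_payload_symbol() keeps only symbols that live in one of the payload's own sections. An undefined symbol (SHN_UNDEF) has to be resolved against Xen or another payload. Reserved indices such as SHN_ABS (0xfff1) are, in practice, at or above e_shnum, so they are rejected as well. The following sketch isolates just that range check. It defines its own SHN_UNDEF constant rather than relying on <elf.h>, and sketch_defined_in_payload() is a hypothetical name. The remaining binutils-style rules for local symbols are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SHN_UNDEF 0   /* same value as ELF's SHN_UNDEF */

/* True if st_shndx refers to a real section of this payload. */
static bool sketch_defined_in_payload(uint16_t st_shndx, uint16_t e_shnum)
{
    return st_shndx != SKETCH_SHN_UNDEF && st_shndx < e_shnum;
}
```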

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Ross original version.
v2: Include test-case and document update.
v2: s/size_t/ssize_t/
    Include core_text_size, core_text calculation
v4: Cast on dprintk to uint64_t to make ELF 32bit build.
v6: Rebase where the spinlock is no more recursive. Drop the spinlock
    usage in xsplice_symbols_lookup_by_name
v7: Add Andrew's Reviewed-by
    Initialize addr and symname to zero as xensyms_read uses it.
v8: Change one XENLOG_DEBUG to XENLOG_ERR.
    Change printk to dprintk on symbols and one error case.
v9: 'new_addr' is now void, so change it from unsigned long
    to void *.
    Include <xen/version.h> header in test case.
    Drop the initializer for name in symbols_lookup_by_name.
    Make ->symtab and ->strtab of struct payload const. As such,
    cast to void * when freeing them.
    Drop 'size' from xsplice_symbol struct.
    Make return value be void*
    Make 'is_core_symbol' code have the same behavior as what binutils linker
    has.
---
 xen/arch/x86/Makefile               |  16 +++-
 xen/arch/x86/platform_hypercall.c   |   5 +-
 xen/arch/x86/test/Makefile          |   4 +-
 xen/arch/x86/test/xen_hello_world.c |   3 +-
 xen/common/symbols.c                |  36 +++++++-
 xen/common/xsplice.c                | 179 +++++++++++++++++++++++++++++++++++-
 xen/common/xsplice_elf.c            |  20 +++-
 xen/include/xen/elfstructs.h        |   1 +
 xen/include/xen/symbols.h           |   4 +-
 xen/include/xen/xsplice.h           |   7 ++
 10 files changed, 259 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index f8f6eeb..fdf4202 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -72,6 +72,12 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+ifdef CONFIG_XSPLICE
+all_symbols = --all-symbols
+else
+all_symbols =
+endif
+
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
 	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
@@ -111,12 +117,14 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
-		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0.S
+		| $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort \
+		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
-		| $(BASEDIR)/tools/symbols --sysv --sort --warn-dup >$(@D)/.$(@F).1.S
+		| $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort --warn-dup \
+		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).1.o -o $@
@@ -140,14 +148,14 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol
 	                $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).$(base).0 &&) :
 	$(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).0) >$(@D)/.$(@F).0r.S
 	$(guard) $(NM) -pa --format=sysv $(@D)/.$(@F).$(VIRT_BASE).0 \
-		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0s.S
+		| $(guard) $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort >$(@D)/.$(@F).0s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< \
 	                $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o -o $(@D)/.$(@F).$(base).1 &&) :
 	$(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).1) >$(@D)/.$(@F).1r.S
 	$(guard) $(NM) -pa --format=sysv $(@D)/.$(@F).$(VIRT_BASE).1 \
-		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1s.S
+		| $(guard) $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
 	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 39fa808..8dbba24 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -798,12 +798,13 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
         XEN_GUEST_HANDLE(char) nameh;
         uint32_t namelen, copylen;
+        uint64_t addr;
 
         guest_from_compat_handle(nameh, op->u.symdata.name);
 
         ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
-                           &op->u.symdata.address, name);
-
+                           &addr, name);
+        op->u.symdata.address = addr;
         namelen = strlen(name) + 1;
 
         if ( namelen > op->u.symdata.namelen )
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index b6af07c..af72aff 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -23,14 +23,12 @@ clean::
 # and xen_hello_world_func.o to be already compiled.
 #
 .PHONY: config.h
-config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
 config.h: xen_hello_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
-	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
-	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
 
 xen_hello_world.o: xen_hello_world_func.o
 
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index f42b25c..8afd66e 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -5,6 +5,7 @@
 
 #include "config.h"
 #include <xen/types.h>
+#include <xen/version.h>
 #include <xen/xsplice.h>
 
 #include <public/sysctl.h>
@@ -16,7 +17,7 @@ struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world =
     .version = XSPLICE_PAYLOAD_VERSION,
     .name = hello_world_patch_this_fnc,
     .new_addr = xen_hello_world,
-    .old_addr = (void *)OLD_CODE,
+    .old_addr = xen_extra_version,
     .new_size = NEW_CODE_SZ,
     .old_size = OLD_CODE_SZ,
 };
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index b18ddcd1..8b4d0fd 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -170,7 +170,7 @@ static char symbols_get_symbol_type(unsigned int off)
 }
 
 int xensyms_read(uint32_t *symnum, char *type,
-                 uint64_t *address, char *name)
+                 unsigned long *address, char *name)
 {
     /*
      * Symbols are most likely accessed sequentially so we remember position
@@ -207,3 +207,37 @@ int xensyms_read(uint32_t *symnum, char *type,
 
     return 0;
 }
+
+void *symbols_lookup_by_name(const char *symname)
+{
+    char name[KSYM_NAME_LEN + 1];
+    uint32_t symnum = 0;
+    char type;
+    unsigned long addr;
+    int rc;
+
+    if ( *symname == '\0' )
+        return NULL;
+
+    do {
+        rc = xensyms_read(&symnum, &type, &addr, name);
+        if ( rc )
+            break;
+
+        if ( !strcmp(name, symname) )
+            return (void *)addr;
+
+    } while ( name[0] != '\0' );
+
+    return NULL;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index efb396a..6051ade 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -14,6 +14,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/symbols.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -53,6 +54,9 @@ struct payload {
     struct list_head applied_list;       /* Linked to 'applied_list'. */
     struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
     unsigned int nfuncs;                 /* Nr of functions to patch. */
+    const struct xsplice_symbol *symtab; /* All symbols. */
+    const char *strtab;                  /* Pointer to .strtab. */
+    unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -116,6 +120,28 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload, char *n)
     return 0;
 }
 
+void *xsplice_symbols_lookup_by_name(const char *symname)
+{
+    const struct payload *data;
+
+    ASSERT(spin_is_locked(&payload_lock));
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        unsigned int i;
+
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( !data->symtab[i].new_symbol )
+                continue;
+
+            if ( !strcmp(data->symtab[i].name, symname) )
+                return data->symtab[i].value;
+        }
+    }
+
+    return 0;
+}
+
 static struct payload *find_payload(const char *name)
 {
     struct payload *data, *found = NULL;
@@ -376,11 +402,152 @@ static int prepare_payload(struct payload *payload,
         rc = arch_xsplice_verify_func(f);
         if ( rc )
             return rc;
+
+        /* Lookup function's old address if not already resolved. */
+        if ( !f->old_addr )
+        {
+            f->old_addr = symbols_lookup_by_name(f->name);
+            if ( !f->old_addr )
+            {
+                f->old_addr = xsplice_symbols_lookup_by_name(f->name);
+                if ( !f->old_addr )
+                {
+                    dprintk(XENLOG_ERR, XSPLICE "%s: Could not resolve old address of %s\n",
+                            elf->name, f->name);
+                    return -ENOENT;
+                }
+            }
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Resolved old address %s => %p\n",
+                    elf->name, f->name, f->old_addr);
+        }
     }
 
     return 0;
 }
 
+static bool_t is_payload_symbol(const struct xsplice_elf *elf,
+                                const struct xsplice_elf_sym *sym)
+{
+    if ( sym->sym->st_shndx == SHN_UNDEF ||
+         sym->sym->st_shndx >= elf->hdr->e_shnum )
+        return 0;
+
+    /*
+     * The payload is not a final image as we dynamically link against it.
+     * As such the linker has left symbols we don't care about and which
+     * binutils would have removed had it been a final image. Hence we:
+     * - For SHF_ALLOC - ignore symbols referring to sections that are not
+     *   loaded.
+     */
+    if ( !(elf->sec[sym->sym->st_shndx].sec->sh_flags & SHF_ALLOC) )
+        return 0;
+
+    /* - And ignore empty symbols (\0). */
+    if ( *sym->name == '\0' )
+        return 0;
+
+    /*
+     * - For SHF_MERGE - ignore local symbols referring to mergeable sections.
+     *    (ld squashes them all in one section and discards the symbols) when
+     *    those symbols start with '.L' (like .LCx). Those are intermediate
+     *    artifacts of assembly.
+     *
+     * See elf_link_input_bfd and _bfd_elf_is_local_label_name in binutils.
+     */
+    if ( (elf->sec[sym->sym->st_shndx].sec->sh_flags & SHF_MERGE) &&
+         !strncmp(sym->name, ".L", 2) )
+        return 0;
+
+    return 1;
+}
+
+static int build_symbol_table(struct payload *payload,
+                              const struct xsplice_elf *elf)
+{
+    unsigned int i, j, nsyms = 0;
+    size_t strtab_len = 0;
+    struct xsplice_symbol *symtab;
+    char *strtab;
+
+    ASSERT(payload->nfuncs);
+
+    /* Recall that section @0 is always NULL. */
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_payload_symbol(elf, elf->sym + i) )
+        {
+            nsyms++;
+            strtab_len += strlen(elf->sym[i].name) + 1;
+        }
+    }
+
+    symtab = xmalloc_array(struct xsplice_symbol, nsyms);
+    strtab = xmalloc_array(char, strtab_len);
+
+    if ( !strtab || !symtab )
+    {
+        xfree(strtab);
+        xfree(symtab);
+        return -ENOMEM;
+    }
+
+    nsyms = 0;
+    strtab_len = 0;
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_payload_symbol(elf, elf->sym + i) )
+        {
+            symtab[nsyms].name = strtab + strtab_len;
+            symtab[nsyms].value = (void *)elf->sym[i].sym->st_value;
+            symtab[nsyms].new_symbol = 0; /* May be overwritten below. */
+            strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
+                                  KSYM_NAME_LEN) + 1;
+            nsyms++;
+        }
+    }
+
+    for ( i = 0; i < nsyms; i++ )
+    {
+        bool_t found = 0;
+
+        for ( j = 0; j < payload->nfuncs; j++ )
+        {
+            if ( symtab[i].value == payload->funcs[j].new_addr )
+            {
+                found = 1;
+                break;
+            }
+        }
+
+        if ( !found )
+        {
+            if ( xsplice_symbols_lookup_by_name(symtab[i].name) )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: duplicate new symbol: %s\n",
+                        elf->name, symtab[i].name);
+                xfree(symtab);
+                xfree(strtab);
+                return -EEXIST;
+            }
+            symtab[i].new_symbol = 1;
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: new symbol %s\n",
+                     elf->name, symtab[i].name);
+        }
+        else
+        {
+            /* new_symbol is not set. */
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: overriding symbol %s\n",
+                    elf->name, symtab[i].name);
+        }
+    }
+
+    payload->symtab = symtab;
+    payload->strtab = strtab;
+    payload->nsyms = nsyms;
+
+    return 0;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
@@ -388,6 +555,8 @@ static void free_payload(struct payload *data)
     payload_cnt--;
     payload_version++;
     free_payload_data(data);
+    xfree((void *)data->symtab);
+    xfree((void *)data->strtab);
     xfree(data);
 }
 
@@ -420,6 +589,10 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
     if ( rc )
         goto out;
 
+    rc = build_symbol_table(payload, &elf);
+    if ( rc )
+        goto out;
+
     rc = secure_payload(payload, &elf);
 
  out:
@@ -487,8 +660,12 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 
     vfree(raw_data);
 
-    if ( rc )
+    if ( rc && data )
+    {
+        xfree((void *)data->symtab);
+        xfree((void *)data->strtab);
         xfree(data);
+    }
 
     return rc;
 }
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index 8501138..7dcf52f 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -4,6 +4,7 @@
 
 #include <xen/errno.h>
 #include <xen/lib.h>
+#include <xen/symbols.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
@@ -273,9 +274,22 @@ int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
             break;
 
         case SHN_UNDEF:
-            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
-                    elf->name, elf->sym[i].name);
-            rc = -ENOENT;
+            sym->st_value = (unsigned long)
+                    symbols_lookup_by_name(elf->sym[i].name);
+            if ( !sym->st_value )
+            {
+                sym->st_value = (unsigned long)
+                        xsplice_symbols_lookup_by_name(elf->sym[i].name);
+                if ( !sym->st_value )
+                {
+                    dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
+                            elf->name, elf->sym[i].name);
+                    rc = -ENOENT;
+                    break;
+                }
+            }
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Undefined symbol resolved: %s => %#"PRIxElfAddr"\n",
+                    elf->name, elf->sym[i].name, sym->st_value);
             break;
 
         case SHN_ABS:
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 2b9bd3f..ab9b1ea 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -263,6 +263,7 @@ typedef struct {
 #define SHF_WRITE	0x1		/* Writable */
 #define SHF_ALLOC	0x2		/* occupies memory */
 #define SHF_EXECINSTR	0x4		/* executable */
+#define SHF_MERGE	0x10            /* mergeable */
 #define SHF_MASKPROC	0xf0000000	/* reserved bits for processor */
 					/*  specific section attributes */
 
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index f58e611..2122a5d 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -21,6 +21,8 @@ const char *symbols_lookup(unsigned long addr,
                            char *namebuf);
 
 int xensyms_read(uint32_t *symnum, char *type,
-                 uint64_t *address, char *name);
+                 unsigned long *address, char *name);
+
+void *symbols_lookup_by_name(const char *symname);
 
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index c9723e4..1526752 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -19,8 +19,15 @@ struct xen_sysctl_xsplice_op;
 /* ELF payload special section names. */
 #define ELF_XSPLICE_FUNC    ".xsplice.funcs"
 
+struct xsplice_symbol {
+    const char *name;
+    void *value;
+    bool_t new_symbol;
+};
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
+void *xsplice_symbols_lookup_by_name(const char *symname);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (13 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 15:53   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

The current mechanism is geared towards fast virtual address ->
symbol name lookups. This is fine for the normal use cases
(BUG_ON, WARN_ON, etc.), but for xSplice, where we need to find
hypervisor symbols by name, it is slow.

To understand this patch, the existing method is described first.
Readers already familiar with it can skip to 'NEW CODE:'.

HOW IT WORKS:

The symbol table uses a simple compression scheme: common ASCII
substrings shared by the symbol names are extracted into tokens.

This saves space. The lookup mechanism is geared towards looking
up symbols by address. There is one table indexed 0..N (where N is
the number of symbols, e.g. 6849):

symbols_addresses[0..N]

There is also a (loosely) 1-1 mapping of the encoded symbols into a
symbols_names stream of N entries.

Each entry in that stream has a variable length (more on that below).

The symbols_names entries are ordered to match symbols_addresses, which
means that the decoded names inside symbols_names are not in
alphabetical (ascending or descending) order.

The encoding itself uses two more arrays: symbols_token_index[], a table
of 256 entries, and symbols_token_table, which is a stream of ASCIIZ
strings (not really a table, as the entries have variable length),
such as:

@0   .asciz  "credit"
@7   .asciz  "mask"
..
@300 .asciz  "S"

And the symbols_token_index:
@0        .short  0
@1        .short  7
@2        .short  12
@4        .short  16
...
@84         .short  300

The relationship between them is that the symbols_token_index
gives us the offset to symbols_token_table.

The symbols_names[] array is a stream of encoded entries. Each entry
follows the same pattern: <len> followed by <encoding values>, then
another <len> followed by <encoding values>, and so on.

Hence to find the right entry you need to read <len>, add <len> + 1
(to skip over it), read the next <len>, and so on until you reach
the right tuple offset.

The <encoding values> are the indices into symbols_token_index.

Meaning if you have:
  0x04, 0x54, 0xda, 0xe2, 0x74
  [4, 84, 218, 226, 116 in decimal]

The 0x04 tells us that the encoded symbol is four bytes long, so the
next entry starts at offset 5. If we look up symbols_token_index[84] we
get 300, and symbols_token_table[300] gives us "S". Continuing with the
remaining bytes, the string eventually decodes to 'S_stext'. The first
character is the type, optionally followed by the filename (with a '#'
right after the filename), and lastly the symbol name, such as:

tvpmu_intel.c#core2_vpmu_do_interrupt
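To make the decoding walk concrete, here is a minimal C sketch. It
assumes the layout described above: a <len> byte followed by token
bytes, where each token byte indexes symbols_token_index and the value
found there is the offset of a NUL-terminated string in
symbols_token_table. The helper expand_entry() and its parameters are
hypothetical; this is not the hypervisor's symbols_expand_symbol().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand the <len><tokens> entry starting at
 * names[off] into out (NUL-terminated). Token byte t maps to the string
 * at token_table + token_index[t]. Returns the offset of the next entry.
 */
static size_t expand_entry(const unsigned char *names, size_t off,
                           const char *token_table,
                           const unsigned short *token_index,
                           char *out, size_t out_size)
{
    unsigned int len = names[off];          /* number of token bytes */
    const unsigned char *p = &names[off + 1];
    size_t used = 0;

    out[0] = '\0';
    while ( len-- )
    {
        const char *tok = &token_table[token_index[*p++]];
        size_t tlen = strlen(tok);

        if ( used + tlen >= out_size )
            break;                          /* truncate instead of overflowing */
        memcpy(out + used, tok, tlen + 1);  /* copy the token and its NUL */
        used += tlen;
    }

    return off + names[off] + 1;            /* skip <len> and its tokens */
}
```

For example, with the toy token table "S\0_st\0ext" and index {0, 2, 6},
the entry {3, 0, 1, 2} expands to "S_stext", whose first character is
the type.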

Keep in mind that there are two fixed-size tables:
symbols_addresses[0..symbols_num_syms], and
symbols_markers[0..symbols_num_syms/256].

symbols_markers is used to speed up searching for the right entry: for
every 256th symbol it records the offset within symbols_names at which
that symbol's <len><encoded value> tuple starts.

The way to find a symbol based on the address is:
1) Find the 'tuple offset' (index) in symbols_addresses[0..symbols_num_syms].
   This table is sorted by virtual address, so a binary search finds it.
2) Get the starting offset in symbols_names by retrieving the value of
   symbols_markers['tuple offset' / 256].
3) Starting at that offset, skip over 'tuple offset' & 255 entries in
   the symbols_names stream.
4) Decode the <len><encoded value>.
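Steps 1-3 can be sketched in C as follows, assuming one marker is
recorded per 256 entries (hence the >> 8 and & 0xFF arithmetic). The
function name addr_to_stream_offset() is hypothetical, and step 4
(decoding the tuple) is left out.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: return the offset in the names stream of the entry
 * for the closest symbol at or below addr, or (size_t)-1 if addr precedes
 * the first symbol.
 */
static size_t addr_to_stream_offset(unsigned long addr,
                                    const unsigned long *addresses,
                                    unsigned int num_syms,
                                    const unsigned int *markers,
                                    const unsigned char *names)
{
    unsigned int low = 0, high = num_syms, idx, i;
    size_t off;

    if ( num_syms == 0 || addr < addresses[0] )
        return (size_t)-1;

    /* 1) Binary search for the largest idx with addresses[idx] <= addr. */
    while ( high - low > 1 )
    {
        unsigned int mid = low + (high - low) / 2;

        if ( addresses[mid] <= addr )
            low = mid;
        else
            high = mid;
    }
    idx = low;

    /* 2) Jump to the first entry of idx's block of 256 entries. */
    off = markers[idx >> 8];

    /* 3) Skip the remaining (idx & 0xFF) <len><tokens> entries. */
    for ( i = 0; i < (idx & 0xFF); i++ )
        off += names[off] + 1;

    return off;
}
```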

This, however, does not work well for searching the other way: when we
have the symbol name and want the address, every entry has to be decoded
in turn until a match is found.

NEW CODE:

To make that work we add one fixed-size table called symbols_sorted_offsets,
in which each entry has two elements: the offset in the symbols_names
stream and the index into symbols_addresses.

The whole array is sorted by the original symbol name at build time
(on a collision, the type is used as a tie-breaker).

The values are for example:

symbols_sorted_offsets:
    .long 83363, 6302 # [.bss, len=5]
    .long 80459, 6084 # [.data, len=5]
..
[The # added for clarity]

This makes it easy to index into both symbols_names and
symbols_addresses (or symbols_offsets).

Searching by name then becomes a binary search on
symbols_sorted_offsets. With 6849 symbols this takes at most 13 calls
to symbols_expand_symbol (log2(6849) is about 12.7).
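A minimal C sketch of the by-name binary search, mirroring the loop
added to symbols_lookup_by_name() in this patch. To stay
self-contained it assumes the names are already decoded into a plain
string array, standing in for symbols_expand_symbol() on the compressed
stream; lookup_by_name() and that array are hypothetical. Real names
also carry the "file.c#" prefix for static symbols.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as struct symbol_offset in the patch. */
struct symbol_offset {
    uint32_t stream;  /* where the name lives (here: index into names[]) */
    uint32_t addr;    /* index into the address array */
};

/* Hypothetical sketch: returns the address, or 0 if symname is absent. */
static unsigned long lookup_by_name(const char *symname,
                                    const struct symbol_offset *sorted,
                                    unsigned int num_syms,
                                    const char *const *names,
                                    const unsigned long *addresses)
{
    unsigned int low = 0, high = num_syms;

    while ( low < high )
    {
        unsigned int mid = low + (high - low) / 2;
        /* Stand-in for symbols_expand_symbol(): name already decoded. */
        int rc = strcmp(symname, names[sorted[mid].stream]);

        if ( rc < 0 )
            high = mid;
        else if ( rc > 0 )
            low = mid + 1;
        else
            return addresses[sorted[mid].addr];
    }

    return 0;
}
```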

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>

v8: New
 - Remove the debug code
 - Return the 'mid' index in symbol_addresses, not the 'low'.
v9:
 - Make it return void*
 - Ditch the old implementation. Use a single fixed-size array
   with two uint32_t values - offset in stream and offset in address.
 - Change printf in symbols.c to %u. Change parameter to --sort-by-name.
 - Squash the two separate implementations of symbols_lookup_by_name
   into one function using #ifdefs.
 - Fix comment and simplify compare_name_orig code.
---
 xen/arch/x86/Makefile      |  3 +++
 xen/common/Kconfig         | 12 +++++++++++
 xen/common/symbols-dummy.c |  5 +++++
 xen/common/symbols.c       | 28 ++++++++++++++++++++++++++
 xen/include/xen/symbols.h  |  8 ++++++++
 xen/tools/symbols.c        | 50 ++++++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index fdf4202..900fa59 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,6 +74,9 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
 
 ifdef CONFIG_XSPLICE
 all_symbols = --all-symbols
+ifdef CONFIG_FAST_SYMBOL_LOOKUP
+all_symbols = --all-symbols --sort-by-name
+endif
 else
 all_symbols =
 endif
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 692ef51..e4f86c2 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -200,4 +200,16 @@ config XSPLICE
 
 	  If unsure, say Y.
 
+config FAST_SYMBOL_LOOKUP
+	bool "Fast symbol lookup (bigger binary)"
+	default y
+	depends on XSPLICE
+	---help---
+	  When searching for symbol addresses we can use the built-in system
+	  that is optimized for searching symbols using addresses as the key.
+	  However, using it for the inverse (finding an address from a symbol
+	  name) is slow. This extra data and code (~55kB) speeds up the search.
+	  The only user of this is xSplice.
+
+	  If unsure, say Y.
 endmenu
diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c
index 5090c3b..044dfd3 100644
--- a/xen/common/symbols-dummy.c
+++ b/xen/common/symbols-dummy.c
@@ -5,6 +5,7 @@
 
 #include <xen/config.h>
 #include <xen/types.h>
+#include <xen/symbols.h>
 
 #ifdef SYMBOLS_ORIGIN
 const unsigned int symbols_offsets[1];
@@ -14,6 +15,10 @@ const unsigned long symbols_addresses[1];
 const unsigned int symbols_num_syms;
 const u8 symbols_names[1];
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+const struct symbol_offset symbols_sorted_offsets[1];
+#endif
+
 const u8 symbols_token_table[1];
 const u16 symbols_token_index[1];
 
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 8b4d0fd..c6931e9 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -31,6 +31,8 @@ extern const unsigned long symbols_addresses[];
 extern const unsigned int symbols_num_syms;
 extern const u8 symbols_names[];
 
+extern const struct symbol_offset symbols_sorted_offsets[];
+
 extern const u8 symbols_token_table[];
 extern const u16 symbols_token_index[];
 
@@ -211,14 +213,39 @@ int xensyms_read(uint32_t *symnum, char *type,
 void *symbols_lookup_by_name(const char *symname)
 {
     char name[KSYM_NAME_LEN + 1];
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    unsigned long low, high;
+#else
     uint32_t symnum = 0;
     char type;
     unsigned long addr;
     int rc;
+#endif
 
     if ( *symname == '\0' )
         return NULL;
 
+#ifdef CONFIG_FAST_SYMBOL_LOOKUP
+    low = 0;
+    high = symbols_num_syms;
+    while ( low < high )
+    {
+        unsigned long mid = low + ((high - low) / 2);
+        const struct symbol_offset *s;
+        int rc;
+
+        s = &symbols_sorted_offsets[mid];
+        (void)symbols_expand_symbol(s->stream, name);
+        /* Format is: [filename]#<symbol>. symbols_expand_symbol eats type.*/
+        rc = strcmp(symname, name);
+        if ( rc < 0 )
+            high = mid;
+        else if ( rc > 0 )
+            low = mid + 1;
+        else
+            return (void *)symbols_address(s->addr);
+    }
+#else
     do {
         rc = xensyms_read(&symnum, &type, &addr, name);
         if ( rc )
@@ -229,6 +256,7 @@ void *symbols_lookup_by_name(const char *symname)
 
     } while ( name[0] != '\0' );
 
+#endif
     return NULL;
 }
 
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 2122a5d..0f5876a 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -25,4 +25,12 @@ int xensyms_read(uint32_t *symnum, char *type,
 
 void *symbols_lookup_by_name(const char *symname);
 
+/*
+ * A lookup table, sorted by symbol name, into symbols_names (stream)
+ * and symbols_addresses (or symbols_offsets).
+ */
+struct symbol_offset {
+    uint32_t stream; /* .. in the compressed stream.*/
+    uint32_t addr;   /* .. and in the fixed size address array. */
+};
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c
index 196db74..941fbe7 100644
--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -40,6 +40,10 @@ struct sym_entry {
 	unsigned long long addr;
 	unsigned int len;
 	unsigned char *sym;
+	char *orig_symbol;
+	unsigned int addr_idx;
+	unsigned int stream_offset;
+	unsigned char type;
 };
 #define SYMBOL_NAME(s) ((char *)(s)->sym + 1)
 
@@ -47,8 +51,10 @@ static struct sym_entry *table;
 static unsigned int table_size, table_cnt;
 static unsigned long long _stext, _etext, _sinittext, _einittext, _sextratext, _eextratext;
 static int all_symbols = 0;
+static int sort_by_name = 0;
 static char symbol_prefix_char = '\0';
 static enum { fmt_bsd, fmt_sysv } input_format;
+static int compare_name(const void *p1, const void *p2);
 
 int token_profit[0x10000];
 
@@ -175,8 +181,11 @@ static int read_symbol(FILE *in, struct sym_entry *s)
 		*sym++ = '#';
 	}
 	strcpy(sym, str);
+	if (sort_by_name) {
+		s->orig_symbol = strdup(SYMBOL_NAME(s));
+		s->type = stype; /* As s->sym[0] ends mangled. */
+	}
 	s->sym[0] = stype;
-
 	rc = 0;
 
  skip_tail:
@@ -276,6 +285,21 @@ static int expand_symbol(unsigned char *data, int len, char *result)
 	return total;
 }
 
+/* Sort by original (non mangled) symbol name, then type. */
+static int compare_name_orig(const void *p1, const void *p2)
+{
+	const struct sym_entry *sym1 = p1;
+	const struct sym_entry *sym2 = p2;
+	int rc;
+
+	rc = strcmp(sym1->orig_symbol, sym2->orig_symbol);
+
+	if (!rc)
+		rc = sym1->type - sym2->type;
+
+	return rc;
+}
+
 static void write_src(void)
 {
 	unsigned int i, k, off;
@@ -325,6 +349,7 @@ static void write_src(void)
 			printf(", 0x%02x", table[i].sym[k]);
 		printf("\n");
 
+		table[i].stream_offset = off;
 		off += table[i].len + 1;
 	}
 	printf("\n");
@@ -334,7 +359,6 @@ static void write_src(void)
 		printf("\t.long\t%d\n", markers[i]);
 	printf("\n");
 
-	free(markers);
 
 	output_label("symbols_token_table");
 	off = 0;
@@ -350,6 +374,25 @@ static void write_src(void)
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
 	printf("\n");
+
+	if (!sort_by_name) {
+		free(markers);
+		return;
+	}
+
+	/* Sorted by original symbol names and type. */
+	qsort(table, table_cnt, sizeof(*table), compare_name_orig);
+
+	output_label("symbols_sorted_offsets");
+	/* A fixed sized array with two entries: offset in the
+	 * compressed stream (for symbol name), and offset in
+	 * symbols_addresses (or symbols_offset). */
+	for (i = 0; i < table_cnt; i++) {
+		printf("\t.long %u, %u\n", table[i].stream_offset, table[i].addr_idx);
+	}
+	printf("\n");
+
+	free(markers);
 }
 
 
@@ -410,6 +453,7 @@ static void compress_symbols(unsigned char *str, int idx)
 		len = table[i].len;
 		p1 = table[i].sym;
 
+		table[i].addr_idx = i;
 		/* find the token on the symbol */
 		p2 = memmem_pvt(p1, len, str, 2);
 		if (!p2) continue;
@@ -561,6 +605,8 @@ int main(int argc, char **argv)
 				input_format = fmt_sysv;
 			else if (strcmp(argv[i], "--sort") == 0)
 				unsorted = true;
+			else if (strcmp(argv[i], "--sort-by-name") == 0)
+				sort_by_name = 1;
 			else if (strcmp(argv[i], "--warn-dup") == 0)
 				warn_dup = true;
 			else
-- 
2.5.0



* [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (14 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 11:06   ` Ross Lagerwall
  2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Naturally, a backtrace is printed when an instruction
hits a bug_frame or when %p is used.

Payloads do not support bug_frames yet; however, the functions
that payloads call could hit a BUG() or WARN().

traps.c has logic to scan for these; it eventually finds the
correct bug_frame and then walks the stack, using %p to print
the backtrace. For %p to print a symbol string, 'is_active_kernel_text'
is consulted, which uses a 'struct virtual_region'.

Therefore we register the payload's start->end addresses so that
'is_active_kernel_text' will include addresses within the payload.

We also register our symbol lookup table function so that it can
scan the list of payloads and retrieve the correct name.
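The payload-side lookup amounts to a nearest-preceding-symbol search.
A minimal sketch, assuming a simplified symbol record (struct
payload_sym and nearest_symbol() are hypothetical names): it first
checks that the address lies in [start, end) and then picks the symbol
with the highest value not above it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct xsplice_symbol. */
struct payload_sym {
    const char *name;
    unsigned long value;   /* symbol address */
};

static const char *nearest_symbol(unsigned long addr,
                                  unsigned long start, unsigned long end,
                                  const struct payload_sym *syms,
                                  unsigned int nsyms, unsigned long *offset)
{
    const struct payload_sym *best = NULL;
    unsigned int i;

    /* Not this payload unless start <= addr < end. */
    if ( addr < start || addr >= end )
        return NULL;

    /* Linear scan: the table is not assumed to be sorted by address. */
    for ( i = 0; i < nsyms; i++ )
        if ( syms[i].value <= addr && (!best || best->value < syms[i].value) )
            best = &syms[i];

    if ( !best )
        return NULL;

    if ( offset )
        *offset = addr - best->value;

    return best->name;
}
```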

Lastly, we change vsprintf to take both s and namebuf into account.
For core code they are the same, but for payloads they differ: s is the
symbol name and namebuf holds the payload name, printed in brackets.
This gets us:

Xen call trace:
   [<ffff82d080a00041>] revert_hook+0x31/0x35 [xen_hello_world]
   [<ffff82d0801431bd>] xsplice.c#revert_payload+0x86/0xc6
   [<ffff82d080143502>] check_for_xsplice_work+0x233/0x3cd
   [<ffff82d08017a0b2>] domain.c#continue_idle_domain+0x9/0x1f

This is particularly useful when several payloads have similar or
identical symbol names.
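The resulting output format can be sketched with snprintf(). This
assumes, as described above, that the lookup returns the symbol name in
s and that namebuf differs from s only for payloads; format_symbol() is
a hypothetical helper, not the vsprintf.c code.

```c
#include <stdio.h>

/* Hypothetical helper producing "sym+off/size" or "sym+off/size [payload]". */
static int format_symbol(char *buf, size_t len, const char *s,
                         const char *namebuf, unsigned long off,
                         unsigned long size)
{
    if ( namebuf != s )   /* payload: namebuf holds the payload name */
        return snprintf(buf, len, "%s+%#lx/%#lx [%s]", s, off, size, namebuf);

    return snprintf(buf, len, "%s+%#lx/%#lx", s, off, size);
}
```

Called with s = "revert_hook", namebuf = "xen_hello_world", off = 0x31
and size = 0x35, this yields the first line of the trace above.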

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Add missing full stop.
v3: s/module/payload/
v4: Expand comment and include registration of 'virtual_region'
    Redo the vsprintf handling of payload name.
    Drop the ->skip function
v6: Add comment explaining the purpose behind the strcmp.
    Redid per Jan's review.
v7: Add Andrew's Reviewed-by
    Drop the strcmp and just do pointer checks.
v9: Do pointer comparison on vsprintf by itself, no need for intermediate
    payload bool_t
    Add const in xsplice_symbols_lookup
    Make 'best' in xsplice_symbols_lookup be unsigned int.
    Use an RCU list for iterating the applied_list. Define the RCU lock.
---
 xen/common/vsprintf.c     | 12 +++++++++
 xen/common/xsplice.c      | 69 ++++++++++++++++++++++++++++++++++++++++++++---
 xen/include/xen/xsplice.h |  1 +
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 18d2634..70e1edf 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -20,6 +20,7 @@
 #include <xen/symbols.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/xsplice.h>
 #include <asm/div64.h>
 #include <asm/page.h>
 
@@ -354,6 +355,17 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
             str = number(str, end, sym_size, 16, -1, -1, SPECIAL);
         }
 
+        /*
+         * namebuf contents and s for core hypervisor are same but for xSplice
+         * payloads they differ (namebuf contains the name of the payload).
+         */
+        if ( namebuf != s )
+        {
+            str = string(str, end, " [", -1, -1, 0);
+            str = string(str, end, namebuf, -1, -1, 0);
+            str = string(str, end, "]", -1, -1, 0);
+        }
+
         return str;
     }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 6051ade..72a3b88 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -14,7 +14,9 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/string.h>
 #include <xen/symbols.h>
+#include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -31,10 +33,9 @@ static LIST_HEAD(payload_list);
 
 /*
  * Patches which have been applied. Need RCU in case we crash (and then
- * traps code would iterate via applied_list) when adding entries on the list.
- *
- * Note: There are no 'rcu_applied_lock' as we don't iterate yet the list.
+ * traps code would iterate via applied_list) when adding entries on the list.
  */
+static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
 static LIST_HEAD(applied_list);
 
 static unsigned int payload_cnt;
@@ -56,6 +57,8 @@ struct payload {
     unsigned int nfuncs;                 /* Nr of functions to patch. */
     const struct xsplice_symbol *symtab; /* All symbols. */
     const char *strtab;                  /* Pointer to .strtab. */
+    struct virtual_region region;        /* symbol, bug.frame patching and
+                                            exception table (x86). */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
@@ -142,6 +145,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
     return 0;
 }
 
+static const char *xsplice_symbols_lookup(unsigned long addr,
+                                          unsigned long *symbolsize,
+                                          unsigned long *offset,
+                                          char *namebuf)
+{
+    const struct payload *data;
+    unsigned int i, best;
+    const void *va = (const void *)addr;
+    const char *n = NULL;
+
+    /*
+     * Only RCU locking since this list is only ever changed during apply
+     * or revert context. And in case it dies there we need an safe list.
+     */
+    rcu_read_lock(&rcu_applied_lock);
+    list_for_each_entry_rcu ( data, &applied_list, applied_list )
+    {
+        if ( va < data->text_addr ||
+             va >= (data->text_addr + data->pages * PAGE_SIZE) )
+            continue;
+
+        best = UINT_MAX;
+
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( data->symtab[i].value <= va &&
+                 (best == UINT_MAX ||
+                  data->symtab[best].value < data->symtab[i].value) )
+                best = i;
+        }
+
+        if ( best == UINT_MAX )
+            break;
+
+        if ( symbolsize )
+            *symbolsize = data->symtab[best].size;
+        if ( offset )
+            *offset = va - data->symtab[best].value;
+        if ( namebuf )
+            strlcpy(namebuf, data->name, KSYM_NAME_LEN);
+
+        n = data->symtab[best].name;
+        break;
+    }
+    rcu_read_unlock(&rcu_applied_lock);
+
+    return n;
+}
+
 static struct payload *find_payload(const char *name)
 {
     struct payload *data, *found = NULL;
@@ -366,6 +418,7 @@ static int prepare_payload(struct payload *payload,
     const struct xsplice_elf_sec *sec;
     unsigned int i;
     struct xsplice_patch_func *f;
+    struct virtual_region *region;
 
     sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
     ASSERT(sec);
@@ -422,6 +475,13 @@ static int prepare_payload(struct payload *payload,
         }
     }
 
+    /* Setup the virtual region with proper data. */
+    region = &payload->region;
+
+    region->symbols_lookup = xsplice_symbols_lookup;
+    region->start = payload->text_addr;
+    region->end = payload->text_addr + payload->text_size;
+
     return 0;
 }
 
@@ -498,6 +558,7 @@ static int build_symbol_table(struct payload *payload,
         if ( is_payload_symbol(elf, elf->sym + i) )
         {
             symtab[nsyms].name = strtab + strtab_len;
+            symtab[nsyms].size = elf->sym[i].sym->st_size;
             symtab[nsyms].value = (void *)elf->sym[i].sym->st_value;
             symtab[nsyms].new_symbol = 0; /* May be overwritten below. */
             strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
@@ -785,6 +846,7 @@ static int apply_payload(struct payload *data)
     arch_xsplice_patching_leave();
 
     list_add_tail_rcu(&data->applied_list, &applied_list);
+    register_virtual_region(&data->region);
 
     return 0;
 }
@@ -803,6 +865,7 @@ static int revert_payload(struct payload *data)
     arch_xsplice_patching_leave();
 
     list_del_rcu(&data->applied_list);
+    unregister_virtual_region(&data->region);
 
     return 0;
 }
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 1526752..bb8baee 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -22,6 +22,7 @@ struct xen_sysctl_xsplice_op;
 struct xsplice_symbol {
     const char *name;
     void *value;
+    unsigned int size;
     bool_t new_symbol;
 };
 
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v9 17/27] xsplice: Add support for bug frames.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (15 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 11:05   ` Ross Lagerwall
  2016-04-26 15:58   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 18/27] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  27 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for handling bug frames contained within xSplice payloads. If a
trap occurs, search either the kernel bug table or an applied payload's
bug table, depending on the instruction pointer.
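A minimal user-space sketch of that lookup, under simplifying assumptions: struct region and struct bug_entry below are hypothetical stand-ins for Xen's struct virtual_region and its bug frames (the real frames store relative offsets and are split into BUGFRAME_NR tables), and a plain array scan replaces Xen's RCU-protected list walk. It shows the key idea: first find the region whose text range contains the instruction pointer, then search only that region's table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bug frame: trapping address plus source line. */
struct bug_entry {
    uintptr_t addr;
    unsigned int line;
};

/* Hypothetical stand-in for struct virtual_region (core or payload). */
struct region {
    uintptr_t start, end;          /* text range [start, end) */
    const struct bug_entry *bugs;  /* this region's bug table */
    size_t n_bugs;
};

/* Return the region whose text range contains ip, or NULL. */
static const struct region *find_region(const struct region *regions,
                                        size_t n, uintptr_t ip)
{
    for ( size_t i = 0; i < n; i++ )
        if ( ip >= regions[i].start && ip < regions[i].end )
            return &regions[i];
    return NULL;
}

/* Find the bug entry for ip, searching only the owning region's table. */
static const struct bug_entry *find_bug(const struct region *regions,
                                        size_t n, uintptr_t ip)
{
    const struct region *r = find_region(regions, n, ip);

    if ( !r )
        return NULL;               /* not in the core image or any payload */
    assert(r->n_bugs == 0 || r->bugs != NULL);
    for ( size_t i = 0; i < r->n_bugs; i++ )
        if ( r->bugs[i].addr == ip )
            return &r->bugs[i];
    return NULL;
}
```

In the patch itself the per-payload tables come from the optional .bug_frames.N sections, which prepare_payload() validates and attaches to the payload's virtual region.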

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2:- s/module/payload/
   - add build time check in case amount of bug frames expands.
   - add define for the number of bug-frames.
v3:
  - add missing BUGFRAME_NR, squash s/core_size/core/ in earlier patch.
  - Moved code around.
  - Changed per Andrew's recommendation.
  - Fixed style changes.
  - Made it compile under ARM (PRIu32,PRIu64)
v4: Use 'struct virtual_region'
  - Rip more of the is_active_text code.
  - Use one function for the ->skip
  - Include test-case
v5: Rip out the ->skip function.
v7: Add a text check as well.
    Add Andrew's Reviewed-by.
v8: Changed dprintk XENLOG_DEBUG to XENLOG_ERR
v9: Removed pointless check on the side of conditional (sec->sec->sh_size)
  - Added const.
  - Use RCU list.
---
---
 xen/arch/x86/traps.c      |  5 +++--
 xen/common/xsplice.c      | 51 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h |  5 +++++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index f73f7f3..8384158 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -50,6 +50,7 @@
 #include <xen/paging.h>
 #include <xen/virtual_region.h>
 #include <xen/watchdog.h>
+#include <xen/xsplice.h>
 #include <asm/system.h>
 #include <asm/io.h>
 #include <asm/atomic.h>
@@ -1287,7 +1288,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
     filename = bug_ptr(bug);
-    if ( !is_kernel(filename) )
+    if ( !is_kernel(filename) && !is_patch(filename) )
         goto die;
     fixup = strlen(filename);
     if ( fixup > 50 )
@@ -1314,7 +1315,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     case BUGFRAME_assert:
         /* ASSERT: decode the predicate string pointer. */
         predicate = bug_msg(bug);
-        if ( !is_kernel(predicate) )
+        if ( !is_kernel(predicate) && !is_patch(predicate) )
             predicate = "<unknown>";
 
         printk("Assertion '%s' failed at %s%s:%d\n",
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 72a3b88..11b19dd 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -123,6 +123,35 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload, char *n)
     return 0;
 }
 
+bool_t is_patch(const void *ptr)
+{
+    const struct payload *data;
+    bool_t r = 0;
+
+    /*
+     * Only RCU locking since this list is only ever changed during apply
+     * or revert context. And in case it dies there we need a safe list.
+     */
+    rcu_read_lock(&rcu_applied_lock);
+    list_for_each_entry_rcu ( data, &applied_list, applied_list )
+    {
+        if ( (ptr >= data->rw_addr &&
+              ptr < (data->rw_addr + data->rw_size)) ||
+             (ptr >= data->ro_addr &&
+              ptr < (data->ro_addr + data->ro_size)) ||
+             (ptr >= data->text_addr &&
+              ptr < (data->text_addr + data->text_size)) )
+        {
+            r = 1;
+            break;
+        }
+
+    }
+    rcu_read_unlock(&rcu_applied_lock);
+
+    return r;
+}
+
 void *xsplice_symbols_lookup_by_name(const char *symname)
 {
     const struct payload *data;
@@ -482,6 +511,28 @@ static int prepare_payload(struct payload *payload,
     region->start = payload->text_addr;
     region->end = payload->text_addr + payload->text_size;
 
+    /* Optional sections. */
+    for ( i = 0; i < BUGFRAME_NR; i++ )
+    {
+        char str[14];
+
+        snprintf(str, sizeof(str), ".bug_frames.%u", i);
+        sec = xsplice_elf_sec_by_name(elf, str);
+        if ( !sec )
+            continue;
+
+        if ( sec->sec->sh_size % sizeof(*region->frame[i].bugs) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Wrong size of .bug_frames.%u!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        region->frame[i].bugs = sec->load_addr;
+        region->frame[i].n_bugs = sec->sec->sh_size /
+                                  sizeof(*region->frame[i].bugs);
+    }
+
     return 0;
 }
 
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index bb8baee..7f4c8f7 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -29,6 +29,7 @@ struct xsplice_symbol {
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
 void *xsplice_symbols_lookup_by_name(const char *symname);
+bool_t is_patch(const void *addr);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
@@ -76,6 +77,10 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
 }
 
 static inline void check_for_xsplice_work(void) { };
+static inline bool_t is_patch(const void *addr)
+{
+    return 0;
+}
 #endif /* CONFIG_XSPLICE */
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.5.0



* [PATCH v9 18/27] xsplice: Add support for exception tables.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (16 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-26 16:01   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 19/27] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for exception tables contained within xSplice payloads. If an
exception occurs, search either the main exception table or a particular
active payload's exception table, depending on the instruction pointer.
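The sketch below illustrates the sort-then-binary-search approach used for these tables (sort_exception_table() once at payload load time, search_one_extable() on a fault). It is a simplified model, not Xen's code: the hypothetical struct ex_entry holds absolute addresses, whereas Xen's struct exception_table_entry stores offsets relative to the entry itself (hence its special swap_ex()), and the search here uses a half-open index range to avoid forming a pointer before the start of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified entry with absolute addresses (Xen uses relative offsets). */
struct ex_entry {
    uintptr_t addr;   /* address of the faulting instruction */
    uintptr_t cont;   /* fixup address to continue at */
};

static int cmp_ex(const void *a, const void *b)
{
    const struct ex_entry *l = a, *r = b;

    if ( l->addr < r->addr )
        return -1;
    if ( l->addr > r->addr )
        return 1;
    return 0;
}

/* Sort [start, stop) by faulting address so it can be binary searched. */
static void sort_ex_table(struct ex_entry *start, const struct ex_entry *stop)
{
    assert(start <= stop);
    qsort(start, (size_t)(stop - start), sizeof(*start), cmp_ex);
}

/* Binary search a sorted table of n entries; return fixup, or 0 if absent. */
static uintptr_t search_ex_table(const struct ex_entry *tbl, size_t n,
                                 uintptr_t value)
{
    size_t lo = 0, hi = n;   /* candidates are in [lo, hi) */

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( tbl[mid].addr == value )
            return tbl[mid].cont;
        if ( tbl[mid].addr < value )
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

Sorting once at load time keeps each fault-time lookup logarithmic in the size of the owning region's table.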

We also add a test case to make sure that an exception is raised
and handled.

To avoid growing the runtime footprint when xSplice is not compiled in,
we add a few #defines (init_or_xsplice and friends) that keep this code
__init in that case.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v3:
 - s/module/payload/
 - sanity checks.
 - Move code around.
 - s/module/payload/
v4: Use 'struct virtual_region'
v5:
  - Expand test-case.
  - Deal with struct exception_table_entry being const.
v6:
 - Make the code have __init if not compiled with xSplice
 - Remove not needed declarations.
v7:
 - Make the non_canonical_addr be 0xdead000000000000ULL
 - Remove casts
 - Add Reviewed-by from Andrew
 - Change ifdef to be !ARM
v8:
 - Change dprintk XENLOG_DEBUG to XENLOG_ERR on dprintk.
 - Remove pointless parentheses on 0xdead...
 - Changed __INIT.. to init_or_xsplice
 - Added const to sort_exception_table last parameter.
 - Added __initconstrel #define
---
 xen/arch/x86/extable.c                   | 31 ++++++++++++++++++-------------
 xen/arch/x86/test/xen_hello_world_func.c | 13 +++++++++++++
 xen/common/xsplice.c                     | 25 +++++++++++++++++++++++++
 xen/include/asm-x86/uaccess.h            |  2 ++
 xen/include/xen/xsplice.h                | 19 +++++++++++++++++++
 5 files changed, 77 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 2a06cca..349df79 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -7,6 +7,7 @@
 #include <xen/spinlock.h>
 #include <asm/uaccess.h>
 #include <xen/virtual_region.h>
+#include <xen/xsplice.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
 
@@ -20,7 +21,7 @@ static inline unsigned long ex_cont(const struct exception_table_entry *x)
 	return EX_FIELD(x, cont);
 }
 
-static int __init cmp_ex(const void *a, const void *b)
+static int init_or_xsplice cmp_ex(const void *a, const void *b)
 {
 	const struct exception_table_entry *l = a, *r = b;
 	unsigned long lip = ex_addr(l);
@@ -35,7 +36,7 @@ static int __init cmp_ex(const void *a, const void *b)
 }
 
 #ifndef swap_ex
-static void __init swap_ex(void *a, void *b, int size)
+static void init_or_xsplice swap_ex(void *a, void *b, int size)
 {
 	struct exception_table_entry *l = a, *r = b, tmp;
 	long delta = b - a;
@@ -48,19 +49,23 @@ static void __init swap_ex(void *a, void *b, int size)
 }
 #endif
 
-void __init sort_exception_tables(void)
+void init_or_xsplice sort_exception_table(struct exception_table_entry *start,
+                                 const struct exception_table_entry *stop)
 {
-    sort(__start___ex_table, __stop___ex_table - __start___ex_table,
-         sizeof(struct exception_table_entry), cmp_ex, swap_ex);
-    sort(__start___pre_ex_table,
-         __stop___pre_ex_table - __start___pre_ex_table,
+    sort(start, stop - start,
          sizeof(struct exception_table_entry), cmp_ex, swap_ex);
 }
 
-static inline unsigned long
-search_one_table(const struct exception_table_entry *first,
-                 const struct exception_table_entry *last,
-                 unsigned long value)
+void __init sort_exception_tables(void)
+{
+    sort_exception_table(__start___ex_table, __stop___ex_table);
+    sort_exception_table(__start___pre_ex_table, __stop___pre_ex_table);
+}
+
+unsigned long
+search_one_extable(const struct exception_table_entry *first,
+                   const struct exception_table_entry *last,
+                   unsigned long value)
 {
     const struct exception_table_entry *mid;
     long diff;
@@ -85,7 +90,7 @@ search_exception_table(unsigned long addr)
     const struct virtual_region *region = find_text_region(addr);
 
     if ( region && region->ex )
-        return search_one_table(region->ex, region->ex_end - 1, addr);
+        return search_one_extable(region->ex, region->ex_end - 1, addr);
 
     return 0;
 }
@@ -94,7 +99,7 @@ unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
     unsigned long addr = (unsigned long)regs->eip;
-    unsigned long fixup = search_one_table(
+    unsigned long fixup = search_one_extable(
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
index 1ad002a..2e4af9c 100644
--- a/xen/arch/x86/test/xen_hello_world_func.c
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -5,9 +5,22 @@
 
 #include <xen/types.h>
 
+#include <asm/uaccess.h>
+
+static unsigned long *non_canonical_addr = (unsigned long *)0xdead000000000000ULL;
+
 /* Our replacement function for xen_extra_version. */
 const char *xen_hello_world(void)
 {
+    unsigned long tmp;
+    int rc;
+    /*
+     * Any BUG, or WARN_ON will contain symbol and payload name. Furthermore
+     * exceptions will be caught and processed properly.
+     */
+    rc = __get_user(tmp, non_canonical_addr);
+    BUG_ON(rc != -EFAULT);
+
     return "Hello World";
 }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 11b19dd..f68062f 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -533,6 +533,31 @@ static int prepare_payload(struct payload *payload,
                                   sizeof(*region->frame[i].bugs);
     }
 
+#ifndef CONFIG_ARM
+    sec = xsplice_elf_sec_by_name(elf, ".ex_table");
+    if ( sec )
+    {
+        struct exception_table_entry *s, *e;
+
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof(*region->ex)) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Wrong size of .ex_table (exp:%zu vs %lu)!\n",
+                    elf->name, sizeof(*region->ex),
+                    sec->sec->sh_size);
+            return -EINVAL;
+        }
+
+        s = sec->load_addr;
+        e = sec->load_addr + sec->sec->sh_size;
+
+        sort_exception_table(s, e);
+
+        region->ex = s;
+        region->ex_end = e;
+    }
+#endif
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/uaccess.h b/xen/include/asm-x86/uaccess.h
index 947470d..5df26c2 100644
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -277,5 +277,7 @@ extern struct exception_table_entry __stop___pre_ex_table[];
 
 extern unsigned long search_exception_table(unsigned long);
 extern void sort_exception_tables(void);
+extern void sort_exception_table(struct exception_table_entry *start,
+                                 const struct exception_table_entry *stop);
 
 #endif /* __X86_UACCESS_H__ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 7f4c8f7..cbdbff1 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -14,6 +14,16 @@ struct xen_sysctl_xsplice_op;
 #include <xen/elfstructs.h>
 #ifdef CONFIG_XSPLICE
 
+/*
+ * We use alternative and exception table code, which by default is __init
+ * only; however, we need it at runtime. These macros allow us to build
+ * the image with these functions built-in. (See the #else below.)
+ */
+#define init_or_xsplice_const
+#define init_or_xsplice_constrel
+#define init_or_xsplice_data
+#define init_or_xsplice
+
 /* Convenience define for printk. */
 #define XSPLICE             "xsplice: "
 /* ELF payload special section names. */
@@ -70,6 +80,15 @@ void arch_xsplice_mask(void);
 void arch_xsplice_unmask(void);
 #else
 
+/*
+ * If not compiling with xSplice certain functionality should stay as
+ * __init.
+ */
+#define init_or_xsplice_const       __initconst
+#define init_or_xsplice_constrel    __initconstrel
+#define init_or_xsplice_data        __initdata
+#define init_or_xsplice             __init
+
 #include <xen/errno.h> /* For -ENOSYS */
 static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
 {
-- 
2.5.0



* [PATCH v9 19/27] xsplice: Add support for alternatives
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (17 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 18/27] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-27  8:58   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 20/27] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for applying alternative sections within xSplice payloads.
At payload load time, apply any alternative sections that are found.

We also add a test case exercising a rather useless alternative
(patching a NOP with a NOP), but it does exercise the code path.
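A simplified sketch of what applying one alternative involves, assuming plain byte buffers rather than live hypervisor text: copy the replacement into a scratch buffer, pad the rest of the original instruction length with NOPs, then write the whole buffer back in one go (text_poke() in the patch). The within() helper is a hypothetical bounds check of the kind prepare_payload() performs before patching; an address is outside [start, end) when it is below start or at/after end. Single-byte 0x90 NOPs are used here for simplicity, whereas Xen selects multi-byte NOPs via ideal_nops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP1          0x90         /* one-byte x86 NOP */
#define MAX_PATCH_LEN (255 - 1)    /* same limit as alternative.c */

/* True if [p, p + len) lies entirely within [start, end). */
static bool within(uintptr_t p, size_t len, uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    return p >= start && p < end && len <= end - p;
}

/* Overwrite instrlen bytes at instr with repl, NOP-padding the remainder. */
static int apply_one_alternative(uint8_t *instr, size_t instrlen,
                                 const uint8_t *repl, size_t repl_len)
{
    uint8_t buf[MAX_PATCH_LEN];

    if ( repl_len > instrlen || instrlen > sizeof(buf) )
        return -1;                 /* replacement must not be longer */

    memcpy(buf, repl, repl_len);
    memset(buf + repl_len, NOP1, instrlen - repl_len);
    memcpy(instr, buf, instrlen);  /* single write of the final bytes */
    return 0;
}
```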

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Make a new alternative function that does not ASSERT on IRQs and
    don't disable IRQs in the code when loading payload.
v4: Include test-case
    Include check for size of alternatives and that it is not a 0 size
    section.
v6: Add #define INIT to preserve __initness on alternative code.
    Double check that alt_instr are only patching payload code.
v7: Move cr0 manipulation in apply_alternatives.
    ifdef around alternative.o in Makefile
    Pick X86_FEATURE_LM in test-case
    Drop casting from load_addr
    It is alternative.init.o, not alternative_init.o (thanks Andrew!)
v8: Change XENLOG_DEBUG to XENLOG_ERR on dprintk.
v9: Use init_or_xsplice instead of __INIT macros
    Take care of __initconstrel
    Change message when .alt_instr has incorrect size.
    Update add_nops with proper comment
    Update test case to patch a long instruction with a short one
    Used ..constrel on k6_nops and p6_nops.
    Used #%lx on printk. But with load_addr being void * switched to %p
    Use Jan's Makefile obj list incantation
---
 xen/arch/x86/Makefile                    |  6 +++--
 xen/arch/x86/alternative.c               | 46 ++++++++++++++++++++------------
 xen/arch/x86/test/xen_hello_world_func.c |  4 +++
 xen/common/xsplice.c                     | 31 +++++++++++++++++++++
 xen/include/asm-x86/alternative.h        |  4 +++
 5 files changed, 72 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 900fa59..bd7ba9f 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -6,7 +6,9 @@ subdir-y += mm
 subdir-$(CONFIG_XENOPROF) += oprofile
 subdir-y += x86_64
 
-obj-bin-y += alternative.init.o
+alternative-y := alternative.init.o
+alternative-$(CONFIG_XSPLICE) :=
+obj-bin-y += $(alternative-y)
 obj-y += apic.o
 obj-y += bitops.o
 obj-bin-y += bzimage.init.o
@@ -61,7 +63,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
-obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += alternative.o xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index f735ff8..c188a15 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -22,13 +22,14 @@
 #include <asm/system.h>
 #include <asm/traps.h>
 #include <asm/nmi.h>
+#include <xen/xsplice.h>
 
 #define MAX_PATCH_LEN (255-1)
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 
 #ifdef K8_NOP1
-static const unsigned char k8nops[] __initconst = {
+static const unsigned char k8nops[] init_or_xsplice_const = {
     K8_NOP1,
     K8_NOP2,
     K8_NOP3,
@@ -38,7 +39,7 @@ static const unsigned char k8nops[] __initconst = {
     K8_NOP7,
     K8_NOP8
 };
-static const unsigned char * const k8_nops[ASM_NOP_MAX+1] __initconstrel = {
+static const unsigned char * const k8_nops[ASM_NOP_MAX+1] init_or_xsplice_constrel = {
     NULL,
     k8nops,
     k8nops + 1,
@@ -52,7 +53,7 @@ static const unsigned char * const k8_nops[ASM_NOP_MAX+1] __initconstrel = {
 #endif
 
 #ifdef P6_NOP1
-static const unsigned char p6nops[] __initconst = {
+static const unsigned char p6nops[] init_or_xsplice_const = {
     P6_NOP1,
     P6_NOP2,
     P6_NOP3,
@@ -62,7 +63,7 @@ static const unsigned char p6nops[] __initconst = {
     P6_NOP7,
     P6_NOP8
 };
-static const unsigned char * const p6_nops[ASM_NOP_MAX+1] __initconstrel = {
+static const unsigned char * const p6_nops[ASM_NOP_MAX+1] init_or_xsplice_constrel = {
     NULL,
     p6nops,
     p6nops + 1,
@@ -75,7 +76,7 @@ static const unsigned char * const p6_nops[ASM_NOP_MAX+1] __initconstrel = {
 };
 #endif
 
-static const unsigned char * const *ideal_nops __initdata = k8_nops;
+static const unsigned char * const *ideal_nops init_or_xsplice_data = k8_nops;
 
 static int __init mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
@@ -100,7 +101,7 @@ static void __init arch_init_ideal_nops(void)
 }
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init add_nops(void *insns, unsigned int len)
+static void init_or_xsplice add_nops(void *insns, unsigned int len)
 {
     while ( len > 0 )
     {
@@ -114,7 +115,7 @@ static void __init add_nops(void *insns, unsigned int len)
 }
 
 /*
- * text_poke_early - Update instructions on a live kernel at boot time
+ * text_poke - Update instructions on a live kernel or non-executed code.
  * @addr: address to modify
  * @opcode: source of the copy
  * @len: length to copy
@@ -125,9 +126,10 @@ static void __init add_nops(void *insns, unsigned int len)
  * instructions. And on the local CPU you need to be protected again NMI or MCE
  * handlers seeing an inconsistent instruction while you patch.
  *
- * This routine is called with local interrupt disabled.
+ * You should run this with interrupts disabled or on code that is not
+ * executing.
  */
-static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
+static void *init_or_xsplice text_poke(void *addr, const void *opcode, size_t len)
 {
     memcpy(addr, opcode, len);
     sync_core();
@@ -142,20 +144,14 @@ static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
  * APs have less capabilities than the boot processor are not handled.
  * Tough. Make sure you disable such features by hand.
  */
-static void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void init_or_xsplice apply_alternatives_nocheck(struct alt_instr *start, struct alt_instr *end)
 {
     struct alt_instr *a;
     u8 *instr, *replacement;
     u8 insnbuf[MAX_PATCH_LEN];
-    unsigned long cr0 = read_cr0();
-
-    ASSERT(!local_irq_is_enabled());
 
     printk(KERN_INFO "alt table %p -> %p\n", start, end);
 
-    /* Disable WP to allow application of alternatives to read-only pages. */
-    write_cr0(cr0 & ~X86_CR0_WP);
-
     /*
      * The scan order should be from start to end. A later scanned
      * alternative code can overwrite a previous scanned alternative code.
@@ -183,8 +179,24 @@ static void __init apply_alternatives(struct alt_instr *start, struct alt_instr
 
         add_nops(insnbuf + a->replacementlen,
                  a->instrlen - a->replacementlen);
-        text_poke_early(instr, insnbuf, a->instrlen);
+        text_poke(instr, insnbuf, a->instrlen);
     }
+}
+
+/*
+ * This routine is called with local interrupts disabled and is used during
+ * bootup.
+ */
+void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+{
+    unsigned long cr0 = read_cr0();
+
+    ASSERT(!local_irq_is_enabled());
+
+    /* Disable WP to allow application of alternatives to read-only pages. */
+    write_cr0(cr0 & ~X86_CR0_WP);
+
+    apply_alternatives_nocheck(start, end);
 
     /* Reinstate WP. */
     write_cr0(cr0);
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
index 2e4af9c..03d6b84 100644
--- a/xen/arch/x86/test/xen_hello_world_func.c
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -5,6 +5,8 @@
 
 #include <xen/types.h>
 
+#include <asm/alternative.h>
+#include <asm/nops.h>
 #include <asm/uaccess.h>
 
 static unsigned long *non_canonical_addr = (unsigned long *)0xdead000000000000ULL;
@@ -14,6 +16,8 @@ const char *xen_hello_world(void)
 {
     unsigned long tmp;
     int rc;
+
+    alternative(ASM_NOP8, ASM_NOP1, X86_FEATURE_LM);
     /*
      * Any BUG, or WARN_ON will contain symbol and payload name. Furthermore
      * exceptions will be caught and processed properly.
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index f68062f..05064ae 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -534,6 +534,37 @@ static int prepare_payload(struct payload *payload,
     }
 
 #ifndef CONFIG_ARM
+    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
+    if ( sec )
+    {
+        struct alt_instr *a, *start, *end;
+
+        if ( sec->sec->sh_size % sizeof(*a) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Size of .alt_instr is not multiple of %zu!\n",
+                    elf->name, sizeof(*a));
+            return -EINVAL;
+        }
+
+        start = sec->load_addr;
+        end = sec->load_addr + sec->sec->sh_size;
+
+        for ( a = start; a < end; a++ )
+        {
+            const void *instr = &a->instr_offset + a->instr_offset;
+            const void *replacement = &a->repl_offset + a->repl_offset;
+
+            if ( (instr < region->start || instr >= region->end) ||
+                 (replacement < region->start || replacement >= region->end) )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s Alt patching outside payload: %p!\n",
+                        elf->name, instr);
+                return -EINVAL;
+            }
+        }
+        apply_alternatives_nocheck(start, end);
+    }
+
     sec = xsplice_elf_sec_by_name(elf, ".ex_table");
     if ( sec )
     {
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 1056630..bce959f 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -23,6 +23,10 @@ struct alt_instr {
     u8  replacementlen;     /* length of new instruction, <= instrlen */
 };
 
+/* Similar to apply_alternatives except it can be run with IRQs enabled. */
+extern void apply_alternatives_nocheck(struct alt_instr *start,
+                                       struct alt_instr *end);
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void alternative_instructions(void);
 
 #define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
-- 
2.5.0



* [PATCH v9 20/27] build_id: Provide ld-embedded build-ids
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (18 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 19/27] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:35 ` [PATCH v9 21/27] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Jan Beulich,
	Konrad Rzeszutek Wilk

This patch builds the ELF binary with a build-id and provides
the code in the Xen hypervisor to extract it.

The man-page for ld --build-id says it is:

"Request the creation of a ".note.gnu.build-id" ELF note
section or a ".build-id" COFF section.  The contents of the
note are unique bits identifying this linked file. style can be
"uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash
on the normative parts of the output contents, ..."

One can also retrieve the value of the build-id by doing
'readelf -n xen-syms'.
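For illustration, a minimal sketch of how the build-id can be located in a buffer of ELF notes, much as readelf -n does. Assumptions: the buffer holds the raw contents of a note section, the note header is the standard three 32-bit words shared by Elf32_Nhdr and Elf64_Nhdr, and find_build_id() is a hypothetical helper, not the function this patch adds to xen/common/version.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3   /* same value as in <elf.h> */

/* ELF note header: name and descriptor follow, each 4-byte aligned. */
struct note_hdr {
    uint32_t namesz;   /* length of name, including the NUL */
    uint32_t descsz;   /* length of descriptor (the build-id bytes) */
    uint32_t type;
};

#define ALIGN4(x) (((size_t)(x) + 3) & ~(size_t)3)

/* Return the GNU build-id bytes in buf (and their length), or NULL. */
static const uint8_t *find_build_id(const uint8_t *buf, size_t size,
                                    size_t *len)
{
    size_t off = 0;

    assert(len != NULL);
    while ( off <= size && size - off >= sizeof(struct note_hdr) )
    {
        struct note_hdr n;
        size_t name_off, desc_off;

        memcpy(&n, buf + off, sizeof(n));     /* buf may be unaligned */
        name_off = off + sizeof(n);
        if ( n.namesz > size - name_off )
            return NULL;                      /* truncated name */
        desc_off = name_off + ALIGN4(n.namesz);
        if ( desc_off > size || n.descsz > size - desc_off )
            return NULL;                      /* truncated descriptor */

        if ( n.type == NT_GNU_BUILD_ID && n.namesz == 4 &&
             memcmp(buf + name_off, "GNU", 4) == 0 )
        {
            *len = n.descsz;
            return buf + desc_off;
        }
        off = desc_off + ALIGN4(n.descsz);    /* advance to the next note */
    }
    return NULL;
}
```

With --build-id=sha1 the descriptor is the 20-byte SHA-1 value that readelf prints.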

For EFI builds we re-use the same build-id that the xen-syms
was built with.

The first version of ld to implement --build-id is v2.18.
We check whether the linker supports the --build-id parameter
and, if so, use it.

For x86 we have two binaries: xen-syms and xen, a smaller
version with many sections removed. To make 'readelf -n xen' work,
we also modify mkelf32 and xen.lds.S to include the PT_NOTE
segment.

The EFI binary is more complicated. We only build one type of
binary, and adding a .note section to the EFI binary is pointless
because there is no concept of PT_NOTE there. The best we can do
is move this .note into the .rodata section.

Future work should move it to a .buildid section so that
the DataDirectory debug data or CodeView can see it.
(The author has no clue what those are.)

Note that in earlier patches the linker script had:

 __note_gnu_build_id_start = .;
 *(.rodata.note.gnu.build-id)
 __note_gnu_build_id_end = .;
 *(.note)
 *(.note.*)

This meant you could have other ELF notes _outside_ the
__note_gnu_build_id_start/__note_gnu_build_id_end range. However,
for EFI builds we take the whole .note* section and jam it into the
EFI binary between __note_gnu_build_id_start and
__note_gnu_build_id_end. To prevent this, on ELF builds we name the
section .note.gnu.build-id (instead of just .note). If another type
of note is needed, it can be added under a different section name.

Note that we pass --build-id=sha1 on all linker invocations.
We have to do so to ensure that the symbol offsets do not change
(the side effect is that we produce multiple build ids, but the
last one is the final one).

Without this, the symbol table embedded in Xen ends up incorrect:
some of the values it contains would be offset by the size of the
included build id.

This obviously causes problems when resolving symbols.

We also define the NT_GNU_BUILD_ID in the elfstructs.h as we
need to use it in various places.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Rebase it on Martin's initial patch
v2: Move it to XENVER hypercall
v3: Fix EFI building (Ross's fix)
    Don't use the third argument for length.
    Use new structure for XENVER_build_id with variable buf.
    Include Ross's fix.
    Include detection of bin-utils for build-id support, add
    probing for size, and return -EPERM for XSM denied calls.
    Build xen_build_id under ARM, required adding ELFSIZE in proper file.
    Rebase on top XSM version class.
v4:
    Include the build-id .note in the xen ELF binary.
    s/build_id/build_id_linker/
    For EFI build, moved the --build-id values in .data section
    Rebase on staging.
    Split patch in two. Always do --build-id call. Include the .note in
    .rodata. Use const void * and ssize_t
    Use -S to make build_id.o and objcopy differently (Andrew suggested)
v5: Put back the #ifdef LOCK_PROFILE on ARM. (Bad change). Move the _erodata
    around. s/ssize_t/unsigned int/
v6: Redid it per Jan's review.
v7: Move build-id note into .rodata.note for EFI builds only.
    Move build-id note into .rodata for EFI builds only. Retain
    it in .note. Change name of object file used by EFI builds to notes.o
    On ELF builds, name the PT_NOTE section .note.gnu.build-id and
    ingest that in the ELF build.
    Define NT_GNU_BUILD_ID in elfstructs.h
v8: s/num_phdrs/notes_phdrs/
   Added Andrew's Reviewed-by
v9:
   Made mkelf32 parse --notes as first argument
   Moved _erodata past .note.gnu.. section
   Added rm -f note.o on clean target.
   Make build_id_[p|len] be __read_mostly
   Add Jan's Acked-by
   Made the detection smarter: just check whether ld echoes the requested
   option back in its output (that handles "unrecognized option" messages in other languages).
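With --notes, mkelf32's positional arguments shift by one. Because the patch checks `argc < 5` before the flag is consumed, a hand-run `mkelf32 --notes a b c` would read argv[5] (NULL) as the final exec address; the Makefile always passes all four, so this only matters for manual use. Below is a sketch of parsing that counts the positional arguments after the optional flag. struct mkelf32_args and parse_args() are hypothetical names, not part of mkelf32.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for mkelf32's command-line arguments. */
struct mkelf32_args {
    int         notes;            /* 1 if --notes was given */
    const char *inimage;
    const char *outimage;
    uint32_t    loadbase;
    uint64_t    final_exec_addr;
};

/* Returns 0 on success, -1 on a usage error. */
static int parse_args(int argc, char **argv, struct mkelf32_args *a)
{
    int i = 1;

    assert(a != NULL);
    a->notes = 0;
    if ( argc > 1 && !strcmp(argv[1], "--notes") )
    {
        a->notes = 1;
        i = 2;
    }

    /* Exactly four positional arguments must follow the optional flag. */
    if ( argc - i != 4 )
        return -1;

    a->inimage  = argv[i++];
    a->outimage = argv[i++];
    /* Truncated to 32 bits, as mkelf32 does with its u32 loadbase. */
    a->loadbase = (uint32_t)strtoul(argv[i++], NULL, 16);
    a->final_exec_addr = strtoull(argv[i++], NULL, 16);
    return 0;
}
```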
---
 Config.mk                    |  11 ++++
 xen/arch/arm/Makefile        |   2 +-
 xen/arch/arm/xen.lds.S       |  15 ++++-
 xen/arch/x86/Makefile        |  30 +++++++--
 xen/arch/x86/boot/mkelf32.c  | 142 +++++++++++++++++++++++++++++++++++++------
 xen/arch/x86/xen.lds.S       |  30 ++++++++-
 xen/common/version.c         |  52 ++++++++++++++++
 xen/include/xen/elfstructs.h |   3 +
 xen/include/xen/version.h    |   1 +
 9 files changed, 258 insertions(+), 28 deletions(-)

diff --git a/Config.mk b/Config.mk
index a0e6d4e..41f8c44 100644
--- a/Config.mk
+++ b/Config.mk
@@ -126,6 +126,17 @@ endef
 check-$(gcc) = $(call cc-ver-check,CC,0x040100,"Xen requires at least gcc-4.1")
 $(eval $(check-y))
 
+ld-ver-build-id = $(shell $(1) --build-id 2>&1 | \
+					grep -q build-id && echo n || echo y)
+
+export XEN_HAS_BUILD_ID ?= n
+ifeq ($(call ld-ver-build-id,$(LD)),n)
+build_id_linker :=
+else
+CFLAGS += -DBUILD_ID
+build_id_linker := --build-id=sha1
+endif
+
 # as-insn: Check whether assembler supports an instruction.
 # Usage: cflags-y += $(call as-insn "insn",option-yes,option-no)
 as-insn = $(if $(shell echo 'void _(void) { asm volatile ( $(2) ); }' \
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index f77f8db..ead0cc0 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -93,7 +93,7 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 9909595..1f010bd 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
 PHDRS
 {
   text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
+#if defined(BUILD_ID)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -57,10 +60,18 @@ SECTIONS
        *(.lockprofile.data)
        __lock_profile_end = .;
 #endif
-
-        _erodata = .;          /* End of read-only data */
   } :text
 
+#if defined(BUILD_ID)
+  . = ALIGN(4);
+  .note.gnu.build-id : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+  } :note :text
+#endif
+  _erodata = .;                /* End of read-only data */
+
   .data : {                    /* Data */
        . = ALIGN(PAGE_SIZE);
        *(.data.page_aligned)
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index bd7ba9f..4665a68 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,6 +74,12 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+ifneq ($(build_id_linker),)
+notes_phdrs = --notes
+else
+notes_phdrs =
+endif
+
 ifdef CONFIG_XSPLICE
 all_symbols = --all-symbols
 ifdef CONFIG_FAST_SYMBOL_LOOKUP
@@ -84,7 +90,7 @@ all_symbols =
 endif
 
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
-	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
+	./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TARGET) 0x100000 \
 	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
 
 .PHONY: tests
@@ -119,22 +125,28 @@ $(BASEDIR)/common/symbols-dummy.o:
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C $(BASEDIR)/common symbols-dummy.o
 
 $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
 		| $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort \
 		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort --warn-dup \
 		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
+note.o: $(TARGET)-syms
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id  $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		--rename-section=.data=.note.gnu.build-id -S $@.bin $@
+	rm -f $@.bin
+
 EFI_LDFLAGS = $(patsubst -m%,-mi386pep,$(LDFLAGS)) --subsystem=10
 EFI_LDFLAGS += --image-base=$(1) --stack=0,0 --heap=0,0 --strip-debug
 EFI_LDFLAGS += --section-alignment=0x200000 --file-alignment=0x20
@@ -147,6 +159,13 @@ $(TARGET).efi: VIRT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A VIR
 $(TARGET).efi: ALT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A ALT_START$$,,p')
 # Don't use $(wildcard ...) here - at least make 3.80 expands this too early!
 $(TARGET).efi: guard = $(if $(shell echo efi/dis* | grep disabled),:)
+ifneq ($(build_id_linker),)
+$(TARGET).efi: note.o
+note_file := note.o
+else
+note_file :=
+endif
+
 $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbols-dummy.o efi/mkreloc
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \
@@ -163,7 +182,7 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol
 		| $(guard) $(BASEDIR)/tools/symbols $(all_symbols) --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
-	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
+	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(note_file) -o $@
 	if $(guard) false; then rm -f $@; echo 'EFI support disabled'; fi
 	rm -f $(@D)/.$(@F).[0-9]*
 
@@ -195,4 +214,5 @@ clean::
 	rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
 	rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.o efi/.*.d efi/*.efi efi/disabled efi/mkreloc
 	rm -f boot/reloc.S boot/reloc.lnk boot/reloc.bin
+	rm -f note.o
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C test clean
diff --git a/xen/arch/x86/boot/mkelf32.c b/xen/arch/x86/boot/mkelf32.c
index 993a7ee..8c51990 100644
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -45,9 +45,9 @@ static Elf32_Ehdr out_ehdr = {
     0,                                       /* e_flags */
     sizeof(Elf32_Ehdr),                      /* e_ehsize */
     sizeof(Elf32_Phdr),                      /* e_phentsize */
-    1,                                       /* e_phnum */
+    1,  /* modify based on num_phdrs */      /* e_phnum */
     sizeof(Elf32_Shdr),                      /* e_shentsize */
-    3,                                       /* e_shnum */
+    3,  /* modify based on num_phdrs */      /* e_shnum */
     2                                        /* e_shstrndx */
 };
 
@@ -61,8 +61,20 @@ static Elf32_Phdr out_phdr = {
     PF_R|PF_W|PF_X,                          /* p_flags */
     64                                       /* p_align */
 };
+static Elf32_Phdr note_phdr = {
+    PT_NOTE,                                 /* p_type */
+    DYNAMICALLY_FILLED,                      /* p_offset */
+    DYNAMICALLY_FILLED,                      /* p_vaddr */
+    DYNAMICALLY_FILLED,                      /* p_paddr */
+    DYNAMICALLY_FILLED,                      /* p_filesz */
+    DYNAMICALLY_FILLED,                      /* p_memsz */
+    PF_R,                                    /* p_flags */
+    4                                        /* p_align */
+};
 
 static u8 out_shstrtab[] = "\0.text\0.shstrtab";
+/* If num_phdrs >= 2, we need to tack on the .note string. */
+static u8 out_shstrtab_extra[] = ".note\0";
 
 static Elf32_Shdr out_shdr[] = {
     { 0 },
@@ -90,6 +102,23 @@ static Elf32_Shdr out_shdr[] = {
     }
 };
 
+/*
+ * sh_name 17 is the offset of '.note' in the string table formed by
+ * out_shstrtab followed by out_shstrtab_extra in the file.
+ */
+static Elf32_Shdr out_shdr_note = {
+      17,                                    /* sh_name */
+      SHT_NOTE,                              /* sh_type */
+      0,                                     /* sh_flags */
+      DYNAMICALLY_FILLED,                    /* sh_addr */
+      DYNAMICALLY_FILLED,                    /* sh_offset */
+      DYNAMICALLY_FILLED,                    /* sh_size */
+      0,                                     /* sh_link */
+      0,                                     /* sh_info */
+      4,                                     /* sh_addralign */
+      0                                      /* sh_entsize */
+};
+
 /* Some system header files define these macros and pollute our namespace. */
 #undef swap16
 #undef swap32
@@ -228,28 +257,34 @@ static void do_read(int fd, void *data, int len)
 int main(int argc, char **argv)
 {
     u64        final_exec_addr;
-    u32        loadbase, dat_siz, mem_siz;
+    u32        loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
     char      *inimage, *outimage;
     int        infd, outfd;
     char       buffer[1024];
-    int        bytes, todo, i;
+    int        bytes, todo, i = 1;
+    int        num_phdrs = 1;
 
     Elf32_Ehdr in32_ehdr;
 
     Elf64_Ehdr in64_ehdr;
     Elf64_Phdr in64_phdr;
 
-    if ( argc != 5 )
+    if ( argc < 5 )
     {
-        fprintf(stderr, "Usage: mkelf32 <in-image> <out-image> "
+        fprintf(stderr, "Usage: mkelf32 [--notes] <in-image> <out-image> "
                 "<load-base> <final-exec-addr>\n");
         return 1;
     }
 
-    inimage  = argv[1];
-    outimage = argv[2];
-    loadbase = strtoul(argv[3], NULL, 16);
-    final_exec_addr = strtoull(argv[4], NULL, 16);
+    if ( !strcmp(argv[1], "--notes") )
+    {
+        i = 2;
+        num_phdrs = 2;
+    }
+    inimage  = argv[i++];
+    outimage = argv[i++];
+    loadbase = strtoul(argv[i++], NULL, 16);
+    final_exec_addr = strtoull(argv[i++], NULL, 16);
 
     infd = open(inimage, O_RDONLY);
     if ( infd == -1 )
@@ -285,11 +320,10 @@ int main(int argc, char **argv)
                 (int)in64_ehdr.e_phentsize, (int)sizeof(in64_phdr));
         return 1;
     }
-
-    if ( in64_ehdr.e_phnum != 1 )
+    if ( in64_ehdr.e_phnum != num_phdrs )
     {
-        fprintf(stderr, "Expect precisly 1 program header; found %d.\n",
-                (int)in64_ehdr.e_phnum);
+        fprintf(stderr, "Expected precisely %d program header(s); found %d.\n",
+                num_phdrs, (int)in64_ehdr.e_phnum);
         return 1;
     }
 
@@ -304,6 +338,32 @@ int main(int argc, char **argv)
     /*mem_siz = (u32)in64_phdr.p_memsz;*/
     mem_siz = (u32)(final_exec_addr - in64_phdr.p_vaddr);
 
+    note_sz = note_base = offset = 0;
+    if ( num_phdrs > 1 )
+    {
+        offset = in64_phdr.p_offset;
+        note_base = in64_phdr.p_vaddr;
+
+        (void)lseek(infd, in64_ehdr.e_phoff+sizeof(in64_phdr), SEEK_SET);
+        do_read(infd, &in64_phdr, sizeof(in64_phdr));
+        endianadjust_phdr64(&in64_phdr);
+
+        (void)lseek(infd, offset, SEEK_SET);
+
+        note_sz = in64_phdr.p_memsz;
+        note_base = in64_phdr.p_vaddr - note_base;
+
+        if ( in64_phdr.p_offset > dat_siz || offset > in64_phdr.p_offset )
+        {
+            fprintf(stderr, "Expected .note section within .text section!\n" \
+                    "Offset %llu not within %u!\n",
+                    (unsigned long long)in64_phdr.p_offset, dat_siz);
+            return 1;
+        }
+        /* Gets us the absolute offset within the .text section. */
+        offset = in64_phdr.p_offset - offset;
+    }
+
     /*
      * End the image on a page boundary. This gets round alignment bugs
      * in the boot- or chain-loader (e.g., kexec on the XenoBoot CD).
@@ -322,6 +382,31 @@ int main(int argc, char **argv)
     out_shdr[1].sh_size   = dat_siz;
     out_shdr[2].sh_offset = RAW_OFFSET + dat_siz + sizeof(out_shdr);
 
+    if ( num_phdrs > 1 )
+    {
+        /* We have two of them! */
+        out_ehdr.e_phnum = num_phdrs;
+        /* Extra .note section. */
+        out_ehdr.e_shnum++;
+
+        /* Fill out the PT_NOTE program header. */
+        note_phdr.p_vaddr   = note_base;
+        note_phdr.p_paddr   = note_base;
+        note_phdr.p_filesz  = note_sz;
+        note_phdr.p_memsz   = note_sz;
+        note_phdr.p_offset  = offset;
+
+        /* Tack on the .note\0 */
+        out_shdr[2].sh_size += sizeof(out_shstrtab_extra);
+        /* And move it past the .note section. */
+        out_shdr[2].sh_offset += sizeof(out_shdr_note);
+
+        /* Fill out the .note section. */
+        out_shdr_note.sh_size = note_sz;
+        out_shdr_note.sh_addr = note_base;
+        out_shdr_note.sh_offset = RAW_OFFSET + offset;
+    }
+
     outfd = open(outimage, O_WRONLY|O_CREAT|O_TRUNC, 0775);
     if ( outfd == -1 )
     {
@@ -335,8 +420,14 @@ int main(int argc, char **argv)
 
     endianadjust_phdr32(&out_phdr);
     do_write(outfd, &out_phdr, sizeof(out_phdr));
-    
-    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr)) < 0 )
+
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_phdr32(&note_phdr);
+        do_write(outfd, &note_phdr, sizeof(note_phdr));
+    }
+
+    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - (num_phdrs * sizeof(out_phdr)) ) < 0 )
     {
         fprintf(stderr, "Header overflow.\n");
         return 1;
@@ -355,9 +446,22 @@ int main(int argc, char **argv)
         endianadjust_shdr32(&out_shdr[i]);
     do_write(outfd, &out_shdr[0], sizeof(out_shdr));
 
-    do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
-    do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
-
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_shdr32(&out_shdr_note);
+        /* Append the .note section. */
+        do_write(outfd, &out_shdr_note, sizeof(out_shdr_note));
+        /* The normal strings - .text\0.. */
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        /* Our .note */
+        do_write(outfd, out_shstrtab_extra, sizeof(out_shstrtab_extra));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+sizeof(out_shstrtab_extra)+dat_siz)&3));
+    }
+    else
+    {
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
+    }
     close(infd);
     close(outfd);
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 5eb825e..b14bcd2 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -31,6 +31,9 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
+#if defined(BUILD_ID) && !defined(EFI)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -79,6 +82,16 @@ SECTIONS
        *(.rodata)
        *(.rodata.*)
 
+#if defined(BUILD_ID) && defined(EFI)
+/*
+ * There is no mechanism to put a PT_NOTE in the EFI file, so put
+ * it in the .rodata section (note.o supplies us with .note.gnu.build-id).
+ */
+       . = ALIGN(4);
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+#endif
        . = ALIGN(8);
        /* Exception table */
        __start___ex_table = .;
@@ -96,9 +109,24 @@ SECTIONS
        *(.lockprofile.data)
        __lock_profile_end = .;
 #endif
-       _erodata = .;
   } :text
 
+#if defined(BUILD_ID) && !defined(EFI)
+/*
+ * What a strange section name. The reason is that on ELF builds this section
+ * is extracted to note.o (which is then ingested into the EFI file). But the
+ * compiler may want to inject other things in the .note which we don't care
+ * about - hence this unique name.
+ */
+  . = ALIGN(4);
+  .note.gnu.build-id : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+  } :note :text
+#endif
+  _erodata = .;
+
 #ifdef EFI
   . = ALIGN(MB(2));
 #else
diff --git a/xen/common/version.c b/xen/common/version.c
index fc9bf42..30578a6 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -1,6 +1,13 @@
 #include <xen/compile.h>
+#include <xen/init.h>
+#include <xen/errno.h>
+#include <xen/string.h>
+#include <xen/types.h>
+#include <xen/elf.h>
 #include <xen/version.h>
 
+#include <asm/cache.h>
+
 const char *xen_compile_date(void)
 {
     return XEN_COMPILE_DATE;
@@ -61,6 +68,51 @@ const char *xen_deny(void)
     return "<denied>";
 }
 
+static const void *build_id_p __read_mostly;
+static unsigned int build_id_len __read_mostly;
+
+int xen_build_id(const void **p, unsigned int *len)
+{
+    if ( !build_id_len )
+        return -ENODATA;
+
+    *len = build_id_len;
+    *p = build_id_p;
+
+    return 0;
+}
+
+#ifdef BUILD_ID
+/* Defined in linker script. */
+extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
+
+static int __init xen_build_init(void)
+{
+    const Elf_Note *n = __note_gnu_build_id_start;
+
+    /* --build-id invoked with wrong parameters. */
+    if ( __note_gnu_build_id_end <= &n[0] )
+        return -ENODATA;
+
+    /* Check for full Note header. */
+    if ( &n[1] > __note_gnu_build_id_end )
+        return -ENODATA;
+
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    build_id_len = n->descsz;
+    build_id_p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+__initcall(xen_build_init);
+#endif
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index ab9b1ea..615eb06 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -40,6 +40,9 @@ typedef uint32_t	Elf64_Word;
 typedef int64_t		Elf64_Sxword;
 typedef uint64_t	Elf64_Xword;
 
+/* Unique build id string format when using --build-id. */
+#define NT_GNU_BUILD_ID 3
+
 /*
  * e_ident[] identification indexes
  * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html 
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 2015c0b..400160f 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -13,5 +13,6 @@ const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
 const char *xen_deny(void);
+int xen_build_id(const void **p, unsigned int *len);
 
 #endif /* __XEN_VERSION_H__ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* [PATCH v9 21/27] xsplice: Print build_id in keyhandler and on bootup.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (19 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 20/27] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:35 ` [PATCH v9 22/27] XENVER_build_id/libxc: Provide ld-embedded build-id Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

Print the build-id at boot and in the 'x' keyhandler output, as it
is a useful debug mechanism.
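Xen's printk supports a Linux-style %*phN conversion, which prints len bytes from a buffer as contiguous hex with no separators. Plain C printf has no equivalent, so here is a sketch of the same formatting, assuming lowercase output as in Linux. hex_encode() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Format len bytes as contiguous lowercase hex, roughly what
 * printk("%*phN", len, buf) prints. out must hold 2 * len + 1 bytes.
 */
static char *hex_encode(const unsigned char *buf, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(buf != NULL || len == 0);
    for ( i = 0; i < len; i++ )
    {
        out[2 * i]     = digits[buf[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[buf[i] & 0xf];   /* low nibble */
    }
    out[2 * len] = '\0';
    return out;
}
```

For example, the bytes 00 ab 7f ff are rendered as "00ab7fff".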

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: s/char */const void *
v5: s/ssize_t/unsigned int/
v6: Remove pointless initializers, use string literal instead of %s,
    add Jan's Ack.
v7: Add Andrew's Reviewed-by
---
---
 xen/common/xsplice.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 05064ae..934dd22 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -16,6 +16,7 @@
 #include <xen/spinlock.h>
 #include <xen/string.h>
 #include <xen/symbols.h>
+#include <xen/version.h>
 #include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
@@ -1363,10 +1364,15 @@ static const char *state2str(uint32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    const void *binary_id = NULL;
+    unsigned int len = 0;
     unsigned int i;
 
     printk("'%c' pressed - Dumping all xsplice patches\n", key);
 
+    if ( !xen_build_id(&binary_id, &len) )
+        printk("build-id: %*phN\n", len, binary_id);
+
     if ( !spin_trylock(&payload_lock) )
     {
         printk("Lock held. Try again.\n");
@@ -1403,6 +1409,12 @@ static void xsplice_printall(unsigned char key)
 
 static int __init xsplice_init(void)
 {
+    const void *binary_id;
+    unsigned int len;
+
+    if ( !xen_build_id(&binary_id, &len) )
+        printk(XENLOG_INFO XSPLICE ": build-id: %*phN\n", len, binary_id);
+
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
 
     arch_xsplice_init();
-- 
2.5.0


* [PATCH v9 22/27] XENVER_build_id/libxc: Provide ld-embedded build-id
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (20 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 21/27] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:35 ` [PATCH v9 23/27] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Wei Liu, Daniel De Graaf, Ian Jackson, Konrad Rzeszutek Wilk

If the hypervisor was built with a build-id, we can expose its
value to the toolstack (if it was built without one, the call
returns -ENODATA). This is a privileged operation, so only the
controlling toolstack can request it.
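The calling convention in the public header lends itself to a probe-then-fetch pattern: a NULL handle returns the size, otherwise len must be non-zero and large enough, and the return value is the number of bytes copied. Below is a sketch with a mock standing in for the hypercall. mock_xenver_build_id(), fetch_build_id() and the build_id_t copy of the struct are hypothetical, and the -EPERM/-ENODATA paths are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct layout (C99 flexible array member form). */
typedef struct {
    uint32_t      len;    /* IN: size of buf[] */
    unsigned char buf[];  /* OUT: build-id bytes */
} build_id_t;

/* Hypothetical build-id reported by the mock (a SHA-1 is 20 bytes). */
static const unsigned char fake_id[20] = { 0xde, 0xad, 0xbe, 0xef };

/* Mock of the hypervisor side, following the XENVER_build_id handler. */
static int mock_xenver_build_id(build_id_t *arg)
{
    if ( arg == NULL )
        return (int)sizeof(fake_id);      /* size query */
    if ( arg->len == 0 )
        return -EINVAL;
    if ( sizeof(fake_id) > arg->len )
        return -ENOBUFS;
    memcpy(arg->buf, fake_id, sizeof(fake_id));
    return (int)sizeof(fake_id);          /* bytes written */
}

/* Caller side: probe for the size, then allocate and fetch. */
static build_id_t *fetch_build_id(int *rc)
{
    build_id_t *b;
    int sz = mock_xenver_build_id(NULL);

    if ( sz <= 0 )
    {
        *rc = sz ? sz : -ENODATA;
        return NULL;
    }
    b = malloc(sizeof(*b) + sz);
    if ( !b )
    {
        *rc = -ENOMEM;
        return NULL;
    }
    b->len = (uint32_t)sz;
    *rc = mock_xenver_build_id(b);
    if ( *rc < 0 )
    {
        free(b);
        return NULL;
    }
    assert(*rc == sz);   /* the build-id cannot change between the calls */
    return b;
}
```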

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
CC: Daniel De Graaf <dgdegra@tycho.nsa.gov>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v1: Rebase it on Martin's initial patch
v2: Move it to XENVER hypercall
v3: Don't use the third argument for length.
   - Use new structure for XENVER_build_id with variable buf.
v8: Resurrected from v3!
v9: Added Acks from Wei, Daniel and Jan
    Removed pointless initializers.
---
---
 tools/flask/policy/policy/modules/xen/xen.te |  1 +
 tools/libxc/xc_private.c                     |  7 ++++++
 tools/libxc/xc_private.h                     | 11 +++++++++
 xen/common/kernel.c                          | 36 ++++++++++++++++++++++++++++
 xen/include/public/version.h                 | 18 +++++++++++++-
 xen/xsm/flask/hooks.c                        |  3 +++
 xen/xsm/flask/policy/access_vectors          |  2 ++
 7 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index daa1315..bef33b0 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -82,6 +82,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle xen_commandline
+    xen_build_id
 };
 
 allow dom0_t xen_t:mmu memorymap;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index c41e433..d57c39a 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -495,6 +495,13 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
     case XENVER_commandline:
         sz = sizeof(xen_commandline_t);
         break;
+    case XENVER_build_id:
+        {
+            xen_build_id_t *build_id = (xen_build_id_t *)arg;
+            sz = sizeof(*build_id) + build_id->len;
+            HYPERCALL_BOUNCE_SET_DIR(arg, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+            break;
+        }
     default:
         ERROR("xc_version: unknown command %d\n", cmd);
         return -EINVAL;
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index aa8daf1..75b761c 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -197,6 +197,17 @@ enum {
 #define HYPERCALL_BOUNCE_SET_SIZE(_buf, _sz) do { (HYPERCALL_BUFFER(_buf))->sz = _sz; } while (0)
 
 /*
+ * Change the direction.
+ *
+ * Can only be used if the bounce_pre/bounce_post commands have
+ * not been used.
+ */
+#define HYPERCALL_BOUNCE_SET_DIR(_buf, _dir) do {                                            \
+                                                   assert(!(HYPERCALL_BUFFER(_buf))->hbuf);  \
+                                                   (HYPERCALL_BUFFER(_buf))->dir = _dir;      \
+                                                } while (0)
+
+/*
  * Initialise and free hypercall safe memory. Takes care of any required
  * copying.
  */
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index a4a3c36..1a6823a 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -376,6 +376,42 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         return 0;
     }
+
+    case XENVER_build_id:
+    {
+        xen_build_id_t build_id;
+        unsigned int sz;
+        int rc;
+        const void *p;
+
+        if ( deny )
+            return -EPERM;
+
+        /* Only return size. */
+        if ( !guest_handle_is_null(arg) )
+        {
+            if ( copy_from_guest(&build_id, arg, 1) )
+                return -EFAULT;
+
+            if ( build_id.len == 0 )
+                return -EINVAL;
+        }
+
+        rc = xen_build_id(&p, &sz);
+        if ( rc )
+            return rc;
+
+        if ( guest_handle_is_null(arg) )
+            return sz;
+
+        if ( sz > build_id.len )
+            return -ENOBUFS;
+
+        if ( copy_to_guest_offset(arg, offsetof(xen_build_id_t, buf), p, sz) )
+            return -EFAULT;
+
+        return sz;
+    }
     }
 
     return -ENOSYS;
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..cb84565 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,8 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/* NB. All ops return zero on success, except for
+ * XENVER_{version,pagesize,build_id}. */
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +88,21 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+/*
+ * Return value is the number of bytes written, or XEN_Exx on error.
+ * Calling with a NULL handle returns the size of the build-id.
+ */
+#define XENVER_build_id 10
+struct xen_build_id {
+        uint32_t        len; /* IN: size of buf[]. */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+        unsigned char   buf[];
+#elif defined(__GNUC__)
+        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
+#endif
+};
+typedef struct xen_build_id xen_build_id_t;
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index c2df48f..7477dbe 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1663,6 +1663,9 @@ static int flask_xen_version (uint32_t op)
     case XENVER_commandline:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_COMMANDLINE, NULL);
+    case XENVER_build_id:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_BUILD_ID, NULL);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index e9ab149..4d1b548 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -525,4 +525,6 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+# Xen build id
+    xen_build_id
 }
-- 
2.5.0


* [PATCH v9 23/27] libxl: info: Display build_id of the hypervisor.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (21 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 22/27] XENVER_build_id/libxc: Provide ld-embedded build-id Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:35 ` [PATCH v9 24/27] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Wei Liu, Ian Jackson, Konrad Rzeszutek Wilk

If the hypervisor was built with a build-id, display it.
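libxl does not probe for the size first. It passes a buffer sized to its on-stack union and, on -ENOBUFS, retries once with a page-sized buffer. The following is a reduced sketch of that retry, with a mock call in place of xc_version(). The names, the sizes and the 20-byte mock build-id are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the struct layout (C99 flexible array member form). */
typedef struct {
    uint32_t      len;    /* IN: size of buf[] */
    unsigned char buf[];  /* OUT: build-id bytes */
} build_id_t;

/* Size of the build-id the mock reports (hypothetical; SHA-1 is 20). */
static size_t id_size = 20;

/* Hypothetical stand-in for xc_version(xch, XENVER_build_id, b). */
static int mock_build_id_call(build_id_t *b)
{
    if ( b->len < id_size )
        return -ENOBUFS;
    memset(b->buf, 0x5a, id_size);
    return (int)id_size;
}

/*
 * Try a buffer of first_sz bytes (header included); on -ENOBUFS retry once
 * with retry_sz bytes. On success *out owns a calloc'd buffer and the
 * build-id length is returned; otherwise a negative errno is returned.
 */
static int get_build_id(size_t first_sz, size_t retry_sz, build_id_t **out)
{
    size_t sizes[2] = { first_sz, retry_sz };
    int i, r = -ENOBUFS;

    for ( i = 0; i < 2 && r == -ENOBUFS; i++ )
    {
        build_id_t *b;

        assert(sizes[i] > sizeof(*b));
        b = calloc(1, sizes[i]);
        if ( !b )
            return -ENOMEM;
        b->len = (uint32_t)(sizes[i] - sizeof(*b));
        r = mock_build_id_call(b);
        if ( r > 0 )
        {
            *out = b;
            return r;
        }
        free(b);
    }
    return r;
}
```

Retrying only once keeps the code simple. A build-id that does not fit in a page is treated as an error rather than triggering further reallocation.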

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>

---
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2: Include HAVE_*, use libxl_zalloc, s/rc/ret/
v3: Retry with different size if 1020 is not enough.
v4: Use VERSION_OP subops instead of the XENVER_ subops
v5: Change it per Wei's review. s/VERSION_OP/VERSION/
    And actually use the proper Style!
v8: VERSION_OP was reverted, resurrect v3 version.
v9: Made the if (r) LOGEV adhere to StyleGuide
---
 tools/libxl/libxl.c         | 44 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |  6 ++++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  1 +
 4 files changed, 52 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index eec899d..c39d745 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5353,6 +5353,38 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
+static int libxl__xc_version_wrap(libxl__gc *gc, libxl_version_info *info,
+                                  xen_build_id_t *build)
+{
+    int r;
+
+    r = xc_version(CTX->xch, XENVER_build_id, build);
+    switch (r) {
+    case -EPERM:
+    case -ENODATA:
+    case 0:
+        info->build_id = libxl__strdup(NOGC, "");
+        break;
+
+    case -ENOBUFS:
+        break;
+
+    default:
+        if (r > 0) {
+            unsigned int i;
+
+            info->build_id = libxl__zalloc(NOGC, (r * 2) + 1);
+
+            for (i = 0; i < r; i++)
+                snprintf(&info->build_id[i * 2], 3, "%02hhx", build->buf[i]);
+
+            r = 0;
+        }
+        break;
+    }
+    return r;
+}
+
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     GC_INIT(ctx);
@@ -5363,8 +5395,10 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
         xen_capabilities_info_t xen_caps;
         xen_platform_parameters_t p_parms;
         xen_commandline_t xen_commandline;
+        xen_build_id_t build_id;
     } u;
     long xen_version;
+    int r;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
@@ -5397,6 +5431,16 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
     xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
     info->commandline = libxl__strdup(NOGC, u.xen_commandline);
 
+    u.build_id.len = sizeof(u) - sizeof(u.build_id);
+    r = libxl__xc_version_wrap(gc, info, &u.build_id);
+    if (r == -ENOBUFS) {
+        xen_build_id_t *build_id;
+
+        build_id = libxl__zalloc(gc, info->pagesize);
+        build_id->len = info->pagesize - sizeof(*build_id);
+        r = libxl__xc_version_wrap(gc, info, build_id);
+        if (r) LOGEV(ERROR, r, "getting build_id");
+    }
  out:
     GC_FREE;
     return info;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 8ff5f31..2c0f868 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -247,6 +247,12 @@
 #define LIBXL_HAVE_APIC_ASSIST 1
 
 /*
+ * LIBXL_HAVE_BUILD_ID means that libxl_version_info has the extra
+ * field for the hypervisor build_id.
+ */
+#define LIBXL_HAVE_BUILD_ID 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index c3161f3..9840f3b 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -365,6 +365,7 @@ libxl_version_info = Struct("version_info", [
     ("virt_start",        uint64),
     ("pagesize",          integer),
     ("commandline",       string),
+    ("build_id",          string),
     ], dir=DIR_OUT)
 
 libxl_domain_create_info = Struct("domain_create_info",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 6346017..ac7d759 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5920,6 +5920,7 @@ static void output_xeninfo(void)
     printf("cc_compile_by          : %s\n", info->compile_by);
     printf("cc_compile_domain      : %s\n", info->compile_domain);
     printf("cc_compile_date        : %s\n", info->compile_date);
+    printf("build_id               : %s\n", info->build_id);
 
     return;
 }
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v9 24/27] xsplice: Stacking build-id dependency checking.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (22 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 23/27] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-27  9:27   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 25/27] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

We now expect the ELF payloads to be built with
--build-id.

Also, the .xsplice.depends section must contain the
build-id of the hypervisor (or of a preceding payload).
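As a rough illustration of the section requirement (the xsplice.markdown change in this patch says `.xsplice.depends` and `.note.gnu.build-id` MUST each be present, alongside `.xsplice.funcs`), the sketch below checks a list of section names for exactly one occurrence of each. It is a stand-alone approximation. It assumes the section names have already been extracted into an array and that duplicates should be rejected as well. `check_required_sections` is a hypothetical name, not the hypervisor's check_special_sections().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Section names as #defined in xen/include/xen/xsplice.h by this patch. */
static const char *const required[] = {
    ".xsplice.funcs", ".xsplice.depends", ".note.gnu.build-id",
};

/*
 * Hypothetical check: `names` lists the section names found in a payload.
 * Returns 0 if every required section occurs exactly once, else -EINVAL.
 */
int check_required_sections(const char *const *names, unsigned int nr)
{
    unsigned int r, i;

    assert(names != NULL || nr == 0);
    for (r = 0; r < sizeof(required) / sizeof(required[0]); r++) {
        unsigned int seen = 0;

        for (i = 0; i < nr; i++)
            if (strcmp(names[i], required[r]) == 0)
                seen++;

        /* Missing (0) or duplicated (>1) are both rejected. */
        if (seen != 1)
            return -EINVAL;
    }
    return 0;
}
```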

We already have code to verify the Elf_Note build-id,
so we export parts of it.
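For reference, the validation that xen_build_id_check() performs can be approximated in user space. The sketch below assumes the standard ELF note layout: a 12-byte header of `namesz`, `descsz` and `type`, followed by the name and then the description. It only accepts a `GNU` note of type NT_GNU_BUILD_ID, with every length checked against the buffer size. `parse_build_id_note` and `struct note_hdr` are hypothetical names. With `--build-id=sha1` the description is 20 bytes long.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Value of NT_GNU_BUILD_ID in <elf.h>; redefined to stay self-contained. */
#define SKETCH_NT_GNU_BUILD_ID 3

/* ELF note header; the name and then the description follow it. */
struct note_hdr {
    uint32_t namesz;   /* name length including NUL: "GNU\0" is 4 */
    uint32_t descsz;   /* length of the build-id bytes */
    uint32_t type;     /* NT_GNU_BUILD_ID for an ld-generated build-id */
};

/*
 * Validate a GNU build-id note of `sz` bytes at `buf`. On success, point
 * *desc at the raw build-id and store its length in *len.
 */
int parse_build_id_note(const void *buf, size_t sz,
                        const unsigned char **desc, uint32_t *len)
{
    const unsigned char *p = buf;
    struct note_hdr n;

    assert(desc != NULL && len != NULL);
    /* The header plus the 4-byte "GNU\0" name must fit. */
    if (sz < sizeof(n) + 4)
        return -EINVAL;
    memcpy(&n, p, sizeof(n));          /* avoids an unaligned read */

    if (n.type != SKETCH_NT_GNU_BUILD_ID)
        return -ENODATA;
    if (n.namesz != 4)
        return -EINVAL;
    if (memcmp(p + sizeof(n), "GNU", 4) != 0)   /* compares "GNU\0" */
        return -ENODATA;
    /* The description must be non-empty and lie entirely inside buf. */
    if (n.descsz == 0 || n.descsz > sz - sizeof(n) - 4)
        return -EINVAL;

    /* namesz is 4, already a multiple of 4, so no padding precedes desc. */
    *desc = p + sizeof(n) + 4;
    *len = n.descsz;
    return 0;
}
```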

This dependency means the hypervisor MUST be compiled with
--build-id, so we gate the build of xSplice on the availability
of that linker functionality.

This does not affect the order in which the payloads can
be loaded, but it does enforce a STRICT ordering when the
payloads are applied. REPLACE is special: we need to
check its dependency against the hypervisor, not against
the last applied patch.

To make this easier to test, we also add an extra test-case
which can only be applied on top of the
xen_hello_world payload.

That is, one can apply xen_hello_world and then xen_bye_world
on top of it, but not the other way around.

We also print the dependency and payload build-ids in the keyhandler.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v3: First time included.
v4: Andrew fix against the build_id.o mutilations.
    Andrew fix to not include extra symbols in binary.id
v5: s/ssize_t/unsigned int/
v6: s/an NT_GNU../a NT_GNU/
   - Squash "xsplice: Print dependency and payloads build_id in the keyhandler"
     in this patch.
   - Add in xen_build_id_check size of section for better checking.
v7: Added Andrew's reviewed-by.
    Change the .name in test-case to adhere to spec.
    Dropped NT_GNU_BUILD_ID and moved that to earlier patch
    (build_id: Provide ld-embedded build-ids)
    Amended spec and code to only have one of .xsplice.depends and
    .note.gnu.build-id
    Expanded comment about note.o and why we don't use arch/x86/note.o.bin
    Moved xen_build_id_check definition to xsplice.h from version.h
    (and dropping the #include's in version.h)
    Sort header files in tests.
v8:
 - Change two of the dprintk calls from XENLOG_DEBUG to XENLOG_ERR
v9:
 - Dropped the (unsigned long) casts since we use void.
 - Make the .xsplice.depends and .note.gnu.build-id names be #defines.
 - Make the build section use $(XSPLICE_BYE)
 - Make the testcase include <public/sysctl.h>
 - Made comparisons on descsz and namesz a bit different (overflow
   checks, against value of 4, and against size)
---
 .gitignore                             |   1 +
 Config.mk                              |   1 +
 docs/misc/xsplice.markdown             |  99 +++++++++++++++++----------
 xen/arch/x86/test/Makefile             |  49 ++++++++++++--
 xen/arch/x86/test/xen_bye_world.c      |  34 ++++++++++
 xen/arch/x86/test/xen_bye_world_func.c |  22 ++++++
 xen/common/Kconfig                     |   6 +-
 xen/common/version.c                   |  45 ++++++++++---
 xen/common/xsplice.c                   | 119 ++++++++++++++++++++++++++++++++-
 xen/include/xen/xsplice.h              |   4 ++
 10 files changed, 325 insertions(+), 55 deletions(-)
 create mode 100644 xen/arch/x86/test/xen_bye_world.c
 create mode 100644 xen/arch/x86/test/xen_bye_world_func.c

diff --git a/.gitignore b/.gitignore
index 4a81f43..88cec1d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -248,6 +248,7 @@ xen/arch/x86/efi/disabled
 xen/arch/x86/efi/mkreloc
 xen/arch/x86/test/config.h
 xen/arch/x86/test/xen_hello_world.xsplice
+xen/arch/x86/test/xen_bye_world.xsplice
 xen/arch/*/efi/boot.c
 xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
diff --git a/Config.mk b/Config.mk
index 41f8c44..614dc9e 100644
--- a/Config.mk
+++ b/Config.mk
@@ -134,6 +134,7 @@ ifeq ($(call ld-ver-build-id,$(LD)),n)
 build_id_linker :=
 else
 CFLAGS += -DBUILD_ID
+export XEN_HAS_BUILD_ID=y
 build_id_linker := --build-id=sha1
 endif
 
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 62f143e..377ed6a 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -283,8 +283,17 @@ The xSplice core code loads the payload as a standard ELF binary, relocates it
 and handles the architecture-specifc sections as needed. This process is much
 like what the Linux kernel module loader does.
 
-The payload contains a section (xsplice_patch_func) with an array of structures
-describing the functions to be patched:
+The payload contains at least three sections:
+
+ * `.xsplice.funcs` - an array of xsplice_patch_func structures.
+ * `.xsplice.depends` - an ELF Note that describes what the payload
+   depends on. **MUST** have one.
+ * `.note.gnu.build-id` - the build-id of this payload. **MUST** have one.
+
+### .xsplice.funcs
+
+The `.xsplice.funcs` section contains an array of xsplice_patch_func
+structures which describe the functions to be patched:
 
 <pre>
 struct xsplice_patch_func {  
@@ -368,6 +377,23 @@ struct xsplice_patch_func xsplice_hello_world = {
 
 Code must be compiled with -fPIC.
 
+### .xsplice.depends and .note.gnu.build-id
+
+To support dependency checking and safe loading (loading the
+appropriate payload against the right hypervisor), the payload
+must embed a build-id dependency.
+
+This is done by the payload containing a section `.xsplice.depends`
+which follows the format of an ELF Note. Its contents
+(name and description) are specific to the linker used to
+build the hypervisor and payload.
+
+If the GNU linker is used, the name is `GNU` and the note type
+is NT_GNU_BUILD_ID. The description can be a SHA1
+checksum, an MD5 checksum, or any other unique value.
+
+The size of these structures varies with the --build-id linker option.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
@@ -863,6 +889,42 @@ This is implemented in the Xen Project hypervisor.
 
 Only the privileged domain should be allowed to do this operation.
 
+### xSplice interdependencies
+
+Interdependencies between xSplice patches are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against. That allows
+   the hotpatches to be loaded independently (if they don't overlap), or,
+   if the old code also contains previously patched code, even if they
+   overlap.
+
+The disadvantage of the single large patch is that it can grow over
+time and does not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+Checking against the old code allows much more flexibility and provides an
+additional guard, but is more complex to implement.
+
+The second option, which requires a build-id of the hypervisor,
+is implemented in the Xen Project hypervisor.
+
+Specifically, each payload has two build-id ELF notes:
+ * The build-id of the payload itself (generated via --build-id).
+ * The build-id of the payload it depends on (extracted from the
+   previous payload or hypervisor at build time).
+
+This means that the very first payload depends on the hypervisor
+build-id.
 
 # Not Yet Done
 
@@ -882,13 +944,6 @@ The implementation must also have a mechanism for (in no particular order):
  * NOP out the code sequence if `new_size` is zero.
  * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64]
    in payload file.
- * An dependency mechanism for the payloads. To use that information to load:
-    - The appropiate payload. To verify that payload is built against the
-      hypervisor. This can be done via the `build-id`
-      or via providing an copy of the old code - so that the hypervisor can
-       verify it against the code in memory.
-    - To construct an appropiate order of payloads to load in case they
-      depend on each other.
 
 ### Handle inlined __LINE__
 
@@ -953,32 +1008,6 @@ the function itself.
 Similar considerations are true to a lesser extent for __FILE__, but it
 could be argued that file renaming should be done outside of hotpatches.
 
-### xSplice interdependencies
-
-xSplice patches interdependencies are tricky.
-
-There are the ways this can be addressed:
- * A single large patch that subsumes and replaces all previous ones.
-   Over the life-time of patching the hypervisor this large patch
-   grows to accumulate all the code changes.
- * Hotpatch stack - where an mechanism exists that loads the hotpatches
-   in the same order they were built in. We would need an build-id
-   of the hypevisor to make sure the hot-patches are build against the
-   correct build.
- * Payload containing the old code to check against that. That allows
-   the hotpatches to be loaded indepedently (if they don't overlap) - or
-   if the old code also containst previously patched code - even if they
-   overlap.
-
-The disadvantage of the first large patch is that it can grow over
-time and not provide an bisection mechanism to identify faulty patches.
-
-The hot-patch stack puts stricts requirements on the order of the patches
-being loaded and requires an hypervisor build-id to match against.
-
-The old code allows much more flexibility and an additional guard,
-but is more complex to implement.
-
 ## Signature checking requirements.
 
 The signature checking requires that the layout of the data in memory
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index af72aff..83d4c06 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -6,17 +6,20 @@ CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
 .PHONY: default
 
 XSPLICE := xen_hello_world.xsplice
+XSPLICE_BYE := xen_bye_world.xsplice
 
 default: xsplice
 
 install: xsplice
 	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	$(INSTALL_DATA) $(XSPLICE_BYE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 uninstall:
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 
 .PHONY: clean
 clean::
-	rm -f *.o .*.o.d $(XSPLICE) config.h
+	rm -f *.o .*.o.d $(XSPLICE) $(XSPLICE_BYE) config.h *.bin
 
 #
 # To compute these values we need the binary files: xen-syms
@@ -25,7 +28,7 @@ clean::
 .PHONY: config.h
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
-config.h: xen_hello_world_func.o
+config.h: xen_hello_world_func.o xen_bye_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
 	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
@@ -33,9 +36,43 @@ config.h: xen_hello_world_func.o
 xen_hello_world.o: xen_hello_world_func.o
 
 .PHONY: $(XSPLICE)
-$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
-	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o \
-		xen_hello_world.o
+$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o note.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
+		xen_hello_world_func.o xen_hello_world.o note.o
+#
+# This target is only accessible if CONFIG_XSPLICE is defined, which
+# depends on $(build_id_linker) being available. Hence we do not
+# need any checks.
+#
+# N.B. The reason we don't use arch/x86/note.o is that it may
+# not be built (it is only built for EFI builds), and that we do
+# not have the note.o.bin to muck with (as it gets deleted).
+#
+.PHONY: note.o
+note.o:
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
+#
+# Extract the build-id of the xen_hello_world.xsplice
+# (which xen_bye_world will depend on).
+#
+.PHONY: hello_world_note.o
+hello_world_note.o: $(XSPLICE)
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id $(XSPLICE) $@.bin
+	$(OBJCOPY)  -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
+xen_bye_world.o: xen_bye_world_func.o
+
+.PHONY: $(XSPLICE_BYE)
+$(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_world_note.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
+		xen_bye_world_func.o xen_bye_world.o hello_world_note.o
+
 
 .PHONY: xsplice
-xsplice: $(XSPLICE)
+xsplice: $(XSPLICE) $(XSPLICE_BYE)
diff --git a/xen/arch/x86/test/xen_bye_world.c b/xen/arch/x86/test/xen_bye_world.c
new file mode 100644
index 0000000..f93f969
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include "config.h"
+#include <xen/lib.h>
+#include <xen/types.h>
+#include <xen/version.h>
+#include <xen/xsplice.h>
+
+#include <public/sysctl.h>
+
+static char bye_world_patch_this_fnc[] = "xen_extra_version";
+extern const char *xen_bye_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_bye_world = {
+    .version = XSPLICE_PAYLOAD_VERSION,
+    .name = bye_world_patch_this_fnc,
+    .new_addr = xen_bye_world,
+    .old_addr = xen_extra_version,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_bye_world_func.c b/xen/arch/x86/test/xen_bye_world_func.c
new file mode 100644
index 0000000..32ef341
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world_func.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/types.h>
+
+/* Our replacement function for xen_hello_world. */
+const char *xen_bye_world(void)
+{
+    return "Bye World!";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index e4f86c2..91ea904 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -60,6 +60,10 @@ config HAS_GDBSX
 config HAS_IOPORTS
 	bool
 
+config HAS_BUILD_ID
+	string
+	option env="XEN_HAS_BUILD_ID"
+
 # Enable/Disable kexec support
 config KEXEC
 	bool "kexec support"
@@ -192,7 +196,7 @@ endmenu
 config XSPLICE
 	bool "xSplice live patching support (TECH PREVIEW)"
 	default n
-	depends on X86
+	depends on X86 && HAS_BUILD_ID = "y"
 	---help---
 	  Allows a running Xen hypervisor to be dynamically patched using
 	  binary patches without rebooting. This is primarily used to binarily
diff --git a/xen/common/version.c b/xen/common/version.c
index 30578a6..744fbe3 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -86,9 +86,41 @@ int xen_build_id(const void **p, unsigned int *len)
 /* Defined in linker script. */
 extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
 
+int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
+                       const void **p, unsigned int *len)
+{
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    if ( n_sz <= sizeof(*n) )
+        return -EINVAL;
+
+    if ( n->namesz > UINT_MAX - n->descsz )
+        return -EINVAL;
+
+    if ( n->namesz != 4 /* GNU\0 */ )
+        return -EINVAL;
+
+    if ( n->namesz + n->descsz + sizeof(*n) > n_sz )
+        return -EINVAL;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    if ( len )
+        *len = n->descsz;
+    if ( p )
+        *p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+
 static int __init xen_build_init(void)
 {
     const Elf_Note *n = __note_gnu_build_id_start;
+    size_t sz;
 
     /* --build-id invoked with wrong parameters. */
     if ( __note_gnu_build_id_end <= &n[0] )
@@ -98,18 +130,9 @@ static int __init xen_build_init(void)
     if ( &n[1] > __note_gnu_build_id_end )
         return -ENODATA;;
 
-    /* Check if we really have a build-id. */
-    if ( NT_GNU_BUILD_ID != n->type )
-        return -ENODATA;
+    sz = (size_t)__note_gnu_build_id_end - (size_t)n;
 
-    /* Sanity check, name should be "GNU" for ld-generated build-id. */
-    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
-        return -ENODATA;
-
-    build_id_len = n->descsz;
-    build_id_p = ELFNOTE_DESC(n);
-
-    return 0;
+    return xen_build_id_check(n, sz, &build_id_p, &build_id_len);
 }
 __initcall(xen_build_init);
 #endif
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 934dd22..a8b208d 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -4,6 +4,7 @@
  */
 
 #include <xen/cpu.h>
+#include <xen/elf.h>
 #include <xen/err.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
@@ -42,6 +43,12 @@ static LIST_HEAD(applied_list);
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
+/* The build-id (ELF Note description) and its length. */
+struct xsplice_build_id {
+    const void *p;
+    unsigned int len;
+};
+
 struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
@@ -61,6 +68,8 @@ struct payload {
     struct virtual_region region;        /* symbol, bug.frame patching and
                                             exception table (x86). */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
+    struct xsplice_build_id id;          /* ELFNOTE_DESC(.note.gnu.build-id) of the payload. */
+    struct xsplice_build_id dep;         /* ELFNOTE_DESC(.xsplice.depends). */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -410,8 +419,10 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
 static int check_special_sections(const struct xsplice_elf *elf)
 {
     unsigned int i;
-    static const char *const names[] = { ELF_XSPLICE_FUNC };
-    bool_t count[ARRAY_SIZE(names)] = { 0 };
+    static const char *const names[] = { ELF_XSPLICE_FUNC,
+                                         ELF_XSPLICE_DEPENDS,
+                                         ELF_BUILD_ID_NOTE};
+    bool_t count[ARRAY_SIZE(names)] = { 0, 0, 0 };
 
     for ( i = 0; i < ARRAY_SIZE(names); i++ )
     {
@@ -449,6 +460,7 @@ static int prepare_payload(struct payload *payload,
     unsigned int i;
     struct xsplice_patch_func *f;
     struct virtual_region *region;
+    const Elf_Note *n;
 
     sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
     ASSERT(sec);
@@ -505,6 +517,37 @@ static int prepare_payload(struct payload *payload,
         }
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ELF_BUILD_ID_NOTE);
+    if ( sec )
+    {
+        n = sec->load_addr;
+
+        if ( sec->sec->sh_size <= sizeof(*n) )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, sec->sec->sh_size,
+                                &payload->id.p, &payload->id.len) )
+            return -EINVAL;
+
+        if ( !payload->id.len || !payload->id.p )
+            return -EINVAL;
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_DEPENDS);
+    if ( sec )
+    {
+        n = sec->load_addr;
+
+        if ( sec->sec->sh_size <= sizeof(*n) )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, sec->sec->sh_size,
+                                &payload->dep.p, &payload->dep.len) )
+            return -EINVAL;
+
+        if ( !payload->dep.len || !payload->dep.p )
+            return -EINVAL;
+    }
+
     /* Setup the virtual region with proper data. */
     region = &payload->region;
 
@@ -1234,6 +1277,55 @@ void check_for_xsplice_work(void)
     }
 }
 
+/*
+ * Only allow a dependent payload to be applied on top of the correct
+ * build-id.
+ *
+ * This enforces a stacking order: the first payload MUST be against the
+ * hypervisor. The second against the first payload, and so on.
+ *
+ * Unless the 'internal' parameter is used - in which case we only
+ * check against the hypervisor.
+ */
+static int build_id_dep(struct payload *payload, bool_t internal)
+{
+    const void *id = NULL;
+    unsigned int len = 0;
+    int rc;
+    const char *name = "hypervisor";
+
+    ASSERT(payload->dep.len && payload->dep.p);
+
+    /* First payload (or REPLACE): check against the hypervisor. */
+    if ( internal )
+    {
+        rc = xen_build_id(&id, &len);
+        if ( rc )
+            return rc;
+    }
+    else
+    {
+        /* We should be against the last applied one. */
+        const struct payload *data;
+
+        data = list_last_entry(&applied_list, struct payload, applied_list);
+
+        id = data->id.p;
+        len = data->id.len;
+        name = data->name;
+    }
+
+    if ( payload->dep.len != len ||
+         memcmp(id, payload->dep.p, len) )
+    {
+        dprintk(XENLOG_ERR, "%s%s: check against %s build-id failed!\n",
+                XSPLICE, payload->name, name);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -1282,6 +1374,18 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
+            const struct payload *p;
+
+            p = list_last_entry(&applied_list, struct payload, applied_list);
+            ASSERT(p);
+            /* We should be the last applied one. */
+            if ( p != data )
+            {
+                dprintk(XENLOG_ERR, "%s%s: can't unload. Top is %s!\n",
+                        XSPLICE, data->name, p->name);
+                rc = -EBUSY;
+                break;
+            }
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1290,6 +1394,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_APPLY:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
+            rc = build_id_dep(data, !!list_empty(&applied_list));
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1298,6 +1405,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
+            rc = build_id_dep(data, 1 /* against hypervisor. */);
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1402,6 +1512,11 @@ static void xsplice_printall(unsigned char key)
                 }
             }
         }
+        if ( data->id.len )
+            printk("build-id=%*phN\n", data->id.len, data->id.p);
+
+        if ( data->dep.len )
+            printk("depend-on=%*phN\n", data->dep.len, data->dep.p);
     }
 
     spin_unlock(&payload_lock);
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index cbdbff1..005cf18 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -28,6 +28,8 @@ struct xen_sysctl_xsplice_op;
 #define XSPLICE             "xsplice: "
 /* ELF payload special section names. */
 #define ELF_XSPLICE_FUNC    ".xsplice.funcs"
+#define ELF_XSPLICE_DEPENDS ".xsplice.depends"
+#define ELF_BUILD_ID_NOTE   ".note.gnu.build-id"
 
 struct xsplice_symbol {
     const char *name;
@@ -40,6 +42,8 @@ int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
 void *xsplice_symbols_lookup_by_name(const char *symname);
 bool_t is_patch(const void *addr);
+int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
+                       const void **p, unsigned int *len);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
-- 
2.5.0



* [PATCH v9 25/27] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (23 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 24/27] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:35 ` [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

With this third payload one can do:

-bash-4.1# xen-xsplice load xen_hello_world.xsplice
Uploading xen_hello_world.xsplice (10148 bytes)
Performing check: completed
Performing apply:. completed

[xen_hello_world depends on hypervisor build-id]
-bash-4.1# xen-xsplice load xen_bye_world.xsplice
Uploading xen_bye_world.xsplice (7076 bytes)
Performing check: completed
Performing apply:. completed
[xen_bye_world depends on xen_hello_world build-id]
-bash-4.1# xen-xsplice upload xen_replace_world xen_replace_world.xsplice
Uploading xen_replace_world.xsplice (7148 bytes)
-bash-4.1# xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                         | APPLIED
xen_bye_world                           | APPLIED
xen_replace_world                       | CHECKED
-bash-4.1# xen-xsplice replace xen_replace_world
Performing replace:. completed
-bash-4.1# xl info | grep extra
xen_extra              : Hello Again World!
-bash-4.1# xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                         | CHECKED
xen_bye_world                           | CHECKED
xen_replace_world                       | APPLIED

which reverts both of the previous payloads and applies
xen_replace_world.

All the magic is in the Makefile: we extract
the build-id from the hypervisor (xen-syms) and embed it
in xen_replace_world as .xsplice.depends.

We also leave .old_addr as zero, forcing the hypervisor
to look up xen_extra_version.
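Leaving .old_addr as zero relies on the hypervisor resolving the address from the function name. The sketch below shows that fallback in simplified form. It assumes a flat array stands in for the symbol table; the real code uses the hypervisor's own symbol lookup. `resolve_old_addr` and the `sketch_*` types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the hypervisor symbol table. */
struct sketch_symbol {
    const char *name;
    void *addr;
};

/* Reduced, hypothetical view of xsplice_patch_func: only what is used here. */
struct sketch_patch_func {
    const char *name;
    void *old_addr;
};

/*
 * If the payload left old_addr as NULL, fill it in by looking the
 * function name up in `syms`. Returns 0 on success, -1 if unknown.
 */
int resolve_old_addr(struct sketch_patch_func *f,
                     const struct sketch_symbol *syms, size_t nr)
{
    size_t i;

    assert(f != NULL && f->name != NULL);
    if (f->old_addr != NULL)        /* address supplied by the payload */
        return 0;

    for (i = 0; i < nr; i++) {
        if (strcmp(syms[i].name, f->name) == 0) {
            f->old_addr = syms[i].addr;
            return 0;
        }
    }
    return -1;
}
```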

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v4: New. Make the objcopy use -S to strip the name.
v7: Added Andrew's Reviewed-by
v9: Drop (unsigned long) casts on new_addr and old_addr as they are
    void pointers.
    Make the build section use $(XSPLICE_REPLACE)
    Make the test-case include <public/sysctl.h>
    Don't set .old_addr - let the hypervisor look it up.
---
 .gitignore                                 |  1 +
 xen/arch/x86/test/Makefile                 | 14 +++++++++++--
 xen/arch/x86/test/xen_replace_world.c      | 32 ++++++++++++++++++++++++++++++
 xen/arch/x86/test/xen_replace_world_func.c | 22 ++++++++++++++++++++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/test/xen_replace_world.c
 create mode 100644 xen/arch/x86/test/xen_replace_world_func.c

diff --git a/.gitignore b/.gitignore
index 88cec1d..1b73293 100644
--- a/.gitignore
+++ b/.gitignore
@@ -249,6 +249,7 @@ xen/arch/x86/efi/mkreloc
 xen/arch/x86/test/config.h
 xen/arch/x86/test/xen_hello_world.xsplice
 xen/arch/x86/test/xen_bye_world.xsplice
+xen/arch/x86/test/xen_replace_world.xsplice
 xen/arch/*/efi/boot.c
 xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index 83d4c06..e7d75b9 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -7,15 +7,18 @@ CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
 
 XSPLICE := xen_hello_world.xsplice
 XSPLICE_BYE := xen_bye_world.xsplice
+XSPLICE_REPLACE := xen_replace_world.xsplice
 
 default: xsplice
 
 install: xsplice
 	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
 	$(INSTALL_DATA) $(XSPLICE_BYE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
+	$(INSTALL_DATA) $(XSPLICE_REPLACE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_REPLACE)
 uninstall:
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_REPLACE)
 
 .PHONY: clean
 clean::
@@ -28,7 +31,7 @@ clean::
 .PHONY: config.h
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
-config.h: xen_hello_world_func.o xen_bye_world_func.o
+config.h: xen_hello_world_func.o xen_bye_world_func.o xen_replace_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
 	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
@@ -73,6 +76,13 @@ $(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_w
 	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
 		xen_bye_world_func.o xen_bye_world.o hello_world_note.o
 
+xen_replace_world.o: xen_replace_world_func.o
+
+.PHONY: $(XSPLICE_REPLACE)
+$(XSPLICE_REPLACE): config.h xen_replace_world_func.o xen_replace_world.o note.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_REPLACE) \
+		 xen_replace_world_func.o xen_replace_world.o note.o
+
 
 .PHONY: xsplice
-xsplice: $(XSPLICE) $(XSPLICE_BYE)
+xsplice: $(XSPLICE) $(XSPLICE_BYE) $(XSPLICE_REPLACE)
diff --git a/xen/arch/x86/test/xen_replace_world.c b/xen/arch/x86/test/xen_replace_world.c
new file mode 100644
index 0000000..4862601
--- /dev/null
+++ b/xen/arch/x86/test/xen_replace_world.c
@@ -0,0 +1,32 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include "config.h"
+#include <xen/lib.h>
+#include <xen/types.h>
+#include <xen/xsplice.h>
+
+#include <public/sysctl.h>
+
+static char xen_replace_world_name[] = "xen_extra_version";
+extern const char *xen_replace_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_replace_world = {
+    .version = XSPLICE_PAYLOAD_VERSION,
+    .name = xen_replace_world_name,
+    .new_addr = xen_replace_world,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_replace_world_func.c b/xen/arch/x86/test/xen_replace_world_func.c
new file mode 100644
index 0000000..afb5cda
--- /dev/null
+++ b/xen/arch/x86/test/xen_replace_world_func.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/types.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_replace_world(void)
+{
+    return "Hello Again World!";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (24 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 25/27] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-27  9:31   ` Jan Beulich
  2016-04-25 15:35 ` [PATCH v9 27/27] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
  2016-04-25 15:41 ` [PATCH 9] xSplice v1 design and implementation Jan Beulich
  27 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v6: Drop recursive lock - also now the caller is holding the lock
    Move the code up in the code above.
v7: Add Andrew's Reviewed-by
v9: Add const on struct payload.
    Check data->id.len != payload->id.len in the loop
---
---
 xen/common/xsplice.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index a8b208d..b5e2135 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -520,6 +520,8 @@ static int prepare_payload(struct payload *payload,
     sec = xsplice_elf_sec_by_name(elf, ELF_BUILD_ID_NOTE);
     if ( sec )
     {
+        const struct payload *data;
+
         n = sec->load_addr;
 
         if ( sec->sec->sh_size <= sizeof(*n) )
@@ -531,6 +533,20 @@ static int prepare_payload(struct payload *payload,
 
         if ( !payload->id.len || !payload->id.p )
             return -EINVAL;
+
+        /* Make sure it is not a duplicate. */
+        list_for_each_entry ( data, &payload_list, list )
+        {
+            /* No way _this_ payload is on the list. */
+            ASSERT(data != payload);
+            if ( data->id.len == payload->id.len &&
+                 !memcmp(data->id.p, payload->id.p, data->id.len) )
+            {
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Already loaded as %s!\n",
+                        elf->name, data->name);
+                return -EEXIST;
+            }
+        }
     }
 
     sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_DEPENDS);
-- 
2.5.0
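
The duplicate test is meant to fire only when two build IDs have the same length and the same bytes. A minimal userspace sketch of that comparison follows. `struct build_id_lite`, `same_build_id()` and `find_duplicate()` are hypothetical names, and the sketch returns an index where the hypervisor returns -EEXIST.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified build-id descriptor (pointer + length). */
struct build_id_lite {
    const unsigned char *p;
    size_t len;
};

/* Duplicates only if both the length and the contents match. */
static int same_build_id(const struct build_id_lite *a,
                         const struct build_id_lite *b)
{
    return a->len == b->len && memcmp(a->p, b->p, a->len) == 0;
}

/* Index of an already-loaded payload with the same build ID, or -1. */
static long find_duplicate(const struct build_id_lite *loaded, size_t n,
                           const struct build_id_lite *candidate)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        assert(&loaded[i] != candidate);  /* Candidate is not on the list yet. */
        if ( same_build_id(&loaded[i], candidate) )
            return (long)i;
    }
    return -1;
}
```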


* [PATCH v9 27/27] MAINTAINERS/xsplice: Add myself and Ross as the maintainers.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (25 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
@ 2016-04-25 15:35 ` Konrad Rzeszutek Wilk
  2016-04-25 15:41 ` [PATCH 9] xSplice v1 design and implementation Jan Beulich
  27 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:35 UTC (permalink / raw)
  To: konrad, xen-devel, sasha.levin, andrew.cooper3, ross.lagerwall, mpohlack
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

If you have a patch for xSplice send it our way!

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v5: Sort the F: fields (Jan)
v7: Added Andrew's Reviewed-by
---
---
 MAINTAINERS | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5af7a0c..de482ea 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -435,6 +435,16 @@ F:  xen/include/xsm/
 F:  xen/xsm/
 F:  docs/misc/xsm-flask.txt
 
+XSPLICE
+M:  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+M:  Ross Lagerwall <ross.lagerwall@citrix.com>
+S:  Supported
+F:  docs/misc/xsplice.markdown
+F:  tools/misc/xen-xsplice.c
+F:  xen/arch/*/xsplice*
+F:  xen/common/xsplice*
+F:  xen/include/xen/xsplice*
+
 THE REST
 M:	Andrew Cooper <andrew.cooper3@citrix.com>
 M:	George Dunlap <George.Dunlap@eu.citrix.com>
-- 
2.5.0


* Re: [PATCH 9] xSplice v1 design and implementation.
  2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (26 preceding siblings ...)
  2016-04-25 15:35 ` [PATCH v9 27/27] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
@ 2016-04-25 15:41 ` Jan Beulich
  2016-04-25 15:47   ` Konrad Rzeszutek Wilk
  27 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-25 15:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ross.lagerwall, andrew.cooper3, mpohlack, sasha.levin, xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> Hey!
> 
> Changelog:
> v8.1: http://lists.xen.org/archives/html/xen-devel/2016-04/msg01903.html 

Old changelog?

Jan


* Re: [PATCH 9] xSplice v1 design and implementation.
  2016-04-25 15:41 ` [PATCH 9] xSplice v1 design and implementation Jan Beulich
@ 2016-04-25 15:47   ` Konrad Rzeszutek Wilk
  2016-04-25 15:54     ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-25 15:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Martin Pohlack, Ross Lagerwall, sasha.levin, Xen-devel

On Mon, Apr 25, 2016 at 11:41 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
>> Hey!
>>
>> Changelog:
>> v8.1: http://lists.xen.org/archives/html/xen-devel/2016-04/msg01903.html
>
> Old changelog?
>

It should have said: "Since v8.1:":
 Worked on Jan's comments.

I could enumerate _all_ of them here if you would like? But figured it
would be easier to have them patch-by-patch? What is your preference?
Both places?
And if here - enumerate also by patch number?

* Re: [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"
  2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
@ 2016-04-25 15:48   ` Jan Beulich
  2016-04-25 15:53     ` Wei Liu
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-25 15:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Wei Liu, ross.lagerwall, andrew.cooper3,
	Ian Jackson, mpohlack, sasha.levin, xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> This reverts commit d275ec9ca8a86f7c9c213f3551194d471ce90fbd.
> 
> As we prefer to still utilize the old XENVER_ hypercall.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Requested-and-acked-by: Jan Beulich <jbeulich@suse.com>

Either I replied to the wrong one last time round, or this went into
the wrong one - it really applies to 02/27. The one here needs a
tools maintainer's ack.

Jan


* Re: [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"
  2016-04-25 15:48   ` Jan Beulich
@ 2016-04-25 15:53     ` Wei Liu
  0 siblings, 0 replies; 90+ messages in thread
From: Wei Liu @ 2016-04-25 15:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, sasha.levin

On Mon, Apr 25, 2016 at 09:48:37AM -0600, Jan Beulich wrote:
> >>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> > This reverts commit d275ec9ca8a86f7c9c213f3551194d471ce90fbd.
> > 
> > As we prefer to still utilize the old XENVER_ hypercall.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Requested-and-acked-by: Jan Beulich <jbeulich@suse.com>
> 
> Either I replied to the wrong one last time round, or this went into
> the wrong one - it really applies to 02/27. The one here needs a
> tools maintainer's ack.
> 

Thanks for prodding.

Acked-by: Wei Liu <wei.liu2@citrix.com>

* Re: [PATCH 9] xSplice v1 design and implementation.
  2016-04-25 15:47   ` Konrad Rzeszutek Wilk
@ 2016-04-25 15:54     ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-25 15:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Martin Pohlack, Ross Lagerwall, sasha.levin, Xen-devel

>>> On 25.04.16 at 17:47, <konrad@kernel.org> wrote:
> On Mon, Apr 25, 2016 at 11:41 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
>>> Hey!
>>>
>>> Changelog:
>>> v8.1: http://lists.xen.org/archives/html/xen-devel/2016-04/msg01903.html 
>>
>> Old changelog?
>>
> 
> It should have said: "Since v8.1:":

Ah, okay. No need for anything else if details are in each patch.

Jan

>  Worked on Jan's comments.
> 
> I could enumerate _all_ of them here if you would like? But figured it
> would be easier to have them patch-by-patch? What is your preference?
> Both places?
> And if here - enumerate also by patch number?




* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2016-04-26  7:48   ` Ross Lagerwall
  2016-04-26  7:52   ` Ross Lagerwall
  2016-04-26 10:21   ` Jan Beulich
  2 siblings, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26  7:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
snip
> +static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> +{
> +    struct payload *data;
> +    char n[XEN_XSPLICE_NAME_SIZE];
> +    int rc;
> +
> +    rc = get_name(&action->name, n);
> +    if ( rc )
> +        return rc;
> +
> +    spin_lock(&payload_lock);
> +
> +    data = find_payload(n);
> +    if ( IS_ERR_OR_NULL(data) )
> +    {
> +        spin_unlock(&payload_lock);
> +
> +        if ( !data )
> +            return -ENOENT;
> +
> +        return PTR_ERR(data);
> +    }
> +
> +    switch ( action->cmd )
> +    {
> +    case XSPLICE_ACTION_CHECK:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +        } else
> +            rc = -EINVAL;
> +        break;

If the payload goes straight into the CHECKED state, then it doesn't
make sense to have the above case.
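
If uploaded payloads always start out CHECKED, the remaining actions reduce to a small transition table. The sketch below is a hypothetical model (`enum pl_state`, `enum pl_action`, `pl_do_action()` are not the hypervisor's names) that returns -EINVAL when a payload is not in the expected state, as the v9 changelog describes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states and actions; a payload is CHECKED right after upload. */
enum pl_state { PL_CHECKED, PL_APPLIED, PL_UNLOADED };
enum pl_action { PL_ACTION_APPLY, PL_ACTION_REVERT, PL_ACTION_UNLOAD };

/* Apply one action to *state; 0 on success, -EINVAL if not allowed. */
static int pl_do_action(enum pl_state *state, enum pl_action cmd)
{
    switch ( cmd )
    {
    case PL_ACTION_APPLY:
        if ( *state != PL_CHECKED )
            return -EINVAL;
        *state = PL_APPLIED;
        return 0;

    case PL_ACTION_REVERT:
        if ( *state != PL_APPLIED )
            return -EINVAL;
        *state = PL_CHECKED;
        return 0;

    case PL_ACTION_UNLOAD:
        if ( *state != PL_CHECKED )   /* Cannot unload applied code. */
            return -EINVAL;
        *state = PL_UNLOADED;
        return 0;
    }

    return -EINVAL;                   /* Unknown command. */
}
```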

-- 
Ross Lagerwall

* Re: [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads
  2016-04-25 15:34 ` [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
@ 2016-04-26  7:49   ` Ross Lagerwall
  0 siblings, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26  7:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
snip
> diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
> new file mode 100644
> index 0000000..fb9228e
> --- /dev/null
> +++ b/tools/misc/xen-xsplice.c
> @@ -0,0 +1,463 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + */
> +
> +#include <fcntl.h>
> +#include <libgen.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +#include <xenctrl.h>
> +#include <xenstore.h>
> +
> +static xc_interface *xch;
> +
> +void show_help(void)
> +{
> +    fprintf(stderr,
> +            "xen-xsplice: Xsplice test tool\n"
> +            "Usage: xen-xsplice <command> [args]\n"
> +            " <name> A unique name of a payload. Up to %d characters.\n"
> +            "Commands:\n"
> +            "  help                   display this help\n"
> +            "  upload <name> <file>   upload file <file> with <name> name\n"
> +            "  list                   list payloads uploaded.\n"
> +            "  apply <name>           apply <name> patch.\n"
> +            "  revert <name>          revert name <name> patch.\n"
> +            "  replace <name>         apply <name> patch and revert all others.\n"
> +            "  unload <name>          unload name <name> patch.\n"
> +            "  load  <file>           upload, check and apply <file>.\n"

Since the check command is removed, this should be "upload and apply 
<file>..."

-- 
Ross Lagerwall

* Re: [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc
  2016-04-25 15:34 ` [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
@ 2016-04-26  7:51   ` Ross Lagerwall
  0 siblings, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26  7:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
snip
> +
> +/*
> + * The operations are asynchronous and the hypervisor may take a while
> + * to complete them. The `timeout` offers an option to expire the
> + * operation if it could not be completed within the specified time
> + * (in ms). Value of 0 means let hypervisor decide the best timeout.
> + */
> +int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout);

Drop xc_xsplice_check?

> +int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout);
> +
>   /* Compat shims */
>   #include "xenctrl_compat.h"
>
snip
> +
> +int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout)
> +{
> +    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_CHECK, timeout);
> +}

And this?

-- 
Ross Lagerwall

* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
  2016-04-26  7:48   ` Ross Lagerwall
@ 2016-04-26  7:52   ` Ross Lagerwall
  2016-04-26 10:21   ` Jan Beulich
  2 siblings, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26  7:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
> +/*
> + * Perform an operation on the payload structure referenced by the `name` field.
> + * The operation request is asynchronous and the status should be retrieved
> + * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
> + */
> +#define XEN_SYSCTL_XSPLICE_ACTION 3
> +struct xen_sysctl_xsplice_action {
> +    xen_xsplice_name_t name;                /* IN, name of the patch. */
> +#define XSPLICE_ACTION_CHECK        1
> +#define XSPLICE_ACTION_UNLOAD       2
> +#define XSPLICE_ACTION_REVERT       3
> +#define XSPLICE_ACTION_APPLY        4
> +#define XSPLICE_ACTION_REPLACE      5
> +    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
> +    uint32_t timeout;                       /* IN: Zero if no timeout. */
> +                                            /* Or upper bound of time (ms) */
> +                                            /* for operation to take. */

I guess XSPLICE_ACTION_CHECK should also be removed and XSPLICE_ACTION_* 
renumbered.

-- 
Ross Lagerwall

* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
@ 2016-04-26 10:05   ` Ross Lagerwall
  2016-04-26 11:52     ` Jan Beulich
  2016-04-26 12:37   ` Jan Beulich
  1 sibling, 1 reply; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 10:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, sasha.levin, andrew.cooper3, mpohlack
  Cc: xen-devel, Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
snip
> +static int xsplice_header_check(const struct xsplice_elf *elf)
> +{
> +    const Elf_Ehdr *hdr = elf->hdr;
> +
> +    if ( sizeof(*elf->hdr) > elf->len )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    if ( !IS_ELF(*hdr) )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
> +        return -EINVAL;
> +    }
> +
> +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> +         hdr->e_type != ET_REL ||
> +         hdr->e_phnum != 0 )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    /* Check that section name index is within the sections. */
> +    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
> +                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
> +        return -EINVAL;
> +    }
> +
> +    if ( elf->hdr->e_shnum > 64 )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
> +                elf->name, elf->hdr->e_shnum);
> +        return -EOPNOTSUPP;
> +    }

If I recall correctly, Andrew asked you to add this check. Due to 
compiling with -ffunction-sections -fdata-sections, the build tool can 
quite easily exceed this limit. IMO the check doesn't serve any useful 
purpose and should be removed.
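
For comparison, a userspace sketch of the same header checks with the 64-section cap dropped, written against glibc's `<elf.h>` (`Elf64_Ehdr`, `ELFMAG`/`SELFMAG`). `payload_header_check()` is a hypothetical name, the sketch is not exhaustive (it does not validate e_shoff or section bounds), and it does not handle ELF extended section numbering (SHN_XINDEX).

```c
#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* 0 if buf looks like a relocatable little-endian ELF64 object. */
static int payload_header_check(const void *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if ( len < sizeof(hdr) )
        return -EINVAL;                     /* Too small for an ELF header. */
    memcpy(&hdr, buf, sizeof(hdr));         /* Avoid unaligned access. */

    if ( memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 )
        return -EINVAL;                     /* Not ELF at all. */

    if ( hdr.e_ident[EI_CLASS] != ELFCLASS64 ||
         hdr.e_ident[EI_DATA] != ELFDATA2LSB ||
         hdr.e_ident[EI_OSABI] != ELFOSABI_SYSV ||
         hdr.e_type != ET_REL ||
         hdr.e_phnum != 0 )
        return -EOPNOTSUPP;

    /* String-table index must be defined and inside the section table. */
    if ( hdr.e_shstrndx == SHN_UNDEF || hdr.e_shstrndx >= hdr.e_shnum )
        return -EINVAL;

    /* No fixed cap on e_shnum: -ffunction-sections can produce many. */
    return 0;
}
```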

-- 
Ross Lagerwall

* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
  2016-04-26  7:48   ` Ross Lagerwall
  2016-04-26  7:52   ` Ross Lagerwall
@ 2016-04-26 10:21   ` Jan Beulich
  2016-04-26 17:50     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 10:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Wei Liu, ross.lagerwall, andrew.cooper3,
	Ian Jackson, mpohlack, sasha.levin, xen-devel, Daniel De Graaf

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> The implementation does not actually do any patching.
> 
> It just adds the framework for doing the hypercalls,
> keeping track of ELF payloads, and the basic operations:
>  - query which payloads exist,
>  - query for specific payloads,
>  - check*1, apply*1, replace*1, and unload payloads.
> 
> *1: Which of course in this patch are nops.
> 
> The functionality is disabled on ARM until all arch
> components are implemented.
> 
> Also by default it is disabled until the implementation
> is in place.
> 
> We also use recursive spinlocks so that the find_payload
> function does not need to have a 'lock' and 'non-lock' variant.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

I'm hesitant to say that, but with all of this:

> v9:
>     s/find_name/get_name/, drop locks when allocating data.
>     Drop conditional expression on copyback
>     Move the allocation on upload outside the spinlock.
>     Add (TECH PREVIEW) to the Kconfig help
>     Return -EINVAL if the CHECK or UNLOAD action is to be performed and the payload
>     state is not in expected state.
>     Print 'c' not 'u' when invoking the keyhandler.

... I'm not sure the earlier R-b can still be considered valid. Andrew?

> +static int get_name(const xen_xsplice_name_t *name, char *n)
> +{
> +    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
> +        return -EINVAL;
> +
> +    if ( name->pad[0] || name->pad[1] || name->pad[2] )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(name->name, name->size) )
> +        return -EINVAL;
> +
> +    if ( __copy_from_guest(n, name->name, name->size) )
> +        return -EFAULT;

Quoting part of my v8.1 reply:
"Is there a particular reason why you open code copy_from_guest() here?"

> +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> +{
> +    struct payload *data, *found;
> +    char n[XEN_XSPLICE_NAME_SIZE];
> +    int rc;
> +
> +    rc = verify_payload(upload, n);
> +    if ( rc )
> +        return rc;
> +
> +    data = xzalloc(struct payload);
> +
> +    spin_lock(&payload_lock);
> +
> +    found = find_payload(n);
> +    if ( IS_ERR(found) )
> +    {
> +        rc = PTR_ERR(found);
> +        goto out;
> +    }
> +    else if ( found )
> +    {
> +        rc = -EEXIST;
> +        goto out;
> +    }
> +
> +    if ( !data )
> +    {
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +
> +    rc = 0;

rc is already zero by the time we get here.

I also wonder whether the code wouldn't be easier to read if you
used just a sequence of if()/else if() here, without any goto-s.
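
One way the if()/else if() shape could look, as a userspace sketch only: the lookup result is passed in as an int (negative errno, 1 if the name is taken, 0 if free), calloc()/free() stand in for xzalloc()/xfree(), and locking is only indicated by comments. `struct up_payload` and `up_upload()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct up_payload { int dummy; };   /* Hypothetical placeholder payload. */

/*
 * `lookup` models the name lookup done under the lock:
 * a negative errno on failure, 1 if the name is taken, 0 if it is free.
 */
static int up_upload(int lookup, struct up_payload **out)
{
    /* Allocate before "taking the lock", as the v9 changelog describes. */
    struct up_payload *data = calloc(1, sizeof(*data));
    int rc = 0;

    /* spin_lock(&payload_lock); */
    if ( lookup < 0 )
        rc = lookup;
    else if ( lookup > 0 )
        rc = -EEXIST;
    else if ( !data )
        rc = -ENOMEM;
    else
    {
        *out = data;    /* Real code would also add it to the payload list. */
        data = NULL;    /* Ownership handed over. */
    }
    /* spin_unlock(&payload_lock); */

    free(data);         /* Only frees on error paths; free(NULL) is a no-op. */
    return rc;
}
```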

> +static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> +{
> +    struct payload *data;
> +    char n[XEN_XSPLICE_NAME_SIZE];
> +    int rc;
> +
> +    rc = get_name(&action->name, n);
> +    if ( rc )
> +        return rc;
> +
> +    spin_lock(&payload_lock);
> +
> +    data = find_payload(n);
> +    if ( IS_ERR_OR_NULL(data) )
> +    {
> +        spin_unlock(&payload_lock);
> +
> +        if ( !data )
> +            return -ENOENT;
> +
> +        return PTR_ERR(data);
> +    }
> +
> +    switch ( action->cmd )
> +    {
> +    case XSPLICE_ACTION_CHECK:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +        } else
> +            rc = -EINVAL;
> +        break;

While according to Ross it looks like this is going to go away anyway,
...

> +    case XSPLICE_ACTION_UNLOAD:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            free_payload(data);
> +            /* No touching 'data' from here on! */
> +            data = NULL;
> +        } else

... there's a coding style issue here (and also above if that code is
to stay).

> +static const char *state2str(uint32_t state)

Preferably "unsigned int".

> +{
> +#define STATE(x) [XSPLICE_STATE_##x] = #x
> +    static const char *const names[] = {
> +            STATE(CHECKED),
> +            STATE(APPLIED),
> +    };
> +#undef STATE
> +
> +    if (state >= ARRAY_SIZE(names) || !names[state])

Missing blanks.
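
A sketch of the helper with both remarks applied (an `unsigned int` parameter and Xen-style blanks inside the `if`). The state values, the `DEMO_` names and the "unknown" fallback string are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state values; the real XSPLICE_STATE_* numbers may differ. */
#define DEMO_STATE_CHECKED 1
#define DEMO_STATE_APPLIED 2

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char *demo_state2str(unsigned int state)
{
#define STATE(x) [DEMO_STATE_##x] = #x
    static const char *const names[] = {
        STATE(CHECKED),
        STATE(APPLIED),
    };
#undef STATE

    /* Out of range, or a hole in the designated-initializer table. */
    if ( state >= DEMO_ARRAY_SIZE(names) || !names[state] )
        return "unknown";

    return names[state];
}
```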

Jan

* Re: [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.
  2016-04-25 15:34 ` [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup Konrad Rzeszutek Wilk
@ 2016-04-26 10:31   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 10:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> --- /dev/null
> +++ b/xen/common/virtual_region.c
> @@ -0,0 +1,148 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#include <xen/init.h>
> +#include <xen/kernel.h>
> +#include <xen/rcupdate.h>
> +#include <xen/spinlock.h>
> +#include <xen/virtual_region.h>
> +
> +static struct virtual_region core = {
> +    .list = LIST_HEAD_INIT(core.list),
> +    .start = _stext,
> +    .end = _etext,
> +};
> +
> +/* Becomes irrelevant when __init sections are cleared. */
> +static struct virtual_region core_init __initdata = {
> +    .list = LIST_HEAD_INIT(core_init.list),
> +    .start = _sinittext,
> +    .end = _einittext,
> +};
> +
> +/*
> + * RCU locking. Additions are done either at startup (when there is only
> + * one CPU) or when all CPUs are running without IRQs.
> + *
> + * Deletions are a bit tricky. We do it when xSplicing (all CPUs running
> + * without IRQs) or during bootup (when clearing the init).
> + *
> + * Hence we use list_del_rcu (which sports a memory fence) and a spinlock
> + * on deletion.
> + *
> + * All readers of virtual_region_list MUST use list list_for_each_entry_rcu.

Stray "list"?

> + *
> + */

For v8.1 I said "Stray empty comment line." And I notice there's
another one right at the top of the file.

And for v8.1 I also said "With those minor aspects taken care of,
Acked-by: Jan Beulich <jbeulich@suse.com>" which I can only repeat
now.

Jan


* Re: [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-25 15:34 ` [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type Konrad Rzeszutek Wilk
@ 2016-04-26 10:47   ` Jan Beulich
  2016-04-27  2:38     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 10:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	Ian Jackson, Tim Deegan, mpohlack, Julien Grall, sasha.levin,
	xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> For those users who want to use the virtual addresses that
> are in the hypervisor's code/data region address space -
> these three new functions allow that.
> 
> Implementation-wise the vmap API keeps track of two virtual
> address regions now:
>  a) VMAP_VIRT_START
>  b) Any provided virtual address space (need start and end).
> 
> The a) one is the default one and the existing behavior
> for users of vmalloc, vmap, etc is the same.
> 
> If however one wishes to use the b) one only has to use
> the vm_init_type to initialize and the v[m|z]alloc_xen to utilize
> it (vfree is capable of searching both address spaces).
> 
> This allows users (such as xSplice) to provide their own
> mechanism to change the page flags, and also use virtual
> addresses closer to the hypervisor virtual addresses (at least
> on x86) while not having to deal with the allocation of
> pages.
> 
> For example of users, see patch titled "xsplice: Implement payload
> loading", where we parse the payload's ELF relocations - which
> is defined to be signed 32-bit (on x86) (max displacement hence
> is 2GB virtual space, ARM32 is 128MB). The displacement of the
> hypervisor virtual addresses to the vmalloc (on x86)
> is more than 32-bits - which means that ELF relocations would
> truncate the 33rd and 34th bits. Hence this alternate API.
> 
> We also add extra checks in case the b) range has not been
> initialized.
> 
> Part of this patch also removes the 'vm_alloc' declaration as
> we do not have any users of it - and if there ever are any - we
> will have to expose a vm_alloc_xen variant.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

And again I wonder whether this tag really still holds with the
changes done, all the more since I had already questioned it on v8.1.

> Acked-by: Jan Beulich <jbeulich@suse.com>

According to my outbox this likely belongs on another patch.

> --- a/xen/common/virtual_region.c
> +++ b/xen/common/virtual_region.c
> @@ -33,7 +33,6 @@ static struct virtual_region core_init __initdata = {
>   * on deletion.
>   *
>   * All readers of virtual_region_list MUST use list list_for_each_entry_rcu.
> - *
>   */
>  static LIST_HEAD(virtual_region_list);
>  static DEFINE_SPINLOCK(virtual_region_lock);

Wrong patch.

> @@ -52,27 +55,31 @@ void *vm_alloc(unsigned int nr, unsigned int align)
>      else if ( align & (align - 1) )
>          align &= -align;
>  
> +    ASSERT(t != VMAP_REGION_NR);

Checking that "t is in range" really means t >= 0 and t < VMAP_REGION_NR,
not just t != VMAP_REGION_NR.

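A minimal sketch of such a check, assuming the enum from the patch's vmap.h (copied below); vmap_region_valid is a hypothetical helper name. Whether an enum type is signed is implementation-defined, so casting to unsigned turns any negative value into a huge one and a single comparison covers both bounds.

```c
#include <stdbool.h>

/* Copied from the patch's xen/include/xen/vmap.h. */
enum vmap_region {
    VMAP_DEFAULT,
    VMAP_XEN,
    VMAP_REGION_NR,
};

/*
 * True iff 0 <= t < VMAP_REGION_NR.  A negative t (if the enum happens
 * to be signed) becomes a huge unsigned value and fails the comparison.
 * In vm_alloc() this would be used as ASSERT(vmap_region_valid(t)).
 */
static bool vmap_region_valid(enum vmap_region t)
{
    return (unsigned int)t < VMAP_REGION_NR;
}
```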
> +void vm_free(const void *va)
> +{
> +    vm_free_type(va, VMAP_DEFAULT);
> +}

I'm not really happy about this and ...

> +void vunmap(const void *va)
> +{
> +    vunmap_type(va, VMAP_DEFAULT);
> +}

... this. Just like vfree() they should derive the type from the address.
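A sketch of deriving the region from the address, as vfree() does. The bounds table and vm_region_of() are hypothetical; they assume vm_init_type() records each region's [start, end) range, and an uninitialised region (start == end == 0) never matches.

```c
#include <stdint.h>

enum vmap_region { VMAP_DEFAULT, VMAP_XEN, VMAP_REGION_NR };

/* Hypothetical per-region bounds, as vm_init_type() might record them. */
struct vm_bounds {
    uintptr_t start, end;            /* half-open: [start, end) */
};

static struct vm_bounds vm_regions[VMAP_REGION_NR];

/*
 * Map an address back to the region it came from, or return
 * VMAP_REGION_NR if it lies in none of them.
 */
static enum vmap_region vm_region_of(const void *va)
{
    uintptr_t addr = (uintptr_t)va;
    unsigned int t;

    for ( t = 0; t < VMAP_REGION_NR; t++ )
        if ( addr >= vm_regions[t].start && addr < vm_regions[t].end )
            return (enum vmap_region)t;

    return VMAP_REGION_NR;
}
```

vm_free() and vunmap() could then pass vm_region_of(va) to the type-taking helpers instead of hard-coding VMAP_DEFAULT (handling the VMAP_REGION_NR case).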

> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -4,15 +4,26 @@
>  #include <xen/mm.h>
>  #include <asm/page.h>
>  
> -void *vm_alloc(unsigned int nr, unsigned int align);
> +enum vmap_region {
> +    VMAP_DEFAULT,
> +    VMAP_XEN,
> +    VMAP_REGION_NR,
> +};
> +
> +void vm_init_type(enum vmap_region type, void *start, void *end);
> +
>  void vm_free(const void *);

With vm_alloc() getting removed, vm_free() should get removed
here too. And with that, vm_alloc_type() and vm_free_type() can
then just become vm_alloc() and vm_free() respectively (as static
internal functions).
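On the commit message's rationale for VMAP_XEN: an x86-64 PC-relative relocation such as R_X86_64_PC32 stores S + A - P (symbol plus addend minus the patched place) in a signed 32-bit field, so a payload placed more than about 2 GiB away from the hypervisor symbols it references cannot be relocated. A minimal sketch of that range test; pc32_fits is a hypothetical name and the addresses used to exercise it are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can an R_X86_64_PC32-style value S + A - P be stored in its signed
 * 32-bit field?  S: symbol address, A: addend, P: place being patched.
 * Assumes |S - P| and |A| are far below 2^62, so nothing overflows.
 */
static bool pc32_fits(uint64_t S, int64_t A, uint64_t P)
{
    int64_t val = (int64_t)(S - P) + A;

    return val >= INT32_MIN && val <= INT32_MAX;
}
```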

Jan


* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
@ 2016-04-26 10:48   ` Ross Lagerwall
  2016-04-26 13:39   ` Jan Beulich
  1 sibling, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 10:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Julien Grall, Stefano Stabellini, Keir Fraser, Jan Beulich

On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
snip
> +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    uint8_t *text_buf, *ro_buf, *rw_buf;
> +    unsigned int i;
> +    size_t size = 0;
> +    unsigned int *offset;
> +    int rc = 0;
> +
> +    offset = xzalloc_array(unsigned int, elf->hdr->e_shnum);
> +    if ( !offset )
> +        return -ENOMEM;
> +
> +    /* Compute size of different regions. */
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> +             (SHF_ALLOC|SHF_EXECINSTR) )
> +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> +        else if ( !elf->sec[i].sec->sh_flags ||
> +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> +            /* Do nothing.*/;
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            rc = -EOPNOTSUPP;
> +            goto out;
> +        }
> +        else /* Such as .comment. */
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +    }
> +
> +    /*
> +     * Total of all three regions - RX, RW, and RO. We have to have
> +     * keep them in seperate pages so we PAGE_ALIGN the RX and RW to have
> +     * them on seperate pages. The last one will by default fall on its
> +     * own page.
> +     */
> +    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
> +                      payload->ro_size;
> +
> +    size = PFN_UP(size); /* Nr of pages. */
> +    text_buf = vzalloc_xen(size * PAGE_SIZE);
> +    if ( !text_buf )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
> +                elf->name);
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +    rw_buf = text_buf +  + PAGE_ALIGN(payload->text_size);

???

The rarely used unary plus operator :-)
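The stray '+' is parsed as a unary plus, so the value is unchanged. For clarity, a sketch of the intended layout arithmetic (RX, then RW, then RO, with RX and RW each rounded up to a page so they can get separate permissions). PAGE_SIZE, PAGE_ALIGN and PFN_UP are local stand-ins assuming 4 KiB pages; struct layout and compute_layout are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE     4096UL        /* assumed 4 KiB pages */
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define PFN_UP(x)     (((x) + PAGE_SIZE - 1) / PAGE_SIZE)

struct layout {
    size_t rw_off, ro_off;          /* byte offsets from text_buf */
    size_t pages;                   /* total pages to allocate */
};

static struct layout compute_layout(size_t text_size, size_t rw_size,
                                    size_t ro_size)
{
    struct layout l;

    /* Same value as "text_buf +  + PAGE_ALIGN(...)": the 2nd '+' is unary. */
    l.rw_off = PAGE_ALIGN(text_size);
    l.ro_off = l.rw_off + PAGE_ALIGN(rw_size);
    /* RO is last, so it only needs rounding as part of the total. */
    l.pages = PFN_UP(l.ro_off + ro_size);

    return l;
}
```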

> +    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
> +
> +    payload->pages = size;
> +    payload->text_addr = text_buf;
> +    payload->rw_addr = rw_buf;
> +    payload->ro_addr = ro_buf;
> +
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> +        {
> +            uint8_t *buf;
> +            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
> +                buf = text_buf;
> +            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +                buf = rw_buf;
> +             else
> +                buf = ro_buf;
> +
> +            elf->sec[i].load_addr = buf + offset[i];
> +
> +            /*
> +             * Don't copy NOBITS - such as BSS. We don't memset BSS as
> +             * arch_xsplice_alloc_payload has zeroed it out for us.
> +             */
> +            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
> +            {
> +                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
> +                       elf->sec[i].sec->sh_size);
> +                dprintk(XENLOG_DEBUG, XSPLICE "%s: Loaded %s at %p\n",
> +                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);

-- 
Ross Lagerwall

* Re: [PATCH v9 17/27] xsplice: Add support for bug frames.
  2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
@ 2016-04-26 11:05   ` Ross Lagerwall
  2016-04-26 13:08     ` Ross Lagerwall
  2016-04-26 15:58   ` Jan Beulich
  1 sibling, 1 reply; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 11:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Keir Fraser, Jan Beulich

On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
snip
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 72a3b88..11b19dd 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -123,6 +123,35 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload, char *n)
>       return 0;
>   }
>
> +bool_t is_patch(const void *ptr)
> +{
> +    const struct payload *data;
> +    bool_t r = 0;
> +
> +    /*
> +     * Only RCU locking since this list is only ever changed during apply
> +     * or revert context. And in case it dies there we need an safe list.
> +     */
> +    rcu_read_lock(&rcu_applied_lock);
> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
> +    {
> +        if ( (ptr >= data->rw_addr &&
> +              ptr < (data->rw_addr + data->rw_size)) ||
> +             (ptr >= data->ro_addr &&
> +              ptr < (data->ro_addr + data->ro_size)) ||
> +             (ptr >= data->text_addr &&
> +              ptr < (data->text_addr + data->text_size)) )

The above 3 calculations are wrong due to the use of void *.

-- 
Ross Lagerwall

* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-25 15:35 ` [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
@ 2016-04-26 11:06   ` Ross Lagerwall
  2016-04-26 12:41     ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 11:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
snip
> +static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
>   static LIST_HEAD(applied_list);
>
>   static unsigned int payload_cnt;
> @@ -56,6 +57,8 @@ struct payload {
>       unsigned int nfuncs;                 /* Nr of functions to patch. */
>       const struct xsplice_symbol *symtab; /* All symbols. */
>       const char *strtab;                  /* Pointer to .strtab. */
> +    struct virtual_region region;        /* symbol, bug.frame patching and
> +                                            exception table (x86). */
>       unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
>       char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
>   };
> @@ -142,6 +145,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
>       return 0;
>   }
>
> +static const char *xsplice_symbols_lookup(unsigned long addr,
> +                                          unsigned long *symbolsize,
> +                                          unsigned long *offset,
> +                                          char *namebuf)
> +{
> +    const struct payload *data;
> +    unsigned int i, best;
> +    const void *va = (const void *)addr;
> +    const char *n = NULL;
> +
> +    /*
> +     * Only RCU locking since this list is only ever changed during apply
> +     * or revert context. And in case it dies there we need an safe list.
> +     */
> +    rcu_read_lock(&rcu_applied_lock);
> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
> +    {
> +        if ( va < data->text_addr &&
> +             va >= (data->text_addr + data->pages * PAGE_SIZE) )

This calculation is wrong due to the use of void * and results in 
incorrect backtrace results.

You also need to have || rather than &&.

Additionally, I think it should use data->text_size rather than 
data->pages * PAGE_SIZE.
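Why the && matters: no address is both below text_addr and at or past its end, so the quoted test never skips a payload, and the first payload on the list with any symbol at or below va claims it even when va belongs to a later payload. A minimal sketch of the per-payload check with || and text_size, using a hypothetical simplified symbol type (the patch's struct xsplice_symbol differs) and uintptr_t addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol entry. */
struct sym {
    const char *name;
    uintptr_t value;                 /* start address */
};

/*
 * Name of the closest symbol at or below addr, provided addr lies in
 * this payload's text [text, text + text_size); NULL otherwise.
 */
static const char *payload_lookup(uintptr_t text, size_t text_size,
                                  const struct sym *syms, unsigned int nsyms,
                                  uintptr_t addr, uintptr_t *offset)
{
    unsigned int i, best = nsyms;    /* nsyms means "nothing found" */

    /* Outside if below the start OR at/after the end: ||, not &&. */
    if ( addr < text || addr >= text + text_size )
        return NULL;

    for ( i = 0; i < nsyms; i++ )
        if ( syms[i].value <= addr &&
             (best == nsyms || syms[best].value < syms[i].value) )
            best = i;

    if ( best == nsyms )
        return NULL;

    if ( offset )
        *offset = addr - syms[best].value;

    return syms[best].name;
}
```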

> +            continue;
> +
> +        best = UINT_MAX;
> +
> +        for ( i = 0; i < data->nsyms; i++ )
> +        {
> +            if ( data->symtab[i].value <= va &&
> +                 (best == UINT_MAX ||
> +                  data->symtab[best].value < data->symtab[i].value) )
> +                best = i;
> +        }
> +
> +        if ( best == UINT_MAX )
> +            break;
> +
> +        if ( symbolsize )
> +            *symbolsize = data->symtab[best].size;
> +        if ( offset )
> +            *offset = va - data->symtab[best].value;
> +        if ( namebuf )
> +            strlcpy(namebuf, data->name, KSYM_NAME_LEN);
> +
> +        n = data->symtab[best].name;
> +        break;
> +    }
> +    rcu_read_unlock(&rcu_applied_lock);
> +
> +    return n;
> +}
> +
>   static struct payload *find_payload(const char *name)
>   {
>       struct payload *data, *found = NULL;
> @@ -366,6 +418,7 @@ static int prepare_payload(struct payload *payload,
>       const struct xsplice_elf_sec *sec;
>       unsigned int i;
>       struct xsplice_patch_func *f;
> +    struct virtual_region *region;
>
>       sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
>       ASSERT(sec);
> @@ -422,6 +475,13 @@ static int prepare_payload(struct payload *payload,
>           }
>       }
>
> +    /* Setup the virtual region with proper data. */
> +    region = &payload->region;
> +
> +    region->symbols_lookup = xsplice_symbols_lookup;
> +    region->start = payload->text_addr;
> +    region->end = payload->text_addr + payload->text_size;

This calculation is wrong due to the use of void *.

-- 
Ross Lagerwall

* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-26 10:05   ` Ross Lagerwall
@ 2016-04-26 11:52     ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 11:52 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	sasha.levin, xen-devel

>>> On 26.04.16 at 12:05, <ross.lagerwall@citrix.com> wrote:
> On 04/25/2016 04:34 PM, Konrad Rzeszutek Wilk wrote:
> snip
>> +static int xsplice_header_check(const struct xsplice_elf *elf)
>> +{
>> +    const Elf_Ehdr *hdr = elf->hdr;
>> +
>> +    if ( sizeof(*elf->hdr) > elf->len )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
>> +                elf->name);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if ( !IS_ELF(*hdr) )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
>> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
>> +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
>> +         hdr->e_type != ET_REL ||
>> +         hdr->e_phnum != 0 )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
>> +        return -EOPNOTSUPP;
>> +    }
>> +
>> +    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
>> +                elf->name);
>> +        return -EINVAL;
>> +    }
>> +
>> +    /* Check that section name index is within the sections. */
>> +    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
>> +                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if ( elf->hdr->e_shnum > 64 )
>> +    {
>> +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
>> +                elf->name, elf->hdr->e_shnum);
>> +        return -EOPNOTSUPP;
>> +    }
> 
> If I recall correctly, Andrew asked you to add this check. Due to 
> compiling with -ffunction-sections -fdata-sections, the build tool can 
> quite easily exceed this limit. IMO the check doesn't serve any useful 
> purpose and should be removed.

Well, it certainly serves the purpose of subsequent things not taking
overly long, but I'd be fine with the limit bumped. And the check can't
go away altogether anyway - at the very least you need to check
against SHN_LORESERVE.
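A minimal sketch of such a check, with the hard-coded 64 replaced by a caller-supplied limit (its value, like the function name, is hypothetical). Section indices from SHN_LORESERVE upward are reserved, and a conforming file stores 0 in e_shnum rather than a count that large.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHN_LORESERVE 0xff00        /* from the ELF specification */

/*
 * Accept e_shnum only if it is a plain section count below
 * SHN_LORESERVE and within the loader's own limit.  e_shnum == 0 is
 * rejected: it means either no sections or that the real count lives
 * in section 0's sh_size, which this sketch does not handle.
 */
static bool shnum_ok(uint16_t e_shnum, unsigned int limit)
{
    return e_shnum != 0 && e_shnum < SHN_LORESERVE && e_shnum <= limit;
}
```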

Jan


* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
  2016-04-26 10:05   ` Ross Lagerwall
@ 2016-04-26 12:37   ` Jan Beulich
  2016-04-27  1:59     ` Konrad Rzeszutek Wilk
  2016-04-27  4:06     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 12:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Add Elf routines and data structures in preparation for loading an
> xSplice payload.
> 
> We make an assumption that the max number of sections an ELF payload
> can have is 64. We can in future make this be dependent on the
> names of the sections and verifying against a list, but for right now
> this suffices.
> 
> Also we add a whole lot of checks to make sure that the ELF payload
> file is not corrupted and that the offsets do not point past the file.
> 
> For most of the checks we print a message if the hypervisor is built
> with debug enabled.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
> Reviewed-by: Andrew Cooper<andrew.cooper3@citrix.com>

Again ...

> v9: Changed elf_verify_strtab to use const char and return EINVAL.
>     Remove 'if ( !delta )' check in elf_resolve_sections
>     Remove stale comments.
>     Fixed one off check against  sh_link.
>     Document boundary checks against shstrtab and symtab.
>     Fixed return codes in xsplice_header_check.
>     Add check for sections to not be within ELF header.
>     Added overflow check for e_shoff in xsplice_header_check.
>     Moved XSPLICE macro by four tabs.
>     Make ->sym be const.

... way too many changes for pre-existing tags to stay, at least
for my taste.

> +static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
> +{
> +    struct xsplice_elf_sec *sec;
> +    unsigned int i;
> +    Elf_Off delta;
> +    int rc;
> +
> +    /* xsplice_elf_load sanity checked e_shnum. */
> +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
> +    if ( !sec )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE"%s: Could not allocate memory for section table!\n",
> +               elf->name);
> +        return -ENOMEM;
> +    }
> +
> +    elf->sec = sec;
> +
> +    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
> +    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;

The added comment just helps make obvious that the overflow I
believe Andrew was worried about is still not being taken care of:
All xsplice_header_check() does is range-check the two values
mentioned in the comment. But I agree that a proper range check
(at once eliminating overflow concerns for the arithmetic here)
would better live there (see also my comments there).

> +    if ( delta > elf->len )
> +    {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Section table is past end of payload!\n",
> +                    elf->name);
> +            return -EINVAL;
> +    }
> +
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
> +
> +        sec[i].sec = data + delta;
> +
> +        delta = sec[i].sec->sh_offset;
> +        /*
> +         * N.B. elf_resolve_section_names, elf_get_sym skip this check as
> +         * we do it here.
> +         */
> +        if ( delta < sizeof(Elf_Ehdr) ||
> +             (delta + sec[i].sec->sh_size > elf->len) )

The second half of the check needs to be skipped for SHT_NOBITS
sections. And beware of overflow again - each addend alone may
be too large while their (wrapped-around) sum still appears to be
within range.
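A sketch of the section bounds test with both points addressed: SHT_NOBITS sections (such as .bss) occupy no file space, so only their offset is checked, and the size is compared against the space remaining after sh_offset instead of forming sh_offset + sh_size, which can wrap. The helper name and parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHT_NOBITS 8                /* from the ELF specification */

/*
 * Does a section lie after the ELF header (hdr_size bytes) and, unless
 * it is NOBITS, entirely within a file of len bytes?
 */
static bool section_in_file(uint32_t sh_type, uint64_t sh_offset,
                            uint64_t sh_size, uint64_t len, uint64_t hdr_size)
{
    if ( sh_offset < hdr_size )
        return false;

    /* NOBITS data is not in the file, so its size is irrelevant here. */
    if ( sh_type == SHT_NOBITS )
        return true;

    /* No addition, hence no wrap-around. */
    return sh_offset <= len && sh_size <= len - sh_offset;
}
```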

> +static int elf_get_sym(struct xsplice_elf *elf, const void *data)
> +{
> +    const struct xsplice_elf_sec *symtab_sec, *strtab_sec;
> +    struct xsplice_elf_sym *sym;
> +    unsigned int i, delta, offset, nsym;
> +
> +    symtab_sec = elf->symtab;
> +    strtab_sec = elf->strtab;
> +
> +    /* Pointers arithmetic to get file offset. */
> +    offset = strtab_sec->data - data;
> +
> +    /* Checked already in elf_resolve_sections, but just in case. */
> +    ASSERT(offset == strtab_sec->sec->sh_offset);
> +    ASSERT(offset < elf->len && (offset + strtab_sec->sec->sh_size <= elf->len));
> +
> +    /* symtab_sec->data was computed in elf_resolve_sections. */
> +    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data);
> +
> +    /* No need to check values as elf_resolve_sections did it. */
> +    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
> +
> +    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
> +    if ( !sym )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for symbols\n",
> +               elf->name);
> +        return -ENOMEM;
> +    }
> +
> +    /* So we don't leak memory. */
> +    elf->sym = sym;
> +
> +    for ( i = 1; i < nsym; i++ )
> +    {
> +        Elf_Sym *s = &((Elf_Sym *)symtab_sec->data)[i];

I'm sorry for not spotting this earlier, but the calculation here needs
to follow that of the section pointers into the section table, i.e. use
symtab_sec->sec->sh_entsize (which afaict at once will allow getting
rid of the cast, and which I guess will make obvious that this lacks a
const qualifier).
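A sketch of indexing the symbol table the same way as the section header table, stepping by sh_entsize rather than sizeof(Elf_Sym), and returning a const pointer. struct sym64 mirrors the ELF64 symbol layout and sym_entry is a hypothetical helper; it assumes sh_entsize was already validated to be at least sizeof(Elf_Sym). With GCC's void * arithmetic (as used in Xen) no cast is needed at all; the sketch goes through char * to stay within ISO C.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as Elf64_Sym. */
struct sym64 {
    uint32_t st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
};

/* Entry i of a symbol table whose entries are entsize bytes apart. */
static const struct sym64 *sym_entry(const void *symtab, size_t entsize,
                                     unsigned int i)
{
    const void *p = (const char *)symtab + (size_t)i * entsize;

    return p;                        /* const void * converts implicitly */
}
```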

> +        delta = s->st_name;
> +        /* Boundary check within the .strtab. */
> +        if ( delta > strtab_sec->sec->sh_size )

>= (just like in elf_resolve_section_names())
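Illustrating the off-by-one: an st_name equal to sh_size already points one past the end of .strtab. Deriving the name from the string table's own data pointer also avoids re-adding the file offset. strtab_name is a hypothetical helper; it does not check NUL termination, which is assumed to be verified separately.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Name at offset st_name in a string table of strtab_size bytes, or
 * NULL if the offset does not index a byte of the table.
 */
static const char *strtab_name(const char *strtab, uint64_t strtab_size,
                               uint32_t st_name)
{
    if ( st_name >= strtab_size )    /* '>' would let st_name == size pass */
        return NULL;

    return strtab + st_name;         /* cf. strtab_sec->data + delta */
}
```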

> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Symbol [%u] data is past end of payload!\n",

Message text does not match context (also in
elf_resolve_section_names() as I now see).

> +                    elf->name, i);
> +            return -EINVAL;
> +        }
> +
> +        sym[i].sym = s;
> +        sym[i].name = data + (delta + offset);

I think this

        sym[i].name = strtab_sec->data + delta;

would be more obvious to the reader.

> +static int xsplice_header_check(const struct xsplice_elf *elf)
> +{
> +    const Elf_Ehdr *hdr = elf->hdr;
> +
> +    if ( sizeof(*elf->hdr) > elf->len )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    if ( !IS_ELF(*hdr) )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
> +        return -EINVAL;
> +    }
> +
> +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||

What about EI_VERSION and EI_ABIVERSION, btw?
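A possible shape for those two checks, as a hypothetical helper. The index and value constants are those of the ELF specification; requiring EI_ABIVERSION to be 0 is an assumption that fits ELFOSABI_SYSV payloads.

```c
#include <stdbool.h>

#define EI_VERSION    6   /* e_ident index of the ELF format version */
#define EI_ABIVERSION 8   /* e_ident index of the OS ABI version */
#define EV_CURRENT    1   /* the only defined ELF format version */

static bool ident_versions_ok(const unsigned char *e_ident)
{
    /* Assumption: ELFOSABI_SYSV payloads carry ABI version 0. */
    return e_ident[EI_VERSION] == EV_CURRENT && e_ident[EI_ABIVERSION] == 0;
}
```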

> +         hdr->e_type != ET_REL ||
> +         hdr->e_phnum != 0 )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    /* Check that section name index is within the sections. */
> +    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
> +                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
> +        return -EINVAL;
> +    }
> +
> +    if ( elf->hdr->e_shnum > 64 )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
> +                elf->name, elf->hdr->e_shnum);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    if ( elf->hdr->e_shoff > ULONG_MAX )

Why not ">= elf->len" (and I see it was almost that way in v8.1)?
And then followed (further down) by another check taking
elf->hdr->e_shnum * elf->hdr->e_shentsize into account (of
course as things stand now, elf->hdr->e_shentsize can also be
arbitrarily large, so this would need to be suitably structured
- e.g. "(elf->len - elf->hdr->e_shoff) / elf->hdr->e_shentsize <
elf->hdr->e_shnum").
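The suggested check written out as a hypothetical helper. Dividing the space that remains after e_shoff by e_shentsize, instead of multiplying e_shnum by e_shentsize, means nothing can overflow however large the header fields are; min_entsize would be sizeof(Elf_Shdr) in the loader.

```c
#include <stdbool.h>
#include <stdint.h>

static bool shdr_table_in_file(uint64_t e_shoff, uint16_t e_shnum,
                               uint16_t e_shentsize, uint64_t min_entsize,
                               uint64_t len)
{
    /* The table must start inside the file ... */
    if ( e_shoff >= len )
        return false;

    /* ... entries must be big enough (this also rules out dividing by 0) ... */
    if ( e_shentsize == 0 || e_shentsize < min_entsize )
        return false;

    /* ... and e_shnum entries must fit in what is left of the file. */
    return (len - e_shoff) / e_shentsize >= e_shnum;
}
```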

Jan

* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-26 11:06   ` Ross Lagerwall
@ 2016-04-26 12:41     ` Jan Beulich
  2016-04-26 12:48       ` Ross Lagerwall
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 12:41 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	sasha.levin, xen-devel

>>> On 26.04.16 at 13:06, <ross.lagerwall@citrix.com> wrote:
> On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
>> @@ -142,6 +145,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
>>       return 0;
>>   }
>>
>> +static const char *xsplice_symbols_lookup(unsigned long addr,
>> +                                          unsigned long *symbolsize,
>> +                                          unsigned long *offset,
>> +                                          char *namebuf)
>> +{
>> +    const struct payload *data;
>> +    unsigned int i, best;
>> +    const void *va = (const void *)addr;
>> +    const char *n = NULL;
>> +
>> +    /*
>> +     * Only RCU locking since this list is only ever changed during apply
>> +     * or revert context. And in case it dies there we need an safe list.
>> +     */
>> +    rcu_read_lock(&rcu_applied_lock);
>> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
>> +    {
>> +        if ( va < data->text_addr &&
>> +             va >= (data->text_addr + data->pages * PAGE_SIZE) )
> 
> This calculation is wrong due to the use of void * and results in 
> incorrect backtrace results.

When text_addr is void *, how is this calculation wrong then?

>> @@ -422,6 +475,13 @@ static int prepare_payload(struct payload *payload,
>>           }
>>       }
>>
>> +    /* Setup the virtual region with proper data. */
>> +    region = &payload->region;
>> +
>> +    region->symbols_lookup = xsplice_symbols_lookup;
>> +    region->start = payload->text_addr;
>> +    region->end = payload->text_addr + payload->text_size;
> 
> This calculation is wrong due to the use of void *.

And again - why?

Jan


* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-26 12:41     ` Jan Beulich
@ 2016-04-26 12:48       ` Ross Lagerwall
  2016-04-26 13:41         ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 12:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	sasha.levin, xen-devel

On 04/26/2016 01:41 PM, Jan Beulich wrote:
>>>> On 26.04.16 at 13:06, <ross.lagerwall@citrix.com> wrote:
>> On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
>>> @@ -142,6 +145,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
>>>        return 0;
>>>    }
>>>
>>> +static const char *xsplice_symbols_lookup(unsigned long addr,
>>> +                                          unsigned long *symbolsize,
>>> +                                          unsigned long *offset,
>>> +                                          char *namebuf)
>>> +{
>>> +    const struct payload *data;
>>> +    unsigned int i, best;
>>> +    const void *va = (const void *)addr;
>>> +    const char *n = NULL;
>>> +
>>> +    /*
>>> +     * Only RCU locking since this list is only ever changed during apply
>>> +     * or revert context. And in case it dies there we need an safe list.
>>> +     */
>>> +    rcu_read_lock(&rcu_applied_lock);
>>> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
>>> +    {
>>> +        if ( va < data->text_addr &&
>>> +             va >= (data->text_addr + data->pages * PAGE_SIZE) )
>>
>> This calculation is wrong due to the use of void * and results in
>> incorrect backtrace results.
>
> When text_addr is void *, how is this calculation wrong then?

I'm sorry, ignore that. I temporarily forgot how void* arithmetic is 
defined for GCC.

The other two points are still valid and may result in incorrect 
backtraces with > 1 payload loaded.

>
>>> @@ -422,6 +475,13 @@ static int prepare_payload(struct payload *payload,
>>>            }
>>>        }
>>>
>>> +    /* Setup the virtual region with proper data. */
>>> +    region = &payload->region;
>>> +
>>> +    region->symbols_lookup = xsplice_symbols_lookup;
>>> +    region->start = payload->text_addr;
>>> +    region->end = payload->text_addr + payload->text_size;
>>
>> This calculation is wrong due to the use of void *.
>
> And again - why?
>
> Jan
>


-- 
Ross Lagerwall

* Re: [PATCH v9 17/27] xsplice: Add support for bug frames.
  2016-04-26 11:05   ` Ross Lagerwall
@ 2016-04-26 13:08     ` Ross Lagerwall
  0 siblings, 0 replies; 90+ messages in thread
From: Ross Lagerwall @ 2016-04-26 13:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, konrad, xen-devel, sasha.levin,
	andrew.cooper3, mpohlack
  Cc: Keir Fraser, Jan Beulich

On 04/26/2016 12:05 PM, Ross Lagerwall wrote:
> On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
> snip
>> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
>> index 72a3b88..11b19dd 100644
>> --- a/xen/common/xsplice.c
>> +++ b/xen/common/xsplice.c
>> @@ -123,6 +123,35 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload, char *n)
>>       return 0;
>>   }
>>
>> +bool_t is_patch(const void *ptr)
>> +{
>> +    const struct payload *data;
>> +    bool_t r = 0;
>> +
>> +    /*
>> +     * Only RCU locking since this list is only ever changed during apply
>> +     * or revert context. And in case it dies there we need an safe list.
>> +     */
>> +    rcu_read_lock(&rcu_applied_lock);
>> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
>> +    {
>> +        if ( (ptr >= data->rw_addr &&
>> +              ptr < (data->rw_addr + data->rw_size)) ||
>> +             (ptr >= data->ro_addr &&
>> +              ptr < (data->ro_addr + data->ro_size)) ||
>> +             (ptr >= data->text_addr &&
>> +              ptr < (data->text_addr + data->text_size)) )
>
> The above 3 calculations are wrong due to the use of void *.
>

Sorry, you can ignore this, I temporarily forgot how void* arithmetic is 
defined for GCC.

-- 
Ross Lagerwall

* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
  2016-04-26 10:48   ` Ross Lagerwall
@ 2016-04-26 13:39   ` Jan Beulich
  2016-04-27  1:47     ` Konrad Rzeszutek Wilk
  2016-04-27  3:28     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 13:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Add support for loading xsplice payloads. This is somewhat similar to
> the Linux kernel module loader, implementing the following steps:
> - Verify the elf file.
> - Parse the elf file.
> - Allocate a region of memory mapped within a free area of
>   [xen_virt_end, XEN_VIRT_END].
> - Copy allocated sections into the new region. Split them into three
>   regions - .text, .data, and .rodata. MUST have at least .text.
> - Resolve section symbols. All other symbols must be absolute addresses.
>   (Note that patch titled "xsplice,symbols: Implement symbol name resolution
>    on address" implements that)
> - Perform relocations.
> - Secure the regions (.text, .data, .rodata) with proper permissions.
> 
> We capitalize on the vmalloc callback API (see patch titled:
> "rm/x86/vmap: Add v[z|m]alloc_xen, and vm_init_type") to allocate
> a region of memory within the [xen_virt_end, XEN_VIRT_END] for the code.
> 
> We also use the "x86/mm: Introduce modify_xen_mappings()"
> to change the virtual address page-table permissions.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Julien Grall <julien.grall@arm.com>

By now I guess you can guess what I think about the above vs ...

> v9:
>   - Rebase on different spinlock usage in xsplice_upload.
>   - Do proper bound and overflow checking.
>   - Added 'const' on [text,ro,rw]_addr.
>   - Made 'calc_section' and 'move_payload' use a dynamically
>     allocated array for computed offsets instead of modifying sh_entsize.
>   - Remove arch_xsplice_[alloc_payload|free] and use vzalloc_xen and
>     vfree.
>   - Collapse for loop in move_payload.
>   - Move xsplice.o in Makefile
>   - Add more checks in arch_xsplice_perform_rela (r_offset and
>      sh_size % sh_entsize)
>   - Use int32_t and int64_t in arch_xsplice_perform_rela.
>   - Tighten the list of sh_flags we check
>   - Use intermediate on 'buf' so that we can do 'const void *'
>   - Use intermediate in xsplice_elf_resolve_symbols for 'const' of elf->sym.
>   - Fail if (and only) SHF_ALLOC and SHT_NOBITS section is seen.

... this long list.

> +int arch_xsplice_perform_rela(struct xsplice_elf *elf,
> +                              const struct xsplice_elf_sec *base,
> +                              const struct xsplice_elf_sec *rela)
> +{
> +    const Elf_RelA *r;
> +    unsigned int symndx, i;
> +    uint64_t val;
> +    uint8_t *dest;
> +
> +    /* Nothing to do. */
> +    if ( !rela->sec->sh_size )
> +        return 0;
> +
> +    if ( !rela->sec->sh_entsize ||
> +         rela->sec->sh_entsize < sizeof(Elf_RelA) ||

Same thing here as in v8.1 in another patch: The first check is
pointless now that you have the second one.

> +         rela->sec->sh_size % rela->sec->sh_entsize )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
> +    {
> +        r = rela->data + i * rela->sec->sh_entsize;
> +
> +        symndx = ELF64_R_SYM(r->r_info);
> +
> +        if ( symndx > elf->nsym )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
> +                    elf->name, symndx);
> +            return -EINVAL;
> +        }
> +
> +        if ( r->r_offset > base->sec->sh_size )

>= at the very least. The size of the relocated location would really
also need to be taken into account, but that can only be done in the
switch below.
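Below is a minimal sketch of the checks discussed above. The helper names and the cut-down relocation struct are hypothetical. The first function relies on `sizeof(Elf_RelA)` being non-zero: once `sh_entsize >= sizeof(Elf_RelA)` holds, a zero entry size is already ruled out. The second function requires each relocation to start strictly inside the section (`>=`, not `>`) and to leave room for the bytes it writes. Its `width` parameter stands in for the per-type size, which is only known inside the `switch`. For example, that is 4 bytes for R_X86_64_PC32 and 8 for R_X86_64_64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Elf64_Rela: r_offset, r_info, r_addend. */
struct rela64 {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
};

/* Section-header sanity; a non-zero sh_entsize is implied by the first test. */
static bool check_rela_header(uint64_t sh_size, uint64_t sh_entsize)
{
    if ( sh_entsize < sizeof(struct rela64) )
        return false;
    return sh_size % sh_entsize == 0;
}

/*
 * Per-relocation bound: the width bytes written at r_offset must lie entirely
 * within a target section of sec_size bytes. Written so that r_offset + width
 * cannot overflow.
 */
static bool rela_target_ok(uint64_t r_offset, uint64_t width, uint64_t sec_size)
{
    if ( r_offset >= sec_size )
        return false;
    return sec_size - r_offset >= width;
}
```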

> +void arch_xsplice_init(void)

__init

> +static void free_payload_data(struct payload *payload)
> +{
> +    /* Set to zero until "move_payload". */
> +    if ( !payload->text_addr )
> +        return;
> +
> +    vfree((void *)payload->text_addr);
> +
> +    payload->pages = 0;

I think what you check and what you clear should match up, such that
redundant invocation of the function wouldn't lead to problems.
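One way to make the check and the clear match up is sketched below. It uses a hypothetical cut-down `struct payload_sketch`, with plain `free()` standing in for `vfree()`. The function tests the pointer, then resets that same pointer (and the page count). A second call is then a harmless no-op.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of the payload fields involved. */
struct payload_sketch {
    void *text_addr;   /* start of the single allocation (text, then rw, ro) */
    void *rw_addr;
    void *ro_addr;
    unsigned int pages;
};

static void free_payload_data_sketch(struct payload_sketch *p)
{
    /* NULL until the region has been allocated. */
    if ( !p->text_addr )
        return;

    free(p->text_addr);   /* vfree() in the hypervisor */

    /* Clear what was checked above, so a repeated call returns early. */
    p->text_addr = NULL;
    p->rw_addr = NULL;
    p->ro_addr = NULL;
    p->pages = 0;
}
```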

> +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    uint8_t *text_buf, *ro_buf, *rw_buf;

Any particular reason for them not being void *?

> +    unsigned int i;
> +    size_t size = 0;
> +    unsigned int *offset;
> +    int rc = 0;
> +
> +    offset = xzalloc_array(unsigned int, elf->hdr->e_shnum);

Why not xmalloc_array()?

> +    if ( !offset )
> +        return -ENOMEM;
> +
> +    /* Compute size of different regions. */
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> +             (SHF_ALLOC|SHF_EXECINSTR) )
> +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);

This silently accepts writable text sections, yet the portion of the
memory this gets placed in will be mapped RX.

> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> +        else if ( !elf->sec[i].sec->sh_flags ||
> +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> +            /* Do nothing.*/;
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            rc = -EOPNOTSUPP;
> +            goto out;
> +        }

I saw this in the changelog, but I don't really understand these last
two conditionals. Wouldn't you want to bail on _any_ sections which
have SHF_ALLOC set but don't get mapped to one of the three
blocks? And wouldn't you (silently) ignore any sections with SHF_ALLOC
clear?
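Below is a minimal sketch of the classification these questions point towards. It is not what the patch does, and the enum and function names are hypothetical. Sections without SHF_ALLOC are ignored silently. Every allocated section must land in one of the three regions. Writable text is rejected, since that region is mapped RX. The flag values are the standard ELF ones.

```c
#include <stdint.h>

/* Standard ELF section flag values (as in <elf.h>). */
#ifndef SHF_WRITE
#define SHF_WRITE     0x1
#endif
#ifndef SHF_ALLOC
#define SHF_ALLOC     0x2
#endif
#ifndef SHF_EXECINSTR
#define SHF_EXECINSTR 0x4
#endif

/* Hypothetical classification result. */
enum region { REGION_IGNORE, REGION_TEXT, REGION_RW, REGION_RO, REGION_REJECT };

static enum region classify_section(uint64_t sh_flags)
{
    /* Not loaded at all (e.g. .comment, .symtab): ignore silently. */
    if ( !(sh_flags & SHF_ALLOC) )
        return REGION_IGNORE;

    /* Text is mapped RX, so a writable executable section cannot be honoured. */
    if ( sh_flags & SHF_EXECINSTR )
        return (sh_flags & SHF_WRITE) ? REGION_REJECT : REGION_TEXT;

    return (sh_flags & SHF_WRITE) ? REGION_RW : REGION_RO;
}
```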

> +        else /* Such as .comment. */
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +    }
> +
> +    /*
> +     * Total of all three regions - RX, RW, and RO. We have to have
> +     * keep them in seperate pages so we PAGE_ALIGN the RX and RW to have
> +     * them on seperate pages. The last one will by default fall on its
> +     * own page.
> +     */
> +    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
> +                      payload->ro_size;
> +
> +    size = PFN_UP(size); /* Nr of pages. */
> +    text_buf = vzalloc_xen(size * PAGE_SIZE);
> +    if ( !text_buf )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
> +                elf->name);
> +        rc = -ENOMEM;
> +        goto out;
> +    }
> +    rw_buf = text_buf +  + PAGE_ALIGN(payload->text_size);
> +    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
> +
> +    payload->pages = size;
> +    payload->text_addr = text_buf;
> +    payload->rw_addr = rw_buf;
> +    payload->ro_addr = ro_buf;
> +
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> +        {
> +            uint8_t *buf;

Perhaps void * again? And missing a blank line afterwards.

> +            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
> +                buf = text_buf;
> +            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +                buf = rw_buf;
> +             else

The indentation here is still one off.

> +                buf = ro_buf;
> +
> +            elf->sec[i].load_addr = buf + offset[i];
> +
> +            /*
> +             * Don't copy NOBITS - such as BSS. We don't memset BSS as
> +             * arch_xsplice_alloc_payload has zeroed it out for us.

Stale comment - the mentioned function doesn't exist anymore.
(Same elsewhere further down in xsplice.h.) And I really think
using memset() here would be better than using vzalloc_xen()
above. Or is there a particular reason to zero the whole area,
just to overwrite almost everything of it again right afterwards?
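Below is a sketch of the alternative raised here, built on a few assumptions: `PAGE_SIZE` is 4096, and the helper names are hypothetical. It computes the text, rw, and ro offsets with the same page alignment as the quoted `move_payload()`. Sections are then placed into a non-zeroing allocation: PROGBITS data is copied, and only NOBITS sections (e.g. .bss) are cleared with `memset()`.

```c
#include <stddef.h>
#include <string.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#endif
#ifndef PAGE_ALIGN
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#endif

struct layout {
    size_t rw_off;   /* offset of the rw region from the start of the allocation */
    size_t ro_off;   /* offset of the ro region */
    size_t pages;    /* total number of pages to allocate */
};

static struct layout compute_layout(size_t text_size, size_t rw_size, size_t ro_size)
{
    struct layout l;

    l.rw_off = PAGE_ALIGN(text_size);
    l.ro_off = l.rw_off + PAGE_ALIGN(rw_size);
    /* ro comes last, so rounding the total up covers its tail page. */
    l.pages = PAGE_ALIGN(l.ro_off + ro_size) / PAGE_SIZE;
    return l;
}

/* Place one section at dest: copy PROGBITS data, zero NOBITS (e.g. .bss). */
static void load_section(void *dest, const void *data, size_t size, int nobits)
{
    if ( nobits )
        memset(dest, 0, size);
    else
        memcpy(dest, data, size);
}
```

For example, with 100 bytes of text, 5000 of rw data and 10 of ro data, the rw region starts at 4096 and the ro region at 12288. The allocation spans 4 pages.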

> +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
> +{
> +    unsigned int i;
> +    int rc = 0;
> +
> +    ASSERT(elf->sym);
> +
> +    for ( i = 1; i < elf->nsym; i++ )
> +    {
> +        unsigned int idx = elf->sym[i].sym->st_shndx;
> +        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;

Well, I admit that this is the more straightforward solution, but it
opens up all of what sym points to for writing. I.e. I'd have
considered it much better to really only do the casting away of
const in the one spot where you need it (see below).

> +        switch ( idx )
> +        {
> +        case SHN_COMMON:
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
> +                    elf->name, elf->sym[i].name);
> +            rc = -EINVAL;
> +            break;
> +
> +        case SHN_UNDEF:
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
> +                    elf->name, elf->sym[i].name);
> +            rc = -ENOENT;
> +            break;
> +
> +        case SHN_ABS:
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
> +                    elf->name, elf->sym[i].name, sym->st_value);
> +            break;
> +
> +        default:
> +            if ( idx < elf->hdr->e_shnum &&
> +                 !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) )
> +                break;

I'm afraid I got confused on v8.1 and misguided you here. For some
reason I thought the check originally sat past the switch() statement,
which obviously it can't. With the now redundant idx range check it
is pretty clear that should really have remained where it was.

> +            /* SHN_COMMON and SHN_ABS are above. */
> +            if ( idx >= SHN_LORESERVE )
> +                rc = -EOPNOTSUPP;
> +            else if ( idx >= elf->hdr->e_shnum )
> +                rc = -EINVAL;
> +
> +            if ( rc )
> +            {
> +                dprintk(XENLOG_ERR, XSPLICE "%s: Unknown type=%#"PRIx16"\n",
> +                        elf->name, idx);
> +                break;
> +            }
> +
> +            sym->st_value += (unsigned long)elf->sec[idx].load_addr;

*(<type> *)&sym->st_value += ...

But anyway.
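Below is a sketch of the `default:` handling along the lines described above. The structures and the helper name are hypothetical cut-downs; `SHN_LORESERVE` is 0xff00 per the ELF specification. The range checks come first, and only then the SHF_ALLOC test, which may index the section array only once `idx` is known to be in range. The symbol is viewed through a `const` pointer, with const cast away only at the single store.

```c
#include <errno.h>
#include <stdint.h>

#ifndef SHN_LORESERVE
#define SHN_LORESERVE 0xff00
#endif
#ifndef SHF_ALLOC
#define SHF_ALLOC 0x2
#endif

/* Hypothetical cut-down views of a symbol and a loaded section. */
struct sym_sketch { uint64_t st_value; uint16_t st_shndx; };
struct sec_sketch { uint64_t sh_flags; uintptr_t load_addr; };

/* Handle a symbol whose st_shndx is not SHN_UNDEF, SHN_COMMON or SHN_ABS. */
static int resolve_section_symbol(const struct sym_sketch *sym,
                                  const struct sec_sketch *sec,
                                  unsigned int shnum)
{
    unsigned int idx = sym->st_shndx;

    if ( idx >= SHN_LORESERVE )
        return -EOPNOTSUPP;
    if ( idx >= shnum )
        return -EINVAL;

    /* Only now is sec[idx] known to be a valid index. */
    if ( !(sec[idx].sh_flags & SHF_ALLOC) )
        return 0;   /* section not loaded: leave st_value alone */

    /* Cast away const for this one field only. */
    *(uint64_t *)&sym->st_value += sec[idx].load_addr;
    return 0;
}
```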

> +int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
> +{
> +    struct xsplice_elf_sec *r, *base;
> +    unsigned int i;
> +    int rc = 0;
> +
> +    /*
> +     * The first entry of an ELF symbol table is the "undefined symbol index".
> +     * aka reserved so we skip it.
> +     */

Same comment as on v8.1: "This comment seems at least misplaced,
if not pointless."

Jan

* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-26 12:48       ` Ross Lagerwall
@ 2016-04-26 13:41         ` Jan Beulich
  2016-04-27  3:31           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 13:41 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	sasha.levin, xen-devel

>>> On 26.04.16 at 14:48, <ross.lagerwall@citrix.com> wrote:
> On 04/26/2016 01:41 PM, Jan Beulich wrote:
>>>>> On 26.04.16 at 13:06, <ross.lagerwall@citrix.com> wrote:
>>> On 04/25/2016 04:35 PM, Konrad Rzeszutek Wilk wrote:
>>>> @@ -142,6 +145,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
>>>>        return 0;
>>>>    }
>>>>
>>>> +static const char *xsplice_symbols_lookup(unsigned long addr,
>>>> +                                          unsigned long *symbolsize,
>>>> +                                          unsigned long *offset,
>>>> +                                          char *namebuf)
>>>> +{
>>>> +    const struct payload *data;
>>>> +    unsigned int i, best;
>>>> +    const void *va = (const void *)addr;
>>>> +    const char *n = NULL;
>>>> +
>>>> +    /*
>>>> +     * Only RCU locking since this list is only ever changed during apply
>>>> +     * or revert context. And in case it dies there we need an safe list.
>>>> +     */
>>>> +    rcu_read_lock(&rcu_applied_lock);
>>>> +    list_for_each_entry_rcu ( data, &applied_list, applied_list )
>>>> +    {
>>>> +        if ( va < data->text_addr &&
>>>> +             va >= (data->text_addr + data->pages * PAGE_SIZE) )
>>>
>>> This calculation is wrong due to the use of void * and results in
>>> incorrect backtrace results.
>>
>> When text_addr is void *, how is this calculation wrong then?
> 
> I'm sorry, ignore that. I temporarily forgot how void* arithmetic is 
> defined for GCC.
> 
> The other two points are still valid and may result in incorrect 
> backtraces with > 1 payload loaded.

Of course.

Jan


* Re: [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-04-25 15:34 ` [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
@ 2016-04-26 15:21   ` Jan Beulich
  2016-04-27  3:39     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 15:21 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser,
	Suravee Suthikulpanit, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Jun Nakajima, sasha.levin, xen-devel,
	Boris Ostrovsky

>>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> +static int check_special_sections(const struct xsplice_elf *elf)
> +{
> +    unsigned int i;
> +    static const char *const names[] = { ELF_XSPLICE_FUNC };
> +    bool_t count[ARRAY_SIZE(names)] = { 0 };
> +
> +    for ( i = 0; i < ARRAY_SIZE(names); i++ )
> +    {
> +        const struct xsplice_elf_sec *sec;
> +
> +        sec = xsplice_elf_sec_by_name(elf, names[i]);
> +        if ( !sec )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s is missing!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }
> +
> +        if ( !sec->sec->sh_size )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s is empty!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }
> +        if ( ++count[i] > 1 )

boolean values can only validly be 0 or 1. Just "if ( count[i] )" here
and ...

> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s was seen more than once!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }

"count[i] = 1;" here.

Thinking about it again, even more stack conserving would be a
bitmap...
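Below is a minimal sketch of the bitmap variant. It uses a single `unsigned int` as the bitmap, so it assumes at most 32 entries; the helper name is hypothetical. Each entry costs one bit rather than one `bool_t`, and the test-and-set replaces the `++count[i] > 1` idiom.

```c
#include <stdbool.h>

/*
 * Record that entry i has been seen; return true if it had been seen before.
 * Assumes i < 32 (one bit per entry in a single unsigned int).
 */
static bool test_and_mark_seen(unsigned int *seen, unsigned int i)
{
    unsigned int mask = 1u << i;
    bool was_set = *seen & mask;

    *seen |= mask;
    return was_set;
}
```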

> +static int apply_payload(struct payload *data)
> +{
> +    unsigned int i;
> +
> +    printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n",
> +            data->name, data->nfuncs);
> +
> +    arch_xsplice_patching_enter();
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        arch_xsplice_apply_jmp(&data->funcs[i]);
> +
> +    arch_xsplice_patching_leave();
> +
> +    list_add_tail_rcu(&data->applied_list, &applied_list);

Neither in the comment earlier on nor here does it become clear that
this is more of an abuse than a use of RCU.

> +struct xsplice_patch_func {
> +    const char *name;       /* Name of function to be patched. */
> +    void *new_addr;
> +    void *old_addr;
> +    uint32_t new_size;
> +    uint32_t old_size;
> +    uint8_t version;        /* MUST be XSPLICE_PAYLOAD_VERSION. */
> +    uint8_t opaque[31];     /* MUST be zero filled. */

I don't see the zero filling being a requirement, nor it being enforced.
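If zero filling is meant to be a requirement, it has to be enforced. Below is a minimal sketch of such a check, assuming a hypothetical helper run over each `funcs[i].opaque` when the payload is loaded. Returning `-EINVAL` mirrors the surrounding code but is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Reject the entry if any reserved byte is non-zero. */
static int check_zero_filled(const uint8_t *buf, size_t len)
{
    size_t i;

    for ( i = 0; i < len; i++ )
        if ( buf[i] )
            return -EINVAL;
    return 0;
}
```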

Jan


* Re: [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
  2016-04-25 15:35 ` [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
@ 2016-04-26 15:31   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 15:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> v9: old_code and new_code are void, so drop the unsigned long cast
>     and add void* - in both test-cases and document.
>     Make tests target on ARM phony
>     Add build dependencies on x86 build
>     Include public/sysctl.h as CONFIG_XSPLICE may not be exposed.

Quite a bit better, albeit I don't really understand the reference to
CONFIG_XSPLICE in this last bullet point. Irrespective of that

Acked-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address.
  2016-04-25 15:35 ` [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
@ 2016-04-26 15:48   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 15:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> --- a/xen/arch/x86/platform_hypercall.c
> +++ b/xen/arch/x86/platform_hypercall.c
> @@ -798,12 +798,13 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
>          static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */
>          XEN_GUEST_HANDLE(char) nameh;
>          uint32_t namelen, copylen;
> +        uint64_t addr;
>  
>          guest_from_compat_handle(nameh, op->u.symdata.name);
>  
>          ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type,
> -                           &op->u.symdata.address, name);
> -
> +                           &addr, name);
> +        op->u.symdata.address = addr;

Wasn't the whole point of this change to have the argument be
a pointer to unsigned long? Why is the type of the new local
variable then uint64_t?

With that adjusted:
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


* Re: [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup
  2016-04-25 15:35 ` [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup Konrad Rzeszutek Wilk
@ 2016-04-26 15:53   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 15:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> The current mechanism is geared towards fast virtual address ->
> symbol names lookup. This is fine for the normal use cases
> (BUG_ON, WARN_ON, etc), but for xSplice - where we need to find
> hypervisor symbols - it is slow.
> 
> To understand this patch, the existing method is described first.
> Readers already familiar with it can skip to 'NEW CODE:'.
> 
> HOW IT WORKS:
> 
> The symbol table lookup mechanism uses a simple encoding scheme
> that extracts the common ASCII character sequences that the symbols use.
> 
> This saves us space. The lookup mechanism is geared towards looking
> up symbols based on address. We have one table indexed 0..N (where N
> is the number of symbols, e.g. 6849):
> 
> symbols_addresses[0..N]
> 
> And a loose 1-1 mapping of the (encoded) symbols in a
> symbols_names stream of size N.
> 
> N is variable (more on that below).
> 
> The symbols_names are sorted based on symbols_addresses, which
> means that the decoded entries inside symbols_names are not in
> ascending or descending order.
> 
> There is also the encoding mechanism - the table of 255 entries
> called symbols_token_index[], and the symbols_token_table, which
> is a stream of ASCIIZ strings (it is not really a table, as the
> entries vary in length), such as:
> 
> @0   .asciz  "credit"
> @7   .asciz  "mask"
> ..
> @300 .asciz  "S"
> 
> And the symbols_token_index:
> @0        .short  0
> @1        .short  7
> @2        .short  12
> @3        .short  16
> ...
> @84         .short  300
> 
> The relationship between them is that the symbols_token_index
> gives us the offset to symbols_token_table.
> 
> The symbol_names[] array is a stream of encoded values. Each value
> follows the same pattern - <len> followed by <encoding values>,
> then another <len> followed by <encoding values>, and so on.
> 
> Hence to find the right one you need to read <len>, add <len>
> (to skip over), read <len>, add <len>, and so on until one
> finds the right tuple offset.
> 
> The <encoding values> are the indices into the symbols_token_index.
> 
> Meaning if you have:
>   0x04, 0x54, 0xda, 0xe2, 0x74
>   [4, 84, 218, 226, 116 in human numbering]
> 
> The 0x04 tells us that the symbol is four bytes past this one (so the next
> symbol offset starts at 5). If we look up symbols_token_index[84] we get 300.
> symbols_token_table[300] gets us the "S". And so on; the string eventually
> decodes to 'S_stext'. The first character is the type, optionally followed
> by the filename (with a # right after the filename), and lastly the
> symbol, such as:
> 
> tvpmu_intel.c#core2_vpmu_do_interrupt
> 
> Keep in mind that there are two fixed sized tables:
> symbols_addresses[0..symbols_num_syms], and
> symbols_markers[0..symbols_num_syms/255].
> 
> The symbols_markers table is used to speed up searching for the right
> address. It gives us the offsets within symbol_names at which a
> <len><encoded value> tuple starts.
> 
> The way to find a symbol based on the address is:
> 1) Figure out the 'tuple offset' from symbols_address[0..symbols_num_syms].
>    This table is sorted by virtual addresses so finding the value is simple.
> 2) Get the starting offset within symbol_names by retrieving the value of
>    symbol_markers['tuple offset' / 255].
> 3) Iterate up to 'tuple offset' & 255 entries in the symbol_names stream,
>    starting at that offset.
> 4) Decode the <len><encoded value>.
> 
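Steps 3 and 4 above can be sketched as follows. The tables are tiny hypothetical stand-ins, not real Xen build output. The token strings are chosen so that the first entry reproduces the 'S_stext' example. The marker lookup of step 2 is omitted, so skipping starts at offset 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token table: "S", "_s" and "text" stored back to back. */
static const char token_table[] = "S\0_s\0text";
static const unsigned short token_index[] = { 0, 2, 5 };

/* Hypothetical names stream: each entry is <len> then <len> token indices. */
static const unsigned char names[] = {
    3, 0, 1, 2,   /* "S" "_s" "text" -> "S_stext" (type 'S', name "_stext") */
    2, 0, 1,      /* "S" "_s"        -> "S_s" */
};

/* Step 3: skip n entries from off by reading <len> and jumping past it. */
static size_t skip_entries(size_t off, unsigned int n)
{
    while ( n-- )
        off += names[off] + 1;
    return off;
}

/* Step 4: expand the entry at off into out (which must be large enough). */
static void expand_entry(size_t off, char *out)
{
    unsigned int len = names[off++];

    *out = '\0';
    while ( len-- )
        strcat(out, &token_table[token_index[names[off++]]]);
}
```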
> This however does not work very well if we want to search the other
> way - we have the symbol name and want to find the address.
> 
> NEW CODE:
> 
> To make that work we add one fixed size table called symbols_sorted_offsets,
> each entry of which has two elements: the offset in the symbol stream and
> the offset in the symbol-address table.
> 
> This whole array is sorted on the original symbol name during build-time
> (in case of collision we also take into account the type).
> 
> The values are for example:
> 
> symbols_sorted_offsets:
>     .long 83363, 6302 # [.bss, len=5]
>     .long 80459, 6084 # [.data, len=5]
> ..
> [The # added for clarity]
> 
> This makes it easy to index into symbols_names and also
> symbols_addresses (or symbols_offsets).
> 
> Searching for symbols is simplified as we can do a binary search
> on symbols_sorted_offsets. Since the symbols are sorted it takes on
> average 13 calls to symbols_expand_symbol.
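Below is a sketch of the name-to-address binary search this enables, over a hypothetical pre-sorted array. The real table stores an offset into the compressed stream plus an index into the addresses, and breaks name ties by type. Here each entry holds a plain name and address so the example is self-contained. With 6849 symbols, log2(6849) is about 12.7, consistent with the quoted 13 expansions per lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-decoded stand-in for a symbols_sorted_offsets entry. */
struct sorted_sym {
    const char *name;
    unsigned long addr;
};

/* Binary search over entries sorted by name; returns 0 if not found. */
static unsigned long lookup_by_name(const struct sorted_sym *tab, size_t n,
                                    const char *name)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, tab[mid].name);

        if ( cmp == 0 )
            return tab[mid].addr;
        if ( cmp < 0 )
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```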
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

* Re: [PATCH v9 17/27] xsplice: Add support for bug frames.
  2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
  2016-04-26 11:05   ` Ross Lagerwall
@ 2016-04-26 15:58   ` Jan Beulich
  1 sibling, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 15:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Add support for handling bug frames contained within xsplice modules. If a
> trap occurs, search either the kernel bug table or an applied payload's
> bug table, depending on the instruction pointer.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH v9 18/27] xsplice: Add support for exception tables.
  2016-04-25 15:35 ` [PATCH v9 18/27] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
@ 2016-04-26 16:01   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-26 16:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Add support for exception tables contained within xSplice payloads. If an
> exception occurs, search either the main exception table or a particular
> active payload's exception table, depending on the instruction pointer.
> 
> Also we add a test case to make sure we have an exception that
> is handled.
> 
> To avoid growing the code base when xSplice is not compiled in, we add
> certain #defines to help determine whether code needs to be __init
> or not.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-26 10:21   ` Jan Beulich
@ 2016-04-26 17:50     ` Konrad Rzeszutek Wilk
  2016-04-27  6:51       ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-26 17:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, ross.lagerwall, andrew.cooper3,
	Ian Jackson, mpohlack, sasha.levin, xen-devel, Daniel De Graaf

On Tue, Apr 26, 2016 at 04:21:10AM -0600, Jan Beulich wrote:
> >>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> > The implementation does not actually do any patching.
> > 
> > It just adds the framework for doing the hypercalls,
> > keeping track of ELF payloads, and the basic operations:
> >  - query which payloads exist,
> >  - query for specific payloads,
> >  - check*1, apply*1, replace*1, and unload payloads.
> > 
> > *1: Which of course in this patch are nops.
> > 
> > The functionality is disabled on ARM until all arch
> > components are implemented.
> > 
> > Also by default it is disabled until the implementation
> > is in place.
> > 
> > We also use recursive spinlocks so that the find_payload
> > function does not need to have a 'lock' and 'non-lock' variant.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> 
> I'm hesitant to say that, but with all of this:
> 
> > v9:
> >     s/find_name/get_name/, drop locks when allocating data.
> >     Drop conditional expression on copyback
> >     Move the allocation on upload outside the spinlock.
> >     Add (TECH PREVIEW) to the Kconfig help
> >     Return -EINVAL if the CHECK or UNLOAD action is to be performed and the payload
> >     state is not in expected state.
> >     Print 'c' not 'u' when invoking the keyhandler.
> 
> ... I'm not sure the earlier R-b can still be considered valid. Andrew?

I don't know what the criteria are for dropping a Reviewed-by.
I am happy to drop it if you would like - but it may be that Andrew
is still OK with his earlier review?

Or is this more your view as maintainer - that the patch has
changed considerably (and what counts as that? A percentage of the patch?
A small amount of the patch? Trivial changes? Dropping code?)?
> 
> > +static int get_name(const xen_xsplice_name_t *name, char *n)
> > +{
> > +    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
> > +        return -EINVAL;
> > +
> > +    if ( name->pad[0] || name->pad[1] || name->pad[2] )
> > +        return -EINVAL;
> > +
> > +    if ( !guest_handle_okay(name->name, name->size) )
> > +        return -EINVAL;
> > +
> > +    if ( __copy_from_guest(n, name->name, name->size) )
> > +        return -EFAULT;
> 
> Quoting part of my v8.1 reply:
> "Is there a particular reason why you open code copy_from_guest() here?"

You mean why I use guest_handle_okay and __copy_from_guest instead of,
say, copy_from_guest?

I think it is an artifact of earlier changes, in which find_name
would only check 'name->size' and then another function would
just do '__copy_from_guest'. That split is not needed anymore, so let
me change it to 'copy_from_guest'.
I thought at some point you had asked for that, since the check had
already been done once and there was no point in repeating it.
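A user-space sketch of the checks get_name() makes once it uses a single copy call. The struct layout, the 128-byte NAME_SIZE and the final NUL-termination check are assumptions made for illustration only; memcpy() stands in for copy_from_guest(), which, unlike the __-prefixed variant, performs the guest-handle access check itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for XEN_XSPLICE_NAME_SIZE and xen_xsplice_name_t. */
#define NAME_SIZE 128

struct xsplice_name {
    const char *name;   /* stands in for the guest handle */
    uint16_t size;      /* bytes to copy, including the NUL */
    uint16_t pad[3];    /* must be zero */
};

/* Validate the descriptor, then copy the name into n (NAME_SIZE bytes). */
static int get_name(const struct xsplice_name *name, char *n)
{
    assert(name && n);

    if ( !name->size || name->size > NAME_SIZE )
        return -EINVAL;

    if ( name->pad[0] || name->pad[1] || name->pad[2] )
        return -EINVAL;

    /* In Xen, one copy_from_guest() call does the access check and copy. */
    if ( !name->name )
        return -EFAULT;
    memcpy(n, name->name, name->size);

    /* Sketch-only assumption: the copied name must be NUL-terminated. */
    if ( n[name->size - 1] )
        return -EINVAL;

    return 0;
}
```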
> 
> > +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> > +{
> > +    struct payload *data, *found;
> > +    char n[XEN_XSPLICE_NAME_SIZE];
> > +    int rc;
> > +
> > +    rc = verify_payload(upload, n);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    data = xzalloc(struct payload);
> > +
> > +    spin_lock(&payload_lock);
> > +
> > +    found = find_payload(n);
> > +    if ( IS_ERR(found) )
> > +    {
> > +        rc = PTR_ERR(found);
> > +        goto out;
> > +    }
> > +    else if ( found )
> > +    {
> > +        rc = -EEXIST;
> > +        goto out;
> > +    }
> > +
> > +    if ( !data )
> > +    {
> > +        rc = -ENOMEM;
> > +        goto out;
> > +    }
> > +
> > +    rc = 0;
> 
> rc is already zero by the time we get here.
> 
> I also wonder whether the code wouldn't be easier to read if you
> used just a sequence of if()/else if() here, without any goto-s.

But I do need to xfree(data) and unlock the spinlock, so having
a common exit path to pass through makes sense.

Unless you mean making the normal path conditional on if ( !rc )?
Like so:

    rc = verify_payload(upload, n);
    if ( rc )
        return rc;

    data = xzalloc(struct payload);

    spin_lock(&payload_lock);

    found = find_payload(n);
    if ( IS_ERR(found) )
        rc = PTR_ERR(found);
    else if ( found )
        rc = -EEXIST;

    if ( !rc && !data )
        rc = -ENOMEM;

    if ( !rc )
    {
        memcpy(data->name, n, strlen(n));
        data->state = XSPLICE_STATE_CHECKED;
        INIT_LIST_HEAD(&data->list);

        list_add_tail(&data->list, &payload_list);
        payload_cnt++;
        payload_version++;
    }

    spin_unlock(&payload_lock);

    if ( rc )
        xfree(data);

    return rc;


That looks fine here, but in the subsequent patch I also have to
check for

if ( __copy_from_guest(raw_data, upload->payload, upload->size) )

and
rc = load_payload_data(data, raw_data, upload->size);

and the goto statements help a lot there.

I would rather have it the way it is now if you are OK with that?
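A self-contained sketch of the single-exit pattern being argued for. It assumes a toy fixed-size registry and uses a plain flag in place of payload_lock (there is no real locking here); all names are hypothetical. The allocation happens before the "lock" is taken, every failure jumps to one out: label that drops it, and the buffer is freed only on failure. This is the shape that stays readable once more failing steps, such as the copy and load_payload_data(), join the function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAYLOADS 4
#define NAME_LEN     32

struct payload {
    char name[NAME_LEN];
};

/* Toy registry; 'locked' only stands in for payload_lock. */
static struct payload *payloads[MAX_PAYLOADS];
static unsigned int payload_cnt;
static bool locked;

static struct payload *find_payload(const char *n)
{
    for ( unsigned int i = 0; i < payload_cnt; i++ )
        if ( !strcmp(payloads[i]->name, n) )
            return payloads[i];
    return NULL;
}

static int upload(const char *n)
{
    struct payload *data;
    int rc = 0;

    assert(!locked);
    if ( strlen(n) >= NAME_LEN )
        return -EINVAL;

    /* Allocate outside the lock. */
    data = calloc(1, sizeof(*data));

    locked = true;                      /* spin_lock(&payload_lock) */

    if ( find_payload(n) )
    {
        rc = -EEXIST;
        goto out;
    }
    if ( !data )
    {
        rc = -ENOMEM;
        goto out;
    }
    if ( payload_cnt == MAX_PAYLOADS )
    {
        rc = -ENOSPC;                   /* stands in for a later failing step */
        goto out;
    }

    strcpy(data->name, n);
    payloads[payload_cnt++] = data;

 out:
    locked = false;                     /* spin_unlock(&payload_lock) */
    if ( rc )
        free(data);                     /* free(NULL) is a no-op */
    return rc;
}
```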




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-26 13:39   ` Jan Beulich
@ 2016-04-27  1:47     ` Konrad Rzeszutek Wilk
  2016-04-27  7:57       ` Jan Beulich
  2016-04-27  3:28     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  1:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> > +{
.. snip..
> > +    /* Compute size of different regions. */
> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> > +    {
> > +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> > +             (SHF_ALLOC|SHF_EXECINSTR) )
> > +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
> 
> This silently accepts writable text sections, yet the portion of the
> memory this gets placed in will be mapped RX.

I am not sure I follow. We only accept if sh_flags have AX. Not WAX?
How am I accepting writable text sections?
> 
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> > +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> > +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> > +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> > +        else if ( !elf->sec[i].sec->sh_flags ||
> > +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> > +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> > +            /* Do nothing.*/;
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> > +        {
> > +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> > +                    elf->name, elf->sec[i].name);
> > +            rc = -EOPNOTSUPP;
> > +            goto out;
> > +        }
> 
> I saw this in the changelog, but I don't really understand these last
> two conditionals. Wouldn't you want to bail on _any_ sections which

The first (/Do nothing/) is for sections such as .rela.* (which we can
ditch after we are done), .symtab and .strtab (of which build_symbol_table
constructs a copy in later patches), and:

[ 1] .note.gnu.build-i NOTE 0000000000000000  00000040
       0000000000000024  0000000000000000   A       0     0     4

whose value we just copy into struct payload->id
(also in a later patch).
> have SHF_ALLOC set but don't get mapped to one of the three
> blocks? And wouldn't you (silently) ignore any sections with SHF_ALLOC
> clear?

Correct, such as:
 [29] .shstrtab         STRTAB           0000000000000000  000002fe
       0000000000000143  0000000000000000           0     0     1

I've updated the comments to make this clearer.
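A small sketch of why the text-section test quoted above admits writable sections. The flag values are the ones from the ELF specification (SHF_WRITE 0x1, SHF_ALLOC 0x2, SHF_EXECINSTR 0x4), defined locally under hypothetical names; is_text_as_posted() and classify() are illustrative helpers, not code from the patch. Masking with (SHF_ALLOC|SHF_EXECINSTR) before comparing ignores SHF_WRITE, so a WAX section passes, and rejecting it needs an explicit check.

```c
#include <assert.h>
#include <stdint.h>

/* Section flag values from the ELF specification. */
#define F_WRITE     0x1u
#define F_ALLOC     0x2u
#define F_EXECINSTR 0x4u

enum region { REGION_NONE, REGION_TEXT, REGION_RW, REGION_RO, REGION_BAD };

/* The test as posted: SHF_WRITE is not part of the mask. */
static int is_text_as_posted(uint64_t flags)
{
    return (flags & (F_ALLOC | F_EXECINSTR)) == (F_ALLOC | F_EXECINSTR);
}

/* Stricter variant: writable executable sections are rejected outright. */
static enum region classify(uint64_t flags)
{
    if ( !(flags & F_ALLOC) )
        return REGION_NONE;             /* not loaded, e.g. .rela.*, .symtab */
    if ( flags & F_EXECINSTR )
        return (flags & F_WRITE) ? REGION_BAD : REGION_TEXT;
    return (flags & F_WRITE) ? REGION_RW : REGION_RO;
}
```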

..snip..
> > +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
> > +{
> > +    unsigned int i;
> > +    int rc = 0;
> > +
> > +    ASSERT(elf->sym);
> > +
> > +    for ( i = 1; i < elf->nsym; i++ )
> > +    {
> > +        unsigned int idx = elf->sym[i].sym->st_shndx;
> > +        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;
> 
> Well, I admit that this is the more straightforward solution, but it
> opens up all of what sym points to for writing. I.e. I'd have
> considered it much better to really only do the casting away of
> const in the one spot where you need it (see below).

OK. That may become a bit cumbersome. In the later patch
(xsplice,symbols: Implement symbol name resolution on address)
the SHN_UNDEF case does a symbol lookup, and that one sets
sym->st_value twice.

I can certainly cast it twice there, and then once in the default
case if you would like.

> 
> > +        switch ( idx )
> > +        {
> > +        case SHN_COMMON:
> > +            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
> > +                    elf->name, elf->sym[i].name);
> > +            rc = -EINVAL;
> > +            break;
> > +
> > +        case SHN_UNDEF:
> > +            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
> > +                    elf->name, elf->sym[i].name);
> > +            rc = -ENOENT;
> > +            break;
..snip..
> > +        default:

..snip..
> > +
> > +            sym->st_value += (unsigned long)elf->sec[idx].load_addr;
> 
> *(<type> *)&sym->st_value += ...

Right, so I can cast it to non-const and write through it. Keep in mind
that in this patch it is only one place, but in later patches I would
have to do this twice.

What is your preference?
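A sketch of the direction suggested in the review, using a hypothetical, simplified symbol type: the pointer to the symbol stays const, and the const is cast away only on the field being written, inside one helper. Funnelling the write through a single helper keeps the cast in one spot even if the SHN_UNDEF path later assigns st_value twice. This is only well defined because the symbol table lives in a writable buffer; writing through such a cast to an object that was itself defined const is undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Elf64_Sym. */
struct sym {
    uint32_t st_name;
    uint16_t st_shndx;
    uint64_t st_value;
};

/* The one place where constness is cast away. */
static void sym_set_value(const struct sym *sym, uint64_t value)
{
    *(uint64_t *)&sym->st_value = value;
}

/* Relocate a section-relative symbol by its section's load address. */
static void resolve(const struct sym *sym, uint64_t load_addr)
{
    assert(sym);
    sym_set_value(sym, sym->st_value + load_addr);
}
```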
> 


* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-26 12:37   ` Jan Beulich
@ 2016-04-27  1:59     ` Konrad Rzeszutek Wilk
  2016-04-27  7:27       ` Jan Beulich
  2016-04-27  4:06     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  1:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

> > +static int xsplice_header_check(const struct xsplice_elf *elf)
> > +{
> > +    const Elf_Ehdr *hdr = elf->hdr;
> > +
> > +    if ( sizeof(*elf->hdr) > elf->len )
> > +    {
> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
> > +                elf->name);
> > +        return -EINVAL;
> > +    }
> > +
> > +    if ( !IS_ELF(*hdr) )
> > +    {
> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
> > +        return -EINVAL;
> > +    }
> > +
> > +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> > +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> > +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> 
> What about EI_VERSION and EI_ABIVERSION, btw?

While prototyping on ARM32 I realized that the EI_CLASS check is wrong
in common code (ARM32 uses ELFCLASS32), and so is the EI_ABIVERSION
one.

So the EI_CLASS check moves to arch/x86/xsplice.c (not in this
patch, but in "xsplice: Implement payload loading").
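A sketch of an e_ident check that also covers the two fields asked about, with the architecture-specific class passed in by the caller, in line with moving EI_CLASS into arch code. The index and value constants follow the ELF specification but are defined locally; check_ident() is a hypothetical name, and requiring an ABI version of 0 and returning -EOPNOTSUPP for "valid ELF, unsupported flavour" are choices made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* e_ident indices and values from the ELF specification. */
enum {
    ID_CLASS = 4, ID_DATA = 5, ID_VERSION = 6, ID_OSABI = 7,
    ID_ABIVERSION = 8, ID_NIDENT = 16,
};
#define CLASS32     1
#define CLASS64     2
#define DATA2LSB    1
#define VER_CURRENT 1
#define OSABI_SYSV  0

/* Common checks; 'elf_class' is supplied by the architecture. */
static int check_ident(const unsigned char *id, size_t len,
                       unsigned char elf_class)
{
    assert(id);

    if ( len < ID_NIDENT || memcmp(id, "\177ELF", 4) )
        return -EINVAL;                 /* too short or bad magic */

    if ( id[ID_CLASS] != elf_class ||
         id[ID_DATA] != DATA2LSB ||
         id[ID_VERSION] != VER_CURRENT ||
         id[ID_OSABI] != OSABI_SYSV ||
         id[ID_ABIVERSION] != 0 )
        return -EOPNOTSUPP;

    return 0;
}
```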



* Re: [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-26 10:47   ` Jan Beulich
@ 2016-04-27  2:38     ` Konrad Rzeszutek Wilk
  2016-04-27  7:12       ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  2:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	Ian Jackson, Tim Deegan, mpohlack, Julien Grall, sasha.levin,
	xen-devel

> With vm_alloc() getting removed, vm_free() should get removed
> here too. And with that, vm_alloc_type() and vm_free_type() can
> then just become vm_alloc() and vm_free() respectively (as static
> internal functions).

Please take a look at this inline one:

From 1c133365d98a02c8f5131cdcde11960623fa247a Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 26 Apr 2016 14:03:06 -0400
Subject: [PATCH] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type

For users who want to use virtual addresses within the
hypervisor's code/data region address space, these three new
functions allow that.

Implementation wise the vmap API keeps track of two virtual
address regions now:
 a) VMAP_VIRT_START
 b) Any provided virtual address space (need start and end).

a) is the default, and the behavior for existing users
of vmalloc, vmap, etc. is unchanged.

To use b), one only has to call vm_init_type to initialize
it and v[m|z]alloc_xen to allocate from it (vfree and vunmap are
capable of searching both address spaces).

This allows users (such as xSplice) to provide their own
mechanism to change the page flags, and also to use virtual
addresses closer to the hypervisor virtual addresses (at least
on x86) without having to deal with the allocation of
pages.

For an example user, see the patch titled "xsplice: Implement payload
loading", where we parse the payload's ELF relocations, which are
defined to be signed 32-bit on x86 (hence the maximum displacement
is 2GB of virtual address space; on ARM32 it is 128MB). On x86 the
displacement from the hypervisor virtual addresses to the vmalloc
area is more than 32 bits, which means that the ELF relocations would
truncate the 33rd and 34th bits. Hence this alternate API.
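The truncation problem can be made concrete with a sketch of the overflow check an x86-64 R_X86_64_PC32 relocation needs: the value S + A - P must fit in a signed 32-bit field. pc32_fits() is a hypothetical helper, and the addresses in the usage are illustrative only, not Xen's actual layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * R_X86_64_PC32 stores S + A - P in a signed 32-bit field.  Return 1 and
 * store the value if it fits without truncation, 0 otherwise.
 */
static int pc32_fits(uint64_t s, int64_t a, uint64_t p, int32_t *out)
{
    int64_t val = (int64_t)(s - p) + a;

    assert(out);
    if ( val < INT32_MIN || val > INT32_MAX )
        return 0;
    *out = (int32_t)val;
    return 1;
}
```

A target 1MB away from the patch site fits; one 4GB away does not, which is the situation the alternate allocation API avoids.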

We also add extra checks in case the b) range has not been
initialized.

This patch also removes the 'vm_alloc' and 'vm_free'
declarations, as they have no users outside vmap.c.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>

v4: New patch.
v5: Update per Jan's comments.
v6: Drop the stray parentheses on typedefs.
    Ditch the vunmap callback. Stash away the virtual addresses in lists.
    Ditch the vmap callback. Just provide virtual address.
    Ditch the vmalloc_range. Require users of alternative virtual address
    to call vmap_init_type first.
v7: Don't expose the vmalloc_type and such. Instead provide a wrapper
    called vmalloc_xen for those.
    Rename the enum, change one of the names.
    Moved the vunmap_type around in c file so we don't have to declare
    it in the header.
v9: Remove the vunmap_xen, removed vm_alloc from header.
    Add vzalloc_xen
v10:
    Properly ASSERT on ranges
    Make vm_free and vunmap automatically detect the right va space.
    Remove from header vm_free. Rename vm_alloc_type and vm_free_type
    to vm_alloc and vm_free respectively.
---
 xen/arch/arm/kernel.c  |   2 +-
 xen/arch/arm/mm.c      |   2 +-
 xen/arch/x86/mm.c      |   2 +-
 xen/common/vmap.c      | 212 +++++++++++++++++++++++++++++++------------------
 xen/drivers/acpi/osl.c |   2 +-
 xen/include/xen/vmap.h |  22 +++--
 6 files changed, 156 insertions(+), 86 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 61808ac..9871bd9 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -299,7 +299,7 @@ static __init int kernel_decompress(struct bootmodule *mod)
         return -ENOMEM;
     }
     mfn = _mfn(page_to_mfn(pages));
-    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR);
+    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 
     rc = perform_gunzip(output, input, size);
     clean_dcache_va_range(output, output_size);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7065c3e..94ea054 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -807,7 +807,7 @@ void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
     mfn_t mfn = _mfn(PFN_DOWN(pa));
     unsigned int offs = pa & (PAGE_SIZE - 1);
     unsigned int nr = PFN_UP(offs + len);
-    void *ptr = __vmap(&mfn, nr, 1, 1, attributes);
+    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
 
     if ( ptr == NULL )
         return NULL;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a42097f..2bb920b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6179,7 +6179,7 @@ void __iomem *ioremap(paddr_t pa, size_t len)
         unsigned int offs = pa & (PAGE_SIZE - 1);
         unsigned int nr = PFN_UP(offs + len);
 
-        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE) + offs;
+        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, VMAP_DEFAULT) + offs;
     }
 
     return (void __force __iomem *)va;
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 134eda0..9c8782c 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -10,40 +10,43 @@
 #include <asm/page.h>
 
 static DEFINE_SPINLOCK(vm_lock);
-static void *__read_mostly vm_base;
-#define vm_bitmap ((unsigned long *)vm_base)
+static void *__read_mostly vm_base[VMAP_REGION_NR];
+#define vm_bitmap(x) ((unsigned long *)vm_base[x])
 /* highest allocated bit in the bitmap */
-static unsigned int __read_mostly vm_top;
+static unsigned int __read_mostly vm_top[VMAP_REGION_NR];
 /* total number of bits in the bitmap */
-static unsigned int __read_mostly vm_end;
+static unsigned int __read_mostly vm_end[VMAP_REGION_NR];
 /* lowest known clear bit in the bitmap */
-static unsigned int vm_low;
+static unsigned int vm_low[VMAP_REGION_NR];
 
-void __init vm_init(void)
+void __init vm_init_type(enum vmap_region type, void *start, void *end)
 {
     unsigned int i, nr;
     unsigned long va;
 
-    vm_base = (void *)VMAP_VIRT_START;
-    vm_end = PFN_DOWN(arch_vmap_virt_end() - vm_base);
-    vm_low = PFN_UP((vm_end + 7) / 8);
-    nr = PFN_UP((vm_low + 7) / 8);
-    vm_top = nr * PAGE_SIZE * 8;
+    ASSERT(!vm_base[type]);
 
-    for ( i = 0, va = (unsigned long)vm_bitmap; i < nr; ++i, va += PAGE_SIZE )
+    vm_base[type] = start;
+    vm_end[type] = PFN_DOWN(end - start);
+    vm_low[type]= PFN_UP((vm_end[type] + 7) / 8);
+    nr = PFN_UP((vm_low[type] + 7) / 8);
+    vm_top[type] = nr * PAGE_SIZE * 8;
+
+    for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
         struct page_info *pg = alloc_domheap_page(NULL, 0);
 
         map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
         clear_page((void *)va);
     }
-    bitmap_fill(vm_bitmap, vm_low);
+    bitmap_fill(vm_bitmap(type), vm_low[type]);
 
     /* Populate page tables for the bitmap if necessary. */
-    populate_pt_range(va, 0, vm_low - nr);
+    populate_pt_range(va, 0, vm_low[type] - nr);
 }
 
-void *vm_alloc(unsigned int nr, unsigned int align)
+static void *vm_alloc(unsigned int nr, unsigned int align,
+                      enum vmap_region t)
 {
     unsigned int start, bit;
 
@@ -52,27 +55,31 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     else if ( align & (align - 1) )
         align &= -align;
 
+    ASSERT((t >= VMAP_DEFAULT) && (t < VMAP_REGION_NR));
+    if ( !vm_base[t] )
+        return NULL;
+
     spin_lock(&vm_lock);
     for ( ; ; )
     {
         struct page_info *pg;
 
-        ASSERT(vm_low == vm_top || !test_bit(vm_low, vm_bitmap));
-        for ( start = vm_low; start < vm_top; )
+        ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
+        for ( start = vm_low[t]; start < vm_top[t]; )
         {
-            bit = find_next_bit(vm_bitmap, vm_top, start + 1);
-            if ( bit > vm_top )
-                bit = vm_top;
+            bit = find_next_bit(vm_bitmap(t), vm_top[t], start + 1);
+            if ( bit > vm_top[t] )
+                bit = vm_top[t];
             /*
              * Note that this skips the first bit, making the
              * corresponding page a guard one.
              */
             start = (start + align) & ~(align - 1);
-            if ( bit < vm_top )
+            if ( bit < vm_top[t] )
             {
                 if ( start + nr < bit )
                     break;
-                start = find_next_zero_bit(vm_bitmap, vm_top, bit + 1);
+                start = find_next_zero_bit(vm_bitmap(t), vm_top[t], bit + 1);
             }
             else
             {
@@ -82,12 +89,12 @@ void *vm_alloc(unsigned int nr, unsigned int align)
             }
         }
 
-        if ( start < vm_top )
+        if ( start < vm_top[t] )
             break;
 
         spin_unlock(&vm_lock);
 
-        if ( vm_top >= vm_end )
+        if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
         pg = alloc_domheap_page(NULL, 0);
@@ -96,23 +103,23 @@ void *vm_alloc(unsigned int nr, unsigned int align)
 
         spin_lock(&vm_lock);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
-            unsigned long va = (unsigned long)vm_bitmap + vm_top / 8;
+            unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
             if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
-                vm_top += PAGE_SIZE * 8;
-                if ( vm_top > vm_end )
-                    vm_top = vm_end;
+                vm_top[t] += PAGE_SIZE * 8;
+                if ( vm_top[t] > vm_end[t] )
+                    vm_top[t] = vm_end[t];
                 continue;
             }
         }
 
         free_domheap_page(pg);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
             spin_unlock(&vm_lock);
             return NULL;
@@ -120,47 +127,58 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     }
 
     for ( bit = start; bit < start + nr; ++bit )
-        __set_bit(bit, vm_bitmap);
-    if ( bit < vm_top )
-        ASSERT(!test_bit(bit, vm_bitmap));
+        __set_bit(bit, vm_bitmap(t));
+    if ( bit < vm_top[t] )
+        ASSERT(!test_bit(bit, vm_bitmap(t)));
     else
-        ASSERT(bit == vm_top);
-    if ( start <= vm_low + 2 )
-        vm_low = bit;
+        ASSERT(bit == vm_top[t]);
+    if ( start <= vm_low[t] + 2 )
+        vm_low[t] = bit;
     spin_unlock(&vm_lock);
 
-    return vm_base + start * PAGE_SIZE;
+    return vm_base[t] + start * PAGE_SIZE;
 }
 
-static unsigned int vm_index(const void *va)
+static unsigned int vm_index(const void *va, enum vmap_region type)
 {
     unsigned long addr = (unsigned long)va & ~(PAGE_SIZE - 1);
     unsigned int idx;
+    unsigned long start = (unsigned long)vm_base[type];
 
-    if ( addr < VMAP_VIRT_START + (vm_end / 8) ||
-         addr >= VMAP_VIRT_START + vm_top * PAGE_SIZE )
+    if ( !start )
         return 0;
 
-    idx = PFN_DOWN(va - vm_base);
-    return !test_bit(idx - 1, vm_bitmap) &&
-           test_bit(idx, vm_bitmap) ? idx : 0;
+    if ( addr < start + (vm_end[type] / 8) ||
+         addr >= start + vm_top[type] * PAGE_SIZE )
+        return 0;
+
+    idx = PFN_DOWN(va - vm_base[type]);
+    return !test_bit(idx - 1, vm_bitmap(type)) &&
+           test_bit(idx, vm_bitmap(type)) ? idx : 0;
 }
 
-static unsigned int vm_size(const void *va)
+static unsigned int vm_size(const void *va, enum vmap_region type)
 {
-    unsigned int start = vm_index(va), end;
+    unsigned int start = vm_index(va, type), end;
 
     if ( !start )
         return 0;
 
-    end = find_next_zero_bit(vm_bitmap, vm_top, start + 1);
+    end = find_next_zero_bit(vm_bitmap(type), vm_top[type], start + 1);
 
-    return min(end, vm_top) - start;
+    return min(end, vm_top[type]) - start;
 }
 
-void vm_free(const void *va)
+static void vm_free(const void *va)
 {
-    unsigned int bit = vm_index(va);
+    enum vmap_region type = VMAP_DEFAULT;
+    unsigned int bit = vm_index(va, type);
+
+    if ( !bit )
+    {
+        type = VMAP_XEN;
+        bit = vm_index(va, type);
+    }
 
     if ( !bit )
     {
@@ -169,29 +187,55 @@ void vm_free(const void *va)
     }
 
     spin_lock(&vm_lock);
-    if ( bit < vm_low )
+    if ( bit < vm_low[type] )
     {
-        vm_low = bit - 1;
-        while ( !test_bit(vm_low - 1, vm_bitmap) )
-            --vm_low;
+        vm_low[type] = bit - 1;
+        while ( !test_bit(vm_low[type] - 1, vm_bitmap(type)) )
+            --vm_low[type];
     }
-    while ( __test_and_clear_bit(bit, vm_bitmap) )
-        if ( ++bit == vm_top )
+    while ( __test_and_clear_bit(bit, vm_bitmap(type)) )
+        if ( ++bit == vm_top[type] )
             break;
     spin_unlock(&vm_lock);
 }
 
+static void vunmap_pages(const void *va, unsigned int pages)
+{
+#ifndef _PAGE_NONE
+    unsigned long addr = (unsigned long)va;
+
+    destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
+#else /* Avoid tearing down intermediate page tables. */
+    map_pages_to_xen((unsigned long)va, 0, pages, _PAGE_NONE);
+#endif
+    vm_free(va);
+}
+
+void vunmap(const void *va)
+{
+    enum vmap_region type = VMAP_DEFAULT;
+    unsigned int pages = vm_size(va, type);
+
+    if ( !pages )
+    {
+        type = VMAP_XEN;
+        pages = vm_size(va, type);
+    }
+    vunmap_pages(va, pages);
+}
+
 void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags)
+             unsigned int nr, unsigned int align, unsigned int flags,
+             enum vmap_region type)
 {
-    void *va = vm_alloc(nr * granularity, align);
+    void *va = vm_alloc(nr * granularity, align, type);
     unsigned long cur = (unsigned long)va;
 
     for ( ; va && nr--; ++mfn, cur += PAGE_SIZE * granularity )
     {
         if ( map_pages_to_xen(cur, mfn_x(*mfn), granularity, flags) )
         {
-            vunmap(va);
+            vunmap_pages(va, vm_size(va, type));
             va = NULL;
         }
     }
@@ -201,22 +245,10 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
 
 void *vmap(const mfn_t *mfn, unsigned int nr)
 {
-    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR);
-}
-
-void vunmap(const void *va)
-{
-#ifndef _PAGE_NONE
-    unsigned long addr = (unsigned long)va;
-
-    destroy_xen_mappings(addr, addr + PAGE_SIZE * vm_size(va));
-#else /* Avoid tearing down intermediate page tables. */
-    map_pages_to_xen((unsigned long)va, 0, vm_size(va), _PAGE_NONE);
-#endif
-    vm_free(va);
+    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
-void *vmalloc(size_t size)
+static void *vmalloc_type(size_t size, enum vmap_region type)
 {
     mfn_t *mfn;
     size_t pages, i;
@@ -238,7 +270,7 @@ void *vmalloc(size_t size)
         mfn[i] = _mfn(page_to_mfn(pg));
     }
 
-    va = vmap(mfn, pages);
+    va = __vmap(mfn, 1, pages, 1, PAGE_HYPERVISOR, type);
     if ( va == NULL )
         goto error;
 
@@ -252,9 +284,19 @@ void *vmalloc(size_t size)
     return NULL;
 }
 
-void *vzalloc(size_t size)
+void *vmalloc(size_t size)
 {
-    void *p = vmalloc(size);
+    return vmalloc_type(size, VMAP_DEFAULT);
+}
+
+void *vmalloc_xen(size_t size)
+{
+    return vmalloc_type(size, VMAP_XEN);
+}
+
+static void *vzalloc_type(size_t size, enum vmap_region type)
+{
+    void *p = vmalloc_type(size, type);
     int i;
 
     if ( p == NULL )
@@ -266,16 +308,32 @@ void *vzalloc(size_t size)
     return p;
 }
 
+void *vzalloc(size_t size)
+{
+    return vzalloc_type(size, VMAP_DEFAULT);
+}
+
+void *vzalloc_xen(size_t size)
+{
+    return vzalloc_type(size, VMAP_XEN);
+}
+
 void vfree(void *va)
 {
     unsigned int i, pages;
     struct page_info *pg;
     PAGE_LIST_HEAD(pg_list);
+    enum vmap_region type = VMAP_DEFAULT;
 
     if ( !va )
         return;
 
-    pages = vm_size(va);
+    pages = vm_size(va, type);
+    if ( !pages )
+    {
+        type = VMAP_XEN;
+        pages = vm_size(va, type);
+    }
     ASSERT(pages);
 
     for ( i = 0; i < pages; i++ )
@@ -285,7 +343,7 @@ void vfree(void *va)
         ASSERT(page);
         page_list_add(page, &pg_list);
     }
-    vunmap(va);
+    vunmap_pages(va, pages);
 
     while ( (pg = page_list_remove_head(&pg_list)) != NULL )
         free_domheap_page(pg);
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 8a28d87..9a49029 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -97,7 +97,7 @@ acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 		if (IS_ENABLED(CONFIG_X86) && !((phys + size - 1) >> 20))
 			return __va(phys);
 		return __vmap(&mfn, PFN_UP(offs + size), 1, 1,
-			      ACPI_MAP_MEM_ATTR) + offs;
+			      ACPI_MAP_MEM_ATTR, VMAP_DEFAULT) + offs;
 	}
 	return __acpi_map_table(phys, size);
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 5671ac8..3434b6e 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -4,15 +4,24 @@
 #include <xen/mm.h>
 #include <asm/page.h>
 
-void *vm_alloc(unsigned int nr, unsigned int align);
-void vm_free(const void *);
+enum vmap_region {
+    VMAP_DEFAULT,
+    VMAP_XEN,
+    VMAP_REGION_NR,
+};
 
-void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags);
+void vm_init_type(enum vmap_region type, void *start, void *end);
+
+void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
+             unsigned int align, unsigned int flags, enum vmap_region);
 void *vmap(const mfn_t *mfn, unsigned int nr);
 void vunmap(const void *);
+
 void *vmalloc(size_t size);
+void *vmalloc_xen(size_t size);
+
 void *vzalloc(size_t size);
+void *vzalloc_xen(size_t size);
 void vfree(void *va);
 
 void __iomem *ioremap(paddr_t, size_t);
@@ -24,7 +33,10 @@ static inline void iounmap(void __iomem *va)
     vunmap((void *)(addr & PAGE_MASK));
 }
 
-void vm_init(void);
 void *arch_vmap_virt_end(void);
+static inline void vm_init(void)
+{
+    vm_init_type(VMAP_DEFAULT, (void *)VMAP_VIRT_START, arch_vmap_virt_end());
+}
 
 #endif /* __XEN_VMAP_H__ */
-- 
2.4.3
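
A toy model of the bookkeeping the patch above introduces. It assumes page-granular regions of 64 slots, each tracked in a single uint64_t bitmap plus a per-slot length array (instead of Xen's growable, page-backed bitmaps with guard pages). Each region keeps its own base, allocation names the region explicitly and fails if that region was never initialized, and freeing finds the owning region from the address alone by trying VMAP_DEFAULT first and then VMAP_XEN, as vm_free(), vunmap() and vfree() do. All names other than the enum values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define SLOTS     64                    /* pages per toy region */

enum vmap_region { VMAP_DEFAULT, VMAP_XEN, VMAP_REGION_NR };

struct region {
    uintptr_t base;                     /* 0 means "not initialized" */
    uint64_t used;                      /* one bit per page */
    unsigned char len[SLOTS];           /* allocation length at its first page */
};

static struct region regions[VMAP_REGION_NR];

/* Bit mask covering nr pages starting at page 'start'. */
static uint64_t run_mask(unsigned int nr, unsigned int start)
{
    uint64_t bits = nr >= 64 ? ~0ULL : (1ULL << nr) - 1;

    return bits << start;
}

static void region_init(enum vmap_region t, uintptr_t base)
{
    assert(!regions[t].base);
    regions[t].base = base;
}

/* First-fit allocation of nr pages from region t. */
static void *region_alloc(enum vmap_region t, unsigned int nr)
{
    struct region *r = &regions[t];

    if ( !r->base || !nr || nr > SLOTS )
        return NULL;
    for ( unsigned int start = 0; start + nr <= SLOTS; start++ )
    {
        if ( r->used & run_mask(nr, start) )
            continue;
        r->used |= run_mask(nr, start);
        r->len[start] = nr;
        return (void *)(r->base + start * PAGE_SIZE);
    }
    return NULL;
}

/* Page index of va in region t, or -1 if va does not start an allocation. */
static int region_index(enum vmap_region t, const void *va)
{
    const struct region *r = &regions[t];
    uintptr_t addr = (uintptr_t)va;

    if ( !r->base || addr < r->base || addr >= r->base + SLOTS * PAGE_SIZE )
        return -1;
    addr -= r->base;
    if ( addr % PAGE_SIZE || !r->len[addr / PAGE_SIZE] )
        return -1;
    return (int)(addr / PAGE_SIZE);
}

/* Like vm_free(): try the default region, then the Xen one. */
static int region_free(const void *va)
{
    enum vmap_region t = VMAP_DEFAULT;
    int idx = region_index(t, va);
    struct region *r;

    if ( idx < 0 )
    {
        t = VMAP_XEN;
        idx = region_index(t, va);
    }
    if ( idx < 0 )
        return -1;                      /* not from either region */

    r = &regions[t];
    r->used &= ~run_mask(r->len[idx], (unsigned int)idx);
    r->len[idx] = 0;
    return (int)t;
}
```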



* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-26 13:39   ` Jan Beulich
  2016-04-27  1:47     ` Konrad Rzeszutek Wilk
@ 2016-04-27  3:28     ` Konrad Rzeszutek Wilk
  2016-04-27  8:28       ` Jan Beulich
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  3:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> > +{
..snip..
> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> > +    {
> > +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> > +        {
> > +            uint8_t *buf;
> 
> Perhaps void * again? And missing a blank line afterwards.
> 
> > +            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
> > +                buf = text_buf;
> > +            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> > +                buf = rw_buf;
> > +             else
> 
> The indentation here is still one off.

I am not seeing it. I deleted the line and added it back using
spaces just in case. But I really don't see the indentation issue
you are seeing.

Here is what the patch looks like with the changes (minus the
possibility of making Elf_Sym const and casting it):

From 4b190a9d9fe738138416d9f501ac746ea9db2512 Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Tue, 26 Apr 2016 13:52:48 -0400
Subject: [PATCH] xsplice: Implement payload loading

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the ELF file.
- Parse the ELF file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region. Split them into three
  regions: .text, .data, and .rodata. MUST have at least .text.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that the patch titled "xsplice,symbols: Implement symbol name
   resolution on address" implements resolution of the others by name.)
- Perform relocations.
- Secure the regions (.text, .data, .rodata) with proper permissions.

We capitalize on the vzalloc_xen API (see the patch titled
"arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type") to allocate
a region of memory within [xen_virt_end, XEN_VIRT_END] for the code.

We also use "x86/mm: Introduce modify_xen_mappings()"
to change the virtual address page-table permissions.
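The "split them into three regions" step can be sketched as follows, assuming a hypothetical struct sec holding the two section-header fields involved (sh_addralign, sh_size) and a 4 KiB page. Each section's offset is its region's running size rounded up to the section's alignment, and each region is then rounded up to whole pages so it can be given its own permissions (RX, RW, RO). This mirrors the intent of calc_section() in the patch but is not its code.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical subset of an ELF section header. */
struct sec {
    size_t sh_addralign;                /* 0 or 1: no alignment constraint */
    size_t sh_size;
};

static size_t round_up(size_t x, size_t align)
{
    assert(align && !(align & (align - 1)));    /* power of two */
    return (x + align - 1) & ~(align - 1);
}

/* Place sec at the end of a region of *size bytes; return its offset. */
static size_t place_section(const struct sec *sec, size_t *size)
{
    size_t align = sec->sh_addralign ? sec->sh_addralign : 1;
    size_t offset = round_up(*size, align);

    *size = offset + sec->sh_size;
    return offset;
}

/* Bytes needed for three page-aligned regions (text, rw, ro). */
static size_t regions_size(size_t text, size_t rw, size_t ro)
{
    return round_up(text, PAGE_SIZE) + round_up(rw, PAGE_SIZE) +
           round_up(ro, PAGE_SIZE);
}
```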

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
    - Add the _return macro back - but only use it during debug builds.
    - Remove the macro, prefix arch_ on arch specific calls.
v4:
    - Move alloc_payload to arch specific file.
    - Use void* instead of uint8_t, use const
    - Add copyrights
    - Unroll the vmap code to add ASSERT. Change while to not incur
      potential long error loop
   - Use vmalloc/vfree cb APIs
   - Secure .text pages to be RX instead of RWX.
v5:
  - Fix allocation of virtual addresses only allowing one page to be allocated.
  - Create .text, .data, and .rodata regions with different permissions.
  - Make the find_space_t not a typedef to pointer to a function.
  - Allocate memory in here.
v6: Drop parentheses on typedefs.
  - s/an xSplice/a xSplice/
  - Rebase on "vmap: Add vmalloc_cb"
  - Rebase on "vmap: Add vmalloc_type and vm_init_type"
  - s/uint8_t/void/ on load_addr
  - Set xsplice_elf on stack without using memset.
v7:
  - Changed the check on delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
    The sections can be right at the back of the file (different linker!), so the failing conditional
    for 'if (delta >= elf->len)' is incorrect and should have been '>'.
  - Changed dprintk(XENLOG_DEBUG to XENLOG_ERR, then back to DEBUG. Converted
    some of the printk to dprintk.
  - Rebase on "arm/x86/vmap: Add vmalloc_xen, vfree_xen and vm_init_type"
  - Changed some of the printk XENLOG_ERR to XENLOG_DEBUG
  - Check the idx in the relocation to make sure it is within bounds and
    implemented.
  - Use "x86/mm: Introduce modify_xen_mappings()"
  - Introduce PRIxElfAddr
  - Check for overflow in R_X86_64_PC32
  - Return -EOPNOTSUPP if we don't support types in ELF64_R_TYPE
v8:
  - Change dprintk and printk XENLOG_DEBUG to XENLOG_ERR
  - Convert four of the printks into dprintk.
v9:
  - Rebase on different spinlock usage in xsplice_upload.
  - Do proper bound and overflow checking.
  - Added 'const' on [text,ro,rw]_addr.
  - Made 'calc_section' and 'move_payload' use a dynamically
    allocated array for computed offsets instead of modifying sh_entsize.
  - Remove arch_xsplice_[alloc_payload|free] and use vzalloc_xen and
    vfree.
  - Collapse for loop in move_payload.
  - Move xsplice.o in Makefile
  - Add more checks in arch_xsplice_perform_rela (r_offset and
     sh_size % sh_entsize)
  - Use int32_t and int64_t in arch_xsplice_perform_rela.
  - Tighten the list of sh_flags we check
  - Use intermediate on 'buf' so that we can do 'const void *'
  - Use intermediate in xsplice_elf_resolve_symbols for 'const' of elf->sym.
  - Fail if (and only if) a SHF_ALLOC and SHT_NOBITS section is seen.
v10:
   - Dropped Andrew's Reviewed-by
   - Expand arch_xsplice_verify_elf to check EI_CLASS and EI_ABIVERSION
   - In arch_xsplice_perform_rela drop check against !rela->sec->sh_entsize,
     add extra checks against r_offset + sizeof(type) neccessating
     an extra goto statement.
   - Make arch_xsplice_init be __init.
   - In free_payload_data check against ->pages instead of ->text_addr.
   - In move_payload use 'void *' instead of 'uint8_t *', use xmalloc_array
     for offset, expand on the 'Do Nothing' comment and the 'Ignoring';
     Use vmalloc instead of vzalloc - which means for .bss we also use
     memset; drop the unary + when calculating address for rw_buf;
     Fix indentation (I hope? I don't see an issue); also use offset[i] = UINT_MAX
     for sections we are not going to allocate or memcpy - and assert if
     we do hit those.
   - In xsplice_elf_resolve_symbols move check against
     !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) back to what it was
     in v8.
   - In xsplice_elf_perform_relocs drop comment about first ELF
     symbol.
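
Two notes above come down to the same kind of check. In v7, a section header table may end exactly at the end of the file, so the rejecting test is '>' rather than '>='. In v10, relocations gained checks on r_offset + sizeof(type). Both are bounds tests that must not themselves overflow. The sketch below shows both against plain integers rather than the patch's Elf_Ehdr/Elf_RelA types. The helper names shdrs_fit and reloc_fits are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the section header table [e_shoff, e_shoff + e_shnum * e_shentsize)
 * fits in a file of 'len' bytes. Ending exactly at 'len' is valid.
 */
static bool shdrs_fit(uint64_t e_shoff, uint16_t e_shnum,
                      uint16_t e_shentsize, uint64_t len)
{
    /* 16-bit * 16-bit cannot overflow a 64-bit product. */
    uint64_t table = (uint64_t)e_shnum * e_shentsize;

    if (e_shoff > len)
        return false;
    return table <= len - e_shoff;   /* avoids computing e_shoff + table */
}

/* True if a 'width'-byte relocation target at r_offset lies inside the section. */
static bool reloc_fits(uint64_t r_offset, uint64_t width, uint64_t sh_size)
{
    return r_offset < sh_size && width <= sh_size - r_offset;
}
```

Subtracting from the known-good bound, instead of adding to the untrusted offset, keeps a crafted e_shoff or r_offset from wrapping past the check.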
---
 xen/arch/arm/Makefile         |   1 +
 xen/arch/arm/xsplice.c        |  46 ++++++++
 xen/arch/x86/Makefile         |   1 +
 xen/arch/x86/xsplice.c        | 174 ++++++++++++++++++++++++++++++
 xen/common/xsplice.c          | 244 ++++++++++++++++++++++++++++++++++++++++--
 xen/common/xsplice_elf.c      | 114 ++++++++++++++++++++
 xen/include/xen/elfstructs.h  |   4 +
 xen/include/xen/xsplice.h     |  24 +++++
 xen/include/xen/xsplice_elf.h |  11 +-
 9 files changed, 611 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 0328b50..eae5cb3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,6 +40,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..8cb7767
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,46 @@
+/*
+ *  Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    return -ENOSYS;
+}
+
+void __init arch_xsplice_init(void)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 729065b..f74fd2c 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -61,6 +61,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..e50cbc0
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,174 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( hdr->e_machine != EM_X86_64 ||
+         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_ABIVERSION] != 0 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported ELF Machine type!\n",
+                elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    dprintk(XENLOG_ERR, XSPLICE "%s: SHT_REL relocation unsupported\n",
+            elf->name);
+    return -EOPNOTSUPP;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    const Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    /* Nothing to do. */
+    if ( !rela->sec->sh_size )
+        return 0;
+
+    if ( rela->sec->sh_entsize < sizeof(Elf_RelA) ||
+         rela->sec->sh_size % rela->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = rela->data + i * rela->sec->sh_entsize;
+
+        symndx = ELF64_R_SYM(r->r_info);
+
+        if ( symndx >= elf->nsym )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
+                    elf->name, symndx);
+            return -EINVAL;
+        }
+
+        if ( r->r_offset >= base->sec->sh_size )
+            goto bad_offset;
+
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+        case R_X86_64_NONE:
+            break;
+
+        case R_X86_64_64:
+            if ( r->r_offset + sizeof(uint64_t) > base->sec->sh_size )
+                goto bad_offset;
+
+            *(uint64_t *)dest = val;
+            break;
+
+        case R_X86_64_PLT32:
+            /*
+             * Xen uses -fpic which normally uses PLT relocations
+             * except that it sets visibility to hidden which means
+             * that they are not used.  However, when gcc cannot
+             * inline memcpy it emits memcpy with default visibility
+             * which then creates a PLT relocation.  It can just be
+             * treated the same as R_X86_64_PC32.
+             */
+        case R_X86_64_PC32:
+            if ( r->r_offset + sizeof(uint32_t) > base->sec->sh_size )
+                goto bad_offset;
+
+            val -= (uint64_t)dest;
+            *(int32_t *)dest = val;
+            if ( (int64_t)val != *(int32_t *)dest )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Overflow in relocation %u in %s for %s!\n",
+                        elf->name, i, rela->name, base->name);
+                return -EOVERFLOW;
+            }
+            break;
+
+        default:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unhandled relocation %lu\n",
+                    elf->name, ELF64_R_TYPE(r->r_info));
+            return -EOPNOTSUPP;
+        }
+    }
+
+    return 0;
+
+ bad_offset:
+    dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation offset is past %s section!\n",
+            elf->name, base->name);
+    return -EINVAL;
+}
+
+/*
+ * Once resolving symbols, performing relocations, etc. is complete,
+ * we secure the memory by putting in the proper page table attributes
+ * for the desired type.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    unsigned long start = (unsigned long)va;
+    unsigned int flag;
+
+    ASSERT(va);
+    ASSERT(pages);
+
+    if ( type == XSPLICE_VA_RX )
+        flag = PAGE_HYPERVISOR_RX;
+    else if ( type == XSPLICE_VA_RW )
+        flag = PAGE_HYPERVISOR_RW;
+    else
+        flag = PAGE_HYPERVISOR_RO;
+
+    modify_xen_mappings(start, start + pages * PAGE_SIZE, flag);
+
+    return 0;
+}
+
+void __init arch_xsplice_init(void)
+{
+    void *start, *end;
+
+    start = (void *)xen_virt_end;
+    end = (void *)(XEN_VIRT_END - NR_CPUS * PAGE_SIZE);
+
+    BUG_ON(end <= start);
+
+    vm_init_type(VMAP_XEN, start, end);
+}
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 6623ce5..3f3aacc 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -29,6 +30,13 @@ struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    const void *text_addr;               /* Virtual address of .text. */
+    size_t text_size;                    /* .. and its size. */
+    const void *rw_addr;                 /* Virtual address of .data. */
+    size_t rw_size;                      /* .. and its size (if any). */
+    const void *ro_addr;                 /* Virtual address of .rodata. */
+    size_t ro_size;                      /* .. and its size (if any). */
+    unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -83,19 +91,231 @@ static struct payload *find_payload(const char *name)
     return found;
 }
 
+/*
+ * Functions related to XEN_SYSCTL_XSPLICE_UPLOAD (see xsplice_upload), and
+ * freeing payload (XEN_SYSCTL_XSPLICE_ACTION:XSPLICE_ACTION_UNLOAD).
+ */
+
+static void free_payload_data(struct payload *payload)
+{
+    /* Set to zero until "move_payload". */
+    if ( !payload->pages )
+        return;
+
+    vfree((void *)payload->text_addr);
+
+    payload->pages = 0;
+}
+
+/*
+ * calc_section computes the size (taking into account section alignment).
+ *
+ * Furthermore, offset is set to the section's offset from the start of the
+ * payload's virtual address space (using the passed-in size). This is used
+ * in move_payload to figure out the destination location (load_addr).
+ */
+static void calc_section(const struct xsplice_elf_sec *sec, size_t *size,
+                         unsigned int *offset)
+{
+    const Elf_Shdr *s = sec->sec;
+    size_t align_size;
+
+    align_size = ROUNDUP(*size, s->sh_addralign);
+    *offset = align_size;
+    *size = s->sh_size + align_size;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    void *text_buf, *ro_buf, *rw_buf;
+    unsigned int i;
+    size_t size = 0;
+    unsigned int *offset;
+    int rc = 0;
+
+    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
+    if ( !offset )
+        return -ENOMEM;
+
+    /* Compute size of different regions. */
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
+             (SHF_ALLOC|SHF_EXECINSTR) )
+            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
+        else if ( !elf->sec[i].sec->sh_flags ||
+                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
+                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
+            /*
+             * Do nothing. These are .rel.text, .rel.*, .symtab, .strtab,
+             * and .shstrtab. The non-relocation sections are allocated and
+             * copied via other means, and the .rel* sections can be ignored
+             * as we only use them once during loading.
+             */
+            offset[i] = UINT_MAX;
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
+                    elf->name, elf->sec[i].name);
+            rc = -EOPNOTSUPP;
+            goto out;
+        }
+        else /* Such as .comment, or .debug_str. */
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
+                    elf->name, elf->sec[i].name);
+            offset[i] = UINT_MAX;
+        }
+    }
+
+    /*
+     * Total of all three regions - RX, RW, and RO. We have to keep
+     * them on separate pages, so we PAGE_ALIGN the sizes of the RX
+     * and RW regions. The RO region comes last and therefore starts
+     * on its own page by default.
+     */
+    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
+                      payload->ro_size;
+
+    size = PFN_UP(size); /* Nr of pages. */
+    text_buf = vmalloc_xen(size * PAGE_SIZE);
+    if ( !text_buf )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
+                elf->name);
+        rc = -ENOMEM;
+        goto out;
+    }
+    rw_buf = text_buf + PAGE_ALIGN(payload->text_size);
+    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
+
+    payload->pages = size;
+    payload->text_addr = text_buf;
+    payload->rw_addr = rw_buf;
+    payload->ro_addr = ro_buf;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            void *buf;
+
+            if ( elf->sec[i].sec->sh_flags & SHF_EXECINSTR )
+                buf = text_buf;
+            else if ( elf->sec[i].sec->sh_flags & SHF_WRITE )
+                buf = rw_buf;
+            else
+                buf = ro_buf;
+
+            ASSERT(offset[i] != UINT_MAX);
+
+            elf->sec[i].load_addr = buf + offset[i];
+
+            /* Don't copy NOBITS - such as BSS. */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Loaded %s at %p\n",
+                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+            else
+                memset(elf->sec[i].load_addr, 0, elf->sec[i].sec->sh_size);
+        }
+    }
+
+ out:
+    xfree(offset);
+
+    return rc;
+}
+
+static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    int rc;
+    unsigned int text_pages, rw_pages, ro_pages;
+
+    text_pages = PFN_UP(payload->text_size);
+    ASSERT(text_pages);
+
+    rc = arch_xsplice_secure(payload->text_addr, text_pages, XSPLICE_VA_RX);
+    if ( rc )
+        return rc;
+
+    rw_pages = PFN_UP(payload->rw_size);
+    if ( rw_pages )
+    {
+        rc = arch_xsplice_secure(payload->rw_addr, rw_pages, XSPLICE_VA_RW);
+        if ( rc )
+            return rc;
+    }
+
+    ro_pages = PFN_UP(payload->ro_size);
+    if ( ro_pages )
+        rc = arch_xsplice_secure(payload->ro_addr, ro_pages, XSPLICE_VA_RO);
+
+    ASSERT(ro_pages + rw_pages + text_pages == payload->pages);
+
+    return rc;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
+static int load_payload_data(struct payload *payload, void *raw, size_t len)
+{
+    struct xsplice_elf elf = { .name = payload->name, .len = len };
+    int rc = 0;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto out;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
+
+ out:
+    if ( rc )
+        free_payload_data(payload);
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 {
     struct payload *data, *found;
     char n[XEN_XSPLICE_NAME_SIZE];
+    void *raw_data;
     int rc;
 
     rc = verify_payload(upload, n);
@@ -103,6 +323,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         return rc;
 
     data = xzalloc(struct payload);
+    raw_data = vmalloc(upload->size);
 
     spin_lock(&payload_lock);
 
@@ -118,13 +339,19 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         goto out;
     }
 
-    if ( !data )
-    {
-        rc = -ENOMEM;
+    rc = -ENOMEM;
+    if ( !data || !raw_data )
         goto out;
-    }
 
+    rc = -EFAULT;
+    if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
+        goto out;
     memcpy(data->name, n, strlen(n));
+
+    rc = load_payload_data(data, raw_data, upload->size);
+    if ( rc )
+        goto out;
+
     data->state = XSPLICE_STATE_CHECKED;
     INIT_LIST_HEAD(&data->list);
 
@@ -135,6 +362,8 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
  out:
     spin_unlock(&payload_lock);
 
+    vfree(raw_data);
+
     if ( rc )
         xfree(data);
 
@@ -369,8 +598,9 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
+               data->name, state2str(data->state), data->state, data->text_addr,
+               data->rw_addr, data->ro_addr, data->pages);
 
     spin_unlock(&payload_lock);
 }
@@ -378,6 +608,8 @@ static void xsplice_printall(unsigned char key)
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+
+    arch_xsplice_init();
     return 0;
 }
 __initcall(xsplice_init);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index dd078c4..a3d84b1 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -105,6 +105,7 @@ static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
 
             elf->symtab = &sec[i];
 
+            elf->symtab_idx = i;
             /*
              * elf->symtab->sec->sh_link would point to the right section
              * but we hadn't finished parsing all the sections.
@@ -253,9 +254,118 @@ static int elf_get_sym(struct xsplice_elf *elf, const void *data)
     return 0;
 }
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        unsigned int idx = elf->sym[i].sym->st_shndx;
+        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;
+
+        switch ( idx )
+        {
+        case SHN_COMMON:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -EINVAL;
+            break;
+
+        case SHN_UNDEF:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -ENOENT;
+            break;
+
+        case SHN_ABS:
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
+                    elf->name, elf->sym[i].name, sym->st_value);
+            break;
+
+        default:
+            /* SHN_COMMON and SHN_ABS are above. */
+            if ( idx >= SHN_LORESERVE )
+                rc = -EOPNOTSUPP;
+            else if ( idx >= elf->hdr->e_shnum )
+                rc = -EINVAL;
+
+            if ( rc )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Unknown type=%#"PRIx16"\n",
+                        elf->name, idx);
+                break;
+            }
+
+            if ( !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) )
+                break;
+
+            sym->st_value += (unsigned long)elf->sec[idx].load_addr;
+            if ( elf->sym[i].name )
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Symbol resolved: %s => %#"PRIxElfAddr"(%s)\n",
+                        elf->name, elf->sym[i].name,
+                        sym->st_value, elf->sec[idx].name);
+        }
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *r, *base;
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        r = &elf->sec[i];
+
+        if ( (r->sec->sh_type != SHT_RELA) &&
+             (r->sec->sh_type != SHT_REL) )
+            continue;
+
+        /* Is it a valid relocation section? */
+        if ( r->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[r->sec->sh_info];
+
+        /* Don't relocate non-allocated sections. */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( r->sec->sh_link != elf->symtab_idx )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative link of %s is incorrect (%u, expected=%u)\n",
+                    elf->name, r->name, r->sec->sh_link, elf->symtab_idx);
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( r->sec->sh_type == SHT_RELA )
+            rc = arch_xsplice_perform_rela(elf, base, r);
+        else /* SHT_REL */
+            rc = arch_xsplice_perform_rel(elf, base, r);
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
 static int xsplice_header_check(const struct xsplice_elf *elf)
 {
     const Elf_Ehdr *hdr = elf->hdr;
+    int rc;
 
     if ( sizeof(*elf->hdr) > elf->len )
     {
@@ -282,6 +392,10 @@ static int xsplice_header_check(const struct xsplice_elf *elf)
         return -EOPNOTSUPP;
     }
 
+    rc = arch_xsplice_verify_elf(elf);
+    if ( rc )
+        return rc;
+
     if ( elf->hdr->e_shstrndx == SHN_UNDEF )
     {
         dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 85f35ed..2b9bd3f 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -472,6 +472,8 @@ typedef struct {
 #endif
 
 #if defined(ELFSIZE) && (ELFSIZE == 32)
+#define PRIxElfAddr	"08x"
+
 #define Elf_Ehdr	Elf32_Ehdr
 #define Elf_Phdr	Elf32_Phdr
 #define Elf_Shdr	Elf32_Shdr
@@ -497,6 +499,8 @@ typedef struct {
 
 #define AuxInfo		Aux32Info
 #elif defined(ELFSIZE) && (ELFSIZE == 64)
+#define PRIxElfAddr	PRIx64
+
 #define Elf_Ehdr	Elf64_Ehdr
 #define Elf_Phdr	Elf64_Phdr
 #define Elf_Shdr	Elf64_Shdr
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 7559877..857c264 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,6 +6,9 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
@@ -15,6 +18,27 @@ struct xen_sysctl_xsplice_op;
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
+/* Arch hooks. */
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela);
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela);
+enum va_type {
+    XSPLICE_VA_RX, /* .text */
+    XSPLICE_VA_RW, /* .data */
+    XSPLICE_VA_RO, /* .rodata */
+};
+
+/*
+ * Function to secure the allocated pages (from vmalloc_xen in move_payload)
+ * with the right page permissions.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type);
+
+void arch_xsplice_init(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index 686aaf0..750dc94 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -15,6 +15,8 @@ struct xsplice_elf_sec {
                                             elf_resolve_section_names. */
     const void *data;                    /* Pointer to the section (done by
                                             elf_resolve_sections). */
+    void *load_addr;                     /* A pointer to the allocated destination.
+                                            Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -29,8 +31,10 @@ struct xsplice_elf {
     struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
     struct xsplice_elf_sym *sym;         /* Array of symbols , allocated by us. */
     unsigned int nsym;
-    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
-    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to
+                                            sec[symtab_idx]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section. */
+    unsigned int symtab_idx;
 };
 
 const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
@@ -38,6 +42,9 @@ const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *
 int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
 
 /*
-- 
2.4.3


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-26 13:41         ` Jan Beulich
@ 2016-04-27  3:31           ` Konrad Rzeszutek Wilk
  2016-04-27  8:37             ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  3:31 UTC
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	Ross Lagerwall, sasha.levin, xen-devel

> >> When text_addr is void *, how is this calculation wrong then?
> > 
> > I'm sorry, ignore that. I temporarily forgot how void* arithmetic is 
> > defined for GCC.
> > 
> > The other two points are still valid and may result in incorrect 
> > backtraces with > 1 payload loaded.
> 
> Of course.

So it would look like this then:

From affca85da4d57c466cc3a603afa4d57fea7ed092 Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Fri, 22 Apr 2016 11:16:36 -0400
Subject: [PATCH] x86, xsplice: Print payload's symbol name and payload name in
 backtraces

Naturally the backtrace is presented when an instruction
hits a bug_frame or %p is used.

The payloads do not support bug_frames yet - however the functions
the payloads call could hit a BUG() or WARN().

The logic in traps.c scans for it, and eventually it will find the
correct bug_frame and then walk the stack, using %p to print the
backtrace. For %p and symbols to print a string, 'is_active_kernel_text'
is consulted, which uses a 'struct virtual_region'.
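
In simplified form, is_active_kernel_text walks the registered regions and checks whether the address falls inside one of them. The sketch below assumes a hypothetical toy_region list with half-open [start, end) ranges. The real struct virtual_region carries more fields, and the patch walks its list under RCU; locking is omitted here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct virtual_region. */
struct toy_region {
    uintptr_t start, end;              /* [start, end) of executable text */
    const struct toy_region *next;     /* singly linked list, NULL-terminated */
};

/* True if addr lies in the core image or in any registered payload region. */
static bool toy_is_active_text(const struct toy_region *head, uintptr_t addr)
{
    const struct toy_region *r;

    for (r = head; r; r = r->next)
        if (addr >= r->start && addr < r->end)
            return true;
    return false;
}
```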

Therefore we register our start->end addresses so that
'is_active_kernel_text' will include our payload address.

We also register our symbol lookup table function so that it can
scan the list of payloads and retrieve the correct name.
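
The payload symbol lookup can be sketched as follows. First skip any payload whose .text does not contain the address. Then find the symbol that covers the address, and report the offset and size that appear in output such as 'revert_hook+0x31/0x35'. The toy types below are hypothetical, and the sketch uses a containment test on sized symbols. According to the changelog, the patch's xsplice_symbols_lookup uses a 'best' candidate search instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol and payload records, simplified from the patch. */
struct toy_sym {
    uintptr_t value;       /* start address of the symbol */
    unsigned long size;    /* symbol size in bytes */
    const char *name;
};

struct toy_patch {
    const char *name;      /* payload name, e.g. "xen_hello_world" */
    uintptr_t text_addr;   /* start of the payload's .text */
    size_t text_size;      /* size of .text only */
    const struct toy_sym *syms;
    unsigned int nsyms;
};

/*
 * Look up addr across np payloads. On a hit, return the symbol name,
 * store its size and the offset into it, and copy the payload name
 * into namebuf. Return NULL if no symbol covers addr.
 */
static const char *toy_lookup(const struct toy_patch *p, unsigned int np,
                              uintptr_t addr, unsigned long *symsize,
                              unsigned long *offset, char *namebuf,
                              size_t buflen)
{
    unsigned int i, j;

    assert(p || np == 0);
    for (i = 0; i < np; i++) {
        /* Skip payloads whose .text does not cover addr. */
        if (addr < p[i].text_addr || addr >= p[i].text_addr + p[i].text_size)
            continue;

        for (j = 0; j < p[i].nsyms; j++) {
            const struct toy_sym *s = &p[i].syms[j];

            if (addr >= s->value && addr - s->value < s->size) {
                *symsize = s->size;
                *offset = addr - s->value;
                snprintf(namebuf, buflen, "%s", p[i].name);
                return s->name;
            }
        }
    }
    return NULL;
}
```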

Lastly we change vsprintf to take into account s and namebuf.
For core code they are the same, but for payloads they are different.
This gets us:

Xen call trace:
   [<ffff82d080a00041>] revert_hook+0x31/0x35 [xen_hello_world]
   [<ffff82d0801431bd>] xsplice.c#revert_payload+0x86/0xc6
   [<ffff82d080143502>] check_for_xsplice_work+0x233/0x3cd
   [<ffff82d08017a0b2>] domain.c#continue_idle_domain+0x9/0x1f

This is especially helpful when payloads have similar or identical symbol
names, as the payload name disambiguates them.
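
The vsprintf change relies on a pointer comparison rather than a string comparison. For core symbols, the lookup returns namebuf itself. For payloads, it returns the symbol name and leaves the payload name in namebuf. The sketch below shows the resulting formatting. It uses snprintf and a hypothetical toy_format helper instead of vsprintf's internal string()/number() routines:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "name+off/size" and append " [payload]" only when the lookup
 * returned a name that is not namebuf itself (i.e. s != namebuf).
 */
static void toy_format(char *out, size_t outlen, const char *s,
                       const char *namebuf, unsigned long off,
                       unsigned long size)
{
    int n = snprintf(out, outlen, "%s+%#lx/%#lx", s, off, size);

    if (n >= 0 && (size_t)n < outlen && s != namebuf)
        snprintf(out + n, outlen - (size_t)n, " [%s]", namebuf);
}
```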

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Add missing full stop.
v3: s/module/payload/
v4: Expand comment and include registration of 'virtual_region'
    Redo the vsprintf handling of payload name.
    Drop the ->skip function
v6: Add comment explaining the purpose behind the strcmp.
    Redid per Jan's review.
v7: Add Andrew's Reviewed-by
    Drop the strcmp and just do pointer checks.
v9: Do pointer comparison on vsprintf by itself, no need for intermediate
    payload bool_t
    Add const in xsplice_symbols_lookup
    Make 'best' in xsplice_symbols_lookup be unsigned int.
    Use an RCU list for iterating the applied_list. Define the RCU lock.
v10:
    In xsplice_symbols_lookup use || instead of && when skipping.
    Also in xsplice_symbols_lookup use ->text_size instead of ->pages.
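
Both v10 fixes are easiest to see with concrete numbers. With '&&', the skip test requires an address to be both below the start and at or past the end of the range. That cannot happen for a non-wrapping range, so no payload is ever skipped. Bounding by ->pages instead of ->text_size would also let .data or .rodata addresses match as text. The skip_or/skip_and helpers below are a hypothetical illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* v10 form: skip the payload when addr is outside [start, start + size). */
static bool skip_or(uintptr_t addr, uintptr_t start, size_t size)
{
    return addr < start || addr >= start + size;
}

/* Earlier form: for a non-wrapping range both halves never hold, so it never skips. */
static bool skip_and(uintptr_t addr, uintptr_t start, size_t size)
{
    return addr < start && addr >= start + size;
}
```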
---
 xen/common/vsprintf.c     | 12 +++++++++
 xen/common/xsplice.c      | 69 ++++++++++++++++++++++++++++++++++++++++++++---
 xen/include/xen/xsplice.h |  1 +
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 18d2634..70e1edf 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -20,6 +20,7 @@
 #include <xen/symbols.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/xsplice.h>
 #include <asm/div64.h>
 #include <asm/page.h>
 
@@ -354,6 +355,17 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
             str = number(str, end, sym_size, 16, -1, -1, SPECIAL);
         }
 
+        /*
+         * namebuf contents and s for core hypervisor are same but for xSplice
+         * payloads they differ (namebuf contains the name of the payload).
+         */
+        if ( namebuf != s )
+        {
+            str = string(str, end, " [", -1, -1, 0);
+            str = string(str, end, namebuf, -1, -1, 0);
+            str = string(str, end, "]", -1, -1, 0);
+        }
+
         return str;
     }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 85af98a..188b850 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -14,7 +14,9 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/string.h>
 #include <xen/symbols.h>
+#include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -31,10 +33,9 @@ static LIST_HEAD(payload_list);
 
 /*
  * Patches which have been applied. Need RCU in case we crash (and then
- * traps code would iterate via applied_list) when adding entries on the list.
- *
- * Note: There are no 'rcu_applied_lock' as we don't iterate yet the list.
+ * traps code would iterate via applied_list) when adding entries on the list.
  */
+static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
 static LIST_HEAD(applied_list);
 
 static unsigned int payload_cnt;
@@ -56,6 +57,8 @@ struct payload {
     unsigned int nfuncs;                 /* Nr of functions to patch. */
     const struct xsplice_symbol *symtab; /* All symbols. */
     const char *strtab;                  /* Pointer to .strtab. */
+    struct virtual_region region;        /* symbol, bug.frame patching and
+                                            exception table (x86). */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
@@ -139,6 +142,55 @@ void *xsplice_symbols_lookup_by_name(const char *symname)
     return 0;
 }
 
+static const char *xsplice_symbols_lookup(unsigned long addr,
+                                          unsigned long *symbolsize,
+                                          unsigned long *offset,
+                                          char *namebuf)
+{
+    const struct payload *data;
+    unsigned int i, best;
+    const void *va = (const void *)addr;
+    const char *n = NULL;
+
+    /*
+     * Only RCU locking since this list is only ever changed during apply
+     * or revert context. And in case it dies there we need a safe list.
+     */
+    rcu_read_lock(&rcu_applied_lock);
+    list_for_each_entry_rcu ( data, &applied_list, applied_list )
+    {
+        if ( va < data->text_addr ||
+             va >= (data->text_addr + data->text_size) )
+            continue;
+
+        best = UINT_MAX;
+
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( data->symtab[i].value <= va &&
+                 (best == UINT_MAX ||
+                  data->symtab[best].value < data->symtab[i].value) )
+                best = i;
+        }
+
+        if ( best == UINT_MAX )
+            break;
+
+        if ( symbolsize )
+            *symbolsize = data->symtab[best].size;
+        if ( offset )
+            *offset = va - data->symtab[best].value;
+        if ( namebuf )
+            strlcpy(namebuf, data->name, KSYM_NAME_LEN);
+
+        n = data->symtab[best].name;
+        break;
+    }
+    rcu_read_unlock(&rcu_applied_lock);
+
+    return n;
+}
+
 static struct payload *find_payload(const char *name)
 {
     struct payload *data, *found = NULL;
@@ -375,6 +427,7 @@ static int prepare_payload(struct payload *payload,
     const struct xsplice_elf_sec *sec;
     unsigned int i;
     struct xsplice_patch_func *f;
+    struct virtual_region *region;
 
     sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
     ASSERT(sec);
@@ -431,6 +484,13 @@ static int prepare_payload(struct payload *payload,
         }
     }
 
+    /* Setup the virtual region with proper data. */
+    region = &payload->region;
+
+    region->symbols_lookup = xsplice_symbols_lookup;
+    region->start = payload->text_addr;
+    region->end = payload->text_addr + payload->text_size;
+
     return 0;
 }
 
@@ -507,6 +567,7 @@ static int build_symbol_table(struct payload *payload,
         if ( is_payload_symbol(elf, elf->sym + i) )
         {
             symtab[nsyms].name = strtab + strtab_len;
+            symtab[nsyms].size = elf->sym[i].sym->st_size;
             symtab[nsyms].value = (void *)elf->sym[i].sym->st_value;
             symtab[nsyms].new_symbol = 0; /* May be overwritten below. */
             strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
@@ -797,6 +858,7 @@ static int apply_payload(struct payload *data)
      * The applied_list is iterated by the trap code.
      */
     list_add_tail_rcu(&data->applied_list, &applied_list);
+    register_virtual_region(&data->region);
 
     return 0;
 }
@@ -819,6 +881,7 @@ static int revert_payload(struct payload *data)
      * The applied_list is iterated by the trap code.
      */
     list_del_rcu(&data->applied_list);
+    unregister_virtual_region(&data->region);
 
     return 0;
 }
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 1526752..bb8baee 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -22,6 +22,7 @@ struct xen_sysctl_xsplice_op;
 struct xsplice_symbol {
     const char *name;
     void *value;
+    unsigned int size;
     bool_t new_symbol;
 };
 
-- 
2.4.3


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-04-26 15:21   ` Jan Beulich
@ 2016-04-27  3:39     ` Konrad Rzeszutek Wilk
  2016-04-27  8:36       ` Jan Beulich
  2016-05-11  9:51       ` Martin Pohlack
  0 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  3:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser,
	Suravee Suthikulpanit, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Jun Nakajima, sasha.levin, xen-devel,
	Boris Ostrovsky

. snip..
> 
> Thinking about it again, even more stack conserving would be a
> bitmap...

Heheh
> 
> > +static int apply_payload(struct payload *data)
> > +{
> > +    unsigned int i;
> > +
> > +    printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n",
> > +            data->name, data->nfuncs);
> > +
> > +    arch_xsplice_patching_enter();
> > +
> > +    for ( i = 0; i < data->nfuncs; i++ )
> > +        arch_xsplice_apply_jmp(&data->funcs[i]);
> > +
> > +    arch_xsplice_patching_leave();
> > +
> > +    list_add_tail_rcu(&data->applied_list, &applied_list);
> 
> Neither in the comment earlier on nor here it becomes clear that this
> is more of an abuse than a use of RCU.

I added more comments and..
> 
> > +struct xsplice_patch_func {
> > +    const char *name;       /* Name of function to be patched. */
> > +    void *new_addr;
> > +    void *old_addr;
> > +    uint32_t new_size;
> > +    uint32_t old_size;
> > +    uint8_t version;        /* MUST be XSPLICE_PAYLOAD_VERSION. */
> > +    uint8_t opaque[31];     /* MUST be zero filled. */
> 
> I don't see the zero filling being a requirement, nor it being enforced.

.. removed this.

From adeadf8babcc5ef6d512cdc28899b4d1de34c60e Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Thu, 21 Apr 2016 06:14:29 -0400
Subject: [PATCH] xsplice: Implement support for applying/reverting/replacing
 patches.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in reset_stack_and_jump
to check for work and execute it if needed (specifically we check a
per-cpu flag to make this as quick as possible).
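The fast path can be modelled roughly as below. This is a single-threaded sketch with hypothetical names (check_for_work_fast, xwork, NR_CPUS_SKETCH). It mirrors the first two checks of check_for_xsplice_work() but leaves out the smp_rmb() between them. In the common case only a CPU-local flag is read and the shared structure is never touched.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 8               /* hypothetical CPU count */

/* Simplified stand-in for struct xsplice_work. */
static struct {
    volatile bool do_work;             /* an action is pending */
} xwork;

/*
 * One flag per CPU. Xen keeps this in per-CPU data; a plain array like
 * this one would invite false sharing between CPUs.
 */
static bool work_to_do[NR_CPUS_SKETCH];

/* Returns true if this CPU must continue into the rendezvous. */
static bool check_for_work_fast(unsigned int cpu)
{
    if ( !work_to_do[cpu] )            /* common case: CPU-local read only */
        return false;

    if ( !xwork.do_work )              /* aborted meanwhile: clear and leave */
    {
        work_to_do[cpu] = false;
        return false;
    }

    return true;
}
```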

In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to run check_for_xsplice_work() becomes the
master and triggers a reschedule softirq so that all the other CPUs
enter check_for_xsplice_work() with no stack.  Once all CPUs
have rendezvoused, all CPUs disable their IRQs and NMIs are ignored.
The system is then quiescent and the master performs the action.
After this, all CPUs enable IRQs and NMIs are re-enabled.
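Below is a simplified user-space model of the rendezvous. It assumes POSIX threads stand in for CPUs and C11 atomics stand in for Xen's atomic_t, and it leaves out the timeouts, IRQ disabling and NMI masking. The point it shows is how a single counter serves both phases. The counter starts at -1, so the first arrival (the one that makes it 0) becomes master. The master must reset the counter to 0 before setting ready; otherwise a released CPU could increment it too early. All names (cpu_thread, run_rendezvous, NCPUS) are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4                        /* hypothetical number of "CPUs" */

static atomic_int sem;                 /* models xsplice_work.semaphore */
static atomic_bool ready, do_work;     /* model xsplice_work.ready/do_work */
static atomic_int masters, patched, saw_patch;

static void spin_until(int target)
{
    while ( atomic_load(&sem) != target )
        sched_yield();                 /* stands in for cpu_relax() */
}

static void *cpu_thread(void *unused)
{
    (void)unused;

    /* Like atomic_inc_and_test(): whoever moves -1 -> 0 is the master. */
    if ( atomic_fetch_add(&sem, 1) + 1 == 0 )
    {
        atomic_fetch_add(&masters, 1);
        spin_until(NCPUS - 1);         /* phase 1: all others have arrived */
        atomic_store(&sem, 0);         /* reset BEFORE releasing them */
        atomic_store(&ready, true);
        spin_until(NCPUS - 1);         /* phase 2: all others "disabled IRQs" */
        atomic_store(&patched, 1);     /* the patching itself */
        atomic_store(&do_work, false); /* release everybody */
    }
    else
    {
        while ( atomic_load(&do_work) && !atomic_load(&ready) )
            sched_yield();
        atomic_fetch_add(&sem, 1);     /* signal the master */
        while ( atomic_load(&do_work) )
            sched_yield();
        if ( atomic_load(&patched) )
            atomic_fetch_add(&saw_patch, 1);
    }

    return NULL;
}

/* Run one rendezvous; returns the number of masters elected (or -1). */
static int run_rendezvous(void)
{
    pthread_t t[NCPUS];

    atomic_store(&sem, -1);
    atomic_store(&ready, false);
    atomic_store(&do_work, true);
    atomic_store(&masters, 0);
    atomic_store(&patched, 0);
    atomic_store(&saw_patch, 0);

    for ( int i = 0; i < NCPUS; i++ )
        if ( pthread_create(&t[i], NULL, cpu_thread, NULL) )
            return -1;                 /* sketch: no cleanup on failure */

    for ( int i = 0; i < NCPUS; i++ )
        pthread_join(t[i], NULL);

    return atomic_load(&masters);
}
```

Build with -pthread. Exactly one thread ends up as master, and every other thread observes the finished patch before it returns.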

Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on the NMI/MCE path is liable to end in disaster on x86.
This is not addressed in this patch and is mentioned in the
design doc as a further TODO.
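The NMI "masking" on x86 is a callback swap rather than a hardware mask. Below is a minimal sketch. It assumes, as mask_nmi_callback() returning 1 suggests, that a nonzero return from the callback means the NMI was consumed. All names here, including this local set_nmi_callback(), are hypothetical user-space stand-ins. The sketch also shows why NMIs that arrive while masked are lost.

```c
#include <stddef.h>

struct regs_sketch;                    /* stand-in for struct cpu_user_regs */
typedef int nmi_callback_t(const struct regs_sketch *regs, int cpu);

static unsigned int nmis_processed;    /* NMIs that reached normal handling */

static int default_nmi_callback(const struct regs_sketch *regs, int cpu)
{
    (void)regs; (void)cpu;
    return 0;                          /* not consumed: handle normally */
}

static nmi_callback_t *nmi_callback = default_nmi_callback;

/* Install cb and return the previously installed callback. */
static nmi_callback_t *set_nmi_callback(nmi_callback_t *cb)
{
    nmi_callback_t *old = nmi_callback;

    nmi_callback = cb;
    return old;
}

static void do_nmi_sketch(int cpu)
{
    if ( nmi_callback(NULL, cpu) )     /* nonzero: callback consumed it */
        return;
    nmis_processed++;
}

static int mask_nmi_callback(const struct regs_sketch *regs, int cpu)
{
    (void)regs; (void)cpu;
    return 1;                          /* swallow it: this NMI is lost */
}

static nmi_callback_t *saved_nmi_callback;

static void sketch_mask(void)
{
    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
}

static void sketch_unmask(void)
{
    set_nmi_callback(saved_nmi_callback);
}
```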

The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
  number of bytes of the old function and replace them with a jump to the
  new function (on x86 it is 5 bytes, on ARM it will likely be 4 bytes).
- REVERT: Copy the previously stored bytes back into the first arch-specific
  number of bytes of the old function (again, 5 bytes on x86).
- REPLACE: Revert each applied module and then apply the new module.
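The x86 APPLY/REVERT byte manipulation can be sketched as follows. The sketch assumes a writable buffer stands in for the old function's text, so there is no CR0.WP handling, and it uses hypothetical names (struct func_sketch, encode_jmp). The 0xe9 opcode takes a 32-bit displacement relative to the end of the 5-byte instruction. The sketch therefore also assumes the old and new functions lie within +/-2 GiB of each other.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5              /* 0xe9 opcode + rel32, as on x86 */

/* Hypothetical subset of struct xsplice_patch_func. */
struct func_sketch {
    uint8_t *old_code;                 /* writable bytes standing in for old_addr */
    uint64_t old_addr, new_addr;       /* addresses used for the displacement */
    uint8_t opaque[PATCH_INSN_SIZE];   /* original bytes saved by APPLY */
};

/* Encode "jmp rel32" located at old and targeting target. */
static void encode_jmp(uint8_t *buf, uint64_t old, uint64_t target)
{
    /* Relative to the end of the 5-byte instruction; must fit in 32 bits. */
    uint32_t rel = (uint32_t)(target - old - PATCH_INSN_SIZE);

    buf[0] = 0xe9;
    for ( unsigned int i = 0; i < 4; i++ )
        buf[1 + i] = (uint8_t)(rel >> (8 * i));   /* little-endian */
}

static void apply_jmp(struct func_sketch *f)
{
    memcpy(f->opaque, f->old_code, PATCH_INSN_SIZE);   /* save for REVERT */
    encode_jmp(f->old_code, f->old_addr, f->new_addr);
}

static void revert_jmp(const struct func_sketch *f)
{
    memcpy(f->old_code, f->opaque, PATCH_INSN_SIZE);
}
```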

To prevent a deadlock with any other barrier in the system, the master
will by default wait for up to 30ms before timing out.
Measurements found that applying a patch takes about 100 μs on a
72 CPU system, whether idle or fully loaded.
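The timeout handling can be sketched like this. The sketch assumes CLOCK_MONOTONIC in place of Xen's NOW() and uses hypothetical helper names. As in schedule_work(), a zero timeout from the caller selects the 30ms default. The wait itself is a deadline-bounded spin that gives up with -EBUSY.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds, like Xen's MILLISECS(); hypothetical user-space version. */
#define MILLISECS(ms) ((int64_t)(ms) * 1000000LL)

static int64_t now_ns(void)            /* stands in for NOW() */
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* A zero timeout from the caller selects the 30ms default. */
static int64_t effective_timeout(uint32_t timeout)
{
    return timeout ? (int64_t)timeout : MILLISECS(30);
}

/* Spin until *counter == expected; give up with -EBUSY at the deadline. */
static int spin_with_deadline(atomic_int *counter, int expected,
                              int64_t deadline)
{
    while ( atomic_load(counter) != expected && now_ns() < deadline )
        ;                              /* cpu_relax() in the hypervisor */

    return atomic_load(counter) == expected ? 0 : -EBUSY;
}
```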

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>

--
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch
    - Remove const and use ELF types for xsplice_patch_func
     - Add check routine to do simple sanity checks for various
      sections.
    - s/%p/PRIx64/ as ARM builds complain.
    - Move code around. Add more dprintk. Add XSPLICE in front of all
      printks/dprintk.
      Put the NMIs back if we fail patching.
      Add per-cpu to lessen contention for global structure.
      Extract from xsplice_do_single patching code into xsplice_do_action
      Squash xsplice_do_single and check_for_xsplice_work together to
      have all rendezvous in one place.
      Made XSPLICE_ACTION_REPLACE work again (wrong list iterator)
      s/find_special_sections/prepare_payload/
      Use list_del_init and INIT_LIST_HEAD for applied_list
v4:
   - Add comment, adjust spacing for "Timed out on CPU semaphore"
   - Added CR0.WP manipulations when altering the .text of hypervisor.
   - Added fix from Andrew for CR0.WP manipulation.
v5: - Made xsplice_patch_func use uintXX_t instead of ELF_ types to ease
      making it work under ARM (32bit). Add more BUILD-BUG-ON checks.
    - Add more BUILD_ON checks. Sprinkle newlines.
v6: Rebase on "arm/x86: Alter nmi_callback_t typedef"
   - Drop the recursive spinlock usage.
   - Move NMI callbacks in arch specific.
   - Fold the 'check_for_xsplice_work' in reset_stack_and_jump
   - Add arch specific check for .xsplice.funcs.
   - Separate external and internal structure of .xsplice.funcs.
   - Changed per Jan's review
   - Modified the .xsplice.funcs checks
v7:
   - Modified old_ptr to void* instead of uint8_t*
   - Modified the xsplice_patch_func_internal for ARM32 to have padding.
   - Used #if BITS_PER_LONG == 64 for the xsplice_patch_func_internal along
     with ifndef CONFIG_ARM for the undo (which may be different size on ARM64)
v8:
  - Add "is empty" if special sections are in fact empty.
  - Added Andrew's Reviewed-by:
  - Rebase on v7.2 of  x86/mm: Introduce modify_xen_mappings()
  - Change some of printk to dprintk and some of the dprintk to printk.
  - Make the xsplice_patch_func (and the internal) structure have uint32_t
    (instead of uint64_t) if BITS_PER_LONG==32. This makes the size and
    offset different so note that in the design and common code.
  - Add #undef ACTION
  - Guard struct xsplice_patch_func in sysctl.h with __XEN__ as toolstacks
    will fail to compile. We do have BITS_PER_LONG defined in xc_bitops.h but
    that will go away (and also that macro uses sizeof and the pre-processor
    will choke on that).
  - Dropped Julien's Acked as I replaced BITS_PER_LONG/CONFIG_ARM_32.
    (Stefano is OK with it, but would prefer BITS_PER_LONG, Jan does not want
    BITS_PER_LONG).
v9: Expose the struct xsplice_patch_func old_addr and new_addr as void
    instead of uint32_t or uint64_t.
  - Added Julien's Ack back.
  - Rename pad to opaque.
  - Added comment in idle_loop.
  - Squash internal and public of 'xsplice_patch_func'
  - Fixed remaining sizeof use.
  - Removed reference to MCE
  - Fixed comment styles.
  - Use bool_t in check_special_sections
  - Add a #define for .xsplice.funcs.
  - Remove full stops from printk
  - Fix xsplice_do_action per Jan's punchlist
  - Use spin_lock_try in keyhandler
  - Remove leading underscores from __CHECK_FOR_XSPLICE_WORK
  - Don't fail compilation on GCC5 - we MUST have rc set.
  - Don't bail out if finding !sh_type as those are for .rela or .debug
    and while we don't need to allocate it (as we had already done
    the relocation), do continue.
  - Make applied_list be an RCU type to guard against infinite loops
    when searching the applied_list.
  - Dropped the irq_semaphore and are re-using the semaphore atomic when
    CPUs have rendezvoused and are ready to go in IRQ disable phase.
v10: Drop Reviewed-by
  - Use a bitmap in check_special_sections to check for sections
    appearing twice.
  - Add comment about us abusing the list RCU for our safety reasons.
  - And remove MUST comment about opaque having to be zero filled.
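Jan's bitmap suggestion (the v10 entry above) can be sketched as follows. The helpers check_special and test_and_set are hypothetical, standing in for Xen's bitops and ELF accessors, and the empty-section check is omitted. Detecting a duplicate requires visiting every section header, because a lookup by name only ever returns a single match.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define LONG_BITS     (sizeof(unsigned long) * CHAR_BIT)

/* The special sections a payload must carry exactly once. */
static const char *const special[] = { ".xsplice.funcs" };

/* Non-atomic test-and-set: set bit nr in map, return its old value. */
static int test_and_set(unsigned long *map, unsigned int nr)
{
    unsigned long mask = 1UL << (nr % LONG_BITS);
    int old = !!(map[nr / LONG_BITS] & mask);

    map[nr / LONG_BITS] |= mask;
    return old;
}

/* Return 0 if every special section appears exactly once, else -EINVAL. */
static int check_special(const char *const *secnames, unsigned int nsec)
{
    unsigned long found[(ARRAY_SIZE(special) + LONG_BITS - 1) / LONG_BITS] = { 0 };
    unsigned int i, j;

    for ( i = 0; i < nsec; i++ )
        for ( j = 0; j < ARRAY_SIZE(special); j++ )
            if ( !strcmp(secnames[i], special[j]) && test_and_set(found, j) )
                return -EINVAL;        /* seen more than once */

    for ( j = 0; j < ARRAY_SIZE(special); j++ )
        if ( !(found[j / LONG_BITS] & (1UL << (j % LONG_BITS))) )
            return -EINVAL;            /* missing */

    return 0;
}
```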
---
 xen/arch/arm/xsplice.c        |  33 +++
 xen/arch/x86/domain.c         |   6 +
 xen/arch/x86/xsplice.c        |  76 +++++++
 xen/common/xsplice.c          | 480 +++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/current.h |  10 +-
 xen/include/public/sysctl.h   |  20 ++
 xen/include/xen/xsplice.h     |  21 ++
 7 files changed, 634 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
index 8cb7767..db0dce2 100644
--- a/xen/arch/arm/xsplice.c
+++ b/xen/arch/arm/xsplice.c
@@ -7,6 +7,39 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+void arch_xsplice_patching_enter(void)
+{
+}
+
+void arch_xsplice_patching_leave(void)
+{
+}
+
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func)
+{
+    return -ENOSYS;
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_post_action(void)
+{
+}
+
+void arch_xsplice_mask(void)
+{
+}
+
+void arch_xsplice_unmask(void)
+{
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
 {
     return -ENOSYS;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e93ff20..d13b272 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -36,6 +36,7 @@
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/guest_access.h>
+#include <xen/xsplice.h>
 #include <public/sysctl.h>
 #include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
@@ -120,6 +121,11 @@ static void idle_loop(void)
         (*pm_idle)();
         do_tasklet();
         do_softirq();
+        /*
+         * We MUST be last (or before pm_idle). Otherwise after we get the
+         * softirq we would execute pm_idle (and sleep) and not patch.
+         */
+        check_for_xsplice_work();
     }
 }
 
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
index e50cbc0..fc1338e 100644
--- a/xen/arch/x86/xsplice.c
+++ b/xen/arch/x86/xsplice.c
@@ -11,6 +11,82 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+#include <asm/nmi.h>
+
+#define PATCH_INSN_SIZE 5
+
+void arch_xsplice_patching_enter(void)
+{
+    /* Disable WP to allow changes to read-only pages. */
+    write_cr0(read_cr0() & ~X86_CR0_WP);
+}
+
+void arch_xsplice_patching_leave(void)
+{
+    /* Reinstate WP. */
+    write_cr0(read_cr0() | X86_CR0_WP);
+}
+
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func)
+{
+    /* No NOP patching yet. */
+    if ( !func->new_size )
+        return -EOPNOTSUPP;
+
+    if ( func->old_size < PATCH_INSN_SIZE )
+        return -EINVAL;
+
+    return 0;
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+    int32_t val;
+    uint8_t *old_ptr;
+
+    BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->opaque));
+    BUILD_BUG_ON(PATCH_INSN_SIZE != (1 + sizeof(val)));
+
+    old_ptr = func->old_addr;
+    memcpy(func->opaque, old_ptr, PATCH_INSN_SIZE);
+
+    *old_ptr++ = 0xe9; /* Relative jump */
+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
+    memcpy(old_ptr, &val, sizeof(val));
+}
+
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func)
+{
+    memcpy(func->old_addr, func->opaque, PATCH_INSN_SIZE);
+}
+
+/* Serialise the CPU pipeline. */
+void arch_xsplice_post_action(void)
+{
+    cpuid_eax(0);
+}
+
+static nmi_callback_t *saved_nmi_callback;
+/*
+ * Note that because of this NOP code the do_nmi is not safely patchable.
+ * Also if we do receive 'real' NMIs we have lost them.
+ */
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    /* TODO: Handle missing NMI/MCE.*/
+    return 1;
+}
+
+void arch_xsplice_mask(void)
+{
+    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+}
+
+void arch_xsplice_unmask(void)
+{
+    set_nmi_callback(saved_nmi_callback);
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
 {
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 3f3aacc..6d94023 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -3,6 +3,7 @@
  *
  */
 
+#include <xen/cpu.h>
 #include <xen/err.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
@@ -11,18 +12,30 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/smp.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
-#include <public/sysctl.h>
 
-/* Protects against payload_list operations. */
+/*
+ * Protects against payload_list operations and also allows only one
+ * caller in schedule_work.
+ */
 static DEFINE_SPINLOCK(payload_lock);
 static LIST_HEAD(payload_list);
 
+/*
+ * Patches which have been applied. Need RCU in case we crash (and then
+ * traps code would iterate via applied_list) when adding entries on the list.
+ *
+ * Note: There are no 'rcu_applied_lock' as we don't iterate the list yet.
+ */
+static LIST_HEAD(applied_list);
+
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
@@ -37,9 +50,35 @@ struct payload {
     const void *ro_addr;                 /* Virtual address of .rodata. */
     size_t ro_size;                      /* .. and its size (if any). */
     unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
+    struct list_head applied_list;       /* Linked to 'applied_list'. */
+    struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
+    unsigned int nfuncs;                 /* Nr of functions to patch. */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
+/* Defines an outstanding patching action. */
+struct xsplice_work
+{
+    atomic_t semaphore;          /* Used to rendezvous CPUs in
+                                    check_for_xsplice_work. */
+    uint32_t timeout;            /* Timeout to do the operation. */
+    struct payload *data;        /* The payload on which to act. */
+    volatile bool_t do_work;     /* Signals work to do. */
+    volatile bool_t ready;       /* Signals all CPUs synchronized. */
+    unsigned int cmd;            /* Action request: XSPLICE_ACTION_* */
+};
+
+/* There can be only one outstanding patching action. */
+static struct xsplice_work xsplice_work;
+
+/*
+ * Indicate whether the CPU needs to consult the xsplice_work structure.
+ * We want a per-cpu data structure, otherwise check_for_xsplice_work
+ * would hammer a global xsplice_work structure on every guest VMEXIT.
+ * Having a per-cpu flag lessens the load.
+ */
+static DEFINE_PER_CPU(bool_t, work_to_do);
+
 static int get_name(const xen_xsplice_name_t *name, char *n)
 {
     if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
@@ -268,6 +307,91 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
     return rc;
 }
 
+static int check_special_sections(const struct xsplice_elf *elf)
+{
+    unsigned int i;
+    static const char *const names[] = { ELF_XSPLICE_FUNC };
+    DECLARE_BITMAP(count, ARRAY_SIZE(names)) = { 0 };
+
+    for ( i = 0; i < ARRAY_SIZE(names); i++ )
+    {
+        const struct xsplice_elf_sec *sec;
+
+        sec = xsplice_elf_sec_by_name(elf, names[i]);
+        if ( !sec )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s is missing!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+
+        if ( !sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s is empty!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+
+        if ( test_bit(i, count) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: %s was seen more than once!\n",
+                    elf->name, names[i]);
+            return -EINVAL;
+        }
+
+        __set_bit(i, count);
+    }
+
+    return 0;
+}
+
+static int prepare_payload(struct payload *payload,
+                           struct xsplice_elf *elf)
+{
+    const struct xsplice_elf_sec *sec;
+    unsigned int i;
+    struct xsplice_patch_func *f;
+
+    sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
+    ASSERT(sec);
+    if ( sec->sec->sh_size % sizeof(*payload->funcs) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Wrong size of "ELF_XSPLICE_FUNC"!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    payload->funcs = sec->load_addr;
+    payload->nfuncs = sec->sec->sh_size / sizeof(*payload->funcs);
+
+    for ( i = 0; i < payload->nfuncs; i++ )
+    {
+        int rc;
+
+        f = &(payload->funcs[i]);
+
+        if ( f->version != XSPLICE_PAYLOAD_VERSION )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Wrong version (%u). Expected %d!\n",
+                    elf->name, f->version, XSPLICE_PAYLOAD_VERSION);
+            return -EOPNOTSUPP;
+        }
+
+        if ( !f->new_addr || !f->new_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Address or size fields are zero!\n",
+                    elf->name);
+            return -EINVAL;
+        }
+
+        rc = arch_xsplice_verify_func(f);
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
@@ -299,6 +423,14 @@ static int load_payload_data(struct payload *payload, void *raw, size_t len)
     if ( rc )
         goto out;
 
+    rc = check_special_sections(&elf);
+    if ( rc )
+        goto out;
+
+    rc = prepare_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
     rc = secure_payload(payload, &elf);
 
  out:
@@ -354,6 +486,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 
     data->state = XSPLICE_STATE_CHECKED;
     INIT_LIST_HEAD(&data->list);
+    INIT_LIST_HEAD(&data->applied_list);
 
     list_add_tail(&data->list, &payload_list);
     payload_cnt++;
@@ -464,6 +597,313 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list)
     return rc ? : idx;
 }
 
+/*
+ * The following functions get the CPUs into an appropriate state and
+ * apply (or revert) each of the payload's functions. This is needed
+ * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action).
+ */
+
+static int apply_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_INFO XSPLICE "%s: Applying %u functions\n",
+            data->name, data->nfuncs);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_apply_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    /*
+     * We need RCU variant (which has barriers) in case we crash here.
+     * The applied_list is iterated by the trap code.
+     */
+    list_add_tail_rcu(&data->applied_list, &applied_list);
+
+    return 0;
+}
+
+static int revert_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_INFO XSPLICE "%s: Reverting\n", data->name);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_revert_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    /*
+     * We need RCU variant (which has barriers) in case we crash here.
+     * The applied_list is iterated by the trap code.
+     */
+    list_del_rcu(&data->applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed while all other CPUs have no deep stack (we may
+ * have cpu_idle on it) and have IRQs disabled.
+ */
+static void xsplice_do_action(void)
+{
+    int rc;
+    struct payload *data, *other, *tmp;
+
+    data = xsplice_work.data;
+    /*
+     * This function and the transition from asm to C code should be the only
+     * one on any stack. No need to lock the payload list or applied list.
+     */
+    switch ( xsplice_work.cmd )
+    {
+    case XSPLICE_ACTION_APPLY:
+        rc = apply_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_APPLIED;
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        rc = revert_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_CHECKED;
+        break;
+
+    case XSPLICE_ACTION_REPLACE:
+        rc = 0;
+        /*
+         * N.B: Use 'applied_list' member, not 'list'. We also abuse the
+         * 'normal' list iterator as the list is an RCU one.
+         */
+        list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list )
+        {
+            other->rc = revert_payload(other);
+            if ( other->rc == 0 )
+                other->state = XSPLICE_STATE_CHECKED;
+            else
+            {
+                rc = -EINVAL;
+                break;
+            }
+        }
+
+        if ( rc == 0 )
+        {
+            rc = apply_payload(data);
+            if ( rc == 0 )
+                data->state = XSPLICE_STATE_APPLIED;
+        }
+        break;
+
+    default:
+        rc = -EINVAL; /* Make GCC5 happy. */
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    /* We must set rc as xsplice_action sets it to -EAGAIN when kicking off. */
+    data->rc = rc;
+}
+
+static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
+{
+    ASSERT(spin_is_locked(&payload_lock));
+
+    /* Fail if an operation is already scheduled. */
+    if ( xsplice_work.do_work )
+        return -EBUSY;
+
+    if ( !get_cpu_maps() )
+    {
+        printk(XENLOG_ERR XSPLICE "%s: unable to get cpu_maps lock!\n",
+               data->name);
+        return -EBUSY;
+    }
+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    xsplice_work.timeout = timeout ?: MILLISECS(30);
+
+    dprintk(XENLOG_DEBUG, XSPLICE "%s: timeout is %"PRI_stime"ms\n",
+            data->name, xsplice_work.timeout / MILLISECS(1));
+
+    atomic_set(&xsplice_work.semaphore, -1);
+
+    xsplice_work.ready = 0;
+
+    smp_wmb();
+
+    xsplice_work.do_work = 1;
+    this_cpu(work_to_do) = 1;
+
+    put_cpu_maps();
+
+    return 0;
+}
+
+static void reschedule_fn(void *unused)
+{
+    this_cpu(work_to_do) = 1;
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+static int xsplice_spin(atomic_t *counter, s_time_t timeout,
+                        unsigned int cpus, const char *s)
+{
+    int rc = 0;
+
+    while ( atomic_read(counter) != cpus && NOW() < timeout )
+        cpu_relax();
+
+    /* Log & abort. */
+    if ( atomic_read(counter) != cpus )
+    {
+        printk(XENLOG_ERR XSPLICE "%s: Timed out on semaphore in %s quiesce phase %u/%u\n",
+               xsplice_work.data->name, s, atomic_read(counter), cpus);
+        rc = -EBUSY;
+        xsplice_work.data->rc = rc;
+        smp_wmb();
+        xsplice_work.do_work = 0;
+    }
+
+    return rc;
+}
+
+/*
+ * The main function which manages the work of quiescing the system and
+ * patching code.
+ */
+void check_for_xsplice_work(void)
+{
+#define ACTION(x) [XSPLICE_ACTION_##x] = #x
+    static const char *const names[] = {
+            ACTION(APPLY),
+            ACTION(REVERT),
+            ACTION(REPLACE),
+    };
+#undef ACTION
+    unsigned int cpu = smp_processor_id();
+    s_time_t timeout;
+    unsigned long flags;
+
+    /* Fast path: no work to do. */
+    if ( !per_cpu(work_to_do, cpu) )
+        return;
+
+    smp_rmb();
+    /* In case we aborted, other CPUs can skip right away. */
+    if ( !xsplice_work.do_work )
+    {
+        per_cpu(work_to_do, cpu) = 0;
+        return;
+    }
+
+    ASSERT(local_irq_is_enabled());
+
+    /* Set at -1, so will go up to num_online_cpus - 1. */
+    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
+    {
+        struct payload *p;
+        unsigned int cpus;
+
+        p = xsplice_work.data;
+        if ( !get_cpu_maps() )
+        {
+            printk(XENLOG_ERR XSPLICE "%s: CPU%u - unable to get cpu_maps lock!\n",
+                   p->name, cpu);
+            per_cpu(work_to_do, cpu) = 0;
+            xsplice_work.data->rc = -EBUSY;
+            smp_wmb();
+            xsplice_work.do_work = 0;
+            /*
+             * Do NOT decrement xsplice_work.semaphore down - as that may cause
+             * the other CPU (which may be at this point ready to increment it)
+             * to assume the role of master and then needlessly time out
+             * (as do_work is zero).
+             */
+            return;
+        }
+        /* "Mask" NMIs. */
+        arch_xsplice_mask();
+
+        barrier(); /* MUST do it after get_cpu_maps. */
+        cpus = num_online_cpus() - 1;
+
+        if ( cpus )
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: CPU%u - IPIing the other %u CPUs\n",
+                    p->name, cpu, cpus);
+            smp_call_function(reschedule_fn, NULL, 0);
+        }
+
+        timeout = xsplice_work.timeout + NOW();
+        if ( xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "CPU") )
+            goto abort;
+
+        /* All CPUs are waiting, now signal to disable IRQs. */
+        atomic_set(&xsplice_work.semaphore, 0);
+        /*
+         * MUST have a barrier after semaphore so that the other CPUs don't
+         * leak out of the 'Wait for all CPUs to rendezvous' loop and increment
+         * 'semaphore' before we set it to zero.
+         */
+        smp_wmb();
+        xsplice_work.ready = 1;
+
+        if ( !xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "IRQ") )
+        {
+            local_irq_save(flags);
+            /* Do the patching. */
+            xsplice_do_action();
+            /* Serialize and flush out the CPU via CPUID instruction (on x86). */
+            arch_xsplice_post_action();
+            local_irq_restore(flags);
+        }
+        arch_xsplice_unmask();
+
+ abort:
+        per_cpu(work_to_do, cpu) = 0;
+        xsplice_work.do_work = 0;
+
+        /* put_cpu_maps has a barrier(). */
+        put_cpu_maps();
+
+        printk(XENLOG_INFO XSPLICE "%s finished %s with rc=%d\n",
+               p->name, names[xsplice_work.cmd], p->rc);
+    }
+    else
+    {
+        /* Wait for all CPUs to rendezvous. */
+        while ( xsplice_work.do_work && !xsplice_work.ready )
+            cpu_relax();
+
+        /* Disable IRQs and signal. */
+        local_irq_save(flags);
+        /*
+         * We re-use the semaphore, so MUST have it reset by master before
+         * we exit the loop above.
+         */
+        atomic_inc(&xsplice_work.semaphore);
+
+        /* Wait for patching to complete. */
+        while ( xsplice_work.do_work )
+            cpu_relax();
+
+        /* To flush out pipeline. */
+        arch_xsplice_post_action();
+        local_irq_restore(flags);
+
+        per_cpu(work_to_do, cpu) = 0;
+    }
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -503,27 +943,24 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_APPLY:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_APPLIED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
@@ -588,6 +1025,7 @@ static const char *state2str(unsigned int state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    unsigned int i;
 
     printk("'%c' pressed - Dumping all xsplice patches\n", key);
 
@@ -598,10 +1036,30 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
+    {
         printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
                data->name, state2str(data->state), data->state, data->text_addr,
                data->rw_addr, data->ro_addr, data->pages);
 
+        for ( i = 0; i < data->nfuncs; i++ )
+        {
+            struct xsplice_patch_func *f = &(data->funcs[i]);
+            printk("    %s patch %p(%u) with %p (%u)\n",
+                   f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
+
+            if ( i && !(i % 64) )
+            {
+                spin_unlock(&payload_lock);
+                process_pending_softirqs();
+                if ( !spin_trylock(&payload_lock) )
+                {
+                    printk("Couldn't reacquire lock. Try again.\n");
+                    return;
+                }
+            }
+        }
+    }
+
     spin_unlock(&payload_lock);
 }
 
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 4083261..73a7209 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -86,10 +86,18 @@ static inline struct cpu_info *get_cpu_info(void)
 unsigned long get_stack_trace_bottom(unsigned long sp);
 unsigned long get_stack_dump_bottom (unsigned long sp);
 
+#ifdef CONFIG_XSPLICE
+# define CHECK_FOR_XSPLICE_WORK "call check_for_xsplice_work;"
+#else
+# define CHECK_FOR_XSPLICE_WORK ""
+#endif
+
 #define reset_stack_and_jump(__fn)                                      \
     ({                                                                  \
         __asm__ __volatile__ (                                          \
-            "mov %0,%%"__OP"sp; jmp %c1"                                \
+            "mov %0,%%"__OP"sp;"                                        \
+            CHECK_FOR_XSPLICE_WORK                                      \
+            "jmp %c1"                                                   \
             : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" );   \
         unreachable();                                                  \
     })
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 3fa1fe7..b2b312b 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -869,6 +869,26 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_featureset_t);
  *     If zero exit with success.
  */
 
+#define XSPLICE_PAYLOAD_VERSION 1
+/*
+ * .xsplice.funcs structure layout defined in the `Payload format`
+ * section in the xSplice design document.
+ *
+ * We guard this with __XEN__ as toolstacks SHOULD not use it.
+ */
+#ifdef __XEN__
+struct xsplice_patch_func {
+    const char *name;       /* Name of function to be patched. */
+    void *new_addr;
+    void *old_addr;
+    uint32_t new_size;
+    uint32_t old_size;
+    uint8_t version;        /* MUST be XSPLICE_PAYLOAD_VERSION. */
+    uint8_t opaque[31];
+};
+typedef struct xsplice_patch_func xsplice_patch_func_t;
+#endif
+
 /*
  * Structure describing an ELF payload. Uniquely identifies the
  * payload. Should be human readable.
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 857c264..c9723e4 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -11,12 +11,16 @@ struct xsplice_elf_sec;
 struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
+#include <xen/elfstructs.h>
 #ifdef CONFIG_XSPLICE
 
 /* Convenience define for printk. */
 #define XSPLICE             "xsplice: "
+/* ELF payload special section names. */
+#define ELF_XSPLICE_FUNC    ".xsplice.funcs"
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
+void check_for_xsplice_work(void);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
@@ -39,6 +43,22 @@ enum va_type {
 int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types);
 
 void arch_xsplice_init(void);
+
+#include <public/sysctl.h> /* For struct xsplice_patch_func. */
+int arch_xsplice_verify_func(const struct xsplice_patch_func *func);
+/*
+ * These functions are called around the critical region patching live code,
+ * for an architecture to make appropriate global state adjustments.
+ */
+void arch_xsplice_patching_enter(void);
+void arch_xsplice_patching_leave(void);
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func);
+void arch_xsplice_revert_jmp(const struct xsplice_patch_func *func);
+void arch_xsplice_post_action(void);
+
+void arch_xsplice_mask(void);
+void arch_xsplice_unmask(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
@@ -47,6 +67,7 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
     return -ENOSYS;
 }
 
+static inline void check_for_xsplice_work(void) { }
 #endif /* CONFIG_XSPLICE */
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.4.3
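
The rendezvous in the patch above uses a single counter for two phases. The sketch below is a rough user-space model of that handshake. It assumes POSIX threads stand in for CPUs, C11 sequentially consistent atomics stand in for atomic_t and the explicit barriers, and sched_yield() stands in for cpu_relax(). The first thread to move the counter from -1 to 0 acts as master. It waits for the others to check in, resets the counter, and raises `ready`. It then waits for a second round of increments (the "IRQs disabled" signal) before it "patches". Timeouts, NMI masking, IPIs and real interrupt disabling are all left out, and every identifier is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4                 /* hypothetical number of "CPUs" (threads) */

static atomic_int semaphore;    /* models xsplice_work.semaphore */
static atomic_bool do_work;     /* models xsplice_work.do_work */
static atomic_bool ready;       /* models xsplice_work.ready */
static atomic_int patched;      /* set once the "patching" has been done */
static atomic_int irqs_off;     /* non-master threads that reached phase 2 */

static void relax(void)         /* stands in for cpu_relax() */
{
    sched_yield();
}

static void *cpu_thread(void *arg)
{
    (void)arg;

    /* Like atomic_inc_and_test(): only the thread taking -1 to 0 is master. */
    if ( atomic_fetch_add(&semaphore, 1) + 1 == 0 )
    {
        /* Phase 1: wait until the other NCPUS - 1 threads have checked in. */
        while ( atomic_load(&semaphore) != NCPUS - 1 )
            relax();

        /* Reset the counter before releasing anyone (seq_cst orders it). */
        atomic_store(&semaphore, 0);
        atomic_store(&ready, true);

        /* Phase 2: wait until every other thread has "disabled IRQs". */
        while ( atomic_load(&semaphore) != NCPUS - 1 )
            relax();

        atomic_store(&patched, 1);      /* the live-patching step */
        atomic_store(&do_work, false);  /* let the others go */
    }
    else
    {
        /* Wait for all threads to rendezvous. */
        while ( atomic_load(&do_work) && !atomic_load(&ready) )
            relax();

        atomic_fetch_add(&irqs_off, 1);   /* models local_irq_save() */
        atomic_fetch_add(&semaphore, 1);  /* signal the master */

        /* Wait for patching to complete. */
        while ( atomic_load(&do_work) )
            relax();

        assert(atomic_load(&patched) == 1);
    }
    return NULL;
}

/* Runs one rendezvous; returns how many non-master threads reached phase 2. */
static int run_rendezvous(void)
{
    pthread_t t[NCPUS];
    int i;

    atomic_store(&semaphore, -1);   /* "Set at -1", as in the patch */
    atomic_store(&do_work, true);
    atomic_store(&ready, false);
    atomic_store(&patched, 0);
    atomic_store(&irqs_off, 0);

    for ( i = 0; i < NCPUS; i++ )
        pthread_create(&t[i], NULL, cpu_thread, NULL);
    for ( i = 0; i < NCPUS; i++ )
        pthread_join(t[i], NULL);

    return atomic_load(&irqs_off);
}
```

Resetting the counter before setting `ready` is what makes the reuse safe. A thread that has seen `ready` can only increment a counter that is already zero. This is the ordering the patch's smp_wmb() comment protects.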


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-26 12:37   ` Jan Beulich
  2016-04-27  1:59     ` Konrad Rzeszutek Wilk
@ 2016-04-27  4:06     ` Konrad Rzeszutek Wilk
  2016-04-27  7:52       ` Jan Beulich
  1 sibling, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27  4:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

> > +static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
> > +{
..snip..
> > +    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
> > +    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
> 
> The added comment just helps make obvious that the overflow I
> believe Andrew was worried about is still not being taken care of:
> All xsplice_header_check() does is range check the two values
> mentioned in the comment. But I agree that a proper range check
> (at once eliminating overflow concerns for the arithmetic here)
> would better live there (and also see there).

..snip..
> > +static int xsplice_header_check(const struct xsplice_elf *elf)
> > +{ 
..snip..
> > +    if ( elf->hdr->e_shnum > 64 )
> > +    {
> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
> > +                elf->name, elf->hdr->e_shnum);
> > +        return -EOPNOTSUPP;
> > +    }
> > +
> > +    if ( elf->hdr->e_shoff > ULONG_MAX )
> 
> Why not ">= elf->len" (and I see it was almost that way in v8.1)?

I misunderstood your comment. You mentioned that we had
a boundary check here (when it was against elf->len) and that you
wanted an overflow check, so I replaced it, whereas you meant in addition to it.

But adding in both:

	elf->hdr->e_shoff >= ULONG_MAX || elf->hdr->e_shoff >= elf->len

feels unnecessary, and the boundary check is the more important one.
Nonetheless, I added both in the code.


> And then followed (further down) by another check taking
> elf->hdr->e_shnum * elf->hdr->e_shentsize into account (of
> course as things stand now, elf->hdr->e_shentsize can also be
> arbitrarily large, so this would need to be suitably structured
> - e.g. "(elf->len - elf->hdr->e_shoff) / elf->hdr->e_shentsize <
> elf->hdr->e_shnum").

Ah, so that is how you want to check for e_shnum!
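
Jan's suggested form can be checked in isolation. The sketch below is a minimal model, not the patch's xsplice_header_check(). It assumes a hypothetical struct holding only the three ELF64 header fields involved, and it takes 64 as sizeof(Elf64_Shdr). Each step guards the next:
- Testing e_shoff against len first makes len - e_shoff safe.
- Requiring e_shentsize >= sizeof(Elf_Shdr) keeps the divisor non-zero.
- Dividing instead of multiplying means e_shnum * e_shentsize is never formed.

When the check passes, that product is bounded by len - e_shoff, so the later e_shoff + e_shnum * e_shentsize in elf_resolve_sections() is bounded by len. Also, because len is a size_t, e_shoff < len already implies e_shoff < ULONG_MAX wherever size_t and unsigned long have the same width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of Elf64_Ehdr: only the fields the check needs. */
struct shdr_fields {
    uint64_t e_shoff;      /* file offset of the section header table */
    uint16_t e_shentsize;  /* size of one section header entry */
    uint16_t e_shnum;      /* number of section header entries */
};

#define SHDR_SIZE 64u      /* sizeof(Elf64_Shdr) */

/* Returns 0 if the whole section header table lies within len bytes. */
static int shdr_table_fits(const struct shdr_fields *h, size_t len)
{
    /* Bounds check first: afterwards len - e_shoff cannot wrap. */
    if ( h->e_shoff >= len )
        return -1;

    /* Also guarantees the divisor below is non-zero. */
    if ( h->e_shentsize < SHDR_SIZE )
        return -1;

    /* Division instead of e_shnum * e_shentsize: no product to overflow. */
    if ( (len - h->e_shoff) / h->e_shentsize < h->e_shnum )
        return -1;

    return 0;
}
```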

Here is the updated patch:


From eb90312d4ff66c17fad2d4cd5379974fc4c9f2f6 Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Fri, 19 Feb 2016 14:37:17 -0500
Subject: [PATCH] xsplice: Add helper elf routines

Add Elf routines and data structures in preparation for loading an
xSplice payload.

We make an assumption that the max number of sections an ELF payload
can have is 64. We can in future make this be dependent on the
names of the sections and verifying against a list, but for right now
this suffices.

We also add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that the offsets do not point past the end of the file.

For most of the checks we print a message if the hypervisor is built
with debug enabled.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: - With the #define ELFSIZE in the ARM file we can use the common
     #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
    patch.
    - Add checks for ELF file.
    - Add name to be printed.
    - Add len for easier ELF checks.
    - Expand on the checks. Add macro.
v3: Remove the return_ macro
  - Add return_ macro back but make it depend on debug=y
  - Per Andrew review: add local variable. Fix memory leak in
    elf_resolve_sections, Remove macro and use dprintk. Fix alignment.
    Use void* instead of uint8_t to handle raw payload.
v4 - Fix memory leak in elf_get_sym
  - Add XSPLICE to printk/dprintk
v5: Sprinkle newlines.
v6: Squash the ELF header checks from 'xsplice: Implement payload loading' here,
    Do better job at checking string sections and the users of them (sh_size),
    Use XSPLICE as a string literal,
    Move some checks outside the loop,
    Make sure that SHT_STRTAB are really what they say
    Sprinkle consts.
v7:
    Check sh_entsize and sh_offset.
    Added Andrew's Reviewed-by and Ian's Acked-by
    Redo check on sh_entsize to not be !=
v8: Make all the dprintk(XENLOG_DEBUG be XENLOG_ERR
v9: Changed elf_verify_strtab to use const char and return EINVAL.
    Remove 'if ( !delta )' check in elf_resolve_sections
    Remove stale comments.
    Fixed off-by-one check against sh_link.
    Document boundary checks against shstrtab and symtab.
    Fixed return codes in xsplice_header_check.
    Add check for sections to not be within ELF header.
    Added overflow check for e_shoff in xsplice_header_check.
    Moved XSPLICE macro by four tabs.
    Make ->sym be const.
v10:
  - Change the check against 64 to be against SHN_LORESERVE
  - Remove Reviewed-by
  - In elf_resolve_sections skip delta check if SHT_NOBITS is set in
    second conditional.
  - In elf_get_sym use symtab_sec->sec->sh_entsize to access
    Elf_Sym symbols and also make it a const. Also
    fix boundary check against .strtab and make assignment of
    sym[i].name more natural.
  - In xsplice_header_check add comment about EI_CLASS and e_flags
    being platform specific. Check against e_version and EI_VERSION.
    Also reinstate elf->hdr->e_shoff >= elf->len  check. Add Jan's check against
    elf->hdr->e_shnum * elf->hdr->e_shentsize
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 375 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h     |   3 +
 xen/include/xen/xsplice_elf.h |  51 ++++++
 4 files changed, 430 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1e4bc70..afd84b6 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -59,6 +59,7 @@ obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
 obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += xsplice_elf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..702d43e
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,375 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name)
+{
+    unsigned int i;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_verify_strtab(const struct xsplice_elf_sec *sec)
+{
+    const Elf_Shdr *s;
+    const char *contents;
+
+    s = sec->sec;
+
+    if ( s->sh_type != SHT_STRTAB )
+        return -EINVAL;
+
+    if ( !s->sh_size )
+        return -EINVAL;
+
+    contents = sec->data;
+
+    if ( contents[0] || contents[s->sh_size - 1] )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+    Elf_Off delta;
+    int rc;
+
+    /* xsplice_elf_load sanity checked e_shnum. */
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for section table!\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    elf->sec = sec;
+
+    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
+    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
+    if ( delta > elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section table is past end of payload!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
+
+        sec[i].sec = data + delta;
+
+        delta = sec[i].sec->sh_offset;
+        /*
+         * N.B. elf_resolve_section_names, elf_get_sym skip this check as
+         * we do it here.
+         */
+        if ( delta < sizeof(Elf_Ehdr) ||
+             (sec[i].sec->sh_type != SHT_NOBITS && /* Skip SHT_NOBITS */
+              (delta > elf->len || (delta + sec[i].sec->sh_size > elf->len))) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Section [%u] data %s of payload!\n",
+                    elf->name, i,
+                    delta < sizeof(Elf_Ehdr) ? "at ELF header" : "is past end");
+            return -EINVAL;
+        }
+
+        sec[i].data = data + delta;
+        /* Name is populated in elf_resolve_section_names. */
+        sec[i].name = NULL;
+
+        if ( sec[i].sec->sh_type == SHT_SYMTAB )
+        {
+            if ( elf->symtab )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported multiple symbol tables!\n",
+                        elf->name);
+                return -EOPNOTSUPP;
+            }
+
+            elf->symtab = &sec[i];
+
+            /*
+             * elf->symtab->sec->sh_link would point to the right section
+             * but we haven't finished parsing all the sections yet.
+             */
+            if ( elf->symtab->sec->sh_link >= elf->hdr->e_shnum )
+            {
+                dprintk(XENLOG_ERR, XSPLICE
+                        "%s: Symbol table idx (%u) to strtab past end (%u)\n",
+                        elf->name, elf->symtab->sec->sh_link,
+                        elf->hdr->e_shnum);
+                return -EINVAL;
+            }
+        }
+    }
+
+    if ( !elf->symtab )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: No symbol table found!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !elf->symtab->sec->sh_size ||
+         elf->symtab->sec->sh_entsize < sizeof(Elf_Sym) ||
+         elf->symtab->sec->sh_size % elf->symtab->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Symbol table header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /*
+     * There can be multiple SHT_STRTAB (.shstrtab, .strtab) so pick the one
+     * associated with the symbol table.
+     */
+    elf->strtab = &sec[elf->symtab->sec->sh_link];
+
+    rc = elf_verify_strtab(elf->strtab);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: String table section is corrupted\n",
+                elf->name);
+    }
+
+    return rc;
+}
+
+static int elf_resolve_section_names(struct xsplice_elf *elf, const void *data)
+{
+    const char *shstrtab;
+    unsigned int i;
+    Elf_Off offset, delta;
+    struct xsplice_elf_sec *sec;
+    int rc;
+
+    /*
+     * The elf->sec[0 -> e_shnum] structures have been verified by
+     * elf_resolve_sections. Find file offset for section string table
+     * (normally called .shstrtab)
+     */
+    sec = &elf->sec[elf->hdr->e_shstrndx];
+
+    rc = elf_verify_strtab(sec);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section string table is corrupted\n",
+                elf->name);
+        return rc;
+    }
+
+    /* Verified in elf_resolve_sections but just in case. */
+    offset = sec->sec->sh_offset;
+    ASSERT(offset < elf->len && (offset + sec->sec->sh_size <= elf->len));
+
+    shstrtab = data + offset;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->sec[i].sec->sh_name;
+
+        /* Boundary check on offset of name within the .shstrtab. */
+        if ( delta >= sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: shstrtab [%u] data is past end of payload!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        elf->sec[i].name = shstrtab + delta;
+    }
+
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, const void *data)
+{
+    const struct xsplice_elf_sec *symtab_sec, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    unsigned int i, delta, offset, nsym;
+
+    symtab_sec = elf->symtab;
+    strtab_sec = elf->strtab;
+
+    /* Pointer arithmetic to get file offset. */
+    offset = strtab_sec->data - data;
+
+    /* Checked already in elf_resolve_sections, but just in case. */
+    ASSERT(offset == strtab_sec->sec->sh_offset);
+    ASSERT(offset < elf->len && (offset + strtab_sec->sec->sh_size <= elf->len));
+
+    /* symtab_sec->data was computed in elf_resolve_sections. */
+    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data);
+
+    /* No need to check values as elf_resolve_sections did it. */
+    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
+    if ( !sym )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for symbols\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    /* So we don't leak memory. */
+    elf->sym = sym;
+
+    for ( i = 1; i < nsym; i++ )
+    {
+        const Elf_Sym *s = symtab_sec->data + symtab_sec->sec->sh_entsize * i;
+
+        delta = s->st_name;
+        /* Boundary check within the .strtab. */
+        if ( delta >= strtab_sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Symbol [%u] name is not within .strtab!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        sym[i].sym = s;
+        sym[i].name = strtab_sec->data + delta;
+    }
+    elf->nsym = nsym;
+
+    return 0;
+}
+
+static int xsplice_header_check(const struct xsplice_elf *elf)
+{
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( sizeof(*elf->hdr) > elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !IS_ELF(*hdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
+        return -EINVAL;
+    }
+
+    /* EI_CLASS, and e_flags are platform specific. */
+    if ( hdr->e_version != EV_CURRENT ||
+         hdr->e_ident[EI_VERSION] != EV_CURRENT ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
+         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
+         hdr->e_type != ET_REL ||
+         hdr->e_phnum != 0 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /* Check that section name index is within the sections. */
+    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
+                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shnum >= SHN_LORESERVE )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
+                elf->name, elf->hdr->e_shnum);
+        return -EOPNOTSUPP;
+    }
+
+    if ( elf->hdr->e_shoff >= elf->len || elf->hdr->e_shoff >= ULONG_MAX )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Bogus e_shoff!\n", elf->name);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shentsize < sizeof(Elf_Shdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header size is %u! Expected %zu!?\n",
+                elf->name, elf->hdr->e_shentsize, sizeof(Elf_Shdr));
+        return -EINVAL;
+    }
+
+    if ( ((elf->len - elf->hdr->e_shoff) / elf->hdr->e_shentsize) <
+         elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header size is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data)
+{
+    int rc;
+
+    elf->hdr = data;
+
+    rc = xsplice_header_check(elf);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_section_names(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    elf->sec = NULL;
+    xfree(elf->sym);
+    elf->sym = NULL;
+    elf->nsym = 0;
+    elf->name = NULL;
+    elf->len = 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index b9f08cd..7559877 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -10,6 +10,9 @@ struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
 
+/* Convenience define for printk. */
+#define XSPLICE             "xsplice: "
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
 #else
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..686aaf0
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an Elf file as consumed by xSplice. */
+struct xsplice_elf_sec {
+    const Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
+    const char *name;                    /* Human readable name hooked in
+                                            elf_resolve_section_names. */
+    const void *data;                    /* Pointer to the section (done by
+                                            elf_resolve_sections). */
+};
+
+struct xsplice_elf_sym {
+    const Elf_Sym *sym;
+    const char *name;
+};
+
+struct xsplice_elf {
+    const char *name;                    /* Pointer to payload->name. */
+    size_t len;                          /* Length of the ELF file. */
+    const Elf_Ehdr *hdr;                 /* ELF file. */
+    struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
+    struct xsplice_elf_sym *sym;         /* Array of symbols, allocated by us. */
+    unsigned int nsym;
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+};
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.4.3
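
The string-table rules in elf_verify_strtab() and the sh_name/st_name boundary checks add up to a simple invariant. Suppose the first and last bytes of a table are NUL and an offset is below sh_size. Then the string at that offset is NUL-terminated inside the table, so it is safe to hand to strcmp(), as xsplice_elf_sec_by_name() does. The sketch below is a minimal model of that invariant. It uses plain (pointer, size) pairs instead of struct xsplice_elf_sec, and strtab_ok()/strtab_name() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models elf_verify_strtab(): non-empty, starts and ends with NUL. */
static int strtab_ok(const char *tab, size_t size)
{
    if ( size == 0 )
        return 0;
    return tab[0] == '\0' && tab[size - 1] == '\0';
}

/*
 * Models the sh_name/st_name boundary check. Precondition: strtab_ok()
 * passed, so tab[size - 1] == '\0' terminates any string at off < size.
 * Returns NULL if off is out of range.
 */
static const char *strtab_name(const char *tab, size_t size, size_t off)
{
    if ( off >= size )
        return NULL;
    return tab + off;
}
```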



* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-26 17:50     ` Konrad Rzeszutek Wilk
@ 2016-04-27  6:51       ` Jan Beulich
  2016-04-27 13:47         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  6:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Wei Liu, ross.lagerwall, andrew.cooper3,
	Ian Jackson, mpohlack, sasha.levin, xen-devel, Daniel De Graaf

>>> On 26.04.16 at 19:50, <konrad.wilk@oracle.com> wrote:
> On Tue, Apr 26, 2016 at 04:21:10AM -0600, Jan Beulich wrote:
>> >>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
>> > The implementation does not actually do any patching.
>> > 
>> > It just adds the framework for doing the hypercalls,
>> > keeping track of ELF payloads, and the basic operations:
>> >  - query which payloads exist,
>> >  - query for specific payloads,
>> >  - check*1, apply*1, replace*1, and unload payloads.
>> > 
>> > *1: Which of course in this patch are nops.
>> > 
>> > The functionality is disabled on ARM until all arch
>> > components are implemented.
>> > 
>> > Also by default it is disabled until the implementation
>> > is in place.
>> > 
>> > We also use recursive spinlocks so that the find_payload
>> > function does not need to have a 'lock' and 'non-lock' variant.
>> > 
>> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> > Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> 
>> I'm hesitant to say that, but with all of this:
>> 
>> > v9:
>> >     s/find_name/get_name/, drop locks when allocating data.
>> >     Drop conditional expression on copyback
>> >     Move the allocation on upload outside the spinlock.
>> >     Add (TECH PREVIEW) to the Kconfig help
>> >     Return -EINVAL if the CHECK or UNLOAD action is to be performed and the payload
>> >     state is not in expected state.
>> >     Print 'c' not 'u' when invoking the keyhandler.
>> 
>> ... I'm not sure the earlier R-b can still be considered valid. Andrew?
> 
> I don't know what the criteria are for dropping a Reviewed-by.
> I am happy to drop it if you would like - but it may be that Andrew
> is OK with the way he had his review?
> 
> Or is this more of your view as maintainer - that is the patch
> changed considerably (and what is that? percentage of the patch?
> small amount of the patch? Trivial changes? Dropping code?)?

Indeed, those are the aspects that matter: _Any_ non-trivial change
to the area a tag was offered for should lead to the tag getting
dropped. That is, if you make substantial changes to e.g. non-XSM
parts but have an XSM ack, that can of course stay.

Among the above, the obviously (to me) non-trivial change is
the reordering of allocation vs. locking.

>> > +static int get_name(const xen_xsplice_name_t *name, char *n)
>> > +{
>> > +    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
>> > +        return -EINVAL;
>> > +
>> > +    if ( name->pad[0] || name->pad[1] || name->pad[2] )
>> > +        return -EINVAL;
>> > +
>> > +    if ( !guest_handle_okay(name->name, name->size) )
>> > +        return -EINVAL;
>> > +
>> > +    if ( __copy_from_guest(n, name->name, name->size) )
>> > +        return -EFAULT;
>> 
>> Quoting part of my v8.1 reply:
>> "Is there a particular reason why you open code copy_from_guest() here?"
> 
> You mean why I use guest_handle_okay and __copy_from_guest instead of
> say copy_from_guest?
> 
> I think it is an artifact of earlier changes - in which the find_name
> would only check 'name->size' and then in another function we would
> just do '__copy_from_guest'. But that is not needed anymore - so let
> me change it to 'copy_from_guest'

Right, but that change didn't happen.

> I thought at some point you asked for that, as the check had already
> been done once and there was no point in repeating it.

This may well have been in some much earlier version, where the
two lived in different places. But when they're together, they
clearly should be folded back.
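
Folded back together, get_name() would make a single checked copy. The sketch below models this under stated assumptions. The names struct guest_buf, struct name_arg and copy_from_guest_model() are hypothetical stand-ins for the guest handle, xen_xsplice_name_t and copy_from_guest(). The pad type and NAME_SIZE value are also assumed. The model returns non-zero on failure, as the patch's `if ( __copy_from_guest(...) )` usage implies. One visible difference follows from the folding: an inaccessible buffer yields -EFAULT, rather than the -EINVAL that the separate guest_handle_okay() test returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_SIZE 128          /* hypothetical stand-in for XEN_XSPLICE_NAME_SIZE */

struct guest_buf {             /* hypothetical model of a guest handle */
    const char *p;             /* NULL models an unmapped buffer */
    size_t mapped;             /* number of accessible bytes */
};

struct name_arg {              /* hypothetical model of xen_xsplice_name_t */
    struct guest_buf name;
    uint16_t size;
    uint16_t pad[3];
};

/* Access check and copy in one step; returns bytes NOT copied (0 = success). */
static size_t copy_from_guest_model(char *dst, const struct guest_buf *src,
                                    size_t n)
{
    if ( !src->p || n > src->mapped )
        return n;
    memcpy(dst, src->p, n);
    return 0;
}

static int get_name(const struct name_arg *name, char *n)
{
    if ( !name->size || name->size > NAME_SIZE )
        return -EINVAL;

    if ( name->pad[0] || name->pad[1] || name->pad[2] )
        return -EINVAL;

    /* One checked copy replaces guest_handle_okay() + __copy_from_guest(). */
    if ( copy_from_guest_model(n, &name->name, name->size) )
        return -EFAULT;

    return 0;
}
```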

>> > +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
>> > +{
>> > +    struct payload *data, *found;
>> > +    char n[XEN_XSPLICE_NAME_SIZE];
>> > +    int rc;
>> > +
>> > +    rc = verify_payload(upload, n);
>> > +    if ( rc )
>> > +        return rc;
>> > +
>> > +    data = xzalloc(struct payload);
>> > +
>> > +    spin_lock(&payload_lock);
>> > +
>> > +    found = find_payload(n);
>> > +    if ( IS_ERR(found) )
>> > +    {
>> > +        rc = PTR_ERR(found);
>> > +        goto out;
>> > +    }
>> > +    else if ( found )
>> > +    {
>> > +        rc = -EEXIST;
>> > +        goto out;
>> > +    }
>> > +
>> > +    if ( !data )
>> > +    {
>> > +        rc = -ENOMEM;
>> > +        goto out;
>> > +    }
>> > +
>> > +    rc = 0;
>> 
>> rc is already zero by the time we get here.
>> 
>> I also wonder whether the code wouldn't be easier to read if you
>> used just a sequence of if()/else if() here, without any goto-s.
> 
> But I do need to free(data) and unlock the spinlock - so having
> a common code to pass through makes sense.
> 
> Unless you mean having a condition on if ( !rc ) and doing the normal path?
> Like so:
> 
>     rc = verify_payload(upload, n);
>     if ( rc )
>         return rc;
> 
>     data = xzalloc(struct payload);
> 
>     spin_lock(&payload_lock);
> 
>     found = find_payload(n);
>     if ( IS_ERR(found) )
>         rc = PTR_ERR(found);
>     else if ( found )
>         rc = -EEXIST;
> 
>     if ( !rc && !data )

This can just be "else if ( !data )" afaict.

>         rc = -ENOMEM;
> 
>     if ( !rc )

And this one then just "else".

>     {
>         memcpy(data->name, n, strlen(n));
>         data->state = XSPLICE_STATE_CHECKED;
>         INIT_LIST_HEAD(&data->list);
> 
>         list_add_tail(&data->list, &payload_list);
>         payload_cnt++;
>         payload_version++;
>     }
> 
>     spin_unlock(&payload_lock);
> 
>     if ( rc )
>         xfree(data);
> 
>     return rc;
> 
> 
> That looks fine here, but in the subsequent patch I have to also
> check for
> 
> if ( __copy_from_guest(raw_data, upload->payload, upload->size) )       

This could easily be another "else if()" in the chain outlined above.

> and
> rc = load_payload_data(data, raw_data, upload->size);

But I can see that this one would be a little less neat to integrate.

> and goto statements help a lot there.
> 
> I would rather have it the way it is now if you are OK with that?

As I have tried to express by saying "I also wonder", and as this
clearly is a matter of taste to some degree, I'm not insisting on
that alternative code flow. What I'd really like to ask for is
consistency though: While in the patch here you use

    if ( ... )
    {
        rc = ...;
        goto ...;
    }

patch 11 introduces an instance of the alternative

    rc = -E...;
    if ( ... )
        goto ...;

Similarly (see above) you should aim at consistency between
if/else-if chains or chains of just if-s, when each of them ends in an
unconditional goto (or return, continue, or break, taking a more
general perspective). Not mixing styles helps avoid (possibly silent)
questions by readers along the lines of "Is there a reason it's done
one way here and another way a few lines down?"
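
The else-if chain can be sketched as below, assuming a simplified model: the singly linked list, calloc()/free() standing in for xzalloc()/xfree(), and the omitted IS_ERR() branch of find_payload() are illustrative simplifications, and the lock calls appear only as comments. xsplice_upload_sketch() is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

#define NAME_SIZE 128   /* hypothetical */

struct payload {
    char name[NAME_SIZE];
    struct payload *next;
};

static struct payload *payload_list;   /* hypothetical singly linked list */
static unsigned int payload_cnt;

/* Hypothetical lookup: returns the matching payload or NULL. */
static struct payload *find_payload(const char *n)
{
    for ( struct payload *p = payload_list; p; p = p->next )
        if ( !strcmp(p->name, n) )
            return p;
    return NULL;
}

static int xsplice_upload_sketch(const char *n)
{
    struct payload *data = calloc(1, sizeof(*data));   /* xzalloc() stand-in */
    int rc = 0;

    /* spin_lock(&payload_lock) would go here. */
    if ( find_payload(n) )
        rc = -EEXIST;
    else if ( !data )
        rc = -ENOMEM;
    else
    {
        strncpy(data->name, n, NAME_SIZE - 1);   /* buffer is zeroed, so stays terminated */
        data->next = payload_list;
        payload_list = data;
        payload_cnt++;
    }
    /* spin_unlock(&payload_lock) would go here. */

    if ( rc )
        free(data);   /* free(NULL) is a no-op, as is xfree(NULL) */

    return rc;
}
```

Every outcome funnels through the same unlock and free, which the goto-based version achieves with a label; the chain simply makes the mutually exclusive cases explicit.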

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-27  2:38     ` Konrad Rzeszutek Wilk
@ 2016-04-27  7:12       ` Jan Beulich
  2016-04-27 13:46         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  7:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	Ian Jackson, Tim Deegan, mpohlack, Julien Grall, sasha.levin,
	xen-devel

>>> On 27.04.16 at 04:38, <konrad.wilk@oracle.com> wrote:
>>  With vm_alloc() getting removed, vm_free() should get removed
>> here too. And with that, vm_alloc_type() and vm_free_type() can
>> then just become vm_alloc() and vm_free() respectively (as static
>> internal functions).
> 
> Please take a look at this inline one:

Better, and it can have my ack, but it's still doing more changes than
really needed:

> +static void vunmap_pages(const void *va, unsigned int pages)
> +{
> +#ifndef _PAGE_NONE
> +    unsigned long addr = (unsigned long)va;
> +
> +    destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
> +#else /* Avoid tearing down intermediate page tables. */
> +    map_pages_to_xen((unsigned long)va, 0, pages, _PAGE_NONE);
> +#endif
> +    vm_free(va);
> +}

There's no real reason to break this out and move it up here - the
two callers other than vunmap() could easily continue to call
vunmap(). All the more so since you don't similarly leverage knowing
the type here already (all callers of vunmap_pages() already
know the type, and hence could pass it here).

> +void vunmap(const void *va)
> +{
> +    enum vmap_region type = VMAP_DEFAULT;

If vunmap_pages() was to stay, and was to continue to not have a
type parameter, this local variable is pointless.

> @@ -266,16 +308,32 @@ void *vzalloc(size_t size)
>      return p;
>  }
>  
> +void *vzalloc(size_t size)
> +{
> +    return vzalloc_type(size, VMAP_DEFAULT);
> +}
> +
> +void *vzalloc_xen(size_t size)
> +{
> +    return vzalloc_type(size, VMAP_XEN);
> +}

I didn't look at your replies to the later patches yet, but considering
my reply to the one using vzalloc_xen() I wonder whether in fact
you still need this flavor (and hence vzalloc_type()).

Jan


* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-27  1:59     ` Konrad Rzeszutek Wilk
@ 2016-04-27  7:27       ` Jan Beulich
  2016-04-27 14:00         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  7:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 27.04.16 at 03:59, <konrad.wilk@oracle.com> wrote:
>> > +static int xsplice_header_check(const struct xsplice_elf *elf)
>> > +{
>> > +    const Elf_Ehdr *hdr = elf->hdr;
>> > +
>> > +    if ( sizeof(*elf->hdr) > elf->len )
>> > +    {
>> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than 
> payload!\n",
>> > +                elf->name);
>> > +        return -EINVAL;
>> > +    }
>> > +
>> > +    if ( !IS_ELF(*hdr) )
>> > +    {
>> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
>> > +        return -EINVAL;
>> > +    }
>> > +
>> > +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
>> > +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
>> > +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
>> 
>> What about EI_VERSION and EI_ABIVERSION, btw?
> 
> As I did some prototype on ARM32 I realized that the EI_CLASS is wrong
> in common code (as ELFCLASS32 is what ARM32 has). And the EI_ABIVERSION
> too.

EI_CLASS I can easily see (and in fact EI_DATA would need to
move there too, now that you mention this aspect), but why
EI_ABIVERSION? Afaik there are no versions other than 0
defined for ELFOSABI_NONE (which btw we wrongly call
ELFOSABI_SYSV). That is, imo either EI_OSABI and EI_ABIVERSION
both need to move, or both should stay in common code.
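
One way to arrange these checks, sketched with the glibc <elf.h> definitions: EI_OSABI and EI_ABIVERSION stay together in common code, while EI_CLASS and EI_DATA go to a per-architecture hook. Both function names are hypothetical. The hook takes e_ident rather than a full header because the 16-byte identification array has the same layout in 32- and 64-bit ELF files.

```c
#include <elf.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical per-arch hook: an x86-64 build accepts only 64-bit little-endian. */
static bool arch_elf_ident_ok(const unsigned char *ident)
{
    return ident[EI_CLASS] == ELFCLASS64 && ident[EI_DATA] == ELFDATA2LSB;
}

/*
 * Common code: OSABI and ABIVERSION are checked together, since no ABI
 * version other than 0 is defined for ELFOSABI_NONE (a.k.a. ELFOSABI_SYSV).
 */
static bool common_elf_ident_ok(const Elf64_Ehdr *hdr)
{
    return hdr->e_ident[EI_VERSION] == EV_CURRENT &&
           hdr->e_ident[EI_OSABI] == ELFOSABI_NONE &&
           hdr->e_ident[EI_ABIVERSION] == 0 &&
           arch_elf_ident_ok(hdr->e_ident);
}
```

An ARM32 build would supply a hook accepting ELFCLASS32 instead, leaving the common part untouched.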

Jan


* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-27  4:06     ` Konrad Rzeszutek Wilk
@ 2016-04-27  7:52       ` Jan Beulich
  2016-04-27 18:45         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  7:52 UTC (permalink / raw)
  To: andrew.cooper3, ross.lagerwall, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, mpohlack, xen-devel, sasha.levin

>>> On 27.04.16 at 06:06, <konrad.wilk@oracle.com> wrote:
>> > +static int xsplice_header_check(const struct xsplice_elf *elf)
>> > +{ 
> ..snip..
>> > +    if ( elf->hdr->e_shnum > 64 )
>> > +    {
>> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
>> > +                elf->name, elf->hdr->e_shnum);
>> > +        return -EOPNOTSUPP;
>> > +    }
>> > +
>> > +    if ( elf->hdr->e_shoff > ULONG_MAX )
>> 
>> Why not ">= elf->len" (and I see it was almost that way in v8.1)?
> 
> I misunderstood your comment. You mentioned to me that we have
> a boundary check here (when it was against elf->len) and that you
> wanted an overflow check - so I replaced it, while you meant in addition to it.
> 
> But adding in both:
> 
> 	elf->hdr->e_shoff >= ULONG_MAX || elf->hdr->e_shoff >= elf->len
> 
> feels unnecessary. And the boundary check is more important.
> I added both in the code.

And indeed the latter being more strict than the former, the former
should be dropped.

> v10:
>   - Change the check against 64 to be against SHN_LORESERVE

So we're moving between the extremes, and (as said in reply to v9)
I think we really want to be somewhere in the middle.

Andrew? Ross?

> +static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
> +{
> +    struct xsplice_elf_sec *sec;
> +    unsigned int i;
> +    Elf_Off delta;
> +    int rc;
> +
> +    /* xsplice_elf_load sanity checked e_shnum. */
> +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
> +    if ( !sec )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE"%s: Could not allocate memory for section table!\n",
> +               elf->name);
> +        return -ENOMEM;
> +    }
> +
> +    elf->sec = sec;
> +
> +    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
> +    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
> +    if ( delta > elf->len )

You've added the suggested (transformation of the expression above)
check there, so the check here is now redundant and hence could be
dropped, or simply be converted to an ASSERT().

> +static int elf_resolve_section_names(struct xsplice_elf *elf, const void *data)
> +{
> +    const char *shstrtab;
> +    unsigned int i;
> +    Elf_Off offset, delta;
> +    struct xsplice_elf_sec *sec;
> +    int rc;
> +
> +    /*
> +     * The elf->sec[0 -> e_shnum] structures have been verified by
> +     * elf_resolve_sections. Find file offset for section string table
> +     * (normally called .shstrtab)
> +     */
> +    sec = &elf->sec[elf->hdr->e_shstrndx];
> +
> +    rc = elf_verify_strtab(sec);
> +    if ( rc )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section string table is corrupted\n",
> +                elf->name);
> +        return rc;
> +    }
> +
> +    /* Verified in elf_resolve_sections but just in case. */
> +    offset = sec->sec->sh_offset;
> +    ASSERT(offset < elf->len && (offset + sec->sec->sh_size <= elf->len));
> +
> +    shstrtab = data + offset;
> +
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        delta = elf->sec[i].sec->sh_name;
> +
> +        /* Boundary check on offset of name within the .shstrtab. */
> +        if ( delta >= sec->sec->sh_size )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: shstrtab [%u] data is past end of payload!\n",

You've fixed the message text in elf_get_sym() but not here.

> +static int elf_get_sym(struct xsplice_elf *elf, const void *data)
> +{
> +    const struct xsplice_elf_sec *symtab_sec, *strtab_sec;
> +    struct xsplice_elf_sym *sym;
> +    unsigned int i, delta, offset, nsym;
> +
> +    symtab_sec = elf->symtab;
> +    strtab_sec = elf->strtab;
> +
> +    /* Pointers arithmetic to get file offset. */
> +    offset = strtab_sec->data - data;
> +
> +    /* Checked already in elf_resolve_sections, but just in case. */
> +    ASSERT(offset == strtab_sec->sec->sh_offset);

Considering the different types of the expressions on both sides of
the ==, wouldn't it be better for offset to be of Elf_Off type?

> +    ASSERT(offset < elf->len && (offset + strtab_sec->sec->sh_size <= elf->len));
> +
> +    /* symtab_sec->data was computed in elf_resolve_sections. */
> +    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data);
> +
> +    /* No need to check values as elf_resolve_sections did it. */
> +    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
> +
> +    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
> +    if ( !sym )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for symbols\n",
> +               elf->name);
> +        return -ENOMEM;
> +    }
> +
> +    /* So we don't leak memory. */
> +    elf->sym = sym;
> +
> +    for ( i = 1; i < nsym; i++ )
> +    {
> +        const Elf_Sym *s = symtab_sec->data + symtab_sec->sec->sh_entsize * i;
> +
> +        delta = s->st_name;

And similarly here, for delta to be Elf_Word? Both more along the
lines of what elf_resolve_section_names() has...

> +static int xsplice_header_check(const struct xsplice_elf *elf)
> +{
> +    const Elf_Ehdr *hdr = elf->hdr;
> +
> +    if ( sizeof(*elf->hdr) > elf->len )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    if ( !IS_ELF(*hdr) )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
> +        return -EINVAL;
> +    }
> +
> +    /* EI_CLASS, and e_flags are platform specific. */
> +    if ( hdr->e_version != EV_CURRENT ||
> +         hdr->e_ident[EI_VERSION] != EV_CURRENT ||
> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||

As said, this also needs to become arch-specific.

> +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> +         hdr->e_type != ET_REL ||
> +         hdr->e_phnum != 0 )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    /* Check that section name index is within the sections. */
> +    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )

Since this uses e_shnum as a boundary, it would seem more logical
for this to be done after the e_shnum check itself.

> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
> +                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
> +        return -EINVAL;
> +    }
> +
> +    if ( elf->hdr->e_shnum >= SHN_LORESERVE )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",

The message text is now stale (but may become correct again if the
conditional gets changed again).

> +                elf->name, elf->hdr->e_shnum);
> +        return -EOPNOTSUPP;
> +    }
> +
> +    if ( elf->hdr->e_shoff >= elf->len || elf->hdr->e_shoff >= ULONG_MAX )

As said - the right side of the || is weaker than the left side, and
hence should be dropped.
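
A sketch of the header checks in the order suggested above, using glibc <elf.h> types: e_shnum is bounded before e_shstrndx is compared against it, only the "e_shoff >= len" half of the offset check is kept, and the section-table bound is written as a subtraction so nothing can wrap. MAX_SECTIONS is a placeholder, since the thread had not settled on a limit, and header_bounds_check() is a hypothetical name.

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <assert.h>

#define MAX_SECTIONS 64   /* placeholder only: the right limit was still under discussion */

static int header_bounds_check(const Elf64_Ehdr *hdr, size_t len)
{
    /* Bound e_shnum first, since the e_shstrndx check relies on it. */
    if ( hdr->e_shnum > MAX_SECTIONS )
        return -EOPNOTSUPP;

    if ( hdr->e_shstrndx == SHN_UNDEF || hdr->e_shstrndx >= hdr->e_shnum )
        return -EINVAL;

    /* Strictly stronger than e_shoff >= ULONG_MAX, so only this is kept. */
    if ( hdr->e_shoff >= len )
        return -EINVAL;

    /*
     * The section header table must fit in the payload. len - e_shoff
     * cannot underflow (e_shoff < len), and the product of two 16-bit
     * fields cannot overflow, so no sum ever wraps.
     */
    if ( (size_t)hdr->e_shnum * hdr->e_shentsize > len - hdr->e_shoff )
        return -EINVAL;

    return 0;
}
```

With this done up front, the equivalent test in elf_resolve_sections() holds by construction and could become an ASSERT(), as suggested.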

Jan

* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27  1:47     ` Konrad Rzeszutek Wilk
@ 2016-04-27  7:57       ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  7:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

>>> On 27.04.16 at 03:47, <konrad.wilk@oracle.com> wrote:
>> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
>> > +{
> .. snip..
>> > +    /* Compute size of different regions. */
>> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
>> > +    {
>> > +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
>> > +             (SHF_ALLOC|SHF_EXECINSTR) )
>> > +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
>> 
>> This silently accepts writable text sections, yet the portion of the
>> memory this gets placed in will be mapped RX.
> 
> I am not sure I follow. We only accept if sh_flags have AX. Not WAX?
> How am I accepting writable text sections?

Because the & masks off SHF_WRITE, i.e. you only check that
SHF_ALLOC and SHF_EXECINSTR are set, but not that SHF_WRITE
is clear.

>> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>> > +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
>> > +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
>> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>> > +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
>> > +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
>> > +        else if ( !elf->sec[i].sec->sh_flags ||
>> > +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
>> > +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
>> > +            /* Do nothing.*/;
>> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>> > +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
>> > +        {
>> > +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
>> > +                    elf->name, elf->sec[i].name);
>> > +            rc = -EOPNOTSUPP;
>> > +            goto out;
>> > +        }
>> 
>> I saw this in the changelog, but I don't really understand these last
>> two conditionals. Wouldn't you want to bail on _any_ sections which
> 
> The first (/Do nothing/) is for sections such as .rela.* (which we can
> ditch after we are done), .symtab, .strtab (for which later patches
> construct a copy in build_symbol_table), and:
> 
> [ 1] .note.gnu.build-i NOTE 0000000000000000  00000040
>        0000000000000024  0000000000000000   A       0     0     4
> 
> whose value we just copy into struct payload->id
> (also in a later patch).

All of which would fall under the "ignore sections with SHF_ALLOC
clear" rule, as mentioned ...

>> have SHF_ALLOC set but don't get mapped to one of the three
>> blocks? And wouldn't you (silently) ignore any sections with SHF_ALLOC
>> clear?

... here.
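
A sketch of the classification rule discussed here, assuming glibc <elf.h> flag values and a hypothetical classify_section() helper: sections without SHF_ALLOC are ignored, SHF_WRITE is included in the mask so a writable text section cannot land in the RX region, and the one combination that fits none of the three regions (writable and executable) is rejected. The patch's SHT_NOBITS and SHF_MASKPROC handling is left out.

```c
#include <elf.h>
#include <assert.h>

enum region { REGION_TEXT, REGION_RW, REGION_RO, REGION_IGNORE, REGION_REJECT };

static enum region classify_section(const Elf64_Shdr *sh)
{
    /* Non-allocated sections (.rela.*, .symtab, .strtab, .comment, ...) are never mapped. */
    if ( !(sh->sh_flags & SHF_ALLOC) )
        return REGION_IGNORE;

    /* Masking SHF_WRITE too means "AX" is required, not merely "AX possibly plus W". */
    switch ( sh->sh_flags & (SHF_WRITE | SHF_EXECINSTR) )
    {
    case SHF_EXECINSTR:
        return REGION_TEXT;
    case SHF_WRITE:
        return REGION_RW;
    case 0:
        return REGION_RO;
    default:              /* SHF_WRITE | SHF_EXECINSTR: no W+X region exists */
        return REGION_REJECT;
    }
}
```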

>> > +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
>> > +{
>> > +    unsigned int i;
>> > +    int rc = 0;
>> > +
>> > +    ASSERT(elf->sym);
>> > +
>> > +    for ( i = 1; i < elf->nsym; i++ )
>> > +    {
>> > +        unsigned int idx = elf->sym[i].sym->st_shndx;
>> > +        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;
>> 
>> Well, I admit that this is the more straightforward solution, but it
>> opens up all of what sym points to for writing. I.e. I'd have
>> considered it much better to really only do the casting away of
>> const in the one spot where you need it (see below).
> 
> OK. That may become a bit cumbersome. We would have in the later
> patches (xsplice,symbols: Implement symbol name resolution on addres)
> the SHN_UNDEF doing symbol lookup. And that one tries to set
> sym->st_value twice.
> 
> I can certainly cast it twice there, and then once in the default
> case if you would like.

How about calculating the new value into a local variable, and doing
the cast needed for the assignment just once after the switch()?
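
A sketch of that suggestion, with a hypothetical resolve_symbol() and a sec_base array standing in for the per-section load addresses: the new value is computed in a local, and const is cast away exactly once, after the switch(). The cast is only valid because the symbol table being updated lives in writable memory.

```c
#include <elf.h>
#include <errno.h>
#include <assert.h>

static int resolve_symbol(const Elf64_Sym *sym, const Elf64_Addr *sec_base,
                          unsigned int shnum)
{
    unsigned int idx = sym->st_shndx;
    Elf64_Addr st_value = sym->st_value;   /* new value is computed here */

    switch ( idx )
    {
    case SHN_COMMON:
        return -EINVAL;
    case SHN_UNDEF:
        return -ENOENT;    /* a later patch would look the symbol up here */
    case SHN_ABS:
        break;             /* absolute: value is already final */
    default:
        if ( idx >= SHN_LORESERVE )
            return -EOPNOTSUPP;
        if ( idx >= shnum )
            return -EINVAL;
        st_value += sec_base[idx];
        break;
    }

    /* The only place const is cast away. */
    ((Elf64_Sym *)sym)->st_value = st_value;
    return 0;
}
```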

Jan


* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27  3:28     ` Konrad Rzeszutek Wilk
@ 2016-04-27  8:28       ` Jan Beulich
  2016-04-27 15:48         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  8:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

>>> On 27.04.16 at 05:28, <konrad.wilk@oracle.com> wrote:
>> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
>> > +{
> ..snip..
>> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
>> > +    {
>> > +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
>> > +        {
>> > +            uint8_t *buf;
>> 
>> Perhaps void * again? And missing a blank line afterwards.
>> 
>> > +            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
>> > +                buf = text_buf;
>> > +            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
>> > +                buf = rw_buf;
>> > +             else
>> 
>> The indentation here is still one off.
> 
> I am not seeing it. I deleted the line and added it back using
> spaces just in case. But I really don't see the indentation issue
> you are seeing?

Just count the number of blanks - I count 12 ahead of the "else if"
but 13 ahead of the bare "else". And it's still the same in the
updated patch below. Checking what the list archives say (in case
this is an artifact of my mail client) ... Same there (and even more
easily visible in the browser).

> +int arch_xsplice_perform_rela(struct xsplice_elf *elf,
> +                              const struct xsplice_elf_sec *base,
> +                              const struct xsplice_elf_sec *rela)
> +{
> +    const Elf_RelA *r;
> +    unsigned int symndx, i;
> +    uint64_t val;
> +    uint8_t *dest;
> +
> +    /* Nothing to do. */
> +    if ( !rela->sec->sh_size )
> +        return 0;
> +
> +    if ( rela->sec->sh_entsize < sizeof(Elf_RelA) ||
> +         rela->sec->sh_size % rela->sec->sh_entsize )
> +    {
> +        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
> +                elf->name);
> +        return -EINVAL;
> +    }
> +
> +    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
> +    {
> +        r = rela->data + i * rela->sec->sh_entsize;
> +
> +        symndx = ELF64_R_SYM(r->r_info);
> +
> +        if ( symndx > elf->nsym )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
> +                    elf->name, symndx);
> +            return -EINVAL;
> +        }
> +
> +        if ( r->r_offset >= base->sec->sh_size )
> +            goto bad_offset;

There's one more thing to consider here, which I only notice now:
For "NONE" relocations this check should not be done. Since these

> +        dest = base->load_addr + r->r_offset;
> +        val = r->r_addend + elf->sym[symndx].sym->st_value;

don't touch the possibly out of bounds destination yet, I think
the above should just be dropped here in favor of doing things
below in the individual case statements. But to deal with overflow,
the check above would need to be moved there, i.e. not dropped
entirely.
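
A sketch of moving the bounds check into the per-type cases, assuming x86-64 relocation types from glibc <elf.h>; apply_rela() and in_bounds() are hypothetical and cover only three types. Each case checks the full width it writes without ever computing r_offset + width, and R_X86_64_NONE never looks at r_offset.

```c
#include <elf.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* True if [off, off + width) fits in 'size' bytes; never computes off + width. */
static int in_bounds(uint64_t off, uint64_t width, uint64_t size)
{
    return off <= size && size - off >= width;
}

static int apply_rela(uint8_t *base, uint64_t base_size,
                      const Elf64_Rela *r, uint64_t sym_value)
{
    uint64_t val = sym_value + (uint64_t)r->r_addend;   /* S + A */

    switch ( ELF64_R_TYPE(r->r_info) )
    {
    case R_X86_64_NONE:
        break;                       /* writes nothing: r_offset is irrelevant */

    case R_X86_64_64:
        if ( !in_bounds(r->r_offset, sizeof(val), base_size) )
            return -EINVAL;
        memcpy(base + r->r_offset, &val, sizeof(val));
        break;

    case R_X86_64_32:
    {
        uint32_t val32 = (uint32_t)val;

        if ( !in_bounds(r->r_offset, sizeof(val32), base_size) )
            return -EINVAL;
        if ( val > UINT32_MAX )      /* R_X86_64_32 is zero-extended */
            return -EOVERFLOW;
        memcpy(base + r->r_offset, &val32, sizeof(val32));
        break;
    }

    default:
        return -EOPNOTSUPP;
    }

    return 0;
}
```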

> +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    void *text_buf, *ro_buf, *rw_buf;
> +    unsigned int i;
> +    size_t size = 0;
> +    unsigned int *offset;
> +    int rc = 0;
> +
> +    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
> +    if ( !offset )
> +        return -ENOMEM;
> +
> +    /* Compute size of different regions. */
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> +             (SHF_ALLOC|SHF_EXECINSTR) )
> +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> +        else if ( !elf->sec[i].sec->sh_flags ||
> +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> +            /*
> +             * Do nothing. These are .rel.text, rel.*, .symtab, .strtab,
> +             * and .shstrtab. For the non-relocate we allocate and copy these
> +             * via other means - and the .rel we can ignore as we only use it
> +             * once during loading.
> +             */
> +            offset[i] = UINT_MAX;
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            rc = -EOPNOTSUPP;
> +            goto out;
> +        }
> +        else /* Such as .comment, or .debug_str. */
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            offset[i] = UINT_MAX;
> +        }

See earlier reply regarding this entire loop body.

> +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
> +{
> +    unsigned int i;
> +    int rc = 0;
> +
> +    ASSERT(elf->sym);
> +
> +    for ( i = 1; i < elf->nsym; i++ )
> +    {
> +        unsigned int idx = elf->sym[i].sym->st_shndx;
> +        Elf_Sym *sym = (Elf_Sym *)elf->sym[i].sym;

Again, see earlier reply.

> +        switch ( idx )
> +        {
> +        case SHN_COMMON:
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
> +                    elf->name, elf->sym[i].name);
> +            rc = -EINVAL;
> +            break;
> +
> +        case SHN_UNDEF:
> +            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
> +                    elf->name, elf->sym[i].name);
> +            rc = -ENOENT;
> +            break;
> +
> +        case SHN_ABS:
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
> +                    elf->name, elf->sym[i].name, sym->st_value);
> +            break;
> +
> +        default:
> +            /* SHN_COMMON and SHN_ABS are above. */
> +            if ( idx >= SHN_LORESERVE )
> +                rc = -EOPNOTSUPP;
> +            else if ( idx >= elf->hdr->e_shnum )
> +                rc = -EINVAL;
> +
> +            if ( rc )
> +            {
> +                dprintk(XENLOG_ERR, XSPLICE "%s: Unknown type=%#"PRIx16"\n",

"Out of bounds symbol section"?

Also just %#x now that idx is "unsigned int".

Jan

* Re: [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-04-27  3:39     ` Konrad Rzeszutek Wilk
@ 2016-04-27  8:36       ` Jan Beulich
  2016-05-11  9:51       ` Martin Pohlack
  1 sibling, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  8:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser,
	Suravee Suthikulpanit, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Jun Nakajima, sasha.levin, xen-devel,
	Boris Ostrovsky

>>> On 27.04.16 at 05:39, <konrad.wilk@oracle.com> wrote:
> +static int check_special_sections(const struct xsplice_elf *elf)
> +{
> +    unsigned int i;
> +    static const char *const names[] = { ELF_XSPLICE_FUNC };
> +    DECLARE_BITMAP(count, ARRAY_SIZE(names)) = { 0 };

Perhaps better "seen" or "found" or some such, now that this
isn't an array of counters anymore?

> +    for ( i = 0; i < ARRAY_SIZE(names); i++ )
> +    {
> +        const struct xsplice_elf_sec *sec;
> +
> +        sec = xsplice_elf_sec_by_name(elf, names[i]);
> +        if ( !sec )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s is missing!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }
> +
> +        if ( !sec->sec->sh_size )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s is empty!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }
> +
> +        if ( test_bit(i, count) )
> +        {
> +            dprintk(XENLOG_ERR, XSPLICE "%s: %s was seen more than once!\n",
> +                    elf->name, names[i]);
> +            return -EINVAL;
> +        }
> +
> +        __set_bit(i, count);

__test_and_set_bit() to fold the two?
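
A sketch of the fold, with test_and_set_bit_sketch() as a non-atomic stand-in for __test_and_set_bit() (it returns the previous value of the bit, then sets it) and the bitmap renamed to "seen" as suggested; mark_seen() is a hypothetical helper.

```c
#include <limits.h>
#include <stdbool.h>
#include <errno.h>
#include <assert.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in: returns the old bit value and sets the bit. */
static bool test_and_set_bit_sketch(unsigned int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    unsigned long *word = addr + nr / BITS_PER_LONG;
    bool old = *word & mask;

    *word |= mask;
    return old;
}

/* Replaces "if ( test_bit(i, seen) ) fail; __set_bit(i, seen);" with one call. */
static int mark_seen(unsigned int i, unsigned long *seen)
{
    if ( test_and_set_bit_sketch(i, seen) )
        return -EINVAL;   /* seen more than once */
    return 0;
}
```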

In any event, feel free to add my ack.

Jan


* Re: [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-04-27  3:31           ` Konrad Rzeszutek Wilk
@ 2016-04-27  8:37             ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  8:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ross Lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 27.04.16 at 05:31, <konrad.wilk@oracle.com> wrote:
> From affca85da4d57c466cc3a603afa4d57fea7ed092 Mon Sep 17 00:00:00 2001
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Fri, 22 Apr 2016 11:16:36 -0400
> Subject: [PATCH] x86, xsplice: Print payload's symbol name and payload name in
>  backtraces
> 
> Naturally the backtrace is presented when an instruction
> hits a bug_frame or %p is used.
> 
> The payloads do not support bug_frames yet - however the functions
> the payloads call could hit a BUG() or WARN().
> 
> traps.c has logic to scan for this - and eventually it will
> find the correct bug_frame and then walk the stack using %p to print
> the backtrace. For %p and symbols to print a string,
> 'is_active_kernel_text' is consulted, which uses a 'struct virtual_region'.
> 
> Therefore we register our start->end addresses so that
> 'is_active_kernel_text' will include our payload address.
> 
> We also register our symbol lookup table function so that it can
> scan the list of payloads and retrieve the correct name.
> 
> Lastly we change vsprintf to take into account s and namebuf.
> For core code they are the same, but for payloads they are different.
> This gets us:
> 
> Xen call trace:
>    [<ffff82d080a00041>] revert_hook+0x31/0x35 [xen_hello_world]
>    [<ffff82d0801431bd>] xsplice.c#revert_payload+0x86/0xc6
>    [<ffff82d080143502>] check_for_xsplice_work+0x233/0x3cd
>    [<ffff82d08017a0b2>] domain.c#continue_idle_domain+0x9/0x1f
> 
> Which is great if payloads have similar or same symbol names.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
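
The mechanism described in the commit message can be modelled roughly as follows. struct region, find_region(), format_symbol() and toy_lookup() are hypothetical simplifications, not the actual struct virtual_region API: each registered range carries its own symbol-lookup callback and an optional payload name, which is appended in brackets as in the trace above (the /size part is omitted).

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

struct region;
typedef const char *symbols_lookup_t(const struct region *r, unsigned long addr,
                                     unsigned long *offset);

struct region {
    unsigned long start, end;     /* [start, end) */
    const char *payload;          /* NULL for the core hypervisor image */
    symbols_lookup_t *lookup;     /* per-region symbol table lookup */
};

/* Toy lookup for illustration: treats each region as a single symbol. */
static const char *toy_lookup(const struct region *r, unsigned long addr,
                              unsigned long *offset)
{
    *offset = addr - r->start;
    return r->payload ? "revert_hook" : "core_func";
}

static const struct region *find_region(const struct region *r, size_t n,
                                        unsigned long addr)
{
    for ( size_t i = 0; i < n; i++ )
        if ( addr >= r[i].start && addr < r[i].end )
            return &r[i];
    return NULL;
}

static int format_symbol(char *buf, size_t len, const struct region *r,
                         size_t n, unsigned long addr)
{
    const struct region *reg = find_region(r, n, addr);
    unsigned long off;
    const char *name;

    if ( !reg )                   /* not active text: print the raw address */
        return snprintf(buf, len, "%#lx", addr);

    name = reg->lookup(reg, addr, &off);
    if ( reg->payload )
        return snprintf(buf, len, "%s+%#lx [%s]", name, off, reg->payload);
    return snprintf(buf, len, "%s+%#lx", name, off);
}
```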


* Re: [PATCH v9 19/27] xsplice: Add support for alternatives
  2016-04-25 15:35 ` [PATCH v9 19/27] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
@ 2016-04-27  8:58   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  8:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Add support for applying alternative sections within an xsplice payload.
> At payload load time, apply any alternative sections that are found.
> 
> Also we add a test case exercising a rather useless alternative
> (patching a NOP with a NOP) - but it does exercise the code-path.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [PATCH v9 24/27] xsplice: Stacking build-id dependency checking.
  2016-04-25 15:35 ` [PATCH v9 24/27] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
@ 2016-04-27  9:27   ` Jan Beulich
  2016-04-27 16:36     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  9:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> @@ -25,7 +28,7 @@ clean::
>  .PHONY: config.h
>  config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
>  config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
> -config.h: xen_hello_world_func.o
> +config.h: xen_hello_world_func.o xen_bye_world_func.o

Why is that?

> @@ -33,9 +36,43 @@ config.h: xen_hello_world_func.o
>  xen_hello_world.o: xen_hello_world_func.o
>  
>  .PHONY: $(XSPLICE)
> -$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
> -	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o \
> -		xen_hello_world.o
> +$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o note.o
> +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
> +		xen_hello_world_func.o xen_hello_world.o note.o

Probably easier to read and maintain if you used $(filter %.o,$^)
here?

> +xen_bye_world.o: xen_bye_world_func.o

Again - why?

> +.PHONY: $(XSPLICE_BYE)
> +$(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_world_note.o

The object files depend on config.h, but the binary does only
indirectly via the object files I would guess. (This, just like the
question right above, would then apply to the $(XSPLICE) related
rules too, in an earlier patch.)

> +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
> +		xen_bye_world_func.o xen_bye_world.o hello_world_note.o

Same as above - better use $^ (and if config.h goes away as a
direct dependency, it looks like you don't even need $(filter ...)).

> +int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
> +                       const void **p, unsigned int *len)
> +{
> +    /* Check if we really have a build-id. */
> +    if ( NT_GNU_BUILD_ID != n->type )
> +        return -ENODATA;
> +
> +    if ( n_sz <= sizeof(*n) )
> +        return -EINVAL;
> +
> +    if ( n->namesz + n->descsz > UINT_MAX )

Afaict this is always false. I think you really want

    if ( n->namesz + n->descsz < n->namesz )

> +        return -EINVAL;
> +
> +    if ( n->namesz != 4 /* GNU\0 */)

< 4 would suffice here (and be more flexible if odd padding gets
inserted by whatever generates the note)

> +        return -EINVAL;
> +
> +    if ( n->namesz + n->descsz + sizeof(*n) > n_sz )

    if ( n->namesz + n->descsz > n_sz - sizeof(*n) )
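Jan's suggestions (detect unsigned wraparound by comparing the sum against one operand, accept namesz >= 4 rather than exactly 4, and subtract the header size from n_sz instead of adding it to the untrusted sizes) combine into the standalone sketch below. 'note_hdr' and check_build_id_note() are hypothetical stand-ins for Elf_Note and xen_build_id_check(); like the quoted checks, the sketch ignores the 4-byte padding that real ELF notes apply to the name and descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified ELF note header (same field layout as Elf32_Nhdr/Elf64_Nhdr). */
typedef struct {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
} note_hdr;

#define NT_GNU_BUILD_ID 3

static int check_build_id_note(const note_hdr *n, unsigned int n_sz)
{
    /* Check if we really have a build-id. */
    if ( n->type != NT_GNU_BUILD_ID )
        return -ENODATA;

    /* Need more than just the header. */
    if ( n_sz <= sizeof(*n) )
        return -EINVAL;

    /* Unsigned addition wrapped around: the sum ended up smaller. */
    if ( n->namesz + n->descsz < n->namesz )
        return -EINVAL;

    /* "GNU\0" needs at least 4 bytes; '<' tolerates extra padding. */
    if ( n->namesz < 4 )
        return -EINVAL;

    /* n_sz > sizeof(*n) is known here, so this subtraction cannot wrap. */
    if ( n->namesz + n->descsz > n_sz - sizeof(*n) )
        return -EINVAL;

    return 0;
}
```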

> @@ -98,18 +130,9 @@ static int __init xen_build_init(void)
>      if ( &n[1] > __note_gnu_build_id_end )
>          return -ENODATA;;
>  
> -    /* Check if we really have a build-id. */
> -    if ( NT_GNU_BUILD_ID != n->type )
> -        return -ENODATA;
> +    sz = (size_t)__note_gnu_build_id_end - (size_t)n;

So let's hope sizeof(void *) == sizeof(size_t) (or else this would yield
compiler warnings).
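One way to compute the byte distance without assuming sizeof(void *) == sizeof(size_t) is to subtract 'const char *' pointers, which yields a ptrdiff_t. A minimal sketch; bytes_between() is a hypothetical helper, and whether the final patch took this route is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte distance between two pointers into the same object, computed by
 * char-pointer subtraction instead of casting each pointer to size_t.
 */
static size_t bytes_between(const void *start, const void *end)
{
    const char *s = start, *e = end;

    assert(e >= s);           /* caller guarantees end is not before start */
    return (size_t)(e - s);   /* ptrdiff_t, known non-negative here */
}
```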

Jan


* Re: [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded.
  2016-04-25 15:35 ` [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
@ 2016-04-27  9:31   ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27  9:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Tim Deegan <tim@xen.org>
> 
> v6: Drop recursive lock - also now the caller is holding the lock
>     Move the code up in the code above.
> v7: Add Andrew's Reviewed-by
> v9: Add const on struct payload.
>     Check data->id.len != payload->id.len in the loop
> ---
> ---
>  xen/common/xsplice.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index a8b208d..b5e2135 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -520,6 +520,8 @@ static int prepare_payload(struct payload *payload,
>      sec = xsplice_elf_sec_by_name(elf, ELF_BUILD_ID_NOTE);
>      if ( sec )
>      {
> +        const struct payload *data;
> +
>          n = sec->load_addr;
>  
>          if ( sec->sec->sh_size <= sizeof(*n) )
> @@ -531,6 +533,20 @@ static int prepare_payload(struct payload *payload,
>  
>          if ( !payload->id.len || !payload->id.p )
>              return -EINVAL;
> +
> +        /* Make sure it is not a duplicate. */
> +        list_for_each_entry ( data, &payload_list, list )
> +        {
> +            /* No way _this_ payload is on the list. */
> +            ASSERT(data != payload);
> +            if ( data->id.len != payload->id.len ||

DYM

            if ( data->id.len == payload->id.len &&

? (I'm sorry for having suggested it the wrong way round in the reply
to v8.1.)

With that fixed
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
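With the comparison flipped as Jan suggests, a payload counts as a duplicate only when the lengths match and the bytes match. Checking the length first also keeps memcmp() within the shorter buffer. This standalone sketch scans an array instead of 'payload_list'; 'struct build_id' and is_duplicate() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a payload's build-id descriptor. */
struct build_id {
    const unsigned char *p;
    unsigned int len;
};

/* True if any already-loaded id has the same length and the same bytes. */
static bool is_duplicate(const struct build_id *loaded, size_t nr,
                         const struct build_id *id)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( loaded[i].len == id->len &&
             !memcmp(loaded[i].p, id->p, id->len) )
            return true;
    return false;
}
```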


* Re: [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-27  7:12       ` Jan Beulich
@ 2016-04-27 13:46         ` Konrad Rzeszutek Wilk
  2016-04-27 14:15           ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 13:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, ross.lagerwall, Julien Grall, sasha.levin,
	xen-devel

On Wed, Apr 27, 2016 at 01:12:24AM -0600, Jan Beulich wrote:
> >>> On 27.04.16 at 04:38, <konrad.wilk@oracle.com> wrote:
> >>  With vm_alloc() getting removed, vm_free() should get removed
> >> here too. And with that, vm_alloc_type() and vm_free_type() can
> >> then just become vm_alloc() and vm_free() respectively (as static
> >> internal functions).
> > 
> > Please take a look at this inline one:
> 
> Better, and it can have my ack, but it's still doing more changes than
> really needed:
> 
> > +static void vunmap_pages(const void *va, unsigned int pages)
> > +{
> > +#ifndef _PAGE_NONE
> > +    unsigned long addr = (unsigned long)va;
> > +
> > +    destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
> > +#else /* Avoid tearing down intermediate page tables. */
> > +    map_pages_to_xen((unsigned long)va, 0, pages, _PAGE_NONE);
> > +#endif
> > +    vm_free(va);
> > +}
> 
> There's no real reason to break this out and move up here - the
> two callers other than vunmap() could easily continue to call
> vunmap(). The more that you do not similarly leverage knowing
> the type here already (all callers of vunmap_pages() already
> know the type, and hence could pass it here).

/me nods.

> 
> > +void vunmap(const void *va)
> > +{
> > +    enum vmap_region type = VMAP_DEFAULT;
> 
> If vunmap_pages() was to stay, and was to continue to not have a
> type parameter, this local variable is pointless.
> 
> > @@ -266,16 +308,32 @@ void *vzalloc(size_t size)
> >      return p;
> >  }
> >  
> > +void *vzalloc(size_t size)
> > +{
> > +    return vzalloc_type(size, VMAP_DEFAULT);
> > +}
> > +
> > +void *vzalloc_xen(size_t size)
> > +{
> > +    return vzalloc_type(size, VMAP_XEN);
> > +}
> 
> I didn't look at your replies to the later patches yet, but considering
> my reply to the one using vzalloc_xen() I wonder whether in fact
> you still need this flavor (and hence vzalloc_type()).

/me nods.

Then this should be perfect:
From cef95bc0682f94ca5e61609211c4787491212acf Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 26 Apr 2016 14:03:06 -0400
Subject: [PATCH] arm/x86/vmap: Add vmalloc_xen and vm_init_type

For those users who want to use the virtual addresses that
are in the hypervisor's code/data region address space,
the new vm_init_type and vmalloc_xen functions allow that.

Implementation-wise, the vmap API now keeps track of two virtual
address regions:
 a) VMAP_VIRT_START
 b) Any provided virtual address space (need start and end).

Region a) is the default one, and the behavior for existing
users of vmalloc, vmap, etc. is unchanged.

If, however, one wishes to use b), one only has to call
vm_init_type to initialize it and vmalloc_xen to allocate from
it (vfree and vunmap are capable of searching both address spaces).
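The two-region bookkeeping described above can be modelled in a few lines: each region has its own base address and page map, allocation names the region explicitly, and freeing works out the region from the address alone. This is a toy sketch: toy_vm_init_type(), toy_vm_alloc(), toy_region_of() and toy_vm_free() are hypothetical, the region size is arbitrary, and unlike the patch it passes the page count to the free routine (instead of recovering it from the bitmap) and has no guard pages or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    4096UL
#define REGION_PAGES 64      /* toy size; real regions are far larger */

enum vmap_region { VMAP_DEFAULT, VMAP_XEN, VMAP_REGION_NR };

/* Per-region base address and one "in use" flag per page. */
static uintptr_t base[VMAP_REGION_NR];
static bool used[VMAP_REGION_NR][REGION_PAGES];

static void toy_vm_init_type(enum vmap_region t, uintptr_t start)
{
    assert(!base[t]);        /* each region is initialised only once */
    base[t] = start;
}

/* First-fit allocation of nr pages from region t; returns 0 on failure. */
static uintptr_t toy_vm_alloc(unsigned int nr, enum vmap_region t)
{
    if ( !base[t] || nr == 0 || nr > REGION_PAGES )
        return 0;            /* e.g. region b) never initialised */

    for ( unsigned int s = 0; s + nr <= REGION_PAGES; s++ )
    {
        unsigned int i;

        for ( i = 0; i < nr && !used[t][s + i]; i++ )
            ;
        if ( i == nr )       /* found nr consecutive free pages */
        {
            for ( i = 0; i < nr; i++ )
                used[t][s + i] = true;
            return base[t] + s * PAGE_SIZE;
        }
    }
    return 0;
}

/* Which initialised region contains va, or -1 if none does. */
static int toy_region_of(uintptr_t va)
{
    for ( int t = 0; t < VMAP_REGION_NR; t++ )
        if ( base[t] && va >= base[t] &&
             va < base[t] + REGION_PAGES * PAGE_SIZE )
            return t;
    return -1;
}

/* Free nr pages at va, locating the owning region from the address. */
static void toy_vm_free(uintptr_t va, unsigned int nr)
{
    int t = toy_region_of(va);
    unsigned int s;

    assert(t >= 0);
    s = (va - base[t]) / PAGE_SIZE;
    for ( unsigned int i = 0; i < nr && s + i < REGION_PAGES; i++ )
        used[t][s + i] = false;
}
```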

This allows users (such as xSplice) to provide their own
mechanism to change the the page flags, and also use virtual
addresses closer to the hypervisor virtual addresses (at least
on x86) while not having to deal with the allocation of
pages.

For an example of such a user, see the patch titled "xsplice: Implement
payload loading", where we parse the payload's ELF relocations, which
are defined to be signed 32-bit on x86 (hence the maximum displacement
is 2GB of virtual space; ARM32 is 128MB). The displacement between the
hypervisor virtual addresses and the vmalloc area (on x86)
is more than 32 bits, which means that ELF relocations would
truncate the 33rd and 34th bits. Hence this alternate API.
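The 2GB limit follows from the relocation field being a signed 32-bit value: a PC-relative relocation such as x86-64 R_X86_64_PC32 stores S + A - P, which must lie within [INT32_MIN, INT32_MAX]. A minimal check as a sketch; fits_rel32() is a hypothetical helper, and it assumes the usual two's-complement conversion of the 64-bit difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Would S + A - P (target + addend - place) fit in a signed 32-bit
 * relocation field? Unsigned subtraction avoids signed overflow; the
 * result is then reinterpreted as a signed 64-bit displacement.
 */
static bool fits_rel32(uint64_t place, uint64_t target, int64_t addend)
{
    int64_t disp = (int64_t)(target - place) + addend;

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

If the check fails for a payload mapped far from the hypervisor image, the relocation cannot be applied without truncation, which is the situation the commit message describes.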

We also add extra checks in case the b) range has not been
initialized.

Part of this patch also removes the 'vm_alloc' and 'vm_free'
declarations, as we do not have any users of them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>

v4: New patch.
v5: Update per Jan's comments.
v6: Drop the stray parentheses on typedefs.
    Ditch the vunmap callback. Stash away the virtual addresses in lists.
    Ditch the vmap callback. Just provide virtual address.
    Ditch the vmalloc_range. Require users of alternative virtual address
    to call vmap_init_type first.
v7: Don't expose the vmalloc_type and such. Instead provide a wrapper
    called vmalloc_xen for those.
    Rename the enum, change one of the names.
    Moved the vunmap_type around in c file so we don't have to declare
    it in the header.
v9: Remove the vunmap_xen, removed vm_alloc from header.
    Add vzalloc_xen
v10:
    Properly ASSERT on ranges
    Make vm_free and vunmap automatically detect the right va space.
    Remove from header vm_free. Rename vm_alloc_type and vm_free_type
    to vm_alloc and vm_free respectively.
v10 - inline patch set in v8:
    Ditch the vzalloc_xen
    Squash vunmap and vunmap_pages together. Move back to original position.
    Drop vzalloc_type and only expose vzalloc.
---
 xen/arch/arm/kernel.c  |   2 +-
 xen/arch/arm/mm.c      |   2 +-
 xen/arch/x86/mm.c      |   2 +-
 xen/common/vmap.c      | 169 ++++++++++++++++++++++++++++++-------------------
 xen/drivers/acpi/osl.c |   2 +-
 xen/include/xen/vmap.h |  21 ++++--
 6 files changed, 125 insertions(+), 73 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 61808ac..9871bd9 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -299,7 +299,7 @@ static __init int kernel_decompress(struct bootmodule *mod)
         return -ENOMEM;
     }
     mfn = _mfn(page_to_mfn(pages));
-    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR);
+    output = __vmap(&mfn, 1 << kernel_order_out, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 
     rc = perform_gunzip(output, input, size);
     clean_dcache_va_range(output, output_size);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7065c3e..94ea054 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -807,7 +807,7 @@ void *ioremap_attr(paddr_t pa, size_t len, unsigned int attributes)
     mfn_t mfn = _mfn(PFN_DOWN(pa));
     unsigned int offs = pa & (PAGE_SIZE - 1);
     unsigned int nr = PFN_UP(offs + len);
-    void *ptr = __vmap(&mfn, nr, 1, 1, attributes);
+    void *ptr = __vmap(&mfn, nr, 1, 1, attributes, VMAP_DEFAULT);
 
     if ( ptr == NULL )
         return NULL;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a42097f..2bb920b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6179,7 +6179,7 @@ void __iomem *ioremap(paddr_t pa, size_t len)
         unsigned int offs = pa & (PAGE_SIZE - 1);
         unsigned int nr = PFN_UP(offs + len);
 
-        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE) + offs;
+        va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_NOCACHE, VMAP_DEFAULT) + offs;
     }
 
     return (void __force __iomem *)va;
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 134eda0..2393df1 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -10,40 +10,43 @@
 #include <asm/page.h>
 
 static DEFINE_SPINLOCK(vm_lock);
-static void *__read_mostly vm_base;
-#define vm_bitmap ((unsigned long *)vm_base)
+static void *__read_mostly vm_base[VMAP_REGION_NR];
+#define vm_bitmap(x) ((unsigned long *)vm_base[x])
 /* highest allocated bit in the bitmap */
-static unsigned int __read_mostly vm_top;
+static unsigned int __read_mostly vm_top[VMAP_REGION_NR];
 /* total number of bits in the bitmap */
-static unsigned int __read_mostly vm_end;
+static unsigned int __read_mostly vm_end[VMAP_REGION_NR];
 /* lowest known clear bit in the bitmap */
-static unsigned int vm_low;
+static unsigned int vm_low[VMAP_REGION_NR];
 
-void __init vm_init(void)
+void __init vm_init_type(enum vmap_region type, void *start, void *end)
 {
     unsigned int i, nr;
     unsigned long va;
 
-    vm_base = (void *)VMAP_VIRT_START;
-    vm_end = PFN_DOWN(arch_vmap_virt_end() - vm_base);
-    vm_low = PFN_UP((vm_end + 7) / 8);
-    nr = PFN_UP((vm_low + 7) / 8);
-    vm_top = nr * PAGE_SIZE * 8;
+    ASSERT(!vm_base[type]);
 
-    for ( i = 0, va = (unsigned long)vm_bitmap; i < nr; ++i, va += PAGE_SIZE )
+    vm_base[type] = start;
+    vm_end[type] = PFN_DOWN(end - start);
+    vm_low[type] = PFN_UP((vm_end[type] + 7) / 8);
+    nr = PFN_UP((vm_low[type] + 7) / 8);
+    vm_top[type] = nr * PAGE_SIZE * 8;
+
+    for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
         struct page_info *pg = alloc_domheap_page(NULL, 0);
 
         map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
         clear_page((void *)va);
     }
-    bitmap_fill(vm_bitmap, vm_low);
+    bitmap_fill(vm_bitmap(type), vm_low[type]);
 
     /* Populate page tables for the bitmap if necessary. */
-    populate_pt_range(va, 0, vm_low - nr);
+    populate_pt_range(va, 0, vm_low[type] - nr);
 }
 
-void *vm_alloc(unsigned int nr, unsigned int align)
+static void *vm_alloc(unsigned int nr, unsigned int align,
+                      enum vmap_region t)
 {
     unsigned int start, bit;
 
@@ -52,27 +55,31 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     else if ( align & (align - 1) )
         align &= -align;
 
+    ASSERT((t >= VMAP_DEFAULT) && (t < VMAP_REGION_NR));
+    if ( !vm_base[t] )
+        return NULL;
+
     spin_lock(&vm_lock);
     for ( ; ; )
     {
         struct page_info *pg;
 
-        ASSERT(vm_low == vm_top || !test_bit(vm_low, vm_bitmap));
-        for ( start = vm_low; start < vm_top; )
+        ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
+        for ( start = vm_low[t]; start < vm_top[t]; )
         {
-            bit = find_next_bit(vm_bitmap, vm_top, start + 1);
-            if ( bit > vm_top )
-                bit = vm_top;
+            bit = find_next_bit(vm_bitmap(t), vm_top[t], start + 1);
+            if ( bit > vm_top[t] )
+                bit = vm_top[t];
             /*
              * Note that this skips the first bit, making the
              * corresponding page a guard one.
              */
             start = (start + align) & ~(align - 1);
-            if ( bit < vm_top )
+            if ( bit < vm_top[t] )
             {
                 if ( start + nr < bit )
                     break;
-                start = find_next_zero_bit(vm_bitmap, vm_top, bit + 1);
+                start = find_next_zero_bit(vm_bitmap(t), vm_top[t], bit + 1);
             }
             else
             {
@@ -82,12 +89,12 @@ void *vm_alloc(unsigned int nr, unsigned int align)
             }
         }
 
-        if ( start < vm_top )
+        if ( start < vm_top[t] )
             break;
 
         spin_unlock(&vm_lock);
 
-        if ( vm_top >= vm_end )
+        if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
         pg = alloc_domheap_page(NULL, 0);
@@ -96,23 +103,23 @@ void *vm_alloc(unsigned int nr, unsigned int align)
 
         spin_lock(&vm_lock);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
-            unsigned long va = (unsigned long)vm_bitmap + vm_top / 8;
+            unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
             if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
-                vm_top += PAGE_SIZE * 8;
-                if ( vm_top > vm_end )
-                    vm_top = vm_end;
+                vm_top[t] += PAGE_SIZE * 8;
+                if ( vm_top[t] > vm_end[t] )
+                    vm_top[t] = vm_end[t];
                 continue;
             }
         }
 
         free_domheap_page(pg);
 
-        if ( start >= vm_top )
+        if ( start >= vm_top[t] )
         {
             spin_unlock(&vm_lock);
             return NULL;
@@ -120,47 +127,58 @@ void *vm_alloc(unsigned int nr, unsigned int align)
     }
 
     for ( bit = start; bit < start + nr; ++bit )
-        __set_bit(bit, vm_bitmap);
-    if ( bit < vm_top )
-        ASSERT(!test_bit(bit, vm_bitmap));
+        __set_bit(bit, vm_bitmap(t));
+    if ( bit < vm_top[t] )
+        ASSERT(!test_bit(bit, vm_bitmap(t)));
     else
-        ASSERT(bit == vm_top);
-    if ( start <= vm_low + 2 )
-        vm_low = bit;
+        ASSERT(bit == vm_top[t]);
+    if ( start <= vm_low[t] + 2 )
+        vm_low[t] = bit;
     spin_unlock(&vm_lock);
 
-    return vm_base + start * PAGE_SIZE;
+    return vm_base[t] + start * PAGE_SIZE;
 }
 
-static unsigned int vm_index(const void *va)
+static unsigned int vm_index(const void *va, enum vmap_region type)
 {
     unsigned long addr = (unsigned long)va & ~(PAGE_SIZE - 1);
     unsigned int idx;
+    unsigned long start = (unsigned long)vm_base[type];
+
+    if ( !start )
+        return 0;
 
-    if ( addr < VMAP_VIRT_START + (vm_end / 8) ||
-         addr >= VMAP_VIRT_START + vm_top * PAGE_SIZE )
+    if ( addr < start + (vm_end[type] / 8) ||
+         addr >= start + vm_top[type] * PAGE_SIZE )
         return 0;
 
-    idx = PFN_DOWN(va - vm_base);
-    return !test_bit(idx - 1, vm_bitmap) &&
-           test_bit(idx, vm_bitmap) ? idx : 0;
+    idx = PFN_DOWN(va - vm_base[type]);
+    return !test_bit(idx - 1, vm_bitmap(type)) &&
+           test_bit(idx, vm_bitmap(type)) ? idx : 0;
 }
 
-static unsigned int vm_size(const void *va)
+static unsigned int vm_size(const void *va, enum vmap_region type)
 {
-    unsigned int start = vm_index(va), end;
+    unsigned int start = vm_index(va, type), end;
 
     if ( !start )
         return 0;
 
-    end = find_next_zero_bit(vm_bitmap, vm_top, start + 1);
+    end = find_next_zero_bit(vm_bitmap(type), vm_top[type], start + 1);
 
-    return min(end, vm_top) - start;
+    return min(end, vm_top[type]) - start;
 }
 
-void vm_free(const void *va)
+static void vm_free(const void *va)
 {
-    unsigned int bit = vm_index(va);
+    enum vmap_region type = VMAP_DEFAULT;
+    unsigned int bit = vm_index(va, type);
+
+    if ( !bit )
+    {
+        type = VMAP_XEN;
+        bit = vm_index(va, type);
+    }
 
     if ( !bit )
     {
@@ -169,22 +187,23 @@ void vm_free(const void *va)
     }
 
     spin_lock(&vm_lock);
-    if ( bit < vm_low )
+    if ( bit < vm_low[type] )
     {
-        vm_low = bit - 1;
-        while ( !test_bit(vm_low - 1, vm_bitmap) )
-            --vm_low;
+        vm_low[type] = bit - 1;
+        while ( !test_bit(vm_low[type] - 1, vm_bitmap(type)) )
+            --vm_low[type];
     }
-    while ( __test_and_clear_bit(bit, vm_bitmap) )
-        if ( ++bit == vm_top )
+    while ( __test_and_clear_bit(bit, vm_bitmap(type)) )
+        if ( ++bit == vm_top[type] )
             break;
     spin_unlock(&vm_lock);
 }
 
 void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags)
+             unsigned int nr, unsigned int align, unsigned int flags,
+             enum vmap_region type)
 {
-    void *va = vm_alloc(nr * granularity, align);
+    void *va = vm_alloc(nr * granularity, align, type);
     unsigned long cur = (unsigned long)va;
 
     for ( ; va && nr--; ++mfn, cur += PAGE_SIZE * granularity )
@@ -201,22 +220,28 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
 
 void *vmap(const mfn_t *mfn, unsigned int nr)
 {
-    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR);
+    return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
 void vunmap(const void *va)
 {
 #ifndef _PAGE_NONE
     unsigned long addr = (unsigned long)va;
+#endif
+    unsigned int pages = vm_size(va, VMAP_DEFAULT);
+
+    if ( !pages )
+        pages = vm_size(va, VMAP_XEN);
 
-    destroy_xen_mappings(addr, addr + PAGE_SIZE * vm_size(va));
+#ifndef _PAGE_NONE
+    destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
 #else /* Avoid tearing down intermediate page tables. */
-    map_pages_to_xen((unsigned long)va, 0, vm_size(va), _PAGE_NONE);
+    map_pages_to_xen((unsigned long)va, 0, pages, _PAGE_NONE);
 #endif
     vm_free(va);
 }
 
-void *vmalloc(size_t size)
+static void *vmalloc_type(size_t size, enum vmap_region type)
 {
     mfn_t *mfn;
     size_t pages, i;
@@ -238,7 +263,7 @@ void *vmalloc(size_t size)
         mfn[i] = _mfn(page_to_mfn(pg));
     }
 
-    va = vmap(mfn, pages);
+    va = __vmap(mfn, 1, pages, 1, PAGE_HYPERVISOR, type);
     if ( va == NULL )
         goto error;
 
@@ -252,9 +277,19 @@ void *vmalloc(size_t size)
     return NULL;
 }
 
+void *vmalloc(size_t size)
+{
+    return vmalloc_type(size, VMAP_DEFAULT);
+}
+
+void *vmalloc_xen(size_t size)
+{
+    return vmalloc_type(size, VMAP_XEN);
+}
+
 void *vzalloc(size_t size)
 {
-    void *p = vmalloc(size);
+    void *p = vmalloc_type(size, VMAP_DEFAULT);
     int i;
 
     if ( p == NULL )
@@ -271,11 +306,17 @@ void vfree(void *va)
     unsigned int i, pages;
     struct page_info *pg;
     PAGE_LIST_HEAD(pg_list);
+    enum vmap_region type = VMAP_DEFAULT;
 
     if ( !va )
         return;
 
-    pages = vm_size(va);
+    pages = vm_size(va, type);
+    if ( !pages )
+    {
+        type = VMAP_XEN;
+        pages = vm_size(va, type);
+    }
     ASSERT(pages);
 
     for ( i = 0; i < pages; i++ )
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 8a28d87..9a49029 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -97,7 +97,7 @@ acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 		if (IS_ENABLED(CONFIG_X86) && !((phys + size - 1) >> 20))
 			return __va(phys);
 		return __vmap(&mfn, PFN_UP(offs + size), 1, 1,
-			      ACPI_MAP_MEM_ATTR) + offs;
+			      ACPI_MAP_MEM_ATTR, VMAP_DEFAULT) + offs;
 	}
 	return __acpi_map_table(phys, size);
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 5671ac8..369560e 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -4,14 +4,22 @@
 #include <xen/mm.h>
 #include <asm/page.h>
 
-void *vm_alloc(unsigned int nr, unsigned int align);
-void vm_free(const void *);
+enum vmap_region {
+    VMAP_DEFAULT,
+    VMAP_XEN,
+    VMAP_REGION_NR,
+};
 
-void *__vmap(const mfn_t *mfn, unsigned int granularity,
-             unsigned int nr, unsigned int align, unsigned int flags);
+void vm_init_type(enum vmap_region type, void *start, void *end);
+
+void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
+             unsigned int align, unsigned int flags, enum vmap_region);
 void *vmap(const mfn_t *mfn, unsigned int nr);
 void vunmap(const void *);
+
 void *vmalloc(size_t size);
+void *vmalloc_xen(size_t size);
+
 void *vzalloc(size_t size);
 void vfree(void *va);
 
@@ -24,7 +32,10 @@ static inline void iounmap(void __iomem *va)
     vunmap((void *)(addr & PAGE_MASK));
 }
 
-void vm_init(void);
 void *arch_vmap_virt_end(void);
+static inline void vm_init(void)
+{
+    vm_init_type(VMAP_DEFAULT, (void *)VMAP_VIRT_START, arch_vmap_virt_end());
+}
 
 #endif /* __XEN_VMAP_H__ */
-- 
2.4.3
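The sizing arithmetic at the top of vm_init_type() is easier to follow with numbers plugged in. The sketch below reproduces those formulas for a hypothetical 64 GiB region with 4 KiB pages (the size is only an example, not a claim about Xen's actual vmap area). Here 'end' is the number of pages, and hence bits in the full bitmap; 'low' is the number of pages the bitmap itself occupies, which bitmap_fill() marks as used; 'nr' is the number of bitmap pages mapped up front; and 'top' is the number of bits those mapped pages cover. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (UINT64_C(1) << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

struct vm_geom {
    uint64_t end;   /* pages in the region == bits in the full bitmap */
    uint64_t low;   /* pages holding the bitmap itself (pre-marked used) */
    uint64_t nr;    /* bitmap pages mapped up front */
    uint64_t top;   /* bits covered by those mapped bitmap pages */
};

/* Same formulas as vm_init_type() in the patch, for 'bytes' of space. */
static struct vm_geom vm_geometry(uint64_t bytes)
{
    struct vm_geom g;

    g.end = PFN_DOWN(bytes);
    g.low = PFN_UP((g.end + 7) / 8);
    g.nr  = PFN_UP((g.low + 7) / 8);
    g.top = g.nr * PAGE_SIZE * 8;
    assert(g.low <= g.end);   /* the bitmap must fit inside the region */
    return g;
}
```

For that example the results are end = 16777216, low = 512, nr = 1 and top = 32768: the full 2 MiB bitmap would occupy 512 pages, but only one bitmap page has to be mapped initially, and vm_alloc() maps further pages as vm_top grows.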


* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-27  6:51       ` Jan Beulich
@ 2016-04-27 13:47         ` Konrad Rzeszutek Wilk
  2016-04-27 14:11           ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 13:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, sasha.levin, xen-devel,
	Daniel De Graaf

On Wed, Apr 27, 2016 at 12:51:34AM -0600, Jan Beulich wrote:
> >>> On 26.04.16 at 19:50, <konrad.wilk@oracle.com> wrote:
> > On Tue, Apr 26, 2016 at 04:21:10AM -0600, Jan Beulich wrote:
> >> >>> On 25.04.16 at 17:34, <konrad.wilk@oracle.com> wrote:
> >> > The implementation does not actually do any patching.
> >> > 
> >> > It just adds the framework for doing the hypercalls,
> >> > keeping track of ELF payloads, and the basic operations:
> >> >  - query which payloads exist,
> >> >  - query for specific payloads,
> >> >  - check*1, apply*1, replace*1, and unload payloads.
> >> > 
> >> > *1: Which of course in this patch are nops.
> >> > 
> >> > The functionality is disabled on ARM until all arch
> >> > components are implemented.
> >> > 
> >> > Also by default it is disabled until the implementation
> >> > is in place.
> >> > 
> >> > We also use recursive spinlocks so that the find_payload
> >> > function does not need to have a 'lock' and 'non-lock' variant.
> >> > 
> >> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> >> > Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> > Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> >> 
> >> I'm hesitant to say that, but with all of this:
> >> 
> >> > v9:
> >> >     s/find_name/get_name/, drop locks when allocating data.
> >> >     Drop conditional expression on copyback
> >> >     Move the allocation on upload outside the spinlock.
> >> >     Add (TECH PREVIEW) to the Kconfig help
> >> >     Return -EINVAL if the CHECK or UNLOAD action is to be performed and the payload
> >> >     state is not in expected state.
> >> >     Print 'c' not 'u' when invoking the keyhandler.
> >> 
> >> ... I'm not sure the earlier R-b can still be considered valid. Andrew?
> > 
> > I don't know what the criteria are for dropping a Reviewed-by.
> > I am happy to drop it if you would like - but it may be that Andrew
> > is OK with the way he had his review?
> > 
> > Or is this more of your view as maintainer - that is the patch
> > changed considerably (and what is that? percentage of the patch?
> > small amount of the patch? Trivial changes? Dropping code?)?
> 
> Indeed, those are the aspects that matter: _Any_ non-trivial change
> to the area a tag was offered for should lead to the tag getting
> dropped. That is, if you make substantial changes to e.g. non-XSM
> parts but have an XSM ack, that can of course stay.
> 
> Among the above, the obviously (to me) non-trivial change is
> the ordering adjustment of allocation vs locking.
> 
> >> > +static int get_name(const xen_xsplice_name_t *name, char *n)
> >> > +{
> >> > +    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
> >> > +        return -EINVAL;
> >> > +
> >> > +    if ( name->pad[0] || name->pad[1] || name->pad[2] )
> >> > +        return -EINVAL;
> >> > +
> >> > +    if ( !guest_handle_okay(name->name, name->size) )
> >> > +        return -EINVAL;
> >> > +
> >> > +    if ( __copy_from_guest(n, name->name, name->size) )
> >> > +        return -EFAULT;
> >> 
> >> Quoting part of my v8.1 reply:
> >> "Is there a particular reason why you open code copy_from_guest() here?"
> > 
> > You mean why I use guest_handle_okay and __copy_from_guest instead of
> > say copy_from_guest?
> > 
> > I think it is an artifact of earlier changes - in which the find_name
> > would only check 'name-size' and then in another function we would
> > just do '__copy_from_guest'. But that is not needed anymore - so let
> > me change it to 'copy_from_guest'
> 
> Right, but that change didn't happen.
> 
> > I thought at some point you asked for that as the check was done for
> > it once and there was no point
> 
> This may well have been in some much earlier version, where the
> two lived in different places. But when they're together, they
> clearly should be folded back.
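Folding the access check and the copy into a single copy_from_guest() call, as Jan asks, gives a shorter get_name(). Guest handles only exist inside Xen, so this standalone sketch models the argument with a plain pointer. 'struct name_arg', NAME_SIZE and toy_copy_from_guest() (a memcpy() wrapper that, like the Xen helper, returns the number of bytes not copied) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NAME_SIZE 128   /* hypothetical stand-in for XEN_XSPLICE_NAME_SIZE */

/* Hypothetical mirror of xen_xsplice_name_t, with a plain pointer. */
struct name_arg {
    const char *name;
    uint16_t size;
    uint16_t pad[3];
};

/* Stand-in for copy_from_guest(): check and copy in one call.
 * Returns the number of bytes NOT copied (0 on success). */
static unsigned long toy_copy_from_guest(void *dst, const char *src,
                                         unsigned long len)
{
    if ( !src )
        return len;     /* pretend the guest buffer is inaccessible */
    memcpy(dst, src, len);
    return 0;
}

static int get_name(const struct name_arg *arg, char *n)
{
    if ( !arg->size || arg->size > NAME_SIZE )
        return -EINVAL;

    if ( arg->pad[0] || arg->pad[1] || arg->pad[2] )
        return -EINVAL;

    /* One call replaces guest_handle_okay() + __copy_from_guest(). */
    if ( toy_copy_from_guest(n, arg->name, arg->size) )
        return -EFAULT;

    return 0;
}
```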
> 
> >> > +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> >> > +{
> >> > +    struct payload *data, *found;
> >> > +    char n[XEN_XSPLICE_NAME_SIZE];
> >> > +    int rc;
> >> > +
> >> > +    rc = verify_payload(upload, n);
> >> > +    if ( rc )
> >> > +        return rc;
> >> > +
> >> > +    data = xzalloc(struct payload);
> >> > +
> >> > +    spin_lock(&payload_lock);
> >> > +
> >> > +    found = find_payload(n);
> >> > +    if ( IS_ERR(found) )
> >> > +    {
> >> > +        rc = PTR_ERR(found);
> >> > +        goto out;
> >> > +    }
> >> > +    else if ( found )
> >> > +    {
> >> > +        rc = -EEXIST;
> >> > +        goto out;
> >> > +    }
> >> > +
> >> > +    if ( !data )
> >> > +    {
> >> > +        rc = -ENOMEM;
> >> > +        goto out;
> >> > +    }
> >> > +
> >> > +    rc = 0;
> >> 
> >> rc is already zero by the time we get here.
> >> 
> >> I also wonder whether the code wouldn't be easier to read if you
> >> used just a sequence of if()/else if() here, without any goto-s.
> > 
> > But I do need to free(data) and unlock the spinlock - so having
> > a common code to pass through makes sense.
> > 
> > Unless you mean having a condition on if ( !rc ) and doing the normal path?
> > Like so:
> > 
> >     rc = verify_payload(upload, n);
> >     if ( rc )
> >         return rc;
> > 
> >     data = xzalloc(struct payload);
> > 
> >     spin_lock(&payload_lock);
> > 
> >     found = find_payload(n);
> >     if ( IS_ERR(found) )
> >         rc = PTR_ERR(found);
> >     else if ( found )
> >         rc = -EEXIST;
> > 
> >     if ( !rc && !data )
> 
> This can just be "else if ( !data )" afaict.

But then we "lose"
> 
> >         rc = -ENOMEM;
> > 
> >     if ( !rc )
> 
> And this one then just "else".
> 
> >     {
> >         memcpy(data->name, n, strlen(n));
> >         data->state = XSPLICE_STATE_CHECKED;
> >         INIT_LIST_HEAD(&data->list);
> > 
> >         list_add_tail(&data->list, &payload_list);
> >         payload_cnt++;
> >         payload_version++;
> >     }
> > 
> >     spin_unlock(&payload_lock);
> > 
> >     if ( rc )
> >         xfree(data);
> > 
> >     return rc;
> > 
> > 
> > That looks fine here, but in the subsequent patch I have to also
> > check for
> > 
> > if ( __copy_from_guest(raw_data, upload->payload, upload->size) )       
> 
> This could easily be another "else if()" in the chain outlined above.
> 
> > and
> > rc = load_payload_data(data, raw_data, upload->size);
> 
> But I can see that this one would be a little less neat to integrate.

But it is neater than what it has now.
The final product ends up being:

    rc = verify_payload(upload, n);
    if ( rc )
        return rc;

    data = xzalloc(struct payload);
    raw_data = vmalloc(upload->size);

    spin_lock(&payload_lock);

    found = find_payload(n);
    if ( IS_ERR(found) )
        rc = PTR_ERR(found);
    else if ( found )
        rc = -EEXIST;
    else if ( !data || !raw_data )
        rc = -ENOMEM;
    else if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
        rc = -EFAULT;
    else
    {
        memcpy(data->name, n, strlen(n));

        rc = load_payload_data(data, raw_data, upload->size);
        if ( rc )
            goto out;

        data->state = XSPLICE_STATE_CHECKED;
        INIT_LIST_HEAD(&data->list);

        list_add_tail(&data->list, &payload_list);
        payload_cnt++;
        payload_version++;
    }

 out:
    spin_unlock(&payload_lock);

    vfree(raw_data);

    if ( rc )
        xfree(data);

    return rc;
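
For readers who want to experiment with this shape outside Xen, here is a minimal standalone sketch of the single-exit if/else-if chain. Everything in it is hypothetical. calloc()/free() stand in for xzalloc()/xfree(), and a fixed array stands in for payload_list. lock()/unlock() are plain counters standing in for spin_lock()/spin_unlock(), so a test can check that the lock is released on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE    16
#define MAX_PAYLOADS 4

/* Hypothetical stand-ins for struct payload, payload_list and payload_lock. */
struct payload {
    char name[NAME_SIZE];
};

static struct payload *registry[MAX_PAYLOADS];
static int lock_depth;                  /* Incremented by lock(), decremented by unlock(). */

static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

static struct payload *find_payload(const char *name)
{
    for ( int i = 0; i < MAX_PAYLOADS; i++ )
        if ( registry[i] && !strcmp(registry[i]->name, name) )
            return registry[i];
    return NULL;
}

static int add_payload(struct payload *p)
{
    for ( int i = 0; i < MAX_PAYLOADS; i++ )
        if ( !registry[i] )
        {
            registry[i] = p;
            return 0;
        }
    return -ENOSPC;
}

/*
 * One if/else-if chain decides rc; every outcome then falls through the
 * same unlock/free tail, so no goto is needed.
 */
int upload(const char *name)
{
    struct payload *data;
    int rc = 0;

    if ( !name || strlen(name) >= NAME_SIZE )
        return -EINVAL;                 /* Validate before allocating or locking. */

    data = calloc(1, sizeof(*data));    /* Stand-in for xzalloc(). */

    lock();

    if ( find_payload(name) )
        rc = -EEXIST;
    else if ( !data )
        rc = -ENOMEM;
    else
    {
        memcpy(data->name, name, strlen(name));
        rc = add_payload(data);
    }

    unlock();

    if ( rc )
        free(data);                     /* free(NULL) is a no-op. */

    return rc;
}
```

Every outcome, success or failure, leaves through the same unlock-then-free tail, which is what lets the chain drop the goto. The trade-off is that a step which can fail after doing partial work, like load_payload_data(), does not slot into the chain as neatly.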

> 
> > and goto statement help a lot there.
> > 
> > I would rather have it the way it is now if you are OK with that?
> 
> As I have tried to express by saying "I also wonder", and as this
> clearly is a matter of taste to some degree, I'm not insisting on
> that alternative code flow. What I'd really like to ask for is
> consistency though: While in the patch here you use
> 
>     if ( ... )
>     {
>         rc = ...;
>         goto ...;
>     }
> 
> patch 11 introduces an instance of the alternative
> 
>     rc = -E...;
>     if ( ... )
>         goto ...;
> 
> Similarly (see above) you should aim at consistency between
> if/else-if chains or chains of just if-s, when each of them ends in an
> unconditional goto (or return, continue, or break, taking a more
> general perspective). Not mixing styles helps avoid (possibly silent)
> questions by readers along the lines of "Is there a reason it's done
> one way here and another way a few lines down?"

Different authors, different matters of taste - as you saw with
the sizeof and this one - Ross and I write code differently.

How do you and Andrew deal with this one?
> 
> Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-27  7:27       ` Jan Beulich
@ 2016-04-27 14:00         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 14:00 UTC
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	ross.lagerwall, sasha.levin, xen-devel

On Wed, Apr 27, 2016 at 01:27:05AM -0600, Jan Beulich wrote:
> >>> On 27.04.16 at 03:59, <konrad.wilk@oracle.com> wrote:
> >> > +static int xsplice_header_check(const struct xsplice_elf *elf)
> >> > +{
> >> > +    const Elf_Ehdr *hdr = elf->hdr;
> >> > +
> >> > +    if ( sizeof(*elf->hdr) > elf->len )
> >> > +    {
> >> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Section header is bigger than payload!\n",
> >> > +                elf->name);
> >> > +        return -EINVAL;
> >> > +    }
> >> > +
> >> > +    if ( !IS_ELF(*hdr) )
> >> > +    {
> >> > +        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
> >> > +        return -EINVAL;
> >> > +    }
> >> > +
> >> > +    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> >> > +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> >> > +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> >> 
> >> What about EI_VERSION and EI_ABIVERSION, btw?
> > 
> > As I did some prototype on ARM32 I realized that the EI_CLASS is wrong
> > in common code (as ELFCLASS32 is what ARM32 has). And the EI_ABIVERSION
> > too.
> 
> EI_CLASS I can easily see (and in fact EI_DATA would need to
> move there too, now that you mention this aspect), but why
> EI_ABIVERSION? Afaik there are no versions other than 0
> defined for ELFOSABI_NONE (which btw we wrongly call
> ELFOSABI_SYSV). IMO either EI_OSABI and EI_ABIVERSION both
> need to move, or both should stay in common code.

I keep getting confused by the e_flags, which for ARM32 is
0x5000000:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          4528 (bytes into file)
  Flags:                             0x5000000, Version5 EABI
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         33
  Section header string table index: 30

while the same test case under x86 gives:
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x24
  Start of program headers:          0 (bytes into file)
  Start of section headers:          6680 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         47
  Section header string table index: 44

Moved EI_ABIVERSION back into the common code, and moved EI_CLASS and
EI_DATA into the platform-specific code.
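A rough standalone sketch of that split follows. It uses the glibc <elf.h> definitions instead of Xen's elfstructs.h. The function names are hypothetical, and putting the EI_VERSION check in the common part is an assumption (Jan asked about it earlier in the thread). Only an x86-64 arch hook is shown; an ARM32 one would test ELFCLASS32 and EM_ARM instead.

```c
#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Checks that are identical on every architecture (hypothetical split). */
static int common_ident_check(const unsigned char *ident)
{
    if ( memcmp(ident, ELFMAG, SELFMAG) )
        return -EINVAL;                     /* No ELF magic. */

    if ( ident[EI_VERSION] != EV_CURRENT ||
         ident[EI_OSABI] != ELFOSABI_SYSV ||
         ident[EI_ABIVERSION] != 0 )
        return -EOPNOTSUPP;

    return 0;
}

/* x86-64 hook: class, byte order and machine differ per architecture. */
static int arch_x86_64_check(const Elf64_Ehdr *hdr)
{
    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
         hdr->e_machine != EM_X86_64 )
        return -EOPNOTSUPP;

    return 0;
}

int header_check(const void *buf, size_t len)
{
    int rc;

    if ( len < EI_NIDENT )
        return -EINVAL;                     /* Too short for e_ident[]. */

    rc = common_ident_check(buf);
    if ( rc )
        return rc;

    if ( len < sizeof(Elf64_Ehdr) )
        return -EINVAL;                     /* Too short for an ELF64 header. */

    return arch_x86_64_check(buf);
}
```

Only e_ident[] is read before the length check for the full header. This matters because the common part must not assume the 64-byte ELF64 layout when the 52-byte ELF32 header is also valid input on ARM32.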
> 
> Jan
> 

* Re: [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-04-27 13:47         ` Konrad Rzeszutek Wilk
@ 2016-04-27 14:11           ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27 14:11 UTC
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Wei Liu, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, sasha.levin, xen-devel,
	Daniel De Graaf

>>> On 27.04.16 at 15:47, <konrad@kernel.org> wrote:
> On Wed, Apr 27, 2016 at 12:51:34AM -0600, Jan Beulich wrote:
>> >>> On 26.04.16 at 19:50, <konrad.wilk@oracle.com> wrote:
>> > On Tue, Apr 26, 2016 at 04:21:10AM -0600, Jan Beulich wrote:
>> >> I also wonder whether the code wouldn't be easier to read if you
>> >> used just a sequence of if()/else if() here, without any goto-s.
>> > 
>> > But I do need to free(data) and unlock the spinlock - so having
>> > a common code to pass through makes sense.
>> > 
>> > Unless you mean having a condition on if ( !rc ), and doing the normal path?
>> > Like so:
>> > 
>> >     rc = verify_payload(upload, n);
>> >     if ( rc )
>> >         return rc;
>> > 
>> >     data = xzalloc(struct payload);
>> > 
>> >     spin_lock(&payload_lock);
>> > 
>> >     found = find_payload(n);
>> >     if ( IS_ERR(found) )
>> >         rc = PTR_ERR(found);
>> >     else if ( found )
>> >         rc = -EEXIST;
>> > 
>> >     if ( !rc && !data )
>> 
>> This can just be "else if ( !data )" afaict.
> 
> But then we "lose"

I don't understand what you're trying to tell me. But it looks like I also
don't need to understand it, since ...

> But it is neater than what it has now.
> The final product ends up being:
> 
>     rc = verify_payload(upload, n);
>     if ( rc )
>         return rc;
> 
>     data = xzalloc(struct payload);
>     raw_data = vmalloc(upload->size);
> 
>     spin_lock(&payload_lock);
> 
>     found = find_payload(n);
>     if ( IS_ERR(found) )
>         rc = PTR_ERR(found);
>     else if ( found )
>         rc = -EEXIST;
>     else if ( !data || !raw_data )
>         rc = -ENOMEM;
>     else if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
>         rc = -EFAULT;
>     else

... this is what I was hoping for.

>> As I have tried to express by saying "I also wonder", and as this
>> clearly is a matter of taste to some degree, I'm not insisting on
>> that alternative code flow. What I'd really like to ask for is
>> consistency though: While in the patch here you use
>> 
>>     if ( ... )
>>     {
>>         rc = ...;
>>         goto ...;
>>     }
>> 
>> patch 11 introduces an instance of the alternative
>> 
>>     rc = -E...;
>>     if ( ... )
>>         goto ...;
>> 
>> Similarly (see above) you should aim at consistency between
>> if/else-if chains or chains of just if-s, when each of them ends in an
>> unconditional goto (or return, continue, or break, taking a more
>> general perspective). Not mixing styles helps avoid (possibly silent)
>> questions by readers along the lines of "Is there a reason it's done
>> one way here and another way a few lines down?"
> 
> Different authors, different matters of taste - as you saw with
> the sizeof and this one - Ross and I write code differently.
> 
> How do you and Andrew deal with this one?

Simply by making code additions fit existing (surrounding) style
(and that's not specific to being between Andrew and me).

Jan


* Re: [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type
  2016-04-27 13:46         ` Konrad Rzeszutek Wilk
@ 2016-04-27 14:15           ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27 14:15 UTC
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, ross.lagerwall, Julien Grall, sasha.levin,
	xen-devel

>>> On 27.04.16 at 15:46, <konrad@kernel.org> wrote:
> Then this should be perfect:

Almost.

> From cef95bc0682f94ca5e61609211c4787491212acf Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Tue, 26 Apr 2016 14:03:06 -0400
> Subject: [PATCH] arm/x86/vmap: Add vmalloc_xen and vm_init_type
> 
> For those users who want to use the virtual addresses that
> are in the hypervisor's code/data region address space -
> these three new functions allow that.
> 
> Implementation wise the vmap API keeps track of two virtual
> address regions now:
>  a) VMAP_VIRT_START
>  b) Any provided virtual address space (need start and end).
> 
> The a) one is the default one and the existing behavior
> for users of vmalloc, vmap, etc is the same.
> 
> If however one wishes to use the b) one only has to use
> the vm_init_type to initialize and the vmzalloc_xen to utilize
> it (vfree and vunmap are capable of searching both address spaces).
> 
> This allows users (such as xSplice) to provide their own
> mechanism to change the page flags, and also use virtual
> addresses closer to the hypervisor virtual addresses (at least
> on x86) while not having to deal with the allocation of
> pages.
> 
> For an example user, see the patch titled "xsplice: Implement payload
> loading", where we parse the payload's ELF relocations - which
> is defined to be signed 32-bit (on x86) (max displacement hence
> is 2GB virtual space, ARM32 is 128MB). The displacement of the
> hypervisor virtual addresses to the vmalloc (on x86)
> is more than 32 bits - which means that ELF relocations would
> truncate the 33rd and 34th bits. Hence this alternate API.
> 
> We also add extra checks in case the b) range has not been
> initialized.
> 
> This patch also removes the 'vm_alloc' and 'vm_free'
> declarations as they have no users.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Julien Grall <julien.grall@arm.com> [ARM]

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with ...

>  void vunmap(const void *va)
>  {
>  #ifndef _PAGE_NONE
>      unsigned long addr = (unsigned long)va;
> +#endif

... the #ifndef here gone and ...

> +    unsigned int pages = vm_size(va, VMAP_DEFAULT);
> +
> +    if ( !pages )
> +        pages = vm_size(va, VMAP_XEN);
>  
> -    destroy_xen_mappings(addr, addr + PAGE_SIZE * vm_size(va));
> +#ifndef _PAGE_NONE
> +    destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
>  #else /* Avoid tearing down intermediate page tables. */
> -    map_pages_to_xen((unsigned long)va, 0, vm_size(va), _PAGE_NONE);
> +    map_pages_to_xen((unsigned long)va, 0, pages, _PAGE_NONE);

addr used here.
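
The lookup order in the quoted vunmap() tries the default region first and falls back to the Xen region only when the address is not found there. The sketch below models that order in isolation. It uses hypothetical names, a fixed slot size and a tiny per-region array; Xen's real vmap bookkeeping is different and more involved.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS     8
#define SLOT_SIZE 0x1000UL

enum vmap_type { VMAP_DEFAULT, VMAP_XEN, VMAP_REGION_NR };

/* Hypothetical region model: one page count per fixed-size slot. */
struct region {
    unsigned long start, end;       /* Covers [start, end). */
    unsigned int pages[SLOTS];      /* Pages allocated at each slot, 0 if free. */
};

static struct region regions[VMAP_REGION_NR];

/* Page count of the allocation starting at va in region t, or 0. */
static unsigned int vm_size(unsigned long va, enum vmap_type t)
{
    const struct region *r = &regions[t];
    size_t idx;

    if ( va < r->start || va >= r->end )
        return 0;
    idx = (va - r->start) / SLOT_SIZE;
    return idx < SLOTS ? r->pages[idx] : 0;
}

/* Same order as the quoted vunmap(): default region, then the Xen one. */
static unsigned int vm_size_any(unsigned long va)
{
    unsigned int pages = vm_size(va, VMAP_DEFAULT);

    if ( !pages )
        pages = vm_size(va, VMAP_XEN);
    return pages;
}
```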

Thanks, Jan


* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27  8:28       ` Jan Beulich
@ 2016-04-27 15:48         ` Konrad Rzeszutek Wilk
  2016-04-27 16:06           ` Jan Beulich
  2016-04-27 16:14           ` Jan Beulich
  0 siblings, 2 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 15:48 UTC
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, andrew.cooper3, mpohlack,
	ross.lagerwall, Julien Grall, sasha.levin, xen-devel

On Wed, Apr 27, 2016 at 02:28:09AM -0600, Jan Beulich wrote:
> >>> On 27.04.16 at 05:28, <konrad.wilk@oracle.com> wrote:
> >> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> >> > +{
> > ..snip..
> >> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> >> > +    {
> >> > +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> >> > +        {
> >> > +            uint8_t *buf;
> >> 
> >> Perhaps void * again? And missing a blank line afterwards.
> >> 
> >> > +            if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) )
> >> > +                buf = text_buf;
> >> > +            else if ( (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> >> > +                buf = rw_buf;
> >> > +             else
> >> 
> >> The indentation here is still one off.
> > 
> > I am not seeing it. I deleted the line and added it back using
> > spaces just in case. But I really don't see the indentation issue
> > you are seeing?
> 
> Just count the number of blanks - I count 12 ahead of the "else if"
> but 13 ahead of the bare "else". And it's still the same in the

I finally see it now! I had been looking at the 'else if' and 'buf'
but not the 'else'. Argh.

..snip..
> > +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> > +{
> > +    void *text_buf, *ro_buf, *rw_buf;
> > +    unsigned int i;
> > +    size_t size = 0;
> > +    unsigned int *offset;
> > +    int rc = 0;
> > +
> > +    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
> > +    if ( !offset )
> > +        return -ENOMEM;
> > +
> > +    /* Compute size of different regions. */
> > +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> > +    {
> > +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> > +             (SHF_ALLOC|SHF_EXECINSTR) )
> > +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> > +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> > +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> > +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> > +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> > +        else if ( !elf->sec[i].sec->sh_flags ||
> > +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> > +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> > +            /*
> > +             * Do nothing. These are .rel.text, rel.*, .symtab, .strtab,
> > +             * and .shstrtab. For the non-relocate we allocate and copy these
> > +             * via other means - and the .rel we can ignore as we only use it
> > +             * once during loading.
> > +             */
> > +            offset[i] = UINT_MAX;
> > +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> > +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> > +        {
> > +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> > +                    elf->name, elf->sec[i].name);
> > +            rc = -EOPNOTSUPP;
> > +            goto out;
> > +        }
> > +        else /* Such as .comment, or .debug_str. */
> > +        {
> > +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
> > +                    elf->name, elf->sec[i].name);
> > +            offset[i] = UINT_MAX;
> > +        }
> 
> See earlier reply regarding this entire loop body.


So I modified it and added some more comments in
xsplice_elf_resolve_symbols to make it clear that some of these
are just plain ignored. 
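
The sketch below shows one way to express the classification the reworked loop is aiming for, written against glibc's <elf.h> types. classify() and layout() are hypothetical helpers. The body of calc_section() is an assumption, since it is not shown in this message. Each SHF_ALLOC section goes to the text, rw or ro region according to SHF_EXECINSTR and SHF_WRITE. It is then aligned to sh_addralign, which is assumed to be 0 or a power of two, as ELF requires. Every other section gets UINT_MAX as its offset.

```c
#include <assert.h>
#include <elf.h>
#include <limits.h>
#include <stddef.h>

enum region { REGION_TEXT, REGION_RW, REGION_RO, REGION_NONE };

/* Hypothetical helper: pick a region from the section flags alone. */
static enum region classify(const Elf64_Shdr *s)
{
    if ( !(s->sh_flags & SHF_ALLOC) )
        return REGION_NONE;      /* .symtab, .strtab, .rela.*, .comment, ... */
    if ( s->sh_flags & SHF_EXECINSTR )
        return REGION_TEXT;
    return (s->sh_flags & SHF_WRITE) ? REGION_RW : REGION_RO;
}

/*
 * Assumed behaviour of calc_section(): align the running region size,
 * record it as this section's offset, then grow the region.
 * No overflow checking is done here.
 */
static void calc_section(const Elf64_Shdr *s, size_t *region_size,
                         unsigned int *offset)
{
    size_t align = s->sh_addralign ? s->sh_addralign : 1;
    size_t off = (*region_size + align - 1) & ~(align - 1);

    *offset = (unsigned int)off;
    *region_size = off + s->sh_size;
}

/* Compute region sizes and per-section offsets; section 0 is skipped. */
void layout(const Elf64_Shdr *sec, unsigned int nr, size_t size[3],
            unsigned int *offset)
{
    size[REGION_TEXT] = size[REGION_RW] = size[REGION_RO] = 0;
    if ( !nr )
        return;

    offset[0] = UINT_MAX;                /* The SHN_UNDEF entry. */
    for ( unsigned int i = 1; i < nr; i++ )
    {
        enum region r = classify(&sec[i]);

        if ( r == REGION_NONE )
            offset[i] = UINT_MAX;        /* Never allocated or copied. */
        else
            calc_section(&sec[i], &size[r], &offset[i]);
    }
}
```

With a single flag-driven classifier, a SHT_NOBITS section such as .bss (SHF_ALLOC|SHF_WRITE) lands in the rw region like .data. It takes space there but must be zeroed rather than copied from the file.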

Here is the inline patch:

From 99b2bb4969f1a8cc82fbd80885572a99204f95f4 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 27 Apr 2016 09:01:51 -0400
Subject: [PATCH] xsplice: Implement payload loading

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the elf file.
- Parse the elf file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region. Split them into three
  regions: .text, .data, and .rodata. There MUST be at least a .text.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that patch titled "xsplice,symbols: Implement symbol name resolution
   on address" implements that)
- Perform relocations.
- Secure the regions (.text, .data, .rodata) with the proper permissions.
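
The relocation step is why the payload must live close to Xen's own text. An R_X86_64_PC32 field holds S + A - P as a signed 32-bit value, so any target more than about 2 GiB away cannot be encoded. Below is a minimal standalone sketch of that computation and its overflow check. The function name and the integer-address interface are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write an R_X86_64_PC32 value (S + A - P) into
 * the 4 bytes at dest.  'place' is the address the relocated field will
 * have at run time (P); it is passed separately so the arithmetic can be
 * tested without memory mapped at that address.
 */
int apply_pc32(uint8_t *dest, uint64_t place, uint64_t sym_value,
               int64_t addend)
{
    /* Unsigned arithmetic wraps; reinterpret the result as signed. */
    int64_t val = (int64_t)(sym_value + (uint64_t)addend - place);
    int32_t field = (int32_t)val;

    if ( (int64_t)field != val )
        return -EOVERFLOW;      /* Target is beyond +/-2 GiB from P. */

    memcpy(dest, &field, sizeof(field));   /* Field may be unaligned. */
    return 0;
}
```

In the loader itself, P is simply the destination address inside the newly allocated region. That is why allocating from [xen_virt_end, XEN_VIRT_END] keeps these values in range for references back into Xen.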

We capitalize on the vmalloc API (see the patch titled
"arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type") to allocate
a region of memory within [xen_virt_end, XEN_VIRT_END] for the code.

We also use the "x86/mm: Introduce modify_xen_mappings()"
to change the virtual address page-table permissions.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
    - Add the _return macro back - but only use it during debug builds.
    - Remove the macro, prefix arch_ on arch specific calls.
v4:
    - Move alloc_payload to arch specific file.
    - Use void* instead of uint8_t, use const
    - Add copyrights
    - Unroll the vmap code to add ASSERT. Change while to not incur
      potential long error loop
   - Use vmalloc/vfree cb APIs
   - Secure .text pages to be RX instead of RWX.
v5:
  - Fix allocation of virtual addresses only allowing one page to be allocated.
  - Create .text, .data, and .rodata regions with different permissions.
  - Make the find_space_t not a typedef to pointer to a function.
  - Allocate memory in here.
v6: Drop parentheses on typedefs.
  - s/an xSplice/a xSplice/
  - Rebase on "vmap: Add vmalloc_cb"
  - Rebase on "vmap: Add vmalloc_type and vm_init_type"
  - s/uint8_t/void/ on load_addr
  - Set xsplice_elf on stack without using memset.
v7:
  - Changed the check on delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
    The sections can be right at the back of the file (different linker!), so the failing conditional
    for 'if (delta >= elf->len)' is incorrect and should have been '>'.
  - Changed dprintk(XENLOG_DEBUG to XENLOG_ERR, then back to DEBUG. Converted
    some of the printk to dprintk.
  - Rebase on " arm/x86/vmap: Add vmalloc_xen, vfree_xen and vm_init_type"
  - Changed some of the printk XENLOG_ERR to XENLOG_DEBUG
  - Check the idx in the relocation to make sure it is within bounds and
    implemented.
  - Use "x86/mm: Introduce modify_xen_mappings()"
  - Introduce PRIxElfAddr
  - Check for overflow in R_X86_64_PC32
  - Return -EOPNOTSUPP if we don't support types in ELF64_R_TYPE
v8:
  - Change dprintk and printk XENLOG_DEBUG to XENLOG_ERR
  - Convert four of the printks into dprintk.
v9:
  - Rebase on different spinlock usage in xsplice_upload.
  - Do proper bound and overflow checking.
  - Added 'const' on [text,ro,rw]_addr.
  - Made 'calc_section' and 'move_payload' use a dynamically
    allocated array for computed offsets instead of modifying sh_entsize.
  - Remove arch_xsplice_[alloc_payload|free] and use vzalloc_xen and
    vfree.
  - Collapse for loop in move_payload.
  - Move xsplice.o in Makefile
  - Add more checks in arch_xsplice_perform_rela (r_offset and
     sh_size % sh_entsize)
  - Use int32_t and int64_t in arch_xsplice_perform_rela.
  - Tighten the list of sh_flags we check
  - Use intermediate on 'buf' so that we can do 'const void *'
  - Use intermediate in xsplice_elf_resolve_symbols for 'const' of elf->sym.
  - Fail if (and only) SHF_ALLOC and SHT_NOBITS section is seen.
v10:
   - Dropped Andrew's Reviewed-by
   - Expand arch_xsplice_verify_elf to check EI_CLASS and EI_ABIVERSION
   - In arch_xsplice_perform_rela drop check against !rela->sec->sh_entsize,
     add extra checks against r_offset + sizeof(type) necessitating
     an extra goto statement.
   - Make arch_xsplice_init be __init.
   - In free_payload_data check against ->pages instead of ->text_addr.
   - In move_payload use 'void *' instead of 'uint8_t *', use xmalloc_array
     for offset, expand on the 'Do Nothing' comment and the 'Ignoring';
     Use vmalloc instead of vzalloc - which means for .bss we also use
     memset; drop the unary + when calculating address for rw_buf;
     Fix indentation (I hope? I don't see an issue); also use offset[i] = UINT_MAX
     for sections we are not going to allocate or memcpy - and assert if
     we do hit those.
   - In xsplice_elf_resolve_symbols move check against
     !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) back to what it was
     in v8.
   - In xsplice_elf_perform_relocs drop comment about first ELF
     symbol.
   - Different if/else if in upload function.
v10 - patch posted as inline reply within v9 patchset:
   - Add check for EI_DATA and ditch EI_ABIVERSION out of platform specific checks.
   - In elf_resolve_symbols use const ElfSym and temporary variable (st_value)
     which is used to set value in const (and only in one place will we
     unconst the sym); change %#"PRIx.." to %#x or %#lx types as we don't use Elf
     types anymore.
   - Make the r_offset check be within switch statements to guard against NONE
     relocations.
   - Fix the 'else if' spacing issue;
   - Use different message when idx in symbols is out of bounds.
---
 xen/arch/arm/Makefile         |   1 +
 xen/arch/arm/xsplice.c        |  46 ++++++++
 xen/arch/x86/Makefile         |   1 +
 xen/arch/x86/xsplice.c        | 173 ++++++++++++++++++++++++++++++
 xen/common/xsplice.c          | 242 +++++++++++++++++++++++++++++++++++++++++-
 xen/common/xsplice_elf.c      | 118 ++++++++++++++++++++
 xen/include/xen/elfstructs.h  |   4 +
 xen/include/xen/xsplice.h     |  24 +++++
 xen/include/xen/xsplice_elf.h |  11 +-
 9 files changed, 615 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 0328b50..eae5cb3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,6 +40,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..8cb7767
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,46 @@
+/*
+ *  Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    return -ENOSYS;
+}
+
+void __init arch_xsplice_init(void)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 729065b..f74fd2c 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -61,6 +61,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..82618f7
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( hdr->e_machine != EM_X86_64 ||
+         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported ELF Machine type!\n",
+                elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    dprintk(XENLOG_ERR, XSPLICE "%s: SHT_REL relocation unsupported\n",
+            elf->name);
+    return -EOPNOTSUPP;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    const Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    /* Nothing to do. */
+    if ( !rela->sec->sh_size )
+        return 0;
+
+    if ( rela->sec->sh_entsize < sizeof(Elf_RelA) ||
+         rela->sec->sh_size % rela->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = rela->data + i * rela->sec->sh_entsize;
+
+        symndx = ELF64_R_SYM(r->r_info);
+
+        if ( symndx >= elf->nsym )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
+                    elf->name, symndx);
+            return -EINVAL;
+        }
+
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+        case R_X86_64_NONE:
+            break;
+
+        case R_X86_64_64:
+            if ( r->r_offset >= base->sec->sh_size ||
+                (r->r_offset + sizeof(uint64_t)) > base->sec->sh_size )
+                goto bad_offset;
+
+            *(uint64_t *)dest = val;
+            break;
+
+        case R_X86_64_PLT32:
+            /*
+             * Xen uses -fpic which normally uses PLT relocations
+             * except that it sets visibility to hidden which means
+             * that they are not used.  However, when gcc cannot
+             * inline memcpy it emits memcpy with default visibility
+             * which then creates a PLT relocation.  It can just be
+             * treated the same as R_X86_64_PC32.
+             */
+        case R_X86_64_PC32:
+            if ( r->r_offset >= base->sec->sh_size ||
+                (r->r_offset + sizeof(uint32_t)) > base->sec->sh_size )
+                goto bad_offset;
+
+            val -= (uint64_t)dest;
+            *(int32_t *)dest = val;
+            if ( (int64_t)val != *(int32_t *)dest )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Overflow in relocation %u in %s for %s!\n",
+                        elf->name, i, rela->name, base->name);
+                return -EOVERFLOW;
+            }
+            break;
+
+        default:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unhandled relocation %lu\n",
+                    elf->name, ELF64_R_TYPE(r->r_info));
+            return -EOPNOTSUPP;
+        }
+    }
+
+    return 0;
+
+ bad_offset:
+    dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation offset is past %s section!\n",
+            elf->name, base->name);
+    return -EINVAL;
+}
+
+/*
+ * Once the resolving symbols, performing relocations, etc is complete
+ * we secure the memory by putting in the proper page table attributes
+ * for the desired type.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    unsigned long start = (unsigned long)va;
+    unsigned int flag;
+
+    ASSERT(va);
+    ASSERT(pages);
+
+    if ( type == XSPLICE_VA_RX )
+        flag = PAGE_HYPERVISOR_RX;
+    else if ( type == XSPLICE_VA_RW )
+        flag = PAGE_HYPERVISOR_RW;
+    else
+        flag = PAGE_HYPERVISOR_RO;
+
+    modify_xen_mappings(start, start + pages * PAGE_SIZE, flag);
+
+    return 0;
+}
+
+void __init arch_xsplice_init(void)
+{
+    void *start, *end;
+
+    start = (void *)xen_virt_end;
+    end = (void *)(XEN_VIRT_END - NR_CPUS * PAGE_SIZE);
+
+    BUG_ON(end <= start);
+
+    vm_init_type(VMAP_XEN, start, end);
+}
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
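The R_X86_64_PC32 (and PLT32) case in arch_xsplice_perform_rela() stores S + A - P (the x86-64 psABI formula) as a signed 32-bit field and rejects results that change when truncated. As an illustration only, here is a minimal user-space sketch of that arithmetic. `apply_pc32` is a hypothetical helper working on a plain byte buffer, not part of the patch, and unlike the patch it checks for overflow before writing the field.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: apply an R_X86_64_PC32-style relocation.
 * sym_value is S, addend is A, and the patched address is P, so the
 * stored value is S + A - P truncated to a signed 32-bit field.
 */
static int apply_pc32(uint8_t *section, size_t sec_size, uint64_t offset,
                      uint64_t sym_value, int64_t addend)
{
    uint8_t *dest;
    uint64_t val;
    int32_t stored;

    /* Same bounds check as the patch: the whole 4-byte field must fit. */
    if ( offset >= sec_size || offset + sizeof(uint32_t) > sec_size )
        return -EINVAL;

    dest = section + offset;
    val = sym_value + (uint64_t)addend - (uint64_t)(uintptr_t)dest;
    stored = (int32_t)val;

    /* Reject values that do not survive truncation to 32 bits. */
    if ( (int64_t)val != stored )
        return -EOVERFLOW;

    /* memcpy avoids an unaligned store on strict-alignment hosts. */
    memcpy(dest, &stored, sizeof(stored));
    return 0;
}
```

Checking before storing means a rejected relocation leaves the section untouched, which matters little in the patch (the payload is freed on error) but keeps the helper side-effect free.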
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 73e50f0..65545c3 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -29,6 +30,13 @@ struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    const void *text_addr;               /* Virtual address of .text. */
+    size_t text_size;                    /* .. and its size. */
+    const void *rw_addr;                 /* Virtual address of .data. */
+    size_t rw_size;                      /* .. and its size (if any). */
+    const void *ro_addr;                 /* Virtual address of .rodata. */
+    size_t ro_size;                      /* .. and its size (if any). */
+    unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -83,19 +91,232 @@ static struct payload *find_payload(const char *name)
     return found;
 }
 
+/*
+ * Functions related to XEN_SYSCTL_XSPLICE_UPLOAD (see xsplice_upload), and
+ * freeing payload (XEN_SYSCTL_XSPLICE_ACTION:XSPLICE_ACTION_UNLOAD).
+ */
+
+static void free_payload_data(struct payload *payload)
+{
+    /* Set to zero until "move_payload". */
+    if ( !payload->pages )
+        return;
+
+    vfree((void *)payload->text_addr);
+
+    payload->pages = 0;
+}
+
+/*
+ * calc_section computes the size (taking into account section alignment).
+ *
+ * Furthermore, *offset is set to the section's offset from the start of its
+ * region (RX, RW or RO), based on the passed-in size. This is used in
+ * move_payload to figure out the destination location (load_addr).
+ */
+static void calc_section(const struct xsplice_elf_sec *sec, size_t *size,
+                         unsigned int *offset)
+{
+    const Elf_Shdr *s = sec->sec;
+    size_t align_size;
+
+    align_size = ROUNDUP(*size, s->sh_addralign);
+    *offset = align_size;
+    *size = s->sh_size + align_size;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    void *text_buf, *ro_buf, *rw_buf;
+    unsigned int i;
+    size_t size = 0;
+    unsigned int *offset;
+    int rc = 0;
+
+    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
+    if ( !offset )
+        return -ENOMEM;
+
+    /* Compute size of different regions. */
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
+        else if ( !elf->sec[i].sec->sh_flags ||
+                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
+                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
+            /*
+             * Do nothing. These are .rel.text, rel.*, .symtab, .strtab,
+             * and .shstrtab. For the non-relocate we allocate and copy these
+             * via other means - and the .rel we can ignore as we only use it
+             * once during loading.
+             */
+            offset[i] = UINT_MAX;
+        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
+                    elf->name, elf->sec[i].name);
+            rc = -EOPNOTSUPP;
+            goto out;
+        }
+        else /* Such as .comment, or .debug_str. */
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
+                    elf->name, elf->sec[i].name);
+            offset[i] = UINT_MAX;
+        }
+    }
+
+    /*
+     * Total of all three regions - RX, RW, and RO. We have to keep
+     * them in separate pages, so we PAGE_ALIGN the RX and RW regions
+     * so that each starts on its own page. The last one will by default
+     * fall on its own page.
+     */
+    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
+                      payload->ro_size;
+
+    size = PFN_UP(size); /* Nr of pages. */
+    text_buf = vmalloc_xen(size * PAGE_SIZE);
+    if ( !text_buf )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
+                elf->name);
+        rc = -ENOMEM;
+        goto out;
+    }
+    rw_buf = text_buf + PAGE_ALIGN(payload->text_size);
+    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
+
+    payload->pages = size;
+    payload->text_addr = text_buf;
+    payload->rw_addr = rw_buf;
+    payload->ro_addr = ro_buf;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            void *buf;
+
+            if ( elf->sec[i].sec->sh_flags & SHF_EXECINSTR )
+                buf = text_buf;
+            else if ( elf->sec[i].sec->sh_flags & SHF_WRITE )
+                buf = rw_buf;
+            else
+                buf = ro_buf;
+
+            ASSERT(offset[i] != UINT_MAX);
+
+            elf->sec[i].load_addr = buf + offset[i];
+
+            /* Don't copy NOBITS - such as BSS. */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Loaded %s at %p\n",
+                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+            else
+                memset(elf->sec[i].load_addr, 0, elf->sec[i].sec->sh_size);
+        }
+    }
+
+ out:
+    xfree(offset);
+
+    return rc;
+}
+
+static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    int rc;
+    unsigned int text_pages, rw_pages, ro_pages;
+
+    text_pages = PFN_UP(payload->text_size);
+    ASSERT(text_pages);
+
+    rc = arch_xsplice_secure(payload->text_addr, text_pages, XSPLICE_VA_RX);
+    if ( rc )
+        return rc;
+
+    rw_pages = PFN_UP(payload->rw_size);
+    if ( rw_pages )
+    {
+        rc = arch_xsplice_secure(payload->rw_addr, rw_pages, XSPLICE_VA_RW);
+        if ( rc )
+            return rc;
+    }
+
+    ro_pages = PFN_UP(payload->ro_size);
+    if ( ro_pages )
+        rc = arch_xsplice_secure(payload->ro_addr, ro_pages, XSPLICE_VA_RO);
+
+    ASSERT(ro_pages + rw_pages + text_pages == payload->pages);
+
+    return rc;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
+static int load_payload_data(struct payload *payload, void *raw, size_t len)
+{
+    struct xsplice_elf elf = { .name = payload->name, .len = len };
+    int rc = 0;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto out;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
+
+ out:
+    if ( rc )
+        free_payload_data(payload);
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 {
     struct payload *data, *found;
     char n[XEN_XSPLICE_NAME_SIZE];
+    void *raw_data;
     int rc;
 
     rc = verify_payload(upload, n);
@@ -103,6 +324,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         return rc;
 
     data = xzalloc(struct payload);
+    raw_data = vmalloc(upload->size);
 
     spin_lock(&payload_lock);
 
@@ -111,11 +333,18 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         rc = PTR_ERR(found);
     else if ( found )
         rc = -EEXIST;
-    else if ( !data )
+    else if ( !data || !raw_data )
         rc = -ENOMEM;
+    else if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
+        rc = -EFAULT;
     else
     {
         memcpy(data->name, n, strlen(n));
+
+        rc = load_payload_data(data, raw_data, upload->size);
+        if ( rc )
+            goto out;
+
         data->state = XSPLICE_STATE_CHECKED;
         INIT_LIST_HEAD(&data->list);
 
@@ -123,8 +352,12 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         payload_cnt++;
         payload_version++;
     }
+
+ out:
     spin_unlock(&payload_lock);
 
+    vfree(raw_data);
+
     if ( rc )
         xfree(data);
 
@@ -359,8 +592,9 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
+               data->name, state2str(data->state), data->state, data->text_addr,
+               data->rw_addr, data->ro_addr, data->pages);
 
     spin_unlock(&payload_lock);
 }
@@ -368,6 +602,8 @@ static void xsplice_printall(unsigned char key)
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+
+    arch_xsplice_init();
     return 0;
 }
 __initcall(xsplice_init);
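move_payload() sizes each region by walking its sections with calc_section(), then page-aligns RX and RW so every region starts on its own page and secure_payload() can give each one different permissions. A small sketch of that arithmetic, assuming 4 KiB pages; the SK_-prefixed macros are hypothetical stand-ins for Xen's ROUNDUP/PAGE_ALIGN/PFN_UP, named to avoid clashing with system headers.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SK_PAGE_SIZE      ((size_t)4096)
#define SK_ROUNDUP(x, a)  (((x) + (a) - 1) & ~((a) - 1))  /* a: power of two */
#define SK_PAGE_ALIGN(x)  SK_ROUNDUP((size_t)(x), SK_PAGE_SIZE)
#define SK_PFN_UP(x)      (((x) + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE)

/*
 * Mirrors calc_section(): align the running region size to the section's
 * alignment, record that as the section's offset, then grow the region.
 * ELF uses both 0 and 1 for "no alignment constraint"; 0 is mapped to 1
 * here because SK_ROUNDUP(x, 0) would evaluate to 0.
 */
static void calc_section(size_t sh_size, size_t sh_addralign,
                         size_t *size, size_t *offset)
{
    size_t align = sh_addralign ? sh_addralign : 1;
    size_t aligned = SK_ROUNDUP(*size, align);

    *offset = aligned;
    *size = aligned + sh_size;
}

/* RX and RW are page-aligned so RX, RW and RO each start on their own page. */
static size_t payload_pages(size_t text_size, size_t rw_size, size_t ro_size)
{
    return SK_PFN_UP(SK_PAGE_ALIGN(text_size) + SK_PAGE_ALIGN(rw_size) +
                     ro_size);
}
```

Because RX and RW are padded to whole pages, the result equals the sum of the per-region page counts, which is what the ASSERT at the end of secure_payload() relies on.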
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index 8136ab7..c5b7192 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -100,6 +100,7 @@ static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
 
             elf->symtab = &sec[i];
 
+            elf->symtab_idx = i;
             /*
              * elf->symtab->sec->sh_link would point to the right section
              * but we hadn't finished parsing all the sections.
@@ -250,9 +251,122 @@ static int elf_get_sym(struct xsplice_elf *elf, const void *data)
     return 0;
 }
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        unsigned int idx = elf->sym[i].sym->st_shndx;
+        const Elf_Sym *sym = elf->sym[i].sym;
+        unsigned long st_value = sym->st_value;
+
+        switch ( idx )
+        {
+        case SHN_COMMON:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -EINVAL;
+            break;
+
+        case SHN_UNDEF:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -ENOENT;
+            break;
+
+        case SHN_ABS:
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
+                    elf->name, elf->sym[i].name, sym->st_value);
+            break;
+
+        default:
+            /* SHN_COMMON and SHN_ABS are above. */
+            if ( idx >= SHN_LORESERVE )
+                rc = -EOPNOTSUPP;
+            else if ( idx >= elf->hdr->e_shnum )
+                rc = -EINVAL;
+
+            if ( rc )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Out of bounds symbol section %#x\n",
+                        elf->name, idx);
+                break;
+            }
+
+            /* Matches 'move_payload' which ignores such sections. */
+            if ( !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) )
+                break;
+
+            st_value += (unsigned long)elf->sec[idx].load_addr;
+            if ( elf->sym[i].name )
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Symbol resolved: %s => %#lx (%s)\n",
+                       elf->name, elf->sym[i].name,
+                       st_value, elf->sec[idx].name);
+        }
+
+        if ( rc )
+            break;
+
+        ((Elf_Sym *)sym)->st_value = st_value;
+    }
+
+    return rc;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *r, *base;
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        r = &elf->sec[i];
+
+        if ( (r->sec->sh_type != SHT_RELA) &&
+             (r->sec->sh_type != SHT_REL) )
+            continue;
+
+        /* Is it a valid relocation section? */
+        if ( r->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[r->sec->sh_info];
+
+        /* Don't relocate non-allocated sections. */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( r->sec->sh_link != elf->symtab_idx )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative link of %s is incorrect (%u, expected=%u)\n",
+                    elf->name, r->name, r->sec->sh_link, elf->symtab_idx);
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( r->sec->sh_type == SHT_RELA )
+            rc = arch_xsplice_perform_rela(elf, base, r);
+        else /* SHT_REL */
+            rc = arch_xsplice_perform_rel(elf, base, r);
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
 static int xsplice_header_check(const struct xsplice_elf *elf)
 {
     const Elf_Ehdr *hdr = elf->hdr;
+    int rc;
 
     if ( sizeof(*elf->hdr) > elf->len )
     {
@@ -279,6 +393,10 @@ static int xsplice_header_check(const struct xsplice_elf *elf)
         return -EOPNOTSUPP;
     }
 
+    rc = arch_xsplice_verify_elf(elf);
+    if ( rc )
+        return rc;
+
     if ( elf->hdr->e_shstrndx == SHN_UNDEF )
     {
         dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 85f35ed..2b9bd3f 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -472,6 +472,8 @@ typedef struct {
 #endif
 
 #if defined(ELFSIZE) && (ELFSIZE == 32)
+#define PRIxElfAddr	"08x"
+
 #define Elf_Ehdr	Elf32_Ehdr
 #define Elf_Phdr	Elf32_Phdr
 #define Elf_Shdr	Elf32_Shdr
@@ -497,6 +499,8 @@ typedef struct {
 
 #define AuxInfo		Aux32Info
 #elif defined(ELFSIZE) && (ELFSIZE == 64)
+#define PRIxElfAddr	PRIx64
+
 #define Elf_Ehdr	Elf64_Ehdr
 #define Elf_Phdr	Elf64_Phdr
 #define Elf_Shdr	Elf64_Shdr
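PRIxElfAddr lets a single dprintk() format string print Elf_Addr values for either ELFSIZE. A minimal user-space sketch of the pattern, assuming ELFSIZE is fixed to 64 and using a hypothetical `format_addr` helper in place of dprintk():

```c
#include <inttypes.h>
#include <stdio.h>

#define ELFSIZE 64   /* assumption: building the 64-bit variant */

#if defined(ELFSIZE) && (ELFSIZE == 32)
#define PRIxElfAddr "08x"
typedef uint32_t Elf_Addr;
#elif defined(ELFSIZE) && (ELFSIZE == 64)
#define PRIxElfAddr PRIx64
typedef uint64_t Elf_Addr;
#endif

/* Hypothetical helper: one format string that works for either width. */
static int format_addr(char *buf, size_t len, Elf_Addr addr)
{
    return snprintf(buf, len, "%#" PRIxElfAddr, addr);
}
```

With the 32-bit definition, `%#08x` counts the `0x` prefix in the field width, so 0x1000 would print as `0x001000`.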
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 7559877..857c264 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,6 +6,9 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
@@ -15,6 +18,27 @@ struct xen_sysctl_xsplice_op;
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
+/* Arch hooks. */
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela);
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela);
+enum va_type {
+    XSPLICE_VA_RX, /* .text */
+    XSPLICE_VA_RW, /* .data */
+    XSPLICE_VA_RO, /* .rodata */
+};
+
+/*
+ * Function to secure the allocated pages (from arch_xsplice_alloc_payload)
+ * with the right page permissions.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types);
+
+void arch_xsplice_init(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index 686aaf0..750dc94 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -15,6 +15,8 @@ struct xsplice_elf_sec {
                                             elf_resolve_section_names. */
     const void *data;                    /* Pointer to the section (done by
                                             elf_resolve_sections). */
+    void *load_addr;                     /* A pointer to the allocated destination.
+                                            Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -29,8 +31,10 @@ struct xsplice_elf {
     struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
     struct xsplice_elf_sym *sym;         /* Array of symbols , allocated by us. */
     unsigned int nsym;
-    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
-    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to
+                                            sec[symtab_idx]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section. */
+    unsigned int symtab_idx;
 };
 
 const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
@@ -38,6 +42,9 @@ const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *
 int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
 
 /*
-- 
2.4.3


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27 15:48         ` Konrad Rzeszutek Wilk
@ 2016-04-27 16:06           ` Jan Beulich
  2016-04-27 16:14           ` Jan Beulich
  1 sibling, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-27 16:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, andrew.cooper3, mpohlack,
	ross.lagerwall, Julien Grall, sasha.levin, xen-devel

>>> On 27.04.16 at 17:48, <konrad@kernel.org> wrote:
> Here is the inline patch:

At first I'll reply on just the particular issue in move_payload(); I'll
then go through the entire patch to see if anything else needs
commenting.

> +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    void *text_buf, *ro_buf, *rw_buf;
> +    unsigned int i;
> +    size_t size = 0;
> +    unsigned int *offset;
> +    int rc = 0;
> +
> +    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
> +    if ( !offset )
> +        return -ENOMEM;
> +
> +    /* Compute size of different regions. */
> +    for ( i = 1; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +             (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
> +        else if ( !elf->sec[i].sec->sh_flags ||
> +                  (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) ||
> +                  (elf->sec[i].sec->sh_flags & SHF_MASKPROC) )
> +            /*
> +             * Do nothing. These are .rel.text, rel.*, .symtab, .strtab,
> +             * and .shstrtab. For the non-relocate we allocate and copy these
> +             * via other means - and the .rel we can ignore as we only use it
> +             * once during loading.
> +             */
> +            offset[i] = UINT_MAX;
> +        else if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +                  (elf->sec[i].sec->sh_type == SHT_NOBITS) )
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            rc = -EOPNOTSUPP;
> +            goto out;
> +        }
> +        else /* Such as .comment, or .debug_str. */
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: Ignoring %s section!\n",
> +                    elf->name, elf->sec[i].name);
> +            offset[i] = UINT_MAX;
> +        }

I continue to not understand why SHT_NOBITS, SHF_MASKPROC, or
zero sh_flags need considering here at all. You really only care about
SHF_ALLOC sections here (as I think you confirmed on irc), so why
can't you just start this whole sequence with

    if ( !(elf->sec[i].sec->sh_flags & SHF_ALLOC) )
        <ignore-this-section>

then handle RX, RW, and RO just like you do now and finally have
an "else" covering unsupported SHF_ALLOC sections. Less code,
and easier to understand.
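One way to express the ordering suggested above in C, as a sketch only: it assumes a host `<elf.h>` (as on glibc or musl) and uses a hypothetical `classify_section()` and region enum instead of the patch's calc_section() calls. Non-SHF_ALLOC sections are ignored up front, then the SHF_EXECINSTR/SHF_WRITE bits pick RX, RW or RO, and anything left over (writable and executable) is unsupported.

```c
#include <elf.h>

/* Hypothetical region tags for this sketch. */
enum region {
    REGION_IGNORE,
    REGION_RX,
    REGION_RW,
    REGION_RO,
    REGION_UNSUPPORTED,
};

static enum region classify_section(const Elf64_Shdr *s)
{
    /* .symtab, .strtab, .rela.*, .comment, .debug_* and friends. */
    if ( !(s->sh_flags & SHF_ALLOC) )
        return REGION_IGNORE;

    switch ( s->sh_flags & (SHF_EXECINSTR | SHF_WRITE) )
    {
    case SHF_EXECINSTR:
        return REGION_RX;
    case SHF_WRITE:
        return REGION_RW;
    case 0:
        return REGION_RO;
    default:
        /* Both writable and executable. */
        return REGION_UNSUPPORTED;
    }
}
```

Flags such as SHF_MERGE or SHF_STRINGS on .rodata.str* sections do not affect the outcome, and .bss (SHF_ALLOC | SHF_WRITE, SHT_NOBITS) lands in RW, where the copy loop zero-fills it instead of copying.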

Jan


* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27 15:48         ` Konrad Rzeszutek Wilk
  2016-04-27 16:06           ` Jan Beulich
@ 2016-04-27 16:14           ` Jan Beulich
  2016-04-27 18:40             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Jan Beulich @ 2016-04-27 16:14 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Stefano Stabellini, Keir Fraser, andrew.cooper3, mpohlack,
	ross.lagerwall, Julien Grall, sasha.levin, xen-devel

>>> On 27.04.16 at 17:48, <konrad@kernel.org> wrote:
> +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
> +{
> +    unsigned int i;
> +    int rc = 0;
> +
> +    ASSERT(elf->sym);
> +
> +    for ( i = 1; i < elf->nsym; i++ )
> +    {
> +        unsigned int idx = elf->sym[i].sym->st_shndx;
> +        const Elf_Sym *sym = elf->sym[i].sym;
> +        unsigned long st_value = sym->st_value;

Better to use Elf_Addr here.
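A sketch of the per-symbol resolution with the arithmetic kept in the ELF address type, as suggested. It assumes a host `<elf.h>` providing Elf64_Sym and Elf64_Addr; `resolve_symbol()` and its `load_addr` array (0 for sections that were not loaded, which leaves the value unchanged) are hypothetical simplifications of xsplice_elf_resolve_symbols().

```c
#include <elf.h>
#include <errno.h>

/*
 * Hypothetical helper: resolve one symbol. load_addr[i] is where section i
 * was copied, or 0 if it was not allocated.
 */
static int resolve_symbol(const Elf64_Sym *sym, const Elf64_Addr *load_addr,
                          unsigned int shnum, Elf64_Addr *out)
{
    unsigned int idx = sym->st_shndx;

    switch ( idx )
    {
    case SHN_COMMON:
        return -EINVAL;
    case SHN_UNDEF:
        return -ENOENT;
    case SHN_ABS:
        /* Absolute symbols are used as-is. */
        *out = sym->st_value;
        return 0;
    default:
        /* SHN_COMMON and SHN_ABS are handled above. */
        if ( idx >= SHN_LORESERVE )
            return -EOPNOTSUPP;
        if ( idx >= shnum )
            return -EINVAL;
        /* Section-relative value plus where that section was loaded. */
        *out = sym->st_value + load_addr[idx];
        return 0;
    }
}
```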

Jan


* Re: [PATCH v9 24/27] xsplice: Stacking build-id dependency checking.
  2016-04-27  9:27   ` Jan Beulich
@ 2016-04-27 16:36     ` Konrad Rzeszutek Wilk
  2016-04-28  9:47       ` Jan Beulich
  0 siblings, 1 reply; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 16:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, mpohlack, ross.lagerwall,
	sasha.levin, xen-devel

On Wed, Apr 27, 2016 at 03:27:27AM -0600, Jan Beulich wrote:
> >>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
> > @@ -25,7 +28,7 @@ clean::
> >  .PHONY: config.h
> >  config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
> >  config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
> > -config.h: xen_hello_world_func.o
> > +config.h: xen_hello_world_func.o xen_bye_world_func.o
> 
> Why is that?

I need OLD_CODE_SZ in xen_bye_world.c. Ah, right, it should be
the other way around: xen_bye_world.o depends on config.h.

And this change can be removed.
> 
> > @@ -33,9 +36,43 @@ config.h: xen_hello_world_func.o
> >  xen_hello_world.o: xen_hello_world_func.o
> >  
> >  .PHONY: $(XSPLICE)
> > -$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
> > -	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o \
> > -		xen_hello_world.o
> > +$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o note.o
> > +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
> > +		xen_hello_world_func.o xen_hello_world.o note.o
> 
> Probably easier to read and maintain if you used $(filter %.o,$^)
> here?
> 
> > +xen_bye_world.o: xen_bye_world_func.o
> 
> Again - why?

Because xen_bye_world.o depends on xen_bye_world_func.o? Oh wait, they
only depend on each other during linking!

It should be:

xen_bye_world.o: config.h

(and also for xen_hello_world.o case)
> 
> > +.PHONY: $(XSPLICE_BYE)
> > +$(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_world_note.o
> 
> The object files depend on config.h, but the binary does only
> indirectly via the object files I would guess. (This, just like the
> question right above, would then apply to the $(XSPLICE) related
> rules too, in an earlier patch.)

xen_bye_world.c won't compile if config.h is not present.

I need to make sure that config.h gets created before xen_bye_world.o
gets built, and config.h generation itself depends on the existence
of xen_hello_world_func.o.

Is there a better way of expressing this dependency?
> 
> > +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
> > +		xen_bye_world_func.o xen_bye_world.o hello_world_note.o
> 
> Same as above - better use $^ (and if config.h goes away as a
> direct dependency, it looks like you don't even need $(filter ...)).

I have to have config.h as a dependency. If I run 'make -j1232131 tests'
without config.h as a dependency, things eventually break.

> 
> > +int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
> > +                       const void **p, unsigned int *len)
> > +{
> > +    /* Check if we really have a build-id. */
> > +    if ( NT_GNU_BUILD_ID != n->type )
> > +        return -ENODATA;
> > +
> > +    if ( n_sz <= sizeof(*n) )
> > +        return -EINVAL;
> > +
> > +    if ( n->namesz + n->descsz > UINT_MAX )
> 
> Afaict this is always false. I think you really want
> 
>     if ( n->namesz + n->descsz < n->namesz )
> 
> > +        return -EINVAL;
> > +
> > +    if ( n->namesz != 4 /* GNU\0 */)
> 
> < 4 would suffice here (and be more flexible if odd padding gets
> inserted by whatever generates the note)
> 
> > +        return -EINVAL;
> > +
> > +    if ( n->namesz + n->descsz + sizeof(*n) > n_sz )
> 
>     if ( n->namesz + n->descsz > n_sz - sizeof(*n) )
> 
> > @@ -98,18 +130,9 @@ static int __init xen_build_init(void)
> >      if ( &n[1] > __note_gnu_build_id_end )
> >          return -ENODATA;;
> >  
> > -    /* Check if we really have a build-id. */
> > -    if ( NT_GNU_BUILD_ID != n->type )
> > -        return -ENODATA;
> > +    sz = (size_t)__note_gnu_build_id_end - (size_t)n;
> 
> So let's hope sizeof(void *) == sizeof(size_t) (or else this would yield
> compiler warnings).

Hmm, so far no warnings on ARM32, ARM64, or x86.

But I will change the cast to (void *) so it is just pointer arithmetic.
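For reference, a user-space sketch of the note check with the review comments folded in: overflow is detected by comparing the sum against namesz, `namesz < 4` is rejected, and the size comparison is rearranged so it cannot underflow. It assumes a host `<elf.h>` (field names n_namesz/n_descsz/n_type rather than Xen's Elf_Note), a hypothetical `build_id_check()` name, and the usual 4-byte padding of the note name, which this variant also accounts for in the size check.

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>

static int build_id_check(const Elf64_Nhdr *n, unsigned int n_sz,
                          const void **p, unsigned int *len)
{
    size_t name_sz;

    /* Check if we really have a build-id. */
    if ( n->n_type != NT_GNU_BUILD_ID )
        return -ENODATA;

    /* Also guarantees n_sz - sizeof(*n) below cannot underflow. */
    if ( n_sz <= sizeof(*n) )
        return -EINVAL;

    /* 32-bit wrap-around makes the sum smaller than either operand. */
    if ( n->n_namesz + n->n_descsz < n->n_namesz )
        return -EINVAL;

    /* "GNU\0", possibly followed by padding. */
    if ( n->n_namesz < 4 )
        return -EINVAL;

    /* Assumption: the descriptor starts at the next 4-byte boundary. */
    name_sz = ((size_t)n->n_namesz + 3) & ~(size_t)3;
    if ( name_sz + n->n_descsz > n_sz - sizeof(*n) )
        return -EINVAL;

    *p = (const unsigned char *)(n + 1) + name_sz;
    *len = n->n_descsz;
    return 0;
}
```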

New patch:

From a0bb72ff1723a320fb02c158f63d94b2a811a238 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 27 Apr 2016 12:26:50 -0400
Subject: [PATCH] xsplice: Stacking build-id dependency checking.

We now expect that the ELF payloads be built with the
--build-id.

Also the .xsplice.deps section has to have the contents
of the hypervisor (or a preceding payload) build-id.

We already have the code to verify the Elf_Note build-id
so export parts of it.

This dependency means the hypervisor MUST be compiled with
--build-id - so we gate the build of xSplice on the availability
of said functionality.

This does not impact the order in which the payloads can
be loaded, but it does enforce a STRICT ordering when the
payloads are applied. REPLACE is also special: we need to
check its dependency against the hypervisor, not against
the last applied patch.

To make this easier to test we also add an extra test-case,
which can only be applied on top of the xen_hello_world
payload.

That is, one can apply xen_hello_world and then xen_bye_world
on top of it, but not the other way around.

We also print the dependency and the payloads' build-id in the
keyhandler.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v3: First time included.
v4: Andrew fix against the build_id.o mutilations.
    Andrew fix to not include extra symbols in binary.id
v5: s/ssize_t/unsigned int/
v6: s/an NT_GNU../a NT_GNU/
   - Squash "xsplice: Print dependency and payloads build_id in the keyhandler"
     in this patch.
   - Add in xen_build_id_check size of section for better checking.
v7: Added Andrew's reviewed-by.
    Change the .name in test-case to adhere to spec.
    Dropped NT_GNU_BUILD_ID and moved that to earlier patch
    (build_id: Provide ld-embedded build-ids)
    Amended spec and code to only have one of .xsplice.depends and
    .note.gnu.build-id
    Expanded comment about note.o and why we don't use arch/x86/note.o.bin
    Moved xen_build_id_check definition to xsplice.h from version.h
    (and dropping the #include's in version.h)
    Sort header files in tests.
v8:
 - Change two of the dprintk calls from XENLOG_DEBUG to XENLOG_ERR
v9:
 - Dropped the (unsigned long) casts since we use void.
 - Make the .xsplice_depends and .note.gnu_build_id be #defines.
 - Make the build section use $(XSPLICE_BYE)
 - Make the testcase include <public/sysctl.h>
 - Made comparisons on descsz and namesz a bit different (overflow
   checks, against value of 4, and against size)
v10 - inline patch in response to v9:
 - Use filter() to only link .o files.
 - Make dependency for xen_bye_world.o be config.h only
 - Make overflow check for n->namesz and n->descsz be proper
 - Check n->namesz against less than 4.
 - Change check against header of Elf_Note
 - Calculate size (in bytes) of Elf_Note using pointer arithmetic.
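The STRICT ordering described in the commit message can be pictured as a stack check. A minimal sketch, assuming a hypothetical `struct build_id` and `can_apply()` helper rather than the patch's actual data structures: a normal payload's .xsplice.depends must match the build-id of the payload currently on top (or the hypervisor's if none is applied), while a REPLACE is always checked against the hypervisor.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of a build-id blob. */
struct build_id {
    const unsigned char *p;
    unsigned int len;
};

static bool build_id_equal(const struct build_id *a, const struct build_id *b)
{
    return a->len == b->len && memcmp(a->p, b->p, a->len) == 0;
}

/*
 * 'top' is the build-id of the most recently applied payload, or NULL if
 * nothing is applied; 'xen' is the hypervisor's own build-id.
 */
static bool can_apply(const struct build_id *depends, bool replace,
                      const struct build_id *xen, const struct build_id *top)
{
    const struct build_id *want = (replace || !top) ? xen : top;

    return build_id_equal(depends, want);
}
```

Loading is unaffected: the check only runs when a payload is applied, which is why xen_bye_world can be uploaded at any time but applied only on top of xen_hello_world.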
---
 .gitignore                             |   1 +
 Config.mk                              |   1 +
 docs/misc/xsplice.markdown             |  99 ++++++++++++++++++----------
 xen/arch/x86/test/Makefile             |  46 +++++++++++--
 xen/arch/x86/test/xen_bye_world.c      |  34 ++++++++++
 xen/arch/x86/test/xen_bye_world_func.c |  22 +++++++
 xen/common/Kconfig                     |   6 +-
 xen/common/version.c                   |  45 +++++++++----
 xen/common/xsplice.c                   | 117 ++++++++++++++++++++++++++++++++-
 xen/include/xen/xsplice.h              |   4 ++
 10 files changed, 323 insertions(+), 52 deletions(-)
 create mode 100644 xen/arch/x86/test/xen_bye_world.c
 create mode 100644 xen/arch/x86/test/xen_bye_world_func.c

diff --git a/.gitignore b/.gitignore
index 4a81f43..88cec1d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -248,6 +248,7 @@ xen/arch/x86/efi/disabled
 xen/arch/x86/efi/mkreloc
 xen/arch/x86/test/config.h
 xen/arch/x86/test/xen_hello_world.xsplice
+xen/arch/x86/test/xen_bye_world.xsplice
 xen/arch/*/efi/boot.c
 xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
diff --git a/Config.mk b/Config.mk
index 41f8c44..614dc9e 100644
--- a/Config.mk
+++ b/Config.mk
@@ -134,6 +134,7 @@ ifeq ($(call ld-ver-build-id,$(LD)),n)
 build_id_linker :=
 else
 CFLAGS += -DBUILD_ID
+export XEN_HAS_BUILD_ID=y
 build_id_linker := --build-id=sha1
 endif
 
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 35ebc28..4a98be1 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -283,8 +283,17 @@ The xSplice core code loads the payload as a standard ELF binary, relocates it
 and handles the architecture-specifc sections as needed. This process is much
 like what the Linux kernel module loader does.
 
-The payload contains a section (xsplice_patch_func) with an array of structures
-describing the functions to be patched:
+The payload contains at least three sections:
+
+ * `.xsplice.funcs` - which is an array of xsplice_patch_func structures.
+ * `.xsplice.depends` - which is an ELF Note that describes what the payload
+    depends on. **MUST** have one.
+ * `.note.gnu.build-id` - the build-id of this payload. **MUST** have one.
+
+### .xsplice.funcs
+
+The `.xsplice.funcs` contains an array of xsplice_patch_func structures
+which describe the functions to be patched:
 
 <pre>
 struct xsplice_patch_func {  
@@ -368,6 +377,23 @@ struct xsplice_patch_func xsplice_hello_world = {
 
 Code must be compiled with -fPIC.
 
+### .xsplice.depends and .note.gnu.build-id
+
+To support dependency checking and safe loading (loading the
+appropriate payload against the right hypervisor) the payload must
+embed a build-id dependency.
+
+This is done by the payload containing a section `.xsplice.depends`
+which follows the format of an ELF Note. The contents of this note
+(name and description) are specific to the linker used to
+build the hypervisor and payload.
+
+If the GNU linker is used then the name is `GNU` and the description
+is an NT_GNU_BUILD_ID type ID. The description can be a SHA1
+checksum, an MD5 checksum or any other unique value.
+
+The size of the description varies with the --build-id linker option (20 bytes for sha1, 16 bytes for md5).
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
@@ -853,6 +879,42 @@ This is implemented in the Xen Project hypervisor.
 
 Only the privileged domain should be allowed to do this operation.
 
+### xSplice interdependencies
+
+Interdependencies between xSplice patches are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or,
+   if the old code also contains previously patched code, even if they
+   overlap.
+
+The disadvantage of the single large patch is that it grows over
+time and does not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+The second option, which requires a build-id of the hypervisor,
+is implemented in the Xen Project hypervisor.
+
+Specifically each payload has two build-id ELF notes:
+ * The build-id of the payload itself (generated via --build-id).
+ * The build-id of the payload it depends on (extracted from
+   the previous payload or hypervisor during build time).
+
+This means that the very first payload depends on the hypervisor
+build-id.
 
 # Not Yet Done
 
@@ -872,13 +934,6 @@ The implementation must also have a mechanism for (in no particular order):
  * NOP out the code sequence if `new_size` is zero.
  * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64]
    in payload file.
- * An dependency mechanism for the payloads. To use that information to load:
-    - The appropiate payload. To verify that payload is built against the
-      hypervisor. This can be done via the `build-id`
-      or via providing an copy of the old code - so that the hypervisor can
-       verify it against the code in memory.
-    - To construct an appropiate order of payloads to load in case they
-      depend on each other.
 
 ### Handle inlined __LINE__
 
@@ -943,32 +998,6 @@ the function itself.
 Similar considerations are true to a lesser extent for __FILE__, but it
 could be argued that file renaming should be done outside of hotpatches.
 
-### xSplice interdependencies
-
-xSplice patches interdependencies are tricky.
-
-There are the ways this can be addressed:
- * A single large patch that subsumes and replaces all previous ones.
-   Over the life-time of patching the hypervisor this large patch
-   grows to accumulate all the code changes.
- * Hotpatch stack - where an mechanism exists that loads the hotpatches
-   in the same order they were built in. We would need an build-id
-   of the hypevisor to make sure the hot-patches are build against the
-   correct build.
- * Payload containing the old code to check against that. That allows
-   the hotpatches to be loaded indepedently (if they don't overlap) - or
-   if the old code also containst previously patched code - even if they
-   overlap.
-
-The disadvantage of the first large patch is that it can grow over
-time and not provide an bisection mechanism to identify faulty patches.
-
-The hot-patch stack puts stricts requirements on the order of the patches
-being loaded and requires an hypervisor build-id to match against.
-
-The old code allows much more flexibility and an additional guard,
-but is more complex to implement.
-
 ## Signature checking requirements.
 
 The signature checking requires that the layout of the data in memory
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index 364408d..349c603 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -6,17 +6,20 @@ CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
 .PHONY: default
 
 XSPLICE := xen_hello_world.xsplice
+XSPLICE_BYE := xen_bye_world.xsplice
 
 default: xsplice
 
 install: xsplice
 	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	$(INSTALL_DATA) $(XSPLICE_BYE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 uninstall:
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 
 .PHONY: clean
 clean::
-	rm -f *.o .*.o.d $(XSPLICE) config.h
+	rm -f *.o .*.o.d $(XSPLICE) $(XSPLICE_BYE) config.h *.bin
 
 #
 # To compute these values we need the binary files: xen-syms
@@ -33,8 +36,43 @@ config.h: xen_hello_world_func.o
 xen_hello_world.o: config.h
 
 .PHONY: $(XSPLICE)
-$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
-	$(LD) $(LDFLAGS) -r -o $(XSPLICE) $(filter %.o,$^)
+$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o note.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
+		$(filter %.o,$^)
+#
+# This target is only accessible if CONFIG_XSPLICE is defined, which
+# depends on $(build_id_linker) being available. Hence we do not
+# need any checks.
+#
+# N.B. The reason we don't use arch/x86/note.o is that it may
+# not be built (it is only built for EFI builds), and that we do not have
+# the note.o.bin to muck with (as it gets deleted)
+#
+.PHONY: note.o
+note.o:
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
+#
+# Extract the build-id of the xen_hello_world.xsplice
+# (which xen_bye_world will depend on).
+#
+.PHONY: hello_world_note.o
+hello_world_note.o: $(XSPLICE)
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id $(XSPLICE) $@.bin
+	$(OBJCOPY)  -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
+xen_bye_world.o: config.h
+
+.PHONY: $(XSPLICE_BYE)
+$(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_world_note.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
+		$(filter %.o,$^)
+
 
 .PHONY: xsplice
-xsplice: $(XSPLICE)
+xsplice: $(XSPLICE) $(XSPLICE_BYE)
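For illustration only: the test Makefile above builds a two-level stack (xen_bye_world.xsplice depends on the build-id of xen_hello_world.xsplice, which depends on the build-id of xen-syms). The user-space sketch below models that chain. `struct demo_payload` and `order_payloads()` are hypothetical and not part of the patch, and fixed 20-byte SHA1 build-ids are assumed (matching `--build-id=sha1` in Config.mk). It recovers a load order by following the chain from the hypervisor build-id. The hypervisor itself never reorders payloads; it only rejects ones applied out of order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: 20-byte SHA1 build-ids (--build-id=sha1). */
#define BUILD_ID_LEN 20

struct demo_payload {
    const char *name;
    unsigned char id[BUILD_ID_LEN];   /* .note.gnu.build-id descriptor */
    unsigned char dep[BUILD_ID_LEN];  /* .xsplice.depends descriptor */
};

/* Return 1 if index 'i' is already in the first 'placed' slots of 'order'. */
static int already_placed(const size_t *order, size_t placed, size_t i)
{
    size_t k;

    for ( k = 0; k < placed; k++ )
        if ( order[k] == i )
            return 1;

    return 0;
}

/*
 * Fill 'order' with indices into 'p' so that the first payload depends on
 * 'hv_id' and every later payload depends on the one before it.
 * Returns how many payloads could be chained; less than 'n' means the set
 * does not form a single stack on top of this hypervisor.
 */
static size_t order_payloads(const struct demo_payload *p, size_t n,
                             const unsigned char *hv_id, size_t *order)
{
    const unsigned char *want = hv_id;  /* build-id the next payload must depend on */
    size_t placed = 0;

    while ( placed < n )
    {
        size_t i;

        for ( i = 0; i < n; i++ )
            if ( !already_placed(order, placed, i) &&
                 !memcmp(p[i].dep, want, BUILD_ID_LEN) )
                break;

        if ( i == n )
            break;                      /* nothing was built against 'want' */

        order[placed++] = i;
        want = p[i].id;                 /* the next one must depend on this one */
    }

    return placed;
}
```

For the test payloads, passing { bye, hello } with xen-syms' build-id yields the order hello, bye. A payload whose dependency matches nothing stops the chain early.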
diff --git a/xen/arch/x86/test/xen_bye_world.c b/xen/arch/x86/test/xen_bye_world.c
new file mode 100644
index 0000000..f93f969
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include "config.h"
+#include <xen/lib.h>
+#include <xen/types.h>
+#include <xen/version.h>
+#include <xen/xsplice.h>
+
+#include <public/sysctl.h>
+
+static char bye_world_patch_this_fnc[] = "xen_extra_version";
+extern const char *xen_bye_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_bye_world = {
+    .version = XSPLICE_PAYLOAD_VERSION,
+    .name = bye_world_patch_this_fnc,
+    .new_addr = xen_bye_world,
+    .old_addr = xen_extra_version,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_bye_world_func.c b/xen/arch/x86/test/xen_bye_world_func.c
new file mode 100644
index 0000000..32ef341
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world_func.c
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/types.h>
+
+/* Our replacement for xen_extra_version, stacked on top of xen_hello_world. */
+const char *xen_bye_world(void)
+{
+    return "Bye World!";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index e4f86c2..91ea904 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -60,6 +60,10 @@ config HAS_GDBSX
 config HAS_IOPORTS
 	bool
 
+config HAS_BUILD_ID
+	string
+	option env="XEN_HAS_BUILD_ID"
+
 # Enable/Disable kexec support
 config KEXEC
 	bool "kexec support"
@@ -192,7 +196,7 @@ endmenu
 config XSPLICE
 	bool "xSplice live patching support (TECH PREVIEW)"
 	default n
-	depends on X86
+	depends on X86 && HAS_BUILD_ID = "y"
 	---help---
 	  Allows a running Xen hypervisor to be dynamically patched using
 	  binary patches without rebooting. This is primarily used to binarily
diff --git a/xen/common/version.c b/xen/common/version.c
index 30578a6..0f96849 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -86,9 +86,41 @@ int xen_build_id(const void **p, unsigned int *len)
 /* Defined in linker script. */
 extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
 
+int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
+                       const void **p, unsigned int *len)
+{
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    if ( n_sz <= sizeof(*n) )
+        return -EINVAL;
+
+    if ( n->namesz + n->descsz < n->namesz )
+        return -EINVAL;
+
+    if ( n->namesz < 4 /* GNU\0 */)
+        return -EINVAL;
+
+    if ( n->namesz + n->descsz > n_sz - sizeof(*n) )
+        return -EINVAL;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    if ( len )
+        *len = n->descsz;
+    if ( p )
+        *p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+
 static int __init xen_build_init(void)
 {
     const Elf_Note *n = __note_gnu_build_id_start;
+    unsigned int sz;
 
     /* --build-id invoked with wrong parameters. */
     if ( __note_gnu_build_id_end <= &n[0] )
@@ -98,18 +130,9 @@ static int __init xen_build_init(void)
     if ( &n[1] > __note_gnu_build_id_end )
         return -ENODATA;;
 
-    /* Check if we really have a build-id. */
-    if ( NT_GNU_BUILD_ID != n->type )
-        return -ENODATA;
+    sz = (void *)__note_gnu_build_id_end - (void *)n;
 
-    /* Sanity check, name should be "GNU" for ld-generated build-id. */
-    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
-        return -ENODATA;
-
-    build_id_len = n->descsz;
-    build_id_p = ELFNOTE_DESC(n);
-
-    return 0;
+    return xen_build_id_check(n, sz, &build_id_p, &build_id_len);
 }
 __initcall(xen_build_init);
 #endif
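To exercise the same checks outside the hypervisor, here is a user-space sketch of what xen_build_id_check() validates, written against glibc's <elf.h> (`Elf64_Nhdr`, `NT_GNU_BUILD_ID`). The name `build_id_note_check()` is hypothetical. Unlike the patch, it checks the blob size before reading the note type. It also computes the padded name length in size_t so the bounds check cannot wrap. It assumes the 4-byte name padding GNU tools use for notes; for the usual "GNU\0" name this makes no difference.

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte padding used for GNU note names (size_t-safe). */
#define NOTE_ALIGN4(x) (((x) + 3) & ~(size_t)3)

/*
 * Validate a raw note blob of 'sz' bytes (e.g. the contents of
 * .note.gnu.build-id). On success return 0 and point *desc / *len at the
 * build-id bytes; otherwise return -EINVAL or -ENODATA like the patch does.
 */
static int build_id_note_check(const void *blob, size_t sz,
                               const unsigned char **desc, unsigned int *len)
{
    const Elf64_Nhdr *n = blob;
    const char *name;
    size_t name_len;

    /* Need the header plus some payload before touching any field. */
    if ( sz <= sizeof(*n) )
        return -EINVAL;

    if ( n->n_type != NT_GNU_BUILD_ID )
        return -ENODATA;

    /* The name must at least hold "GNU\0". */
    if ( n->n_namesz < 4 )
        return -EINVAL;

    /* Padded name plus descriptor must fit after the header. */
    name_len = NOTE_ALIGN4((size_t)n->n_namesz);
    if ( name_len + (size_t)n->n_descsz > sz - sizeof(*n) )
        return -EINVAL;

    name = (const char *)(n + 1);
    if ( memcmp(name, "GNU", 4) != 0 )
        return -ENODATA;

    *desc = (const unsigned char *)name + name_len;
    *len = n->n_descsz;
    return 0;
}
```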
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 8243348..4e5c549 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -4,6 +4,7 @@
  */
 
 #include <xen/cpu.h>
+#include <xen/elf.h>
 #include <xen/err.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
@@ -42,6 +43,12 @@ static LIST_HEAD(applied_list);
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
+/* Pointer to, and length of, an ELF Note descriptor (the build-id bytes). */
+struct xsplice_build_id {
+   const void *p;
+   unsigned int len;
+};
+
 struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
@@ -61,6 +68,8 @@ struct payload {
     struct virtual_region region;        /* symbol, bug.frame patching and
                                             exception table (x86). */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
+    struct xsplice_build_id id;          /* ELFNOTE_DESC(.note.gnu.build-id) of the payload. */
+    struct xsplice_build_id dep;         /* ELFNOTE_DESC(.xsplice.depends). */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -419,7 +428,9 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
 static int check_special_sections(const struct xsplice_elf *elf)
 {
     unsigned int i;
-    static const char *const names[] = { ELF_XSPLICE_FUNC };
+    static const char *const names[] = { ELF_XSPLICE_FUNC,
+                                         ELF_XSPLICE_DEPENDS,
+                                         ELF_BUILD_ID_NOTE};
     DECLARE_BITMAP(found, ARRAY_SIZE(names)) = { 0 };
 
     for ( i = 0; i < ARRAY_SIZE(names); i++ )
@@ -459,6 +470,7 @@ static int prepare_payload(struct payload *payload,
     unsigned int i;
     struct xsplice_patch_func *f;
     struct virtual_region *region;
+    const Elf_Note *n;
 
     sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_FUNC);
     ASSERT(sec);
@@ -515,6 +527,37 @@ static int prepare_payload(struct payload *payload,
         }
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ELF_BUILD_ID_NOTE);
+    if ( sec )
+    {
+        n = sec->load_addr;
+
+        if ( sec->sec->sh_size <= sizeof(*n) )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, sec->sec->sh_size,
+                                &payload->id.p, &payload->id.len) )
+            return -EINVAL;
+
+        if ( !payload->id.len || !payload->id.p )
+            return -EINVAL;
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ELF_XSPLICE_DEPENDS);
+    {
+        n = sec->load_addr;
+
+        if ( sec->sec->sh_size <= sizeof(*n) )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, sec->sec->sh_size,
+                                &payload->dep.p, &payload->dep.len) )
+            return -EINVAL;
+
+        if ( !payload->dep.len || !payload->dep.p )
+            return -EINVAL;
+    }
+
     /* Setup the virtual region with proper data. */
     region = &payload->region;
 
@@ -1244,6 +1287,55 @@ void check_for_xsplice_work(void)
     }
 }
 
+/*
+ * Only allow a dependent payload to be applied on top of the correct
+ * build-id.
+ *
+ * This enforces a stacking order - the first payload MUST be against the
+ * hypervisor. The second against the first payload, and so on.
+ *
+ * Unless the 'internal' parameter is used - in which case we only
+ * check against the hypervisor.
+ */
+static int build_id_dep(struct payload *payload, bool_t internal)
+{
+    const void *id = NULL;
+    unsigned int len = 0;
+    int rc;
+    const char *name = "hypervisor";
+
+    ASSERT(payload->dep.len && payload->dep.p);
+
+    /* Nothing applied yet (or REPLACE): check against the hypervisor. */
+    if ( internal )
+    {
+        rc = xen_build_id(&id, &len);
+        if ( rc )
+            return rc;
+    }
+    else
+    {
+        /* We should be against the last applied one. */
+        const struct payload *data;
+
+        data = list_last_entry(&applied_list, struct payload, applied_list);
+
+        id = data->id.p;
+        len = data->id.len;
+        name = data->name;
+    }
+
+    if ( payload->dep.len != len ||
+         memcmp(id, payload->dep.p, len) )
+    {
+        dprintk(XENLOG_ERR, "%s%s: check against %s build-id failed!\n",
+                XSPLICE, payload->name, name);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -1283,6 +1375,18 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
+            const struct payload *p;
+
+            p = list_last_entry(&applied_list, struct payload, applied_list);
+            ASSERT(p);
+            /* We should be the last applied one. */
+            if ( p != data )
+            {
+                dprintk(XENLOG_ERR, "%s%s: can't unload. Top is %s!\n",
+                        XSPLICE, data->name, p->name);
+                rc = -EBUSY;
+                break;
+            }
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1291,6 +1395,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_APPLY:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
+            rc = build_id_dep(data, !!list_empty(&applied_list));
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1299,6 +1406,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
+            rc = build_id_dep(data, 1 /* against hypervisor. */);
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1403,6 +1513,11 @@ static void xsplice_printall(unsigned char key)
                 }
             }
         }
+        if ( data->id.len )
+            printk("build-id=%*phN\n", data->id.len, data->id.p);
+
+        if ( data->dep.len )
+            printk("depend-on=%*phN\n", data->dep.len, data->dep.p);
     }
 
     spin_unlock(&payload_lock);
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 849f58c..5ffb69d 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -28,6 +28,8 @@ struct xen_sysctl_xsplice_op;
 #define XSPLICE             "xsplice: "
 /* ELF payload special section names. */
 #define ELF_XSPLICE_FUNC    ".xsplice.funcs"
+#define ELF_XSPLICE_DEPENDS ".xsplice.depends"
+#define ELF_BUILD_ID_NOTE   ".note.gnu.build-id"
 
 struct xsplice_symbol {
     const char *name;
@@ -40,6 +42,8 @@ int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
 unsigned long xsplice_symbols_lookup_by_name(const char *symname);
 bool_t is_patch(const void *addr);
+int xen_build_id_check(const Elf_Note *n, unsigned int n_sz,
+                       const void **p, unsigned int *len);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
-- 
2.4.3
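The APPLY and REVERT rules added to xsplice_action() can be modelled in user space. A new payload must depend on the most recently applied payload, or on the hypervisor when none is applied. Only the top payload may be reverted. The sketch below is hypothetical: `struct demo_stack`, `demo_apply()` and `demo_revert()` are illustrative names, build-ids are assumed to be fixed 20-byte values, and REPLACE (which always checks against the hypervisor) is not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size model; real build-ids have variable length. */
#define ID_LEN      20
#define MAX_APPLIED 8

struct demo_stack {
    unsigned char hv_id[ID_LEN];                /* hypervisor build-id */
    const unsigned char *applied[MAX_APPLIED];  /* applied build-ids, top last */
    size_t nr;
};

/* APPLY: 'dep' must match the top of the stack, or the hypervisor if empty. */
static int demo_apply(struct demo_stack *s, const unsigned char *id,
                      const unsigned char *dep)
{
    const unsigned char *want = s->nr ? s->applied[s->nr - 1] : s->hv_id;

    if ( memcmp(dep, want, ID_LEN) )
        return -EINVAL;
    if ( s->nr == MAX_APPLIED )
        return -ENOSPC;

    s->applied[s->nr++] = id;
    return 0;
}

/* REVERT: only the most recently applied payload may be removed. */
static int demo_revert(struct demo_stack *s, const unsigned char *id)
{
    if ( !s->nr || s->applied[s->nr - 1] != id )
        return -EBUSY;

    s->nr--;
    return 0;
}
```

As in the patch, revert compares payload identity (the pointer) rather than contents, mirroring the `p != data` check.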


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v9 11/27] xsplice: Implement payload loading
  2016-04-27 16:14           ` Jan Beulich
@ 2016-04-27 18:40             ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 18:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Keir Fraser, ross.lagerwall, andrew.cooper3,
	mpohlack, Julien Grall, sasha.levin, xen-devel

On Wed, Apr 27, 2016 at 10:14:20AM -0600, Jan Beulich wrote:
> >>> On 27.04.16 at 17:48, <konrad@kernel.org> wrote:
> > +int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
> > +{
> > +    unsigned int i;
> > +    int rc = 0;
> > +
> > +    ASSERT(elf->sym);
> > +
> > +    for ( i = 1; i < elf->nsym; i++ )
> > +    {
> > +        unsigned int idx = elf->sym[i].sym->st_shndx;
> > +        const Elf_Sym *sym = elf->sym[i].sym;
> > +        unsigned long st_value = sym->st_value;
> 
> Better to use Elf_Addr here.

/me nods.

I hadn't received any other emails so hopefully that was the only
remaining issue. Here is the updated patch:


From 741c77f2dd0d9677ed44a20c9c43aa06aed5d883 Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Wed, 27 Apr 2016 09:01:51 -0400
Subject: [PATCH] xsplice: Implement payload loading

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the ELF file.
- Parse the ELF file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region. Split them into three
  regions: .text, .data, and .rodata. There MUST be at least .text.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that the patch titled "xsplice,symbols: Implement symbol name resolution
   on address" implements that.)
- Perform relocations.
- Secure the regions (.text, .data, .rodata) with proper permissions.

We capitalize on the vmalloc callback API (see patch titled:
"arm/x86/vmap: Add v[z|m]alloc_xen, and vm_init_type") to allocate
a region of memory within [xen_virt_end, XEN_VIRT_END] for the code.

We also use the "x86/mm: Introduce modify_xen_mappings()"
to change the virtual address page-table permissions.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
    - Add the _return macro back - but only use it during debug builds.
    - Remove the macro, prefix arch_ on arch specific calls.
v4:
    - Move alloc_payload to arch specific file.
    - Use void* instead of uint8_t, use const
    - Add copyrights
    - Unroll the vmap code to add ASSERT. Change while to not incur
      potential long error loop
   - Use vmalloc/vfree cb APIs
   - Secure .text pages to be RX instead of RWX.
v5:
  - Fix allocation of virtual addresses only allowing one page to be allocated.
  - Create .text, .data, and .rodata regions with different permissions.
  - Make the find_space_t not a typedef to pointer to a function.
  - Allocate memory in here.
v6: Drop parentheses on typedefs.
  - s/an xSplice/a xSplice/
  - Rebase on "vmap: Add vmalloc_cb"
  - Rebase on "vmap: Add vmalloc_type and vm_init_type"
  - s/uint8_t/void/ on load_addr
  - Set xsplice_elf on stack without using memset.
v7:
  - Changed the check on delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
    The sections can be right at the back of the file (different linker!), so the failing conditional
    for 'if (delta >= elf->len)' is incorrect and should have been '>'.
  - Changed dprintk(XENLOG_DEBUG to XENLOG_ERR, then back to DEBUG. Converted
    some of the printk to dprintk.
  - Rebase on " arm/x86/vmap: Add vmalloc_xen, vfree_xen and vm_init_type"
  - Changed some of the printk XENLOG_ERR to XENLOG_DEBUG
  - Check the idx in the relocation to make sure it is within bounds and
    implemented.
  - Use "x86/mm: Introduce modify_xen_mappings()"
  - Introduce PRIxElfAddr
  - Check for overflow in R_X86_64_PC32
  - Return -EOPNOTSUPP if we don't support types in ELF64_R_TYPE
v8:
  - Change dprintk and printk XENLOG_DEBUG to XENLOG_ERR
  - Convert four of the printks into dprintk.
v9:
  - Rebase on different spinlock usage in xsplice_upload.
  - Do proper bound and overflow checking.
  - Added 'const' on [text,ro,rw]_addr.
  - Made 'calc_section' and 'move_payload' use a dynamically
    allocated array for computed offsets instead of modifying sh_entsize.
  - Remove arch_xsplice_[alloc_payload|free] and use vzalloc_xen and
    vfree.
  - Collapse for loop in move_payload.
  - Move xsplice.o in Makefile
  - Add more checks in arch_xsplice_perform_rela (r_offset and
     sh_size % sh_entsize)
  - Use int32_t and int64_t in arch_xsplice_perform_rela.
  - Tighten the list of sh_flags we check
  - Use intermediate on 'buf' so that we can do 'const void *'
  - Use intermediate in xsplice_elf_resolve_symbols for 'const' of elf->sym.
  - Fail if (and only) SHF_ALLOC and SHT_NOBITS section is seen.
v10:
   - Dropped Andrew's Reviewed-by
   - Expand arch_xsplice_verify_elf to check EI_CLASS and EI_ABIVERSION
   - In arch_xsplice_perform_rela drop check against !rela->sec->sh_entsize,
     add extra checks against r_offset + sizeof(type), necessitating
     an extra goto statement.
   - Make arch_xsplice_init be __init.
   - In free_payload_data check against ->pages instead of ->text_addr.
   - In move_payload use 'void *' instead of 'uint8_t *', use xmalloc_array
     for offset, expand on the 'Do Nothing' comment and the 'Ignoring';
     Use vmalloc instead of vzalloc - which means for .bss we also use
     memset; drop the unary + when calculating address for rw_buf;
     Fix indentation (I hope? I don't see an issue); also use offset[i] = UINT_MAX
     for sections we are not going to allocate or memcpy - and assert if
     we do hit those.
   - In xsplice_elf_resolve_symbols move check against
     !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) back to what it was
     in v8.
   - In xsplice_elf_perform_relocs drop comment about first ELF
     symbol.
   - Different if/else if in upload function.
v10.1 - patch posted as inline reply within v9 patchset:
   - Add check for EI_DATA and ditch EI_ABIVERSION out of platform specific checks.
   - In elf_resolve_symbols use const ElfSym and temporary variable (st_value)
     which is used to set value in const (and only in one place will we
     unconst the sym); change %#"PRIx.." to %#x or %#lx as we don't use Elf
     types anymore.
   - Make the r_offset check be within switch statements to guard against NONE
     relocations.
   - Fix the 'else if' spacing issue;
   - Use different message when idx in symbols is out of bounds.
v10.2 - patch posted as inline reply within the v10.1 above.
   - In 'move_payload' simplify loop calculating size.
   - In xsplice_elf_resolve_symbols use Elf_Addr instead of unsigned long.
     Also use %#"PRIx" to work on ARM32
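The v10 and v10.1 items above extend arch_xsplice_verify_elf() to check EI_CLASS and EI_DATA alongside e_machine. The user-space sketch below shows those checks using glibc's <elf.h>. `demo_verify_elf()` is a hypothetical name. The ELF magic and ET_REL checks are an assumption: payloads are linked with `ld -r`, but the common code that checks the magic and type is not shown in this mail.

```c
#include <elf.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Reject anything that is not a little-endian, 64-bit, x86-64
 * relocatable ELF object.
 */
static int demo_verify_elf(const void *buf, size_t len)
{
    const Elf64_Ehdr *hdr = buf;

    /* Header must be present and start with "\177ELF". */
    if ( len < sizeof(*hdr) || memcmp(hdr->e_ident, ELFMAG, SELFMAG) )
        return -EINVAL;

    /* The arch-specific part: class, byte order and machine. */
    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
         hdr->e_machine != EM_X86_64 )
        return -EOPNOTSUPP;

    /* Payloads are produced with 'ld -r', i.e. relocatable objects. */
    if ( hdr->e_type != ET_REL )
        return -EINVAL;

    return 0;
}
```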
---
 xen/arch/arm/Makefile         |   1 +
 xen/arch/arm/xsplice.c        |  46 +++++++++
 xen/arch/x86/Makefile         |   1 +
 xen/arch/x86/xsplice.c        | 173 +++++++++++++++++++++++++++++++
 xen/common/xsplice.c          | 233 +++++++++++++++++++++++++++++++++++++++++-
 xen/common/xsplice_elf.c      | 118 +++++++++++++++++++++
 xen/include/xen/elfstructs.h  |   4 +
 xen/include/xen/xsplice.h     |  24 +++++
 xen/include/xen/xsplice_elf.h |  11 +-
 9 files changed, 606 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c
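The "Perform relocations" step, together with the changelog items about r_offset bounds and overflow in R_X86_64_PC32, comes down to the arithmetic below. This is a user-space illustration, not the patch's arch_xsplice_perform_rela(). `demo_apply_pc32()` and its parameters are hypothetical, it handles a single relocation, it uses the x86-64 psABI definition of R_X86_64_PC32 (S + A - P), and its error codes are the sketch's own choice.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Apply one R_X86_64_PC32-style fixup: store S + A - P as a signed 32-bit
 * value at sec[r_offset]. 'sec_va' is the address the section runs at, so
 * P = sec_va + r_offset. Converting out-of-range values to signed types is
 * implementation-defined; GCC wraps them modulo 2^N, which is relied on here.
 */
static int demo_apply_pc32(unsigned char *sec, uint64_t sec_size,
                           uint64_t sec_va, uint64_t r_offset,
                           uint64_t sym_value, int64_t addend)
{
    uint64_t val;
    int32_t out;

    /* Bounds check written as a subtraction so it cannot wrap. */
    if ( r_offset >= sec_size || sizeof(out) > sec_size - r_offset )
        return -EINVAL;

    val = sym_value + (uint64_t)addend - (sec_va + r_offset);
    out = (int32_t)val;

    /* Same overflow test as the patch: does the value survive truncation? */
    if ( (int64_t)val != out )
        return -EOVERFLOW;

    memcpy(sec + r_offset, &out, sizeof(out));  /* avoids an unaligned store */
    return 0;
}
```

For example, a call site at section address 0x1004 referring to a symbol at 0x2000 with addend -4 stores 0xff8.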

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 0328b50..eae5cb3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,6 +40,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..8cb7767
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,46 @@
+/*
+ *  Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    return -ENOSYS;
+}
+
+void __init arch_xsplice_init(void)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 729065b..f74fd2c 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -61,6 +61,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..82618f7
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf)
+{
+
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( hdr->e_machine != EM_X86_64 ||
+         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported ELF Machine type!\n",
+                elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    dprintk(XENLOG_ERR, XSPLICE "%s: SHT_REL relocation unsupported\n",
+            elf->name);
+    return -EOPNOTSUPP;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    const Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    /* Nothing to do. */
+    if ( !rela->sec->sh_size )
+        return 0;
+
+    if ( rela->sec->sh_entsize < sizeof(Elf_RelA) ||
+         rela->sec->sh_size % rela->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section relative header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = rela->data + i * rela->sec->sh_entsize;
+
+        symndx = ELF64_R_SYM(r->r_info);
+
+        if ( symndx >= elf->nsym )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation wants symbol@%u which is past end!\n",
+                    elf->name, symndx);
+            return -EINVAL;
+        }
+
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+        case R_X86_64_NONE:
+            break;
+
+        case R_X86_64_64:
+            if ( r->r_offset >= base->sec->sh_size ||
+                (r->r_offset + sizeof(uint64_t)) > base->sec->sh_size )
+                goto bad_offset;
+
+            *(uint64_t *)dest = val;
+            break;
+
+        case R_X86_64_PLT32:
+            /*
+             * Xen uses -fpic which normally uses PLT relocations
+             * except that it sets visibility to hidden which means
+             * that they are not used.  However, when gcc cannot
+             * inline memcpy it emits memcpy with default visibility
+             * which then creates a PLT relocation.  It can just be
+             * treated the same as R_X86_64_PC32.
+             */
+        case R_X86_64_PC32:
+            if ( r->r_offset >= base->sec->sh_size ||
+                (r->r_offset + sizeof(uint32_t)) > base->sec->sh_size )
+                goto bad_offset;
+
+            val -= (uint64_t)dest;
+            *(int32_t *)dest = val;
+            if ( (int64_t)val != *(int32_t *)dest )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Overflow in relocation %u in %s for %s!\n",
+                        elf->name, i, rela->name, base->name);
+                return -EOVERFLOW;
+            }
+            break;
+
+        default:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unhandled relocation %lu\n",
+                    elf->name, ELF64_R_TYPE(r->r_info));
+            return -EOPNOTSUPP;
+        }
+    }
+
+    return 0;
+
+ bad_offset:
+    dprintk(XENLOG_ERR, XSPLICE "%s: Relative relocation offset is past %s section!\n",
+            elf->name, base->name);
+    return -EINVAL;
+}
+
+/*
+ * Once resolving symbols, performing relocations, etc. is complete,
+ * we secure the memory by putting in the proper page table attributes
+ * for the desired type.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type type)
+{
+    unsigned long start = (unsigned long)va;
+    unsigned int flag;
+
+    ASSERT(va);
+    ASSERT(pages);
+
+    if ( type == XSPLICE_VA_RX )
+        flag = PAGE_HYPERVISOR_RX;
+    else if ( type == XSPLICE_VA_RW )
+        flag = PAGE_HYPERVISOR_RW;
+    else
+        flag = PAGE_HYPERVISOR_RO;
+
+    modify_xen_mappings(start, start + pages * PAGE_SIZE, flag);
+
+    return 0;
+}
+
+void __init arch_xsplice_init(void)
+{
+    void *start, *end;
+
+    start = (void *)xen_virt_end;
+    end = (void *)(XEN_VIRT_END - NR_CPUS * PAGE_SIZE);
+
+    BUG_ON(end <= start);
+
+    vm_init_type(VMAP_XEN, start, end);
+}
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 73e50f0..1688c6c 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -29,6 +30,13 @@ struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    const void *text_addr;               /* Virtual address of .text. */
+    size_t text_size;                    /* .. and its size. */
+    const void *rw_addr;                 /* Virtual address of .data. */
+    size_t rw_size;                      /* .. and its size (if any). */
+    const void *ro_addr;                 /* Virtual address of .rodata. */
+    size_t ro_size;                      /* .. and its size (if any). */
+    unsigned int pages;                  /* Total pages for [text,rw,ro]_addr */
     char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
 };
 
@@ -83,19 +91,223 @@ static struct payload *find_payload(const char *name)
     return found;
 }
 
+/*
+ * Functions related to XEN_SYSCTL_XSPLICE_UPLOAD (see xsplice_upload), and
+ * freeing payload (XEN_SYSCTL_XSPLICE_ACTION:XSPLICE_ACTION_UNLOAD).
+ */
+
+static void free_payload_data(struct payload *payload)
+{
+    /* Set to zero until "move_payload". */
+    if ( !payload->pages )
+        return;
+
+    vfree((void *)payload->text_addr);
+
+    payload->pages = 0;
+}
+
+/*
+ * calc_section computes the size (taking into account section alignment).
+ *
+ * Furthermore, *offset is set to the offset from the start of the virtual
+ * address space for the payload (using the passed-in size). This is used in
+ * move_payload to figure out the destination location (load_addr).
+ */
+static void calc_section(const struct xsplice_elf_sec *sec, size_t *size,
+                         unsigned int *offset)
+{
+    const Elf_Shdr *s = sec->sec;
+    size_t align_size;
+
+    align_size = ROUNDUP(*size, s->sh_addralign);
+    *offset = align_size;
+    *size = s->sh_size + align_size;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    void *text_buf, *ro_buf, *rw_buf;
+    unsigned int i;
+    size_t size = 0;
+    unsigned int *offset;
+    int rc = 0;
+
+    offset = xmalloc_array(unsigned int, elf->hdr->e_shnum);
+    if ( !offset )
+        return -ENOMEM;
+
+    /* Compute size of different regions. */
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        /*
+         * Do nothing. These are .rel.text, .rel.*, .symtab, .strtab,
+         * and .shstrtab. The non-relocation sections are allocated and
+         * copied via other means, and the .rel ones can be ignored as
+         * they are only used once, during loading.
+         */
+        if ( !(elf->sec[i].sec->sh_flags & SHF_ALLOC) )
+        {
+            offset[i] = UINT_MAX;
+            continue;
+        }
+        if ( (elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->text_size, &offset[i]);
+        else if ( !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->rw_size, &offset[i]);
+        else if ( !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+                  !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &payload->ro_size, &offset[i]);
+        else
+        {
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Not supporting %s section!\n",
+                    elf->name, elf->sec[i].name);
+            rc = -EOPNOTSUPP;
+            goto out;
+        }
+    }
+
+    /*
+     * Total of all three regions - RX, RW, and RO. We have to keep
+     * them in separate pages, so we PAGE_ALIGN the RX and RW regions to
+     * put them on separate pages. The last one will by default fall on
+     * its own page.
+     */
+    size = PAGE_ALIGN(payload->text_size) + PAGE_ALIGN(payload->rw_size) +
+                      payload->ro_size;
+
+    size = PFN_UP(size); /* Nr of pages. */
+    text_buf = vmalloc_xen(size * PAGE_SIZE);
+    if ( !text_buf )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for payload!\n",
+                elf->name);
+        rc = -ENOMEM;
+        goto out;
+    }
+    rw_buf = text_buf + PAGE_ALIGN(payload->text_size);
+    ro_buf = rw_buf + PAGE_ALIGN(payload->rw_size);
+
+    payload->pages = size;
+    payload->text_addr = text_buf;
+    payload->rw_addr = rw_buf;
+    payload->ro_addr = ro_buf;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            void *buf;
+
+            if ( elf->sec[i].sec->sh_flags & SHF_EXECINSTR )
+                buf = text_buf;
+            else if ( elf->sec[i].sec->sh_flags & SHF_WRITE )
+                buf = rw_buf;
+            else
+                buf = ro_buf;
+
+            ASSERT(offset[i] != UINT_MAX);
+
+            elf->sec[i].load_addr = buf + offset[i];
+
+            /* Don't copy NOBITS - such as BSS. */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Loaded %s at %p\n",
+                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+            else
+                memset(elf->sec[i].load_addr, 0, elf->sec[i].sec->sh_size);
+        }
+    }
+
+ out:
+    xfree(offset);
+
+    return rc;
+}
+
+static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    int rc;
+    unsigned int text_pages, rw_pages, ro_pages;
+
+    text_pages = PFN_UP(payload->text_size);
+    ASSERT(text_pages);
+
+    rc = arch_xsplice_secure(payload->text_addr, text_pages, XSPLICE_VA_RX);
+    if ( rc )
+        return rc;
+
+    rw_pages = PFN_UP(payload->rw_size);
+    if ( rw_pages )
+    {
+        rc = arch_xsplice_secure(payload->rw_addr, rw_pages, XSPLICE_VA_RW);
+        if ( rc )
+            return rc;
+    }
+
+    ro_pages = PFN_UP(payload->ro_size);
+    if ( ro_pages )
+        rc = arch_xsplice_secure(payload->ro_addr, ro_pages, XSPLICE_VA_RO);
+
+    ASSERT(ro_pages + rw_pages + text_pages == payload->pages);
+
+    return rc;
+}
+
 static void free_payload(struct payload *data)
 {
     ASSERT(spin_is_locked(&payload_lock));
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
+static int load_payload_data(struct payload *payload, void *raw, size_t len)
+{
+    struct xsplice_elf elf = { .name = payload->name, .len = len };
+    int rc = 0;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto out;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
+
+ out:
+    if ( rc )
+        free_payload_data(payload);
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 {
     struct payload *data, *found;
     char n[XEN_XSPLICE_NAME_SIZE];
+    void *raw_data;
     int rc;
 
     rc = verify_payload(upload, n);
@@ -103,6 +315,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         return rc;
 
     data = xzalloc(struct payload);
+    raw_data = vmalloc(upload->size);
 
     spin_lock(&payload_lock);
 
@@ -111,11 +324,18 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         rc = PTR_ERR(found);
     else if ( found )
         rc = -EEXIST;
-    else if ( !data )
+    else if ( !data || !raw_data )
         rc = -ENOMEM;
+    else if ( __copy_from_guest(raw_data, upload->payload, upload->size) )
+        rc = -EFAULT;
     else
     {
         memcpy(data->name, n, strlen(n));
+
+        rc = load_payload_data(data, raw_data, upload->size);
+        if ( rc )
+            goto out;
+
         data->state = XSPLICE_STATE_CHECKED;
         INIT_LIST_HEAD(&data->list);
 
@@ -123,8 +343,12 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
         payload_cnt++;
         payload_version++;
     }
+
+ out:
     spin_unlock(&payload_lock);
 
+    vfree(raw_data);
+
     if ( rc )
         xfree(data);
 
@@ -359,8 +583,9 @@ static void xsplice_printall(unsigned char key)
     }
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p (.data=%p, .rodata=%p) using %u pages.\n",
+               data->name, state2str(data->state), data->state, data->text_addr,
+               data->rw_addr, data->ro_addr, data->pages);
 
     spin_unlock(&payload_lock);
 }
@@ -368,6 +593,8 @@ static void xsplice_printall(unsigned char key)
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+
+    arch_xsplice_init();
     return 0;
 }
 __initcall(xsplice_init);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index f046015..f51a161 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -100,6 +100,7 @@ static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
 
             elf->symtab = &sec[i];
 
+            elf->symtab_idx = i;
             /*
              * elf->symtab->sec->sh_link would point to the right section
              * but we hadn't finished parsing all the sections.
@@ -250,9 +251,122 @@ static int elf_get_sym(struct xsplice_elf *elf, const void *data)
     return 0;
 }
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        unsigned int idx = elf->sym[i].sym->st_shndx;
+        const Elf_Sym *sym = elf->sym[i].sym;
+        Elf_Addr st_value = sym->st_value;
+
+        switch ( idx )
+        {
+        case SHN_COMMON:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unexpected common symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -EINVAL;
+            break;
+
+        case SHN_UNDEF:
+            dprintk(XENLOG_ERR, XSPLICE "%s: Unknown symbol: %s\n",
+                    elf->name, elf->sym[i].name);
+            rc = -ENOENT;
+            break;
+
+        case SHN_ABS:
+            dprintk(XENLOG_DEBUG, XSPLICE "%s: Absolute symbol: %s => %#"PRIxElfAddr"\n",
+                    elf->name, elf->sym[i].name, sym->st_value);
+            break;
+
+        default:
+            /* SHN_COMMON and SHN_ABS are above. */
+            if ( idx >= SHN_LORESERVE )
+                rc = -EOPNOTSUPP;
+            else if ( idx >= elf->hdr->e_shnum )
+                rc = -EINVAL;
+
+            if ( rc )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Out of bounds symbol section %#x\n",
+                        elf->name, idx);
+                break;
+            }
+
+            /* Matches 'move_payload' which ignores such sections. */
+            if ( !(elf->sec[idx].sec->sh_flags & SHF_ALLOC) )
+                break;
+
+            st_value += (unsigned long)elf->sec[idx].load_addr;
+            if ( elf->sym[i].name )
+                dprintk(XENLOG_DEBUG, XSPLICE "%s: Symbol resolved: %s => %#"PRIxElfAddr" (%s)\n",
+                        elf->name, elf->sym[i].name,
+                        st_value, elf->sec[idx].name);
+        }
+
+        if ( rc )
+            break;
+
+        ((Elf_Sym *)sym)->st_value = st_value;
+    }
+
+    return rc;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *r, *base;
+    unsigned int i;
+    int rc = 0;
+
+    ASSERT(elf->sym);
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        r = &elf->sec[i];
+
+        if ( (r->sec->sh_type != SHT_RELA) &&
+             (r->sec->sh_type != SHT_REL) )
+            continue;
+
+        /* Is it a valid relocation section? */
+        if ( r->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[r->sec->sh_info];
+
+        /* Don't relocate non-allocated sections. */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( r->sec->sh_link != elf->symtab_idx )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Relative link of %s is incorrect (%u, expected=%u)\n",
+                    elf->name, r->name, r->sec->sh_link, elf->symtab_idx);
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( r->sec->sh_type == SHT_RELA )
+            rc = arch_xsplice_perform_rela(elf, base, r);
+        else /* SHT_REL */
+            rc = arch_xsplice_perform_rel(elf, base, r);
+
+        if ( rc )
+            break;
+    }
+
+    return rc;
+}
+
 static int xsplice_header_check(const struct xsplice_elf *elf)
 {
     const Elf_Ehdr *hdr = elf->hdr;
+    int rc;
 
     if ( sizeof(*elf->hdr) > elf->len )
     {
@@ -279,6 +393,10 @@ static int xsplice_header_check(const struct xsplice_elf *elf)
         return -EOPNOTSUPP;
     }
 
+    rc = arch_xsplice_verify_elf(elf);
+    if ( rc )
+        return rc;
+
     if ( elf->hdr->e_shstrndx == SHN_UNDEF )
     {
         dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 85f35ed..2b9bd3f 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -472,6 +472,8 @@ typedef struct {
 #endif
 
 #if defined(ELFSIZE) && (ELFSIZE == 32)
+#define PRIxElfAddr	"08x"
+
 #define Elf_Ehdr	Elf32_Ehdr
 #define Elf_Phdr	Elf32_Phdr
 #define Elf_Shdr	Elf32_Shdr
@@ -497,6 +499,8 @@ typedef struct {
 
 #define AuxInfo		Aux32Info
 #elif defined(ELFSIZE) && (ELFSIZE == 64)
+#define PRIxElfAddr	PRIx64
+
 #define Elf_Ehdr	Elf64_Ehdr
 #define Elf_Phdr	Elf64_Phdr
 #define Elf_Shdr	Elf64_Shdr
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 7559877..857c264 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,6 +6,9 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
@@ -15,6 +18,27 @@ struct xen_sysctl_xsplice_op;
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
+/* Arch hooks. */
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf);
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela);
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela);
+enum va_type {
+    XSPLICE_VA_RX, /* .text */
+    XSPLICE_VA_RW, /* .data */
+    XSPLICE_VA_RO, /* .rodata */
+};
+
+/*
+ * Function to secure the allocated pages (from move_payload)
+ * with the right page permissions.
+ */
+int arch_xsplice_secure(const void *va, unsigned int pages, enum va_type types);
+
+void arch_xsplice_init(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index 686aaf0..750dc94 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -15,6 +15,8 @@ struct xsplice_elf_sec {
                                             elf_resolve_section_names. */
     const void *data;                    /* Pointer to the section (done by
                                             elf_resolve_sections). */
+    void *load_addr;                     /* A pointer to the allocated destination.
+                                            Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -29,8 +31,10 @@ struct xsplice_elf {
     struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
     struct xsplice_elf_sym *sym;         /* Array of symbols , allocated by us. */
     unsigned int nsym;
-    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
-    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to
+                                            sec[symtab_idx]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section. */
+    unsigned int symtab_idx;
 };
 
 const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
@@ -38,6 +42,9 @@ const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *
 int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
 
 /*
-- 
2.4.3
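The R_X86_64_PC32 case in arch_xsplice_perform_rela computes S + A - P (resolved symbol value plus addend, minus the address being patched) and rejects the result if it does not survive truncation to 32 bits. A standalone sketch of just that arithmetic follows; `pc32_value` is a hypothetical helper that takes the patch address as a plain integer instead of writing through a pointer, so the check can be exercised in user space.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the R_X86_64_PC32 arithmetic:
 *   s = resolved symbol value, a = r_addend, p = address being patched.
 * Stores the 32-bit PC-relative value in *out, or returns -EOVERFLOW
 * if s + a - p does not fit in a signed 32-bit field.
 */
static int pc32_value(uint64_t s, int64_t a, uint64_t p, int32_t *out)
{
    /* Unsigned arithmetic wraps modulo 2^64, as in the patch. */
    uint64_t val = s + (uint64_t)a - p;

    /*
     * Narrowing an out-of-range value is implementation-defined;
     * GCC and Clang keep the low 32 bits.
     */
    int32_t field = (int32_t)val;

    /* Sign-extending the field must give back the full 64-bit value. */
    if ( (int64_t)val != (int64_t)field )
        return -EOVERFLOW;

    *out = field;
    return 0;
}
```

The same check covers R_X86_64_PLT32, which the patch deliberately handles exactly like R_X86_64_PC32.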


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
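move_payload places each SHF_ALLOC section into one of three regions (RX text, RW data, RO data) using calc_section's aligned running size. It then page-aligns the RX and RW regions so that arch_xsplice_secure can give each region its own permissions. Below is a user-space sketch of that bookkeeping, assuming 4 KiB pages and hypothetical helper names (`align_up`, `place_section`, `payload_pages`). ELF allows `sh_addralign` values of 0 or 1 to mean "no alignment constraint", so the sketch treats both as byte alignment.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Round x up to a multiple of align (a power of two); 0 or 1 means none. */
static size_t align_up(size_t x, size_t align)
{
    if ( align <= 1 )
        return x;
    return (x + align - 1) & ~(align - 1);
}

/*
 * Like calc_section: place a section of sec_size bytes after the current
 * running size of its region, record where it starts, and grow the region.
 */
static void place_section(size_t *size, size_t sec_size, size_t align,
                          size_t *offset)
{
    *offset = align_up(*size, align);
    *size = *offset + sec_size;
}

/* Pages for the whole payload: RX and RW each start on a page boundary. */
static size_t payload_pages(size_t text, size_t rw, size_t ro)
{
    size_t bytes = align_up(text, SKETCH_PAGE_SIZE) +
                   align_up(rw, SKETCH_PAGE_SIZE) + ro;

    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For example, a 5000-byte text region, 10 bytes of data, and 100 bytes of read-only data need 2 + 1 + 1 = 4 pages. That matches the per-region PFN_UP sum that secure_payload asserts against payload->pages.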


* Re: [PATCH v9 10/27] xsplice: Add helper elf routines
  2016-04-27  7:52       ` Jan Beulich
@ 2016-04-27 18:45         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-27 18:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

> > v10:
> >   - Change the check against 64 to be against SHN_LORESERVE
> 
> So we're moving between the extremes, and (as said in reply to v9)
> I think we really want to be somewhere in the middle.
> 
> Andrew? Ross?


I stuck 1024 in there with a comment saying it is arbitrary.
Andrew was ok with that number on IRC.

And the updated patch is here, though I will also be posting the
whole patchset shortly:


From 5663cfdad5e9a928fdca3771211edfc8cbb3e332 Mon Sep 17 00:00:00 2001
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Date: Fri, 19 Feb 2016 14:37:17 -0500
Subject: [PATCH] xsplice: Add helper elf routines

Add Elf routines and data structures in preparation for loading an
xSplice payload.

We assume that an ELF payload can have at most 1024 sections (an
arbitrary limit). In the future we could make this depend on the
names of the sections and verify them against a list, but for now
this suffices.
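The section-count cap and the bounds of the section header table must be checked before any section header is dereferenced. Below is a sketch of those checks under stated assumptions. `struct sketch_ehdr` is a hypothetical stand-in for the relevant Elf64_Ehdr fields, and `SKETCH_MAX_SECTIONS` uses the 1024 value agreed in this thread. The error codes are the sketch's own choice. The size comparison is written against `len - e_shoff` so it cannot overflow.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap; the thread settled on 1024 as an arbitrary limit. */
#define SKETCH_MAX_SECTIONS 1024u

/* Hypothetical stand-in for the Elf64_Ehdr fields used here. */
struct sketch_ehdr {
    uint64_t e_shoff;      /* file offset of the section header table */
    uint16_t e_shentsize;  /* size of one section header */
    uint16_t e_shnum;      /* number of section headers */
    uint16_t e_shstrndx;   /* index of the section-name string table */
};

/* Returns 0 if the section header table fits inside a file of len bytes. */
static int check_section_table(const struct sketch_ehdr *h, uint64_t len,
                               size_t shdr_size)
{
    if ( h->e_shnum == 0 || h->e_shnum > SKETCH_MAX_SECTIONS )
        return -EOPNOTSUPP;

    if ( h->e_shentsize < shdr_size )
        return -EINVAL;

    /* Index 0 is SHN_UNDEF, so it cannot name the string table. */
    if ( h->e_shstrndx == 0 || h->e_shstrndx >= h->e_shnum )
        return -EINVAL;

    /* Written as a subtraction so that e_shoff + table size cannot wrap. */
    if ( h->e_shoff > len ||
         (uint64_t)h->e_shnum * h->e_shentsize > len - h->e_shoff )
        return -EINVAL;

    return 0;
}
```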

We also add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that no offsets point past the end of the file.
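The per-section part of these checks (done in elf_resolve_sections) requires that a section's data does not overlap the ELF header and, unless the section is SHT_NOBITS, lies entirely within the payload. Below is a sketch with a hypothetical `section_in_bounds` helper. The constants are the ELF-specified values for sizeof(Elf64_Ehdr) and SHT_NOBITS. Unlike a plain `off + size > len` test, the sketch compares `size` against the remaining length, so a huge `sh_size` cannot wrap the sum.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EHDR_SIZE  64u /* sizeof(Elf64_Ehdr) */
#define SKETCH_SHT_NOBITS 8u  /* SHT_NOBITS from the ELF specification */

/*
 * Hypothetical helper: is section data at [off, off + size) inside a
 * payload of len bytes and clear of the ELF header?
 */
static bool section_in_bounds(uint32_t type, uint64_t off, uint64_t size,
                              uint64_t len)
{
    if ( off < SKETCH_EHDR_SIZE )
        return false;

    /* SHT_NOBITS sections (e.g. .bss) occupy no space in the file. */
    if ( type == SKETCH_SHT_NOBITS )
        return true;

    /* Compare against the remaining length so off + size cannot wrap. */
    return off <= len && size <= len - off;
}
```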

For most of the checks we print a message if the hypervisor is built
with debug enabled.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: - With the #define ELFSIZE in the ARM file we can use the common
     #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
    patch.
    - Add checks for ELF file.
    - Add name to be printed.
    - Add len for easier ELF checks.
    - Expand on the checks. Add macro.
v3: Remove the return_ macro
  - Add return_ macro back but make it depend on debug=y
  - Per Andrew review: add local variable. Fix memory leak in
    elf_resolve_sections, Remove macro and use dprintk. Fix alignment.
    Use void* instead of uint8_t to handle raw payload.
v4 - Fix memory leak in elf_get_sym
  - Add XSPLICE to printk/dprintk
v5: Sprinkle newlines.
v6: Squash the ELF header checks from 'xsplice: Implement payload loading' here,
    Do better job at checking string sections and the users of them (sh_size),
    Use XSPLICE as a string literal,
    Move some checks outside the loop,
    Make sure that SHT_STRTAB are really what they say
    Sprinkle consts.
v7:
    Check sh_entsize and sh_offset.
    Added Andrew's Reviewed-by and Ian's Acked-by
    Redo check on sh_entsize to not be !=
v8: Make all the dprintk(XENLOG_DEBUG be XENLOG_ERR
v9: Changed elf_verify_strtab to use const char and return EINVAL.
    Remove 'if ( !delta )' check in elf_resolve_sections
    Remove stale comments.
    Fixed one off check against  sh_link.
    Document boundary checks against shstrtab and symtab.
    Fixed return codes in xsplice_header_check.
    Add check for sections to not be within ELF header.
    Added overflow check for e_shoff in xsplice_header_check.
    Moved XSPLICE macro by four tabs.
    Make ->sym be const.
v10:
  - Change the check against 64 to be against SHN_LORESERVE
  - Remove Reviewed-by
  - In elf_resolve_sections skip delta check if SHT_NOBITS is set in
    second conditional.
  - In elf_get_sym use symtab_sec->sec->sh_entsize to access
    Elf_Sym symbols and also make it a const. Also
    fix boundary check against .strtab and make assigment of
    sym[i].name more natural.
  - In xsplice_header_check add comment about EI_CLASS and e_flags
    being platform specific. Check against e_version and EI_VERSION.
    Also reinstate elf->hdr->e_shoff >= elf->len  check. Add Jan's check against
    elf->hdr->e_shnum * elf->hdr->e_shentsize
v10: patches posted inline in v8.1 replies:
  - Convert in elf_resolve_sections check against delta and elf->len to an ASSERT
    (equal or less);
    Fix error message for .shstrtab to be correct.
  - In elf_get_sym make delta be Elf_Word and offset be Elf_Off.
  - Move EI_ABIVERSION back in common code, move EI_DATA out. Update comment.
  - Drop the overflow check against e_shoff. Move the check of e_shstrndx
    and e_shnum after we check e_shnum for max value.
  - Make the check against e_shnum be up to 1024.
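elf_verify_strtab accepts a string table only if it is non-empty and both its first and last bytes are NUL. Combined with a bounds check on each sh_name/st_name offset, this guarantees that every name lookup yields a string terminated inside the table. Below is a sketch of both halves; `strtab_name` is a hypothetical helper, not a function from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Mirrors elf_verify_strtab: non-empty, starts and ends with NUL. */
static int verify_strtab(const char *tab, size_t size)
{
    if ( size == 0 || tab[0] != '\0' || tab[size - 1] != '\0' )
        return -EINVAL;

    return 0;
}

/*
 * Hypothetical lookup: because the table ends in NUL, any offset below
 * size yields a string that is terminated inside the table.
 */
static const char *strtab_name(const char *tab, size_t size, size_t offset)
{
    return offset < size ? tab + offset : NULL;
}
```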
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 373 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h     |   3 +
 xen/include/xen/xsplice_elf.h |  51 ++++++
 4 files changed, 428 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1e4bc70..afd84b6 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -59,6 +59,7 @@ obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
 obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += xsplice_elf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..f046015
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,373 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name)
+{
+    unsigned int i;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_verify_strtab(const struct xsplice_elf_sec *sec)
+{
+    const Elf_Shdr *s;
+    const char *contents;
+
+    s = sec->sec;
+
+    if ( s->sh_type != SHT_STRTAB )
+        return -EINVAL;
+
+    if ( !s->sh_size )
+        return -EINVAL;
+
+    contents = sec->data;
+
+    if ( contents[0] || contents[s->sh_size - 1] )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+    Elf_Off delta;
+    int rc;
+
+    /* xsplice_elf_load sanity checked e_shnum. */
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for section table!\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    elf->sec = sec;
+
+    /* e_shoff and e_shnum overflow checks are done in xsplice_header_check. */
+    delta = elf->hdr->e_shoff + elf->hdr->e_shnum * elf->hdr->e_shentsize;
+    ASSERT(delta <= elf->len);
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
+
+        sec[i].sec = data + delta;
+
+        delta = sec[i].sec->sh_offset;
+        /*
+         * N.B. elf_resolve_section_names, elf_get_sym skip this check as
+         * we do it here.
+         */
+        if ( delta < sizeof(Elf_Ehdr) ||
+             (sec[i].sec->sh_type != SHT_NOBITS && /* Skip SHT_NOBITS */
+              (delta > elf->len || (delta + sec[i].sec->sh_size > elf->len))) )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Section [%u] data %s of payload!\n",
+                    elf->name, i,
+                    delta < sizeof(Elf_Ehdr) ? "at ELF header" : "is past end");
+            return -EINVAL;
+        }
+
+        sec[i].data = data + delta;
+        /* Name is populated in elf_resolve_section_names. */
+        sec[i].name = NULL;
+
+        if ( sec[i].sec->sh_type == SHT_SYMTAB )
+        {
+            if ( elf->symtab )
+            {
+                dprintk(XENLOG_ERR, XSPLICE "%s: Unsupported multiple symbol tables!\n",
+                        elf->name);
+                return -EOPNOTSUPP;
+            }
+
+            elf->symtab = &sec[i];
+
+            /*
+             * elf->symtab->sec->sh_link would point to the right section
+             * but we hadn't finished parsing all the sections.
+             */
+            if ( elf->symtab->sec->sh_link >= elf->hdr->e_shnum )
+            {
+                dprintk(XENLOG_ERR, XSPLICE
+                        "%s: Symbol table idx (%u) to strtab past end (%u)\n",
+                        elf->name, elf->symtab->sec->sh_link,
+                        elf->hdr->e_shnum);
+                return -EINVAL;
+            }
+        }
+    }
+
+    if ( !elf->symtab )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: No symbol table found!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !elf->symtab->sec->sh_size ||
+         elf->symtab->sec->sh_entsize < sizeof(Elf_Sym) ||
+         elf->symtab->sec->sh_size % elf->symtab->sec->sh_entsize )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Symbol table header is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /*
+     * There can be multiple SHT_STRTAB (.shstrtab, .strtab) so pick the one
+     * associated with the symbol table.
+     */
+    elf->strtab = &sec[elf->symtab->sec->sh_link];
+
+    rc = elf_verify_strtab(elf->strtab);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: String table section is corrupted\n",
+                elf->name);
+    }
+
+    return rc;
+}
+
+static int elf_resolve_section_names(struct xsplice_elf *elf, const void *data)
+{
+    const char *shstrtab;
+    unsigned int i;
+    Elf_Off offset, delta;
+    struct xsplice_elf_sec *sec;
+    int rc;
+
+    /*
+     * The elf->sec[0 -> e_shnum] structures have been verified by
+     * elf_resolve_sections. Find file offset for section string table
+     * (normally called .shstrtab)
+     */
+    sec = &elf->sec[elf->hdr->e_shstrndx];
+
+    rc = elf_verify_strtab(sec);
+    if ( rc )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section string table is corrupted\n",
+                elf->name);
+        return rc;
+    }
+
+    /* Verified in elf_resolve_sections but just in case. */
+    offset = sec->sec->sh_offset;
+    ASSERT(offset < elf->len && (offset + sec->sec->sh_size <= elf->len));
+
+    shstrtab = data + offset;
+
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->sec[i].sec->sh_name;
+
+        /* Boundary check on offset of name within the .shstrtab. */
+        if ( delta >= sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Section %u name is not within .shstrtab!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        elf->sec[i].name = shstrtab + delta;
+    }
+
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, const void *data)
+{
+    const struct xsplice_elf_sec *symtab_sec, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    unsigned int i, nsym;
+    Elf_Off offset;
+    Elf_Word delta;
+
+    symtab_sec = elf->symtab;
+    strtab_sec = elf->strtab;
+
+    /* Pointer arithmetic to get the file offset. */
+    offset = strtab_sec->data - data;
+
+    /* Checked already in elf_resolve_sections, but just in case. */
+    ASSERT(offset == strtab_sec->sec->sh_offset);
+    ASSERT(offset < elf->len && (offset + strtab_sec->sec->sh_size <= elf->len));
+
+    /* symtab_sec->data was computed in elf_resolve_sections. */
+    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data);
+
+    /* No need to check values as elf_resolve_sections did it. */
+    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
+    if ( !sym )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Could not allocate memory for symbols\n",
+                elf->name);
+        return -ENOMEM;
+    }
+
+    /* So we don't leak memory. */
+    elf->sym = sym;
+
+    for ( i = 1; i < nsym; i++ )
+    {
+        const Elf_Sym *s = symtab_sec->data + symtab_sec->sec->sh_entsize * i;
+
+        delta = s->st_name;
+        /* Boundary check within the .strtab. */
+        if ( delta >= strtab_sec->sec->sh_size )
+        {
+            dprintk(XENLOG_ERR, XSPLICE "%s: Symbol [%u] name is not within .strtab!\n",
+                    elf->name, i);
+            return -EINVAL;
+        }
+
+        sym[i].sym = s;
+        sym[i].name = strtab_sec->data + delta;
+    }
+    elf->nsym = nsym;
+
+    return 0;
+}
+
+static int xsplice_header_check(const struct xsplice_elf *elf)
+{
+    const Elf_Ehdr *hdr = elf->hdr;
+
+    if ( sizeof(*elf->hdr) > elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: ELF header is bigger than payload!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    if ( !IS_ELF(*hdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Not an ELF payload!\n", elf->name);
+        return -EINVAL;
+    }
+
+    /* EI_CLASS, EI_DATA, and e_flags are platform specific. */
+    if ( hdr->e_version != EV_CURRENT ||
+         hdr->e_ident[EI_VERSION] != EV_CURRENT ||
+         hdr->e_ident[EI_ABIVERSION] != 0 ||
+         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
+         hdr->e_type != ET_REL ||
+         hdr->e_phnum != 0 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Invalid ELF payload!\n", elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx is undefined!?\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    /* Arbitrary boundary limit. */
+    if ( elf->hdr->e_shnum >= 1024 )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Too many (%u) sections!\n",
+                elf->name, elf->hdr->e_shnum);
+        return -EOPNOTSUPP;
+    }
+
+    /* Check that section name index is within the sections. */
+    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section name idx (%u) is past end of sections (%u)!\n",
+                elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shoff >= elf->len )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Bogus e_shoff!\n", elf->name);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shentsize < sizeof(Elf_Shdr) )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header size is %u! Expected %zu!?\n",
+                elf->name, elf->hdr->e_shentsize, sizeof(Elf_Shdr));
+        return -EINVAL;
+    }
+
+    if ( ((elf->len - elf->hdr->e_shoff) / elf->hdr->e_shentsize) <
+         elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_ERR, XSPLICE "%s: Section header size is corrupted!\n",
+                elf->name);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data)
+{
+    int rc;
+
+    elf->hdr = data;
+
+    rc = xsplice_header_check(elf);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_section_names(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    elf->sec = NULL;
+    xfree(elf->sym);
+    elf->sym = NULL;
+    elf->nsym = 0;
+    elf->name = NULL;
+    elf->len = 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index b9f08cd..7559877 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -10,6 +10,9 @@ struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
 
+/* Convenience define for printk. */
+#define XSPLICE             "xsplice: "
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
 #else
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..686aaf0
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an Elf file as consumed by xSplice. */
+struct xsplice_elf_sec {
+    const Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
+    const char *name;                    /* Human readable name hooked in
+                                            elf_resolve_section_names. */
+    const void *data;                    /* Pointer to the section (done by
+                                            elf_resolve_sections). */
+};
+
+struct xsplice_elf_sym {
+    const Elf_Sym *sym;
+    const char *name;
+};
+
+struct xsplice_elf {
+    const char *name;                    /* Pointer to payload->name. */
+    size_t len;                          /* Length of the ELF file. */
+    const Elf_Ehdr *hdr;                 /* ELF file. */
+    struct xsplice_elf_sec *sec;         /* Array of sections, allocated by us. */
+    struct xsplice_elf_sym *sym;         /* Array of symbols, allocated by us. */
+    unsigned int nsym;
+    const struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
+    const struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+};
+
+const struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                      const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, const void *data);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.4.3

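The last three checks in xsplice_header_check bound the section header table without ever computing e_shoff + e_shnum * e_shentsize, which could wrap around. The following is a minimal standalone sketch of that idea, not Xen code: struct fake_ehdr and shdr_table_fits() are hypothetical names, and min_entsize stands in for sizeof(Elf_Shdr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three Elf_Ehdr fields the check reads. */
struct fake_ehdr {
    uint64_t e_shoff;     /* File offset of the section header table. */
    uint16_t e_shentsize; /* Size of one section header entry. */
    uint16_t e_shnum;     /* Number of section header entries. */
};

/*
 * Return 0 if e_shnum entries of e_shentsize bytes starting at e_shoff fit
 * inside a file of 'len' bytes, -1 otherwise. min_entsize plays the role of
 * sizeof(Elf_Shdr).
 */
static int shdr_table_fits(const struct fake_ehdr *h, size_t len,
                           size_t min_entsize)
{
    assert(min_entsize > 0); /* Guarantees the division below is safe. */

    if ( h->e_shoff >= len )
        return -1;           /* Table would start at or past EOF. */

    if ( h->e_shentsize < min_entsize )
        return -1;           /* Entries too small to be section headers. */

    /*
     * Divide the bytes left after e_shoff by the entry size instead of
     * multiplying e_shnum * e_shentsize, so nothing can overflow.
     */
    if ( (len - h->e_shoff) / h->e_shentsize < h->e_shnum )
        return -1;

    return 0;
}
```

For example, with e_shoff = 64, e_shentsize = 64 and e_shnum = 4 the table needs exactly 320 bytes, so a 320-byte file passes and a 319-byte file is rejected.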

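elf_resolve_section_names and elf_get_sym both reject a name offset (delta) that is not below sh_size before turning it into a pointer. That bound only keeps the start of the name inside the table; the string must also be terminated inside it. The body of elf_verify_strtab is not part of this excerpt, so the sketch below assumes it rejects an empty table and one whose first or last byte is not NUL. strtab_ok() and strtab_name() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check, assumed (not confirmed by this excerpt) to match what
 * elf_verify_strtab enforces: non-empty, first and last byte are NUL.
 */
static int strtab_ok(const char *tab, size_t size)
{
    assert(tab != NULL);
    return size != 0 && tab[0] == '\0' && tab[size - 1] == '\0';
}

/*
 * Return the name at byte offset 'off' of a table that passed strtab_ok(),
 * or NULL if 'off' is outside it (the "delta >= sh_size" check). Since the
 * last byte is NUL, the returned string always ends inside the table.
 */
static const char *strtab_name(const char *tab, size_t size, size_t off)
{
    if ( off >= size )
        return NULL;
    return tab + off;
}
```

For instance, with the table "\0.text\0.data" (13 bytes including the final NUL), offset 1 yields ".text" and offset 13 yields NULL.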
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

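The .symtab header checks in elf_resolve_sections are what let elf_get_sym later compute nsym with a plain division and step through entries by sh_entsize. A sketch of the same validation as a standalone function; symtab_count() is a hypothetical name, and sym_size stands in for sizeof(Elf_Sym), which is 24 bytes for Elf64_Sym.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the .symtab header checks: returns the
 * number of entries, or 0 if the table geometry is corrupt.
 */
static size_t symtab_count(uint64_t sh_size, uint64_t sh_entsize,
                           size_t sym_size)
{
    assert(sym_size > 0);

    if ( sh_size == 0 ||            /* Empty symbol table. */
         sh_entsize < sym_size ||   /* Entry cannot hold an Elf_Sym (also
                                       rules out sh_entsize == 0). */
         sh_size % sh_entsize )     /* Partial trailing entry. */
        return 0;

    return sh_size / sh_entsize;
}
```

Entry 0 of an ELF symbol table is the reserved undefined symbol, which is why the loop in elf_get_sym starts at i = 1 and leaves sym[0] unfilled.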

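elf_get_sym stores the freshly allocated array in elf->sym before the name checks that can fail, and xsplice_elf_free NULLs every pointer it releases. Together this appears intended to let a caller always run xsplice_elf_free after xsplice_elf_load, whatever the outcome. A reduced sketch of that ownership pattern, using a hypothetical struct mini_elf and standard calloc/free in place of xmalloc_array/xfree:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for struct xsplice_elf. */
struct mini_elf {
    int *sym;           /* Owned array, like elf->sym. */
    unsigned int nsym;  /* Only set once every entry was accepted. */
};

/*
 * Mimics elf_get_sym: publish the allocation in the struct *before* any
 * check that can fail, so the caller's free routine can reclaim it.
 */
static int mini_get_sym(struct mini_elf *elf, unsigned int nsym, int fail)
{
    int *sym;

    assert(elf->sym == NULL);   /* Sketch precondition: nothing loaded yet. */

    sym = calloc(nsym, sizeof(*sym));
    if ( !sym )
        return -1;

    elf->sym = sym;             /* "So we don't leak memory." */
    if ( fail )
        return -2;              /* e.g. a symbol name outside .strtab. */

    elf->nsym = nsym;
    return 0;
}

/* Mimics xsplice_elf_free: safe after success, after failure, or twice. */
static void mini_free(struct mini_elf *elf)
{
    free(elf->sym);             /* free(NULL) is a no-op. */
    elf->sym = NULL;
    elf->nsym = 0;
}
```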
* Re: [PATCH v9 24/27] xsplice: Stacking build-id dependency checking.
  2016-04-27 16:36     ` Konrad Rzeszutek Wilk
@ 2016-04-28  9:47       ` Jan Beulich
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Beulich @ 2016-04-28  9:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, andrew.cooper3, mpohlack, ross.lagerwall,
	sasha.levin, xen-devel

>>> On 27.04.16 at 18:36, <konrad@kernel.org> wrote:
> On Wed, Apr 27, 2016 at 03:27:27AM -0600, Jan Beulich wrote:
>> >>> On 25.04.16 at 17:35, <konrad.wilk@oracle.com> wrote:
>> > @@ -33,9 +36,43 @@ config.h: xen_hello_world_func.o
>> >  xen_hello_world.o: xen_hello_world_func.o
>> >  
>> >  .PHONY: $(XSPLICE)
>> > -$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o
>> > -	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o \
>> > -		xen_hello_world.o
>> > +$(XSPLICE): config.h xen_hello_world_func.o xen_hello_world.o note.o
>> > +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
>> > +		xen_hello_world_func.o xen_hello_world.o note.o
>> 
>> Probably easier to read and maintain if you used $(filter %.o,$^)
>> here?
>> 
>> > +xen_bye_world.o: xen_bye_world_func.o
>> 
>> Again - why?
> 
> B/c xen_bye_world.o depends on xen_bye_world_func.o? Oh wait, they
> only depend during linking!
> 
> It should be:
> 
> xen_bye_world.o: config.h
> 
> (and also for xen_hello_world.o case)
>> 
>> > +.PHONY: $(XSPLICE_BYE)
>> > +$(XSPLICE_BYE): $(XSPLICE) config.h xen_bye_world_func.o xen_bye_world.o hello_world_note.o
>> 
>> The object files depend on config.h, but the binary does only
>> indirectly via the object files I would guess. (This, just like the
>> question right above, would then apply to the $(XSPLICE) related
>> rules too, in an earlier patch.)
> 
> xen_bye_world.c won't compile if config.h is not present.
> 
> I need to make sure that config.h gets created before xen_bye_world.o
> gets built, and config.h generation itself depends on the existence
> of xen_hello_world_func.o.
> 
> Is there a better way of making this dependency?

You have it above (in your reply):

xen_bye_world.o: config.h

>> > +	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
>> > +		xen_bye_world_func.o xen_bye_world.o hello_world_note.o
>> 
>> Same as above - better use $^ (and if config.h goes away as a
>> direct dependency, it looks like you don't even need $(filter ...)).
> 
> I have to have config.h as a dependency. If I do 'make -j1232131 tests'
> and config.h is not a dependency, things eventually break.

But as said - it's the object files which depend on the header, not the
final binary. At least that's how things would be normally.

Jan

* Re: [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-04-27  3:39     ` Konrad Rzeszutek Wilk
  2016-04-27  8:36       ` Jan Beulich
@ 2016-05-11  9:51       ` Martin Pohlack
  2016-05-11 13:56         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 90+ messages in thread
From: Martin Pohlack @ 2016-05-11  9:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser,
	Suravee Suthikulpanit, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Jun Nakajima, sasha.levin, xen-devel,
	Boris Ostrovsky

On 27.04.2016 05:39, Konrad Rzeszutek Wilk wrote:
[...]
> +        /* "Mask" NMIs. */
> +        arch_xsplice_mask();

You mask here ...

> +        barrier(); /* MUST do it after get_cpu_maps. */
> +        cpus = num_online_cpus() - 1;
> +
> +        if ( cpus )
> +        {
> +            dprintk(XENLOG_DEBUG, XSPLICE "%s: CPU%u - IPIing the other %u CPUs\n",
> +                    p->name, cpu, cpus);
> +            smp_call_function(reschedule_fn, NULL, 0);
> +        }
> +
> +        timeout = xsplice_work.timeout + NOW();
> +        if ( xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "CPU") )
> +            goto abort;

... and potentially abort here, but the abort path does not unmask, so
you lose the NMI handler.

> +
> +        /* All CPUs are waiting, now signal to disable IRQs. */
> +        atomic_set(&xsplice_work.semaphore, 0);
> +        /*
> +         * MUST have a barrier after semaphore so that the other CPUs don't
> +         * leak out of the 'Wait for all CPUs to rendezvous' loop and increment
> +         * 'semaphore' before we set it to zero.
> +         */
> +        smp_wmb();
> +        xsplice_work.ready = 1;
> +
> +        if ( !xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "IRQ") )
> +        {
> +            local_irq_save(flags);
> +            /* Do the patching. */
> +            xsplice_do_action();
> +            /* Serialize and flush out the CPU via CPUID instruction (on x86). */
> +            arch_xsplice_post_action();
> +            local_irq_restore(flags);
> +        }
> +        arch_xsplice_unmask();
> +
> + abort:
> +        per_cpu(work_to_do, cpu) = 0;
> +        xsplice_work.do_work = 0;
> +
> +        /* put_cpu_maps has a barrier(). */
> +        put_cpu_maps();
> +
> +        printk(XENLOG_INFO XSPLICE "%s finished %s with rc=%d\n",
> +               p->name, names[xsplice_work.cmd], p->rc);
> +    }
> +    else
[...]

Martin
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

* Re: [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches.
  2016-05-11  9:51       ` Martin Pohlack
@ 2016-05-11 13:56         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 90+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-05-11 13:56 UTC (permalink / raw)
  To: Martin Pohlack
  Cc: Kevin Tian, Stefano Stabellini, Keir Fraser, Jan Beulich,
	Jun Nakajima, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Suravee Suthikulpanit, sasha.levin, xen-devel,
	Boris Ostrovsky

On Wed, May 11, 2016 at 11:51:53AM +0200, Martin Pohlack wrote:
> On 27.04.2016 05:39, Konrad Rzeszutek Wilk wrote:
> [...]
> > +        /* "Mask" NMIs. */
> > +        arch_xsplice_mask();
> 
> You mask here ...
> 
> > +        barrier(); /* MUST do it after get_cpu_maps. */
> > +        cpus = num_online_cpus() - 1;
> > +
> > +        if ( cpus )
> > +        {
> > +            dprintk(XENLOG_DEBUG, XSPLICE "%s: CPU%u - IPIing the other %u CPUs\n",
> > +                    p->name, cpu, cpus);
> > +            smp_call_function(reschedule_fn, NULL, 0);
> > +        }
> > +
> > +        timeout = xsplice_work.timeout + NOW();
> > +        if ( xsplice_spin(&xsplice_work.semaphore, timeout, cpus, "CPU") )
> > +            goto abort;
> 
> ... and potentially abort here, but the abort path does not unmask, so
> you lose the NMI handler.

Ouch! Sending out a patch shortly. Thanks for spotting that!

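Martin's point is that the goto abort taken when the first xsplice_spin times out jumps past arch_xsplice_unmask(). One way to close that gap, sketched here with hypothetical stubs (arch_mask/arch_unmask and a counter instead of the real NMI handling, boolean flags instead of the two xsplice_spin results), is to place the unmask after the abort label so every exit path runs it. This is an illustration, not necessarily the follow-up patch that was sent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a counter tracks whether NMIs are "masked". */
static int nmi_masked;
static void arch_mask(void)   { nmi_masked++; }
static void arch_unmask(void) { nmi_masked--; }

/*
 * Control flow with the unmask moved below the abort label, so the
 * rendezvous-timeout path and the normal path both restore the handler.
 */
static int do_patch(bool rendezvous_ok, bool irq_sync_ok)
{
    int rc = -1;

    assert(nmi_masked == 0);   /* Sketch precondition: not already masked. */
    arch_mask();

    if ( !rendezvous_ok )
        goto abort;            /* In the quoted code this skipped the unmask. */

    if ( irq_sync_ok )
        rc = 0;                /* Patching would happen here, IRQs off. */

 abort:
    arch_unmask();             /* Now reached on every path. */
    return rc;
}
```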

end of thread, other threads:[~2016-05-11 13:57 UTC | newest]

Thread overview: 90+ messages
2016-04-25 15:34 [PATCH 9] xSplice v1 design and implementation Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 01/27] Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall" Konrad Rzeszutek Wilk
2016-04-25 15:48   ` Jan Beulich
2016-04-25 15:53     ` Wei Liu
2016-04-25 15:34 ` [PATCH v9 02/27] Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane." Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 03/27] xsplice: Design document Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 04/27] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
2016-04-26  7:48   ` Ross Lagerwall
2016-04-26  7:52   ` Ross Lagerwall
2016-04-26 10:21   ` Jan Beulich
2016-04-26 17:50     ` Konrad Rzeszutek Wilk
2016-04-27  6:51       ` Jan Beulich
2016-04-27 13:47         ` Konrad Rzeszutek Wilk
2016-04-27 14:11           ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 05/27] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
2016-04-26  7:51   ` Ross Lagerwall
2016-04-25 15:34 ` [PATCH v9 06/27] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
2016-04-26  7:49   ` Ross Lagerwall
2016-04-25 15:34 ` [PATCH v9 07/27] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup Konrad Rzeszutek Wilk
2016-04-26 10:31   ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 08/27] arm/x86/vmap: Add v[z|m]alloc_xen and vm_init_type Konrad Rzeszutek Wilk
2016-04-26 10:47   ` Jan Beulich
2016-04-27  2:38     ` Konrad Rzeszutek Wilk
2016-04-27  7:12       ` Jan Beulich
2016-04-27 13:46         ` Konrad Rzeszutek Wilk
2016-04-27 14:15           ` Jan Beulich
2016-04-25 15:34 ` [PATCH v9 09/27] x86/mm: Introduce modify_xen_mappings() Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 10/27] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
2016-04-26 10:05   ` Ross Lagerwall
2016-04-26 11:52     ` Jan Beulich
2016-04-26 12:37   ` Jan Beulich
2016-04-27  1:59     ` Konrad Rzeszutek Wilk
2016-04-27  7:27       ` Jan Beulich
2016-04-27 14:00         ` Konrad Rzeszutek Wilk
2016-04-27  4:06     ` Konrad Rzeszutek Wilk
2016-04-27  7:52       ` Jan Beulich
2016-04-27 18:45         ` Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 11/27] xsplice: Implement payload loading Konrad Rzeszutek Wilk
2016-04-26 10:48   ` Ross Lagerwall
2016-04-26 13:39   ` Jan Beulich
2016-04-27  1:47     ` Konrad Rzeszutek Wilk
2016-04-27  7:57       ` Jan Beulich
2016-04-27  3:28     ` Konrad Rzeszutek Wilk
2016-04-27  8:28       ` Jan Beulich
2016-04-27 15:48         ` Konrad Rzeszutek Wilk
2016-04-27 16:06           ` Jan Beulich
2016-04-27 16:14           ` Jan Beulich
2016-04-27 18:40             ` Konrad Rzeszutek Wilk
2016-04-25 15:34 ` [PATCH v9 12/27] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
2016-04-26 15:21   ` Jan Beulich
2016-04-27  3:39     ` Konrad Rzeszutek Wilk
2016-04-27  8:36       ` Jan Beulich
2016-05-11  9:51       ` Martin Pohlack
2016-05-11 13:56         ` Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 13/27] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
2016-04-26 15:31   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 14/27] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
2016-04-26 15:48   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 15/27] xsplice, symbols: Implement fast symbol names -> virtual addresses lookup Konrad Rzeszutek Wilk
2016-04-26 15:53   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 16/27] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
2016-04-26 11:06   ` Ross Lagerwall
2016-04-26 12:41     ` Jan Beulich
2016-04-26 12:48       ` Ross Lagerwall
2016-04-26 13:41         ` Jan Beulich
2016-04-27  3:31           ` Konrad Rzeszutek Wilk
2016-04-27  8:37             ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 17/27] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
2016-04-26 11:05   ` Ross Lagerwall
2016-04-26 13:08     ` Ross Lagerwall
2016-04-26 15:58   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 18/27] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
2016-04-26 16:01   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 19/27] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
2016-04-27  8:58   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 20/27] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 21/27] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 22/27] XENVER_build_id/libxc: Provide ld-embedded build-id Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 23/27] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 24/27] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
2016-04-27  9:27   ` Jan Beulich
2016-04-27 16:36     ` Konrad Rzeszutek Wilk
2016-04-28  9:47       ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 25/27] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
2016-04-25 15:35 ` [PATCH v9 26/27] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
2016-04-27  9:31   ` Jan Beulich
2016-04-25 15:35 ` [PATCH v9 27/27] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
2016-04-25 15:41 ` [PATCH 9] xSplice v1 design and implementation Jan Beulich
2016-04-25 15:47   ` Konrad Rzeszutek Wilk
2016-04-25 15:54     ` Jan Beulich
