* [PATCH v3] xSplice v1 implementation and design.
@ 2016-02-12 18:05 Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10) Konrad Rzeszutek Wilk
                   ` (23 more replies)
  0 siblings, 24 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu

Hey!

Changelog:
v2: http://lists.xen.org/archives/html/xen-devel/2016-01/msg01597.html
 - Updated code/docs/design with review comments.
 - Make xen also have a PT_NOTE
 - Added more of Ross's patches
 - Combined build-id patchset with this.
(since the RFC and the Seattle Xen presentation)
 - Finished off some of the work around the build-id.
 - Settled on the preemption mechanism.
 - Cleaned up the patches a lot and broke them up to ease
   review for maintainers.
v1: http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg02116.html
  - Put all the design comments in the code
Prototype: http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html
[Posting by Ross]
 - Took all reviews into account.
 - Redid the patches

*What is xSplice?*

A mechanism to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

*What will this patchset do once I have it?*

Patch the hypervisor.

*Why are you emailing me?*

Please please review the patches. The first three are the foundation of the
design and everything else depends on them.

*Do they depend on anything?*

Yes, I've sent some of the prerequisite patches:
http://lists.xen.org/archives/html/xen-devel/2016-02/msg01724.html

*OK, what do you have?*

They are located at a git tree:
  git://xenbits.xen.org/people/konradwilk/xen.git xsplice.v3

(Copying from Ross's email):

Much of the work is implementing a basic version of the Linux kernel module
loader. The code supports:
* Loading of xSplice ELF payloads.
* Copying allocated sections into a new executable region of memory.
* Resolving symbols.
* Applying relocations.
* Patching of altinstructions.
* Special handling of bug frames and exception tables.
* Unloading of xSplice ELF payloads.
* Compiling a sample xSplice ELF payload
* Resolving symbols (*NEW*)
* Using build-id dependencies (*NEW*)
* Support for shadow variable framework (*NEW*)
* Support for executing ELF payload functions on load/unload. (*NEW*)

The other main bit of this work is applying and reverting the patches safely.
As implemented, the code is patched with each CPU waiting in the
return-to-guest path (i.e. with no stack) or on the cpu-idle path,
which appears to be the safest way of patching. While it is safe, we should
still (in the next wave of patches) verify that we do not patch certain
critical sections (say, the code doing the patching).
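
One way to picture the "every CPU waits at a safe point" idea is a simple
rendezvous: each CPU checks in when it reaches its safe point, and the last
one to arrive does the patching while the others wait for a "done" flag.
The sketch below is only a userspace model of that idea using C11 atomics.
It is not the mechanism used in this series, and every name in it
(patch_rendezvous, rendezvous_checkin, ...) is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a patching rendezvous; not code from the series. */
struct patch_rendezvous {
    unsigned int nr_cpus;   /* Number of CPUs that must reach a safe point. */
    atomic_uint  arrived;   /* How many CPUs have checked in so far. */
    atomic_bool  done;      /* Set once the patching has been performed. */
};

void rendezvous_init(struct patch_rendezvous *r, unsigned int nr_cpus)
{
    r->nr_cpus = nr_cpus;
    atomic_init(&r->arrived, 0);
    atomic_init(&r->done, false);
}

/*
 * Called by a CPU when it reaches its safe point. Returns true only for
 * the last CPU to arrive, which is then the one expected to do the patching.
 */
bool rendezvous_checkin(struct patch_rendezvous *r)
{
    /* atomic_fetch_add returns the value held before the increment. */
    return atomic_fetch_add(&r->arrived, 1) + 1 == r->nr_cpus;
}

/* Called by the patching CPU once it has finished. */
void rendezvous_release(struct patch_rendezvous *r)
{
    atomic_store(&r->done, true);
}

/* Waiting CPUs would poll this before leaving their safe point. */
bool rendezvous_done(struct patch_rendezvous *r)
{
    return atomic_load(&r->done);
}
```

In this model a waiting CPU would loop on rendezvous_done() before leaving
its safe point. A real implementation would also need a timeout and a way to
bail out if some CPU never arrives, which this sketch leaves out.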

All of the following should work:
* Applying patches safely.
* Reverting patches safely.
* Replacing patches safely (e.g. reverting any applied patches and applying
   a new patch).
* Bug frames as part of modules. This means adding or
  changing WARN, ASSERT, BUG, and run_in_exception_handler works correctly.
  Line number only changes _are ignored_.
* Exception tables as part of modules. E.g. wrmsr_safe and copy_to_user work
  correctly when used in a patch module.
* Stacking of patches on top of each other
* Resolving symbols (even of patches)

*Limitations*

The above is enough to fully implement an update system where multiple source
patches are combined (using combinediff) and built into a single binary
which then atomically replaces any existing loaded patches
(this is why Ross added a REPLACE operation). This is the approach used
by kPatch and kGraft.

Multiple completely independent patches can also be loaded but unexpected
interactions may occur.

As it stands, the patches are statically linked which means that independent
patches cannot be linked against one another (e.g. if one introduces a
new symbol). Using the combinediff approach above fixes this.

Backtraces containing functions from a patch module do not show the symbol name.

There is no checking that a patch which is loaded is built for the
correct hypervisor (need to use build-id).

Binary patching works at the function level.
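
To make "function level" a bit more concrete: a common way to redirect a
whole function on x86 is to overwrite its first five bytes with a relative
jump to the replacement. This cover letter does not spell out the encoding,
so treat the following as an assumption and a sketch only. The helper name
build_jmp_rel32 and the JMP_REL32_LEN constant are made up for illustration.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5   /* One opcode byte plus a 32-bit displacement. */

/*
 * Encode an x86 "jmp rel32" placed at old_addr that lands on new_addr.
 * Returns 0 on success, -1 if the displacement does not fit in 32 bits.
 * Hypothetical helper; not code from this series.
 */
int build_jmp_rel32(uint64_t old_addr, uint64_t new_addr,
                    uint8_t out[JMP_REL32_LEN])
{
    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));
    uint32_t rel;

    if ( disp > INT32_MAX || disp < INT32_MIN )
        return -1;

    rel = (uint32_t)(int32_t)disp;

    out[0] = 0xE9;                      /* Opcode for jmp rel32. */
    /* The displacement is stored little-endian. */
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);

    return 0;
}
```

Reverting would then amount to copying the saved original five bytes back,
which is why a patcher built this way has to keep them around.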

*Testing*

You can use the example code included in this patchset:

# xl info | grep extra
xen_extra              : -unstable
# xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
Uploading /usr/lib/debug/xen_hello_world.xsplice (8785 bytes)
Performing check: completed
Performing apply:. completed
# xl info | grep extra
xen_extra              : Hello World
# xen-xsplice revert xen_hello_world
Performing revert:. completed
# xen-xsplice unload xen_hello_world
Performing unload: completed
# xl info | grep extra
xen_extra              : -unstable

Or you can use git://xenbits.xen.org/people/konradwilk/xsplice-build-tools.git
which generates the ELF payloads.

This link has a nice description of how to use the tool:
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html


Konrad Rzeszutek Wilk (13):
      xsplice: Design document (v7).
      xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
      libxc: Implementation of XEN_XSPLICE_op in libxc (v5).
      xen-xsplice: Tool to manipulate xsplice payloads (v4)
      x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2)
      xsm/xen_version: Add XSM for the xen_version hypercall (v8).
      XENVER_build_id: Provide ld-embedded build-ids (v10)
      libxl: info: Display build_id of the hypervisor.
      xsplice: Print build_id in keyhandler.
      xsplice: basic build-id dependency checking.
      xsplice: Print dependency and payloads build_id in the keyhandler.
      xsplice: Add hooks functions and other macros
      xsplice,hello_world: Use the XSPLICE_[UN|]LOAD_HOOK hooks for two functions.

Ross Lagerwall (11):
      elf: Add relocation types to elfstructs.h
      xsplice: Add helper elf routines (v4)
      xsplice: Implement payload loading (v4)
      xsplice: Implement support for applying/reverting/replacing patches. (v5)
      xsplice: Add support for bug frames. (v4)
      xsplice: Add support for exception tables. (v2)
      xsplice: Add support for alternatives
      xsplice: Prevent duplicate payloads to be loaded.
      xsplice,symbols: Implement symbol name resolution on address. (v2)
      x86, xsplice: Print payload's symbol name and module in backtraces
      xsplice: Add support for shadow variables

 .gitignore                                   |    1 +
 Config.mk                                    |   12 +
 docs/misc/xsplice.markdown                   | 1126 ++++++++++++++++++++
 tools/flask/policy/policy/modules/xen/xen.te |   14 +
 tools/libxc/include/xenctrl.h                |   19 +-
 tools/libxc/xc_misc.c                        |  332 ++++++
 tools/libxc/xc_private.c                     |    7 +
 tools/libxc/xc_private.h                     |   10 +
 tools/libxl/libxl.c                          |   45 +
 tools/libxl/libxl.h                          |    5 +
 tools/libxl/libxl_types.idl                  |    1 +
 tools/libxl/xl_cmdimpl.c                     |    1 +
 tools/misc/Makefile                          |    4 +
 tools/misc/xen-xsplice.c                     |  470 ++++++++
 tools/misc/xsplice.lds                       |   11 +
 xen/Makefile                                 |    2 +
 xen/arch/arm/Makefile                        |    7 +-
 xen/arch/arm/xen.lds.S                       |   13 +
 xen/arch/arm/xsplice.c                       |   31 +
 xen/arch/x86/Makefile                        |   46 +-
 xen/arch/x86/alternative.c                   |   12 +-
 xen/arch/x86/boot/mkelf32.c                  |  137 ++-
 xen/arch/x86/domain.c                        |    4 +
 xen/arch/x86/extable.c                       |   36 +-
 xen/arch/x86/hvm/svm/svm.c                   |    2 +
 xen/arch/x86/hvm/vmx/vmcs.c                  |    2 +
 xen/arch/x86/setup.c                         |    7 +
 xen/arch/x86/test/Makefile                   |   63 ++
 xen/arch/x86/test/xen_hello_world.c          |   33 +
 xen/arch/x86/test/xen_hello_world_func.c     |    8 +
 xen/arch/x86/traps.c                         |   36 +-
 xen/arch/x86/xen.lds.S                       |   23 +
 xen/arch/x86/xsplice.c                       |  132 +++
 xen/common/Kconfig                           |   15 +
 xen/common/Makefile                          |    4 +
 xen/common/kernel.c                          |   89 +-
 xen/common/symbols.c                         |   30 +
 xen/common/sysctl.c                          |    7 +
 xen/common/version.c                         |   70 ++
 xen/common/vsprintf.c                        |   18 +-
 xen/common/xsplice.c                         | 1475 ++++++++++++++++++++++++++
 xen/common/xsplice_elf.c                     |  302 ++++++
 xen/common/xsplice_shadow.c                  |  105 ++
 xen/include/asm-arm/bug.h                    |    2 +
 xen/include/asm-arm/nmi.h                    |   13 +
 xen/include/asm-x86/alternative.h            |    1 +
 xen/include/asm-x86/bug.h                    |    2 +
 xen/include/asm-x86/uaccess.h                |    5 +
 xen/include/asm-x86/x86_64/page.h            |    2 +
 xen/include/public/sysctl.h                  |  156 +++
 xen/include/public/version.h                 |   16 +-
 xen/include/xen/elfstructs.h                 |    8 +
 xen/include/xen/kernel.h                     |    1 +
 xen/include/xen/symbols.h                    |    2 +
 xen/include/xen/version.h                    |    6 +
 xen/include/xen/xsplice.h                    |   92 ++
 xen/include/xen/xsplice_elf.h                |   42 +
 xen/include/xen/xsplice_patch.h              |   85 ++
 xen/include/xsm/dummy.h                      |   22 +
 xen/include/xsm/xsm.h                        |    5 +
 xen/xsm/dummy.c                              |    1 +
 xen/xsm/flask/hooks.c                        |   53 +
 xen/xsm/flask/policy/access_vectors          |   32 +
 xen/xsm/flask/policy/security_classes        |    1 +
 64 files changed, 5240 insertions(+), 74 deletions(-)

Ugh! 5K ?! 


* [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 20:11   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5) Konrad Rzeszutek Wilk
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Daniel De Graaf, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, xen-devel
  Cc: Konrad Rzeszutek Wilk

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.
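
For reviewers who want the state transitions at a glance, here is a small
standalone sketch of what xsplice_action() in this patch allows. The
constants are copied from the sysctl.h hunk. The function name
xsplice_next_state and the PAYLOAD_GONE marker are hypothetical. Note one
deliberate difference: the sketch returns -EINVAL for a transition that is
not allowed, whereas the patch as posted appears to leave rc at 0 in that
case.

```c
#include <errno.h>

/* Values copied from the sysctl.h hunk in this patch. */
#define XSPLICE_STATE_LOADED    1
#define XSPLICE_STATE_CHECKED   2
#define XSPLICE_STATE_APPLIED   3

#define XSPLICE_ACTION_CHECK    1
#define XSPLICE_ACTION_UNLOAD   2
#define XSPLICE_ACTION_REVERT   3
#define XSPLICE_ACTION_APPLY    4
#define XSPLICE_ACTION_REPLACE  5

#define PAYLOAD_GONE            0 /* Hypothetical: payload was unloaded. */

/*
 * Compute the state a payload ends up in after 'action'.
 * Returns 0 and fills *next on success, -EINVAL for a transition the
 * sketch rejects, -EOPNOTSUPP for an unknown action.
 */
int xsplice_next_state(int state, unsigned int action, int *next)
{
    switch ( action )
    {
    case XSPLICE_ACTION_CHECK:
        if ( state != XSPLICE_STATE_LOADED && state != XSPLICE_STATE_CHECKED )
            return -EINVAL;
        *next = XSPLICE_STATE_CHECKED;
        return 0;

    case XSPLICE_ACTION_UNLOAD:
        /* An applied payload must be reverted before it can be unloaded. */
        if ( state != XSPLICE_STATE_LOADED && state != XSPLICE_STATE_CHECKED )
            return -EINVAL;
        *next = PAYLOAD_GONE;
        return 0;

    case XSPLICE_ACTION_REVERT:
        if ( state != XSPLICE_STATE_APPLIED )
            return -EINVAL;
        *next = XSPLICE_STATE_CHECKED;
        return 0;

    case XSPLICE_ACTION_APPLY:
        if ( state != XSPLICE_STATE_CHECKED )
            return -EINVAL;
        *next = XSPLICE_STATE_APPLIED;
        return 0;

    case XSPLICE_ACTION_REPLACE:
        /* In this patch REPLACE leaves the payload in CHECKED. */
        if ( state != XSPLICE_STATE_CHECKED )
            return -EINVAL;
        *next = XSPLICE_STATE_CHECKED;
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}
```

So the expected life cycle is upload (LOADED), check (CHECKED), apply
(APPLIED), revert (back to CHECKED), and finally unload.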

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
v2: Rebased on keyhandler: rework keyhandler infrastructure
v3: Fixed XSM.
v4: Removed REVERTED state.
    Split status and error code.
    Add REPLACE action.
    Separate payload data from the payload structure.
    s/XSPLICE_ID_../XSPLICE_NAME_../
v5: Add xsplice and CONFIG_XSPLICE build option.
    Fix code per Jan's review.
    Update the sysctl.h (change bits to enum like)
v6: Rebase on Kconfig changes.
v7: Add missing pad checks. Re-order keyhandler.h to build on ARM.
v8: Rebase on build: hook the schedulers into Kconfig
v9: s/id/name/
    s/payload_list_lock/payload_lock/
v10: Put #ifdef CONFIG_XSPLICE in header file.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/flask/policy/policy/modules/xen/xen.te |   1 +
 xen/common/Kconfig                           |  10 +
 xen/common/Makefile                          |   2 +
 xen/common/sysctl.c                          |   7 +
 xen/common/xsplice.c                         | 386 +++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 156 +++++++++++
 xen/include/xen/xsplice.h                    |  15 ++
 xen/xsm/flask/hooks.c                        |   6 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 9 files changed, 585 insertions(+)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..542c3e1 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -72,6 +72,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
+    xsplice_op
 };
 allow dom0_t xen_t:mmu memorymap;
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 6f404b4..619aa9e 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -152,4 +152,14 @@ config SCHED_DEFAULT
 
 endmenu
 
+# Enable/Disable xsplice support
+config XSPLICE
+	bool "xsplice support"
+	default y
+	---help---
+	  Allows a running Xen hypervisor to be patched without rebooting.
+	  This is primarily used to patch a hypervisor with XSA fixes.
+
+	  If unsure, say Y.
+
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 6e82b33..43b3911 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -72,3 +72,5 @@ subdir-$(coverage) += gcov
 
 subdir-y += libelf
 subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
+
+obj-$(CONFIG_XSPLICE) += xsplice.o
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 1624024..68e3eb4 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_control(&op->u.xsplice);
+        if ( ret != -ENOSYS )
+            copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..125d9b8
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,386 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/guest_access.h>
+#include <xen/keyhandler.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/smp.h>
+#include <xen/spinlock.h>
+#include <xen/xsplice.h>
+
+#include <asm/event.h>
+#include <public/sysctl.h>
+
+static DEFINE_SPINLOCK(payload_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    int32_t state;                       /* One of the XSPLICE_STATE_*. */
+    int32_t rc;                          /* 0 or -XEN_EXX. */
+    struct list_head list;               /* Linked to 'payload_list'. */
+    char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
+};
+
+static const char *state2str(int32_t state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(LOADED),
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+
+    if ( state < 0 )
+        return "-EXX";
+
+    if ( state >= ARRAY_SIZE(names) )
+        return "unknown";
+
+    if (!names[state])
+        return "unknown";
+
+    return names[state];
+}
+
+static void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    spin_lock(&payload_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+        printk(" name=%s state=%s(%d)\n", data->name,
+               state2str(data->state), data->state);
+
+    spin_unlock(&payload_lock);
+}
+
+static int verify_name(xen_xsplice_name_t *name)
+{
+    if ( name->size == 0 || name->size > XEN_XSPLICE_NAME_SIZE )
+        return -EINVAL;
+
+    if ( name->pad[0] || name->pad[1] || name->pad[2] )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(name->name, name->size) )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int find_payload(xen_xsplice_name_t *name, bool_t need_lock,
+                        struct payload **f)
+{
+    struct payload *data;
+    XEN_GUEST_HANDLE_PARAM(char) str;
+    char n[XEN_XSPLICE_NAME_SIZE + 1] = { 0 };
+    int rc = -EINVAL;
+
+    rc = verify_name(name);
+    if ( rc )
+        return rc;
+
+    str = guest_handle_cast(name->name, char);
+    if ( copy_from_guest(n, str, name->size) )
+        return -EFAULT;
+
+    if ( need_lock )
+        spin_lock(&payload_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strcmp(data->name, n) )
+        {
+            *f = data;
+            rc = 0;
+            break;
+        }
+    }
+
+    if ( need_lock )
+        spin_unlock(&payload_lock);
+
+    return rc;
+}
+
+static int verify_payload(xen_sysctl_xsplice_upload_t *upload)
+{
+    if ( verify_name(&upload->name) )
+        return -EINVAL;
+
+    if ( upload->size == 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+/*
+ * We MUST be holding the payload_lock spinlock.
+ */
+static void free_payload(struct payload *data)
+{
+    list_del(&data->list);
+    payload_cnt--;
+    payload_version++;
+    xfree(data);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data = NULL;
+    uint8_t *raw_data;
+    int rc;
+
+    rc = verify_payload(upload);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&upload->name, 1 /* true. */, &data);
+    if ( rc == 0 /* Found. */ )
+        return -EEXIST;
+
+    if ( rc != -ENOENT )
+        return rc;
+
+    data = xzalloc(struct payload);
+    if ( !data )
+        return -ENOMEM;
+
+    memset(data, 0, sizeof *data);
+    rc = -EFAULT;
+    if ( copy_from_guest(data->name, upload->name.name, upload->name.size) )
+        goto err_data;
+
+    rc = -ENOMEM;
+    raw_data = alloc_xenheap_pages(get_order_from_bytes(upload->size), 0);
+    if ( !raw_data )
+        goto err_data;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(raw_data, upload->payload, upload->size) )
+        goto err_raw;
+
+    data->state = XSPLICE_STATE_LOADED;
+    data->rc = 0;
+    INIT_LIST_HEAD(&data->list);
+
+    spin_lock(&payload_lock);
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt++;
+    payload_version++;
+    spin_unlock(&payload_lock);
+
+    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
+    return 0;
+
+ err_raw:
+    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
+ err_data:
+    xfree(data);
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_summary_t *summary)
+{
+    struct payload *data;
+    int rc;
+
+    if ( summary->status.state )
+        return -EINVAL;
+
+    if ( summary->status.rc != 0 )
+        return -EINVAL;
+
+    rc = verify_name(&summary->name);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&summary->name, 1 /* true. */, &data);
+    if ( rc )
+        return rc;
+
+    summary->status.state = data->state;
+    summary->status.rc = data->rc;
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+
+    if ( list->nr > 1024 )
+        return -E2BIG;
+
+    if ( list->pad != 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
+         !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
+         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
+        return -EINVAL;
+
+    spin_lock(&payload_lock);
+    if ( list->idx > payload_cnt || !list->nr )
+    {
+        spin_unlock(&payload_lock);
+        return -EINVAL;
+    }
+
+    list_for_each_entry( data, &payload_list, list )
+    {
+        uint32_t len;
+
+        if ( list->idx > i++ )
+            continue;
+
+        status.state = data->state;
+        status.rc = data->rc;
+        len = strlen(data->name);
+
+        /* N.B. 'idx' != 'i'. */
+        if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
+                                    data->name, len) ||
+             __copy_to_guest_offset(list->len, idx, &len, 1) ||
+             __copy_to_guest_offset(list->status, idx, &status, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        idx++;
+        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
+            break;
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    list->version = payload_version;
+    spin_unlock(&payload_lock);
+
+    /* And how many we have processed. */
+    return rc ? : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_name(&action->name);
+    if ( rc )
+        return rc;
+
+    spin_lock(&payload_lock);
+    rc = find_payload(&action->name, 0 /* We are holding the lock. */, &data);
+    if ( rc )
+        goto out;
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( (data->state == XSPLICE_STATE_LOADED) ||
+             (data->state == XSPLICE_STATE_CHECKED) )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_UNLOAD:
+        if ( (data->state == XSPLICE_STATE_LOADED) ||
+             (data->state == XSPLICE_STATE_CHECKED) )
+        {
+            free_payload(data);
+            /* No touching 'data' from here on! */
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_REVERT:
+        if ( data->state == XSPLICE_STATE_APPLIED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_APPLY:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_APPLIED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_REPLACE:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+ out:
+    spin_unlock(&payload_lock);
+
+    return rc;
+}
+
+int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    if ( xsplice->pad != 0 )
+        return -EINVAL;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
+}
+
+static int __init xsplice_init(void)
+{
+    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    return 0;
+}
+__initcall(xsplice_init);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..d549e7a 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,160 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to http://xenbits.xenproject.org/docs/unstable/misc/xsplice.html
+ * for the design details of this hypercall.
+ */
+
+/*
+ * Structure describing an ELF payload. Uniquely identifies the
+ * payload. Should be human readable.
+ * Recommended length is up to XEN_XSPLICE_NAME_SIZE.
+ */
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_name {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
+    uint16_t size;                          /* IN: size of name. May be up to
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint16_t pad[3];                        /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_name xen_xsplice_name_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_name_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was successfully uploaded.
+ * Otherwise an EXX return value is provided. Duplicate `name` values are
+ * not supported.
+ *
+ * The payload at this point is verified against the basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+    uint64_t size;                          /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve the status of a specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and XEN_EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATE_LOADED       1
+#define XSPLICE_STATE_CHECKED      2
+#define XSPLICE_STATE_APPLIED      3
+    int32_t state;                 /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */
+    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
+                                   /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_summary {
+    xen_xsplice_name_t name;                /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
+};
+typedef struct xen_sysctl_xsplice_summary xen_sysctl_xsplice_summary_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_summary_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns a positive number, it is the number (up to `nr`)
+ * of the payloads returned. `nr` is updated with the number of remaining
+ * payloads and `version` is updated (it may be the same across hypercalls; if
+ * it varies the data is stale and further calls could fail). The `status`,
+ * `name`, and `len` are updated at their designated index value (`idx`) with
+ * the returned value of data.
+ *
+ * If the hypercall returns E2BIG the `nr` is too big and should be
+ * lowered.
+ *
+ * This operation can be preempted by the hypercall returning EAGAIN.
+ * Retry.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed payloads in the meantime, making this information stale. It
+ * is the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. If the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.
+                                               On subsequent calls reuse value.
+                                               If varies between calls, we are
+                                               getting stale data. */
+    uint32_t idx;                           /* IN/OUT: Index into array. */
+    uint32_t nr;                            /* IN: How many status, name, and len
+                                               entries to fill out.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocated for nr of them. */
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
+                                               MUST be XEN_XSPLICE_NAME_SIZE in
+                                               size. Must have nr of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.
+                                               Must have nr of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `name` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+#define XSPLICE_ACTION_REPLACE      5
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
+    uint32_t timeout;                       /* IN: Zero if no timeout. */
+                                            /* Or upper bound of time (ms) */
+                                            /* for operation to take. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_*. */
+    uint32_t pad;                           /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_summary_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +945,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_xsplice_op                    25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +971,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..cf465c4
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,15 @@
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+
+#ifdef CONFIG_XSPLICE
+int xsplice_control(struct xen_sysctl_xsplice_op *);
+#else
+#include <xen/errno.h> /* For -ENOSYS */
+static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
+{
+    return -ENOSYS;
+}
+#endif
+#endif /* __XEN_XSPLICE_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index f63c3e2..c856e1e 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -807,6 +807,12 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+#ifdef CONFIG_XSPLICE
+    case XEN_SYSCTL_xsplice_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__XSPLICE_OP, NULL);
+#endif
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..5f08d05 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.1.0


* [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5).
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-15 12:35   ` Wei Liu
  2016-02-12 18:05 ` [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4) Konrad Rzeszutek Wilk
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, xen-devel
  Cc: Konrad Rzeszutek Wilk

The underlying toolstack code to do the basic
operations when using the XEN_XSPLICE_op sysctl hypercalls:
 - upload the payload,
 - get status of a payload,
 - list all the payloads,
 - apply, check, replace, revert, and unload the payload.
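
For reviewers, the batching that the list operation has to do can be
sketched in isolation as below. This is only an illustration under
assumptions: fake_list(), FAKE_TOTAL, FAKE_MAX_BATCH and list_all() are
made-up stand-ins and not part of libxc or the hypervisor, the cursor is
advanced as start + done by choice of the sketch, and the version/retry
handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the hypervisor; NOT the real sysctl. */
#define FAKE_TOTAL     10u  /* entries the fake hypervisor holds */
#define FAKE_MAX_BATCH 4u   /* largest batch the fake will fill at once */

/*
 * Fill up to 'nr' entries starting at 'idx'. Returns the number of
 * entries filled, or -1 with errno set to E2BIG if 'nr' is too large.
 * '*left' receives the number of entries remaining after this batch.
 */
static int fake_list(unsigned int idx, unsigned int nr,
                     uint32_t *out, unsigned int *left)
{
    unsigned int i, avail, n;

    if ( nr > FAKE_MAX_BATCH )
    {
        errno = E2BIG;
        return -1;
    }
    avail = (idx < FAKE_TOTAL) ? FAKE_TOTAL - idx : 0;
    n = (avail < nr) ? avail : nr;
    for ( i = 0; i < n; i++ )
        out[i] = idx + i;
    *left = avail - n;
    return (int)n;
}

/* Collect up to 'max' entries, halving the batch size on E2BIG. */
static int list_all(unsigned int max, unsigned int start, uint32_t *out,
                    unsigned int *done, unsigned int *left)
{
    unsigned int batch = max;
    int rc;

    assert(out && done && left);
    *done = 0;
    *left = 0;
    while ( *done < max )
    {
        unsigned int nr = max - *done;

        if ( nr > batch )
            nr = batch;
        rc = fake_list(start + *done, nr, out + *done, left);
        if ( rc < 0 && errno == E2BIG && batch > 1 )
        {
            batch >>= 1;    /* Ask for less and try again. */
            continue;
        }
        if ( rc < 0 )
            return rc;      /* Any other error: bail out. */
        *done += rc;
        if ( *left == 0 )
            break;          /* Nothing more to fetch. */
    }
    return 0;
}
```

A caller would pass an array of 'max' entries and, if 'left' is still
non-zero on return, call again with 'start' moved forward by 'done'.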

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v2: Actually set zero for the _pad entries.
v3: Split status into state and error code.
    Add REPLACE action.
v4: Use timeout and utilize pads.
v5: Update per Wei's review.
---
 tools/libxc/include/xenctrl.h |  19 ++-
 tools/libxc/xc_misc.c         | 332 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 350 insertions(+), 1 deletion(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 1a5f4ec..7c666b7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2573,9 +2573,26 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            bool *cdp_enabled);
 #endif
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name, unsigned char *payload, uint32_t size);
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status);
+
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info, char *name,
+                    uint32_t *len, unsigned int *done,
+                    unsigned int *left);
+
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
-
 #endif /* XENCTRL_H */
 
 /*
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 124537b..b0f7068 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -693,6 +693,338 @@ int xc_hvm_inject_trap(
     return rc;
 }
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name,
+                      unsigned char *payload,
+                      uint32_t size)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BUFFER(char, local);
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /* later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name || !payload )
+        return -1;
+
+    def_name.size = strlen(name);
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size );
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    local = xc_hypercall_buffer_alloc(xch, local, size);
+    if ( !local )
+    {
+        xc_hypercall_bounce_post(xch, name);
+        return -1;
+    }
+    memcpy(local, payload, size);
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_UPLOAD;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.upload.size = size;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.payload, local);
+
+    sysctl.u.xsplice.u.upload.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_buffer_free(xch, local);
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /*adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name )
+        return -1;
+
+    def_name.size = strlen(name);
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size );
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_GET;
+    sysctl.u.xsplice.pad = 0;
+
+    sysctl.u.xsplice.u.get.status.state = 0;
+    sysctl.u.xsplice.u.get.status.rc = 0;
+
+    sysctl.u.xsplice.u.get.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.get.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    memcpy(status, &sysctl.u.xsplice.u.get.status, sizeof(*status));
+
+    return rc;
+}
+
+/*
+ * The heart of this function is to get an array of xen_xsplice_status_t.
+ *
+ * However it is complex because it has to deal with the hypervisor
+ * returning -EAGAIN or the data that is being returned becomes stale
+ * (another hypercall might alter the list).
+ *
+ * The parameters that the function expects to contain data from
+ * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
+ * 'left' are also updated with the number of entries filled out
+ * and respectively the number of entries left to get from hypervisor.
+ *
+ * It is expected that the caller of this function will add
+ * 'done' to 'start' for the next call. This way we have a
+ * cursor in the array. Note that the 'info', 'name', and 'len' will
+ * be updated at the subsequent calls.
+ *
+ * The 'max' is to be provided by the caller with the maximum
+ * number of entries that 'info', 'name', and 'len' arrays can
+ * be filled up with.
+ *
+ * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
+ * length.
+ *
+ * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
+ * structure size.
+ *
+ * Each entry in the 'len' array is expected to be of uint32_t size.
+ *
+ * The return value is zero if the hypercall completed successfully.
+ * Note that the return value is _not_ the number of entries filled
+ * out - that is saved in 'done'.
+ *
+ * If there was an error performing the operation, the return value
+ * will contain a negative -EXX type value. The 'done' and 'left'
+ * will contain the number of entries that had been successfully
+ * retrieved (if any).
+ */
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info,
+                    char *name, uint32_t *len,
+                    unsigned int *done,
+                    unsigned int *left)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The sizes are adjusted later - hence zero. */
+    DECLARE_HYPERCALL_BOUNCE(info, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(len, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    uint32_t max_batch_sz, nr;
+    uint32_t version = 0, retries = 0;
+    uint32_t adjust = 0;
+    ssize_t sz;
+
+    if ( !max || !info || !name || !len )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.list.version = 0;
+    sysctl.u.xsplice.u.list.idx = start;
+    sysctl.u.xsplice.u.list.pad = 0;
+
+    max_batch_sz = max;
+    /* Convenience value. */
+    sz = sizeof(*name) * XEN_XSPLICE_NAME_SIZE;
+    *done = 0;
+    *left = 0;
+    do {
+        /*
+         * The first time we go in this loop our 'max' may be bigger
+         * than what the hypervisor is comfortable with - hence the first
+         * couple of loops may adjust the number of entries we will
+         * want filled (tracked by 'nr').
+         */
+        if ( adjust )
+            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
+
+        nr = min(max - *done, max_batch_sz);
+
+        sysctl.u.xsplice.u.list.nr = nr;
+        /* Fix the size (may vary between hypercalls). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
+        HYPERCALL_BOUNCE_SET_SIZE(name, nr * sz);
+        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
+        /* Move the pointer to proper offset into 'info'. */
+        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
+        (HYPERCALL_BUFFER(name))->ubuf = name + (sz * *done);
+        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
+        /* Allocate memory. */
+        rc = xc_hypercall_bounce_pre(xch, info);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, name);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, len);
+        if ( rc )
+            break;
+
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.name, name);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
+
+        rc = do_sysctl(xch, &sysctl);
+        /*
+         * From here on we MUST call xc_hypercall_bounce_post. If rc < 0
+         * we end up doing it (outside the loop), so using a break is OK.
+         */
+        if ( rc < 0 && errno == E2BIG )
+        {
+            if ( max_batch_sz <= 1 )
+                break;
+            max_batch_sz >>= 1;
+            adjust = 1; /* For the loop conditional to let us loop again. */
+            /* No memory leaks! */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+        else if ( rc < 0 ) /* For all other errors we bail out. */
+            break;
+
+        if ( !version )
+            version = sysctl.u.xsplice.u.list.version;
+
+        if ( sysctl.u.xsplice.u.list.version != version )
+        {
+            /* We could make this configurable as parameter? */
+            if ( retries++ > 3 )
+            {
+                rc = -1;
+                errno = EBUSY;
+                break;
+            }
+            *done = 0; /* Retry from scratch. */
+            version = sysctl.u.xsplice.u.list.version;
+            adjust = 1; /* And make sure we continue in the loop. */
+            /* No memory leaks. */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+
+        /* We should never hit this, but just in case. */
+        if ( rc > nr )
+        {
+            errno = EINVAL; /* Overflow! */
+            rc = -1;
+            break;
+        }
+        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
+        /* Copy only up to 'rc' entries - we could use 'min(rc, nr)' if desired. */
+        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
+        HYPERCALL_BOUNCE_SET_SIZE(name, (rc * sz));
+        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
+        /* Bounce the data and free the bounce buffer. */
+        xc_hypercall_bounce_post(xch, info);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, len);
+        /* And update how many elements of info we have copied into. */
+        *done += rc;
+        /* Update idx. */
+        sysctl.u.xsplice.u.list.idx = *done;
+    } while ( adjust || (*done < max && *left != 0) );
+
+    if ( rc < 0 )
+    {
+        xc_hypercall_bounce_post(xch, len);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, info);
+    }
+
+    return rc > 0 ? 0 : rc;
+}
+
+static int _xc_xsplice_action(xc_interface *xch,
+                              char *name,
+                              unsigned int action,
+                              uint32_t timeout)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The size is figured out when we strlen(name) */
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    def_name.size = strlen(name);
+
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_ACTION;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.action.cmd = action;
+    sysctl.u.xsplice.u.action.timeout = timeout;
+
+    sysctl.u.xsplice.u.action.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.action.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_APPLY, timeout);
+}
+
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REVERT, timeout);
+}
+
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_UNLOAD, timeout);
+}
+
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_CHECK, timeout);
+}
+
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REPLACE, timeout);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.1.0


* [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10) Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-15 12:59   ` Wei Liu
  2016-02-12 18:05 ` [PATCH v3 04/23] elf: Add relocation types to elfstructs.h Konrad Rzeszutek Wilk
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, xen-devel
  Cc: Konrad Rzeszutek Wilk

A simple tool that allows a system admin to perform
basic xsplice operations:

 - Upload an xsplice file (with a unique name).
 - List all the xsplice payloads loaded.
 - Apply, revert, replace, unload, or check the payload using the
   unique name.
 - Do all three - upload, check, and apply the
   payload in one go (load). This uses the name of the
   file (without its extension) as the <name>.
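
How 'load' infers the <name> from the file can be sketched on its own as
below. Treat it as an illustration under assumptions, not the tool's
code: name_from_path() is a hypothetical helper and NAME_SIZE is a
stand-in for XEN_XSPLICE_NAME_SIZE whose value here is assumed. It takes
the basename of the path and cuts it at the last dot.

```c
#define _POSIX_C_SOURCE 200809L /* For strdup(). */
#include <assert.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE 128 /* Assumed stand-in for XEN_XSPLICE_NAME_SIZE. */

/*
 * Derive a payload name from 'file': take the basename and strip
 * everything from the last '.' onwards. Returns 0 on success, or -1 if
 * memory is short, the result is empty, or it does not fit in 'size'
 * bytes (including the terminating NUL).
 */
static int name_from_path(const char *file, char *name, size_t size)
{
    char *copy, *base, *dot;
    size_t len;

    assert(file && name);

    /* basename() may modify its argument, so work on a copy. */
    copy = strdup(file);
    if ( !copy )
        return -1;

    base = basename(copy);
    dot = strrchr(base, '.');
    if ( dot != NULL )
        *dot = '\0';

    len = strlen(base);
    if ( len == 0 || len >= size )
    {
        free(copy);
        return -1;
    }
    memcpy(name, base, len + 1); /* Copy the NUL as well. */
    free(copy);
    return 0;
}
```

With this, a path such as /tmp/patches/xen_fix.xsplice would be uploaded
under the name xen_fix, and a file without any dot keeps its name as is.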

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v2:
 - Removed REVERTED state.
 - Fixed bugs handling XSPLICE_STATUS_PROGRESS.
 - Split status into state and error.
   Add REPLACE action.
v3:
 - Utilize the timeout and use the default one (let the hypervisor
   pick it).
 - Change the s/all/load and infer the <id> from name of file.
v4:
 - s/id/name/
 - Don't use hypercall buffer in upload_func, instead do it in libxc
 - Remove the debug printk.
---
 .gitignore               |   1 +
 tools/misc/Makefile      |   4 +
 tools/misc/xen-xsplice.c | 470 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 475 insertions(+)
 create mode 100644 tools/misc/xen-xsplice.c

diff --git a/.gitignore b/.gitignore
index 91f690c..5cae935 100644
--- a/.gitignore
+++ b/.gitignore
@@ -181,6 +181,7 @@ tools/misc/xc_shadow
 tools/misc/xen_cpuperf
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
+tools/misc/xen-xsplice
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index a2ef0ec..e1956f6 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -31,6 +31,7 @@ INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
 INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
+INSTALL_SBIN                   += xen-xsplice
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -99,6 +100,9 @@ xen-mfndump: xen-mfndump.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-xsplice: xen-xsplice.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
new file mode 100644
index 0000000..13f762f
--- /dev/null
+++ b/tools/misc/xen-xsplice.c
@@ -0,0 +1,470 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#include <fcntl.h>
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <xenctrl.h>
+#include <xenstore.h>
+
+static xc_interface *xch;
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-xsplice: Xsplice test tool\n"
+            "Usage: xen-xsplice <command> [args]\n"
+            " <name> A unique name of payload. Up to %d characters.\n"
+            "Commands:\n"
+            "  help                   display this help\n"
+            "  upload <name> <file>   upload file <file> with <name> name\n"
+            "  list                   list payloads uploaded.\n"
+            "  apply <name>           apply <name> patch.\n"
+            "  revert <name>          revert name <name> patch.\n"
+            "  replace <name>         apply <name> patch and revert all others.\n"
+            "  unload <name>          unload name <name> patch.\n"
+            "  check <name>           check name <name> patch.\n"
+            "  load  <file>           upload, check and apply <file>.\n"
+            "                         name is the <file> name\n",
+            XEN_XSPLICE_NAME_SIZE);
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+static const char *state2str(long state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(LOADED),
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+    if (state >= ARRAY_SIZE(names))
+        return "unknown";
+
+    if (state < 0)
+        return "-EXX";
+
+    if (!names[state])
+        return "unknown";
+
+    return names[state];
+}
+
+/* This value was chosen ad hoc. It could be 42 too. */
+#define MAX_LEN 11
+static int list_func(int argc, char *argv[])
+{
+    unsigned int idx, done, left, i;
+    xen_xsplice_status_t *info = NULL;
+    char *name = NULL;
+    uint32_t *len = NULL;
+    int rc = ENOMEM;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+    idx = left = 0;
+    info = malloc(sizeof(*info) * MAX_LEN);
+    if ( !info )
+        goto out;
+    name = malloc(sizeof(*name) * XEN_XSPLICE_NAME_SIZE * MAX_LEN);
+    if ( !name )
+        goto out;
+    len = malloc(sizeof(*len) * MAX_LEN);
+    if ( !len )
+        goto out;
+
+    fprintf(stdout," ID                                     | status\n"
+                   "----------------------------------------+------------\n");
+    do {
+        done = 0;
+        /* The memset is done to catch errors. */
+        memset(info, 'A', sizeof(*info) * MAX_LEN);
+        memset(name, 'B', sizeof(*name) * MAX_LEN * XEN_XSPLICE_NAME_SIZE);
+        memset(len, 'C', sizeof(*len) * MAX_LEN);
+        rc = xc_xsplice_list(xch, MAX_LEN, idx, info, name, len, &done, &left);
+        if ( rc )
+        {
+            fprintf(stderr, "Failed to list %d/%d: %d(%s)!\n",
+                    idx, left, errno, strerror(errno));
+            break;
+        }
+        for ( i = 0; i < done; i++ )
+        {
+            unsigned int j;
+            uint32_t sz;
+            char *str;
+
+            sz = len[i];
+            str = name + (i * XEN_XSPLICE_NAME_SIZE);
+            for ( j = sz; j < XEN_XSPLICE_NAME_SIZE; j++ )
+                str[j] = '\0';
+
+            printf("%-40s| %s", str, state2str(info[i].state));
+            if ( info[i].rc )
+                printf(" (%d, %s)\n", -info[i].rc, strerror(-info[i].rc));
+            else
+                puts("");
+        }
+        idx += done;
+    } while ( left );
+
+out:
+    free(name);
+    free(info);
+    free(len);
+    return rc;
+}
+#undef MAX_LEN
+
+static int get_name(int argc, char *argv[], char *name)
+{
+    ssize_t len = strlen(argv[0]);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+    {
+        fprintf(stderr, "Name MUST be at most %d characters!\n", XEN_XSPLICE_NAME_SIZE);
+        errno = EINVAL;
+        return errno;
+    }
+    /* Don't want any funny strings from the stack. */
+    memset(name, 0, XEN_XSPLICE_NAME_SIZE);
+    strncpy(name, argv[0], len);
+    return 0;
+}
+
+static int upload_func(int argc, char *argv[])
+{
+    char *filename;
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int fd = 0, rc;
+    struct stat buf;
+    unsigned char *fbuf;
+    ssize_t len;
+
+    if ( argc != 2 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    filename = argv[1];
+    fd = open(filename, O_RDONLY);
+    if ( fd < 0 )
+    {
+        fprintf(stderr, "Could not open %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        return errno;
+    }
+    if ( stat(filename, &buf) != 0 )
+    {
+        fprintf(stderr, "Could not get right size %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close(fd);
+        return errno;
+    }
+
+    len = buf.st_size;
+    fbuf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
+    if ( fbuf == MAP_FAILED )
+    {
+        fprintf(stderr,"Could not map: %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close (fd);
+        return errno;
+    }
+    printf("Uploading %s (%zd bytes)\n", filename, len);
+    rc = xc_xsplice_upload(xch, name, fbuf, len);
+    if ( rc )
+    {
+        fprintf(stderr, "Upload failed: %s, error: %d(%s)!\n",
+                filename, errno, strerror(errno));
+        goto out;
+    }
+out:
+    if ( munmap( fbuf, len) )
+    {
+        fprintf(stderr, "Could not unmap!? error: %d(%s)!\n",
+                errno, strerror(errno));
+        rc = errno;
+    }
+    close(fd);
+
+    return rc;
+}
+
+/* These MUST match to the 'action_options[]' array slots. */
+enum {
+    ACTION_APPLY = 0,
+    ACTION_REVERT = 1,
+    ACTION_UNLOAD = 2,
+    ACTION_CHECK = 3,
+    ACTION_REPLACE = 4,
+};
+
+struct {
+    int allow; /* State it must be in to call function. */
+    int expected; /* The state to be in after the function. */
+    const char *name;
+    int (*function)(xc_interface *xch, char *name, uint32_t timeout);
+    unsigned int executed; /* Has the function been called? */
+} action_options[] = {
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "apply",
+        .function = xc_xsplice_apply,
+    },
+    {   .allow = XSPLICE_STATE_APPLIED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "revert",
+        .function = xc_xsplice_revert,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
+        .expected = -ENOENT,
+        .name = "unload",
+        .function = xc_xsplice_unload,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "check",
+        .function = xc_xsplice_check
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "replace",
+        .function = xc_xsplice_replace,
+    },
+};
+
+/* Go around 300 * 0.1 seconds = 30 seconds. */
+#define RETRIES 300
+/* aka 0.1 second */
+#define DELAY 100000
+
+int action_func(int argc, char *argv[], unsigned int idx)
+{
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int rc, original_state;
+    xen_xsplice_status_t status;
+    unsigned int retry = 0;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( idx >= ARRAY_SIZE(action_options) )
+        return -1;
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    /* Check initial status. */
+    rc = xc_xsplice_get(xch, name, &status);
+    if ( rc )
+        goto err;
+
+    if ( status.rc == -EAGAIN )
+    {
+        printf("%s failed. Operation already in progress\n", name);
+        return -1;
+    }
+
+    if ( status.state == action_options[idx].expected )
+    {
+        printf("No action needed\n");
+        return 0;
+    }
+
+    /* Perform action. */
+    if ( action_options[idx].allow & status.state )
+    {
+        printf("Performing %s:", action_options[idx].name);
+        rc = action_options[idx].function(xch, name, 0);
+        if ( rc )
+            goto err;
+    }
+    else
+    {
+        printf("%s: in wrong state (%s), expected (%s)\n",
+               name, state2str(status.state),
+               state2str(action_options[idx].expected));
+        return -1;
+    }
+
+    original_state = status.state;
+    do {
+        rc = xc_xsplice_get(xch, name, &status);
+        if ( rc )
+        {
+            rc = -errno;
+            break;
+        }
+
+        if ( status.state != original_state )
+            break;
+        if ( status.rc && status.rc != -EAGAIN )
+        {
+            rc = status.rc;
+            break;
+        }
+
+        printf(".");
+        fflush(stdout);
+        usleep(DELAY);
+    } while ( ++retry < RETRIES );
+
+    if ( retry >= RETRIES )
+    {
+        printf("%s: Operation didn't complete after 30 seconds.\n", name);
+        return -1;
+    }
+    else
+    {
+        if ( rc == 0 )
+            rc = status.state;
+
+        if ( action_options[idx].expected == rc )
+            printf(" completed\n");
+        else if ( rc < 0 )
+        {
+            printf("%s failed with %d(%s)\n", name, -rc, strerror(-rc));
+            return -1;
+        }
+        else
+        {
+            printf("%s: in wrong state (%s), expected (%s)\n",
+               name, state2str(rc),
+               state2str(action_options[idx].expected));
+            return -1;
+        }
+    }
+
+    return 0;
+
+ err:
+    printf("%s failed with %d(%s)\n", name, -rc, strerror(-rc));
+    return rc;
+}
+
+static int load_func(int argc, char *argv[])
+{
+    int rc;
+    char *new_argv[2];
+    char *path, *name, *lastdot;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+    /* <file> */
+    new_argv[1] = argv[0];
+
+    /* Synthesize the <id> */
+    path = strdup(argv[0]);
+
+    name = basename(path);
+    lastdot = strrchr(name, '.');
+    if (lastdot != NULL)
+        *lastdot = '\0';
+    new_argv[0] = name;
+
+    rc = upload_func(2 /* <id> <file> */, new_argv);
+    if ( rc )
+    {
+        free(path); /* Don't leak the strdup'ed path on failure. */
+        return rc;
+    }
+
+    rc = action_func(1 /* only <id> */, new_argv, ACTION_CHECK);
+    if ( rc )
+        goto unload;
+
+    rc = action_func(1 /* only <id> */, new_argv, ACTION_APPLY);
+    if ( rc )
+        goto unload;
+
+    free(path);
+    return 0;
+unload:
+    action_func(1, new_argv, ACTION_UNLOAD);
+    free(path);
+    return rc;
+}
+
+/*
+ * There are also functions in action_options that are called in case
+ * none of these match.
+ */
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "list", list_func },
+    { "upload", upload_func },
+    { "load", load_func },
+};
+
+int main(int argc, char *argv[])
+{
+    int i, j, ret;
+
+    if ( argc  <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
+            if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+                break;
+
+        if ( j == ARRAY_SIZE(action_options) )
+        {
+            fprintf(stderr, "Unrecognised command '%s' -- try "
+                   "'xen-xsplice help'\n", argv[1]);
+            return 1;
+        }
+    } else
+        j = ARRAY_SIZE(action_options);
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get the handler\n");
+        return 1;
+    }
+
+    if ( i == ARRAY_SIZE(main_options) )
+        ret = action_func(argc -2, argv + 2, j);
+    else
+        ret = main_options[i].function(argc -2, argv + 2);
+
+    xc_interface_close(xch);
+
+    return !!ret;
+}
-- 
2.1.0


* [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 20:13   ` Andrew Cooper
  2016-02-15  8:34   ` Jan Beulich
  2016-02-12 18:05 ` [PATCH v3 05/23] xsplice: Add helper elf routines (v4) Konrad Rzeszutek Wilk
                   ` (19 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Slim the list as we do not use all of them.
---
 xen/include/xen/elfstructs.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 12ffb82..4ff3258 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -348,6 +348,14 @@ typedef struct {
 #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
 #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
 
+/* x86-64 relocation types. We list only the ones we implement. */
+#define R_X86_64_NONE		0	/* No reloc */
+#define R_X86_64_64		1	/* Direct 64 bit  */
+#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
+#define R_X86_64_PLT32		4	/* 32 bit PLT address */
+#define R_X86_64_32		10	/* Direct 32 bit zero extended */
+#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
+
 /* Program Header */
 typedef struct {
 	Elf32_Word	p_type;		/* segment type */
-- 
2.1.0


* [PATCH v3 05/23] xsplice: Add helper elf routines (v4)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 04/23] elf: Add relocation types to elfstructs.h Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 20:24   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 06/23] xsplice: Implement payload loading (v4) Konrad Rzeszutek Wilk
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add Elf routines and data structures in preparation for loading an
xSplice payload.
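
As an illustration of the kind of check these routines perform, here is a
minimal sketch of a section header bounds check that cannot be defeated by
integer wrap-around. The helper name shdr_in_bounds and its parameters are
hypothetical and are not part of this patch; shoff and shentsize stand for
e_shoff and e_shentsize, and shdr_size for sizeof(Elf_Shdr).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this patch): return true if section
 * header number 'idx' lies entirely inside a file of 'len' bytes.
 */
static bool shdr_in_bounds(uint64_t len, uint64_t shoff, uint64_t shentsize,
                           uint64_t idx, uint64_t shdr_size)
{
    uint64_t off;

    /* An entry smaller than the header structure cannot be parsed. */
    if ( shentsize < shdr_size )
        return false;

    /* idx * shentsize must not wrap around. */
    if ( shentsize && idx > UINT64_MAX / shentsize )
        return false;
    off = idx * shentsize;

    /* shoff + off must not wrap around either. */
    if ( off > UINT64_MAX - shoff )
        return false;
    off += shoff;

    /* Written as a subtraction so that off + shdr_size cannot wrap. */
    return off <= len && shdr_size <= len - off;
}
```

The subtraction in the last line is the design choice worth noting: a check
written as "off + size > len" can pass by accident when the addition wraps.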

We also add a macro that prints where ELF parsing failed. The print is
only compiled into debug builds; production (debug=n) builds only return
the error value.
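
One possible shape for such a macro is sketched below. This is not the
macro from this patch: printf-style output stands in for printk(),
DEBUG_TRACE is a hypothetical switch standing in for debug=y, and
check_len is a made-up caller. Wrapping the debug variant in
do { } while ( 0 ) lets it be used as a single statement, including in an
if/else without braces.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Sketch only: fprintf() stands in for printk(), and DEBUG_TRACE is a
 * hypothetical switch standing in for the debug=y build option.
 */
#ifdef DEBUG_TRACE
#define return_(x) do {                                             \
        int rc_ = (x);                                              \
        fprintf(stderr, "%s:%d rc: %d\n", __func__, __LINE__, rc_); \
        return rc_;                                                 \
    } while ( 0 )
#else
#define return_(x) return (x)
#endif

/* Hypothetical caller: reject a header that does not fit in the file. */
static int check_len(size_t hdr_size, size_t file_len)
{
    if ( hdr_size > file_len )
        return_(-22); /* -EINVAL */
    else
        return_(0);
}
```

Evaluating the argument once into rc_ also avoids evaluating an expression
with side effects twice.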

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: - With the #define ELFSIZE in the ARM file we can use the common
     #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
    patch.
    - Add checks for ELF file.
    - Add name to be printed.
    - Add len for easier ELF checks.
    - Expand on the checks. Add macro.
v3: Remove the return_ macro
v4: Add return_ macro back but make it depend on debug=y
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 205 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice_elf.h |  37 ++++++++
 3 files changed, 243 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 43b3911..a8ceaff 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -74,3 +74,4 @@ subdir-y += libelf
 subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
 
 obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += xsplice_elf.o
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..d9f9002
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,205 @@
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+#ifdef NDEBUG
+#define return_(x) return x
+#else
+#define return_(x) { printk(XENLOG_DEBUG "%s:%d rc: %d\n",  \
+                            __func__,__LINE__, x); return x; }
+#endif
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name)
+{
+    unsigned int i;
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_resolve_sections(struct xsplice_elf *elf, uint8_t *data)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        printk(XENLOG_ERR "Could not allocate memory for section table!\n");
+        return_(-ENOMEM);
+    }
+
+    /* N.B. We also will ingest SHN_UNDEF sections. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        ssize_t delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
+
+        if ( delta + sizeof(Elf_Shdr) > elf->len )
+            return_(-EINVAL);
+
+        sec[i].sec = (Elf_Shdr *)(data + delta);
+        delta = sec[i].sec->sh_offset;
+
+        if ( delta > elf->len )
+            return_(-EINVAL);
+
+        sec[i].data = data + delta;
+        /* Name is populated in elf_resolve_section_names. */
+        sec[i].name = NULL;
+
+        if ( sec[i].sec->sh_type == SHT_SYMTAB )
+        {
+                if ( elf->symtab )
+                    return_(-EINVAL);
+                elf->symtab = &sec[i];
+                /* sh_link names the matching string table section, but we
+                 * have not finished parsing all the sections yet. */
+                if ( elf->symtab->sec->sh_link >= elf->hdr->e_shnum )
+                    return_(-EINVAL);
+        }
+    }
+    elf->sec = sec;
+    if ( !elf->symtab )
+        return_(-EINVAL);
+
+    /* There can be multiple SHT_STRTAB so pick the right one. */
+    elf->strtab = &sec[elf->symtab->sec->sh_link];
+
+    if ( elf->symtab->sec->sh_size == 0 || elf->symtab->sec->sh_entsize == 0 )
+        return_(-EINVAL);
+
+    if ( elf->symtab->sec->sh_entsize != sizeof(Elf_Sym) )
+        return_(-EINVAL);
+
+    return 0;
+}
+
+static int elf_resolve_section_names(struct xsplice_elf *elf, uint8_t *data)
+{
+    const char *shstrtab;
+    unsigned int i;
+    unsigned int offset, delta;
+
+    /* The elf->sec[0 -> e_shnum] structures have been verified by elf_resolve_sections */
+    /* Find file offset for section string table. */
+    offset =  elf->sec[elf->hdr->e_shstrndx].sec->sh_offset;
+
+    if ( offset > elf->len )
+        return_(-EINVAL);
+
+    shstrtab = (const char *)(data + offset);
+
+    /* We could ignore the first as it is reserved.. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->sec[i].sec->sh_name;
+
+        if ( offset + delta > elf->len )
+            return_(-EINVAL);
+
+        elf->sec[i].name = shstrtab + delta;
+    }
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, uint8_t *data)
+{
+    struct xsplice_elf_sec *symtab_sec, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    unsigned int i, delta, offset;
+
+    symtab_sec = elf->symtab;
+
+    strtab_sec = elf->strtab;
+
+    /* Pointer arithmetic to get file offset. */
+    offset = strtab_sec->data - data;
+
+    ASSERT( offset == strtab_sec->sec->sh_offset );
+    /* symtab_sec->data was computed in elf_resolve_sections. */
+    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data );
+
+    /* No need to check values as elf_resolve_sections did it. */
+    elf->nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, elf->nsym);
+    if ( !sym )
+    {
+        printk(XENLOG_ERR "%s: Could not allocate memory for symbols\n", elf->name);
+        return_(-ENOMEM);
+    }
+
+    for ( i = 0; i < elf->nsym; i++ )
+    {
+        Elf_Sym *s;
+
+        if ( i * sizeof(Elf_Sym) > elf->len )
+            return_(-EINVAL);
+
+        s = &((Elf_Sym *)symtab_sec->data)[i];
+
+        /* If st_name is STN_UNDEF it is zero, so the check below passes. */
+        delta = s->st_name;
+        /* Offset has been computed earlier. */
+        if ( offset + delta > elf->len )
+            return_(-EINVAL);
+
+        sym[i].sym = s;
+        if ( s->st_name == STN_UNDEF )
+            sym[i].name = NULL;
+        else
+            sym[i].name = (const char *)data + ( delta + offset );
+    }
+    elf->sym = sym;
+
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data)
+{
+    int rc;
+
+    elf->hdr = (Elf_Ehdr *)data;
+
+    if ( sizeof(*elf->hdr) >= elf->len )
+        return_(-EINVAL);
+
+    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
+        return_(-EINVAL);
+
+    /* Check that section name index is within the sections. */
+    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
+        return_(-EINVAL);
+
+    rc = elf_resolve_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_section_names(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    elf->sec = NULL;
+    xfree(elf->sym);
+    elf->sym = NULL;
+    elf->nsym = 0;
+    elf->name = NULL;
+    elf->len = 0;
+}
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..42dbc6f
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,37 @@
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an Elf file as consumed by xSplice. */
+struct xsplice_elf_sec {
+    Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
+    const char *name;              /* Human readable name hooked in
+                                      elf_resolve_section_names. */
+    const uint8_t *data;           /* Pointer to the section (done by
+                                      elf_resolve_sections). */
+};
+
+struct xsplice_elf_sym {
+    Elf_Sym *sym;
+    const char *name;
+};
+
+struct xsplice_elf {
+    const char *name;              /* Pointer to payload->name. */
+    ssize_t len;                   /* Length of the ELF file. */
+    Elf_Ehdr *hdr;                 /* ELF file. */
+    struct xsplice_elf_sec *sec;   /* Array of sections, allocated by us. */
+    struct xsplice_elf_sym *sym;   /* Array of symbols , allocated by us. */
+    unsigned int nsym;
+    struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
+    struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+};
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
-- 
2.1.0


* [PATCH v3 06/23] xsplice: Implement payload loading (v4)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 05/23] xsplice: Add helper elf routines (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 20:48   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5) Konrad Rzeszutek Wilk
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Stefano Stabellini,
	Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the elf file.
- Parse the elf file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region.
- Resolve section symbols. All other symbols must be absolute addresses.
- Perform relocations.
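
To illustrate the "Perform relocations" step, here is a minimal sketch of
applying a single RELA relocation with a range check on every 32-bit
field. The helper apply_rela is hypothetical and is not the code in this
patch; the type numbers are the ones from the x86-64 psABI. memcpy() is
used for the stores so the sketch makes no alignment assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the x86-64 psABI. */
#define R_X86_64_64    1
#define R_X86_64_PC32  2
#define R_X86_64_32   10
#define R_X86_64_32S  11

/*
 * Hypothetical helper: apply one RELA relocation.
 *   dest      - where to write (the bytes being patched)
 *   dest_addr - the address those bytes will live at ("P")
 *   val       - symbol value plus addend ("S + A")
 * Returns 0 on success, -1 if the value does not fit or the type is
 * not handled.
 */
static int apply_rela(uint8_t *dest, uint64_t dest_addr, uint64_t val,
                      unsigned int type)
{
    uint32_t u32;
    int32_t s32;
    int64_t rel;

    switch ( type )
    {
    case R_X86_64_64:
        memcpy(dest, &val, sizeof(val));
        return 0;

    case R_X86_64_32:       /* Must zero-extend back to the same value. */
        if ( val > UINT32_MAX )
            return -1;
        u32 = (uint32_t)val;
        memcpy(dest, &u32, sizeof(u32));
        return 0;

    case R_X86_64_32S:      /* Must sign-extend back to the same value. */
        if ( (int64_t)val < INT32_MIN || (int64_t)val > INT32_MAX )
            return -1;
        s32 = (int32_t)(int64_t)val;
        memcpy(dest, &s32, sizeof(s32));
        return 0;

    case R_X86_64_PC32:     /* S + A - P, as a signed 32-bit distance. */
        rel = (int64_t)(val - dest_addr);
        if ( rel < INT32_MIN || rel > INT32_MAX )
            return -1;
        s32 = (int32_t)rel;
        memcpy(dest, &s32, sizeof(s32));
        return 0;

    default:
        return -1;
    }
}
```

Unlike the 32 and 32S cases, a PC-relative value is easy to overlook when
adding overflow checks, so the sketch checks it as well.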

Note that the structure 'xsplice_patch_func' differs slightly from the
design: it takes 8 bytes from the padding, which we reserve for our own use.
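
For the "Copy allocated sections into the new region" step above, the
region size and the offset of each section have to be computed before
anything is copied. A minimal sketch of that computation follows; struct
sec_layout, round_up and layout_sections are hypothetical names used for
illustration only and do not appear in this patch.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one section to be loaded. */
struct sec_layout {
    size_t size;    /* sh_size */
    size_t align;   /* sh_addralign; 0 and 1 both mean "no constraint" */
    size_t offset;  /* filled in: offset of the section in the region */
};

/* Round 'x' up to a multiple of 'a'; 'a' must be a non-zero power of two. */
static size_t round_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Assign each section an aligned offset and return the total number of
 * bytes the region needs.
 */
static size_t layout_sections(struct sec_layout *secs, size_t n)
{
    size_t i, total = 0;

    for ( i = 0; i < n; i++ )
    {
        /* Treat an alignment of 0 like 1 so round_up() stays valid. */
        size_t align = secs[i].align ? secs[i].align : 1;

        secs[i].offset = round_up(total, align);
        total = secs[i].offset + secs[i].size;
    }

    return total;
}
```

For example, three sections of 10, 5 and 8 bytes with alignments 16, 0 and
8 end up at offsets 0, 10 and 16 in a 24 byte region.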

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
v4:
    - Add the return_ macro back - but only use it during debug builds.
---
 xen/arch/arm/Makefile             |   1 +
 xen/arch/arm/xsplice.c            |  23 ++++
 xen/arch/x86/Makefile             |   1 +
 xen/arch/x86/setup.c              |   7 ++
 xen/arch/x86/xsplice.c            | 113 ++++++++++++++++++
 xen/common/xsplice.c              | 243 +++++++++++++++++++++++++++++++++++++-
 xen/common/xsplice_elf.c          |  84 +++++++++++++
 xen/include/asm-x86/x86_64/page.h |   2 +
 xen/include/xen/xsplice.h         |  12 ++
 xen/include/xen/xsplice_elf.h     |   5 +
 10 files changed, 489 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 1783912..f144c14 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,6 +39,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..8d85fa9
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,23 @@
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
+{
+    return -ENOSYS;
+}
+
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 8e6e901..f7d3e39 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -63,6 +63,7 @@ obj-y += vm_event.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b8a28d7..afa074a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -99,6 +99,9 @@ unsigned long __read_mostly xen_phys_start;
 
 unsigned long __read_mostly xen_virt_end;
 
+unsigned long __read_mostly module_virt_start;
+unsigned long __read_mostly module_virt_end;
+
 DEFINE_PER_CPU(struct tss_struct, init_tss);
 
 char __section(".bss.stack_aligned") cpu0_stack[STACK_SIZE];
@@ -1146,6 +1149,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                    ~((1UL << L2_PAGETABLE_SHIFT) - 1);
     destroy_xen_mappings(xen_virt_end, XEN_VIRT_START + BOOTSTRAP_MAP_BASE);
 
+    module_virt_start = xen_virt_end;
+    module_virt_end = XEN_VIRT_END - NR_CPUS * PAGE_SIZE;
+    BUG_ON(module_virt_end <= module_virt_start);
+
     memguard_init();
 
     nr_pages = 0;
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..814dd52
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,113 @@
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+#ifdef NDEBUG
+#define return_(x) return x
+#else
+#define return_(x) { printk(XENLOG_DEBUG "%s:%d rc: %d\n",  \
+                            __func__,__LINE__, x); return x; }
+#endif
+
+int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
+{
+
+    Elf_Ehdr *hdr = (Elf_Ehdr *)data;
+
+    if ( elf->len < (sizeof *hdr) ||
+         !IS_ELF(*hdr) ||
+         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
+         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
+         hdr->e_machine != EM_X86_64 ||
+         hdr->e_type != ET_REL ||
+         hdr->e_phnum != 0 )
+    {
+        printk(XENLOG_ERR "%s: Invalid ELF file.\n", elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela)
+{
+    printk(XENLOG_ERR "%s: SHT_REL relocation unsupported\n", elf->name);
+    return -ENOSYS;
+}
+
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela)
+{
+    Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    if ( !rela->sec->sh_entsize || !rela->sec->sh_size )
+        return_(-EINVAL);
+
+    if ( rela->sec->sh_entsize != sizeof(Elf_RelA) )
+        return_(-EINVAL);
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = (Elf_RelA *)(rela->data + i * rela->sec->sh_entsize);
+        if ( (unsigned long)r > ((unsigned long)elf->hdr + elf->len) )
+            return_(-EINVAL);
+
+        symndx = ELF64_R_SYM(r->r_info);
+        if ( symndx >= elf->nsym )
+            return_(-EINVAL);
+
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+            case R_X86_64_NONE:
+                break;
+            case R_X86_64_64:
+                *(uint64_t *)dest = val;
+                break;
+            case R_X86_64_32:
+                *(uint32_t *)dest = val;
+                if (val != *(uint32_t *)dest)
+                    goto overflow;
+                break;
+            case R_X86_64_32S:
+                *(int32_t *)dest = val;
+                if ((int64_t)val != *(int32_t *)dest)
+                    goto overflow;
+                break;
+            case R_X86_64_PLT32:
+                /*
+                 * Xen uses -fpic which normally uses PLT relocations
+                 * except that it sets visibility to hidden which means
+                 * that they are not used.  However, when gcc cannot
+                 * inline memcpy it emits memcpy with default visibility
+                 * which then creates a PLT relocation.  It can just be
+                 * treated the same as R_X86_64_PC32.
+                 */
+                /* Fall through */
+            case R_X86_64_PC32:
+                *(uint32_t *)dest = val - (uint64_t)dest;
+                break;
+            default:
+                printk(XENLOG_ERR "%s: Unhandled relocation %lu\n",
+                       elf->name, ELF64_R_TYPE(r->r_info));
+                return -EINVAL;
+        }
+    }
+
+    return 0;
+
+ overflow:
+    printk(XENLOG_ERR "%s: Overflow in relocation %u in %s for %s\n",
+           elf->name, i, rela->name, base->name);
+    return -EOVERFLOW;
+}
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 125d9b8..fbd6129 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -11,6 +11,7 @@
 #include <xen/sched.h>
 #include <xen/smp.h>
 #include <xen/spinlock.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -26,9 +27,15 @@ struct payload {
     int32_t state;                       /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    void *payload_address;               /* Virtual address mapped. */
+    size_t payload_pages;                /* Nr of the pages. */
+
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
+static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len);
+static void free_payload_data(struct payload *payload);
+
 static const char *state2str(int32_t state)
 {
 #define STATE(x) [XSPLICE_STATE_##x] = #x
@@ -58,8 +65,9 @@ static void xsplice_printall(unsigned char key)
     spin_lock(&payload_lock);
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p using %zu pages.\n", data->name,
+               state2str(data->state), data->state, data->payload_address,
+               data->payload_pages);
 
     spin_unlock(&payload_lock);
 }
@@ -136,6 +144,7 @@ static void free_payload(struct payload *data)
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
@@ -174,6 +183,10 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
     if ( copy_from_guest(raw_data, upload->payload, upload->size) )
         goto err_raw;
 
+    rc = load_payload_data(data, raw_data, upload->size);
+    if ( rc )
+        goto err_raw;
+
     data->state = XSPLICE_STATE_LOADED;
     data->rc = 0;
     INIT_LIST_HEAD(&data->list);
@@ -378,6 +391,232 @@ int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
     return rc;
 }
 
+#ifdef CONFIG_X86
+static void find_hole(ssize_t pages, unsigned long *hole_start,
+                      unsigned long *hole_end)
+{
+    struct payload *data, *data2;
+
+    spin_lock(&payload_lock);
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        list_for_each_entry ( data2, &payload_list, list )
+        {
+            unsigned long start, end;
+
+            start = (unsigned long)data2->payload_address;
+            end = start + data2->payload_pages * PAGE_SIZE;
+            if ( *hole_end > start && *hole_start < end )
+            {
+                *hole_start = end;
+                *hole_end = *hole_start + pages * PAGE_SIZE;
+                break;
+            }
+        }
+        if ( &data2->list == &payload_list )
+            break;
+    }
+    spin_unlock(&payload_lock);
+}
+
+/*
+ * The following functions prepare an xSplice payload to be executed by
+ * allocating space, loading the allocated sections, resolving symbols,
+ * performing relocations, etc.
+ */
+static void *alloc_payload(size_t size)
+{
+    mfn_t *mfn, *mfn_ptr;
+    size_t pages, i;
+    struct page_info *pg;
+    unsigned long hole_start, hole_end, cur;
+
+    ASSERT(size);
+
+    /*
+     * Copied from vmalloc which allocates pages and then maps them to an
+     * arbitrary virtual address with PAGE_HYPERVISOR. We need specific
+     * virtual address with PAGE_HYPERVISOR_RWX.
+     */
+    pages = PFN_UP(size);
+    mfn = xmalloc_array(mfn_t, pages);
+    if ( mfn == NULL )
+        return NULL;
+
+    for ( i = 0; i < pages; i++ )
+    {
+        pg = alloc_domheap_page(NULL, 0);
+        if ( pg == NULL )
+            goto error;
+        mfn[i] = _mfn(page_to_mfn(pg));
+    }
+
+    hole_start = (unsigned long)module_virt_start;
+    hole_end = hole_start + pages * PAGE_SIZE;
+    find_hole(pages, &hole_start, &hole_end);
+
+    if ( hole_end >= module_virt_end )
+        goto error;
+
+    for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE )
+    {
+        if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) )
+        {
+            if ( cur != hole_start )
+                destroy_xen_mappings(hole_start, cur);
+            goto error;
+        }
+    }
+    xfree(mfn);
+    return (void *)hole_start;
+
+ error:
+    while ( i-- )
+        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
+    xfree(mfn);
+    return NULL;
+}
+#else
+static void *alloc_payload(size_t size)
+{
+    return NULL;
+}
+#endif
+
+static void free_payload_data(struct payload *payload)
+{
+    unsigned int i;
+    struct page_info *pg;
+    PAGE_LIST_HEAD(pg_list);
+    void *va = payload->payload_address;
+    unsigned long addr = (unsigned long)va;
+
+    if ( !va )
+        return;
+
+    payload->payload_address = NULL;
+
+    for ( i = 0; i < payload->payload_pages; i++ )
+        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
+
+    destroy_xen_mappings(addr, addr + payload->payload_pages * PAGE_SIZE);
+
+    while ( (pg = page_list_remove_head(&pg_list)) != NULL )
+        free_domheap_page(pg);
+
+    payload->payload_pages = 0;
+}
+
+static void calc_section(struct xsplice_elf_sec *sec, size_t *size)
+{
+    size_t align_size = ROUNDUP(*size, sec->sec->sh_addralign);
+    sec->sec->sh_entsize = align_size;
+    *size = sec->sec->sh_size + align_size;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    uint8_t *buf;
+    unsigned int i;
+    size_t size = 0;
+
+    /* Compute text regions */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
+             (SHF_ALLOC|SHF_EXECINSTR) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    /* Compute rw data */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    /* Compute ro data */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    buf = alloc_payload(size);
+    if ( !buf ) {
+        printk(XENLOG_ERR "%s: Could not allocate memory for module\n",
+               elf->name);
+        return -ENOMEM;
+    }
+    memset(buf, 0, size);
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
+            /* Don't copy NOBITS - such as BSS. */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                printk(XENLOG_DEBUG "%s: Loaded %s at 0x%p\n",
+                       elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+        }
+    }
+
+    payload->payload_address = buf;
+    payload->payload_pages = PFN_UP(size);
+
+    return 0;
+}
+
+static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
+{
+    struct xsplice_elf elf;
+    int rc = 0;
+
+    memset(&elf, 0, sizeof(elf));
+    elf.name = payload->name;
+    elf.len = len;
+
+    rc = xsplice_verify_elf(&elf, raw);
+    if ( rc )
+        return rc;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto err_elf;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto err_elf;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto err_payload;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto err_payload;
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+    return 0;
+
+ err_payload:
+    free_payload_data(payload);
+ err_elf:
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index d9f9002..0717263 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -203,3 +203,87 @@ void xsplice_elf_free(struct xsplice_elf *elf)
     elf->name = NULL;
     elf->len = 0;
 }
+
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+
+    /*
+     * The first entry of an ELF symbol table is the "undefined symbol index".
+     * aka reserved so we skip it.
+     */
+    ASSERT( elf->sym );
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        switch ( elf->sym[i].sym->st_shndx )
+        {
+            case SHN_COMMON:
+                printk(XENLOG_ERR "%s: Unexpected common symbol: %s\n",
+                       elf->name, elf->sym[i].name);
+                return_(-EINVAL);
+                break;
+            case SHN_UNDEF:
+                printk(XENLOG_ERR "%s: Unknown symbol: %s\n", elf->name,
+                       elf->sym[i].name);
+                return_(-ENOENT);
+                break;
+            case SHN_ABS:
+                printk(XENLOG_DEBUG "%s: Absolute symbol: %s => 0x%p\n",
+                      elf->name, elf->sym[i].name,
+                      (void *)elf->sym[i].sym->st_value);
+                break;
+            default:
+                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
+                {
+                    elf->sym[i].sym->st_value +=
+                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
+                    printk(XENLOG_DEBUG "%s: Symbol resolved: %s => 0x%p\n",
+                           elf->name, elf->sym[i].name,
+                           (void *)elf->sym[i].sym->st_value);
+                }
+        }
+    }
+
+    return 0;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *rela, *base;
+    unsigned int i;
+    int rc;
+
+    /*
+     * The first entry of the ELF section header table is reserved
+     * (SHN_UNDEF), so we skip it.
+     */
+    ASSERT( elf->sym );
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        rela = &elf->sec[i];
+
+        if ( (rela->sec->sh_type != SHT_RELA ) &&
+             (rela->sec->sh_type != SHT_REL ) )
+            continue;
+
+         /* Is it a valid relocation section? */
+         if ( rela->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+         base = &elf->sec[rela->sec->sh_info];
+
+         /* Don't relocate non-allocated sections. */
+         if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( elf->sec[i].sec->sh_type == SHT_RELA )
+            rc = xsplice_perform_rela(elf, base, rela);
+        else /* SHT_REL */
+            rc = xsplice_perform_rel(elf, base, rela);
+
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..e6f08e9 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -38,6 +38,8 @@
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern unsigned long module_virt_start;
+extern unsigned long module_virt_end;
 
 #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
 #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index cf465c4..d71c898 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -1,10 +1,22 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
 int xsplice_control(struct xen_sysctl_xsplice_op *);
+
+/* Arch hooks */
+int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela);
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela);
 #else
 #include <xen/errno.h> /* For -ENOSYS */
 static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index 42dbc6f..229c11f 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -11,6 +11,8 @@ struct xsplice_elf_sec {
                                       elf_resolve_section_names. */
     const uint8_t *data;           /* Pointer to the section (done by
                                       elf_resolve_sections). */
+    uint8_t *load_addr;            /* A pointer to the allocated destination.
+                                      Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -34,4 +36,7 @@ struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
 int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
-- 
2.1.0


* [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 06/23] xsplice: Implement payload loading (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 19:11   ` Andrew Cooper
  2016-02-22 15:00   ` Ross Lagerwall
  2016-02-12 18:05 ` [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2) Konrad Rzeszutek Wilk
                   ` (16 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Stefano Stabellini,
	Keir Fraser, Jan Beulich, Boris Ostrovsky, Suravee Suthikulpanit,
	Aravind Gopalakrishnan, Jun Nakajima, Kevin Tian, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in all the
return-to-guest paths to check for work to do and execute it if needed.
In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to enter do_xsplice() becomes the master and
triggers a reschedule softirq so that all the other CPUs enter
do_xsplice() with no stack.  Once all CPUs have rendezvoused, all CPUs
disable IRQs and NMIs are ignored. The system is then quiescent and the
master performs the action.  After this, all CPUs enable IRQs and NMIs
are re-enabled.
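
The master election described above can be sketched in isolation. This
is only an illustration under stated assumptions: the demo_* names are
hypothetical, C11 atomics stand in for Xen's atomic_t helpers, and the
IRQ/NMI handling and timeout are left out. The counter starts at -1, so
the first caller to increment it sees 0 and becomes the master; with N
online CPUs the counter ends at N - 1 once everyone has checked in.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the rendezvous counter. */
struct demo_work {
    atomic_int semaphore;   /* -1 while no CPU has checked in. */
};

/* Arm the rendezvous: the next CPU to enter becomes the master. */
static void demo_schedule(struct demo_work *w)
{
    atomic_store(&w->semaphore, -1);
}

/*
 * Called once per CPU. Returns true only for the caller whose
 * increment moves the counter from -1 to 0, i.e. the master.
 */
static bool demo_enter(struct demo_work *w)
{
    return atomic_fetch_add(&w->semaphore, 1) + 1 == 0;
}

/* Master's wait condition: every online CPU has checked in. */
static bool demo_all_arrived(struct demo_work *w, int online_cpus)
{
    return atomic_load(&w->semaphore) == online_cpus - 1;
}
```

In the patch itself the master spins on this condition until the
timeout expires, and the other CPUs spin until the master signals
that they should disable IRQs.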

The action to perform is one of:
- APPLY: For each function in the module, store the first 5 bytes of the
  old function and replace it with a jump to the new function.
- REVERT: Copy the previously stored bytes into the first 5 bytes of the
  old function.
- REPLACE: Revert each applied module and then apply the new module.
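
As a rough sketch of the APPLY and REVERT steps, the following works on
a plain byte buffer rather than live hypervisor text. The demo_* names
and the struct are hypothetical and only mirror the idea: save the
first 5 bytes, then write opcode 0xe9 followed by a 32-bit displacement
measured from the end of the 5-byte jump.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5

/* Hypothetical stand-in for one patch record; not the hypervisor's struct. */
struct demo_patch {
    uint8_t *old_code;              /* Bytes of the function being patched. */
    uint64_t old_addr;              /* Address the old function runs at. */
    uint64_t new_addr;              /* Address of the replacement function. */
    uint8_t undo[PATCH_INSN_SIZE];  /* Saved original bytes. */
};

/* Save the first 5 bytes and overwrite them with "jmp rel32". */
static void demo_apply(struct demo_patch *p)
{
    /* The displacement is relative to the end of the 5-byte jump. */
    uint32_t rel = (uint32_t)(p->new_addr - p->old_addr - PATCH_INSN_SIZE);

    memcpy(p->undo, p->old_code, PATCH_INSN_SIZE);
    p->old_code[0] = 0xe9;
    memcpy(&p->old_code[1], &rel, sizeof(rel));
}

/* Put the saved bytes back. */
static void demo_revert(struct demo_patch *p)
{
    memcpy(p->old_code, p->undo, PATCH_INSN_SIZE);
}

/* Decode where the written jump lands; useful for checking the math. */
static uint64_t demo_jmp_target(const struct demo_patch *p)
{
    int32_t rel;

    memcpy(&rel, &p->old_code[1], sizeof(rel));
    return p->old_addr + PATCH_INSN_SIZE + (uint64_t)(int64_t)rel;
}
```

Because the displacement is 32 bits wide, the new function has to be
within +/- 2GB of the old one for this encoding to reach it.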

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out.  I've taken some
measurements and found the patch application to take about 100 μs on a
72 CPU system, whether idle or fully loaded.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--
v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch
    - Remove const and use ELF types for xsplice_patch_func
v4: - Add check routine to do simple sanity checks for various
      sections.
v5: - s/%p/PRIx64/ as ARM builds complain.
---
 xen/arch/arm/xsplice.c      |  10 +-
 xen/arch/x86/domain.c       |   4 +
 xen/arch/x86/hvm/svm/svm.c  |   2 +
 xen/arch/x86/hvm/vmx/vmcs.c |   2 +
 xen/arch/x86/xsplice.c      |  19 +++
 xen/common/xsplice.c        | 372 ++++++++++++++++++++++++++++++++++++++++++--
 xen/common/xsplice_elf.c    |   8 +-
 xen/include/asm-arm/nmi.h   |  13 ++
 xen/include/xen/xsplice.h   |  21 +++
 9 files changed, 432 insertions(+), 19 deletions(-)

diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
index 8d85fa9..06f6875 100644
--- a/xen/arch/arm/xsplice.c
+++ b/xen/arch/arm/xsplice.c
@@ -3,7 +3,15 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
-int xsplice_verify_elf(uint8_t *data, ssize_t len)
+void xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+}
+
+int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
 {
     return -ENOSYS;
 }
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d43f7b..b5995b9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -36,6 +36,7 @@
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/guest_access.h>
+#include <xen/xsplice.h>
 #include <public/sysctl.h>
 #include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
@@ -121,6 +122,7 @@ static void idle_loop(void)
         (*pm_idle)();
         do_tasklet();
         do_softirq();
+        do_xsplice(); /* Must be last. */
     }
 }
 
@@ -137,6 +139,7 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
+    do_xsplice();
     reset_stack_and_jump(idle_loop);
 }
 
@@ -144,6 +147,7 @@ static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
     mark_regs_dirty(guest_cpu_user_regs());
+    do_xsplice();
     reset_stack_and_jump(ret_from_intr);
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e62dfa1..340f23b 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -26,6 +26,7 @@
 #include <xen/hypercall.h>
 #include <xen/domain_page.h>
 #include <xen/xenoprof.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/paging.h>
@@ -1108,6 +1109,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
 
     hvm_do_resume(v);
 
+    do_xsplice();
     reset_stack_and_jump(svm_asm_do_resume);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 5bc3c74..1008163 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -25,6 +25,7 @@
 #include <xen/kernel.h>
 #include <xen/keyhandler.h>
 #include <xen/vm_event.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -1716,6 +1717,7 @@ void vmx_do_resume(struct vcpu *v)
     }
 
     hvm_do_resume(v);
+    do_xsplice();
     reset_stack_and_jump(vmx_asm_do_vmentry);
 }
 
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
index 814dd52..ae35e91 100644
--- a/xen/arch/x86/xsplice.c
+++ b/xen/arch/x86/xsplice.c
@@ -10,6 +10,25 @@
                             __func__,__LINE__, x); return x; }
 #endif
 
+#define PATCH_INSN_SIZE 5
+
+void xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+    uint32_t val;
+    uint8_t *old_ptr;
+
+    old_ptr = (uint8_t *)func->old_addr;
+    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
+    *old_ptr++ = 0xe9; /* Relative jump */
+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
+    memcpy(old_ptr, &val, sizeof val);
+}
+
+void xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
+}
+
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
 {
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index fbd6129..b854c0a 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -3,6 +3,7 @@
  *
  */
 
+#include <xen/cpu.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -10,16 +11,25 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/smp.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
+#include <asm/nmi.h>
 #include <public/sysctl.h>
 
+/*
+ * Protects against payload_list operations and also allows only one
+ * caller in schedule_work.
+ */
 static DEFINE_SPINLOCK(payload_lock);
 static LIST_HEAD(payload_list);
 
+static LIST_HEAD(applied_list);
+
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
@@ -29,6 +39,9 @@ struct payload {
     struct list_head list;               /* Linked to 'payload_list'. */
     void *payload_address;               /* Virtual address mapped. */
     size_t payload_pages;                /* Nr of the pages. */
+    struct list_head applied_list;       /* Linked to 'applied_list'. */
+    struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
+    unsigned int nfuncs;                 /* Nr of functions to patch. */
 
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
@@ -36,6 +49,24 @@ struct payload {
 static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len);
 static void free_payload_data(struct payload *payload);
 
+/* Defines an outstanding patching action. */
+struct xsplice_work
+{
+    atomic_t semaphore;          /* Used for rendezvous. First to grab it will
+                                    do the patching. */
+    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled. */
+    uint32_t timeout;                    /* Timeout to do the operation. */
+    struct payload *data;        /* The payload on which to act. */
+    volatile bool_t do_work;     /* Signals work to do. */
+    volatile bool_t ready;       /* Signals all CPUs synchronized. */
+    uint32_t cmd;                /* Action request: XSPLICE_ACTION_* */
+};
+
+/* There can be only one outstanding patching action. */
+static struct xsplice_work xsplice_work;
+
+static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout);
+
 static const char *state2str(int32_t state)
 {
 #define STATE(x) [XSPLICE_STATE_##x] = #x
@@ -61,14 +92,23 @@ static const char *state2str(int32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    unsigned int i;
 
     spin_lock(&payload_lock);
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d) %p using %zu pages.\n", data->name,
+    {
+        printk(" name=%s state=%s(%d) %p using %zu pages:\n", data->name,
                state2str(data->state), data->state, data->payload_address,
                data->payload_pages);
 
+        for ( i = 0; i < data->nfuncs; i++ )
+        {
+            struct xsplice_patch_func *f = &(data->funcs[i]);
+            printk("    %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
+                   f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
+        }
+    }
     spin_unlock(&payload_lock);
 }
 
@@ -327,28 +367,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
     case XSPLICE_ACTION_APPLY:
         if ( (data->state == XSPLICE_STATE_CHECKED) )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_APPLIED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
     default:
@@ -576,6 +610,62 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
     return 0;
 }
 
+static int check_special_sections(struct payload *payload,
+                                  struct xsplice_elf *elf)
+{
+    unsigned int i;
+    static const char *const names[] = { ".xsplice.funcs" };
+
+    for ( i = 0; i < ARRAY_SIZE(names); i++ )
+    {
+        struct xsplice_elf_sec *sec;
+
+        sec = xsplice_elf_sec_by_name(elf, names[i]);
+        if ( !sec )
+        {
+            printk(XENLOG_ERR "%s: %s is missing!\n", elf->name, names[i]);
+            return -EINVAL;
+        }
+        if ( !sec->sec->sh_size )
+            return -EINVAL;
+    }
+    return 0;
+}
+
+static int find_special_sections(struct payload *payload,
+                                 struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+    struct xsplice_patch_func *f;
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
+    if ( sec->sec->sh_size % sizeof *payload->funcs )
+        return -EINVAL;
+
+    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
+    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
+
+    for ( i = 0; i < payload->nfuncs; i++ )
+    {
+        unsigned int j;
+
+        f = &(payload->funcs[i]);
+
+        if ( !f->new_addr || !f->old_addr || !f->old_size || !f->new_size )
+            return -EINVAL;
+
+        for ( j = 0; j < 8; j ++ )
+            if ( f->undo[j] )
+                return -EINVAL;
+
+        for ( j = 0; j < 24; j ++ )
+            if ( f->pad[j] )
+                return -EINVAL;
+    }
+    return 0;
+}
+
 static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
 {
     struct xsplice_elf elf;
@@ -605,7 +695,14 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
     if ( rc )
         goto err_payload;
 
-    /* Free our temporary data structure. */
+    rc = check_special_sections(payload, &elf);
+    if ( rc )
+        goto err_payload;
+
+    rc = find_special_sections(payload, &elf);
+    if ( rc )
+        goto err_payload;
+
     xsplice_elf_free(&elf);
     return 0;
 
@@ -617,6 +714,253 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
     return rc;
 }
 
+
+/*
+ * The following functions get the CPUs into an appropriate state and
+ * apply (or revert) each of the module's functions.
+ */
+
+/*
+ * This function is executed having all other CPUs with no stack (we may
+ * have cpu_idle on it) and IRQs disabled. We guard against NMI by temporarily
+ * installing our NOP NMI handler.
+ */
+static int apply_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_DEBUG "%s: Applying %u functions.\n", data->name,
+           data->nfuncs);
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        xsplice_apply_jmp(data->funcs + i);
+
+    list_add_tail(&data->applied_list, &applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed having all other CPUs with no stack (we may
+ * have cpu_idle on it) and IRQs disabled.
+ */
+static int revert_payload(struct payload *data)
+{
+    unsigned int i;
+
+    printk(XENLOG_DEBUG "%s: Reverting.\n", data->name);
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        xsplice_revert_jmp(data->funcs + i);
+
+    list_del(&data->applied_list);
+
+    return 0;
+}
+
+/* Must be holding the payload_lock. */
+static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
+{
+    /* Fail if an operation is already scheduled. */
+    if ( xsplice_work.do_work )
+        return -EBUSY;
+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    xsplice_work.timeout = timeout ? timeout : MILLISECS(30);
+
+    printk(XENLOG_DEBUG "%s: timeout is %"PRI_stime"ms\n", data->name,
+           xsplice_work.timeout / MILLISECS(1));
+
+    atomic_set(&xsplice_work.semaphore, -1);
+    atomic_set(&xsplice_work.irq_semaphore, -1);
+
+    xsplice_work.ready = 0;
+    smp_wmb();
+    xsplice_work.do_work = 1;
+    smp_wmb();
+
+    return 0;
+}
+
+/*
+ * Note that because of this NOP code the do_nmi is not safely patchable.
+ * Also if we do receive 'real' NMIs we have lost them.
+ */
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    return 1;
+}
+
+static void reschedule_fn(void *unused)
+{
+    smp_mb(); /* Synchronize with setting do_work */
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+static int xsplice_do_wait(atomic_t *counter, s_time_t timeout,
+                           unsigned int total_cpus, const char *s)
+{
+    int rc = 0;
+
+    while ( atomic_read(counter) != total_cpus && NOW() < timeout )
+        cpu_relax();
+
+    /* Log & abort. */
+    if ( atomic_read(counter) != total_cpus )
+    {
+        printk(XENLOG_DEBUG "%s: %s %u/%u\n", xsplice_work.data->name,
+               s, atomic_read(counter), total_cpus);
+        rc = -EBUSY;
+        xsplice_work.data->rc = rc;
+        xsplice_work.do_work = 0;
+        smp_wmb();
+        return rc;
+    }
+    return rc;
+}
+
+static void xsplice_do_single(unsigned int total_cpus)
+{
+    nmi_callback_t saved_nmi_callback;
+    struct payload *data, *tmp;
+    s_time_t timeout;
+    int rc;
+
+    data = xsplice_work.data;
+    timeout = xsplice_work.timeout + NOW();
+    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
+                         "Timed out on CPU semaphore") )
+        return;
+
+    /* "Mask" NMIs. */
+    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+
+    /* All CPUs are waiting, now signal to disable IRQs. */
+    xsplice_work.ready = 1;
+    smp_wmb();
+
+    atomic_inc(&xsplice_work.irq_semaphore);
+    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
+                         "Timed out on IRQ semaphore.") )
+        return;
+
+    local_irq_disable();
+    /* Now this function should be the only one on any stack.
+     * No need to lock the payload list or applied list. */
+    switch ( xsplice_work.cmd )
+    {
+    case XSPLICE_ACTION_APPLY:
+        rc = apply_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_APPLIED;
+        break;
+    case XSPLICE_ACTION_REVERT:
+        rc = revert_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_CHECKED;
+        break;
+    case XSPLICE_ACTION_REPLACE:
+        list_for_each_entry_safe_reverse ( data, tmp, &applied_list, applied_list )
+        {
+            data->rc = revert_payload(data);
+            if ( data->rc == 0 )
+                data->state = XSPLICE_STATE_CHECKED;
+            else
+            {
+                rc = -EINVAL;
+                break;
+            }
+        }
+        if ( rc != -EINVAL )
+        {
+            rc = apply_payload(xsplice_work.data);
+            if ( rc == 0 )
+                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
+        }
+        break;
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+    xsplice_work.data->rc = rc;
+
+    local_irq_enable();
+    set_nmi_callback(saved_nmi_callback);
+
+    xsplice_work.do_work = 0;
+    smp_wmb(); /* Synchronize with waiting CPUs. */
+}
+
+/*
+ * The main function which manages the work of quiescing the system and
+ * patching code.
+ */
+void do_xsplice(void)
+{
+    struct payload *p = xsplice_work.data;
+    unsigned int cpu = smp_processor_id();
+
+    /* Fast path: no work to do. */
+    if ( likely(!xsplice_work.do_work) )
+        return;
+    ASSERT(local_irq_is_enabled());
+
+    /* Set at -1, so will go up to num_online_cpus - 1 */
+    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
+    {
+        unsigned int total_cpus;
+
+        if ( !get_cpu_maps() )
+        {
+            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
+                   p->name, cpu);
+            xsplice_work.data->rc = -EBUSY;
+            xsplice_work.do_work = 0;
+            return;
+        }
+
+        barrier(); /* MUST do it after get_cpu_maps. */
+        total_cpus = num_online_cpus() - 1;
+
+        if ( total_cpus )
+        {
+            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
+                   cpu, total_cpus);
+            smp_call_function(reschedule_fn, NULL, 0);
+        }
+        (void)xsplice_do_single(total_cpus);
+
+        ASSERT(local_irq_is_enabled());
+
+        put_cpu_maps();
+
+        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
+    }
+    else
+    {
+        /* Wait for all CPUs to rendezvous. */
+        while ( xsplice_work.do_work && !xsplice_work.ready )
+        {
+            cpu_relax();
+            smp_rmb();
+        }
+
+        /* Disable IRQs and signal. */
+        local_irq_disable();
+        atomic_inc(&xsplice_work.irq_semaphore);
+
+        /* Wait for patching to complete. */
+        while ( xsplice_work.do_work )
+        {
+            cpu_relax();
+            smp_rmb();
+        }
+        local_irq_enable();
+    }
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index 0717263..ad70797 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -228,18 +228,18 @@ int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
                 return_(-ENOENT);
                 break;
             case SHN_ABS:
-                printk(XENLOG_DEBUG "%s: Absolute symbol: %s => 0x%p\n",
+                printk(XENLOG_DEBUG "%s: Absolute symbol: %s => 0x%"PRIx64"\n",
                       elf->name, elf->sym[i].name,
-                      (void *)elf->sym[i].sym->st_value);
+                      elf->sym[i].sym->st_value);
                 break;
             default:
                 if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
                 {
                     elf->sym[i].sym->st_value +=
                         (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
-                    printk(XENLOG_DEBUG "%s: Symbol resolved: %s => 0x%p\n",
+                    printk(XENLOG_DEBUG "%s: Symbol resolved: %s => 0x%"PRIx64"\n",
                            elf->name, elf->sym[i].name,
-                           (void *)elf->sym[i].sym->st_value);
+                           elf->sym[i].sym->st_value);
                 }
         }
     }
diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
index a60587e..82aff35 100644
--- a/xen/include/asm-arm/nmi.h
+++ b/xen/include/asm-arm/nmi.h
@@ -4,6 +4,19 @@
 #define register_guest_nmi_callback(a)  (-ENOSYS)
 #define unregister_guest_nmi_callback() (-ENOSYS)
 
+typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
+
+/**
+ * set_nmi_callback
+ *
+ * Set a handler for an NMI. Only one handler may be
+ * set. Return the old nmi callback handler.
+ */
+static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
+{
+    return NULL;
+}
+
 #endif /* ASM_NMI_H */
 /*
  * Local variables:
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index d71c898..d6db1c2 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,8 +6,26 @@ struct xsplice_elf_sec;
 struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
+#include <xen/elfstructs.h>
+/*
+ * The structure which defines the patching. This is what the hypervisor
+ * expects in the '.xsplice.funcs' section of the ELF file.
+ *
+ * This MUST be in sync with what the tools generate.
+ */
+struct xsplice_patch_func {
+    const char *name;
+    Elf64_Xword new_addr;
+    Elf64_Xword old_addr;
+    Elf64_Word new_size;
+    Elf64_Word old_size;
+    uint8_t undo[8];
+    uint8_t pad[24];
+};
+
 #ifdef CONFIG_XSPLICE
 int xsplice_control(struct xen_sysctl_xsplice_op *);
+void do_xsplice(void);
 
 /* Arch hooks */
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
@@ -17,11 +35,14 @@ int xsplice_perform_rel(struct xsplice_elf *elf,
 int xsplice_perform_rela(struct xsplice_elf *elf,
                          struct xsplice_elf_sec *base,
                          struct xsplice_elf_sec *rela);
+void xsplice_apply_jmp(struct xsplice_patch_func *func);
+void xsplice_revert_jmp(struct xsplice_patch_func *func);
 #else
 #include <xen/errno.h> /* For -ENOSYS */
 static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
 {
     return -ENOSYS;
 }
+static inline void do_xsplice(void) { };
 #endif
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.1.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 11:31   ` Ross Lagerwall
  2016-02-12 18:05 ` [PATCH v3 09/23] xsplice: Add support for bug frames. (v4) Konrad Rzeszutek Wilk
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, Stefano Stabellini, Keir Fraser,
	Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

This change demonstrates how to generate an xSplice ELF payload.

The idea here is that we want to patch the 'xen_extra_version'
function in the hypervisor with a function that will
return 'Hello World'. The 'xl info | grep extraversion'
output will reflect the new value after the patching.

To generate this ELF payload file we need:
 - C code of the new code (xen_hello_world_func.c).
 - C code generating the .xsplice.funcs structure
   (xen_hello_world.c)
 - The address of the old code (xen_extra_version). We
   retrieve it by using 'nm --defined' on xen-syms.
 - The size of the new and old code for which we use
   nm --defined -S on our code and xen-syms respectively.

There are two C files and one header file generated
during build. One could make this one C file if the
size of the newly patched function was known in
advance (or a random value was chosen).

There is also a strict order of compiling:
 1) xen_hello_world_func.c
 2) config.h - extract the size of the new function,
    the old function and the old function address.
 3) xen_hello_world.c - which contains the .xsplice.funcs
    structure.
 4) Link the object files into a xen_hello_world.xsplice file.
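
Since the .xsplice.funcs structure built in step 3 has to match what
the hypervisor expects, a compile-time check along these lines might be
useful. This is a sketch, not part of the patch: the doc_/hyp_/demo_
names are hypothetical, and fixed-width types are substituted for
Elf64_Xword (64-bit) and Elf64_Word (32-bit).

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as written in the documentation example. */
struct doc_patch_func {
    const char *name;
    uint64_t new_addr;
    uint64_t old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t pad[32];
};

/* Layout as in the hypervisor header: pad is split into undo + pad. */
struct hyp_patch_func {
    const char *name;
    uint64_t new_addr;
    uint64_t old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t undo[8];
    uint8_t pad[24];
};

/* Both views must describe the same bytes. */
_Static_assert(sizeof(struct doc_patch_func) == sizeof(struct hyp_patch_func),
               "payload and hypervisor layouts differ in size");
_Static_assert(offsetof(struct doc_patch_func, pad) ==
               offsetof(struct hyp_patch_func, undo),
               "undo area must start where the payload's pad starts");

/*
 * Number of entries in a section of sh_size bytes, or 0 if the size
 * is not a whole multiple of the structure size.
 */
static size_t demo_nfuncs(size_t sh_size)
{
    if ( sh_size % sizeof(struct hyp_patch_func) )
        return 0;
    return sh_size / sizeof(struct hyp_patch_func);
}
```

The payload leaves the whole pad area zeroed, which is what lets the
hypervisor use the first 8 bytes of it as its private undo buffer.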

The use-case is simple:

$xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
$xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                           APPLIED
$xl info | grep extra
xen_extra              : Hello World
$xen-xsplice revert xen_hello_world
Performing revert: completed
$xen-xsplice unload xen_hello_world
Performing unload: completed
$xl info | grep extra
xen_extra              : -unstable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/misc/xsplice.markdown               | 36 +++++++++++++++++++++++
 tools/misc/xsplice.lds                   | 11 +++++++
 xen/Makefile                             |  2 ++
 xen/arch/arm/Makefile                    |  4 +++
 xen/arch/x86/Makefile                    |  6 ++++
 xen/arch/x86/test/Makefile               | 50 ++++++++++++++++++++++++++++++++
 xen/arch/x86/test/xen_hello_world.c      | 15 ++++++++++
 xen/arch/x86/test/xen_hello_world_func.c |  8 +++++
 8 files changed, 132 insertions(+)
 create mode 100644 tools/misc/xsplice.lds
 create mode 100644 xen/arch/x86/test/Makefile
 create mode 100644 xen/arch/x86/test/xen_hello_world.c
 create mode 100644 xen/arch/x86/test/xen_hello_world_func.c

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 9a95243..0a5b87b 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -321,11 +321,47 @@ size.
 
 When applying the patch the hypervisor iterates over each `xsplice_patch_func`
 structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
+The `new_addr` is altered when the ELF payload is loaded.
 
 When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
 and the core code copies the data from the undo buffer (private internal copy)
 to `old_addr`.
 
+### Example
+
+A simple example of what a payload file can be:
+
+<pre>
+/* MUST be in sync with hypervisor. */  
+struct xsplice_patch_func {  
+    const char *name;  
+    uint64_t new_addr;  
+    uint64_t old_addr;  
+    uint32_t new_size;  
+    uint32_t old_size;  
+    uint8_t pad[32];  
+};  
+
+/* Our replacement function for xen_extra_version. */  
+const char *xen_hello_world(void)  
+{  
+    return "Hello World";  
+}  
+
+static unsigned char name[] = "xen_hello_world";  
+
+struct xsplice_patch_func xsplice_hello_world = {  
+    .name = name,  
+    .new_addr = (unsigned long)(xen_hello_world),  
+    .old_addr = 0xffff82d08013963c, /* Extracted from xen-syms. */  
+    .new_size = 13, /* To be computed by scripts. */  
+    .old_size = 13, /* -----------""---------------  */  
+} __attribute__((__section__(".xsplice.funcs")));  
+
+</pre>
+
+Code must be compiled with -fPIC.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
diff --git a/tools/misc/xsplice.lds b/tools/misc/xsplice.lds
new file mode 100644
index 0000000..f52eb8c
--- /dev/null
+++ b/tools/misc/xsplice.lds
@@ -0,0 +1,11 @@
+OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
+OUTPUT_ARCH(i386:x86-64)
+ENTRY(xsplice_hello_world)
+SECTIONS
+{
+    /* The hypervisor expects ".xsplice.funcs", so change
+     * the ".data.xsplice_hello_world" to it. */
+
+    .xsplice.funcs : { *(*.xsplice_hello_world) }
+
+}
diff --git a/xen/Makefile b/xen/Makefile
index 5d98bcb..f702525 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -75,6 +75,7 @@ _install: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 			echo 'EFI installation only partially done (EFI_VENDOR not set)' >&2; \
 		fi; \
 	fi
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) install
 
 .PHONY: _uninstall
 _uninstall: D=$(DESTDIR)
@@ -92,6 +93,7 @@ _uninstall:
 	rm -f $(D)$(EFI_DIR)/$(T)-$(XEN_VERSION).efi
 	rm -f $(D)$(EFI_DIR)/$(T).efi
 	rm -f $(D)$(EFI_MOUNTPOINT)/efi/$(EFI_VENDOR)/$(T)-$(XEN_FULLVERSION).efi
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) uninstall
 
 .PHONY: _debug
 _debug:
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index f144c14..35ba293 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -56,6 +56,10 @@ ifeq ($(CONFIG_ARM_64),y)
 	ln -sf $(notdir $@)  ../../$(notdir $@).efi
 endif
 
+install:
+
+uninstall:
+
 $(TARGET).axf: $(TARGET)-syms
 	# XXX: VE model loads by VMA so instead of
 	# making a proper ELF we link with LMA == VMA and adjust crudely
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index f7d3e39..51d12ac 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,7 +74,12 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
 	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test
 
+install:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test install
+uninstall:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test uninstall
 
 ALL_OBJS := $(BASEDIR)/arch/x86/boot/built_in.o $(BASEDIR)/arch/x86/efi/built_in.o $(ALL_OBJS)
 
@@ -178,3 +183,4 @@ clean::
 	rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
 	rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.o efi/.*.d efi/*.efi efi/disabled efi/mkreloc
 	rm -f boot/reloc.S boot/reloc.lnk boot/reloc.bin
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test clean
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
new file mode 100644
index 0000000..3fe951d
--- /dev/null
+++ b/xen/arch/x86/test/Makefile
@@ -0,0 +1,50 @@
+include $(XEN_ROOT)/Config.mk
+
+CODE_ADDR=$(shell nm --defined $(1) | grep $(2) | awk '{print "0x"$$1}')
+CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
+
+.PHONY: default
+ifdef CONFIG_XSPLICE
+
+XSPLICE := xen_hello_world.xsplice
+
+default: xsplice
+
+install: xsplice
+	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+uninstall:
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+else
+default:
+install:
+uninstall:
+endif
+
+.PHONY: clean
+clean::
+	rm -f *.o .*.o.d $(XSPLICE) config.h
+
+#
+# To compute these values we need the binary files: xen-syms
+# and xen_hello_world_func.o to be already compiled.
+#
+# We can be assured that xen-syms is already built as we are
+# the last entry in the build target.
+#
+.PHONY: config.h
+config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
+config.h: xen_hello_world_func.o
+	(set -e; \
+	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
+	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+
+.PHONY: xsplice
+xsplice: config.h
+	# Need to have these done in sequential order
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world_func.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world.o
+	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o xen_hello_world.o
+
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
new file mode 100644
index 0000000..6a1775b
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -0,0 +1,15 @@
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/xsplice.h>
+#include "config.h"
+
+static char name[] = "xen_hello_world";
+extern const char *xen_hello_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
+    .name = name,
+    .new_addr = (unsigned long)(xen_hello_world),
+    .old_addr = OLD_CODE,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
new file mode 100644
index 0000000..95ffcbd
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -0,0 +1,8 @@
+#include <xen/config.h>
+#include <xen/types.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_hello_world(void)
+{
+    return "Hello World";
+}
-- 
2.1.0


* [PATCH v3 09/23] xsplice: Add support for bug frames. (v4)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 19:35   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 10/23] xsplice: Add support for exception tables. (v2) Konrad Rzeszutek Wilk
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, Ian Campbell,
	Stefano Stabellini, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for handling bug frames contained within xsplice modules. If a
trap occurs, search either the kernel bug table or an applied payload's
bug table, depending on the instruction pointer.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2:- s/module/payload/
   - add build time check in case amount of bug frames expands.
   - add define for the number of bug-frames.
v3:
  - add missing BUGFRAME_NR, squash s/core_size/core/ in earlier patch.
v4:- Add comment about it being optional.
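
For reviewers, the lookup described in the commit message can be sketched
roughly as below. This is only an illustration, not the code in the patch:
the demo_* types and names are hypothetical stand-ins for Xen's bug_frame
and payload structures, and each region is assumed to carry one flat bug
table rather than one table per bug frame type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bug frame. */
struct demo_bug_frame {
    uintptr_t loc;   /* Address of the trapping instruction. */
    int line;        /* Source line recorded for the frame. */
};

/* Hypothetical stand-in for the core image or one applied payload. */
struct demo_region {
    uintptr_t text_start;                 /* Start of the region's text. */
    size_t text_size;                     /* Size of the region's text. */
    const struct demo_bug_frame *frames;  /* The region's bug table. */
    size_t nframes;                       /* Number of entries in it. */
};

/* Return non-zero if addr lies inside the region's text. */
static int demo_in_text(const struct demo_region *r, uintptr_t addr)
{
    return addr >= r->text_start && addr - r->text_start < r->text_size;
}

/*
 * Pick the region whose text contains addr, then scan only that region's
 * bug table. Returns NULL if no region or no frame matches.
 */
static const struct demo_bug_frame *
demo_find_bug(const struct demo_region *regions, size_t nregions,
              uintptr_t addr)
{
    size_t i, j;

    assert(regions != NULL || nregions == 0);

    for ( i = 0; i < nregions; i++ )
    {
        if ( !demo_in_text(&regions[i], addr) )
            continue;

        for ( j = 0; j < regions[i].nframes; j++ )
            if ( regions[i].frames[j].loc == addr )
                return &regions[i].frames[j];

        return NULL; /* Regions are assumed not to overlap. */
    }

    return NULL;
}
```

The point of the range check is that an address outside every known text
region is rejected before any table is walked, which mirrors the
is_active_text() test in do_invalid_op().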
---
 xen/arch/x86/traps.c      | 30 ++++++++++------
 xen/common/symbols.c      |  7 ++++
 xen/common/xsplice.c      | 91 ++++++++++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-arm/bug.h |  2 ++
 xen/include/asm-x86/bug.h |  2 ++
 xen/include/xen/kernel.h  |  1 +
 xen/include/xen/xsplice.h | 15 ++++++++
 7 files changed, 137 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 26a5026..f3adefa 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -48,6 +48,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/xsplice.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1161,20 +1162,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
         return;
     }
 
-    if ( !is_active_kernel_text(regs->eip) ||
+    if ( !is_active_text(regs->eip) ||
          __copy_from_user(bug_insn, eip, sizeof(bug_insn)) ||
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    if ( likely(is_active_kernel_text(regs->eip)) )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+        {
+            while ( unlikely(bug == stop_frames[id]) )
+                ++id;
+            if ( bug_loc(bug) == eip )
+                break;
+        }
+        if ( !stop_frames[id] )
+            goto die;
+    }
+    else
+    {
+        bug = xsplice_find_bug(eip, &id);
+        if ( !bug )
+            goto die;
     }
-    if ( !stop_frames[id] )
-        goto die;
 
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
@@ -1188,7 +1198,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
     filename = bug_ptr(bug);
-    if ( !is_kernel(filename) )
+    if ( !is_kernel(filename) && !is_module(filename) )
         goto die;
     fixup = strlen(filename);
     if ( fixup > 50 )
@@ -1215,7 +1225,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     case BUGFRAME_assert:
         /* ASSERT: decode the predicate string pointer. */
         predicate = bug_msg(bug);
-        if ( !is_kernel(predicate) )
+        if ( !is_kernel(predicate) && !is_module(predicate) )
             predicate = "<unknown>";
 
         printk("Assertion '%s' failed at %s%s:%d\n",
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..bf5623f 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,7 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <xen/xsplice.h>
 #include <public/platform.h>
 #include <xen/guest_access.h>
 
@@ -101,6 +102,12 @@ bool_t is_active_kernel_text(unsigned long addr)
             (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
 }
 
+bool_t is_active_text(unsigned long addr)
+{
+    return is_active_kernel_text(addr) ||
+           is_active_module_text(addr);
+}
+
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
                            unsigned long *offset,
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index b854c0a..7f71ac6 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -42,7 +42,10 @@ struct payload {
     struct list_head applied_list;       /* Linked to 'applied_list'. */
     struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
     unsigned int nfuncs;                 /* Nr of functions to patch. */
-
+    size_t core_size;                    /* Everything else - .data,.rodata, etc. */
+    size_t core_text_size;               /* Only .text size. */
+    struct bug_frame *start_bug_frames[BUGFRAME_NR]; /* .bug.frame patching. */
+    struct bug_frame *stop_bug_frames[BUGFRAME_NR];
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -561,6 +564,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
              (SHF_ALLOC|SHF_EXECINSTR) )
             calc_section(&elf->sec[i], &size);
     }
+    payload->core_text_size = size;
 
     /* Compute rw data */
     for ( i = 0; i < elf->hdr->e_shnum; i++ )
@@ -579,6 +583,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
              !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
             calc_section(&elf->sec[i], &size);
     }
+    payload->core_size = size;
 
     buf = alloc_payload(size);
     if ( !buf ) {
@@ -663,6 +668,24 @@ static int find_special_sections(struct payload *payload,
             if ( f->pad[j] )
                 return -EINVAL;
     }
+
+    /* Optional sections. */
+    for ( i = 0; i < BUGFRAME_NR; i++ )
+    {
+        char str[14];
+
+        snprintf(str, sizeof str, ".bug_frames.%d", i);
+        sec = xsplice_elf_sec_by_name(elf, str);
+        if ( !sec )
+            continue;
+
+        if ( ( !sec->sec->sh_size ) ||
+             ( sec->sec->sh_size % sizeof (struct bug_frame) ) )
+            return -EINVAL;
+
+        payload->start_bug_frames[i] = (struct bug_frame *)sec->load_addr;
+        payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
+    }
     return 0;
 }
 
@@ -961,6 +984,72 @@ void do_xsplice(void)
     }
 }
 
+
+/*
+ * Functions for handling special sections.
+ */
+struct bug_frame *xsplice_find_bug(const char *eip, int *id)
+{
+    struct payload *data;
+    struct bug_frame *bug;
+    int i;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        for (i = 0; i < BUGFRAME_NR; i++) {
+            if (!data->start_bug_frames[i])
+                continue;
+            if ( !((void *)eip >= data->payload_address &&
+                   (void *)eip < (data->payload_address + data->core_text_size)))
+                continue;
+
+            for ( bug = data->start_bug_frames[i]; bug != data->stop_bug_frames[i]; ++bug ) {
+                if ( bug_loc(bug) == eip )
+                {
+                    *id = i;
+                    return bug;
+                }
+            }
+        }
+    }
+
+    return NULL;
+}
+
+bool_t is_module(const void *ptr)
+{
+    struct payload *data;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( ptr >= data->payload_address &&
+             ptr < (data->payload_address + data->core_size))
+            return 1;
+    }
+
+    return 0;
+}
+
+bool_t is_active_module_text(unsigned long addr)
+{
+    struct payload *data;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( (void *)addr >= data->payload_address &&
+             (void *)addr < (data->payload_address + data->core_text_size))
+            return 1;
+    }
+
+    return 0;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
index ab9e811..4df6b2a 100644
--- a/xen/include/asm-arm/bug.h
+++ b/xen/include/asm-arm/bug.h
@@ -31,6 +31,7 @@ struct bug_frame {
 #define BUGFRAME_warn   0
 #define BUGFRAME_bug    1
 #define BUGFRAME_assert 2
+#define BUGFRAME_NR     3
 
 /* Many versions of GCC doesn't support the asm %c parameter which would
  * be preferable to this unpleasantness. We use mergeable string
@@ -39,6 +40,7 @@ struct bug_frame {
  */
 #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
     BUILD_BUG_ON((line) >> 16);                                             \
+    BUILD_BUG_ON(type >= BUGFRAME_NR);                                      \
     asm ("1:"BUG_INSTR"\n"                                                  \
          ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"                \
          "2:\t.asciz " __stringify(file) "\n"                               \
diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..5443191 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -9,6 +9,7 @@
 #define BUGFRAME_warn   1
 #define BUGFRAME_bug    2
 #define BUGFRAME_assert 3
+#define BUGFRAME_NR     4
 
 #ifndef __ASSEMBLY__
 
@@ -51,6 +52,7 @@ struct bug_frame {
 
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
     BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
+    BUILD_BUG_ON((type) >= (BUGFRAME_NR));                                   \
     asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)                          \
                    :: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );            \
 } while (0)
diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
index 548b64d..df57754 100644
--- a/xen/include/xen/kernel.h
+++ b/xen/include/xen/kernel.h
@@ -99,6 +99,7 @@ extern enum system_state {
 } system_state;
 
 bool_t is_active_kernel_text(unsigned long addr);
+bool_t is_active_text(unsigned long addr);
 
 #endif /* _LINUX_KERNEL_H */
 
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index d6db1c2..c257b3a 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -26,6 +26,9 @@ struct xsplice_patch_func {
 #ifdef CONFIG_XSPLICE
 int xsplice_control(struct xen_sysctl_xsplice_op *);
 void do_xsplice(void);
+struct bug_frame *xsplice_find_bug(const char *eip, int *id);
+bool_t is_module(const void *addr);
+bool_t is_active_module_text(unsigned long addr);
 
 /* Arch hooks */
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
@@ -44,5 +47,17 @@ static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
     return -ENOSYS;
 }
 static inline void do_xsplice(void) { };
+static inline struct bug_frame *xsplice_find_bug(const char *eip, int *id)
+{
+	return NULL;
+}
+static inline bool_t is_module(const void *addr)
+{
+	return 0;
+}
+static inline bool_t is_active_module_text(unsigned long addr)
+{
+	return 0;
+}
 #endif
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.1.0


* [PATCH v3 10/23] xsplice: Add support for exception tables. (v2)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 09/23] xsplice: Add support for bug frames. (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 11/23] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for exception tables contained within xSplice payloads. If an
exception occurs search either the main exception table or a particular
active payload's exception table depending on the instruction pointer.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2:
 - s/module/payload/
 - sanity checks.
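
As a rough illustration of why the payload's table is sorted at load time:
the search is a binary search, so it only works on a sorted table. The
sketch below is not the patch's code. The demo_* names are hypothetical,
entries hold absolute addresses (Xen's entries are relative offsets decoded
via EX_FIELD), and the range is half-open, whereas search_one_extable()
takes an inclusive last entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified exception table entry (absolute addresses). */
struct demo_ex_entry {
    unsigned long addr;  /* Address of the instruction that may fault. */
    unsigned long cont;  /* Address to continue at after the fault. */
};

/* Order entries by faulting address. */
static int demo_cmp_ex(const void *a, const void *b)
{
    const struct demo_ex_entry *l = a, *r = b;

    if ( l->addr > r->addr )
        return 1;
    if ( l->addr < r->addr )
        return -1;
    return 0;
}

/* Sort the half-open range [start, stop) once, when the table is loaded. */
static void demo_sort_extable(struct demo_ex_entry *start,
                              struct demo_ex_entry *stop)
{
    assert(start <= stop);
    qsort(start, (size_t)(stop - start), sizeof(*start), demo_cmp_ex);
}

/* Binary search [start, stop); return the fixup address or 0 if none. */
static unsigned long demo_search_extable(const struct demo_ex_entry *start,
                                         const struct demo_ex_entry *stop,
                                         unsigned long addr)
{
    size_t lo = 0, hi = (size_t)(stop - start);

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( start[mid].addr == addr )
            return start[mid].cont;
        if ( start[mid].addr < addr )
            lo = mid + 1;
        else
            hi = mid;
    }

    return 0;
}
```

A return value of 0 means "no fixup", which is how the caller decides the
fault was not an expected one.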
---
 xen/arch/x86/extable.c        | 36 +++++++++++++++++++++--------------
 xen/common/xsplice.c          | 44 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/uaccess.h |  5 +++++
 xen/include/xen/xsplice.h     |  5 +++++
 4 files changed, 76 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..2787a92 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -4,6 +4,7 @@
 #include <xen/perfc.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
+#include <xen/xsplice.h>
 #include <asm/uaccess.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
@@ -18,7 +19,7 @@ static inline unsigned long ex_cont(const struct exception_table_entry *x)
 	return EX_FIELD(x, cont);
 }
 
-static int __init cmp_ex(const void *a, const void *b)
+static int cmp_ex(const void *a, const void *b)
 {
 	const struct exception_table_entry *l = a, *r = b;
 	unsigned long lip = ex_addr(l);
@@ -33,7 +34,7 @@ static int __init cmp_ex(const void *a, const void *b)
 }
 
 #ifndef swap_ex
-static void __init swap_ex(void *a, void *b, int size)
+static void swap_ex(void *a, void *b, int size)
 {
 	struct exception_table_entry *l = a, *r = b, tmp;
 	long delta = b - a;
@@ -46,19 +47,23 @@ static void __init swap_ex(void *a, void *b, int size)
 }
 #endif
 
-void __init sort_exception_tables(void)
+void sort_exception_table(struct exception_table_entry *start,
+                          struct exception_table_entry *stop)
 {
-    sort(__start___ex_table, __stop___ex_table - __start___ex_table,
-         sizeof(struct exception_table_entry), cmp_ex, swap_ex);
-    sort(__start___pre_ex_table,
-         __stop___pre_ex_table - __start___pre_ex_table,
+    sort(start, stop - start,
          sizeof(struct exception_table_entry), cmp_ex, swap_ex);
 }
 
-static inline unsigned long
-search_one_table(const struct exception_table_entry *first,
-                 const struct exception_table_entry *last,
-                 unsigned long value)
+void __init sort_exception_tables(void)
+{
+    sort_exception_table(__start___ex_table, __stop___ex_table);
+    sort_exception_table(__start___pre_ex_table, __stop___pre_ex_table);
+}
+
+unsigned long
+search_one_extable(const struct exception_table_entry *first,
+                   const struct exception_table_entry *last,
+                   unsigned long value)
 {
     const struct exception_table_entry *mid;
     long diff;
@@ -80,15 +85,18 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    if ( likely(is_kernel(addr)) )
+        return search_one_extable(
+            __start___ex_table, __stop___ex_table-1, addr);
+    else
+        return search_module_extables(addr);
 }
 
 unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
     unsigned long addr = (unsigned long)regs->eip;
-    unsigned long fixup = search_one_table(
+    unsigned long fixup = search_one_extable(
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 7f71ac6..d863a99 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -46,6 +46,10 @@ struct payload {
     size_t core_text_size;               /* Only .text size. */
     struct bug_frame *start_bug_frames[BUGFRAME_NR]; /* .bug.frame patching. */
     struct bug_frame *stop_bug_frames[BUGFRAME_NR];
+#ifdef CONFIG_X86
+    struct exception_table_entry *start_ex_table;
+    struct exception_table_entry *stop_ex_table;
+#endif
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -686,6 +690,20 @@ static int find_special_sections(struct payload *payload,
         payload->start_bug_frames[i] = (struct bug_frame *)sec->load_addr;
         payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
     }
+#ifdef CONFIG_X86
+    sec = xsplice_elf_sec_by_name(elf, ".ex_table");
+    if ( sec )
+    {
+        if ( ( !sec->sec->sh_size ) ||
+             ( sec->sec->sh_size % sizeof *sec->load_addr ) )
+            return -EINVAL;
+
+        payload->start_ex_table = (struct exception_table_entry *)sec->load_addr;
+        payload->stop_ex_table = (struct exception_table_entry *)(sec->load_addr + sec->sec->sh_size);
+
+        sort_exception_table(payload->start_ex_table, payload->stop_ex_table);
+    }
+#endif
     return 0;
 }
 
@@ -1050,6 +1068,32 @@ bool_t is_active_module_text(unsigned long addr)
     return 0;
 }
 
+#ifdef CONFIG_X86
+unsigned long search_module_extables(unsigned long addr)
+{
+    struct payload *data;
+    unsigned long ret;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( !data->start_ex_table )
+            continue;
+        if ( !((void *)addr >= data->payload_address &&
+               (void *)addr < (data->payload_address + data->core_text_size)))
+            continue;
+
+        ret = search_one_extable(data->start_ex_table, data->stop_ex_table - 1,
+                                 addr);
+        if ( ret )
+            return ret;
+    }
+
+    return 0;
+}
+#endif
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/asm-x86/uaccess.h b/xen/include/asm-x86/uaccess.h
index 947470d..9e67bf0 100644
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -276,6 +276,11 @@ extern struct exception_table_entry __start___pre_ex_table[];
 extern struct exception_table_entry __stop___pre_ex_table[];
 
 extern unsigned long search_exception_table(unsigned long);
+extern unsigned long search_one_extable(const struct exception_table_entry *first,
+                                        const struct exception_table_entry *last,
+                                        unsigned long value);
 extern void sort_exception_tables(void);
+extern void sort_exception_table(struct exception_table_entry *start,
+                                 struct exception_table_entry *stop);
 
 #endif /* __X86_UACCESS_H__ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index c257b3a..3a9948a 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -29,6 +29,7 @@ void do_xsplice(void);
 struct bug_frame *xsplice_find_bug(const char *eip, int *id);
 bool_t is_module(const void *addr);
 bool_t is_active_module_text(unsigned long addr);
+unsigned long search_module_extables(unsigned long addr);
 
 /* Arch hooks */
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
@@ -59,5 +60,9 @@ static inline bool_t is_active_module_text(unsigned long addr)
 {
 	return 0;
 }
+static inline unsigned long search_module_extables(unsigned long addr)
+{
+	return 0;
+}
 #endif
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.1.0


* [PATCH v3 11/23] xsplice: Add support for alternatives
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 10/23] xsplice: Add support for exception tables. (v2) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 19:41   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8) Konrad Rzeszutek Wilk
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for applying alternative sections within xsplice modules. At
module load time, apply any alternative sections that are found.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
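
For readers unfamiliar with alternatives, applying one section amounts to
roughly the following. This is a simplified sketch, not the code in
alternative.c: the demo_* names are hypothetical, CPU features are modelled
as a 32-bit mask, entries hold plain pointers instead of relative offsets,
and padding uses the single-byte NOP (0x90) rather than the ideal
multi-byte NOPs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified alternative entry using plain pointers. */
struct demo_alt {
    uint8_t *instr;              /* Original instruction to patch. */
    const uint8_t *replacement;  /* Replacement instruction bytes. */
    unsigned int feature;        /* Feature bit (0..31) enabling the patch. */
    uint8_t instrlen;            /* Length of the original instruction. */
    uint8_t replacementlen;      /* Length of the replacement, <= instrlen. */
};

/*
 * Walk [start, end) and, for every entry whose feature bit is set in
 * 'features', overwrite the original bytes with the replacement and pad
 * the remainder with single-byte NOPs.
 */
static void demo_apply_alternatives(const struct demo_alt *start,
                                    const struct demo_alt *end,
                                    uint32_t features)
{
    const struct demo_alt *a;

    for ( a = start; a < end; a++ )
    {
        assert(a->feature < 32);
        assert(a->replacementlen <= a->instrlen);

        if ( !(features & (UINT32_C(1) << a->feature)) )
            continue;

        memcpy(a->instr, a->replacement, a->replacementlen);
        memset(a->instr + a->replacementlen, 0x90,
               (size_t)(a->instrlen - a->replacementlen));
    }
}
```

In the patch the call is wrapped in local_irq_disable()/local_irq_enable()
because the real routine expects to run with local interrupts off. The
sketch has no equivalent of that.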
 xen/arch/x86/Makefile             |  2 +-
 xen/arch/x86/alternative.c        | 12 ++++++------
 xen/common/xsplice.c              | 10 +++++++++-
 xen/include/asm-x86/alternative.h |  1 +
 4 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 51d12ac..8cce4ba 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -6,7 +6,7 @@ subdir-y += mm
 subdir-y += oprofile
 subdir-y += x86_64
 
-obj-bin-y += alternative.init.o
+obj-bin-y += alternative.o
 obj-y += apic.o
 obj-y += bitops.o
 obj-bin-y += bzimage.init.o
diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 46ac0fd..8d895ad 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -28,7 +28,7 @@
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 
 #ifdef K8_NOP1
-static const unsigned char k8nops[] __initconst = {
+static const unsigned char k8nops[] = {
     K8_NOP1,
     K8_NOP2,
     K8_NOP3,
@@ -52,7 +52,7 @@ static const unsigned char * const k8_nops[ASM_NOP_MAX+1] = {
 #endif
 
 #ifdef P6_NOP1
-static const unsigned char p6nops[] __initconst = {
+static const unsigned char p6nops[] = {
     P6_NOP1,
     P6_NOP2,
     P6_NOP3,
@@ -75,7 +75,7 @@ static const unsigned char * const p6_nops[ASM_NOP_MAX+1] = {
 };
 #endif
 
-static const unsigned char * const *ideal_nops __initdata = k8_nops;
+static const unsigned char * const *ideal_nops = k8_nops;
 
 static int __init mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
@@ -100,7 +100,7 @@ static void __init arch_init_ideal_nops(void)
 }
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init add_nops(void *insns, unsigned int len)
+static void add_nops(void *insns, unsigned int len)
 {
     while ( len > 0 )
     {
@@ -127,7 +127,7 @@ static void __init add_nops(void *insns, unsigned int len)
  *
  * This routine is called with local interrupt disabled.
  */
-static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
+static void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
     memcpy(addr, opcode, len);
     sync_core();
@@ -142,7 +142,7 @@ static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
  * APs have less capabilities than the boot processor are not handled.
  * Tough. Make sure you disable such features by hand.
  */
-static void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
     struct alt_instr *a;
     u8 *instr, *replacement;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index d863a99..65b1f11 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -695,7 +695,7 @@ static int find_special_sections(struct payload *payload,
     if ( sec )
     {
         if ( ( !sec->sec->sh_size ) ||
-             ( sec->sec->sh_size % sizeof *sec->load_addr ) )
+             ( sec->sec->sh_size % sizeof (struct exception_table_entry) ) )
             return -EINVAL;
 
         payload->start_ex_table = (struct exception_table_entry *)sec->load_addr;
@@ -703,6 +703,14 @@ static int find_special_sections(struct payload *payload,
 
         sort_exception_table(payload->start_ex_table, payload->stop_ex_table);
     }
+    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
+    if ( sec )
+    {
+        local_irq_disable();
+        apply_alternatives((struct alt_instr *)sec->load_addr,
+                           (struct alt_instr *)(sec->load_addr + sec->sec->sh_size));
+        local_irq_enable();
+    }
 #endif
     return 0;
 }
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 1056630..6a62113 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -23,6 +23,7 @@ struct alt_instr {
     u8  replacementlen;     /* length of new instruction, <= instrlen */
 };
 
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void alternative_instructions(void);
 
 #define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
-- 
2.1.0


* [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8).
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 11/23] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 21:52   ` Daniel De Graaf
  2016-02-12 18:05 ` [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10) Konrad Rzeszutek Wilk
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Daniel De Graaf, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, xen-devel
  Cc: Konrad Rzeszutek Wilk

All of the XENVER_* sub-ops now have an XSM check.

The sub-op for XENVER_commandline is now a privileged operation.
To avoid breaking guests we still return a string, but it is
just '<denied>\0'.

The rest: XENVER_[version|extraversion|capabilities|
parameters|get_features|page_size|guest_handle|changeset|
compile_info] behave as before - allowed by default for all
guests if using the XSM default policy or with the dummy one.

The admin can choose to change the sub-ops to be denied
as they see fit.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Do XSM check for all the XENVER_ ops.
v3: Add empty data conditions.
v4: Return <denied> for priv subops.
v5: Move extraversion from priv to normal. Drop the XSM check
    for the non-priv subops.
v6: Add +1 for strlen(xen_deny()) to include the NUL terminator. Move changeset,
    compile_info to non-priv subops.
v7: Remove the \0 on xen_deny()
v8: Add new XSM domain for xenver hypercall. Add all subops to it.
---
 tools/flask/policy/policy/modules/xen/xen.te | 13 +++++++
 xen/common/kernel.c                          | 53 +++++++++++++++++++++-------
 xen/common/version.c                         |  5 +++
 xen/include/xen/version.h                    |  1 +
 xen/include/xsm/dummy.h                      | 22 ++++++++++++
 xen/include/xsm/xsm.h                        |  5 +++
 xen/xsm/dummy.c                              |  1 +
 xen/xsm/flask/hooks.c                        | 44 +++++++++++++++++++++++
 xen/xsm/flask/policy/access_vectors          | 28 +++++++++++++++
 xen/xsm/flask/policy/security_classes        |  1 +
 10 files changed, 161 insertions(+), 12 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 542c3e1..9ad648a 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,6 +74,14 @@ allow dom0_t xen_t:xen2 {
     get_symbol
     xsplice_op
 };
+
+# Allow dom0 to use all XENVER_ subops
+# Note that dom0 is part of domain_type so this has duplicates.
+allow dom0_t xen_t:version {
+    version extraversion compile_info capabilities changeset
+    platform_parameters get_features pagesize guest_handle commandline
+};
+
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -138,6 +146,11 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
+# For normal guests all except XENVER_commandline
+allow domain_type xen_t:version {
+    version extraversion compile_info capabilities changeset
+    platform_parameters get_features pagesize guest_handle
+};
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0618da2..a5e3f0e 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -13,6 +13,7 @@
 #include <xen/nmi.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xsm/xsm.h>
 #include <asm/current.h>
 #include <public/nmi.h>
 #include <public/version.h>
@@ -223,12 +224,15 @@ void __init do_initcalls(void)
 /*
  * Simple hypercalls.
  */
-
 DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    bool_t deny = !!xsm_version_op(XSM_OTHER, cmd);
+
     switch ( cmd )
     {
     case XENVER_version:
+        if ( deny )
+            return 0;
         return (xen_major_version() << 16) | xen_minor_version();
 
     case XENVER_extraversion:
@@ -236,7 +240,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_extraversion_t extraversion;
 
         memset(extraversion, 0, sizeof(extraversion));
-        safe_strcpy(extraversion, xen_extra_version());
+        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
         if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
             return -EFAULT;
         return 0;
@@ -247,10 +251,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_compile_info_t info;
 
         memset(&info, 0, sizeof(info));
-        safe_strcpy(info.compiler,       xen_compiler());
-        safe_strcpy(info.compile_by,     xen_compile_by());
-        safe_strcpy(info.compile_domain, xen_compile_domain());
-        safe_strcpy(info.compile_date,   xen_compile_date());
+        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
+        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
+        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
         if ( copy_to_guest(arg, &info, 1) )
             return -EFAULT;
         return 0;
@@ -261,7 +265,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_capabilities_info_t info;
 
         memset(info, 0, sizeof(info));
-        arch_get_xen_caps(&info);
+        if ( !deny )
+            arch_get_xen_caps(&info);
 
         if ( copy_to_guest(arg, info, ARRAY_SIZE(info)) )
             return -EFAULT;
@@ -274,6 +279,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             .virt_start = HYPERVISOR_VIRT_START
         };
 
+        if ( deny )
+            params.virt_start = 0;
+
         if ( copy_to_guest(arg, &params, 1) )
             return -EFAULT;
         return 0;
@@ -285,7 +293,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_changeset_info_t chgset;
 
         memset(chgset, 0, sizeof(chgset));
-        safe_strcpy(chgset, xen_changeset());
+        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
         if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
             return -EFAULT;
         return 0;
@@ -302,6 +310,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         switch ( fi.submap_idx )
         {
         case 0:
+            if ( deny )
+                break;
             fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
             if ( VM_ASSIST(d, pae_extended_cr3) )
                 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
@@ -342,19 +352,38 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case XENVER_pagesize:
+        if ( deny )
+            return 0;
         return (!guest_handle_is_null(arg) ? -EINVAL : PAGE_SIZE);
 
     case XENVER_guest_handle:
-        if ( copy_to_guest(arg, current->domain->handle,
-                           ARRAY_SIZE(current->domain->handle)) )
+    {
+        xen_domain_handle_t hdl;
+        ssize_t len;
+
+        if ( deny )
+        {
+            len = sizeof(hdl);
+            memset(&hdl, 0, len);
+        } else
+            len = ARRAY_SIZE(current->domain->handle);
+
+        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle, len ) )
             return -EFAULT;
         return 0;
-
+    }
     case XENVER_commandline:
-        if ( copy_to_guest(arg, saved_cmdline, ARRAY_SIZE(saved_cmdline)) )
+    {
+        size_t len = ARRAY_SIZE(saved_cmdline);
+
+        if ( deny )
+            len = strlen(xen_deny()) + 1;
+
+        if ( copy_to_guest(arg, deny ? xen_deny() : saved_cmdline, len) )
             return -EFAULT;
         return 0;
     }
+    }
 
     return -ENOSYS;
 }
diff --git a/xen/common/version.c b/xen/common/version.c
index b152e27..786be4e 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -55,3 +55,8 @@ const char *xen_banner(void)
 {
     return XEN_BANNER;
 }
+
+const char *xen_deny(void)
+{
+    return "<denied>";
+}
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 81a3c7d..2015c0b 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -12,5 +12,6 @@ unsigned int xen_minor_version(void);
 const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
+const char *xen_deny(void);
 
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 1d13826..9fcc57a 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -727,3 +727,25 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
 }
 
 #endif /* CONFIG_X86 */
+
+#include <public/version.h>
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_extraversion:
+    case XENVER_compile_info:
+    case XENVER_capabilities:
+    case XENVER_changeset:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+    case XENVER_pagesize:
+    case XENVER_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3afed70..2c3b1c0 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -193,6 +193,7 @@ struct xsm_operations {
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -731,6 +732,10 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
 
 #endif /* CONFIG_X86 */
 
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0f32636..1469dce 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -162,4 +162,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
 #endif
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index c856e1e..7e3bcdd 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -26,6 +26,7 @@
 #include <public/xen.h>
 #include <public/physdev.h>
 #include <public/platform.h>
+#include <public/version.h>
 
 #include <public/xsm/flask_op.h>
 
@@ -1626,6 +1627,48 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
 }
 #endif /* CONFIG_X86 */
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XENVER_version:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__VERSION, NULL);
+    case XENVER_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XENVER_compile_info:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMPILE_INFO, NULL);
+    case XENVER_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XENVER_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XENVER_platform_parameters:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PLATFORM_PARAMETERS, NULL);
+    case XENVER_get_features:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GET_FEATURES, NULL);
+    case XENVER_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XENVER_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+        return 0; /* These MUST always be accessible to guests. */
+    case XENVER_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1764,6 +1807,7 @@ static struct xsm_operations flask_ops = {
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
 #endif
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 5f08d05..7cb32de 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -497,3 +497,31 @@ class security
 # remove ocontext label definitions for resources
     del_ocontext
 }
+
+# Class version is used to describe the XENVER_ hypercall.
+# Each sub-op is described here - in the default case all of them should
+# be allowed except XENVER_commandline.
+#
+class version
+{
+# Often called by PV kernels to force a callback.
+    version
+# Extra information (-unstable).
+    extraversion
+# Compile information of the hypervisor.
+    compile_info
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Such as the virtual address of where the hypervisor resides.
+    platform_parameters
+# Source code changeset.
+    changeset
+# The features the hypervisor supports.
+    get_features
+# Page size the hypervisor uses.
+    pagesize
+# A value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ca191db..cde4e1a 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -18,5 +18,6 @@ class shadow
 class event
 class grant
 class security
+class version
 
 # FLASK
-- 
2.1.0


* [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (11 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 21:52   ` Daniel De Graaf
  2016-02-16 20:09   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Daniel De Graaf, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Stefano Stabellini,
	Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

The build-id is retrieved via the XENVER hypercall, to which
we add a new sub-command called XENVER_build_id. The
sub-hypercall parameter allows an arbitrary size (the buffer
and its length are provided to the hypervisor). A NULL
parameter probes the hypervisor for the length of the build-id.
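
The probe-then-fetch pattern could look roughly like this from the
caller's side. This is a sketch against a mock, not the real hypercall:
build_id_arg_t, mock_build_id_op(), the 4-byte demo id and the
-ENOBUFS return are all assumptions made for the illustration; only the
"length plus variable buffer" shape and the NULL probe come from the
description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: a length followed by a variable-sized buffer. */
typedef struct {
    uint32_t len;
    unsigned char buf[];
} build_id_arg_t;

/* Stand-in for the hypervisor side; this id is made up for the demo. */
static const unsigned char demo_id[] = { 0xde, 0xad, 0xbe, 0xef };

static int mock_build_id_op(build_id_arg_t *arg)
{
    if ( arg == NULL )
        return (int)sizeof(demo_id);      /* NULL probes for the length */
    if ( arg->len < sizeof(demo_id) )
        return -ENOBUFS;                  /* assumed error for the mock */
    memcpy(arg->buf, demo_id, sizeof(demo_id));
    return (int)sizeof(demo_id);
}

/* Probe for the size, allocate, then fetch. The caller frees the result. */
static build_id_arg_t *fetch_build_id(void)
{
    int len = mock_build_id_op(NULL);
    build_id_arg_t *arg;

    if ( len <= 0 )
        return NULL;                      /* e.g. no build-id was built in */
    arg = calloc(1, sizeof(*arg) + (size_t)len);
    if ( arg == NULL )
        return NULL;
    arg->len = (uint32_t)len;
    if ( mock_build_id_op(arg) != len )
    {
        free(arg);
        return NULL;
    }
    assert(arg->len == (uint32_t)len);
    return arg;
}
```

Probing first keeps the caller from having to guess a maximum size,
which matters because the length depends on the hash the linker used.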

One can also retrieve the value of the build-id by doing
'readelf -n xen-syms'.
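
For reference, what 'readelf -n' decodes is the standard ELF note
layout: three 32-bit words (name size, descriptor size, type), then the
name and the descriptor, each padded to 4 bytes. A minimal sketch of
finding the GNU build-id in such a blob follows; find_build_id() and
its helpers are hypothetical names, and the sketch assumes the note is
in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_TYPE_GNU_BUILD_ID 3   /* NT_GNU_BUILD_ID in <elf.h> */

struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* True if a field of sz bytes, plus its padding, fits in avail bytes. */
static int fits(uint32_t sz, size_t avail)
{
    return sz <= avail && align4(sz) <= avail;
}

/*
 * Walk a note section and return a pointer to the build-id bytes,
 * storing their length in *id_len. Returns NULL if no GNU build-id
 * note is found or the data is truncated.
 */
static const unsigned char *find_build_id(const unsigned char *note,
                                          size_t size, size_t *id_len)
{
    size_t off = 0;

    assert(note != NULL && id_len != NULL);
    while ( off + sizeof(struct note_hdr) <= size )
    {
        struct note_hdr h;
        const unsigned char *name, *desc;

        memcpy(&h, note + off, sizeof(h));
        off += sizeof(h);
        if ( !fits(h.namesz, size - off) )
            return NULL;
        name = note + off;
        off += align4(h.namesz);
        if ( !fits(h.descsz, size - off) )
            return NULL;
        desc = note + off;
        off += align4(h.descsz);

        if ( h.type == NOTE_TYPE_GNU_BUILD_ID && h.namesz == 4 &&
             memcmp(name, "GNU", 4) == 0 )
        {
            *id_len = h.descsz;
            return desc;
        }
    }
    return NULL;
}
```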

For EFI builds we re-use the same build-id that the xen-syms
was built with.

The first version of ld to implement --build-id is v2.18.
Hence we check for that or a later version; if an older version
is found we do not build the hypervisor with the build-id
(and the return code is -ENODATA in that case).

For x86 we have two binaries: xen-syms and xen, a smaller
version with lots of sections removed. To make 'readelf -n xen'
work we also modify mkelf32 and xen.lds.S to include
the PT_NOTE ELF program header.

The EFI binary is more complicated. Having any non-recognizable
sections (.note, .data.note, etc) causes the boot to hang.
Moving the .note into the .data section makes it work. It is also
worth noting that, to the author's knowledge, PE/COFF does not
have any "comment" sections.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v1: Rebase it on Martin's initial patch
v2: Move it to XENVER hypercall
v3: Fix EFI building (Ross's fix)
v4: Don't use the third argument for length.
v5: Use new structure for XENVER_build_id with variable buf.
v6: Include Ross's fix.
v7: Include detection of bin-utils for build-id support, add
    probing for size, and return -EPERM for XSM denied calls.
v8: Build xen_build_id under ARM, required adding ELFSIZE in proper file.
v9: Rebase on top XSM version class.
v10: Include the build-id .note in the xen ELF binary.
     s/build_id/build_id_linker/
    For EFI build, moved the --build-id values in .data section
---
 Config.mk                                    |  11 +++
 tools/flask/policy/policy/modules/xen/xen.te |   4 +-
 tools/libxc/xc_private.c                     |   7 ++
 tools/libxc/xc_private.h                     |  10 ++
 xen/arch/arm/Makefile                        |   2 +-
 xen/arch/arm/xen.lds.S                       |  13 +++
 xen/arch/x86/Makefile                        |  31 +++++-
 xen/arch/x86/boot/mkelf32.c                  | 137 +++++++++++++++++++++++----
 xen/arch/x86/xen.lds.S                       |  23 +++++
 xen/common/kernel.c                          |  36 +++++++
 xen/common/version.c                         |  48 ++++++++++
 xen/include/public/version.h                 |  16 +++-
 xen/include/xen/version.h                    |   1 +
 xen/xsm/flask/hooks.c                        |   3 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 15 files changed, 319 insertions(+), 25 deletions(-)

diff --git a/Config.mk b/Config.mk
index 429e460..61186e2 100644
--- a/Config.mk
+++ b/Config.mk
@@ -126,6 +126,17 @@ endef
 check-$(gcc) = $(call cc-ver-check,CC,0x040100,"Xen requires at least gcc-4.1")
 $(eval $(check-y))
 
+ld-ver-build-id = $(shell $(1) --build-id 2>&1 | \
+					grep -q unrecognized && echo n || echo y)
+
+# binutils 2.18 and later implement --build-id.
+ifeq ($(call ld-ver-build-id,$(LD)),n)
+build_id_linker :=
+else
+CFLAGS += -DBUILD_ID
+build_id_linker := --build-id=sha1
+endif
+
 # as-insn: Check whether assembler supports an instruction.
 # Usage: cflags-y += $(call as-insn "insn",option-yes,option-no)
 as-insn = $(if $(shell echo 'void _(void) { asm volatile ( $(2) ); }' \
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 9ad648a..2988954 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -79,7 +79,7 @@ allow dom0_t xen_t:xen2 {
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     version extraversion compile_info capabilities changeset
-    platform_parameters get_features pagesize guest_handle commandline
+    platform_parameters get_features pagesize guest_handle commandline build_id
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -146,7 +146,7 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all except XENVER_commandline
+# For normal guests all except XENVER_commandline|build_id
 allow domain_type xen_t:version {
     version extraversion compile_info capabilities changeset
     platform_parameters get_features pagesize guest_handle
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index c41e433..d57c39a 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -495,6 +495,13 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
     case XENVER_commandline:
         sz = sizeof(xen_commandline_t);
         break;
+    case XENVER_build_id:
+        {
+            xen_build_id_t *build_id = (xen_build_id_t *)arg;
+            sz = sizeof(*build_id) + build_id->len;
+            HYPERCALL_BOUNCE_SET_DIR(arg, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
+            break;
+        }
     default:
         ERROR("xc_version: unknown command %d\n", cmd);
         return -EINVAL;
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index aa8daf1..6b592d3 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -191,6 +191,16 @@ enum {
 #define DECLARE_HYPERCALL_BOUNCE(_ubuf, _sz, _dir) DECLARE_NAMED_HYPERCALL_BOUNCE(_ubuf, _ubuf, _sz, _dir)
 
 /*
+ * Change the direction.
+ *
+ * Can only be used if the bounce_pre/bounce_post commands have
+ * not been used.
+ */
+#define HYPERCALL_BOUNCE_SET_DIR(_buf, _dir) do { if ((HYPERCALL_BUFFER(_buf))->hbuf)         \
+                                                        assert(0);                            \
+                                                   (HYPERCALL_BUFFER(_buf))->dir = _dir;      \
+                                                } while (0)
+/*
  * Set the size of data to bounce. Useful when the size is not known
  * when the bounce buffer is declared.
  */
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 35ba293..8491267 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -93,7 +93,7 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index f501a2f..5cf180f 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
 PHDRS
 {
   text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
+#if defined(BUILD_ID)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -53,6 +56,16 @@ SECTIONS
         _erodata = .;          /* End of read-only data */
   } :text
 
+#if defined(BUILD_ID)
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
+  } :text
+#endif
+
   .data : {                    /* Data */
        . = ALIGN(PAGE_SIZE);
        *(.data.page_aligned)
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 8cce4ba..4a19ae9 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -71,9 +71,16 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+ifdef build_id_linker
+num_phdrs = 2
+else
+num_phdrs = 1
+endif
+
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
-	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'` \
+	$(num_phdrs)
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C test
 
 install:
@@ -109,20 +116,27 @@ $(BASEDIR)/common/symbols-dummy.o:
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C $(BASEDIR)/common symbols-dummy.o
 
 $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o  \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
 		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o  \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --sysv --sort --warn-dup >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
+build_id.o: $(TARGET)-syms
+	$(OBJCOPY) --only-section=.note $< .$@.0
+	# Need to clear the CODE when the build_id.o is put in the .data
+	$(OBJCOPY) --set-section-flags=.note=alloc,load,data  \
+		   --rename-section=.note=.note.gnu.build-id .$@.0 $@
+	rm -f .$@.0
+
 EFI_LDFLAGS = $(patsubst -m%,-mi386pep,$(LDFLAGS)) --subsystem=10
 EFI_LDFLAGS += --image-base=$(1) --stack=0,0 --heap=0,0 --strip-debug
 EFI_LDFLAGS += --section-alignment=0x200000 --file-alignment=0x20
@@ -135,6 +149,13 @@ $(TARGET).efi: VIRT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A VIR
 $(TARGET).efi: ALT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A ALT_START$$,,p')
 # Don't use $(wildcard ...) here - at least make 3.80 expands this too early!
 $(TARGET).efi: guard = $(if $(shell echo efi/dis* | grep disabled),:)
+ifdef build_id_linker
+$(TARGET).efi: build_id.o
+build_id_file := build_id.o
+else
+build_id_file :=
+endif
+
 $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbols-dummy.o efi/mkreloc
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \
@@ -151,7 +172,7 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol
 		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
-	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
+	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(build_id_file) -o $@
 	if $(guard) false; then rm -f $@; echo 'EFI support disabled'; fi
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/x86/boot/mkelf32.c b/xen/arch/x86/boot/mkelf32.c
index 890ae6d..da4347a 100644
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -45,9 +45,9 @@ static Elf32_Ehdr out_ehdr = {
     0,                                       /* e_flags */
     sizeof(Elf32_Ehdr),                      /* e_ehsize */
     sizeof(Elf32_Phdr),                      /* e_phentsize */
-    1,                                       /* e_phnum */
+    1,  /* modify based on num_phdrs */      /* e_phnum */
     sizeof(Elf32_Shdr),                      /* e_shentsize */
-    3,                                       /* e_shnum */
+    3,  /* modify based on num_phdrs */      /* e_shnum */
     2                                        /* e_shstrndx */
 };
 
@@ -61,8 +61,20 @@ static Elf32_Phdr out_phdr = {
     PF_R|PF_W|PF_X,                          /* p_flags */
     64                                       /* p_align */
 };
+static Elf32_Phdr note_phdr = {
+    PT_NOTE,                                 /* p_type */
+    DYNAMICALLY_FILLED,                      /* p_offset */
+    DYNAMICALLY_FILLED,                      /* p_vaddr */
+    DYNAMICALLY_FILLED,                      /* p_paddr */
+    DYNAMICALLY_FILLED,                      /* p_filesz */
+    DYNAMICALLY_FILLED,                      /* p_memsz */
+    PF_R,                                    /* p_flags */
+    4                                        /* p_align */
+};
 
 static u8 out_shstrtab[] = "\0.text\0.shstrtab";
+/* If num_phdrs >= 2, we need to tack the .note. */
+static u8 out_shstrtab_extra[] = ".note\0";
 
 static Elf32_Shdr out_shdr[] = {
     { 0 },
@@ -90,6 +102,23 @@ static Elf32_Shdr out_shdr[] = {
     }
 };
 
+/*
+ * The 17 points to the '.note' in the out_shstrtab and out_shstrtab_extra
+ * laid out in the file.
+ */
+static Elf32_Shdr out_shdr_extra = {
+      17,                                    /* sh_name */
+      SHT_NOTE,                              /* sh_type */
+      0,                                     /* sh_flags */
+      DYNAMICALLY_FILLED,                    /* sh_addr */
+      DYNAMICALLY_FILLED,                    /* sh_offset */
+      DYNAMICALLY_FILLED,                    /* sh_size */
+      0,                                     /* sh_link */
+      0,                                     /* sh_info */
+      4,                                     /* sh_addralign */
+      0                                      /* sh_entsize */
+};
+
 /* Some system header files define these macros and pollute our namespace. */
 #undef swap16
 #undef swap32
@@ -228,21 +257,22 @@ static void do_read(int fd, void *data, int len)
 int main(int argc, char **argv)
 {
     u64        final_exec_addr;
-    u32        loadbase, dat_siz, mem_siz;
+    u32        loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
     char      *inimage, *outimage;
     int        infd, outfd = -1;
     char       buffer[1024];
     int        bytes, todo, i, rc = 1;
+    int        num_phdrs;
 
     Elf32_Ehdr in32_ehdr;
 
     Elf64_Ehdr in64_ehdr;
     Elf64_Phdr in64_phdr;
 
-    if ( argc != 5 )
+    if ( argc != 6 )
     {
         fprintf(stderr, "Usage: mkelf32 <in-image> <out-image> "
-                "<load-base> <final-exec-addr>\n");
+                "<load-base> <final-exec-addr> <number of program headers>\n");
         return 1;
     }
 
@@ -250,7 +280,13 @@ int main(int argc, char **argv)
     outimage = argv[2];
     loadbase = strtoul(argv[3], NULL, 16);
     final_exec_addr = strtoull(argv[4], NULL, 16);
-
+    num_phdrs = atoi(argv[5]);
+    if ( num_phdrs > 2 || num_phdrs < 1 )
+    {
+        fprintf(stderr, "Number of program headers MUST be 1 or 2, got %d!\n",
+                num_phdrs);
+        return 1;
+    }
     infd = open(inimage, O_RDONLY);
     if ( infd == -1 )
     {
@@ -285,11 +321,10 @@ int main(int argc, char **argv)
                 (int)in64_ehdr.e_phentsize, (int)sizeof(in64_phdr));
         goto out;
     }
-
-    if ( in64_ehdr.e_phnum != 1 )
+    if ( in64_ehdr.e_phnum != num_phdrs )
     {
-        fprintf(stderr, "Expect precisly 1 program header; found %d.\n",
-                (int)in64_ehdr.e_phnum);
+        fprintf(stderr, "Expect precisely %d program header; found %d.\n",
+                num_phdrs, (int)in64_ehdr.e_phnum);
         goto out;
     }
 
@@ -299,11 +334,36 @@ int main(int argc, char **argv)
 
     (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
     dat_siz = (u32)in64_phdr.p_filesz;
-
     /* Do not use p_memsz: it does not include BSS alignment padding. */
     /*mem_siz = (u32)in64_phdr.p_memsz;*/
     mem_siz = (u32)(final_exec_addr - in64_phdr.p_vaddr);
 
+    note_sz = note_base = offset = 0;
+    if ( num_phdrs > 1 )
+    {
+        offset = in64_phdr.p_offset;
+        note_base = in64_phdr.p_vaddr;
+
+        (void)lseek(infd, in64_ehdr.e_phoff+sizeof(in64_phdr), SEEK_SET);
+        do_read(infd, &in64_phdr, sizeof(in64_phdr));
+        endianadjust_phdr64(&in64_phdr);
+
+        (void)lseek(infd, offset, SEEK_SET);
+
+        note_sz = in64_phdr.p_memsz;
+        note_base = in64_phdr.p_vaddr - note_base;
+
+        if ( in64_phdr.p_offset > dat_siz || offset > in64_phdr.p_offset )
+        {
+            fprintf(stderr, "Expected .note section within .text section!\n" \
+                    "Offset %ld not within %d!\n",
+                    in64_phdr.p_offset, dat_siz);
+            goto out;
+        }
+        /* Gets us the absolute offset within the .text section. */
+        offset = in64_phdr.p_offset - offset;
+    }
+
     /*
      * End the image on a page boundary. This gets round alignment bugs
      * in the boot- or chain-loader (e.g., kexec on the XenoBoot CD).
@@ -322,6 +382,31 @@ int main(int argc, char **argv)
     out_shdr[1].sh_size   = dat_siz;
     out_shdr[2].sh_offset = RAW_OFFSET + dat_siz + sizeof(out_shdr);
 
+    if ( num_phdrs > 1 )
+    {
+        /* We have two of them! */
+        out_ehdr.e_phnum = num_phdrs;
+        /* Extra .note section. */
+        out_ehdr.e_shnum++;
+
+        /* Fill out the PT_NOTE program header. */
+        note_phdr.p_vaddr   = note_base;
+        note_phdr.p_paddr   = note_base;
+        note_phdr.p_filesz  = note_sz;
+        note_phdr.p_memsz   = note_sz;
+        note_phdr.p_offset  = offset;
+
+        /* Tack on the .note\0 */
+        out_shdr[2].sh_size += sizeof(out_shstrtab_extra);
+        /* And move it past the .note section. */
+        out_shdr[2].sh_offset += sizeof(out_shdr_extra);
+
+        /* Fill out the .note section. */
+        out_shdr_extra.sh_size = note_sz;
+        out_shdr_extra.sh_addr = note_base;
+        out_shdr_extra.sh_offset = RAW_OFFSET + offset;
+    }
+
     outfd = open(outimage, O_WRONLY|O_CREAT|O_TRUNC, 0775);
     if ( outfd == -1 )
     {
@@ -335,8 +420,15 @@ int main(int argc, char **argv)
 
     endianadjust_phdr32(&out_phdr);
     do_write(outfd, &out_phdr, sizeof(out_phdr));
-    
-    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr)) < 0 )
+
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_phdr32(&note_phdr);
+        do_write(outfd, &note_phdr, sizeof(note_phdr));
+    }
+
+    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr) -
+          ( num_phdrs > 1 ? sizeof(note_phdr) : 0 ) ) < 0 )
     {
         fprintf(stderr, "Header overflow.\n");
         goto out;
@@ -355,9 +447,22 @@ int main(int argc, char **argv)
         endianadjust_shdr32(&out_shdr[i]);
     do_write(outfd, &out_shdr[0], sizeof(out_shdr));
 
-    do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
-    do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
-
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_shdr32(&out_shdr_extra);
+        /* Append the .note section. */
+        do_write(outfd, &out_shdr_extra, sizeof(out_shdr_extra));
+        /* The normal strings - .text\0.. */
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        /* Our .note */
+        do_write(outfd, out_shstrtab_extra, sizeof(out_shstrtab_extra));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+sizeof(out_shstrtab_extra)+dat_siz)&3));
+    }
+    else
+    {
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
+    }
     rc = 0;
 out:
     if ( infd != -1 )
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 3b199ca..99a3fcb 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -31,6 +31,9 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
+#if defined(BUILD_ID) && !defined(EFI)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -67,6 +70,21 @@ SECTIONS
        *(.rodata.*)
   } :text
 
+#if defined(BUILD_ID) && !defined(EFI)
+/*
+ * There is no mechanism to put a PT_NOTE in the EFI file, so EFI builds
+ * keep the build-id in a data section instead.
+ */
+  . = ALIGN(4);
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
+  } :note :text
+#endif
+
   . = ALIGN(SMP_CACHE_BYTES);
   .data.read_mostly : {
        /* Exception table */
@@ -86,6 +104,11 @@ SECTIONS
        __end_schedulers_array = .;
        *(.data.rel.ro)
        *(.data.rel.ro.*)
+#if defined(BUILD_ID) && defined(EFI)
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+#endif
   } :text
 
   .data : {                    /* Data */
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index a5e3f0e..cd746a9 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -383,6 +383,42 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         return 0;
     }
+
+    case XENVER_build_id:
+    {
+        xen_build_id_t build_id;
+        unsigned int sz = 0;
+        int rc = 0;
+        char *p = NULL;
+
+        if ( deny )
+            return -EPERM;
+
+        /* With a non-NULL handle, read the caller-supplied buffer length. */
+        if ( !guest_handle_is_null(arg) )
+        {
+            if ( copy_from_guest(&build_id, arg, 1) )
+                return -EFAULT;
+
+            if ( build_id.len == 0 )
+                return -EINVAL;
+        }
+
+        rc = xen_build_id(&p, &sz);
+        if ( rc )
+            return rc;
+
+        if ( guest_handle_is_null(arg) )
+            return sz;
+
+        if ( sz > build_id.len )
+            return -ENOBUFS;
+
+        if ( copy_to_guest_offset(arg, offsetof(xen_build_id_t, buf), p, sz) )
+            return -EFAULT;
+
+        return sz;
+    }
     }
 
     return -ENOSYS;
diff --git a/xen/common/version.c b/xen/common/version.c
index 786be4e..33c09e5 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -1,5 +1,9 @@
 #include <xen/compile.h>
+#include <xen/errno.h>
+#include <xen/string.h>
+#include <xen/types.h>
 #include <xen/version.h>
+#include <xen/elf.h>
 
 const char *xen_compile_date(void)
 {
@@ -60,3 +64,47 @@ const char *xen_deny(void)
 {
     return "<denied>";
 }
+#ifdef BUILD_ID
+#define NT_GNU_BUILD_ID 3
+/* Defined in linker script. */
+extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
+
+int xen_build_id(char **p, unsigned int *len)
+{
+    const Elf_Note *n = __note_gnu_build_id_start;
+    static bool_t checked = 0;
+
+    if ( checked )
+    {
+        *len = n->descsz;
+        *p = ELFNOTE_DESC(n);
+        return 0;
+    }
+    /* --build-id invoked with wrong parameters. */
+    if ( __note_gnu_build_id_end <= __note_gnu_build_id_start )
+        return -ENODATA;
+
+    /* Check for full Note header. */
+    if ( &n[1] > __note_gnu_build_id_end )
+        return -ENODATA;
+
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    *len = n->descsz;
+    *p = ELFNOTE_DESC(n);
+
+    checked = 1;
+    return 0;
+}
+#else
+int xen_build_id(char **p, unsigned int *len)
+{
+    return -ENODATA;
+}
+#endif
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 44f26b0..adca602 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,8 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/* NB. All ops return zero on success, except
+ * XENVER_{version,pagesize,build_id} */
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -83,6 +84,19 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+/* Return value is the number of bytes written, or XEN_Exx on error.
+ * Calling with a NULL argument returns the size of build_id. */
+#define XENVER_build_id 10
+struct xen_build_id {
+        uint32_t        len; /* IN: size of buf[]. */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+        unsigned char   buf[];
+#elif defined(__GNUC__)
+        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
+#endif
+};
+typedef struct xen_build_id xen_build_id_t;
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 2015c0b..466c977 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -13,5 +13,6 @@ const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
 const char *xen_deny(void);
+int xen_build_id(char **p, unsigned int *len);
 
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 7e3bcdd..396ee46 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1664,6 +1664,9 @@ static int flask_version_op (uint32_t op)
     case XENVER_commandline:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__COMMANDLINE, NULL);
+    case XENVER_build_id:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__BUILD_ID, NULL);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 7cb32de..c9cd102 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -524,4 +524,6 @@ class version
     guest_handle
 # Xen command line.
     commandline
+# Build id
+    build_id
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (12 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-15 12:45   ` Wei Liu
  2016-02-12 18:05 ` [PATCH v3 15/23] xsplice: Print build_id in keyhandler Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, xen-devel
  Cc: Konrad Rzeszutek Wilk

If the hypervisor is built with a build-id, we will display it.
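
For reference, the hex rendering of the build-id bytes can be sketched in
standalone C as below. This is only an illustration under assumptions:
build_id_to_hex() is a hypothetical helper (it is not part of libxl), and
it uses plain malloc() rather than libxl's own allocators.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of libxl: render `len` raw build-id
 * bytes as a NUL-terminated lowercase hex string.  Returns a malloc()ed
 * buffer which the caller must free(), or NULL on allocation failure.
 */
static char *build_id_to_hex(const unsigned char *id, size_t len)
{
    char *out = malloc(len * 2 + 1);
    size_t i;

    if ( !out )
        return NULL;

    /* Two hex digits per byte; snprintf() also writes a trailing NUL. */
    for ( i = 0; i < len; i++ )
        snprintf(&out[i * 2], 3, "%02x", (unsigned int)id[i]);

    /* Covers the len == 0 case, where the loop never runs. */
    out[len * 2] = '\0';
    return out;
}
```

An empty build-id yields an empty string, which matches what is shown
when the hypervisor has no build-id to report.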

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Include HAVE_*, use libxl_zalloc, s/rc/ret/
v3: Retry with different size if 1020 is not enough.
---
 tools/libxl/libxl.c         | 45 +++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl.h         |  5 +++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  1 +
 4 files changed, 52 insertions(+)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 2d18b8d..4efd8dd 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5256,6 +5256,38 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
+static const int libxl_get_build_id(libxl_ctx *ctx, libxl_version_info *info,
+                                    xen_build_id_t *build)
+{
+    GC_INIT(ctx);
+    int ret;
+
+    ret = xc_version(ctx->xch, XENVER_build_id, build);
+    switch ( ret ) {
+    case -EPERM:
+    case -ENODATA:
+    case 0:
+        info->build_id = libxl__strdup(NOGC, "");
+        break;
+    case -ENOBUFS:
+        GC_FREE;
+        return -ENOBUFS;
+    default:
+        if (ret > 0) {
+            unsigned int i;
+
+            info->build_id = libxl__zalloc(NOGC, (ret * 2) + 1);
+
+            for (i = 0; i < ret ; i++)
+                snprintf(&info->build_id[i * 2], 3, "%02hhx", build->buf[i]);
+        } else
+            LOGEV(ERROR, ret, "getting build_id");
+        break;
+    }
+    GC_FREE;
+    return 0;
+}
+
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     GC_INIT(ctx);
@@ -5266,8 +5298,10 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
         xen_capabilities_info_t xen_caps;
         xen_platform_parameters_t p_parms;
         xen_commandline_t xen_commandline;
+        xen_build_id_t build_id;
     } u;
     long xen_version;
+    int ret;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
@@ -5300,6 +5334,17 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
     xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
     info->commandline = libxl__strdup(NOGC, u.xen_commandline);
 
+    u.build_id.len = sizeof(u) - sizeof(u.build_id);
+    ret = libxl_get_build_id(ctx, info, &u.build_id);
+    if ( ret == -ENOBUFS ) {
+            xen_build_id_t *build_id;
+
+            build_id = libxl__zalloc(gc, info->pagesize);
+            build_id->len = info->pagesize - sizeof(*build_id);
+            ret = libxl_get_build_id(ctx, info, build_id);
+            if ( ret )
+                LOGEV(ERROR, ret, "getting build_id");
+    }
  out:
     GC_FREE;
     return info;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index fa87f53..b713407 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -218,6 +218,11 @@
 #define LIBXL_HAVE_SOFT_RESET 1
 
 /*
+ * LIBXL_HAVE_BUILD_ID means that libxl_version_info has the extra
+ * field for the hypervisor build_id.
+ */
+#define LIBXL_HAVE_BUILD_ID 1
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9ad7eba..92bf620 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -356,6 +356,7 @@ libxl_version_info = Struct("version_info", [
     ("virt_start",        uint64),
     ("pagesize",          integer),
     ("commandline",       string),
+    ("build_id",          string),
     ], dir=DIR_OUT)
 
 libxl_domain_create_info = Struct("domain_create_info",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index d07ccb2..9bdc42a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5552,6 +5552,7 @@ static void output_xeninfo(void)
     printf("cc_compile_by          : %s\n", info->compile_by);
     printf("cc_compile_domain      : %s\n", info->compile_domain);
     printf("cc_compile_date        : %s\n", info->compile_date);
+    printf("build_id               : %s\n", info->build_id);
 
     return;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 15/23] xsplice: Print build_id in keyhandler.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (13 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 20:13   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 16/23] xsplice: basic build-id dependency checking Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

As it should be a useful debug mechanism.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/xsplice.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 65b1f11..34719fc 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/version.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
@@ -99,7 +100,22 @@ static const char *state2str(int32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
-    unsigned int i;
+    char *binary_id = NULL;
+    unsigned int len = 0, i;
+    int rc;
+
+    rc = xen_build_id(&binary_id, &len);
+    printk("build-id: ");
+    if ( !rc )
+    {
+        for ( i = 0; i < len; i++ )
+        {
+                   uint8_t c = binary_id[i];
+                   printk("%02x", c);
+        }
+           printk("\n");
+    } else if ( rc < 0 )
+        printk("rc = %d\n", rc);
 
     spin_lock(&payload_lock);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 16/23] xsplice: basic build-id dependency checking.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (14 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 15/23] xsplice: Print build_id in keyhandler Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

We now expect the ELF payloads to be built with --build-id.

Also, the .xsplice.depends section has to contain the build-id
of the hypervisor (or of a preceding payload).

We already have the code to verify the Elf_Note build-id
so export parts of it.

This dependency means the hypervisor MUST be compiled with
--build-id - so we gate the build of xSplice on the availability
of said functionality.
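
To illustrate the check, here is a rough standalone sketch of validating
a GNU build-id note and comparing it with an expected build-id. It is not
the hypervisor code: struct note_hdr, note_build_id() and dep_matches()
are hypothetical names, the note is assumed to be in host byte order, and
the name check is deliberately stricter than the strncmp() used in
xen_build_id_check().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3

/* ELF note header; name and descriptor follow, each padded to 4 bytes. */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/* Round up to the 4-byte alignment used inside ELF notes. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: validate a GNU build-id note held in
 * buf[0..size) and return a pointer to its descriptor (the build-id
 * bytes), storing the length in *len.  Returns NULL if the note is
 * truncated or is not a "GNU" NT_GNU_BUILD_ID note.
 */
static const unsigned char *note_build_id(const unsigned char *buf,
                                          size_t size, size_t *len)
{
    struct note_hdr h;
    size_t desc_off;

    if ( size < sizeof(h) )
        return NULL;
    memcpy(&h, buf, sizeof(h)); /* Avoids unaligned access. */

    /* "GNU" plus its terminating NUL is exactly four bytes. */
    if ( h.type != NT_GNU_BUILD_ID || h.namesz != 4 )
        return NULL;

    desc_off = sizeof(h) + note_align(h.namesz);
    if ( desc_off > size || h.descsz > size - desc_off )
        return NULL;

    if ( memcmp(buf + sizeof(h), "GNU", 4) != 0 )
        return NULL;

    *len = h.descsz;
    return buf + desc_off;
}

/* Returns 1 if the dependency note names exactly the expected build-id. */
static int dep_matches(const unsigned char *dep_note, size_t dep_size,
                       const unsigned char *expected, size_t expected_len)
{
    size_t len = 0;
    const unsigned char *id = note_build_id(dep_note, dep_size, &len);

    return id && len == expected_len && memcmp(id, expected, len) == 0;
}
```

In this sketch the expected build-id would be that of the hypervisor when
nothing has been applied yet, and that of the most recently applied
payload otherwise.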

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 Config.mk                  |  1 +
 docs/misc/xsplice.markdown | 78 +++++++++++++++++++++++++++++--------------
 xen/arch/x86/test/Makefile | 21 ++++++++++--
 xen/common/Kconfig         |  5 +++
 xen/common/version.c       | 41 ++++++++++++++++-------
 xen/common/xsplice.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/xen/version.h  |  4 +++
 xen/include/xen/xsplice.h  |  6 ++++
 8 files changed, 197 insertions(+), 42 deletions(-)

diff --git a/Config.mk b/Config.mk
index 61186e2..ced25df 100644
--- a/Config.mk
+++ b/Config.mk
@@ -134,6 +134,7 @@ ifeq ($(call ld-ver-build-id,$(LD)),n)
 build_id_linker :=
 else
 CFLAGS += -DBUILD_ID
+export XEN_HAS_BUILD_ID=y
 build_id_linker := --build-id=sha1
 endif
 
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 0a5b87b..c06cd9d 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -283,9 +283,17 @@ The xSplice core code loads the payload as a standard ELF binary, relocates it
 and handles the architecture-specifc sections as needed. This process is much
 like what the Linux kernel module loader does.
 
-The payload contains a section (xsplice_patch_func) with an array of structures
-describing the functions to be patched:
+The payload contains at least three sections:
 
+ * `.xsplice.funcs` - which is an array of xsplice_patch_func structures.
+ * `.xsplice.depends` - which is an ELF Note that describes what the payload
+    depends on.
+ *  `.note.gnu.build-id` - the build-id of this payload.
+
+### .xsplice.funcs
+
+The `.xsplice.funcs` contains an array of xsplice_patch_func structures
+which describe the functions to be patched:
 <pre>
 struct xsplice_patch_func {  
     const char *name;  
@@ -327,7 +335,7 @@ When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
 and the core code copies the data from the undo buffer (private internal copy)
 to `old_addr`.
 
-### Example
+### Example of .xsplice.funcs
 
 A simple example of what a payload file can be:
 
@@ -362,6 +370,23 @@ struct xsplice_patch_func xsplice_hello_world = {
 
 Code must be compiled with -fPIC.
 
+### .xsplice.depends and .note.gnu.build-id
+
+To support dependency checking and safe loading (to load the
+appropriate payload against the right hypervisor) there is a need
+to embed a build-id dependency.
+
+This is done by the payload containing a section `.xsplice.depends`
+which follows the format of an ELF Note. The contents of this
+(name, and description) are specific to the linker utilized to
+build the hypervisor and payload.
+
+If GNU linker is used then the name is `GNU` and the description
+is an NT_GNU_BUILD_ID type ID. The description can be an SHA1
+checksum, MD5 checksum or any unique value.
+
+The size of these structures varies with the --build-id linker option.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
@@ -862,28 +887,6 @@ This is implemented in the Xen Project hypervisor.
 
 Only the privileged domain should be allowed to do this operation.
 
-
-# Not Yet Done
-
-This is for further development of xSplice.
-
-## Goals
-
-The design must also have a mechanism for:
-
- *  An dependency mechanism for the payloads. To use that information to load:
-    - The appropiate payload. To verify that payload is built against the
-      hypervisor. This can be done via the `build-id`
-      or via providing an copy of the old code - so that the hypervisor can
-       verify it against the code in memory.
-    - To construct an appropiate order of payloads to load in case they
-      depend on each other.
- * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
- * Be able to patch .rodata, .bss, and .data sections.
- * Further safety checks (blacklist of which functions cannot be patched, check
-   the stack, etc).
- * NOP out the code sequence if `new_size` is zero.
-
 ### xSplice interdependencies
 
 xSplice patches interdependencies are tricky.
@@ -910,6 +913,31 @@ being loaded and requires an hypervisor build-id to match against.
 The old code allows much more flexibility and an additional guard,
 but is more complex to implement.
 
+The second option, which requires a build-id of the hypervisor,
+is implemented in the Xen Project hypervisor.
+
+Specifically each payload has two build-id ELF notes:
+ * The build-id of the payload itself (generated via --build-id).
+ * The build-id of the payload it depends on (extracted from the
+   previous payload or hypervisor at build time).
+
+This means that the very first payload depends on the hypervisor
+build-id.
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## Goals
+
+The design must also have a mechanism for:
+
+ * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, etc).
+ * NOP out the code sequence if `new_size` is zero.
+
 ### Handle inlined __LINE__
 
 This problem is related to hotpatch construction
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index 3fe951d..de9693f 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -22,7 +22,7 @@ endif
 
 .PHONY: clean
 clean::
-	rm -f *.o .*.o.d $(XSPLICE) config.h
+	rm -f *.o .*.o.d $(XSPLICE) config.h build_id.o
 
 #
 # To compute these values we need the binary files: xen-syms
@@ -41,10 +41,25 @@ config.h: xen_hello_world_func.o
 	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
 	 echo "#define OLD_CODE $(OLD_CODE)") > $@
 
+#
+# This target is only accessible if CONFIG_XSPLICE is defined, which
+# depends on $(build_id_linker) being available. Hence we do not
+# need any checks.
+#
+.PHONY: build_id.o
+build_id.o:
+	$(OBJCOPY) --only-section=.note $(BASEDIR)/xen-syms .$@.0
+	# Need to clear the CODE when the build_id.o is put in the .data
+	$(OBJCOPY) --set-section-flags=.note=alloc,load,data  \
+		   --rename-section=.note=.xsplice.depends .$@.0 $@
+	rm -f .$@.0
+
 .PHONY: xsplice
-xsplice: config.h
+build_id_files := build_id.o
+xsplice: config.h build_id.o
 	# Need to have these done in sequential order
 	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world_func.o
 	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world.o
-	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o xen_hello_world.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) xen_hello_world_func.o \
+		 xen_hello_world.o build_id.o
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 619aa9e..a313171 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -51,6 +51,10 @@ config HAS_GDBSX
 config HAS_IOPORTS
 	bool
 
+config HAS_BUILD_ID
+    string
+    option env="XEN_HAS_BUILD_ID"
+
 # Enable/Disable kexec support
 config KEXEC
 	bool "kexec support"
@@ -156,6 +160,7 @@ endmenu
 config XSPLICE
 	bool "xsplice support"
 	default y
+	depends on HAS_BUILD_ID = "y"
 	---help---
 	  Allows a running Xen hypervisor to be patched without rebooting.
 	  This is primarily used to patch an hypervisor with XSA fixes.
diff --git a/xen/common/version.c b/xen/common/version.c
index 33c09e5..e21c01e 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -69,10 +69,29 @@ const char *xen_deny(void)
 /* Defined in linker script. */
 extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
 
+int xen_build_id_check(char **p, unsigned int *len, const Elf_Note *n)
+{
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    if ( len )
+        *len = n->descsz;
+    if ( p )
+        *p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+
 int xen_build_id(char **p, unsigned int *len)
 {
     const Elf_Note *n = __note_gnu_build_id_start;
     static bool_t checked = 0;
+    int rc;
 
     if ( checked )
     {
@@ -86,23 +105,21 @@ int xen_build_id(char **p, unsigned int *len)
 
     /* Check for full Note header. */
     if ( &n[1] > __note_gnu_build_id_end )
+    {
         return -ENODATA;
+    }
 
-    /* Check if we really have a build-id. */
-    if ( NT_GNU_BUILD_ID != n->type )
-        return -ENODATA;
-
-    /* Sanity check, name should be "GNU" for ld-generated build-id. */
-    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
-        return -ENODATA;
-
-    *len = n->descsz;
-    *p = ELFNOTE_DESC(n);
+    rc = xen_build_id_check(p, len, n);
+    if ( !rc )
+        checked = 1;
 
-    checked = 1;
-    return 0;
+    return rc;
 }
 #else
+int xen_build_id_check(char **p, unsigned int *len, const Elf_Note *n)
+{
+    return -ENODATA;
+}
 int xen_build_id(char **p, unsigned int *len)
 {
     return -ENODATA;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 34719fc..2ba5bb5 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -4,6 +4,7 @@
  */
 
 #include <xen/cpu.h>
+#include <xen/elf.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -14,6 +15,7 @@
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/version.h>
+#include <xen/version.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
@@ -51,6 +53,8 @@ struct payload {
     struct exception_table_entry *start_ex_table;
     struct exception_table_entry *stop_ex_table;
 #endif
+    struct xsplice_build_id id;          /* ELFNOTE_DESC(.note.gnu.build-id) of the payload. */
+    struct xsplice_build_id dep;         /* ELFNOTE_DESC(.xsplice.depends). */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -639,7 +643,9 @@ static int check_special_sections(struct payload *payload,
                                   struct xsplice_elf *elf)
 {
     unsigned int i;
-    static const char *const names[] = { ".xsplice.funcs" };
+    static const char *const names[] = { ".xsplice.funcs" ,
+                                         ".xsplice.depends",
+                                         ".note.gnu.build-id"};
 
     for ( i = 0; i < ARRAY_SIZE(names); i++ )
     {
@@ -648,7 +654,7 @@ static int check_special_sections(struct payload *payload,
         sec = xsplice_elf_sec_by_name(elf, names[i]);
         if ( !sec )
         {
-            printk(XENLOG_ERR "%s: %s is missing!\n", names[i],elf->name);
+            printk(XENLOG_ERR "%s: %s is missing!\n", names[i], elf->name);
             return -EINVAL;
         }
         if ( !sec->sec->sh_size )
@@ -657,12 +663,15 @@ static int check_special_sections(struct payload *payload,
     return 0;
 }
 
+#define NT_GNU_BUILD_ID 3
+
 static int find_special_sections(struct payload *payload,
                                  struct xsplice_elf *elf)
 {
     struct xsplice_elf_sec *sec;
     unsigned int i;
     struct xsplice_patch_func *f;
+    Elf_Note *n;
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
     if ( sec->sec->sh_size % sizeof *payload->funcs )
@@ -689,6 +698,27 @@ static int find_special_sections(struct payload *payload,
                 return -EINVAL;
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ".note.gnu.build-id");
+    if ( sec )
+    {
+        n = (Elf_Note *)sec->load_addr;
+        if ( sec->sec->sh_size <= sizeof *n )
+            return -EINVAL;
+
+        if ( xen_build_id_check(&payload->id.p, &payload->id.len, n) )
+            return -EINVAL;
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.depends");
+    {
+        n = (Elf_Note *)sec->load_addr;
+        if ( sec->sec->sh_size <= sizeof *n )
+            return -EINVAL;
+
+        if ( xen_build_id_check(&payload->dep.p, &payload->dep.len, n) )
+            return -EINVAL;
+    }
+
     /* Optional sections. */
     for ( i = 0; i < BUGFRAME_NR; i++ )
     {
@@ -784,6 +814,38 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
  * The following functions get the CPUs into an appropriate state and
  * apply (or revert) each of the module's functions.
  */
+/* Only apply if the payload is applied on top of the correct build-id. */
+static int apply_depcheck(struct payload *payload)
+{
+    if ( !payload->dep.len )
+        return -EINVAL;
+
+    if ( list_empty(&applied_list) )
+    {
+        char *id;
+        unsigned int len;
+        int rc;
+
+        rc = xen_build_id(&id, &len);
+        if ( rc )
+            return rc;
+
+        if ( (payload->dep.len != len ) ||
+              memcmp(id, payload->dep.p, len) )
+            return -EINVAL;
+    }
+    else
+    {
+        struct payload *data = list_last_entry(&applied_list, struct payload,
+                                               applied_list);
+
+        if ( (payload->dep.len != data->id.len) ||
+             memcmp(data->id.p, payload->dep.p, data->id.len) )
+            return -EINVAL;
+    }
+
+    return 0;
+}
 
 /*
  * This function is executed having all other CPUs with no stack (we may
@@ -793,6 +855,11 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
 static int apply_payload(struct payload *data)
 {
     unsigned int i;
+    int rc;
+
+    rc = apply_depcheck(data);
+    if ( rc )
+        return rc;
 
     printk(XENLOG_DEBUG "%s: Applying %u functions.\n", data->name,
            data->nfuncs);
@@ -805,6 +872,13 @@ static int apply_payload(struct payload *data)
     return 0;
 }
 
+/* Only allow reverting if this is the top of the stack. */
+static int revert_depcheck(struct payload *payload)
+{
+    return (list_last_entry_or_null(&applied_list, struct payload,
+                                    applied_list) == payload) ? 0 : -EINVAL;
+}
+
 /*
  * This function is executed having all other CPUs with no stack (we may
  * have cpu_idle on it) and IRQs disabled.
@@ -812,6 +886,11 @@ static int apply_payload(struct payload *data)
 static int revert_payload(struct payload *data)
 {
     unsigned int i;
+    int rc;
+
+    rc = revert_depcheck(data);
+    if ( rc )
+        return rc;
 
     printk(XENLOG_DEBUG "%s: Reverting.\n", data->name);
 
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 466c977..7e80012 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -15,4 +15,8 @@ const char *xen_banner(void);
 const char *xen_deny(void);
 int xen_build_id(char **p, unsigned int *len);
 
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+int xen_build_id_check(char **p, unsigned int *len, const Elf_Note *n);
+
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 3a9948a..061a1a1 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -24,6 +24,12 @@ struct xsplice_patch_func {
 };
 
 #ifdef CONFIG_XSPLICE
+
+struct xsplice_build_id {
+   char *p;
+   unsigned int len;
+};
+
 int xsplice_control(struct xen_sysctl_xsplice_op *);
 void do_xsplice(void);
 struct bug_frame *xsplice_find_bug(const char *eip, int *id);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (15 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 16/23] xsplice: basic build-id dependency checking Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-16 20:20   ` Andrew Cooper
  2016-02-12 18:05 ` [PATCH v3 18/23] xsplice: Prevent duplicate payloads to be loaded Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/xsplice.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 2ba5bb5..8c5557e 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -101,6 +101,21 @@ static const char *state2str(int32_t state)
     return names[state];
 }
 
+static void xsplice_print_build_id(char *id, unsigned int len)
+{
+    unsigned int i;
+
+    if ( !len )
+        return;
+
+    for ( i = 0; i < len; i++ )
+    {
+        uint8_t c = id[i];
+        printk("%02x", c);
+    }
+    printk("\n");
+}
+
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
@@ -111,14 +126,9 @@ static void xsplice_printall(unsigned char key)
     rc = xen_build_id(&binary_id, &len);
     printk("build-id: ");
     if ( !rc )
-    {
-        for ( i = 0; i < len; i++ )
-        {
-                   uint8_t c = binary_id[i];
-                   printk("%02x", c);
-        }
-           printk("\n");
-    } else if ( rc < 0 )
+        xsplice_print_build_id(binary_id, len);
+
+    else if ( rc < 0 )
         printk("rc = %d\n", rc);
 
     spin_lock(&payload_lock);
@@ -135,6 +145,16 @@ static void xsplice_printall(unsigned char key)
             printk("    %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
                    f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
         }
+        if ( data->id.len )
+        {
+            printk(" build_id=");
+            xsplice_print_build_id(data->id.p, data->id.len);
+        }
+        if ( data->dep.len )
+        {
+            printk(" depend on=");
+            xsplice_print_build_id(data->dep.p, data->dep.len);
+        }
     }
     spin_unlock(&payload_lock);
 }
-- 
2.1.0

* [PATCH v3 18/23] xsplice: Prevent duplicate payloads to be loaded.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (16 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2) Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/xsplice.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 8c5557e..3f1da13 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -728,6 +728,23 @@ static int find_special_sections(struct payload *payload,
         if ( xen_build_id_check(&payload->id.p, &payload->id.len, n) )
             return -EINVAL;
     }
+    /* Make sure it is not a duplicate. */
+    if ( payload->id.len )
+    {
+        struct payload *data;
+
+        spin_lock(&payload_lock);
+        list_for_each_entry ( data, &payload_list, list )
+        {
+            if ( data != payload && data->id.len &&
+                 !memcmp(data->id.p, payload->id.p, data->id.len) )
+            {
+                spin_unlock(&payload_lock);
+                return -EEXIST;
+            }
+        }
+        spin_unlock(&payload_lock);
+    }
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.depends");
     {
-- 
2.1.0

* [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2)
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (17 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 18/23] xsplice: Prevent duplicate payloads to be loaded Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-22 14:57   ` Ross Lagerwall
  2016-02-12 18:05 ` [PATCH v3 20/23] x86, xsplice: Print payload's symbol name and module in backtraces Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

If in the payload we do not have the old_addr we can resolve
the virtual address based on the UNDEFined symbols.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v1: Ross original version.
v2: Include test-case and document update.
---
 docs/misc/xsplice.markdown          |   1 -
 xen/arch/x86/Makefile               |   6 +-
 xen/arch/x86/test/Makefile          |   4 +-
 xen/arch/x86/test/xen_hello_world.c |   5 +-
 xen/common/symbols.c                |  23 ++++++
 xen/common/xsplice.c                | 151 ++++++++++++++++++++++++++++++++++++
 xen/common/xsplice_elf.c            |  19 ++++-
 xen/include/xen/symbols.h           |   2 +
 xen/include/xen/xsplice.h           |   7 ++
 9 files changed, 208 insertions(+), 10 deletions(-)

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index c06cd9d..1a982f2 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -932,7 +932,6 @@ This is for further development of xSplice.
 
 The design must also have a mechanism for:
 
- * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
  * Be able to patch .rodata, .bss, and .data sections.
  * Further safety checks (blacklist of which functions cannot be patched, check
    the stack, etc).
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 4a19ae9..5f7c57e 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -119,12 +119,14 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o  \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
-		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0.S
+		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort \
+		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o  \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
-		| $(BASEDIR)/tools/symbols --sysv --sort --warn-dup >$(@D)/.$(@F).1.S
+		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort --warn-dup \
+		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index de9693f..57c5189 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -32,14 +32,12 @@ clean::
 # the last entry in the build target.
 #
 .PHONY: config.h
-config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: xen_hello_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
-	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
-	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
 
 #
 # This target is only accessible if CONFIG_XSPLICE is defined, which
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index 6a1775b..6200fbe 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -6,10 +6,13 @@
 static char name[] = "xen_hello_world";
 extern const char *xen_hello_world(void);
 
+/* External symbol. */
+extern const char *xen_extra_version(void);
+
 struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
     .name = name,
     .new_addr = (unsigned long)(xen_hello_world),
-    .old_addr = OLD_CODE,
+    .old_addr = (unsigned long)(xen_extra_version),
     .new_size = NEW_CODE_SZ,
     .old_size = OLD_CODE_SZ,
 };
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index bf5623f..406e5be 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -209,3 +209,26 @@ int xensyms_read(uint32_t *symnum, char *type,
 
     return 0;
 }
+
+uint64_t symbols_lookup_by_name(const char *symname)
+{
+    uint32_t symnum = 0;
+    uint64_t addr = 0, outaddr = 0;
+    int rc;
+    char type;
+    char name[KSYM_NAME_LEN + 1] = {0};
+
+    do {
+        rc = xensyms_read(&symnum, &type, &addr, name);
+        if ( rc )
+            break;
+
+        if ( !strcmp(name, symname) )
+        {
+            outaddr = addr;
+            break;
+        }
+    } while ( name[0] != '\0' );
+
+    return outaddr;
+}
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 3f1da13..0b42a16 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -14,6 +14,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/symbols.h>
 #include <xen/version.h>
 #include <xen/version.h>
 #include <xen/wait.h>
@@ -55,6 +56,9 @@ struct payload {
 #endif
     struct xsplice_build_id id;          /* ELFNOTE_DESC(.note.gnu.build-id) of the payload. */
     struct xsplice_build_id dep;         /* ELFNOTE_DESC(.xsplice.depends). */
+    struct xsplice_symbol *symtab;       /* All symbols. */
+    char *strtab;                        /* Pointer to .strtab. */
+    unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -232,6 +236,8 @@ static void free_payload(struct payload *data)
     payload_cnt--;
     payload_version++;
     free_payload_data(data);
+    xfree(data->symtab);
+    xfree(data->strtab);
     xfree(data);
 }
 
@@ -290,6 +296,8 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
  err_raw:
     free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
  err_data:
+    xfree(data->symtab);
+    xfree(data->strtab);
     xfree(data);
     return rc;
 }
@@ -716,6 +724,24 @@ static int find_special_sections(struct payload *payload,
         for ( j = 0; j < 24; j ++ )
             if ( f->pad[j] )
                 return -EINVAL;
+
+        /* Lookup function's old address if not already resolved. */
+        if ( !f->old_addr )
+        {
+            f->old_addr = symbols_lookup_by_name(f->name);
+            if ( !f->old_addr )
+            {
+                f->old_addr = xsplice_symbols_lookup_by_name(f->name);
+                if ( !f->old_addr )
+                {
+                    printk(XENLOG_ERR "%s: Could not resolve old address of %s\n",
+                           elf->name, f->name);
+                    return -ENOENT;
+                }
+            }
+            printk(XENLOG_DEBUG "%s: Resolved old address %s => 0x%"PRIx64"\n",
+                   elf->name, f->name, f->old_addr);
+        }
     }
 
     sec = xsplice_elf_sec_by_name(elf, ".note.gnu.build-id");
@@ -798,6 +824,102 @@ static int find_special_sections(struct payload *payload,
     return 0;
 }
 
+static bool_t is_core_symbol(struct xsplice_elf *elf,
+                             struct xsplice_elf_sym *sym)
+{
+    if ( sym->sym->st_shndx == SHN_UNDEF ||
+         sym->sym->st_shndx >= elf->hdr->e_shnum )
+        return 0;
+
+    return !!( (elf->sec[sym->sym->st_shndx].sec->sh_flags & SHF_ALLOC) &&
+               (ELF64_ST_TYPE(sym->sym->st_info) == STT_OBJECT ||
+                ELF64_ST_TYPE(sym->sym->st_info) == STT_FUNC) );
+}
+
+static int build_symbol_table(struct payload *payload, struct xsplice_elf *elf)
+{
+    unsigned int i, j, nsyms = 0;
+    size_t strtab_len = 0;
+    struct xsplice_symbol *symtab;
+    char *strtab;
+
+    /* Recall that 0 is always NULL. */
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_core_symbol(elf, elf->sym + i) )
+        {
+            nsyms++;
+            strtab_len += strlen(elf->sym[i].name) + 1;
+        }
+    }
+
+    symtab = xmalloc_array(struct xsplice_symbol, nsyms);
+    if ( !symtab )
+        return -ENOMEM;
+
+    strtab = xmalloc_bytes(strtab_len);
+    if ( !strtab )
+    {
+        xfree(symtab);
+        return -ENOMEM;
+    }
+
+    nsyms = 0;
+    strtab_len = 0;
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_core_symbol(elf, elf->sym + i) )
+        {
+            symtab[nsyms].name = strtab + strtab_len;
+            symtab[nsyms].size = elf->sym[i].sym->st_size;
+            symtab[nsyms].value = elf->sym[i].sym->st_value;
+            symtab[nsyms].flags = 0;
+            strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
+                                  KSYM_NAME_LEN) + 1;
+            nsyms++;
+        }
+    }
+
+    for ( i = 0; i < nsyms; i++ )
+    {
+        bool_t found = 0;
+
+        for ( j = 0; j < payload->nfuncs; j++ )
+        {
+            if ( symtab[i].value == payload->funcs[j].new_addr )
+            {
+                found = 1;
+                break;
+            }
+        }
+
+        if ( !found )
+        {
+            if ( xsplice_symbols_lookup_by_name(symtab[i].name) )
+            {
+                printk(XENLOG_ERR "%s: duplicate new symbol: %s\n", elf->name,
+                       symtab[i].name);
+                xfree(symtab);
+                xfree(strtab);
+                return -EEXIST;
+            }
+            printk(XENLOG_DEBUG "%s: new symbol %s\n", elf->name,
+                   symtab[i].name);
+        }
+        else
+        {
+            printk(XENLOG_DEBUG "%s: overriding symbol %s\n", elf->name,
+                   symtab[i].name);
+        }
+    }
+
+    payload->symtab = symtab;
+    payload->strtab = strtab;
+    payload->nsyms = nsyms;
+
+    return 0;
+}
+
 static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
 {
     struct xsplice_elf elf;
@@ -831,6 +953,10 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
     if ( rc )
         goto err_payload;
 
+    rc = build_symbol_table(payload, &elf);
+    if ( rc )
+        goto err_payload;
+
     rc = find_special_sections(payload, &elf);
     if ( rc )
         goto err_payload;
@@ -1234,6 +1360,31 @@ unsigned long search_module_extables(unsigned long addr)
 }
 #endif
 
+uint64_t xsplice_symbols_lookup_by_name(const char *symname)
+{
+    struct payload *data;
+    unsigned int i;
+    uint64_t value = 0;
+
+    spin_lock(&payload_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( !strcmp(data->symtab[i].name, symname) )
+            {
+                value = data->symtab[i].value;
+                goto out;
+            }
+        }
+    }
+
+out:
+    spin_unlock(&payload_lock);
+    return value;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index ad70797..dade9fd 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -1,5 +1,6 @@
 #include <xen/errno.h>
 #include <xen/lib.h>
+#include <xen/symbols.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
@@ -223,9 +224,21 @@ int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
                 return_(-EINVAL);
                 break;
             case SHN_UNDEF:
-                printk(XENLOG_ERR "%s: Unknown symbol: %s\n", elf->name,
-                       elf->sym[i].name);
-                return_(-ENOENT);
+                elf->sym[i].sym->st_value = symbols_lookup_by_name(elf->sym[i].name);
+                if ( !elf->sym[i].sym->st_value )
+                {
+                    elf->sym[i].sym->st_value =
+                        xsplice_symbols_lookup_by_name(elf->sym[i].name);
+                    if ( !elf->sym[i].sym->st_value )
+                    {
+                        printk(XENLOG_ERR "%s: Unknown symbol: %s\n", elf->name,
+                               elf->sym[i].name);
+                        return_(-ENOENT);
+                    }
+                }
+                printk(XENLOG_DEBUG "%s: Undefined symbol resolved: %s => 0x%"PRIx64"\n",
+                       elf->name, elf->sym[i].name,
+                       elf->sym[i].sym->st_value);
                 break;
             case SHN_ABS:
                 printk(XENLOG_DEBUG "%s: Absolute symbol: %s => 0x%"PRIx64"\n",
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 1fa0537..f8ea1dc 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -14,4 +14,6 @@ const char *symbols_lookup(unsigned long addr,
 int xensyms_read(uint32_t *symnum, char *type,
                  uint64_t *address, char *name);
 
+uint64_t symbols_lookup_by_name(const char *symname);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 061a1a1..1045213 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -30,12 +30,19 @@ struct xsplice_build_id {
    unsigned int len;
 };
 
+struct xsplice_symbol {
+    const char *name;
+    uint64_t value;
+    size_t size;
+};
+
 int xsplice_control(struct xen_sysctl_xsplice_op *);
 void do_xsplice(void);
 struct bug_frame *xsplice_find_bug(const char *eip, int *id);
 bool_t is_module(const void *addr);
 bool_t is_active_module_text(unsigned long addr);
 unsigned long search_module_extables(unsigned long addr);
+uint64_t xsplice_symbols_lookup_by_name(const char *symname);
 
 /* Arch hooks */
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
-- 
2.1.0

* [PATCH v3 20/23] x86, xsplice: Print payload's symbol name and module in backtraces
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (18 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2) Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-02-12 18:05 ` [PATCH v3 21/23] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/traps.c      |  6 +++---
 xen/common/vsprintf.c     | 18 ++++++++++++++--
 xen/common/xsplice.c      | 52 +++++++++++++++++++++++++++++++++++++++++++----
 xen/include/xen/xsplice.h | 11 ++++++++++
 4 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index f3adefa..36d42fe 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -341,7 +341,7 @@ static void _show_trace(unsigned long sp, unsigned long __maybe_unused bp)
     while ( stack <= bottom )
     {
         addr = *stack++;
-        if ( is_active_kernel_text(addr) )
+        if ( is_active_text(addr) )
             printk("   [<%p>] %pS\n", _p(addr), _p(addr));
     }
 }
@@ -403,8 +403,8 @@ static void show_trace(const struct cpu_user_regs *regs)
      * If RIP looks sensible, or the top of the stack doesn't, print RIP at
      * the top of the stack trace.
      */
-    if ( is_active_kernel_text(regs->rip) ||
-         !is_active_kernel_text(*sp) )
+    if ( is_active_text(regs->rip) ||
+         !is_active_text(*sp) )
         printk("   [<%p>] %pS\n", _p(regs->rip), _p(regs->rip));
     /*
      * Else RIP looks bad but the top of the stack looks good.  Perhaps we
diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 51b5e4e..cfe63a4 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -20,6 +20,7 @@
 #include <xen/symbols.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/xsplice.h>
 #include <asm/div64.h>
 #include <asm/page.h>
 
@@ -305,15 +306,21 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
     {
         unsigned long sym_size, sym_offset;
         char namebuf[KSYM_NAME_LEN+1];
+        const char *module = NULL;
 
         /* Advance parents fmt string, as we have consumed 's' or 'S' */
         ++*fmt_ptr;
 
         s = symbols_lookup((unsigned long)arg, &sym_size, &sym_offset, namebuf);
 
-        /* If the symbol is not found, fall back to printing the address */
         if ( !s )
-            break;
+        {
+            s = xsplice_symbols_lookup((unsigned long)arg, &sym_size,
+                                       &sym_offset, &module);
+            /* If the symbol is not found, fall back to printing the address */
+            if ( !s )
+                break;
+        }
 
         /* Print symbol name */
         str = string(str, end, s, -1, -1, 0);
@@ -328,6 +335,13 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
             str = number(str, end, sym_size, 16, -1, -1, SPECIAL);
         }
 
+        if ( module )
+        {
+            str = string(str, end, " [", -1, -1, 0);
+            str = string(str, end, module, -1, -1, 0);
+            str = string(str, end, "]", -1, -1, 0);
+        }
+
         return str;
     }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 0b42a16..ae2882f 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -873,7 +873,6 @@ static int build_symbol_table(struct payload *payload, struct xsplice_elf *elf)
             symtab[nsyms].name = strtab + strtab_len;
             symtab[nsyms].size = elf->sym[i].sym->st_size;
             symtab[nsyms].value = elf->sym[i].sym->st_value;
-            symtab[nsyms].flags = 0;
             strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
                                   KSYM_NAME_LEN) + 1;
             nsyms++;
@@ -1276,7 +1275,7 @@ struct bug_frame *xsplice_find_bug(const char *eip, int *id)
 {
     struct payload *data;
     struct bug_frame *bug;
-    int i;
+    unsigned int i;
 
     /* No locking since this list is only ever changed during apply or revert
      * context. */
@@ -1286,10 +1285,12 @@ struct bug_frame *xsplice_find_bug(const char *eip, int *id)
             if (!data->start_bug_frames[i])
                 continue;
             if ( !((void *)eip >= data->payload_address &&
-                   (void *)eip < (data->payload_address + data->core_text_size)))
+                   (void *)eip < (data->payload_address + data->core_text_size)) )
                 continue;
 
-            for ( bug = data->start_bug_frames[i]; bug != data->stop_bug_frames[i]; ++bug ) {
+            for ( bug = data->start_bug_frames[i];
+                  bug != data->stop_bug_frames[i]; ++bug )
+            {
                 if ( bug_loc(bug) == eip )
                 {
                     *id = i;
@@ -1385,6 +1386,49 @@ out:
     return value;
 }
 
+const char *xsplice_symbols_lookup(unsigned long addr,
+                                   unsigned long *symbolsize,
+                                   unsigned long *offset,
+                                   const char **module)
+{
+    struct payload *data;
+    unsigned int i;
+    int best;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( !((void *)addr >= data->payload_address &&
+               (void *)addr < (data->payload_address + data->core_text_size)) )
+            continue;
+
+        best = -1;
+
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( data->symtab[i].value <= addr &&
+                 ( best == -1 ||
+                   data->symtab[best].value < data->symtab[i].value) )
+                best = i;
+        }
+
+        if ( best == -1 )
+            return NULL;
+
+        if ( symbolsize )
+            *symbolsize = data->symtab[best].size;
+        if ( offset )
+            *offset = addr - data->symtab[best].value;
+        if ( module )
+            *module = data->name;
+
+        return data->symtab[best].name;
+    }
+
+    return NULL;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 1045213..482483b 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -43,6 +43,10 @@ bool_t is_module(const void *addr);
 bool_t is_active_module_text(unsigned long addr);
 unsigned long search_module_extables(unsigned long addr);
 uint64_t xsplice_symbols_lookup_by_name(const char *symname);
+const char *xsplice_symbols_lookup(unsigned long addr,
+                                   unsigned long *symbolsize,
+                                   unsigned long *offset,
+                                   const char **module);
 
 /* Arch hooks */
 int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
@@ -77,5 +81,12 @@ static inline unsigned long search_module_extables(unsigned long addr)
 {
 	return 0;
 }
+static inline const char *xsplice_symbols_lookup(unsigned long addr,
+                                                 unsigned long *symbolsize,
+                                                 unsigned long *offset,
+                                                 const char **module)
+{
+    return NULL;
+}
 #endif
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.1.0

* [PATCH v3 21/23] xsplice: Add support for shadow variables
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (19 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 20/23] x86, xsplice: Print payload's symbol name and module in backtraces Konrad Rzeszutek Wilk
@ 2016-02-12 18:05 ` Konrad Rzeszutek Wilk
  2016-03-07  7:40   ` Martin Pohlack
  2016-03-07 18:52   ` Martin Pohlack
  2016-02-12 18:06 ` [PATCH v3 22/23] xsplice: Add hooks functions and other macros Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:05 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Shadow variables are a piece of infrastructure to be used by xsplice
modules. They are used to attach a new piece of data to an existing
structure in memory.
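
To make the idea concrete, here is a minimal user-space sketch of the
same technique: a small hash table keyed on the object pointer, with
entries matched by (object, name). It assumes plain `calloc`/`free` and
no locking, and the `shadow_*` names and `struct shadow` layout are
hypothetical, not the ones this patch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 256

/* One shadow entry, chained per hash slot (hypothetical layout). */
struct shadow {
    struct shadow *next;
    const void *obj;
    char var[16];
    void *data;
};

static struct shadow *tbl[SLOTS];

static unsigned int slot_of(const void *obj)
{
    return (uintptr_t)obj % SLOTS;
}

/* Attach 'size' zeroed bytes named 'var' to 'obj'; NULL on failure. */
static void *shadow_alloc(const void *obj, const char *var, size_t size)
{
    struct shadow *s;

    assert(var != NULL);

    s = calloc(1, sizeof(*s));
    if ( !s )
        return NULL;

    s->data = calloc(1, size);
    if ( !s->data )
    {
        free(s);
        return NULL;
    }

    s->obj = obj;
    /* calloc() zeroed var[], so the final byte stays a NUL terminator. */
    strncpy(s->var, var, sizeof(s->var) - 1);
    s->next = tbl[slot_of(obj)];
    tbl[slot_of(obj)] = s;

    return s->data;
}

/* Return the data attached to (obj, var), or NULL if there is none. */
static void *shadow_get(const void *obj, const char *var)
{
    struct shadow *s;

    for ( s = tbl[slot_of(obj)]; s; s = s->next )
        if ( s->obj == obj && !strcmp(s->var, var) )
            return s->data;

    return NULL;
}

/* Unlink and release the entry for (obj, var), if present. */
static void shadow_free(const void *obj, const char *var)
{
    struct shadow **pp;

    for ( pp = &tbl[slot_of(obj)]; *pp; pp = &(*pp)->next )
    {
        struct shadow *s = *pp;

        if ( s->obj == obj && !strcmp(s->var, var) )
        {
            *pp = s->next;
            free(s->data);
            free(s);
            return;
        }
    }
}
```

Names longer than 15 characters are truncated on insertion in this
sketch, so a later lookup with the full name would not match; callers
would need to keep variable names short.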

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/common/Makefile             |   1 +
 xen/common/xsplice_shadow.c     | 105 ++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice_patch.h |  39 +++++++++++++++
 3 files changed, 145 insertions(+)
 create mode 100644 xen/common/xsplice_shadow.c
 create mode 100644 xen/include/xen/xsplice_patch.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index a8ceaff..f4d54ad 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -75,3 +75,4 @@ subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
 
 obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-$(CONFIG_XSPLICE) += xsplice_elf.o
+obj-$(CONFIG_XSPLICE) += xsplice_shadow.o
diff --git a/xen/common/xsplice_shadow.c b/xen/common/xsplice_shadow.c
new file mode 100644
index 0000000..619cdee
--- /dev/null
+++ b/xen/common/xsplice_shadow.c
@@ -0,0 +1,105 @@
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/spinlock.h>
+#include <xen/xsplice_patch.h>
+
+#define SHADOW_SLOTS 256
+struct hlist_head shadow_tbl[SHADOW_SLOTS];
+static DEFINE_SPINLOCK(shadow_lock);
+
+struct shadow_var {
+    struct hlist_node list;         /* Linked to 'shadow_tbl' */
+    void *data;
+    const void *obj;
+    char var[16];
+};
+
+void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size)
+{
+    struct shadow_var *shadow;
+    unsigned int slot;
+
+    shadow = xmalloc(struct shadow_var);
+    if ( !shadow )
+        return NULL;
+
+    shadow->obj = obj;
+    strlcpy(shadow->var, var, sizeof shadow->var);
+    shadow->data = xmalloc_bytes(size);
+    if ( !shadow->data )
+    {
+        xfree(shadow);
+        return NULL;
+    }
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+    spin_lock(&shadow_lock);
+    hlist_add_head(&shadow->list, &shadow_tbl[slot]);
+    spin_unlock(&shadow_lock);
+
+    return shadow->data;
+}
+
+void xsplice_shadow_free(const void *obj, const char *var)
+{
+    struct shadow_var *entry, *shadow = NULL;
+    unsigned int slot;
+    struct hlist_node *next;
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+
+    spin_lock(&shadow_lock);
+    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
+    {
+        if ( entry->obj == obj &&
+             !strcmp(entry->var, var) )
+        {
+            shadow = entry;
+            break;
+        }
+    }
+    if ( shadow )
+    {
+        hlist_del(&shadow->list);
+        xfree(shadow->data);
+        xfree(shadow);
+    }
+    spin_unlock(&shadow_lock);
+}
+
+void *xsplice_shadow_get(const void *obj, const char *var)
+{
+    struct shadow_var *entry;
+    unsigned int slot;
+    struct hlist_node *next;
+    void *ret = NULL;
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+
+    spin_lock(&shadow_lock);
+    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
+    {
+        if ( entry->obj == obj &&
+             !strcmp(entry->var, var) )
+        {
+            ret = entry->data;
+            break;
+        }
+    }
+
+    spin_unlock(&shadow_lock);
+    return ret;
+}
+
+static int __init xsplice_shadow_init(void)
+{
+    int i;
+
+    for ( i = 0; i < SHADOW_SLOTS; i++ )
+        INIT_HLIST_HEAD(&shadow_tbl[i]);
+
+    return 0;
+}
+__initcall(xsplice_shadow_init);
diff --git a/xen/include/xen/xsplice_patch.h b/xen/include/xen/xsplice_patch.h
new file mode 100644
index 0000000..e3f344b
--- /dev/null
+++ b/xen/include/xen/xsplice_patch.h
@@ -0,0 +1,39 @@
+#ifndef __XEN_XSPLICE_PATCH_H__
+#define __XEN_XSPLICE_PATCH_H__
+
+/*
+ * The following definitions are to be used in patches. They are taken
+ * from kpatch.
+ */
+
+/*
+ * xsplice shadow variables
+ *
+ * These functions can be used to add new "shadow" fields to existing data
+ * structures.  For example, to allocate a "newpid" variable associated with an
+ * instance of task_struct, and assign it a value of 1000:
+ *
+ * struct task_struct *tsk = current;
+ * int *newpid;
+ * newpid = xsplice_shadow_alloc(tsk, "newpid", sizeof(int));
+ * if (newpid)
+ * 	*newpid = 1000;
+ *
+ * To retrieve a pointer to the variable:
+ *
+ * struct task_struct *tsk = current;
+ * int *newpid;
+ * newpid = xsplice_shadow_get(tsk, "newpid");
+ * if (newpid)
+ * 	printk("task newpid = %d\n", *newpid); // prints "task newpid = 1000"
+ *
+ * To free it:
+ *
+ * xsplice_shadow_free(tsk, "newpid");
+ */
+
+void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size);
+void xsplice_shadow_free(const void *obj, const char *var);
+void *xsplice_shadow_get(const void *obj, const char *var);
+
+#endif /* __XEN_XSPLICE_PATCH_H__ */
-- 
2.1.0


* [PATCH v3 22/23] xsplice: Add hooks functions and other macros
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (20 preceding siblings ...)
  2016-02-12 18:05 ` [PATCH v3 21/23] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
@ 2016-02-12 18:06 ` Konrad Rzeszutek Wilk
  2016-02-12 18:06 ` [PATCH v3 23/23] xsplice, hello_world: Use the XSPLICE_[UN|]LOAD_HOOK hooks for two functions Konrad Rzeszutek Wilk
  2016-02-12 21:57 ` [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:06 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

Add hook functions which run during patch apply and patch revert. Hook
functions are used by xsplice modules to manipulate data structures
during patching, etc.

Also add macros to be used by modules for excluding functions or
sections from being included in a patch.
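
For reviewers, here is a minimal standalone sketch of how a hook section
is meant to be consumed: the section size is validated, divided by the
pointer size to get the entry count, and each entry is then called in
order. It only mirrors the logic in this patch; count_hooks(),
run_hooks(), hook_fn_t and the demo hooks are hypothetical names used for
illustration and are not part of the patch.

```c
#include <stddef.h>

typedef void (*hook_fn_t)(void);   /* mirrors xsplice_loadcall_t */

/* Hypothetical helper: validate a hook section size and count its entries. */
static int count_hooks(size_t sec_size, unsigned int *n)
{
    /* An empty section or a partial entry is rejected (-EINVAL in the patch). */
    if ( !sec_size || (sec_size % sizeof(hook_fn_t)) )
        return -1;

    *n = sec_size / sizeof(hook_fn_t);
    return 0;
}

/* Hypothetical helper: call every hook in table order. */
static void run_hooks(const hook_fn_t *hooks, unsigned int n)
{
    unsigned int i;

    for ( i = 0; i < n; i++ )
        hooks[i]();
}

/* Demo hooks standing in for the contents of .xsplice.hooks.load. */
static int calls;
static void hook_a(void) { calls += 1; }
static void hook_b(void) { calls += 10; }
static const hook_fn_t demo_hooks[] = { hook_a, hook_b };
```

In the hypervisor the table address and size come from the ELF section
header rather than from a C array, but the arithmetic is the same.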

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 docs/misc/xsplice.markdown      | 21 +++++++++++++++++++
 xen/common/xsplice.c            | 38 ++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice_patch.h | 46 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+)

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 1a982f2..d74293a 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -290,6 +290,12 @@ The payload contains at least three sections:
     depends on.
  *  `.note.gnu.build-id` - the build-id of this payload.
 
+It may optionally contain the addresses of functions to be called when the
+payload is applied and reverted:
+
+ * `.xsplice.hooks.load` - an array of function pointers.
+ * `.xsplice.hooks.unload` - an array of function pointers.
+
 ### .xsplice.funcs
 
 The `.xsplice.funcs` contains an array of xsplice_patch_func structures
@@ -387,6 +393,21 @@ checksum, MD5 checksum or any unique value.
 
 The size of these structures varies with the --build-id linker option.
 
+### .xsplice.hooks.load and .xsplice.hooks.unload
+
+These sections each contain an array of function pointers to be executed
+before the payload (.xsplice.funcs) is applied or after the payload is
+reverted.
+
+Each entry in this array is eight bytes.
+
+The type definitions of the functions are as follows:
+
+<pre>
+typedef void (*xsplice_loadcall_t)(void);  
+typedef void (*xsplice_unloadcall_t)(void);   
+</pre>
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index ae2882f..fc901ad 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -20,6 +20,7 @@
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
+#include <xen/xsplice_patch.h>
 
 #include <asm/event.h>
 #include <asm/nmi.h>
@@ -46,10 +47,15 @@ struct payload {
     struct list_head applied_list;       /* Linked to 'applied_list'. */
     struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
     unsigned int nfuncs;                 /* Nr of functions to patch. */
+    xsplice_loadcall_t *load_funcs;      /* The array of funcs to call after */
+    xsplice_unloadcall_t *unload_funcs;  /* load and unload of the payload. */
+    unsigned int n_load_funcs;           /* Nr of the funcs to load and execute. */
+    unsigned int n_unload_funcs;         /* Nr of funcs to call during unload. */
     size_t core_size;                    /* Everything else - .data,.rodata, etc. */
     size_t core_text_size;               /* Only .text size. */
     struct bug_frame *start_bug_frames[BUGFRAME_NR]; /* .bug.frame patching. */
     struct bug_frame *stop_bug_frames[BUGFRAME_NR];
+
 #ifdef CONFIG_X86
     struct exception_table_entry *start_ex_table;
     struct exception_table_entry *stop_ex_table;
@@ -744,6 +750,28 @@ static int find_special_sections(struct payload *payload,
         }
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.hooks.load");
+    if ( sec )
+    {
+        if ( ( !sec->sec->sh_size ) ||
+             ( sec->sec->sh_size % sizeof (*payload->load_funcs) ) )
+            return -EINVAL;
+
+        payload->load_funcs = (xsplice_loadcall_t *)sec->load_addr;
+        payload->n_load_funcs = sec->sec->sh_size / (sizeof *payload->load_funcs);
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.hooks.unload");
+    if ( sec )
+    {
+        if ( ( !sec->sec->sh_size ) ||
+             ( sec->sec->sh_size % sizeof (*payload->unload_funcs) ) )
+            return -EINVAL;
+
+        payload->unload_funcs = (xsplice_unloadcall_t *)sec->load_addr;
+        payload->n_unload_funcs = sec->sec->sh_size / (sizeof *payload->unload_funcs);
+    }
+
     sec = xsplice_elf_sec_by_name(elf, ".note.gnu.build-id");
     if ( sec )
     {
@@ -1029,6 +1057,11 @@ static int apply_payload(struct payload *data)
     for ( i = 0; i < data->nfuncs; i++ )
         xsplice_apply_jmp(data->funcs + i);
 
+    spin_debug_disable();
+    for (i = 0; i < data->n_load_funcs; i++)
+        data->load_funcs[i]();
+    spin_debug_enable();
+
     list_add_tail(&data->applied_list, &applied_list);
 
     return 0;
@@ -1059,6 +1092,11 @@ static int revert_payload(struct payload *data)
     for ( i = 0; i < data->nfuncs; i++ )
         xsplice_revert_jmp(data->funcs + i);
 
+    spin_debug_disable();
+    for (i = 0; i < data->n_unload_funcs; i++)
+        data->unload_funcs[i]();
+    spin_debug_enable();
+
     list_del(&data->applied_list);
 
     return 0;
diff --git a/xen/include/xen/xsplice_patch.h b/xen/include/xen/xsplice_patch.h
index e3f344b..1450406 100644
--- a/xen/include/xen/xsplice_patch.h
+++ b/xen/include/xen/xsplice_patch.h
@@ -5,6 +5,52 @@
  * The following definitions are to be used in patches. They are taken
  * from kpatch.
  */
+typedef void (*xsplice_loadcall_t)(void);
+typedef void (*xsplice_unloadcall_t)(void);
+
+/* This definition is taken from Linux. */
+#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+/*
+ * XSPLICE_IGNORE_SECTION macro
+ *
+ * This macro is for ignoring sections that may change as a side effect of
+ * another change or might be a non-bundlable section; that is, one that does
+ * not honor -ffunction-sections and create a one-to-one relation from function
+ * symbol to section.
+ */
+#define XSPLICE_IGNORE_SECTION(_sec) \
+	char *__UNIQUE_ID(xsplice_ignore_section_) __section(".xsplice.ignore.sections") = _sec;
+
+/*
+ * XSPLICE_IGNORE_FUNCTION macro
+ *
+ * This macro is for ignoring functions that may change as a side effect of a
+ * change in another function.
+ */
+#define XSPLICE_IGNORE_FUNCTION(_fn) \
+	void *__xsplice_ignore_func_##_fn __section(".xsplice.ignore.functions") = _fn;
+
+/*
+ * XSPLICE_LOAD_HOOK macro
+ *
+ * Declares a function pointer to be allocated in a new
+ * .xsplice.hooks.load section.  This xsplice_load_data symbol is later
+ * stripped by create-diff-object so that it can be declared in multiple
+ * objects that are later linked together, avoiding global symbol
+ * collision.  Since multiple hooks can be registered, the
+ * .xsplice.hooks.load section is a table of functions that will be
+ * executed in series by the xsplice infrastructure at patch load time.
+ */
+#define XSPLICE_LOAD_HOOK(_fn) \
+	xsplice_loadcall_t __attribute__((weak)) xsplice_load_data __section(".xsplice.hooks.load") = _fn;
+
+/*
+ * XSPLICE_UNLOAD_HOOK macro
+ *
+ * Same as LOAD hook with s/load/unload/
+ */
+#define XSPLICE_UNLOAD_HOOK(_fn) \
+	xsplice_unloadcall_t __attribute__((weak)) xsplice_unload_data __section(".xsplice.hooks.unload") = _fn;
 
 /*
  * xsplice shadow variables
-- 
2.1.0


* [PATCH v3 23/23] xsplice, hello_world: Use the XSPLICE_[UN|]LOAD_HOOK hooks for two functions.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (21 preceding siblings ...)
  2016-02-12 18:06 ` [PATCH v3 22/23] xsplice: Add hooks functions and other macros Konrad Rzeszutek Wilk
@ 2016-02-12 18:06 ` Konrad Rzeszutek Wilk
  2016-02-12 21:57 ` [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
  23 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 18:06 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel
  Cc: Konrad Rzeszutek Wilk

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/test/xen_hello_world.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index 6200fbe..b6fa05e 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -1,7 +1,9 @@
 #include <xen/config.h>
 #include <xen/types.h>
+#include <xen/xsplice_patch.h>
 #include <xen/xsplice.h>
 #include "config.h"
+#include <xen/lib.h>
 
 static char name[] = "xen_hello_world";
 extern const char *xen_hello_world(void);
@@ -9,6 +11,19 @@ extern const char *xen_hello_world(void);
 /* External symbol. */
 extern const char *xen_extra_version(void);
 
+void apply_hook(void)
+{
+    printk(KERN_DEBUG "Hook executing.\n");
+}
+
+void revert_hook(void)
+{
+    printk(KERN_DEBUG "Hook unloaded.\n");
+}
+
+xsplice_loadcall_t  xsplice_load_data __section(".xsplice.hooks.load") = apply_hook;
+xsplice_unloadcall_t  xsplice_unload_data __section(".xsplice.hooks.unload") = revert_hook;
+
 struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
     .name = name,
     .new_addr = (unsigned long)(xen_hello_world),
-- 
2.1.0


* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 18:05 ` [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10) Konrad Rzeszutek Wilk
@ 2016-02-12 20:11   ` Andrew Cooper
  2016-02-12 20:40     ` Konrad Rzeszutek Wilk
  2016-02-19 19:36     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Daniel De Graaf,
	Ian Jackson, Stefano Stabellini, Ian Campbell, Wei Liu,
	xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 6f404b4..619aa9e 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -152,4 +152,14 @@ config SCHED_DEFAULT
>  
>  endmenu
>  
> +# Enable/Disable xsplice support
> +config XSPLICE
> +	bool "xsplice support"

"XSplice live patching support" ?

> +	default y
> +	---help---
> +	  Allows a running Xen hypervisor to be patched without rebooting.
> +	  This is primarily used to patch an hypervisor with XSA fixes.

Somewhere in here should use the terms "dynamic" and "binary patching",
to better describe its method of operation.

> +
> +	  If unsure, say Y.
> +
>  endmenu
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 6e82b33..43b3911 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -72,3 +72,5 @@ subdir-$(coverage) += gcov
>  
>  subdir-y += libelf
>  subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
> +
> +obj-$(CONFIG_XSPLICE) += xsplice.o

Should be part of the main obj- selection higher up.

> diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
> index 1624024..68e3eb4 100644
> --- a/xen/common/sysctl.c
> +++ b/xen/common/sysctl.c
> @@ -28,6 +28,7 @@
>  #include <xsm/xsm.h>
>  #include <xen/pmstat.h>
>  #include <xen/gcov.h>
> +#include <xen/xsplice.h>
>  
>  long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>  {
> @@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>          ret = tmem_control(&op->u.tmem_op);
>          break;
>  
> +    case XEN_SYSCTL_xsplice_op:
> +        ret = xsplice_control(&op->u.xsplice);

Could we name this do_xsplice_op() to match the prevailing subop style?

> +        if ( ret != -ENOSYS )
> +            copyback = 1;
> +        break;
> +

Not related to this patch.  I (and by this, I mean someone with time ;p)
should do some cleanup and pass copyback by pointer to subops.  This
allows for finer-grained control of whether a copyback is needed.
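
A rough standalone sketch of what that could look like, purely to
illustrate the idea: demo_subop() and demo_dispatch() are hypothetical
stand-ins, not existing Xen functions, and the real do_sysctl() would
copy the op back to the guest where the sketch just records the flag.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical subop: decides for itself whether a copyback is needed. */
static int demo_subop(int cmd, bool *copyback)
{
    switch ( cmd )
    {
    case 0:                 /* query: fills in OUT fields */
        *copyback = true;
        return 0;

    case 1:                 /* action: nothing to hand back */
        return 0;

    default:
        return -ENOSYS;
    }
}

/* Hypothetical dispatcher standing in for do_sysctl(). */
static int demo_dispatch(int cmd, bool *copied)
{
    bool copyback = false;
    int ret = demo_subop(cmd, &copyback);

    *copied = copyback;     /* real code would copy the op to the guest here */
    return ret;
}
```

The caller no longer has to infer the copyback decision from the return
value, as the "ret != -ENOSYS" test above does.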

> +static const char *state2str(int32_t state)
> +{
> +#define STATE(x) [XSPLICE_STATE_##x] = #x
> +    static const char *const names[] = {
> +            STATE(LOADED),
> +            STATE(CHECKED),
> +            STATE(APPLIED),
> +    };
> +#undef STATE
> +
> +    if (state >= ARRAY_SIZE(names))
> +        return "unknown";
> +
> +    if (state < 0)
> +        return "-EXX";
> +
> +    if (!names[state])
> +        return "unknown";

This could be folded into the ARRAY_SIZE() check.
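
Something along these lines, as a standalone sketch (ARRAY_SIZE and the
state values are redefined locally so it compiles outside the tree).
Note that it also tests for negative values first: as far as I can tell,
comparing a negative int32_t against ARRAY_SIZE() converts it to a large
unsigned value, so in the quoted version the "-EXX" branch would never be
reached.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

#define XSPLICE_STATE_LOADED  1
#define XSPLICE_STATE_CHECKED 2
#define XSPLICE_STATE_APPLIED 3

static const char *state2str(int32_t state)
{
#define STATE(x) [XSPLICE_STATE_##x] = #x
    static const char *const names[] = {
        STATE(LOADED),
        STATE(CHECKED),
        STATE(APPLIED),
    };
#undef STATE

    /* Check the sign before any comparison against an unsigned size. */
    if ( state < 0 )
        return "-EXX";

    /* Out-of-range indices and unnamed slots (e.g. 0) share one check. */
    if ( (size_t)state >= ARRAY_SIZE(names) || !names[state] )
        return "unknown";

    return names[state];
}
```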

> +
> +    return names[state];
> +}
> +
> +static void xsplice_printall(unsigned char key)
> +{
> +    struct payload *data;
> +
> +    spin_lock(&payload_lock);
> +
> +    list_for_each_entry ( data, &payload_list, list )
> +        printk(" name=%s state=%s(%d)\n", data->name,
> +               state2str(data->state), data->state);
> +
> +    spin_unlock(&payload_lock);
> +}
> +
> +static int verify_name(xen_xsplice_name_t *name)

const

> +{
> +    if ( name->size == 0 || name->size > XEN_XSPLICE_NAME_SIZE )
> +        return -EINVAL;
> +
> +    if ( name->pad[0] || name->pad[1] || name->pad[2] )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(name->name, name->size) )
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +
> +static int find_payload(xen_xsplice_name_t *name, bool_t need_lock,
> +                        struct payload **f)
> +{
> +    struct payload *data;
> +    XEN_GUEST_HANDLE_PARAM(char) str;
> +    char n[XEN_XSPLICE_NAME_SIZE + 1] = { 0 };
> +    int rc = -EINVAL;
> +
> +    rc = verify_name(name);
> +    if ( rc )
> +        return rc;
> +
> +    str = guest_handle_cast(name->name, char);
> +    if ( copy_from_guest(n, str, name->size) )
> +        return -EFAULT;
> +
> +    if ( need_lock )
> +        spin_lock(&payload_lock);

What is the usecase where the lock shouldn't be taken?

[Edit]  From below, it's clear that this should be a recursive spinlock.

> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( data, &payload_list, list )
> +    {
> +        if ( !strcmp(data->name, n) )
> +        {
> +            *f = data;
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    if ( need_lock )
> +        spin_unlock(&payload_lock);
> +
> +    return rc;
> +}
> +
> +static int verify_payload(xen_sysctl_xsplice_upload_t *upload)

const

> +{
> +    if ( verify_name(&upload->name) )
> +        return -EINVAL;
> +
> +    if ( upload->size == 0 )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(upload->payload, upload->size) )
> +        return -EFAULT;
> +
> +    return 0;
> +}
> +
> +/*
> + * We MUST be holding the payload_lock spinlock.

In which case ASSERT(spin_is_locked())

> + */
> +static void free_payload(struct payload *data)
> +{
> +    list_del(&data->list);
> +    payload_cnt--;
> +    payload_version++;
> +    xfree(data);
> +}
> +
> +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> +{
> +    struct payload *data = NULL;
> +    uint8_t *raw_data;
> +    int rc;
> +
> +    rc = verify_payload(upload);
> +    if ( rc )
> +        return rc;
> +
> +    rc = find_payload(&upload->name, 1 /* true. */, &data);
> +    if ( rc == 0 /* Found. */ )
> +        return -EEXIST;
> +
> +    if ( rc != -ENOENT )
> +        return rc;
> +
> +    data = xzalloc(struct payload);
> +    if ( !data )
> +        return -ENOMEM;
> +
> +    memset(data, 0, sizeof *data);

xzalloc() has already zeroed data for you.

> +    rc = -EFAULT;
> +    if ( copy_from_guest(data->name, upload->name.name, upload->name.size) )
> +        goto err_data;
> +
> +    rc = -ENOMEM;
> +    raw_data = alloc_xenheap_pages(get_order_from_bytes(upload->size), 0);

Better to use valloc(), as it won't fail given lots of memory fragmentation.

> +    if ( !raw_data )
> +        goto err_data;
> +
> +    rc = -EFAULT;
> +    if ( copy_from_guest(raw_data, upload->payload, upload->size) )
> +        goto err_raw;
> +
> +    data->state = XSPLICE_STATE_LOADED;
> +    data->rc = 0;
> +    INIT_LIST_HEAD(&data->list);
> +
> +    spin_lock(&payload_lock);
> +    list_add_tail(&data->list, &payload_list);
> +    payload_cnt++;
> +    payload_version++;
> +    spin_unlock(&payload_lock);
> +
> +    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
> +    return 0;
> +
> + err_raw:
> +    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
> + err_data:
> +    xfree(data);

It would be cleaner to combine these two err labels into a single err
path.  Both free() functions function sensibly with NULL pointers.
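
A standalone sketch of the single-exit shape, using plain malloc()/free()
in place of the Xen allocators. demo_upload() and struct demo_payload are
hypothetical names for illustration only; the point is that both pointers
start out NULL and every failure jumps to the same label.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct demo_payload { char name[16]; };

/* Hypothetical stand-in for xsplice_upload() with one cleanup path. */
static int demo_upload(const char *name, const void *src, size_t size,
                       struct demo_payload **result)
{
    struct demo_payload *data = NULL;
    unsigned char *raw = NULL;
    int rc;

    rc = -EINVAL;
    if ( !name || !src || !size || strlen(name) >= sizeof(data->name) )
        goto out;

    rc = -ENOMEM;
    data = calloc(1, sizeof(*data));
    if ( !data )
        goto out;

    raw = malloc(size);
    if ( !raw )
        goto out;

    strcpy(data->name, name);
    memcpy(raw, src, size);   /* stands in for copy_from_guest() */

    *result = data;
    data = NULL;              /* ownership moved to the caller */
    rc = 0;

 out:
    free(raw);                /* free(NULL) is a no-op */
    free(data);
    return rc;
}
```

On success only the temporary buffer is released, which matches what the
quoted function does with raw_data today.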

> +    return rc;
> +}
> +
> +static int xsplice_get(xen_sysctl_xsplice_summary_t *summary)
> +{
> +    struct payload *data;
> +    int rc;
> +
> +    if ( summary->status.state )
> +        return -EINVAL;
> +
> +    if ( summary->status.rc != 0 )
> +        return -EINVAL;
> +
> +    rc = verify_name(&summary->name);
> +    if ( rc )
> +        return rc;
> +
> +    rc = find_payload(&summary->name, 1 /* true. */, &data);
> +    if ( rc )
> +        return rc;
> +
> +    summary->status.state = data->state;
> +    summary->status.rc = data->rc;
> +
> +    return 0;
> +}
> +
> +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> +{
> +    xen_xsplice_status_t status;
> +    struct payload *data;
> +    unsigned int idx = 0, i = 0;
> +    int rc = 0;
> +
> +    if ( list->nr > 1024 )
> +        return -E2BIG;
> +
> +    if ( list->pad != 0 )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> +         !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
> +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
> +        return -EINVAL;
> +
> +    spin_lock(&payload_lock);
> +    if ( list->idx > payload_cnt || !list->nr )
> +    {
> +        spin_unlock(&payload_lock);
> +        return -EINVAL;
> +    }
> +
> +    list_for_each_entry( data, &payload_list, list )
> +    {
> +        uint32_t len;
> +
> +        if ( list->idx > i++ )
> +            continue;
> +
> +        status.state = data->state;
> +        status.rc = data->rc;
> +        len = strlen(data->name);
> +
> +        /* N.B. 'idx' != 'i'. */
> +        if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
> +                                    data->name, len) ||
> +             __copy_to_guest_offset(list->len, idx, &len, 1) ||
> +             __copy_to_guest_offset(list->status, idx, &status, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        idx++;

Some extra newlines around here please.

> +        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
> +            break;
> +    }
> +    list->nr = payload_cnt - i; /* Remaining amount. */
> +    list->version = payload_version;
> +    spin_unlock(&payload_lock);
> +
> +    /* And how many we have processed. */
> +    return rc ? : idx;
> +}
> +
> +static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> +{
> +    struct payload *data;
> +    int rc;
> +
> +    rc = verify_name(&action->name);
> +    if ( rc )
> +        return rc;
> +
> +    spin_lock(&payload_lock);
> +    rc = find_payload(&action->name, 0 /* We are holding the lock. */, &data);

Looks like payload_lock should be a recursive lock.  Please do that,
rather than risk accessing locked data without the lock held at all.

> +    if ( rc )
> +        goto out;
> +
> +    switch ( action->cmd )
> +    {
> +    case XSPLICE_ACTION_CHECK:
> +        if ( (data->state == XSPLICE_STATE_LOADED) ||
> +             (data->state == XSPLICE_STATE_CHECKED) )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +            rc = 0;
> +        }
> +        break;

Newlines between break and case statements please.

> +    case XSPLICE_ACTION_UNLOAD:
> +        if ( (data->state == XSPLICE_STATE_LOADED) ||
> +             (data->state == XSPLICE_STATE_CHECKED) )
> +        {
> +            free_payload(data);
> +            /* No touching 'data' from here on! */
> +            rc = 0;
> +        }
> +        break;
> +    case XSPLICE_ACTION_REVERT:
> +        if ( data->state == XSPLICE_STATE_APPLIED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +            rc = 0;
> +        }
> +        break;
> +    case XSPLICE_ACTION_APPLY:
> +        if ( (data->state == XSPLICE_STATE_CHECKED) )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_APPLIED;
> +            data->rc = 0;
> +            rc = 0;
> +        }
> +        break;
> +    case XSPLICE_ACTION_REPLACE:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +            rc = 0;
> +        }
> +        break;
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> + out:
> +    spin_unlock(&payload_lock);
> +
> +    return rc;
> +}
> +
> +int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
> +{
> +    int rc;
> +
> +    if ( xsplice->pad != 0 )
> +        return -EINVAL;
> +
> +    switch ( xsplice->cmd )
> +    {
> +    case XEN_SYSCTL_XSPLICE_UPLOAD:
> +        rc = xsplice_upload(&xsplice->u.upload);
> +        break;

Newlines for these as well please.

> +    case XEN_SYSCTL_XSPLICE_GET:
> +        rc = xsplice_get(&xsplice->u.get);
> +        break;
> +    case XEN_SYSCTL_XSPLICE_LIST:
> +        rc = xsplice_list(&xsplice->u.list);
> +        break;
> +    case XEN_SYSCTL_XSPLICE_ACTION:
> +        rc = xsplice_action(&xsplice->u.action);
> +        break;
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +   }
> +
> +    return rc;
> +}
> +
> +static int __init xsplice_init(void)
> +{
> +    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> +    return 0;
> +}
> +__initcall(xsplice_init);

Local variable block please.

> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> index 96680eb..d549e7a 100644
> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -766,6 +766,160 @@ struct xen_sysctl_tmem_op {
>  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>  
> +/*
> + * XEN_SYSCTL_XSPLICE_op
> + *
> + * Refer to the http://xenbits.xenproject.org/docs/unstable/misc/xsplice.html

I would refer to the file in the source tree, so docs/misc/xsplice.$FOO
which is far less likely to change.

> + * for the design details of this hypercall.
> + */
> +
> +/*
> + * Structure describing an ELF payload. Uniquely identifies the
> + * payload. Should be human readable.
> + * Recommended length is upto XEN_XSPLICE_NAME_SIZE.
> + */
> +#define XEN_XSPLICE_NAME_SIZE 128
> +struct xen_xsplice_name {
> +    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
> +    uint16_t size;                          /* IN: size of name. May be upto
> +                                               XEN_XSPLICE_NAME_SIZE. */
> +    uint16_t pad[3];                        /* IN: MUST be zero. */
> +};
> +typedef struct xen_xsplice_name xen_xsplice_name_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_xsplice_name_t);
> +
> +/*
> + * Upload a payload to the hypervisor. The payload is verified
> + * against basic checks and if there are any issues the proper return code
> + * will be returned. The payload is not applied at this time - that is
> + * controlled by XEN_SYSCTL_XSPLICE_ACTION.
> + *
> + * The return value is zero if the payload was successfully uploaded.
> + * Otherwise an EXX return value is provided. Duplicate `name` are not
> + * supported.

I would recommend having a full state diagram in the xsplice
documentation and referring to that, rather than having half a "but not
this yet" set of comments in the header file.

> + *
> + * The payload at this point is verified against the basic checks.
> + *
> + * The `payload` is the ELF payload as mentioned in the `Payload format`
> + * section in the xSplice design document.
> + */
> +#define XEN_SYSCTL_XSPLICE_UPLOAD 0
> +struct xen_sysctl_xsplice_upload {
> +    xen_xsplice_name_t name;                /* IN, name of the patch. */
> +    uint64_t size;                          /* IN, size of the ELF file. */
> +    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
> +};
> +typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
> +
> +/*
> + * Retrieve the status of a specific payload.
> + *
> + * Upon completion the `struct xen_xsplice_status` is updated.
> + *
> + * The return value is zero on success and XEN_EXX on failure. This operation
> + * is synchronous and does not require preemption.
> + */
> +#define XEN_SYSCTL_XSPLICE_GET 1
> +
> +struct xen_xsplice_status {
> +#define XSPLICE_STATE_LOADED       1
> +#define XSPLICE_STATE_CHECKED      2
> +#define XSPLICE_STATE_APPLIED      3
> +    int32_t state;                 /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */
> +    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
> +                                   /* IN: MUST be zero. */
> +};
> +typedef struct xen_xsplice_status xen_xsplice_status_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
> +
> +struct xen_sysctl_xsplice_summary {
> +    xen_xsplice_name_t name;                /* IN, name of the payload. */
> +    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
> +};
> +typedef struct xen_sysctl_xsplice_summary xen_sysctl_xsplice_summary_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_summary_t);
> +
> +/*
> + * Retrieve an array of abbreviated status and names of payloads that are
> + * loaded in the hypervisor.
> + *
> + * If the hypercall returns a positive number, it is the number (up to `nr`)
> + * of the payloads returned, along with `nr` updated with the number of remaining
> + * payloads, `version` updated (it may be the same across hypercalls. If it
> + * varies the data is stale and further calls could fail). The `status`,
> + * `name`, and `len`' are updated at their designed index value (`idx`) with
> + * the returned value of data.
> + *
> + * If the hypercall returns E2BIG the `nr` is too big and should be
> + * lowered.

What would cause this situation to occur?

> + *
> + * This operation can be preempted by the hypercall returning EAGAIN.
> + * Retry.

Again, why is this necessary or useful?

> + *
> + * Note that due to the asynchronous nature of hypercalls the domain might have
> + * added or removed the number of payloads making this information stale. It is
> + * the responsibility of the toolstack to use the `version` field to check
> + * between each invocation. if the version differs it should discard the stale
> + * data and start from scratch. It is OK for the toolstack to use the new
> + * `version` field.
> + */
> +#define XEN_SYSCTL_XSPLICE_LIST 2
> +struct xen_sysctl_xsplice_list {
> +    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.
> +                                               On subsequent calls reuse value.
> +                                               If varies between calls, we are
> +                                             * getting stale data. */
> +    uint32_t idx;                           /* IN/OUT: Index into array. */
> +    uint32_t nr;                            /* IN: How many status, name, and len
> +                                               should fill out.
> +                                               OUT: How many payloads left. */
> +    uint32_t pad;                           /* IN: Must be zero. */
> +    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
> +                                               space allocate for nr of them. */
> +    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
> +                                               MUST XEN_XSPLICE_NAME_SIZE in size.
> +                                               Must have nr of them. */
> +    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of name's.
> +                                               Must have nr of them. */
> +};
> +typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
> +
> +/*
> + * Perform an operation on the payload structure referenced by the `name` field.
> + * The operation request is asynchronous and the status should be retrieved
> + * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
> + */
> +#define XEN_SYSCTL_XSPLICE_ACTION 3
> +struct xen_sysctl_xsplice_action {
> +    xen_xsplice_name_t name;                /* IN, name of the patch. */
> +#define XSPLICE_ACTION_CHECK        1
> +#define XSPLICE_ACTION_UNLOAD       2
> +#define XSPLICE_ACTION_REVERT       3
> +#define XSPLICE_ACTION_APPLY        4
> +#define XSPLICE_ACTION_REPLACE      5
> +    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
> +    uint32_t timeout;                       /* IN: Zero if no timeout. */
> +                                            /* Or upper bound of time (ms) */
> +                                            /* for operation to take. */
> +};
> +typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
> +
> +struct xen_sysctl_xsplice_op {
> +    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_*. */
> +    uint32_t pad;                           /* IN: Always zero. */
> +    union {
> +        xen_sysctl_xsplice_upload_t upload;
> +        xen_sysctl_xsplice_list_t list;
> +        xen_sysctl_xsplice_summary_t get;
> +        xen_sysctl_xsplice_action_t action;
> +    } u;
> +};
> +typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
> +
>  struct xen_sysctl {
>      uint32_t cmd;
>  #define XEN_SYSCTL_readconsole                    1
> @@ -791,6 +945,7 @@ struct xen_sysctl {
>  #define XEN_SYSCTL_pcitopoinfo                   22
>  #define XEN_SYSCTL_psr_cat_op                    23
>  #define XEN_SYSCTL_tmem_op                       24
> +#define XEN_SYSCTL_xsplice_op                    25
>      uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
>      union {
>          struct xen_sysctl_readconsole       readconsole;
> @@ -816,6 +971,7 @@ struct xen_sysctl {
>          struct xen_sysctl_psr_cmt_op        psr_cmt_op;
>          struct xen_sysctl_psr_cat_op        psr_cat_op;
>          struct xen_sysctl_tmem_op           tmem_op;
> +        struct xen_sysctl_xsplice_op        xsplice;
>          uint8_t                             pad[128];
>      } u;
>  };
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> new file mode 100644
> index 0000000..cf465c4
> --- /dev/null
> +++ b/xen/include/xen/xsplice.h
> @@ -0,0 +1,15 @@
> +#ifndef __XEN_XSPLICE_H__
> +#define __XEN_XSPLICE_H__
> +
> +struct xen_sysctl_xsplice_op;
> +
> +#ifdef CONFIG_XSPLICE

No reason for this all to be squashed completely together.

> +int xsplice_control(struct xen_sysctl_xsplice_op *);
> +#else
> +#include <xen/errno.h> /* For -ENOSYS */
> +static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
> +{
> +    return -ENOSYS;
> +}
> +#endif
> +#endif /* __XEN_XSPLICE_H__ */

Variable block please.

~Andrew

* Re: [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-12 18:05 ` [PATCH v3 04/23] elf: Add relocation types to elfstructs.h Konrad Rzeszutek Wilk
@ 2016-02-12 20:13   ` Andrew Cooper
  2016-02-15  8:34   ` Jan Beulich
  1 sibling, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Ian Jackson, Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

(this patch looks like it can be fast-tracked in the series?)

> ---
> v2: Slim the list as we do not use all of them.
> ---
>  xen/include/xen/elfstructs.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
> index 12ffb82..4ff3258 100644
> --- a/xen/include/xen/elfstructs.h
> +++ b/xen/include/xen/elfstructs.h
> @@ -348,6 +348,14 @@ typedef struct {
>  #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>  #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>  
> +/* x86-64 relocation types. We list only the ones we implement. */
> +#define R_X86_64_NONE		0	/* No reloc */
> +#define R_X86_64_64		1	/* Direct 64 bit  */
> +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
> +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
> +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
> +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
> +
>  /* Program Header */
>  typedef struct {
>  	Elf32_Word	p_type;		/* segment type */

* Re: [PATCH v3 05/23] xsplice: Add helper elf routines (v4)
  2016-02-12 18:05 ` [PATCH v3 05/23] xsplice: Add helper elf routines (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 20:24   ` Andrew Cooper
  2016-02-12 20:47     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Ian Jackson, Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
>
> Add Elf routines and data structures in preparation for loading an
> xSplice payload.
>
> We also add a macro that will print where we failed during
> the ELF parsing - which is only available during debug builds.
> In production (debug=n) we only return the error value.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v2: - With the #define ELFSIZE in the ARM file we can use the common
>      #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
>     patch.
>     - Add checks for ELF file.
>     - Add name to be printed.
>     - Add len for easier ELF checks.
>     - Expand on the checks. Add macro.
> v3: Remove the return_ macro
> v4: Add return_ macro back but make it depend on debug=y
> ---
>  xen/common/Makefile           |   1 +
>  xen/common/xsplice_elf.c      | 205 ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/xsplice_elf.h |  37 ++++++++
>  3 files changed, 243 insertions(+)
>  create mode 100644 xen/common/xsplice_elf.c
>  create mode 100644 xen/include/xen/xsplice_elf.h
>
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 43b3911..a8ceaff 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -74,3 +74,4 @@ subdir-y += libelf
>  subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
>  
>  obj-$(CONFIG_XSPLICE) += xsplice.o
> +obj-$(CONFIG_XSPLICE) += xsplice_elf.o

Again, location in the makefile.

> diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
> new file mode 100644
> index 0000000..d9f9002
> --- /dev/null
> +++ b/xen/common/xsplice_elf.c
> @@ -0,0 +1,205 @@
> +#include <xen/errno.h>
> +#include <xen/lib.h>
> +#include <xen/xsplice_elf.h>
> +#include <xen/xsplice.h>
> +
> +#ifdef NDEBUG
> +#define return_(x) return x
> +#else
> +#define return_(x) { printk(XENLOG_DEBUG "%s:%d rc: %d\n",  \
> +                            __func__,__LINE__, x); return x; }

:(  This is a horrible antipattern.  Just use dprintk() which gets you
this information anyway, and allows for more textual information.

> +#endif
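
To illustrate the dprintk() suggestion, here is a minimal standalone
sketch. dbg_printk() is a hypothetical stand-in for Xen's dprintk() so
that the snippet builds outside the hypervisor tree, and check_shstrndx()
is an invented helper, not code from the patch. The `>=` bound is also an
assumption of this sketch. The point is only that the failure site carries
its own message and a plain return, with no return_() wrapper.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for Xen's dprintk(): prefixes file and line in
 * debug builds and compiles away when NDEBUG is defined.
 */
#ifdef NDEBUG
#define dbg_printk(fmt, ...) ((void)0)
#else
#define dbg_printk(fmt, ...) \
    fprintf(stderr, "%s:%d: " fmt, __FILE__, __LINE__, ##__VA_ARGS__)
#endif

/* Invented helper: the section-name index must lie inside the table. */
int check_shstrndx(unsigned int shstrndx, unsigned int shnum)
{
    if ( shstrndx >= shnum )
    {
        dbg_printk("shstrndx %u out of range (shnum %u)\n", shstrndx, shnum);
        return -EINVAL;
    }

    return 0;
}
```
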
> +
> +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
> +                                                const char *name)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( !strcmp(name, elf->sec[i].name) )
> +            return &elf->sec[i];
> +    }
> +
> +    return NULL;
> +}
> +
> +static int elf_resolve_sections(struct xsplice_elf *elf, uint8_t *data)
> +{
> +    struct xsplice_elf_sec *sec;
> +    unsigned int i;
> +
> +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);

Presumably there will be some sanity checks done somewhere between the
hypercall and here?

> +    if ( !sec )
> +    {
> +        printk(XENLOG_ERR "Could not allocate memory for section table!\n");
> +        return_(-ENOMEM);
> +    }
> +
> +    /* N.B. We also will ingest SHN_UNDEF sections. */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        ssize_t delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
> +
> +        if ( delta + sizeof(Elf_Shdr) > elf->len )
> +            return_(-EINVAL);
> +
> +        sec[i].sec = (Elf_Shdr *)(data + delta);
> +        delta = sec[i].sec->sh_offset;
> +
> +        if ( delta > elf->len )
> +            return_(-EINVAL);
> +
> +        sec[i].data = data + delta;
> +        /* Name is populated in xsplice_elf_sections_name. */
> +        sec[i].name = NULL;
> +
> +        if ( sec[i].sec->sh_type == SHT_SYMTAB )
> +        {

(mis) alignment.

> +                if ( elf->symtab )
> +                    return_(-EINVAL);
> +                elf->symtab = &sec[i];
> +                /* elf->symtab->sec->sh_link would point to the right section
> +                 * but we hadn't finished parsing all the sections. */
> +                if ( elf->symtab->sec->sh_link > elf->hdr->e_shnum )
> +                    return_(-EINVAL);
> +        }
> +    }

Newline please.

> <snip>
> +
> +int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data)
> +{
> +    int rc;
> +
> +    elf->hdr = (Elf_Ehdr *)data;

A lot of this code would be neater (i.e. without explicit typecasts) if
data was void * rather than uint8_t.  GCC pointer arithmetic on void *
works in a sane way.
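
A rough standalone illustration of the void * point, assuming GNU C
(arithmetic on void * is a GCC extension, not ISO C). struct rec and
rec_at() are made-up names for the example, not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up fixed-size record standing in for an ELF section header. */
struct rec {
    uint32_t type;
    uint32_t offset;
};

/*
 * With a void * base, GNU C does byte-granular pointer arithmetic, and
 * the result converts to struct rec * without an explicit cast.
 */
const struct rec *rec_at(const void *data, size_t off, unsigned int idx)
{
    const struct rec *r = data + off + idx * sizeof(*r);

    return r;
}
```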

> +
> +    if ( sizeof(*elf->hdr) >= elf->len )
> +        return_(-EINVAL);
> +
> +    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
> +        return_(-EINVAL);
> +
> +    /* Check that section name index is within the sections. */
> +    if ( elf->hdr->e_shstrndx > elf->hdr->e_shnum )
> +        return_(-EINVAL);
> +
> +    rc = elf_resolve_sections(elf, data);
> +    if ( rc )
> +        return rc;
> +
> +    rc = elf_resolve_section_names(elf, data);
> +    if ( rc )
> +        return rc;
> +
> +    rc = elf_get_sym(elf, data);
> +    if ( rc )
> +        return rc;
> +
> +    return 0;
> +}
> +
> +void xsplice_elf_free(struct xsplice_elf *elf)
> +{
> +    xfree(elf->sec);
> +    elf->sec = NULL;
> +    xfree(elf->sym);
> +    elf->sym = NULL;
> +    elf->nsym = 0;
> +    elf->name = NULL;
> +    elf->len = 0;
> +}

Variable block please.

> diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
> new file mode 100644
> index 0000000..42dbc6f
> --- /dev/null
> +++ b/xen/include/xen/xsplice_elf.h
> @@ -0,0 +1,37 @@
> +#ifndef __XEN_XSPLICE_ELF_H__
> +#define __XEN_XSPLICE_ELF_H__
> +
> +#include <xen/types.h>
> +#include <xen/elfstructs.h>
> +
> +/* The following describes an Elf file as consumed by xSplice. */
> +struct xsplice_elf_sec {
> +    Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
> +    const char *name;              /* Human readable name hooked in
> +                                      elf_resolve_section_names. */
> +    const uint8_t *data;           /* Pointer to the section (done by
> +                                      elf_resolve_sections). */
> +};
> +
> +struct xsplice_elf_sym {
> +    Elf_Sym *sym;
> +    const char *name;
> +};
> +
> +struct xsplice_elf {
> +    const char *name;              /* Pointer to payload->name. */
> +    ssize_t len;                   /* Length of the ELF file. */
> +    Elf_Ehdr *hdr;                 /* ELF file. */
> +    struct xsplice_elf_sec *sec;   /* Array of sections, allocated by us. */
> +    struct xsplice_elf_sym *sym;   /* Array of symbols , allocated by us. */
> +    unsigned int nsym;
> +    struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
> +    struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
> +};
> +
> +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
> +                                                const char *name);
> +int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data);
> +void xsplice_elf_free(struct xsplice_elf *elf);
> +
> +#endif /* __XEN_XSPLICE_ELF_H__ */

Variable block please.

~Andrew

* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 20:11   ` Andrew Cooper
@ 2016-02-12 20:40     ` Konrad Rzeszutek Wilk
  2016-02-12 20:53       ` Andrew Cooper
  2016-02-15  8:16       ` Jan Beulich
  2016-02-19 19:36     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 20:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, xen-devel,
	Daniel De Graaf, sasha.levin

> > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
> > index 96680eb..d549e7a 100644
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -766,6 +766,160 @@ struct xen_sysctl_tmem_op {
> >  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
> >  
> > +/*
> > + * XEN_SYSCTL_XSPLICE_op
> > + *
> > + * Refer to the http://xenbits.xenproject.org/docs/unstable/misc/xsplice.html
> 
> I would refer to the file in the source tree, so docs/misc/xsplice.$FOO
> which is far less likely to change.

The initial patch had exactly that -  but Jan asked me to change it to the
URL. Shall I include both of them?

.. snip..
> > + * Retrieve an array of abbreviated status and names of payloads that are
> > + * loaded in the hypervisor.
> > + *
> > + * If the hypercall returns an positive number, it is the number (up to `nr`)
> > + * of the payloads returned, along with `nr` updated with the number of remaining
> > + * payloads, `version` updated (it may be the same across hypercalls. If it
> > + * varies the data is stale and further calls could fail). The `status`,
> > + * `name`, and `len`' are updated at their designed index value (`idx`) with
> > + * the returned value of data.
> > + *
> > + * If the hypercall returns E2BIG the `nr` is too big and should be
> > + * lowered.
> 
> What would cause this situation to occur?

If the hypervisor decided that the 'nr' is too big. It is hardcoded to a value - but
I don't think it makes sense to mention that in the header file.
> 
> > + *
> > + * This operation can be preempted by the hypercall returning EAGAIN.
> > + * Retry.
> 
> Again, why is this necessary or useful?

Actually it is a lie. I've updated the design document but forgot to remove it here!

Thanks for the comments! Let me update the file..

* Re: [PATCH v3 05/23] xsplice: Add helper elf routines (v4)
  2016-02-12 20:24   ` Andrew Cooper
@ 2016-02-12 20:47     ` Konrad Rzeszutek Wilk
  2016-02-12 20:52       ` Andrew Cooper
  0 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 20:47 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, Jan Beulich, xen-devel, xen-devel,
	sasha.levin

> > +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
> > +                                                const char *name)
> > +{
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> > +    {
> > +        if ( !strcmp(name, elf->sec[i].name) )
> > +            return &elf->sec[i];
> > +    }
> > +
> > +    return NULL;
> > +}
> > +
> > +static int elf_resolve_sections(struct xsplice_elf *elf, uint8_t *data)
> > +{
> > +    struct xsplice_elf_sec *sec;
> > +    unsigned int i;
> > +
> > +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
> 
> Presumably there will be some sanity checks done somewhere between the
> hypercall and here?

There are checks on it but not the value itself. As in the payload could
have e_shnum be some astronomical value because of many .sections in the
file (even the ones we do not use). We could combat that by having
a whitelist of sections - and:
 - If the payload has them return -EINVAL.
 - If the payload has them - ignore them and continue on but instead of
   using e_shnum use the counted value of the sections we expect?

Preferences?

* Re: [PATCH v3 06/23] xsplice: Implement payload loading (v4)
  2016-02-12 18:05 ` [PATCH v3 06/23] xsplice: Implement payload loading (v4) Konrad Rzeszutek Wilk
@ 2016-02-12 20:48   ` Andrew Cooper
  2016-02-19 22:03     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Stefano Stabellini, Keir Fraser, Jan Beulich, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:

I will refrain from repeating the same review from previous patches.

> +int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
> +{
> +
> +    Elf_Ehdr *hdr = (Elf_Ehdr *)data;
> +
> +    if ( elf->len < (sizeof *hdr) ||
> +         !IS_ELF(*hdr) ||

At the very least, this should return -EINVAL for "not an elf", to
differentiate it from "not the right kind of elf" below.

> +         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> +         hdr->e_machine != EM_X86_64 ||
> +         hdr->e_type != ET_REL ||
> +         hdr->e_phnum != 0 )
> +    {
> +        printk(XENLOG_ERR "%s: Invalid ELF file.\n", elf->name);

Where possible, please avoid punctuation in error messages.  It's just
wasted characters on the uart.

I would also suggest the error message be "xpatch '%s': Invalid ELF
file\n" to give the observer some clue that we are referring to payload
attached to a specific xsplice patch.

> +        return -EOPNOTSUPP;
> +    }
> +
> +    return 0;
> +}
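
A sketch of the split suggested for xsplice_verify_elf() above, written
against the userspace <elf.h> definitions rather than Xen's elfstructs.h
so that it builds standalone. verify_elf() is a hypothetical name and the
set of checks is trimmed for brevity:

```c
#include <elf.h>
#include <errno.h>
#include <string.h>

int verify_elf(const void *data, size_t len)
{
    const Elf64_Ehdr *hdr = data;

    /* Not an ELF image at all: too short, or wrong magic. */
    if ( len < sizeof(*hdr) || memcmp(hdr->e_ident, ELFMAG, SELFMAG) )
        return -EINVAL;

    /* An ELF image, but not the kind the loader handles. */
    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
         hdr->e_machine != EM_X86_64 ||
         hdr->e_type != ET_REL )
        return -EOPNOTSUPP;

    return 0;
}
```

A caller can then tell a corrupt upload (-EINVAL) apart from a well-formed
file built for the wrong target (-EOPNOTSUPP).
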
> +
> +int xsplice_perform_rel(struct xsplice_elf *elf,
> +                        struct xsplice_elf_sec *base,
> +                        struct xsplice_elf_sec *rela)
> +{
> +    printk(XENLOG_ERR "%s: SHR_REL relocation unsupported\n", elf->name);

Similarly here.  All the error messages should have some common
indication that we are in the xsplice subsystem.

> +    return -ENOSYS;
> +}
> +
> <snip>
> @@ -378,6 +391,232 @@ int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
>      return rc;
>  }
>  
> +#ifdef CONFIG_X86
> +static void find_hole(ssize_t pages, unsigned long *hole_start,
> +                      unsigned long *hole_end)

Find a hole in what?

Also, shouldn't this code live in arch/x86/xsplice rather than a
CONFIG_X86 section of common xsplice?

> +{
> +    struct payload *data, *data2;
> +
> +    spin_lock(&payload_lock);
> +    list_for_each_entry ( data, &payload_list, list )
> +    {
> +        list_for_each_entry ( data2, &payload_list, list )
> +        {
> +            unsigned long start, end;
> +
> +            start = (unsigned long)data2->payload_address;
> +            end = start + data2->payload_pages * PAGE_SIZE;
> +            if ( *hole_end > start && *hole_start < end )
> +            {
> +                *hole_start = end;
> +                *hole_end = *hole_start + pages * PAGE_SIZE;
> +                break;
> +            }
> +        }
> +        if ( &data2->list == &payload_list )
> +            break;
> +    }
> +    spin_unlock(&payload_lock);
> +}
> +
> +/*
> + * The following functions prepare an xSplice payload to be executed by
> + * allocating space, loading the allocated sections, resolving symbols,
> + * performing relocations, etc.
> + */
> +static void *alloc_payload(size_t size)
> +{
> +    mfn_t *mfn, *mfn_ptr;
> +    size_t pages, i;
> +    struct page_info *pg;
> +    unsigned long hole_start, hole_end, cur;
> +
> +    ASSERT(size);
> +
> +    /*
> +     * Copied from vmalloc which allocates pages and then maps them to an
> +     * arbitrary virtual address with PAGE_HYPERVISOR. We need specific
> +     * virtual address with PAGE_HYPERVISOR_RWX.

Can we therefore please extend the existing vmalloc infrastructure
rather than copying?

Also, nack to introducing any new uses of RWX.  It poses an unnecessary
security risk, and I am (slowly, sadly) trying to remove all uses of it
outside the __init code.

The pages should be mapped RW to start with, then relocated etc, then
having the text section modified to RX just before use.
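
As a userspace analogy only (mmap()/mprotect() standing in for the
hypervisor's mapping primitives, which this sketch does not attempt to
model), the RW-then-RX sequence looks roughly like this. load_text() is a
hypothetical name:

```c
#define _DEFAULT_SOURCE   /* For MAP_ANONYMOUS under strict -std= modes. */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map writable, fill in the contents, then drop write permission and add
 * execute. At no point is the region both writable and executable.
 */
void *load_text(const void *image, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if ( p == MAP_FAILED )
        return NULL;

    memcpy(p, image, len);          /* Copy (and relocate) while still RW. */

    if ( mprotect(p, len, PROT_READ | PROT_EXEC) )
    {
        munmap(p, len);
        return NULL;
    }

    return p;                       /* Now RX; a write here would fault. */
}
```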

> +     */
> +    pages = PFN_UP(size);
> +    mfn = xmalloc_array(mfn_t, pages);
> +    if ( mfn == NULL )
> +        return NULL;
> +
> <snip>
> +
> +static int move_payload(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    uint8_t *buf;
> +    unsigned int i;
> +    size_t size = 0;
> +
> +    /* Compute text regions */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> +             (SHF_ALLOC|SHF_EXECINSTR) )
> +            calc_section(&elf->sec[i], &size);
> +    }
> +
> +    /* Compute rw data */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &size);
> +    }
> +
> +    /* Compute ro data */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            calc_section(&elf->sec[i], &size);
> +    }
> +
> +    buf = alloc_payload(size);
> +    if ( !buf ) {
> +        printk(XENLOG_ERR "%s: Could not allocate memory for module\n",
> +               elf->name);
> +        return -ENOMEM;
> +    }
> +    memset(buf, 0, size);

alloc_payload() should ensure the 0-ness of buf.  That way, you can get
clear pages "for free" by taking from a zeroed pool, rather than forcing
a rezero of a probably-zero buffer.

> +
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> +        {
> +            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
> +            /* Don't copy NOBITS - such as BSS. */
> +            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
> +            {
> +                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
> +                       elf->sec[i].sec->sh_size);
> +                printk(XENLOG_DEBUG "%s: Loaded %s at 0x%p\n",
> +                       elf->name, elf->sec[i].name, elf->sec[i].load_addr);
> +            }
> +        }
> +    }
> +
> +    payload->payload_address = buf;
> +    payload->payload_pages = PFN_UP(size);
> +
> +    return 0;
> +}
> +
> <snip>
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index cf465c4..d71c898 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -1,10 +1,22 @@
>  #ifndef __XEN_XSPLICE_H__
>  #define __XEN_XSPLICE_H__
>  
> +struct xsplice_elf;
> +struct xsplice_elf_sec;
> +struct xsplice_elf_sym;
>  struct xen_sysctl_xsplice_op;
>  
>  #ifdef CONFIG_XSPLICE
>  int xsplice_control(struct xen_sysctl_xsplice_op *);
> +
> +/* Arch hooks */

Can we name them arch_xsplice_$FOO then, to make it obvious?

~Andrew

* Re: [PATCH v3 05/23] xsplice: Add helper elf routines (v4)
  2016-02-12 20:47     ` Konrad Rzeszutek Wilk
@ 2016-02-12 20:52       ` Andrew Cooper
  0 siblings, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, Jan Beulich, xen-devel, xen-devel,
	sasha.levin

On 12/02/16 20:47, Konrad Rzeszutek Wilk wrote:
>>> +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
>>> +                                                const char *name)
>>> +{
>>> +    unsigned int i;
>>> +
>>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>>> +    {
>>> +        if ( !strcmp(name, elf->sec[i].name) )
>>> +            return &elf->sec[i];
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static int elf_resolve_sections(struct xsplice_elf *elf, uint8_t *data)
>>> +{
>>> +    struct xsplice_elf_sec *sec;
>>> +    unsigned int i;
>>> +
>>> +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
>> Presumably there will be some sanity checks done somewhere between the
>> hypercall and here?
> There are checks on it but not the value itself. As in the payload could
> have e_shnum be some astronomical value because of many .sections in the
> file (even the ones we do not use). We could combat that by having
> a whitelist of sections - and:
>  - If the payload has them return -EINVAL.
>  - If the payload has them - ignore them and continue on but instead of
>    using e_shnum use the counted value of the sections we expect?
>
> Preferences?

Anything more than a handful of sections is likely to be a bogus ELF
file.  I would put a hard limit (64 perhaps?).  If we find a plausible
use case for that many sections in a patch, we can revisit the logic.

We should bail in any situation where we find a value we don't like,
such as an unknown section.  As this is binary patching Xen, I would
prefer not to take any unnecessary risks.
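
A standalone sketch of such a check. MAX_SECTIONS, known_sections[] and
check_sections() are hypothetical names, and the whitelist contents are
placeholders rather than a proposal for the final set:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_SECTIONS 64   /* Hypothetical hard limit, as floated above. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder whitelist; "" covers the unnamed null section at index 0. */
static const char *const known_sections[] = {
    "", ".text", ".data", ".rodata", ".bss", ".symtab", ".strtab", ".shstrtab",
};

int check_sections(const char *const *names, unsigned int shnum)
{
    unsigned int i;
    size_t j;

    /* Reject implausible section counts before looking at any names. */
    if ( shnum == 0 || shnum > MAX_SECTIONS )
        return -EINVAL;

    for ( i = 0; i < shnum; i++ )
    {
        for ( j = 0; j < ARRAY_SIZE(known_sections); j++ )
            if ( !strcmp(names[i], known_sections[j]) )
                break;

        if ( j == ARRAY_SIZE(known_sections) )
            return -EINVAL;   /* Unknown section: refuse the payload. */
    }

    return 0;
}
```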

~Andrew

* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 20:40     ` Konrad Rzeszutek Wilk
@ 2016-02-12 20:53       ` Andrew Cooper
  2016-02-15  8:16       ` Jan Beulich
  1 sibling, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-12 20:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, xen-devel,
	Daniel De Graaf, sasha.levin

On 12/02/16 20:40, Konrad Rzeszutek Wilk wrote:
>>> diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
>>> index 96680eb..d549e7a 100644
>>> --- a/xen/include/public/sysctl.h
>>> +++ b/xen/include/public/sysctl.h
>>> @@ -766,6 +766,160 @@ struct xen_sysctl_tmem_op {
>>>  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>>>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>>>  
>>> +/*
>>> + * XEN_SYSCTL_XSPLICE_op
>>> + *
>>> + * Refer to the http://xenbits.xenproject.org/docs/unstable/misc/xsplice.html
>> I would refer to the file in the source tree, so docs/misc/xsplice.$FOO
>> which is far less likely to change.
> The initial patch had exactly that -  but Jan asked me to change it to the
> URL. Shall I include both of them?

Ok then.  (most other docs references are relative to the source tree...)

~Andrew

* Re: [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8).
  2016-02-12 18:05 ` [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8) Konrad Rzeszutek Wilk
@ 2016-02-12 21:52   ` Daniel De Graaf
  0 siblings, 0 replies; 86+ messages in thread
From: Daniel De Graaf @ 2016-02-12 21:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, ross.lagerwall, sasha.levin, jinsong.liu, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, xen-devel

On 12/02/16 13:05, Konrad Rzeszutek Wilk wrote:
> All of XENVER_* have now an XSM check for their sub-ops.
>
> The subop for XENVER_commandline is now a privileged operation.
> To not break guests we still return a string - but it is
> just '<denied>\0'.
>
> The rest: XENVER_[version|extraversion|capabilities|
> parameters|get_features|page_size|guest_handle|changeset|
> compile_info] behave as before - allowed by default for all
> guests if using the XSM default policy or with the dummy one.
>
> The admin can choose to change the sub-ops to be denied
> as they see fit.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v2: Do XSM check for all the XENVER_ ops.
> v3: Add empty data conditions.
> v4: Return <denied> for priv subops.
> v5: Move extraversion from priv to normal. Drop the XSM check
>      for the non-priv subops.
> v6: Add +1 for strlen(xen_deny()) to include NULL. Move changeset,
>      compile_info to non-priv subops.
> v7: Remove the \0 on xen_deny()
> v8: Add new XSM domain for xenver hypercall. Add all subops to it.

With one excess line removed:
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

[...]

> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index c856e1e..7e3bcdd 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -26,6 +26,7 @@
>   #include <public/xen.h>
>   #include <public/physdev.h>
>   #include <public/platform.h>
> +#include <public/version.h>
>
>   #include <public/xsm/flask_op.h>
>
> @@ -1626,6 +1627,48 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
>   }
>   #endif /* CONFIG_X86 */
>
> +static int flask_version_op (uint32_t op)
> +{
> +    u32 dsid = domain_sid(current->domain);
> +
> +    switch ( op )
> +    {
> +    case XENVER_version:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__VERSION, NULL);
> +    case XENVER_extraversion:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__EXTRAVERSION, NULL);
> +    case XENVER_compile_info:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__COMPILE_INFO, NULL);
> +    case XENVER_capabilities:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__CAPABILITIES, NULL);
> +    case XENVER_changeset:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__CHANGESET, NULL);
> +    case XENVER_platform_parameters:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__PLATFORM_PARAMETERS, NULL);
> +    case XENVER_get_features:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__GET_FEATURES, NULL);
> +    case XENVER_pagesize:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__PAGESIZE, NULL);
> +    case XENVER_guest_handle:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__GUEST_HANDLE, NULL);

> +        return 0; /* These MUST always be accessible to guests. */

This line seems to be misplaced.

> +    case XENVER_commandline:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__COMMANDLINE, NULL);
> +    default:
> +        return -EPERM;
> +    }
> +}
> +
>   long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>   int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>

* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-12 18:05 ` [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10) Konrad Rzeszutek Wilk
@ 2016-02-12 21:52   ` Daniel De Graaf
  2016-02-16 20:09   ` Andrew Cooper
  1 sibling, 0 replies; 86+ messages in thread
From: Daniel De Graaf @ 2016-02-12 21:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, ross.lagerwall, sasha.levin, jinsong.liu, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Stefano Stabellini,
	Keir Fraser, Jan Beulich, xen-devel

On 12/02/16 13:05, Konrad Rzeszutek Wilk wrote:
> The mechanism to get this is via the XENVER hypercall and
> we add a new sub-command to retrieve the binary build-id
> called XENVER_build_id. The sub-hypercall parameter
> allows an arbitrary size (the buffer and len is provided
> to the hypervisor). A NULL parameter will probe the hypervisor
> for the length of the build-id.
>
> One can also retrieve the value of the build-id by doing
> 'readelf -n xen-syms'.
>
> For EFI builds we re-use the same build-id that the xen-syms
> was built with.
>
> The version of ld that first implemented --build-id is v2.18.
> Hence we check for that or later version - if older version
> found we do not build the hypervisor with the build-id
> (and the return code is -ENODATA for that case).
>
> For x86 we have two binaries - the xen-syms and the xen - a
> smaller version with lots of sections removed. To make it possible
> for readelf -n xen we also modify mkelf32 and xen.lds.S to include
> the PT_NOTE ELF section.
>
> The EFI binary is more complicated. Having any non-recognizable
> sections (.note, .data.note, etc) causes the boot to hang.
> Moving the .note in the .data section makes it work. It is also
> worth noting that the PE/COFF does not have any "comment"
> sections to the author.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3] xSplice v1 implementation and design.
  2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
                   ` (22 preceding siblings ...)
  2016-02-12 18:06 ` [PATCH v3 23/23] xsplice, hello_world: Use the XSPLICE_[UN|]LOAD_HOOK hooks for two functions Konrad Rzeszutek Wilk
@ 2016-02-12 21:57 ` Konrad Rzeszutek Wilk
  2016-02-12 21:57   ` [PATCH v3 MISSING/23] xsplice: Design document (v7) Konrad Rzeszutek Wilk
  23 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 21:57 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu


> Konrad Rzeszutek Wilk (13):
>       xsplice: Design document (v7).

Somehow this patch disappeared in this patchset posting!
To not spam everybody's mailbox I am just sending this simple sub-thread
which has the design document.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [PATCH v3 MISSING/23] xsplice: Design document (v7).
  2016-02-12 21:57 ` [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
@ 2016-02-12 21:57   ` Konrad Rzeszutek Wilk
  2016-02-18 16:20     ` Jan Beulich
  0 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-12 21:57 UTC (permalink / raw)
  To: xen-devel, andrew.cooper3, konrad, mpohlack, ross.lagerwall,
	sasha.levin, jinsong.liu, Ian Campbell, Ian Jackson, Jan Beulich,
	Keir Fraser, Tim Deegan, xen-devel
  Cc: Konrad Rzeszutek Wilk

A mechanism is required to binarily patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v1-2: review
v3: Split document in v1 and v2 (todo) to simplify implementation goals.
v4: Add const on some structures. Truncate size to uint16_t where it makes sense.
v5: Convert 'id' to 'name', Add Ross's comments about what is implemented.
v6: Wei's and Ross's reviews.
v7: Jan's review comments.
---
 docs/misc/xsplice.markdown | 1042 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1042 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..9a95243
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,1042 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binarily patch the running hypervisor with new
+opcodes that have come about primarily due to security updates.
+
+This document describes the design of the API that would allow us to
+upload binary patches to the hypervisor.
+
+The document is split in four sections:
+
+ * Detailed descriptions of the problem statement.
+ * Design of the data structures.
+ * Design of the hypercalls.
+ * Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## History
+
+The document has undergone various reviews and only covers the v1 design.
+
+The end of the document has a section titled `Not Yet Done` which
+outlines ideas and design for the future version of this work.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible to patch the hypervisor in multiple ways
+and be as simple as possible. The compiled code is contiguous in memory with
+no gaps - so we have no luxury of 'moving' existing code and must either
+insert a trampoline to the new code to be executed - or only modify in-place
+the code if there is sufficient space. The placement of new code has to be done
+by hypervisor and the virtual address for the new code is allocated dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampolines, etc.
+have to be supported. The API should allow reading and writing any memory within
+the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches when they are available.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+
+## Patching code
+
+The first mechanism to patch that comes to mind is in-place replacement.
+That is, replace the affected code with new code. Unfortunately the x86
+ISA is variable size which places limits on how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is to replace the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function. N.B. The Xen hypervisor implements the third
+mechanism. See `Trampoline (e9 opcode)` section for more details.
+
+### Example of trampoline and in-place splicing
+
+As an example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks like this:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside `arch_do_domctl`. The new change adds 24 extra bytes of code
+(three 8-byte `movq` instructions) which alters all the offsets inside the
+function. To alter these offsets and add the extra bytes we might not have enough
+space in .text to squeeze this in.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+do_domctl:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`
+- which is not called directly by hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+Patching the offset in `hypercall_table` for `do_domctl`
+(ffff82d080103079 <do_domctl>):
+
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+
+with the new address where the new `do_domctl` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `arch_do_domctl`.
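The table entry shown above is just the 64-bit address of `do_domctl` stored in little-endian byte order. A minimal sketch of reading and rewriting such an entry follows; the helper names `read_le64` and `write_le64` are hypothetical and not part of the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 64-bit function pointer from raw table bytes. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Store a 64-bit address as 8 little-endian bytes (the in-place splice). */
static void write_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}
```

Decoding the bytes `79 30 10 80 d0 82 ff ff` from the listing gives ffff82d080103079, the address of `do_domctl`. In the hypervisor the write would additionally need the temporary page table permission change, since the table lives in .rodata.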
+
+In summary this example patched the callers of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+However it has drawbacks - the safety checks which have to make sure
+the function is not on the stack - must also check every caller. For some
+patches this could mean - if there were a sufficiently large number of
+callers - that we would never be able to apply the update.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+lessens the locations to be patched to one but it puts pressure on the
+CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
+in `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). Patching the offset in `hypercall_table` for the old
+`do_xen_version` (ffff82d080112f9e <do_xen_version>)
+
+<pre>
+ ffff82d08024b270 <hypercall_table>:   
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+
+with the new address where the new `do_xen_version` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to insert a trampoline in the
+old `do_xen_version` function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
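The `XX YY ZZ QQ` bytes above are the 32-bit displacement of the jump, measured from the end of the 5-byte instruction. A minimal sketch of how it could be computed is below; `make_trampoline` and `TRAMPOLINE_SIZE` are hypothetical names, and the range check reflects the limit of a rel32 jump rather than anything stated in this document.

```c
#include <assert.h>
#include <stdint.h>

#define TRAMPOLINE_SIZE 5   /* e9 opcode plus a 32-bit displacement */

/*
 * Encode `jmp rel32` placed at old_addr so that it lands on new_addr.
 * The displacement is relative to the end of the jump instruction.
 * Returns 0 on success, -1 if new_addr is outside the +/-2GB range.
 */
static int make_trampoline(uint64_t old_addr, uint64_t new_addr,
                           uint8_t insn[TRAMPOLINE_SIZE])
{
    int64_t disp = (int64_t)(new_addr - (old_addr + TRAMPOLINE_SIZE));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    insn[0] = 0xe9;
    /* Emit the displacement in little-endian byte order. */
    for (int i = 0; i < 4; i++)
        insn[1 + i] = (uint8_t)((uint32_t)disp >> (8 * i));
    return 0;
}
```

For the old `do_xen_version` at ffff82d080112f9e and an assumed new function at ffff82d080300000 (an address made up for illustration), the displacement is 0x1ed05d and the trampoline bytes are `e9 5d d0 1e 00`.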
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to a function
+   providing a BUG_ON to catch errant code.
+
+The disadvantage of this is that the unconditional jump will incur a small
+I-cache penalty. However the simplicity of the patching and higher chance
+of passing safety checks make this a worthwhile option.
+
+This patching has a similar drawback as inline patching - the safety
+checks have to make sure the function is not on the stack. However
+since we are replacing at a higher level (a full function as opposed
+to various offsets within functions) the checks are simpler.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem as well.
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and integrity
+was intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking the
+signature checking will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code that will be patched in.
+
+This binary format can be constructed using a custom binary format but
+there are severe disadvantages to it:
+
+ * The format might need to be changed and we need a mechanism to accommodate
+   that.
+ * It has to be platform agnostic.
+ * Easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+Earlier design attempted to ineptly explain the relations of the ELF sections
+to each other without using proper ELF mechanism (sh_info, sh_link, data
+structures using Elf types, etc). This design will explain the structures
+and how they are used together and not dig in the ELF format - except mention
+that the section names should match the structure names.
+
+The xSplice payload is a relocatable ELF binary. A typical binary would have:
+
+ * One or more .text sections.
+ * Zero or more read-only data sections.
+ * Zero or more data sections.
+ * Relocations for each of these sections.
+
+It may also have some architecture-specific sections. For example:
+
+ * Alternatives instructions.
+ * Bug frames.
+ * Exception tables.
+ * Relocations for each of these sections.
+
+The xSplice core code loads the payload as a standard ELF binary, relocates it
+and handles the architecture-specific sections as needed. This process is much
+like what the Linux kernel module loader does.
+
+The payload contains a section (xsplice_patch_func) with an array of structures
+describing the functions to be patched:
+
+<pre>
+struct xsplice_patch_func {  
+    const char *name;  
+    Elf64_Xword new_addr;  
+    Elf64_Xword old_addr;  
+    Elf64_Word new_size;  
+    Elf64_Word old_size;  
+    uint8_t pad[32];  
+};  
+</pre>
+
+The size of the structure is 64 bytes.
+
+* `name` is the symbol name of the old function. Only used if `old_addr` is
+   zero, otherwise will be used during dynamic linking (when hypervisor loads
+   the payload).
+
+* `old_addr` is the address of the function to be patched and is filled in at
+  payload generation time if hypervisor function address is known. If unknown,
+  the value *MUST* be zero and the hypervisor will attempt to resolve the address.
+
+* `new_addr` is the address of the function that is replacing the old
+  function. The address is filled in during relocation. The value **MUST** be
+  the address of the new function in the file.
+
+* `old_size` and `new_size` contain the sizes of the respective functions in bytes.
+   The value of `old_size` **MUST** not be zero.
+
+* `pad` **MUST** be zero.
+
+The size of the `xsplice_patch_func` array is determined from the ELF section
+size.
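The 64-byte size and the rules on `old_size` and `pad` can be checked mechanically. The sketch below assumes a 64-bit build (so `const char *` is 8 bytes) and uses local stand-ins for the ELF types; `patch_func_count` and `patch_func_valid` are hypothetical helpers, and rejecting a section whose size is not a multiple of the structure size is an assumption, not something this document requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the ELF types so the sketch builds without <elf.h>. */
typedef uint64_t Elf64_Xword;
typedef uint32_t Elf64_Word;

struct xsplice_patch_func {
    const char *name;        /* 8 bytes on the assumed 64-bit build */
    Elf64_Xword new_addr;
    Elf64_Xword old_addr;
    Elf64_Word new_size;
    Elf64_Word old_size;
    uint8_t pad[32];
};

/* 8 + 8 + 8 + 4 + 4 + 32 = 64 bytes, as stated in the text. */
_Static_assert(sizeof(struct xsplice_patch_func) == 64,
               "xsplice_patch_func must be 64 bytes");

/* Number of array entries in a section of sh_size bytes, or -1 if ragged. */
static long patch_func_count(uint64_t sh_size)
{
    if (sh_size % sizeof(struct xsplice_patch_func))
        return -1;
    return (long)(sh_size / sizeof(struct xsplice_patch_func));
}

/* Apply the field rules from the text: old_size non-zero, pad all zero. */
static int patch_func_valid(const struct xsplice_patch_func *f)
{
    if (f->old_size == 0)
        return 0;
    for (size_t i = 0; i < sizeof(f->pad); i++)
        if (f->pad[i])
            return 0;
    return 1;
}
```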
+
+When applying the patch the hypervisor iterates over each `xsplice_patch_func`
+structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
+
+When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
+and the core code copies the data from the undo buffer (private internal copy)
+to `old_addr`.
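The apply and revert loops can be illustrated on a plain byte array standing in for hypervisor text. This is only a sketch under that assumption: `struct site`, `apply_all` and `revert_all` are hypothetical names, offsets replace virtual addresses, and the real code would also have to quiesce CPUs and adjust page permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5   /* e9 opcode plus a 32-bit displacement */

/* One function to patch, expressed as offsets into a simulated image. */
struct site {
    uint32_t old_off;                /* start of the old function */
    uint32_t new_off;                /* start of the replacement */
    uint8_t undo[PATCH_INSN_SIZE];   /* private copy of the overwritten bytes */
};

/* Save the original bytes, then write a jump from old_off to new_off. */
static void apply_all(uint8_t *image, struct site *sites, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t *old = image + sites[i].old_off;
        uint32_t disp = sites[i].new_off -
                        (sites[i].old_off + PATCH_INSN_SIZE);

        memcpy(sites[i].undo, old, PATCH_INSN_SIZE);
        old[0] = 0xe9;
        for (int b = 0; b < 4; b++)
            old[1 + b] = (uint8_t)(disp >> (8 * b));
    }
}

/* Copy the saved bytes back over each trampoline. */
static void revert_all(uint8_t *image, const struct site *sites, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memcpy(image + sites[i].old_off, sites[i].undo, PATCH_INSN_SIZE);
}
```

Keeping the undo copy per entry means a revert needs no knowledge of the original binary, which matches the "private internal copy" wording above.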
+
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload the payloads.
+ * listing of payloads summary uploaded and their state.
+ * getting a particular payload summary and its state.
+ * command to apply, delete, or revert the payload.
+
+Most of the actions are asynchronous therefore the caller is responsible
+to verify that it has been applied properly by retrieving the summary of it
+and verifying that there are no error codes associated with the payload.
+
+We **MUST** make some of them asynchronous because patching requires
+every physical CPU to be in lock-step with each other.
+The patching mechanism, while an implementation detail, is not a short
+operation and as such the design **MUST** assume it will be a long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such a unique name per payload has to be visible to allow proper manipulation.
+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations and one padding field which
+*MUST* always be zero.
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;                   /* IN: XEN_SYSCTL_XSPLICE_*. */  
+    uint32_t pad;                   /* IN: Always zero. */  
+	union {  
+          ... see below ...  
+        } u;  
+};  
+
+</pre>
+while the rest of the hypercall specific structures are part of this structure.
+
+### Basic type: struct xen_xsplice_name
+
+Most of the hypercalls employ a shared structure called `struct xen_xsplice_name`
+which contains:
+
+ * `name` - pointer where the string for the name is located.
+ * `size` - the size of the string
+ * `pad` - padding - to be zero.
+
+The structure is as follows:
+
+<pre>
+#define XEN_XSPLICE_NAME_SIZE 128  
+struct xen_xsplice_name {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint16_t size;                          /* IN, size of name. May be up to  
+                                               XEN_XSPLICE_NAME_SIZE. */  
+    uint16_t pad[3];                        /* IN: MUST be zero. */ 
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `name` could be an UUID that stays fixed forever for a given
+payload. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was successfully uploaded.
+Otherwise an -XEN_EXX return value is provided. Duplicate `name` are not supported.
+
+The `payload` is the ELF payload as mentioned in the `Payload format` section.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_name_t name;            /* IN, name of the patch. */  
+    uint64_t size;                      /* IN, size of the ELF file. */  
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve the status of a specific payload. The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * A `struct xen_xsplice_status` structure which has all members
+   set to zero: That is:
+   * `status` *MUST* be set to zero.
+   * `rc` *MUST* be set to zero.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_LOADED* (1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The return value of the hypercall is zero on success and -XEN_EXX on failure.
+(Note that the `rc` value can be different from the return value, as in
+rc=-XEN_EAGAIN and return value can be 0).
+
+For example, supposing there is a payload:
+
+<pre>
+ status: XSPLICE_STATUS_LOADED
+ rc: 0
+</pre>
+
+We apply an action - XSPLICE_ACTION_REVERT - to revert it (which won't work
+as we have not even applied it). Afterwards we will have:
+
+<pre>
+ status: XSPLICE_STATUS_LOADED
+ rc: -XEN_EINVAL
+</pre>
+
+It has failed but it remains loaded.
+
+This operation is synchronous and does not require preemption.
+
+The structure is as follows:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_LOADED       1  
+#define XSPLICE_STATUS_CHECKED      2  
+#define XSPLICE_STATUS_APPLIED      3  
+    int32_t state;                  /* OUT: XSPLICE_STATUS_*. IN: MUST be zero. */  
+    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
+                                    /* IN: MUST be zero. */
+};  
+
+struct xen_sysctl_xsplice_summary {  
+    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Initially (on first hypercall) *MUST* be zero.
+ * `idx` index iterator. On first call *MUST* be zero, on subsequent calls it varies.
+ * `nr` the max number of entries to populate.
+ * `pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. Caller *MUST* allocate up to `nr` of them.
+ * `name` - virtual address of where to write the unique name of the payload.
+   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size.
+ * `len` - virtual address of where to write the length of each unique name
+   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns a positive number, it is the number (up to `nr`
+provided to the hypercall) of the payloads returned, along with `nr` updated
+with the number of remaining payloads, `version` updated (it may be the same
+across hypercalls - if it varies the data is stale and further calls could
+fail). The `status`, `name`, and `len` are updated at their designated index
+value (`idx`) with the returned value of data.
+
+If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
+lowered.
+
+If the hypercall returns a zero value that means there are no payloads.
+
+Note that due to the asynchronous nature of hypercalls the control domain might
+have added or removed a number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. If the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
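The `version` handling on the toolstack side can be sketched as a small cursor that is fed the values returned by each LIST call. `struct list_cursor` and `cursor_update` are hypothetical names, and advancing `idx` by the number of entries returned is an assumption, since the text only says that `idx` varies on subsequent calls.

```c
#include <assert.h>
#include <stdint.h>

/* Toolstack-side bookkeeping across XEN_SYSCTL_XSPLICE_LIST calls. */
struct list_cursor {
    uint32_t version;   /* version seen on the previous call */
    uint32_t idx;       /* index to pass on the next call */
    int started;        /* zero until the first call has returned */
};

/*
 * Record the outcome of one LIST call. Returns 1 if the version changed,
 * meaning everything gathered so far is stale and must be discarded; the
 * cursor is then reset to start from scratch with the new version.
 * Returns 0 if the data is consistent with the previous calls.
 */
static int cursor_update(struct list_cursor *c, uint32_t returned_version,
                         uint32_t returned_count)
{
    if (c->started && returned_version != c->version) {
        c->version = returned_version;   /* OK to reuse the new version */
        c->idx = 0;
        return 1;
    }
    c->started = 1;
    c->version = returned_version;
    c->idx += returned_count;            /* assumed iteration scheme */
    return 0;
}
```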
+
+The `struct xen_xsplice_status` structure contains the status of a payload which includes:
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_LOADED* (1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (2) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
+                                               On subsequent calls reuse value.  
+                                               If varies between calls, we are  
+                                               getting stale data. */  
+    uint32_t idx;                           /* IN/OUT: Index into array. */  
+    uint32_t nr;                            /* IN: How many status, names, and len  
+                                               should fill out.  
+                                               OUT: How many payloads left. */  
+    uint32_t pad;                           /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocated for nr of them. */  
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member  
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.  
+                                               Must have nr of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.  
+                                               Must have nr of them. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `name` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` `name` containing the unique name.
+ * `cmd` the command requested:
+  * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+    This also verifies the payload - which may require SecureBoot firmware
+    calls.
+  * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+   Any further hypercalls against the `name` will result in failure unless
+   **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is performed with same `name`.
+  * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+  more time than the upper bound of time the `rc` in `xen_xsplice_status`
+  retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+  more time than the upper bound of time the `rc` in `xen_xsplice_status`
+  retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+  payload. If the operation takes more time than the upper bound of time
+  the `rc` in `xen_xsplice_status` retrieved via **XEN_SYSCTL_XSPLICE_GET**
+  will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_LOADED* is an initial state and cannot be requested.
+ * `time` the upper bound of time (ms) the cmd should take. Zero means infinite.
+   If within the time the operation does not succeed the operation would go in
+   error state.
+ * `pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follows:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1  
+#define XSPLICE_ACTION_UNLOAD  2  
+#define XSPLICE_ACTION_REVERT  3  
+#define XSPLICE_ACTION_APPLY   4  
+#define XSPLICE_ACTION_REPLACE 5  
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_name_t name;                /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t time;                          /* IN: Zero if no timeout. */   
+                                            /* Or upper bound of time (ms) */   
+                                            /* for operation to take. */  
+};  
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION commands.
+
+There is a strict ordering of what the commands can be.
+The XSPLICE_ACTION prefix has been dropped to ease reading and
+the diagram does not include the XSPLICE_STATES:
+
+<pre>
+              /->\  
+              \  /  
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\  
+                \                                  |  
+                 \-------------------<-------------/  
+
+</pre>
+## State transition table of XSPLICE_ACTION commands and XSPLICE_STATUS.
+
+Note that:
+
+ - The LOADED state is the starting one achieved with *XEN_SYSCTL_XSPLICE_UPLOAD* hypercall.
+ - The REVERT operation on success will automatically move to the CHECKED state.
+ - There are three STATES: LOADED, CHECKED and APPLIED.
+ - There are five actions (aka commands): CHECK, APPLY, REPLACE, REVERT, and UNLOAD.
+
+The state transition table of valid states and action states:
+
+<pre>
+
++---------+---------+--------------------------------+-------+-------+--------+
+| ACTION  | Current | Result                         |       Next STATE:      |
+|         | STATE   |                                | LOADED|CHECKED|APPLIED |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | LOADED  | Check payload (success).       |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | LOADED  | Check payload (error).         |   x   |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, no   |       |   x   |        |
+|         |         | errors)                        |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, with |   x   |       |        |
+|         |         | errors)                        |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| UNLOAD  | CHECKED | Unload payload. Always works.  |       |       |        |
+|         |         | No next states.                |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| UNLOAD  | LOADED  | Unload payload. Always works.  |       |       |        |
+|         |         | No next states.                |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| APPLY   | CHECKED | Apply payload (success).       |       |       |   x    |
++---------+---------+--------------------------------+-------+-------+--------+
+| APPLY   | CHECKED | Apply payload (error|timeout)  |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |       |   x    |
+|         |         | payload with success.          |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |   x   |        |
+|         |         | payload with error.            |       |       |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REVERT  | APPLIED | Revert payload (success).      |       |   x   |        |
++---------+---------+--------------------------------+-------+-------+--------+
+| REVERT  | APPLIED | Revert payload (error|timeout) |       |       |   x    |
++---------+---------+--------------------------------+-------+-------+--------+
+</pre>
+
+All the other state transitions are invalid.
+
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to verify that the payload can be successfully applied.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
+ 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero exit with success.
+
+
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+expand the code to put the most generic code in place and emit a special
+ELF section to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with better suited opcodes.
+
+Note that patching functions that copy to or from guest memory requires
+support for alternatives. For example this can be due to SMAP
+(specifically *stac* and *clac* operations) which is enabled on Broadwell
+and later architectures. It may be related to other alternative instructions.
+
+### When to patch
+
+During the discussion on the design two candidates bubbled up where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing. One such safety check is not patching code which
+is on the stack, as doing so can lead to corruption.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using stop_machine to patch can stall the time rendezvous
+code and result in an NMI. As such having the patching be done at the tail
+of rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To rendezvous all the CPUs, a barrier with a maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to execute lockstep instructions
+on all CPUs.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.
+
+This is implemented in the Xen Project hypervisor.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+### Generation of xSplice ELF payloads
+
+The design of that is not discussed in this document.
+
+This is implemented in a separate tool which lives in a separate
+Git repository.
+
+Currently it resides at https://github.com/rosslagerwall/xsplice-build
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+In practice, almost every patch (XSA) to a non-trivial function requires
+additional entries in the exception table and/or the bug frames.
+
+This is implemented in the Xen Project hypervisor.
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If we are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+The trampoline patching is implemented in the Xen Project hypervisor.
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+However, functions which are being patched can bring in changes to strings
+(.data or .rodata section changes), or even to .bss sections.
+
+As such the ELF payload can introduce new .rodata, .bss, and .data sections.
+Patching in the new function will end up also patching in the new .rodata
+section and the new function will reference the new string in the new
+.rodata section.
+
+This is implemented in the Xen Project hypervisor.
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## Goals
+
+The design must also have a mechanism for:
+
+ * A dependency mechanism for the payloads. To use that information to load:
+    - The appropriate payload. To verify that the payload is built against the
+      hypervisor. This can be done via the `build-id`
+      or via providing a copy of the old code - so that the hypervisor can
+      verify it against the code in memory.
+    - To construct an appropriate order of payloads to load in case they
+      depend on each other.
+ * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, etc).
+ * NOP out the code sequence if `new_size` is zero.
+
+### xSplice interdependencies
+
+xSplice patch interdependencies are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hotpatches are built against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or
+   if the old code also contains previously patched code - even if they
+   overlap.
+
+The disadvantage of the single large patch is that it can grow over
+time and does not provide a bisection mechanism to identify faulty patches.
+
+The hotpatch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Handle inlined __LINE__
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonably be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch-by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
+For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
+the function itself.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be the same for the signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on all of the payloads contiguously laid out
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+a signature header then followed by the signature, key identifier, and signer's
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from the Linux module signature code.)
+
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+That said we should provide hook functions so that the existing data
+can be changed during payload application.
+
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+the new code must be placed within 2GB of virtual address space of
+the old code. That should not be a problem since the Xen hypervisor has
+a very small footprint.
+
+However if we need - we can always add two trampolines. One at the 2GB
+limit that calls the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accommodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are required.
+
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions or a
+runtime check that verifies symbol size (+ padding to next symbols) for
+that in the hypervisor is advised.
+
+The tool for generating payloads currently does perform a compile-time
+check to ensure that the function to be replaced is large enough.
+
-- 
2.1.0


* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 20:40     ` Konrad Rzeszutek Wilk
  2016-02-12 20:53       ` Andrew Cooper
@ 2016-02-15  8:16       ` Jan Beulich
  1 sibling, 0 replies; 86+ messages in thread
From: Jan Beulich @ 2016-02-15  8:16 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, jinsong.liu,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, xen-devel,
	Daniel De Graaf, sasha.levin

>>> On 12.02.16 at 21:40, <konrad.wilk@oracle.com> wrote:
>> > diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
>> > index 96680eb..d549e7a 100644
>> > --- a/xen/include/public/sysctl.h
>> > +++ b/xen/include/public/sysctl.h
>> > @@ -766,6 +766,160 @@ struct xen_sysctl_tmem_op {
>> >  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>> >  
>> > +/*
>> > + * XEN_SYSCTL_XSPLICE_op
>> > + *
>> > + * Refer to the 
> http://xenbits.xenproject.org/docs/unstable/misc/xsplice.html 
>> 
>> I would refer to the file in the source tree, so docs/misc/xsplice.$FOO
>> which is far less likely to change.
> 
> The initial patch had exactly that -  but Jan asked me to change it to the
> URL. Shall I include both of them?

Well, I have to admit that I don't recall, and don't see why I would
have. I agree with Andrew that an in-tree reference would be
better. Maybe I said this neglecting that the (supposedly) first
patch puts the respective doc in place...

Jan


* Re: [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-12 18:05 ` [PATCH v3 04/23] elf: Add relocation types to elfstructs.h Konrad Rzeszutek Wilk
  2016-02-12 20:13   ` Andrew Cooper
@ 2016-02-15  8:34   ` Jan Beulich
  2016-02-19 21:05     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 86+ messages in thread
From: Jan Beulich @ 2016-02-15  8:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, andrew.cooper3, xen-devel, sasha.levin

>>> On 12.02.16 at 19:05, <konrad.wilk@oracle.com> wrote:
> --- a/xen/include/xen/elfstructs.h
> +++ b/xen/include/xen/elfstructs.h
> @@ -348,6 +348,14 @@ typedef struct {
>  #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>  #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>  
> +/* x86-64 relocation types. We list only the ones we implement. */

"we implement" is too vague for my taste: This comment should
have some kind of reference to xSplice.

> +#define R_X86_64_NONE		0	/* No reloc */
> +#define R_X86_64_64		1	/* Direct 64 bit  */
> +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
> +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
> +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
> +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */

Is there really a use case for the last two in the hypervisor
(which doesn't live in the top 2G of address space)? (If the
use case are constants, I suppose R_X86_64_{8,16} ought
to also be permitted.) Also, is there a reason why at least
R_X86_64_PC64 shouldn't also be supported?
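
To make the question concrete, here is a rough sketch of how these
relocation types resolve per the x86-64 psABI (S = symbol value,
A = addend, P = address of the field being relocated). The helper name
reloc_value() and its return convention are invented for illustration
and are not taken from the patch. The point it illustrates is that
R_X86_64_32 and R_X86_64_32S can only encode values that survive zero
or sign extension from 32 bits, so an address outside the bottom 4G or
top 2G respectively is rejected, while the PC-relative forms do not care
where the image lives.

```c
#include <stdint.h>

/* Relocation type numbers from the x86-64 psABI. */
#define R_X86_64_64     1  /* S + A,     64-bit field */
#define R_X86_64_PC32   2  /* S + A - P, 32-bit signed field */
#define R_X86_64_32    10  /* S + A,     32-bit field, zero extended */
#define R_X86_64_32S   11  /* S + A,     32-bit field, sign extended */
#define R_X86_64_PC64  24  /* S + A - P, 64-bit field */

/*
 * Hypothetical helper (not from the patch): compute the value to store
 * for one relocation. Returns 0 and fills *out on success, -1 if the
 * value does not fit the field or the type is not handled here.
 */
static int reloc_value(unsigned int type, uint64_t S, int64_t A, uint64_t P,
                       uint64_t *out)
{
    uint64_t val = S + (uint64_t)A;   /* wraps modulo 2^64 on purpose */
    int64_t sval;

    switch ( type )
    {
    case R_X86_64_64:
        break;

    case R_X86_64_PC64:
        val -= P;
        break;

    case R_X86_64_PC32:
        val -= P;
        sval = (int64_t)val;
        if ( sval < INT32_MIN || sval > INT32_MAX )
            return -1;                /* displacement beyond +-2G */
        break;

    case R_X86_64_32S:
        sval = (int64_t)val;
        if ( sval < INT32_MIN || sval > INT32_MAX )
            return -1;                /* not a sign-extended 32-bit value */
        break;

    case R_X86_64_32:
        if ( val > UINT32_MAX )
            return -1;                /* not a zero-extended 32-bit value */
        break;

    default:
        return -1;
    }

    *out = val;
    return 0;
}
```

With an illustrative address such as 0xffff82d080100000 (assumed here
only as an example of something outside the top 2G), both R_X86_64_32
and R_X86_64_32S fail in this sketch, whereas R_X86_64_PC32 against a
nearby P succeeds. Small constants pass the 32-bit checks, which is the
only use case I can think of.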

Jan


* Re: [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5).
  2016-02-12 18:05 ` [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5) Konrad Rzeszutek Wilk
@ 2016-02-15 12:35   ` Wei Liu
  2016-02-19 20:04     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Wei Liu @ 2016-02-15 12:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, jinsong.liu,
	xen-devel, sasha.levin

On Fri, Feb 12, 2016 at 01:05:40PM -0500, Konrad Rzeszutek Wilk wrote:
> The underlying toolstack code to do the basic
> operations when using the XEN_XSPLICE_op syscalls:
>  - upload the payload,
>  - get status of a payload,
>  - list all the payloads,
>  - apply, check, replace, and revert the payload.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
> v2: Actually set zero for the _pad entries.
> v3: Split status into state and error code.
>     Add REPLACE action.
> v4: Use timeout and utilize pads.
> v5: Update per Wei's review.
> ---
>  tools/libxc/include/xenctrl.h |  19 ++-
>  tools/libxc/xc_misc.c         | 332 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 350 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 1a5f4ec..7c666b7 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2573,9 +2573,26 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
>                             bool *cdp_enabled);
>  #endif
>  
> +int xc_xsplice_upload(xc_interface *xch,
> +                      char *name, unsigned char *payload, uint32_t size);
> +
> +int xc_xsplice_get(xc_interface *xch,
> +                   char *name,
> +                   xen_xsplice_status_t *status);
> +
> +int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
> +                    xen_xsplice_status_t *info, char *name,
> +                    uint32_t *len, unsigned int *done,
> +                    unsigned int *left);
> +
> +int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout);
> +int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout);
> +

What's the meaning of "timeout"? What can the caller expect from setting
that value? I think it needs either a better name or better documentation.

>  /* Compat shims */
>  #include "xenctrl_compat.h"
> -

Stray blank line change.

>  #endif /* XENCTRL_H */
>  
>  /*
> diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
> index 124537b..b0f7068 100644
> --- a/tools/libxc/xc_misc.c
> +++ b/tools/libxc/xc_misc.c
> @@ -693,6 +693,338 @@ int xc_hvm_inject_trap(
>      return rc;
>  }
[...]
> +/*
> + * The heart of this function is to get an array of xen_xsplice_status_t.
> + *
> + * However it is complex because it has to deal with the hypervisor
> + * returning -EAGAIN or the data that is being returned becomes stale
> + * (another hypercall might alter the list).
> + *

I don't see EAGAIN handled in the following function. Is that expected?

> + * The parameters that the function expects to contain data from
> + * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
> + * 'left' are also updated with the number of entries filled out
> + * and respectively the number of entries left to get from hypervisor.
> + *
> + * It is expected that the caller of this function will take the
> + * 'left' and use the value for 'start'. This way we have a
> + * cursor in the array. Note that the 'info','name', and 'len' will
> + * be updated at the subsequent calls.
> + *
> + * The 'max' is to be provided by the caller with the maximum
> + * number of entries that 'info', 'name', and 'len' arrays can
> + * be filled up with.
> + *
> + * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
> + * length.
> + *
> + * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
> + * structure size.
> + *
> + * Each entry in the 'len' array is expected to be of uint32_t size.
> + *
> + * The return value is zero if the hypercall completed successfully.
> + * Note that the return value is _not_ the amount of entries filled
> + * out - that is saved in 'done'.
> + *
> + * If there was an error performing the operation, the return value
> + * will contain a negative -EXX type value. The 'done' and 'left'
> + * will contain the number of entries that had been successfully
> + * retrieved (if any).
> + */
> +int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
> +                    xen_xsplice_status_t *info,
> +                    char *name, uint32_t *len,
> +                    unsigned int *done,
> +                    unsigned int *left)
> +{
> +    int rc;
> +    DECLARE_SYSCTL;
> +    /* The sizes are adjusted later - hence zero. */
> +    DECLARE_HYPERCALL_BOUNCE(info, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    DECLARE_HYPERCALL_BOUNCE(len, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +    uint32_t max_batch_sz, nr;
> +    uint32_t version = 0, retries = 0;
> +    uint32_t adjust = 0;
> +    ssize_t sz;
> +
> +    if ( !max || !info || !name || !len )
> +        return -1;
> +
> +    sysctl.cmd = XEN_SYSCTL_xsplice_op;
> +    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
> +    sysctl.u.xsplice.pad = 0;
> +    sysctl.u.xsplice.u.list.version = 0;
> +    sysctl.u.xsplice.u.list.idx = start;
> +    sysctl.u.xsplice.u.list.pad = 0;
> +
> +    max_batch_sz = max;
> +    /* Convenience value. */
> +    sz = sizeof(*name) * XEN_XSPLICE_NAME_SIZE;
> +    *done = 0;
> +    *left = 0;
> +    do {
> +        /*
> +         * The first time we go in this loop our 'max' may be bigger
> +         * than what the hypervisor is comfortable with - hence the first
> +         * couple of loops may adjust the number of entries we will
> +         * want filled (tracked by 'nr').
> +         */
> +        if ( adjust )
> +            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
> +

This is equivalent to always setting adjust to 0.

> +        nr = min(max - *done, max_batch_sz);
> +
> +        sysctl.u.xsplice.u.list.nr = nr;
> +        /* Fix the size (may vary between hypercalls). */
> +        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
> +        HYPERCALL_BOUNCE_SET_SIZE(name, nr * sz);
> +        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
> +        /* Move the pointer to proper offset into 'info'. */
> +        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
> +        (HYPERCALL_BUFFER(name))->ubuf = name + (sz * *done);
> +        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
> +        /* Allocate memory. */
> +        rc = xc_hypercall_bounce_pre(xch, info);
> +        if ( rc )
> +            break;
> +
> +        rc = xc_hypercall_bounce_pre(xch, name);
> +        if ( rc )
> +            break;
> +
> +        rc = xc_hypercall_bounce_pre(xch, len);
> +        if ( rc )
> +            break;
> +
> +        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
> +        set_xen_guest_handle(sysctl.u.xsplice.u.list.name, name);
> +        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
> +
> +        rc = do_sysctl(xch, &sysctl);
> +        /*
> +         * From here on we MUST call xc_hypercall_bounce. If rc < 0 we
> +         * end up doing it (outside the loop), so using a break is OK.
> +         */
> +        if ( rc < 0 && errno == E2BIG )
> +        {
> +            if ( max_batch_sz <= 1 )
> +                break;
> +            max_batch_sz >>= 1;
> +            adjust = 1; /* For the loop conditional to let us loop again. */
> +            /* No memory leaks! */
> +            xc_hypercall_bounce_post(xch, info);
> +            xc_hypercall_bounce_post(xch, name);
> +            xc_hypercall_bounce_post(xch, len);
> +            continue;
> +        }
> +        else if ( rc < 0 ) /* For all other errors we bail out. */
> +            break;
> +
> +        if ( !version )
> +            version = sysctl.u.xsplice.u.list.version;
> +
> +        if ( sysctl.u.xsplice.u.list.version != version )
> +        {
> +            /* We could make this configurable as parameter? */
> +            if ( retries++ > 3 )
> +            {
> +                rc = -1;
> +                errno = EBUSY;
> +                break;
> +            }
> +            *done = 0; /* Retry from scratch. */
> +            version = sysctl.u.xsplice.u.list.version;
> +            adjust = 1; /* And make sure we continue in the loop. */

Actually this "adjust" variable looks useless to me because you always
use "continue" afterwards. It won't ever get used in "while".

> +            /* No memory leaks. */
> +            xc_hypercall_bounce_post(xch, info);
> +            xc_hypercall_bounce_post(xch, name);
> +            xc_hypercall_bounce_post(xch, len);
> +            continue;
> +        }
> +
> +        /* We should never hit this, but just in case. */
> +        if ( rc > nr )
> +        {
> +            errno = EINVAL; /* Overflow! */

Use EOVERFLOW?

> +            rc = -1;
> +            break;
> +        }
> +        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
> +        /* Copy only up to 'rc' entries - we could add 'min(rc, nr)' if desired. */
> +        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
> +        HYPERCALL_BOUNCE_SET_SIZE(name, (rc * sz));
> +        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
> +        /* Bounce the data and free the bounce buffer. */
> +        xc_hypercall_bounce_post(xch, info);
> +        xc_hypercall_bounce_post(xch, name);
> +        xc_hypercall_bounce_post(xch, len);
> +        /* And update how many elements of info we have copied into. */
> +        *done += rc;
> +        /* Update idx. */
> +        sysctl.u.xsplice.u.list.idx = *done;
> +    } while ( adjust || (*done < max && *left != 0) );
> +
> +    if ( rc < 0 )
> +    {
> +        xc_hypercall_bounce_post(xch, len);
> +        xc_hypercall_bounce_post(xch, name);
> +        xc_hypercall_bounce_post(xch, info);
> +    }
> +
> +    return rc > 0 ? 0 : rc;
> +}
> +

Wei.


* Re: [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor.
  2016-02-12 18:05 ` [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
@ 2016-02-15 12:45   ` Wei Liu
  0 siblings, 0 replies; 86+ messages in thread
From: Wei Liu @ 2016-02-15 12:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, jinsong.liu,
	xen-devel, sasha.levin

On Fri, Feb 12, 2016 at 01:05:52PM -0500, Konrad Rzeszutek Wilk wrote:
> If the hypervisor is built with a build-id we will display it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v2: Include HAVE_*, use libxl_zalloc, s/rc/ret/
> v3: Retry with different size if 1020 is not enough.
> ---
>  tools/libxl/libxl.c         | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl.h         |  5 +++++
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/xl_cmdimpl.c    |  1 +
>  4 files changed, 52 insertions(+)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 2d18b8d..4efd8dd 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5256,6 +5256,38 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>      return ret;
>  }
>  
> +static const int libxl_get_build_id(libxl_ctx *ctx, libxl_version_info *info,
> +                                    xen_build_id_t *build)

Is this supposed to be a public API? If not, please make it

   libxl__get_build_id(libxl__gc *gc, ...)

.

Asking because it only gets used in libxl_get_version_info.

> +{
> +    GC_INIT(ctx);
> +    int ret;
> +
> +    ret = xc_version(ctx->xch, XENVER_build_id, build);
> +    switch ( ret ) {
> +    case -EPERM:
> +    case -ENODATA:
> +    case 0:
> +        info->build_id = libxl__strdup(NOGC, "");
> +        break;
> +    case -ENOBUFS:
> +        GC_FREE;
> +        return -ENOBUFS;

The error code should be libxl error ERROR_*.

> +    default:
> +        if (ret > 0) {
> +            unsigned int i;
> +
> +            info->build_id = libxl__zalloc(NOGC, (ret * 2) + 1);
> +
> +            for (i = 0; i < ret ; i++)
> +                snprintf(&info->build_id[i * 2], 3, "%02hhx", build->buf[i]);
> +        } else
> +            LOGEV(ERROR, ret, "getting build_id");
> +        break;
> +    }
> +    GC_FREE;
> +    return 0;

Please use goto out style error handling.

See CODING_STYLE and existing functions for reference.
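
To illustrate the shape I have in mind, here is a standalone sketch
rather than actual libxl code. It uses calloc()/free() in place of the
libxl gc, the SKETCH_ERROR_* macros are stand-ins for the real ERROR_*
values, and build_id_to_hex() is a made-up name. What matters is the
single exit path: every failure sets rc and jumps to "out", and the
cleanup there is safe on both the success and the error path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for this sketch only; real code would return libxl ERROR_*. */
#define SKETCH_ERROR_FAIL   (-3)
#define SKETCH_ERROR_NOMEM  (-5)

/*
 * Hypothetical helper: render 'len' raw build-id bytes as a hex string.
 * On success returns 0 and hands ownership of the string to *out.
 */
static int build_id_to_hex(const unsigned char *buf, int len, char **out)
{
    char *hex = NULL;
    int rc, i;

    if (!buf || len < 0) {
        rc = SKETCH_ERROR_FAIL;
        goto out;
    }

    /* Two hex digits per byte plus the terminating NUL. */
    hex = calloc((size_t)len * 2 + 1, 1);
    if (!hex) {
        rc = SKETCH_ERROR_NOMEM;
        goto out;
    }

    for (i = 0; i < len; i++)
        snprintf(&hex[i * 2], 3, "%02x", buf[i]);

    *out = hex;
    hex = NULL;   /* ownership moved to the caller */
    rc = 0;

 out:
    free(hex);    /* no-op on success, frees the buffer on error */
    return rc;
}
```

In the real function the same pattern would apply with GC_FREE at the
"out" label, so that it is called exactly once regardless of which
branch was taken.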

> +}
> +
>  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
>  {
>      GC_INIT(ctx);
> @@ -5266,8 +5298,10 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
>          xen_capabilities_info_t xen_caps;
>          xen_platform_parameters_t p_parms;
>          xen_commandline_t xen_commandline;
> +        xen_build_id_t build_id;
>      } u;
>      long xen_version;
> +    int ret;

Normally this is called rc. See CODING_STYLE.

>      libxl_version_info *info = &ctx->version_info;
>  
>      if (info->xen_version_extra != NULL)
> @@ -5300,6 +5334,17 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
>      xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
>      info->commandline = libxl__strdup(NOGC, u.xen_commandline);
>  
> +    u.build_id.len = sizeof(u) - sizeof(u.build_id);
> +    ret = libxl_get_build_id(ctx, info, &u.build_id);
> +    if ( ret == -ENOBUFS ) {

No space after "(" or before ")".

> +            xen_build_id_t *build_id;
> +
> +            build_id = libxl__zalloc(gc, info->pagesize);
> +            build_id->len = info->pagesize - sizeof(*build_id);
> +            ret = libxl_get_build_id(ctx, info, build_id);
> +            if ( ret )
> +                LOGEV(ERROR, ret, "getting build_id");
> +    }
>   out:
>      GC_FREE;
>      return info;
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index fa87f53..b713407 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -218,6 +218,11 @@
>  #define LIBXL_HAVE_SOFT_RESET 1
>  
>  /*
> + * LIBXL_HAVE_BUILD_ID means that libxl_version_info has the extra
> + * field for the hypervisor build_id.
> + */
> +#define LIBXL_HAVE_BUILD_ID 1
> +/*
>   * libxl ABI compatibility
>   *
>   * The only guarantee which libxl makes regarding ABI compatibility
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9ad7eba..92bf620 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -356,6 +356,7 @@ libxl_version_info = Struct("version_info", [
>      ("virt_start",        uint64),
>      ("pagesize",          integer),
>      ("commandline",       string),
> +    ("build_id",          string),
>      ], dir=DIR_OUT)
>  
>  libxl_domain_create_info = Struct("domain_create_info",[
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index d07ccb2..9bdc42a 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -5552,6 +5552,7 @@ static void output_xeninfo(void)
>      printf("cc_compile_by          : %s\n", info->compile_by);
>      printf("cc_compile_domain      : %s\n", info->compile_domain);
>      printf("cc_compile_date        : %s\n", info->compile_date);
> +    printf("build_id               : %s\n", info->build_id);
>  
>      return;
>  }
> -- 
> 2.1.0
> 


* Re: [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4)
  2016-02-12 18:05 ` [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4) Konrad Rzeszutek Wilk
@ 2016-02-15 12:59   ` Wei Liu
  2016-02-19 20:46     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Wei Liu @ 2016-02-15 12:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, jinsong.liu,
	xen-devel, sasha.levin

On Fri, Feb 12, 2016 at 01:05:41PM -0500, Konrad Rzeszutek Wilk wrote:
[...]
> diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
> new file mode 100644
> index 0000000..13f762f
> --- /dev/null
> +++ b/tools/misc/xen-xsplice.c

One gripe I have with this program is that many of its functions mix
direct return and goto style error handling.

> @@ -0,0 +1,470 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + */
> +
[...]
> +static const char *state2str(long state)
> +{
> +#define STATE(x) [XSPLICE_STATE_##x] = #x
> +    static const char *const names[] = {
> +            STATE(LOADED),
> +            STATE(CHECKED),
> +            STATE(APPLIED),
> +    };
> +#undef STATE
> +    if (state >= ARRAY_SIZE(names))
> +        return "unknown";
> +
> +    if (state < 0)
> +        return "-EXX";
> +

This doesn't look very useful.

Wei.


* Re: [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2)
  2016-02-12 18:05 ` [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2) Konrad Rzeszutek Wilk
@ 2016-02-16 11:31   ` Ross Lagerwall
  0 siblings, 0 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-16 11:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, jinsong.liu,
	Ian Jackson, mpohlack, Stefano Stabellini, Jan Beulich,
	andrew.cooper3, xen-devel, Keir Fraser, sasha.levin

On 02/12/2016 06:05 PM, Konrad Rzeszutek Wilk wrote:
> This change demonstrates how to generate an xSplice ELF payload.
>
> The idea here is that we want to patch in the hypervisor
> the 'xen_version_extra' function with an function that will
> return 'Hello World'. The 'xl info | grep extraversion'
> will reflect the new value after the patching.
>
> To generate this ELF payload file we need:
>   - C code of the new code (xen_hello_world_func.c).
>   - C code generating the .xsplice.funcs structure
>     (xen_hello_world.c)
>   - The address of the old code (xen_extra_version). We
>     retrieve it by  using 'nm --defined' on xen-syms.
>   - The size of the new and old code for which we use
>     nm --defined -S on our code and xen-syms respectively.
>
snip
> diff --git a/tools/misc/xsplice.lds b/tools/misc/xsplice.lds
> new file mode 100644
> index 0000000..f52eb8c
> --- /dev/null
> +++ b/tools/misc/xsplice.lds
> @@ -0,0 +1,11 @@
> +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
> +OUTPUT_ARCH(i386:x86-64)
> +ENTRY(xsplice_hello_world)
> +SECTIONS
> +{
> +    /* The hypervisor expects ".xsplice.func", so change
> +     * the ".data.xsplice_hello_world" to it. */
> +
> +    .xsplice.funcs : { *(*.xsplice_hello_world) }
> +
> +}

I think this file can be dropped now, nothing uses it as far as I can see.

-- 
Ross Lagerwall


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-12 18:05 ` [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5) Konrad Rzeszutek Wilk
@ 2016-02-16 19:11   ` Andrew Cooper
  2016-02-17  8:58     ` Ross Lagerwall
                       ` (3 more replies)
  2016-02-22 15:00   ` Ross Lagerwall
  1 sibling, 4 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 19:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Stefano Stabellini, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9d43f7b..b5995b9 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -36,6 +36,7 @@
>  #include <xen/cpu.h>
>  #include <xen/wait.h>
>  #include <xen/guest_access.h>
> +#include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  #include <public/hvm/hvm_vcpu.h>
>  #include <asm/regs.h>
> @@ -121,6 +122,7 @@ static void idle_loop(void)
>          (*pm_idle)();
>          do_tasklet();
>          do_softirq();
> +        do_xsplice(); /* Must be last. */

The name "do_xsplice()" is slightly misleading (although it is in equal
company here).  check_for_xsplice_work() would be more accurate.

> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
> index 814dd52..ae35e91 100644
> --- a/xen/arch/x86/xsplice.c
> +++ b/xen/arch/x86/xsplice.c
> @@ -10,6 +10,25 @@
>                              __func__,__LINE__, x); return x; }
>  #endif
>  
> +#define PATCH_INSN_SIZE 5

Somewhere you should have a BUILD_BUG_ON() confirming that
PATCH_INSN_SIZE fits within the undo array.

Having said that, I think all of xsplice_patch_func should be
arch-specific rather than generic.
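
For the BUILD_BUG_ON() point, a rough standalone equivalent using C11
static_assert is below. The struct layout and the size of 'undo' are
assumptions for illustration only; the real struct xsplice_patch_func is
defined elsewhere in the series and may differ:

```c
#include <assert.h>
#include <stdint.h>

#define PATCH_INSN_SIZE 5

/*
 * Hypothetical layout: only the 'undo' member matters here. The real
 * struct has more fields and may size the array differently.
 */
struct patch_func_sketch {
    uint64_t new_addr;
    uint64_t old_addr;
    uint8_t undo[8];
};

/* Compile-time check, in the spirit of Xen's BUILD_BUG_ON(). */
static_assert(PATCH_INSN_SIZE <= sizeof(((struct patch_func_sketch *)0)->undo),
              "PATCH_INSN_SIZE must fit in the undo array");
```

If the array were ever shrunk below five bytes, the build would fail
instead of the memcpy() into 'undo' silently overrunning it.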

> +
> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> +    uint32_t val;
> +    uint8_t *old_ptr;
> +
> +    old_ptr = (uint8_t *)func->old_addr;
> +    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);

At least a newline here please.

> +    *old_ptr++ = 0xe9; /* Relative jump */
> +    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;

E9 takes a rel32 parameter, which is signed.

I think you need to explicitly cast to intptr_t and use signed
arithmetic throughout this calculation to correctly calculate a
backwards jump.

I think there should also be some sanity checks that both old_addr and
new_addr are in the Xen 1G virtual region.

> +    memcpy(old_ptr, &val, sizeof val);
> +}
> +
> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
> +{
> +    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);

_p() is common shorthand in Xen for a cast to (void *)

> +}
> +
>  int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data)
>  {
>  
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index fbd6129..b854c0a 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -3,6 +3,7 @@
>   *
>   */
>  
> +#include <xen/cpu.h>
>  #include <xen/guest_access.h>
>  #include <xen/keyhandler.h>
>  #include <xen/lib.h>
> @@ -10,16 +11,25 @@
>  #include <xen/mm.h>
>  #include <xen/sched.h>
>  #include <xen/smp.h>
> +#include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/wait.h>
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  
>  #include <asm/event.h>
> +#include <asm/nmi.h>
>  #include <public/sysctl.h>
>  
> +/*
> + * Protects against payload_list operations and also allows only one
> + * caller in schedule_work.
> + */
>  static DEFINE_SPINLOCK(payload_lock);
>  static LIST_HEAD(payload_list);
>  
> +static LIST_HEAD(applied_list);
> +
>  static unsigned int payload_cnt;
>  static unsigned int payload_version = 1;
>  
> @@ -29,6 +39,9 @@ struct payload {
>      struct list_head list;               /* Linked to 'payload_list'. */
>      void *payload_address;               /* Virtual address mapped. */
>      size_t payload_pages;                /* Nr of the pages. */
> +    struct list_head applied_list;       /* Linked to 'applied_list'. */
> +    struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
> +    unsigned int nfuncs;                 /* Nr of functions to patch. */
>  
>      char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
>  };
> @@ -36,6 +49,24 @@ struct payload {
>  static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len);
>  static void free_payload_data(struct payload *payload);
>  
> +/* Defines an outstanding patching action. */
> +struct xsplice_work
> +{
> +    atomic_t semaphore;          /* Used for rendezvous. First to grab it will
> +                                    do the patching. */
> +    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled. */
> +    uint32_t timeout;                    /* Timeout to do the operation. */
> +    struct payload *data;        /* The payload on which to act. */
> +    volatile bool_t do_work;     /* Signals work to do. */
> +    volatile bool_t ready;       /* Signals all CPUs synchronized. */
> +    uint32_t cmd;                /* Action request: XSPLICE_ACTION_* */
> +};
> +
> +/* There can be only one outstanding patching action. */
> +static struct xsplice_work xsplice_work;

This is a scalability issue, specifically that every cpu on the
return-to-guest path polls the bytes making up do_work.  See below for
my suggestion to fix this.

> +
> +static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout);
> +
>  static const char *state2str(int32_t state)
>  {
>  #define STATE(x) [XSPLICE_STATE_##x] = #x
> @@ -61,14 +92,23 @@ static const char *state2str(int32_t state)
>  static void xsplice_printall(unsigned char key)
>  {
>      struct payload *data;
> +    unsigned int i;
>  
>      spin_lock(&payload_lock);
>  
>      list_for_each_entry ( data, &payload_list, list )
> -        printk(" name=%s state=%s(%d) %p using %zu pages.\n", data->name,
> +    {
> +        printk(" name=%s state=%s(%d) %p using %zu pages:\n", data->name,
>                 state2str(data->state), data->state, data->payload_address,
>                 data->payload_pages);
>  
> +        for ( i = 0; i < data->nfuncs; i++ )
> +        {
> +            struct xsplice_patch_func *f = &(data->funcs[i]);
> +            printk("    %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
> +                   f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
> +        }

For a large patch, this could be thousands of entries.  You need to
periodically process pending softirqs to avoid a watchdog timeout.
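
A small sketch of the batching idea, under stated assumptions:
POLL_INTERVAL is a made-up value, the stub merely counts calls where the
real code would call process_pending_softirqs(), and how this interacts
with holding payload_lock is left out entirely:

```c
#include <assert.h>

#define POLL_INTERVAL 64u  /* Hypothetical batch size between polls. */
static_assert(POLL_INTERVAL > 0, "a zero interval would divide by zero below");

/* Stand-in for process_pending_softirqs(); here it only counts calls. */
static unsigned int softirq_polls;
static void process_pending_softirqs_stub(void)
{
    softirq_polls++;
}

/* Walk nfuncs entries, yielding to pending work every POLL_INTERVAL entries. */
static unsigned int print_funcs(unsigned int nfuncs)
{
    unsigned int i, printed = 0;

    for (i = 0; i < nfuncs; i++) {
        /* The real code would printk() the entry here. */
        printed++;

        if (((i + 1) % POLL_INTERVAL) == 0)
            process_pending_softirqs_stub();
    }

    return printed;
}
```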

>  static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
>  {
>      struct xsplice_elf elf;
> @@ -605,7 +695,14 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
>      if ( rc )
>          goto err_payload;
>  
> -    /* Free our temporary data structure. */
> +    rc = check_special_sections(payload, &elf);
> +    if ( rc )
> +        goto err_payload;
> +
> +    rc = find_special_sections(payload, &elf);
> +    if ( rc )
> +        goto err_payload;
> +
>      xsplice_elf_free(&elf);
>      return 0;
>  
> @@ -617,6 +714,253 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
>      return rc;
>  }
>  
> +
> +/*
> + * The following functions get the CPUs into an appropriate state and
> + * apply (or revert) each of the module's functions.
> + */
> +
> +/*
> + * This function is executed having all other CPUs with no stack (we may
> + * have cpu_idle on it) and IRQs disabled. We guard against NMI by temporarily
> + * installing our NOP NMI handler.
> + */
> +static int apply_payload(struct payload *data)
> +{
> +    unsigned int i;
> +
> +    printk(XENLOG_DEBUG "%s: Applying %u functions.\n", data->name,
> +           data->nfuncs);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_apply_jmp(data->funcs + i);

In cases such as this, it is better to use data->funcs[i], as the
compiler can more easily perform pointer alias analysis.

> +
> +    list_add_tail(&data->applied_list, &applied_list);
> +
> +    return 0;
> +}
> +
> +/*
> + * This function is executed having all other CPUs with no stack (we may
> + * have cpu_idle on it) and IRQs disabled.
> + */
> +static int revert_payload(struct payload *data)
> +{
> +    unsigned int i;
> +
> +    printk(XENLOG_DEBUG "%s: Reverting.\n", data->name);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_revert_jmp(data->funcs + i);
> +
> +    list_del(&data->applied_list);
> +
> +    return 0;
> +}
> +
> +/* Must be holding the payload_lock. */
> +static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
> +{
> +    /* Fail if an operation is already scheduled. */
> +    if ( xsplice_work.do_work )
> +        return -EBUSY;
> +
> +    xsplice_work.cmd = cmd;
> +    xsplice_work.data = data;
> +    xsplice_work.timeout = timeout ? timeout : MILLISECS(30);

Can shorten to "timeout ?: MILLISECS(30);"
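
For reference, "a ?: b" is the GNU C conditional with an omitted middle
operand, so it needs GCC or Clang. The sketch below puts it next to the
ISO C spelling; MILLISECS() here is a local stand-in written for the
example, not Xen's definition:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: nanoseconds in 'ms' milliseconds. */
#define MILLISECS(ms) ((int64_t)(ms) * 1000000LL)

/* The default has to fit in the uint32_t timeout field. */
static_assert(MILLISECS(30) <= UINT32_MAX, "default timeout must fit in 32 bits");

static uint32_t pick_timeout(uint32_t timeout)
{
    /* GNU extension: same as "timeout ? timeout : ...", evaluated once. */
    return timeout ?: (uint32_t)MILLISECS(30);
}

static uint32_t pick_timeout_iso(uint32_t timeout)
{
    return timeout ? timeout : (uint32_t)MILLISECS(30);
}
```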

> +
> +    printk(XENLOG_DEBUG "%s: timeout is %"PRI_stime"ms\n", data->name,
> +           xsplice_work.timeout / MILLISECS(1));
> +
> +    atomic_set(&xsplice_work.semaphore, -1);
> +    atomic_set(&xsplice_work.irq_semaphore, -1);
> +
> +    xsplice_work.ready = 0;
> +    smp_wmb();
> +    xsplice_work.do_work = 1;
> +    smp_wmb();
> +
> +    return 0;
> +}
> +
> +/*
> + * Note that because of this NOP code the do_nmi is not safely patchable.
> + * Also if we do receive 'real' NMIs we have lost them.

The MCE path needs consideration as well.  Unlike the NMI path however,
that one cannot be ignored.

In both cases, it might be best to see about raising a tasklet or
softirq to pick up some deferred work.

> + */
> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> +{
> +    return 1;
> +}
> +
> +static void reschedule_fn(void *unused)
> +{
> +    smp_mb(); /* Synchronize with setting do_work */
> +    raise_softirq(SCHEDULE_SOFTIRQ);

As you have to IPI each processor to raise a schedule softirq, you can
set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
need for the return-to-guest path to poll one single byte.
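
Roughly, the per-cpu variant could look like the sketch below. The names
and the fixed-size array are assumptions; in Xen the flag would be a
DEFINE_PER_CPU() variable so that each CPU's copy lives in its own
per-cpu area, which a plain array does not give you:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 8  /* Hypothetical CPU count for the sketch. */

/* One flag per CPU; a plain array is only a stand-in for per-cpu data. */
static volatile bool rendezvous_pending[NR_CPUS_SKETCH];

/* Runs on the target CPU, from the IPI handler. */
static void mark_rendezvous(unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);
    rendezvous_pending[cpu] = true;
}

/*
 * Return-to-guest fast path: each CPU tests and clears only its own flag
 * instead of polling one global do_work byte.
 */
static bool check_for_xsplice_work(unsigned int cpu)
{
    assert(cpu < NR_CPUS_SKETCH);

    if (!rendezvous_pending[cpu])
        return false;

    rendezvous_pending[cpu] = false;
    return true;
}
```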

> +}
> +
> +static int xsplice_do_wait(atomic_t *counter, s_time_t timeout,
> +                           unsigned int total_cpus, const char *s)
> +{
> +    int rc = 0;
> +
> +    while ( atomic_read(counter) != total_cpus && NOW() < timeout )
> +        cpu_relax();
> +
> +    /* Log & abort. */
> +    if ( atomic_read(counter) != total_cpus )
> +    {
> +        printk(XENLOG_DEBUG "%s: %s %u/%u\n", xsplice_work.data->name,
> +               s, atomic_read(counter), total_cpus);
> +        rc = -EBUSY;
> +        xsplice_work.data->rc = rc;
> +        xsplice_work.do_work = 0;
> +        smp_wmb();
> +        return rc;
> +    }
> +    return rc;
> +}
> +
> +static void xsplice_do_single(unsigned int total_cpus)
> +{
> +    nmi_callback_t saved_nmi_callback;
> +    struct payload *data, *tmp;
> +    s_time_t timeout;
> +    int rc;
> +
> +    data = xsplice_work.data;
> +    timeout = xsplice_work.timeout + NOW();
> +    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
> +                         "Timed out on CPU semaphore") )
> +        return;
> +
> +    /* "Mask" NMIs. */
> +    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> +
> +    /* All CPUs are waiting, now signal to disable IRQs. */
> +    xsplice_work.ready = 1;
> +    smp_wmb();
> +
> +    atomic_inc(&xsplice_work.irq_semaphore);
> +    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
> +                         "Timed out on IRQ semaphore.") )

This path "leaks" the NMI mask.  It would be better to use "goto out;"
handling in the function to make it clearer that the error paths are
taken care of.
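
As a sketch of that shape (everything here is a stand-in: the callback
type is simplified, set_nmi_callback_stub() only models the swap, and
'irq_wait_rc' represents the result of waiting on the IRQ semaphore):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified; the real callback takes the register state and CPU number. */
typedef int (*nmi_callback_t)(void);

/* Stand-in for set_nmi_callback(): swaps the handler, returns the old one. */
static nmi_callback_t current_nmi_callback;
static nmi_callback_t set_nmi_callback_stub(nmi_callback_t cb)
{
    nmi_callback_t old = current_nmi_callback;

    current_nmi_callback = cb;
    return old;
}

static int mask_nmi(void)
{
    return 1;
}

static int patch_with_nmi_masked(int irq_wait_rc)
{
    nmi_callback_t saved = set_nmi_callback_stub(mask_nmi);
    int rc;

    rc = irq_wait_rc;
    if (rc)
        goto out;        /* Timed out: skip patching but still unmask. */

    /* Patching only ever runs with the masking handler installed. */
    assert(current_nmi_callback == mask_nmi);
    /* ... the apply/revert/replace work would go here ... */
    rc = 0;

 out:
    set_nmi_callback_stub(saved);
    return rc;
}
```

With a single exit label the saved callback is restored on both the
timeout path and the normal path.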

> +        return;
> +
> +    local_irq_disable();

local_irq_save().  Easier to restore on an error path.

> +    /* Now this function should be the only one on any stack.
> +     * No need to lock the payload list or applied list. */
> +    switch ( xsplice_work.cmd )
> +    {
> +    case XSPLICE_ACTION_APPLY:
> +        rc = apply_payload(data);
> +        if ( rc == 0 )
> +            data->state = XSPLICE_STATE_APPLIED;
> +        break;
> +    case XSPLICE_ACTION_REVERT:
> +        rc = revert_payload(data);
> +        if ( rc == 0 )
> +            data->state = XSPLICE_STATE_CHECKED;
> +        break;
> +    case XSPLICE_ACTION_REPLACE:
> +        list_for_each_entry_safe_reverse ( data, tmp, &applied_list, list )
> +        {
> +            data->rc = revert_payload(data);
> +            if ( data->rc == 0 )
> +                data->state = XSPLICE_STATE_CHECKED;
> +            else
> +            {
> +                rc = -EINVAL;
> +                break;
> +            }
> +        }
> +        if ( rc != -EINVAL )
> +        {
> +            rc = apply_payload(xsplice_work.data);
> +            if ( rc == 0 )
> +                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> +        }
> +        break;
> +    default:
> +        rc = -EINVAL;
> +        break;
> +    }

Once code modification is complete, you must execute a serialising
instruction such as cpuid, to flush the pipeline, on all cpus.  (Refer
to Intel SDM 3 8.1.4 "Handling Self- and Cross-Modifying Code")

> +
> +    xsplice_work.data->rc = rc;
> +
> +    local_irq_enable();
> +    set_nmi_callback(saved_nmi_callback);
> +
> +    xsplice_work.do_work = 0;
> +    smp_wmb(); /* Synchronize with waiting CPUs. */
> +}
> +
> +/*
> + * The main function which manages the work of quiescing the system and
> + * patching code.
> + */
> +void do_xsplice(void)
> +{
> +    struct payload *p = xsplice_work.data;
> +    unsigned int cpu = smp_processor_id();
> +
> +    /* Fast path: no work to do. */
> +    if ( likely(!xsplice_work.do_work) )
> +        return;
> +    ASSERT(local_irq_is_enabled());
> +
> +    /* Set at -1, so will go up to num_online_cpus - 1 */
> +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> +    {
> +        unsigned int total_cpus;
> +
> +        if ( !get_cpu_maps() )
> +        {
> +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
> +                   p->name, cpu);
> +            xsplice_work.data->rc = -EBUSY;
> +            xsplice_work.do_work = 0;
> +            return;

This error path leaves a ref in the semaphore.

> +        }
> +
> +        barrier(); /* MUST do it after get_cpu_maps. */
> +        total_cpus = num_online_cpus() - 1;
> +
> +        if ( total_cpus )
> +        {
> +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
> +                   cpu, total_cpus);
> +            smp_call_function(reschedule_fn, NULL, 0);
> +        }
> +        (void)xsplice_do_single(total_cpus);
> +
> +        ASSERT(local_irq_is_enabled());
> +
> +        put_cpu_maps();
> +
> +        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
> +    }
> +    else
> +    {
> +        /* Wait for all CPUs to rendezvous. */
> +        while ( xsplice_work.do_work && !xsplice_work.ready )
> +        {
> +            cpu_relax();
> +            smp_rmb();
> +        }
> +

What happens here if the rendezvous initiator times out?  Looks like we
will spin forever waiting for do_work which will never drop back to 0.

> +        /* Disable IRQs and signal. */
> +        local_irq_disable();
> +        atomic_inc(&xsplice_work.irq_semaphore);
> +
> +        /* Wait for patching to complete. */
> +        while ( xsplice_work.do_work )
> +        {
> +            cpu_relax();
> +            smp_rmb();
> +        }
> +        local_irq_enable();

Splitting the modification of do_work and ready across multiple
functions makes it particularly hard to reason about the correctness of
the rendezvous.  It would be better to have a xsplice_rendezvous()
function whose purpose was to negotiate the rendezvous only, using local
static state.  The action can then be just the switch() from
xsplice_do_single().

> +    }
> +}
> +
> diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
> index a60587e..82aff35 100644
> --- a/xen/include/asm-arm/nmi.h
> +++ b/xen/include/asm-arm/nmi.h
> @@ -4,6 +4,19 @@
>  #define register_guest_nmi_callback(a)  (-ENOSYS)
>  #define unregister_guest_nmi_callback() (-ENOSYS)
>  
> +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
> +
> +/**
> + * set_nmi_callback
> + *
> + * Set a handler for an NMI. Only one handler may be
> + * set. Return the old nmi callback handler.
> + */
> +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
> +{
> +    return NULL;
> +}
> +

This addition suggests that there should probably be an
arch_xsplice_prepare_rendezvous() and arch_xsplice_finish_rendezvous().

~Andrew


* Re: [PATCH v3 09/23] xsplice: Add support for bug frames. (v4)
  2016-02-12 18:05 ` [PATCH v3 09/23] xsplice: Add support for bug frames. (v4) Konrad Rzeszutek Wilk
@ 2016-02-16 19:35   ` Andrew Cooper
  2016-02-24 16:22     ` Konrad Rzeszutek Wilk
  2016-02-24 16:26     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 19:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Keir Fraser,
	Jan Beulich, Ian Campbell, Stefano Stabellini, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/common/symbols.c b/xen/common/symbols.c
> index a59c59d..bf5623f 100644
> --- a/xen/common/symbols.c
> +++ b/xen/common/symbols.c
> @@ -17,6 +17,7 @@
>  #include <xen/lib.h>
>  #include <xen/string.h>
>  #include <xen/spinlock.h>
> +#include <xen/xsplice.h>
>  #include <public/platform.h>
>  #include <xen/guest_access.h>
>  
> @@ -101,6 +102,12 @@ bool_t is_active_kernel_text(unsigned long addr)
>              (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
>  }
>  
> +bool_t is_active_text(unsigned long addr)
> +{
> +    return is_active_kernel_text(addr) ||
> +           is_active_module_text(addr);
> +}

This would be better as a static inline in a header file, to avoid a
call into a separate translation unit.

> +
>  const char *symbols_lookup(unsigned long addr,
>                             unsigned long *symbolsize,
>                             unsigned long *offset,
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index b854c0a..7f71ac6 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -42,7 +42,10 @@ struct payload {
>      struct list_head applied_list;       /* Linked to 'applied_list'. */
>      struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
>      unsigned int nfuncs;                 /* Nr of functions to patch. */
> -
> +    size_t core_size;                    /* Everything else - .data,.rodata, etc. */
> +    size_t core_text_size;               /* Only .text size. */

These two lines should be reversed, so the comments make sense.

> +    struct bug_frame *start_bug_frames[BUGFRAME_NR]; /* .bug.frame patching. */
> +    struct bug_frame *stop_bug_frames[BUGFRAME_NR];
>      char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
>  };
>  
> @@ -561,6 +564,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
>               (SHF_ALLOC|SHF_EXECINSTR) )
>              calc_section(&elf->sec[i], &size);
>      }
> +    payload->core_text_size = size;
>  
>      /* Compute rw data */
>      for ( i = 0; i < elf->hdr->e_shnum; i++ )
> @@ -579,6 +583,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
>               !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
>              calc_section(&elf->sec[i], &size);
>      }
> +    payload->core_size = size;
>  
>      buf = alloc_payload(size);
>      if ( !buf ) {
> @@ -663,6 +668,24 @@ static int find_special_sections(struct payload *payload,
>              if ( f->pad[j] )
>                  return -EINVAL;
>      }
> +
> +    /* Optional sections. */
> +    for ( i = 0; i < BUGFRAME_NR; i++ )
> +    {
> +        char str[14];
> +
> +        snprintf(str, sizeof str, ".bug_frames.%d", i);
> +        sec = xsplice_elf_sec_by_name(elf, str);
> +        if ( !sec )
> +            continue;
> +
> +        if ( ( !sec->sec->sh_size ) ||
> +             ( sec->sec->sh_size % sizeof (struct bug_frame) ) )
> +            return -EINVAL;

Too many spaces.  (not a common style nit!)

> +
> +        payload->start_bug_frames[i] = (struct bug_frame *)sec->load_addr;
> +        payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
> +    }
>      return 0;
>  }
>  
> @@ -961,6 +984,72 @@ void do_xsplice(void)
>      }
>  }
>  
> +
> +/*
> + * Functions for handling special sections.
> + */
> +struct bug_frame *xsplice_find_bug(const char *eip, int *id)
> +{
> +    struct payload *data;
> +    struct bug_frame *bug;
> +    int i;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        for (i = 0; i < BUGFRAME_NR; i++) {

braces on new lines.

> +            if (!data->start_bug_frames[i])
> +                continue;

Newline, and can you borrow some spaces from above.

> +            if ( !((void *)eip >= data->payload_address &&
> +                   (void *)eip < (data->payload_address + data->core_text_size)))
> +                continue;
> +
> +            for ( bug = data->start_bug_frames[i]; bug != data->stop_bug_frames[i]; ++bug ) {
> +                if ( bug_loc(bug) == eip )
> +                {
> +                    *id = i;
> +                    return bug;
> +                }
> +            }
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +bool_t is_module(const void *ptr)

I would recommend naming this "is_patch", to avoid the suggestion that
Xen supports modules.

> +{
> +    struct payload *data;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( ptr >= data->payload_address &&
> +             ptr < (data->payload_address + data->core_size))
> +            return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +bool_t is_active_module_text(unsigned long addr)
> +{
> +    struct payload *data;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( (void *)addr >= data->payload_address &&
> +             (void *)addr < (data->payload_address + data->core_text_size))
> +            return 1;
> +    }
> +
> +    return 0;
> +}
> +
>  static int __init xsplice_init(void)
>  {
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index d6db1c2..c257b3a 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -26,6 +26,9 @@ struct xsplice_patch_func {
>  #ifdef CONFIG_XSPLICE
>  int xsplice_control(struct xen_sysctl_xsplice_op *);
>  void do_xsplice(void);
> +struct bug_frame *xsplice_find_bug(const char *eip, int *id);
> +bool_t is_module(const void *addr);
> +bool_t is_active_module_text(unsigned long addr);
>  
>  /* Arch hooks */
>  int xsplice_verify_elf(struct xsplice_elf *elf, uint8_t *data);
> @@ -44,5 +47,17 @@ static inline int xsplice_control(struct xen_sysctl_xsplice_op *op)
>      return -ENOSYS;
>  }
>  static inline void do_xsplice(void) { };
> +static inline struct bug_frame *xsplice_find_bug(const char *eip, int *id)
> +{
> +	return NULL;
> +}
> +static inline bool_t is_module(const void *addr)
> +{
> +	return 0;
> +}
> +static inline bool_t is_active_module_text(unsigned long addr)
> +{
> +	return 0;
> +}

There is a neater way of doing this, which doesn't involve having "if (
regular ) else if ( xsplice )" logic chains through the code.

Given a

struct virtual_region
{
    struct list_head list;
    unsigned long start, size;

    struct bug_frame *foo;
    struct exception_table_entry *bar;
};

The init code can construct one for the base hypervisor, and xsplice can
add or remove entries from the list.  Then, the trap routines search the
virtual region list for [start, size) and follow the appropriate pointers.
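
A rough standalone sketch of the lookup side is below. It uses a singly
linked list and an 'owner' string in place of list_head and the real
bug frame / exception table pointers, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the struct virtual_region suggested above. */
struct region_sketch {
    struct region_sketch *next;
    unsigned long start, size;
    const char *owner;          /* Placeholder for the per-region tables. */
};

static struct region_sketch *region_list;

/* Called once at init for the base hypervisor, and for each applied patch. */
static void region_add(struct region_sketch *r)
{
    assert(r != NULL && r->size != 0);
    r->next = region_list;
    region_list = r;
}

/* Called when a patch is reverted. */
static void region_del(struct region_sketch *r)
{
    struct region_sketch **pp;

    for (pp = &region_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == r) {
            *pp = r->next;
            return;
        }
    }
}

/* Return the region covering addr, i.e. start <= addr < start + size. */
static const struct region_sketch *region_find(unsigned long addr)
{
    const struct region_sketch *r;

    for (r = region_list; r != NULL; r = r->next) {
        /* Written as a subtraction so start + size cannot overflow. */
        if (addr >= r->start && addr - r->start < r->size)
            return r;
    }

    return NULL;
}
```

The trap handlers would then call one lookup and use whatever tables the
returned region points at, without caring whether it is Xen or a patch.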

~Andrew


* Re: [PATCH v3 11/23] xsplice: Add support for alternatives
  2016-02-12 18:05 ` [PATCH v3 11/23] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
@ 2016-02-16 19:41   ` Andrew Cooper
  0 siblings, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 19:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, jinsong.liu, Keir Fraser, Jan Beulich, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index d863a99..65b1f11 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -695,7 +695,7 @@ static int find_special_sections(struct payload *payload,
>      if ( sec )
>      {
>          if ( ( !sec->sec->sh_size ) ||
> -             ( sec->sec->sh_size % sizeof *sec->load_addr ) )
> +             ( sec->sec->sh_size % sizeof (struct exception_table_entry) ) )

This hunk looks like it wants to be in the previous patch.

>              return -EINVAL;
>  
>          payload->start_ex_table = (struct exception_table_entry *)sec->load_addr;
> @@ -703,6 +703,14 @@ static int find_special_sections(struct payload *payload,
>  
>          sort_exception_table(payload->start_ex_table, payload->stop_ex_table);
>      }
> +    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
> +    if ( sec )
> +    {
> +        local_irq_disable();
> +        apply_alternatives((struct alt_instr *)sec->load_addr,
> +                           (struct alt_instr *)(sec->load_addr + sec->sec->sh_size));
> +        local_irq_enable();
> +    }

None of that code is active, and it can't be made active at this point. 
Interrupts absolutely shouldn't be disabled here.  Instead, the
assertion in apply_alternatives() should be modified as you are adding a
new valid usecase.

Also, modifications like this in a function named
find_special_sections() seem wrong.  It looks like the function would be
better named prepare_payload() or similar.

~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-12 18:05 ` [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10) Konrad Rzeszutek Wilk
  2016-02-12 21:52   ` Daniel De Graaf
@ 2016-02-16 20:09   ` Andrew Cooper
  2016-02-16 20:22     ` Konrad Rzeszutek Wilk
  2016-02-24 18:52     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 20:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Daniel De Graaf,
	Ian Jackson, Stefano Stabellini, Ian Campbell, Wei Liu,
	Stefano Stabellini, Keir Fraser, Jan Beulich, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:

Building the hypervisor with buildid and making it available via
hypercall really should be split into two different patches, especially
given the complexity in each.

> The mechanism to get this is via the XENVER hypercall and
> we add a new sub-command to retrieve the binary build-id
> called XENVER_build_id. The sub-hypercall parameter
> allows an arbitrary size (the buffer and len is provided
> to the hypervisor). A NULL parameter will probe the hypervisor
> for the length of the build-id.
>
> One can also retrieve the value of the build-id by doing
> 'readelf -n xen-syms'.
>
> For EFI builds we re-use the same build-id that the xen-syms
> was built with.
>
> The version of ld that first implemented --build-id is v2.18.
> Hence we check for that or later version - if older version
> found we do not build the hypervisor with the build-id
> (and the return code is -ENODATA for that case).
>
> For x86 we have two binaries - the xen-syms and the xen - an
> smaller version with lots of sections removed. To make it possible
> for readelf -n xen we also modify mkelf32 and xen.lds.S to include
> the PT_NOTE ELF section.
>
> The EFI binary is more complicated. Having any non-recognizable
> sections (.note, .data.note, etc) causes the boot to hang.
> Moving the .note in the .data section makes it work. It is also
> worth noting that the PE/COFF does not have any "comment"
> sections to the author.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> v1: Rebase it on Martin's initial patch
> v2: Move it to XENVER hypercall
> v3: Fix EFI building (Ross's fix)
> v4: Don't use the third argument for length.
> v5: Use new structure for XENVER_build_id with variable buf.
> v6: Include Ross's fix.
> v7: Include detection of bin-utils for build-id support, add
>     probing for size, and return -EPERM for XSM denied calls.
> v8: Build xen_build_id under ARM, required adding ELFSIZE in proper file.
> v9: Rebase on top XSM version class.
> v10: Include the build-id .note in the xen ELF binary.
>      s/build_id/build_id_linker/
>     For EFI build, moved the --build-id values in .data section
> ---
>  Config.mk                                    |  11 +++
>  tools/flask/policy/policy/modules/xen/xen.te |   4 +-
>  tools/libxc/xc_private.c                     |   7 ++
>  tools/libxc/xc_private.h                     |  10 ++
>  xen/arch/arm/Makefile                        |   2 +-
>  xen/arch/arm/xen.lds.S                       |  13 +++
>  xen/arch/x86/Makefile                        |  31 +++++-
>  xen/arch/x86/boot/mkelf32.c                  | 137 +++++++++++++++++++++++----
>  xen/arch/x86/xen.lds.S                       |  23 +++++
>  xen/common/kernel.c                          |  36 +++++++
>  xen/common/version.c                         |  48 ++++++++++
>  xen/include/public/version.h                 |  16 +++-
>  xen/include/xen/version.h                    |   1 +
>  xen/xsm/flask/hooks.c                        |   3 +
>  xen/xsm/flask/policy/access_vectors          |   2 +
>  15 files changed, 319 insertions(+), 25 deletions(-)
>
> diff --git a/Config.mk b/Config.mk
> index 429e460..61186e2 100644
> --- a/Config.mk
> +++ b/Config.mk
> @@ -126,6 +126,17 @@ endef
>  check-$(gcc) = $(call cc-ver-check,CC,0x040100,"Xen requires at least gcc-4.1")
>  $(eval $(check-y))
>  
> +ld-ver-build-id = $(shell $(1) --build-id 2>&1 | \
> +					grep -q unrecognized && echo n || echo y)
> +
> +# binutils 2.18 implement build-id.
> +ifeq ($(call ld-ver-build-id,$(LD)),n)
> +build_id_linker :=
> +else
> +CFLAGS += -DBUILD_ID
> +build_id_linker := --build-id=sha1
> +endif
> +
>  # as-insn: Check whether assembler supports an instruction.
>  # Usage: cflags-y += $(call as-insn "insn",option-yes,option-no)
>  as-insn = $(if $(shell echo 'void _(void) { asm volatile ( $(2) ); }' \
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index 9ad648a..2988954 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -79,7 +79,7 @@ allow dom0_t xen_t:xen2 {
>  # Note that dom0 is part of domain_type so this has duplicates.
>  allow dom0_t xen_t:version {
>      version extraversion compile_info capabilities changeset
> -    platform_parameters get_features pagesize guest_handle commandline
> +    platform_parameters get_features pagesize guest_handle commandline build_id
>  };
>  
>  allow dom0_t xen_t:mmu memorymap;
> @@ -146,7 +146,7 @@ if (guest_writeconsole) {
>  # pmu_ctrl is for)
>  allow domain_type xen_t:xen2 pmu_use;
>  
> -# For normal guests all except XENVER_commandline
> +# For normal guests all except XENVER_commandline|build_id
>  allow domain_type xen_t:version {
>      version extraversion compile_info capabilities changeset
>      platform_parameters get_features pagesize guest_handle
> diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
> index c41e433..d57c39a 100644
> --- a/tools/libxc/xc_private.c
> +++ b/tools/libxc/xc_private.c
> @@ -495,6 +495,13 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
>      case XENVER_commandline:
>          sz = sizeof(xen_commandline_t);
>          break;
> +    case XENVER_build_id:
> +        {
> +            xen_build_id_t *build_id = (xen_build_id_t *)arg;
> +            sz = sizeof(*build_id) + build_id->len;
> +            HYPERCALL_BOUNCE_SET_DIR(arg, XC_HYPERCALL_BUFFER_BOUNCE_BOTH);
> +            break;
> +        }
>      default:
>          ERROR("xc_version: unknown command %d\n", cmd);
>          return -EINVAL;
> diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
> index aa8daf1..6b592d3 100644
> --- a/tools/libxc/xc_private.h
> +++ b/tools/libxc/xc_private.h
> @@ -191,6 +191,16 @@ enum {
>  #define DECLARE_HYPERCALL_BOUNCE(_ubuf, _sz, _dir) DECLARE_NAMED_HYPERCALL_BOUNCE(_ubuf, _ubuf, _sz, _dir)
>  
>  /*
> + * Change the direction.
> + *
> + * Can only be used if the bounce_pre/bounce_post commands have
> + * not been used.
> + */
> +#define HYPERCALL_BOUNCE_SET_DIR(_buf, _dir) do { if ((HYPERCALL_BUFFER(_buf))->hbuf)         \
> +                                                        assert(1);                            \
> +                                                   (HYPERCALL_BUFFER(_buf))->dir = _dir;      \
> +                                                } while (0)
> +/*
>   * Set the size of data to bounce. Useful when the size is not known
>   * when the bounce buffer is declared.
>   */
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 35ba293..8491267 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -93,7 +93,7 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>  	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
>  		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1.S
>  	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
> -	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
> +	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
>  	    $(@D)/.$(@F).1.o -o $@
>  	rm -f $(@D)/.$(@F).[0-9]*
>  
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index f501a2f..5cf180f 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
>  PHDRS
>  {
>    text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
> +#if defined(BUILD_ID)
> +  note PT_NOTE ;
> +#endif
>  }
>  SECTIONS
>  {
> @@ -53,6 +56,16 @@ SECTIONS
>          _erodata = .;          /* End of read-only data */
>    } :text
>  
> +#if defined(BUILD_ID)
> +  .note : {
> +       __note_gnu_build_id_start = .;
> +       *(.note.gnu.build-id)
> +       __note_gnu_build_id_end = .;
> +       *(.note)
> +       *(.note.*)
> +  } :text
> +#endif

This data really should be contained inside rodata.

> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
> index 44f26b0..adca602 100644
> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -30,7 +30,8 @@
>  
>  #include "xen.h"
>  
> -/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
> +/* NB. All ops return zero on success, except
> + * XENVER_{version,pagesize,build_id} */
>  
>  /* arg == NULL; returns major:minor (16:16). */
>  #define XENVER_version      0
> @@ -83,6 +84,19 @@ typedef struct xen_feature_info xen_feature_info_t;
>  #define XENVER_commandline 9
>  typedef char xen_commandline_t[1024];
>  
> +/* Return value is the number of bytes written, or XEN_Exx on error.
> + * Calling with empty parameter returns the size of build_id. */
> +#define XENVER_build_id 10
> +struct xen_build_id {
> +        uint32_t        len; /* IN: size of buf[]. */
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +        unsigned char   buf[];
> +#elif defined(__GNUC__)
> +        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
> +#endif
> +};
> +typedef struct xen_build_id xen_build_id_t;

I am still against trying to perpetuate this broken interface.  Variable
length structures are a pain for everyone to use.  How about introducing
a brand new hypercall with a separate length and data parameters?
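
One possible shape for such an interface, sketched as ordinary C with a mock
in place of the hypercall. get_build_id(), its return convention and the
sample ID bytes are all hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Sample build-id standing in for the linker-generated note contents. */
static const unsigned char sample_id[] = { 0xde, 0xad, 0xbe, 0xef };

/*
 * Hypothetical call taking the buffer and its length as separate
 * parameters.  A NULL buffer probes for the required length; otherwise
 * the ID is copied out and the number of bytes written is returned.
 */
static int get_build_id(unsigned char *buf, size_t len)
{
    if ( buf == NULL )
        return (int)sizeof(sample_id);

    if ( len < sizeof(sample_id) )
        return -ENOBUFS;

    memcpy(buf, sample_id, sizeof(sample_id));
    return (int)sizeof(sample_id);
}
```

With the length passed separately, the caller sizes a flat byte buffer
from the probe result and never has to compute the size of a structure
with a trailing array.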

~Andrew


* Re: [PATCH v3 15/23] xsplice: Print build_id in keyhandler.
  2016-02-12 18:05 ` [PATCH v3 15/23] xsplice: Print build_id in keyhandler Konrad Rzeszutek Wilk
@ 2016-02-16 20:13   ` Andrew Cooper
  0 siblings, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 20:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Ian Jackson, Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> As it should be an useful debug mechanism.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/common/xsplice.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 65b1f11..34719fc 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -13,6 +13,7 @@
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/version.h>
>  #include <xen/wait.h>
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
> @@ -99,7 +100,22 @@ static const char *state2str(int32_t state)
>  static void xsplice_printall(unsigned char key)
>  {
>      struct payload *data;
> -    unsigned int i;
> +    char *binary_id = NULL;
> +    unsigned int len = 0, i;
> +    int rc;
> +
> +    rc = xen_build_id(&binary_id, &len);
> +    printk("build-id: ");

This line should only be printed if a buildid is included.  Otherwise,
you will repeatedly see -ENODATA if the linker lacked build-id support.

> +    if ( !rc )
> +    {
> +        for ( i = 0; i < len; i++ )
> +        {
> +                   uint8_t c = binary_id[i];
> +                   printk("%02x", c);

Indentation.

Also, the buildid will want printing in the start of day banner.

~Andrew

> +        }
> +           printk("\n");
> +    } else if ( rc < 0 )
> +        printk("rc = %d\n", rc);
>  
>      spin_lock(&payload_lock);
>  


* Re: [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-02-12 18:05 ` [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
@ 2016-02-16 20:20   ` Andrew Cooper
  2016-02-17 11:10     ` Jan Beulich
  0 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 20:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, konrad, mpohlack,
	ross.lagerwall, jinsong.liu, Ian Campbell, Ian Jackson,
	Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  xen/common/xsplice.c | 36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 2ba5bb5..8c5557e 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -101,6 +101,21 @@ static const char *state2str(int32_t state)
>      return names[state];
>  }
>  
> +static void xsplice_print_build_id(char *id, unsigned int len)
> +{
> +    unsigned int i;
> +
> +    if ( !len )
> +        return;
> +
> +    for ( i = 0; i < len; i++ )
> +    {
> +        uint8_t c = id[i];
> +        printk("%02x", c);

What about the already existing %*ph custom format?  If the spaces are a
problem we could introduce %*phN from Linux which has no spaces.

The advantage of this is that it is a single call to printk, rather than
many, and avoids the ability for a different cpu to interleave in the
middle of a line.
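
Where a suitable custom format is not available, the same single-call effect
can be had by formatting into a buffer first. This is a sketch in standard C;
format_build_id() is a hypothetical helper, and a hypervisor version would
use its own snprintf and printk:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format 'len' bytes of 'id' as lowercase hex into 'out' (capacity 'cap'),
 * so the caller can emit the whole ID with one print call.
 * Returns the number of characters written, excluding the terminator.
 */
static size_t format_build_id(char *out, size_t cap,
                              const unsigned char *id, size_t len)
{
    size_t i, pos = 0;

    if ( cap == 0 )
        return 0;

    /* Stop early if another two digits plus the NUL would not fit. */
    for ( i = 0; i < len && pos + 2 < cap; i++ )
        pos += (size_t)snprintf(out + pos, cap - pos, "%02x", id[i]);

    out[pos] = '\0';
    return pos;
}
```

The caller then prints the finished string once, for example with
printf("build-id: %s\n", buf).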

~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-16 20:09   ` Andrew Cooper
@ 2016-02-16 20:22     ` Konrad Rzeszutek Wilk
  2016-02-16 20:26       ` Andrew Cooper
  2016-02-24 18:52     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-16 20:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

On Tue, Feb 16, 2016 at 08:09:13PM +0000, Andrew Cooper wrote:
> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> 
> Building the hypervisor with buildid and making it available via
> hypercall really should be split into two different patches, especially
> given the complexity in each.

OK, will do.


.. snip..

> > +/* Return value is the number of bytes written, or XEN_Exx on error.
> > + * Calling with empty parameter returns the size of build_id. */
> > +#define XENVER_build_id 10
> > +struct xen_build_id {
> > +        uint32_t        len; /* IN: size of buf[]. */
> > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> > +        unsigned char   buf[];
> > +#elif defined(__GNUC__)
> > +        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
> > +#endif
> > +};
> > +typedef struct xen_build_id xen_build_id_t;
> 
> I am still against trying to perpetuate this broken interface.  Variable
> length structures are a pain for everyone to use.  How about introducing
> a brand new hypercall with a separate length and data parameters?

As in a subop to sysctl? I am fine with that (which is what I think
the first iteration of this patch had). Or it could go under the
XSPLICE subops :-)

Preferences?
> 
> ~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-16 20:22     ` Konrad Rzeszutek Wilk
@ 2016-02-16 20:26       ` Andrew Cooper
  2016-02-16 20:40         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-16 20:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

On 16/02/16 20:22, Konrad Rzeszutek Wilk wrote:
> On Tue, Feb 16, 2016 at 08:09:13PM +0000, Andrew Cooper wrote:
>> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
>>
>> Building the hypervisor with buildid and making it available via
>> hypercall really should be split into two different patches, especially
>> given the complexity in each.
> OK, will do.
>
>
> .. snip..
>
>>> +/* Return value is the number of bytes written, or XEN_Exx on error.
>>> + * Calling with empty parameter returns the size of build_id. */
>>> +#define XENVER_build_id 10
>>> +struct xen_build_id {
>>> +        uint32_t        len; /* IN: size of buf[]. */
>>> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
>>> +        unsigned char   buf[];
>>> +#elif defined(__GNUC__)
>>> +        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
>>> +#endif
>>> +};
>>> +typedef struct xen_build_id xen_build_id_t;
>> I am still against trying to perpetuate this broken interface.  Variable
>> length structures are a pain for everyone to use.  How about introducing
>> a brand new hypercall with a separate length and data parameters?
> As in a subop to sysctl? I am fine with that (which is what I think
> the first iteration of this patch had). Or it could go under the
> XSPLICE subops :-)
>
> Preferences?

A completely brand new hypercall.  Then we can deprecate the existing
xenver, moving the relevant information (such as plain version numbers)
across and leaving the irrelevant information (compile date, etc.) behind.

~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-16 20:26       ` Andrew Cooper
@ 2016-02-16 20:40         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-16 20:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

> >>> +/* Return value is the number of bytes written, or XEN_Exx on error.
> >>> + * Calling with empty parameter returns the size of build_id. */
> >>> +#define XENVER_build_id 10
> >>> +struct xen_build_id {
> >>> +        uint32_t        len; /* IN: size of buf[]. */
> >>> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> >>> +        unsigned char   buf[];
> >>> +#elif defined(__GNUC__)
> >>> +        unsigned char   buf[1]; /* OUT: Variable length buffer with build_id. */
> >>> +#endif
> >>> +};
> >>> +typedef struct xen_build_id xen_build_id_t;
> >> I am still against trying to perpetuate this broken interface.  Variable
> >> length structures are a pain for everyone to use.  How about introducing
> >> a brand new hypercall with a separate length and data parameters?
> > As in a subop to sysctl? I am fine with that (which is what I think
> > the first iteration of this patch had). Or it could go under the
> > XSPLICE subops :-)
> >
> > Preferences?
> 
> A completely brand new hypercall.  Then we can deprecate the existing
> xenver, moving the relevant information (such as plain version numbers)
> across and leaving the irrelevant information (compile date, etc.) behind.

How would you deprecate xenver when there are existing guests that
depend on it? Say RHEL5, SLES11 or NetBSD? They are not going to move
over, and it would be a bit silly to deprecate something and then never
actually remove it because users still depend on it.

Let me stress that I have no problem adding a new hypercall (albeit
I think it is overkill) and adding BUILD_ID to it. However, I wouldn't
have the time before Xen 4.7 to add the rest of the sub-ops to it, such
as version, command line, and whatnot.

Keep in mind we have one month left.


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-16 19:11   ` Andrew Cooper
@ 2016-02-17  8:58     ` Ross Lagerwall
  2016-02-17 10:50     ` Jan Beulich
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-17  8:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On 02/16/2016 07:11 PM, Andrew Cooper wrote:
snip
>> +        }
>> +
>> +        barrier(); /* MUST do it after get_cpu_maps. */
>> +        total_cpus = num_online_cpus() - 1;
>> +
>> +        if ( total_cpus )
>> +        {
>> +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
>> +                   cpu, total_cpus);
>> +            smp_call_function(reschedule_fn, NULL, 0);
>> +        }
>> +        (void)xsplice_do_single(total_cpus);
>> +
>> +        ASSERT(local_irq_is_enabled());
>> +
>> +        put_cpu_maps();
>> +
>> +        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
>> +    }
>> +    else
>> +    {
>> +        /* Wait for all CPUs to rendezvous. */
>> +        while ( xsplice_work.do_work && !xsplice_work.ready )
>> +        {
>> +            cpu_relax();
>> +            smp_rmb();
>> +        }
>> +
>
> What happens here if the rendezvous initiator times out?  Looks like we
> will spin forever waiting for do_work which will never drop back to 0.

xsplice_do_wait() sets do_work to 0 on a timeout.

-- 
Ross Lagerwall


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-16 19:11   ` Andrew Cooper
  2016-02-17  8:58     ` Ross Lagerwall
@ 2016-02-17 10:50     ` Jan Beulich
  2016-02-19  9:30     ` Ross Lagerwall
  2016-02-23 20:41     ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 86+ messages in thread
From: Jan Beulich @ 2016-02-17 10:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	mpohlack, ross.lagerwall, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, xen-devel, Stefano Stabellini,
	Boris Ostrovsky

>>> On 16.02.16 at 20:11, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
>> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
>> +{
>> +    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
> 
> _p() is common shorthand in Xen for a cast to (void *)

Iirc this was meant to be used only in printk() arguments, and may
also only have been needed to abstract out some 32-/64-bit
differences. I'd certainly discourage use here.

>> +static int apply_payload(struct payload *data)
>> +{
>> +    unsigned int i;
>> +
>> +    printk(XENLOG_DEBUG "%s: Applying %u functions.\n", data->name,
>> +           data->nfuncs);
>> +
>> +    for ( i = 0; i < data->nfuncs; i++ )
>> +        xsplice_apply_jmp(data->funcs + i);
> 
> In cases such as this, it is better to use data->funcs[i], as the
> compiler can more easily perform pointer alias analysis.

Why would that be? &x[i] and x + i are identical for all purposes
afaik.
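
The equivalence follows from the C definition of subscripting, under which
x[i] is defined as *(x + i). A small check, with a hypothetical struct and
array standing in for data->funcs:

```c
#include <stddef.h>

struct func { unsigned long old_addr, new_addr; };

/* Returns 1 if &x[i] and x + i name the same element for every i < n. */
static int same_element(struct func *x, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( &x[i] != x + i )
            return 0;

    return 1;
}
```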

Jan


* Re: [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-02-16 20:20   ` Andrew Cooper
@ 2016-02-17 11:10     ` Jan Beulich
  2016-02-24 21:54       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Jan Beulich @ 2016-02-17 11:10 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, ross.lagerwall, jinsong.liu,
	Ian Jackson, Tim Deegan, mpohlack, xen-devel

>>> On 16.02.16 at 21:20, <andrew.cooper3@citrix.com> wrote:
> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
>> +static void xsplice_print_build_id(char *id, unsigned int len)
>> +{
>> +    unsigned int i;
>> +
>> +    if ( !len )
>> +        return;
>> +
>> +    for ( i = 0; i < len; i++ )
>> +    {
>> +        uint8_t c = id[i];
>> +        printk("%02x", c);
> 
> What about the already existing %*ph custom format?  If the spaces are a
> problem we could introduce %*phN from Linux which has no spaces.
> 
> The advantage of this is that it is a single call to printk, rather than
> many, and avoids the ability for a different cpu to interleave in the
> middle of a line.

I don't think this ability exists anymore after we've switched to
per-CPU there. Which isn't to say, though, that I wouldn't also
like to see this be just a single printk().

Jan


* Re: [PATCH v3 MISSING/23] xsplice: Design document (v7).
  2016-02-12 21:57   ` [PATCH v3 MISSING/23] xsplice: Design document (v7) Konrad Rzeszutek Wilk
@ 2016-02-18 16:20     ` Jan Beulich
  2016-02-19 18:36       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Jan Beulich @ 2016-02-18 16:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, andrew.cooper3, xen-devel

>>> On 12.02.16 at 22:57, <konrad.wilk@oracle.com> wrote:
> +struct xsplice_patch_func {  
> +    const char *name;  
> +    Elf64_Xwordnew_addr;  

Missing space.

> +    Elf64_Xword old_addr;  
> +    Elf64_Word new_size;  
> +    Elf64_Word long old_size;  

There are still two types left here.

> +### XEN_SYSCTL_XSPLICE_GET (1)
> +
> +Retrieve an status of an specific payload. This caller provides:
> +
> + * A `struct xen_xsplice_name` called `name` which has the unique name.
> + * A `struct xen_xsplice_status` structure which has all members
> +   set to zero: That is:
> +   * `status` *MUST* be set to zero.
> +   * `rc` *MUST* be set to zero.

Why is this?

> +The structure is as follow:
> +
> +<pre>
> +struct xen_xsplice_status {  
> +#define XSPLICE_STATUS_LOADED       1  
> +#define XSPLICE_STATUS_CHECKED      2  
> +#define XSPLICE_STATUS_APPLIED      3  
> +    int32_t state;                  /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */  
> +    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
> +                                    /* IN: MUST be zero. */
> +};  
> +
> +struct xen_sysctl_xsplice_summary {  
> +    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
> +    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
> +};  

With the operation being named XEN_SYSCTL_XSPLICE_GET, shouldn't
the structure tag be xen_sysctl_xsplice_get?

> +### XEN_SYSCTL_XSPLICE_LIST (2)
> +
> +Retrieve an array of abbreviated status and names of payloads that are loaded in the
> +hypervisor.
> +
> +The caller provides:
> +
> + * `version`. Initially (on first hypercall) *MUST* be zero.
> + * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> + * `nr` the max number of entries to populate.
> + * `pad` - *MUST* be zero.
> + * `status` virtual address of where to write `struct xen_xsplice_status`
> +   structures. Caller *MUST* allocate up to `nr` of them.
> + * `name` - virtual address of where to write the unique name of the payload.
> +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> +   **XEN_XSPLICE_NAME_SIZE** size.
> + * `len` - virtual address of where to write the length of each unique name
> +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> +   of sizeof(uint32_t) (4 bytes).
> +
> +If the hypercall returns an positive number, it is the number (upto `nr`
> +provided to the hypercall) of the payloads returned, along with `nr` updated
> +with the number of remaining payloads, `version` updated (it may be the same
> +across hypercalls - if it varies the data is stale and further calls could
> +fail). The `status`, `name`, and `len`' are updated at their designed index
> +value (`idx`) with the returned value of data.
> +
> +If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
> +lowered.
> +
> +If the hypercall returns an zero value that means there are no payloads.

Maybe worth changing to "... there are no (more) payloads",
considering the iterative nature of the operation?
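
The iterative use can be sketched as a caller loop over a mock of the list
operation. mock_list(), list_all() and TOTAL_PAYLOADS are hypothetical, and
the sketch leaves out the `version` check and the update of `nr` that the
text describes:

```c
#include <stddef.h>

#define TOTAL_PAYLOADS 5   /* pretend the hypervisor holds five payloads */

/*
 * Mock of the list operation: fills up to 'nr' entries starting at 'idx'
 * and returns how many were written; 0 means there are no (more) payloads.
 */
static int mock_list(unsigned int idx, unsigned int nr, unsigned int *out)
{
    unsigned int i, n = 0;

    for ( i = idx; i < TOTAL_PAYLOADS && n < nr; i++ )
        out[n++] = i;   /* stand-in for a status/name/len triple */

    return (int)n;
}

/* Walk the whole list in batches of 'nr'; returns the entries seen. */
static unsigned int list_all(unsigned int nr)
{
    unsigned int batch[8], idx = 0;
    int rc;

    if ( nr == 0 || nr > 8 )
        return 0;

    while ( (rc = mock_list(idx, nr, batch)) > 0 )
        idx += (unsigned int)rc;

    return idx;
}
```

In this loop a zero return on the first call means "no payloads", while a
zero return on a later call means "no more payloads", which is the
distinction the suggested wording makes explicit.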

Jan


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-16 19:11   ` Andrew Cooper
  2016-02-17  8:58     ` Ross Lagerwall
  2016-02-17 10:50     ` Jan Beulich
@ 2016-02-19  9:30     ` Ross Lagerwall
  2016-02-23 20:41     ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-19  9:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On 02/16/2016 07:11 PM, Andrew Cooper wrote:
> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 9d43f7b..b5995b9 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -36,6 +36,7 @@
>>   #include <xen/cpu.h>
>>   #include <xen/wait.h>
>>   #include <xen/guest_access.h>
>> +#include <xen/xsplice.h>
>>   #include <public/sysctl.h>
>>   #include <public/hvm/hvm_vcpu.h>
>>   #include <asm/regs.h>
>> @@ -121,6 +122,7 @@ static void idle_loop(void)
>>           (*pm_idle)();
>>           do_tasklet();
>>           do_softirq();
>> +        do_xsplice(); /* Must be last. */
>
> The name "do_xsplice()" is slightly misleading (although it is in equal
> company here).  check_for_xsplice_work() would be more accurate.
>
>> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
>> index 814dd52..ae35e91 100644
>> --- a/xen/arch/x86/xsplice.c
>> +++ b/xen/arch/x86/xsplice.c
>> @@ -10,6 +10,25 @@
>>                               __func__,__LINE__, x); return x; }
>>   #endif
>>
>> +#define PATCH_INSN_SIZE 5
>
> Somewhere you should have a BUILD_BUG_ON() confirming that
> PATCH_INSN_SIZE fits within the undo array.
>
> Having said that, I think all of xsplice_patch_func should be
> arch-specific rather than generic.
>
>> +
>> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
>> +{
>> +    uint32_t val;
>> +    uint8_t *old_ptr;
>> +
>> +    old_ptr = (uint8_t *)func->old_addr;
>> +    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
>
> At least a newline here please.
>
>> +    *old_ptr++ = 0xe9; /* Relative jump */
>> +    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
>
> E9 takes a rel32 parameter, which is signed.
>
> I think you need to explicitly cast to intptr_t and use signed
> arithmetic throughout this calculation to correctly calculate a
> backwards jump.

According to my testing and expectations based on the spec and GCC's 
implementation-defined behaviour, the offset is computed correctly for 
backward (and forward) jumps. I'm sure the types can be improved though...
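
A sketch of the displacement calculation with the signedness and the range
check made explicit. encode_jmp() is a hypothetical helper rather than the
patch's code; it assumes a little-endian host and relies on the
unsigned-to-signed conversion behaving as it does with GCC:

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5   /* 0xe9 opcode + 32-bit displacement */

/*
 * Build "jmp rel32" in 'insn' for a jump from old_addr to new_addr.
 * The displacement is relative to the end of the 5-byte instruction.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_jmp(uint8_t insn[PATCH_INSN_SIZE],
                      uint64_t old_addr, uint64_t new_addr)
{
    /* Unsigned subtraction wraps; reinterpret it as a signed distance. */
    int64_t disp = (int64_t)(new_addr - old_addr - PATCH_INSN_SIZE);
    int32_t rel32;

    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -1;

    rel32 = (int32_t)disp;
    insn[0] = 0xe9;
    memcpy(&insn[1], &rel32, sizeof(rel32)); /* little-endian host assumed */
    return 0;
}
```

A backward jump from 0x2000 to 0x1000 gives a displacement of -0x1005,
which is stored as the bytes fb ef ff ff after the opcode.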

>
> I think there should also be some sanity checks that both old_addr and
> new_addr are in the Xen 1G virtual region.
>

OK. Though these sanity checks should happen when loading the patch, not 
applying it.

-- 
Ross Lagerwall


* Re: [PATCH v3 MISSING/23] xsplice: Design document (v7).
  2016-02-18 16:20     ` Jan Beulich
@ 2016-02-19 18:36       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 18:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, andrew.cooper3, xen-devel

On Thu, Feb 18, 2016 at 09:20:00AM -0700, Jan Beulich wrote:
> >>> On 12.02.16 at 22:57, <konrad.wilk@oracle.com> wrote:
> > +struct xsplice_patch_func {  
> > +    const char *name;  
> > +    Elf64_Xwordnew_addr;  
> 
> Missing space.
> 
> > +    Elf64_Xword old_addr;  
> > +    Elf64_Word new_size;  
> > +    Elf64_Word long old_size;  
> 
> There are still two types left here.

That wouldn't compile very well.
> 
> > +### XEN_SYSCTL_XSPLICE_GET (1)
> > +
> > +Retrieve an status of an specific payload. This caller provides:
> > +
> > + * A `struct xen_xsplice_name` called `name` which has the unique name.
> > + * A `struct xen_xsplice_status` structure which has all members
> > +   set to zero: That is:
> > +   * `status` *MUST* be set to zero.
> > +   * `rc` *MUST* be set to zero.
> 
> Why is this?

<scratches his head>.. 
It had an _pad entry in earlier versions. Let me remove the whole
'set to zero' requirement.

> 
> > +The structure is as follow:
> > +
> > +<pre>
> > +struct xen_xsplice_status {  
> > +#define XSPLICE_STATUS_LOADED       1  
> > +#define XSPLICE_STATUS_CHECKED      2  
> > +#define XSPLICE_STATUS_APPLIED      3  
> > +    int32_t state;                  /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */  
> > +    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
> > +                                    /* IN: MUST be zero. */
> > +};  
> > +
> > +struct xen_sysctl_xsplice_summary {  
> > +    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
> > +    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
> > +};  
> 
> With the operation being named XEN_SYSCTL_XSPLICE_GET, shouldn't
> the structure tag be xen_sysctl_xsplice_get?

Yes!
> 
> > +### XEN_SYSCTL_XSPLICE_LIST (2)
> > +
> > +Retrieve an array of abbreviated status and names of payloads that are 
> > loaded in the
> > +hypervisor.
> > +
> > +The caller provides:
> > +
> > + * `version`. Initially (on first hypercall) *MUST* be zero.
> > + * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> > + * `nr` the max number of entries to populate.
> > + * `pad` - *MUST* be zero.
> > + * `status` virtual address of where to write `struct xen_xsplice_status`
> > +   structures. Caller *MUST* allocate up to `nr` of them.
> > + * `name` - virtual address of where to write the unique name of the payload.
> > +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> > +   **XEN_XSPLICE_NAME_SIZE** size.
> > + * `len` - virtual address of where to write the length of each unique name
> > +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> > +   of sizeof(uint32_t) (4 bytes).
> > +
> > +If the hypercall returns an positive number, it is the number (upto `nr`
> > +provided to the hypercall) of the payloads returned, along with `nr` updated
> > +with the number of remaining payloads, `version` updated (it may be the same
> > +across hypercalls - if it varies the data is stale and further calls could
> > +fail). The `status`, `name`, and `len`' are updated at their designed index
> > +value (`idx`) with the returned value of data.
> > +
> > +If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
> > +lowered.
> > +
> > +If the hypercall returns an zero value that means there are no payloads.
> 
> Maybe worth changing to "... there are no (more) payloads",
> considering the iterative nature of the operation?

Yes.

This is the change I did:


diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 9a95243..69c5176 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -289,10 +289,10 @@ describing the functions to be patched:
 <pre>
 struct xsplice_patch_func {  
     const char *name;  
-    Elf64_Xwordnew_addr;  
+    Elf64_Xword new_addr;  
     Elf64_Xword old_addr;  
     Elf64_Word new_size;  
-    Elf64_Word long old_size;  
+    Elf64_Word old_size;  
     uint8_t pad[32];  
 };  
 </pre>
@@ -425,10 +425,8 @@ struct xen_sysctl_xsplice_upload {
 Retrieve an status of an specific payload. This caller provides:
 
  * A `struct xen_xsplice_name` called `name` which has the unique name.
- * A `struct xen_xsplice_status` structure which has all members
-   set to zero: That is:
-   * `status` *MUST* be set to zero.
-   * `rc` *MUST* be set to zero.
+ * A `struct xen_xsplice_status` structure. The member values will
+   be over-written upon completion.
 
 Upon completion the `struct xen_xsplice_status` is updated.
 
@@ -473,12 +471,11 @@ struct xen_xsplice_status {
 #define XSPLICE_STATUS_LOADED       1  
 #define XSPLICE_STATUS_CHECKED      2  
 #define XSPLICE_STATUS_APPLIED      3  
-    int32_t state;                  /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */  
+    int32_t state;                  /* OUT: XSPLICE_STATE_*. */  
     int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
-                                    /* IN: MUST be zero. */
 };  
 
-struct xen_sysctl_xsplice_summary {  
+struct xen_sysctl_xsplice_get {  
     xen_xsplice_name_t name;        /* IN, the name of the payload. */  
     xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
 };  
@@ -514,7 +511,7 @@ value (`idx`) with the returned value of data.
 If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
 lowered.
 
-If the hypercall returns an zero value that means there are no payloads.
+If the hypercall returns an zero value there are no more payloads.
 
 Note that due to the asynchronous nature of hypercalls the control domain might
 have added or removed a number of payloads making this information stale. It is
> 
> Jan
> 

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-12 20:11   ` Andrew Cooper
  2016-02-12 20:40     ` Konrad Rzeszutek Wilk
@ 2016-02-19 19:36     ` Konrad Rzeszutek Wilk
  2016-02-19 19:43       ` Andrew Cooper
  1 sibling, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 19:36 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, xen-devel,
	Daniel De Graaf, sasha.levin

> >  long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
> >  {
> > @@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
> >          ret = tmem_control(&op->u.tmem_op);
> >          break;
> >  
> > +    case XEN_SYSCTL_xsplice_op:
> > +        ret = xsplice_control(&op->u.xsplice);
> 
> Could we name this do_xsplice_op() to match prevailing subop style.

There are two instances of that: do_get_pm_info, do_pm_op.

Then variations of 'do' are: cpupool_do_sysctl, arch_do_physinfo, and
arch_do_sysctl.

And then ones enjoying 'op' in it:
sysctl_coverage_op

And then 'control' ones:
spinlock_profile_control, tmem_control, perfc_control, tb_control.

So we have 2 vs 3 vs 1 vs 4.

I would say that the name 'xsplice_control' is the prevailing style?

Unless you want me to take a union of them, perhaps:

 do_xsplice_control_op ?

<chuckles>

I will change it to what you prefer - do_xsplice_op.
> 
> > +        if ( ret != -ENOSYS )
> > +            copyback = 1;
> > +        break;
> > +
> 
> Not related to this patch.  I (and by this, I mean someone with time ;p)
> should do some cleanup and pass copyback by pointer to subops.  This
> allows for finer grain control of whether a copyback is needed.

Yes indeed. But then how often do you do sysctl hypercalls?

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10)
  2016-02-19 19:36     ` Konrad Rzeszutek Wilk
@ 2016-02-19 19:43       ` Andrew Cooper
  0 siblings, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-19 19:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall, xen-devel,
	Daniel De Graaf, sasha.levin

On 19/02/2016 19:36, Konrad Rzeszutek Wilk wrote:
>>>  long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>  {
>>> @@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>>>          ret = tmem_control(&op->u.tmem_op);
>>>          break;
>>>  
>>> +    case XEN_SYSCTL_xsplice_op:
>>> +        ret = xsplice_control(&op->u.xsplice);
>> Could we name this do_xsplice_op() to match prevailing subop style.
> There are two instances of that: do_get_pm_info, do_pm_op.
>
> Then variations of 'do' are: cpupool_do_sysctl, arch_do_physinfo, and
> arch_do_sysctl.
>
> And then ones enjoying 'op' in it:
> sysctl_coverage_op
>
> And then 'control' ones:
> spinlock_profile_control, tmem_control, perfc_control, tb_control.
>
> So we have 2 vs 3 vs 1 vs 4.
>
> I would say that the name 'xsplice_control' is the prevailing style?
>
> Unless you want me to take a union of them, perhaps:
>
>  do_xsplice_control_op ?
>
> <chuckles>
>
> I will change it to what you prefer - do_xsplice_op.

The important bit (for logically associating different bits of code) is
to have the main stem matching the hypercall op name.  Simply
"xsplice_op()" would be ok, and could naturally be extended to
arch_xsplice_op() if the need arises.

>>> +        if ( ret != -ENOSYS )
>>> +            copyback = 1;
>>> +        break;
>>> +
>> Not related to this patch.  I (and by this, I mean someone with time ;p)
>> should do some cleanup and pass copyback by pointer to subops.  This
>> allows for finer grain control of whether a copyback is needed.
> Yes indeed. But then how often do you do sysctl hypercalls?

The purpose is for simplifying the in-hypervisor codepaths, rather than
performance.  (A side effect would be to reduce the size of the
alternatives table patching stac/clac instructions).
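
A minimal sketch of the copyback-by-pointer idea. All names here (demo_op,
demo_subop, demo_dispatch) are hypothetical and do not exist in the Xen
tree; the point is only that each subop decides for itself whether the
caller has to copy the structure back.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical operation structure standing in for a sysctl subop. */
struct demo_op {
    int cmd;
    int result;
};

/* The subop sets *copyback only when it actually produced output. */
static int demo_subop(struct demo_op *op, bool *copyback)
{
    switch ( op->cmd )
    {
    case 0: /* Query: fills in op->result, so it must be copied back. */
        op->result = 42;
        *copyback = true;
        return 0;

    case 1: /* Action: nothing to hand back to the caller. */
        return 0;

    default:
        return -ENOSYS;
    }
}

/* Stand-in for the top-level handler; 'copied' records the decision. */
static int demo_dispatch(struct demo_op *op, bool *copied)
{
    bool copyback = false;
    int ret = demo_subop(op, &copyback);

    *copied = copyback; /* A real handler would copy to the guest here. */
    return ret;
}
```

With this shape the top-level handler no longer needs a per-case
"if ( ret != -ENOSYS ) copyback = 1;" stanza.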

~Andrew

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5).
  2016-02-15 12:35   ` Wei Liu
@ 2016-02-19 20:04     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 20:04 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, andrew.cooper3, Stefano Stabellini, Ian Jackson,
	xen-devel, mpohlack, ross.lagerwall, jinsong.liu, xen-devel,
	sasha.levin

.snip..
> > +/*
> > + * The heart of this function is to get an array of xen_xsplice_status_t.
> > + *
> > + * However it is complex because it has to deal with the hypervisor
> > + * returning -EAGAIN or the data that is being returned becomes stale
> > + * (another hypercall might alter the list).
> > + *
> 
> I don't see EAGAIN handled in the following function. Is that expected?

Wrongly worded. The EAGAIN won't show up - instead the number of
entries returned may be less than what is available.
.. snip..
> > +    max_batch_sz = max;
> > +    /* Convience value. */
> > +    sz = sizeof(*name) * XEN_XSPLICE_NAME_SIZE;
> > +    *done = 0;
> > +    *left = 0;
> > +    do {
> > +        /*
> > +         * The first time we go in this loop our 'max' may be bigger
> > +         * than what the hypervisor is comfortable with - hence the first
> > +         * couple of loops may adjust the number of entries we will
> > +         * want filled (tracked by 'nr').
> > +         */
> > +        if ( adjust )
> > +            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
> > +
> 
> This is equivalent to always setting adjust to 0.

Correct.
> 
> > +        nr = min(max - *done, max_batch_sz);
> > +
> > +        sysctl.u.xsplice.u.list.nr = nr;
> > +        /* Fix the size (may vary between hypercalls). */
> > +        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
> > +        HYPERCALL_BOUNCE_SET_SIZE(name, nr * nr);
> > +        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
> > +        /* Move the pointer to proper offset into 'info'. */
> > +        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
> > +        (HYPERCALL_BUFFER(name))->ubuf = name + (sz * *done);
> > +        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
> > +        /* Allocate memory. */
> > +        rc = xc_hypercall_bounce_pre(xch, info);
> > +        if ( rc )
> > +            break;
> > +
> > +        rc = xc_hypercall_bounce_pre(xch, name);
> > +        if ( rc )
> > +            break;
> > +
> > +        rc = xc_hypercall_bounce_pre(xch, len);
> > +        if ( rc )
> > +            break;
> > +
> > +        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
> > +        set_xen_guest_handle(sysctl.u.xsplice.u.list.name, name);
> > +        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
> > +
> > +        rc = do_sysctl(xch, &sysctl);
> > +        /*
> > +         * From here on we MUST call xc_hypercall_bounce. If rc < 0 we
> > +         * end up doing it (outside the loop), so using a break is OK.
> > +         */
> > +        if ( rc < 0 && errno == E2BIG )
> > +        {
> > +            if ( max_batch_sz <= 1 )
> > +                break;
> > +            max_batch_sz >>= 1;
> > +            adjust = 1; /* For the loop conditional to let us loop again. */
> > +            /* No memory leaks! */
> > +            xc_hypercall_bounce_post(xch, info);
> > +            xc_hypercall_bounce_post(xch, name);
> > +            xc_hypercall_bounce_post(xch, len);
> > +            continue;
> > +        }
> > +        else if ( rc < 0 ) /* For all other errors we bail out. */
> > +            break;
> > +
> > +        if ( !version )
> > +            version = sysctl.u.xsplice.u.list.version;
> > +
> > +        if ( sysctl.u.xsplice.u.list.version != version )
> > +        {
> > +            /* We could make this configurable as parameter? */
> > +            if ( retries++ > 3 )
> > +            {
> > +                rc = -1;
> > +                errno = EBUSY;
> > +                break;
> > +            }
> > +            *done = 0; /* Retry from scratch. */
> > +            version = sysctl.u.xsplice.u.list.version;
> > +            adjust = 1; /* And make sure we continue in the loop. */
> 
> Actually this "adjust" variable looks useless to me because you always
> use "continue" afterwards. It won't ever get used in "while".

We need that for the conditional. Keep in mind that in a do { .. } while
loop the conditional gets checked _after_ the code has run.

Which means (if we did not have adjust) that it would check for:

(*done < max && *left != 0)

And *done = 0, *left = 0 at the start.

Since *left == 0, so we would exit the loop right away.

I've put a comment in the loop about it.
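
To illustrate the point, here is a stripped-down sketch of the loop shape.
It assumes the real conditional is "adjust || (*done < max && *left != 0)";
fake_list() and struct fake_hv are hypothetical stand-ins for the hypercall
and are not part of libxc.

```c
#include <stdbool.h>

/* Hypothetical hypervisor model: 'total' entries, batches capped at 'limit'. */
struct fake_hv {
    unsigned int total, limit;
};

/* Returns the number of entries filled, or -1 in the role of E2BIG. */
static int fake_list(const struct fake_hv *hv, unsigned int idx,
                     unsigned int nr, unsigned int *left)
{
    unsigned int avail = hv->total - idx;

    if ( nr > hv->limit )
        return -1;
    if ( nr > avail )
        nr = avail;
    *left = avail - nr;
    return (int)nr;
}

static unsigned int fetch_all(const struct fake_hv *hv, unsigned int max,
                              unsigned int *calls)
{
    unsigned int done = 0, left = 0, batch = max;
    bool adjust;

    *calls = 0;
    do {
        unsigned int nr = (max - done < batch) ? max - done : batch;
        int rc;

        adjust = false;
        ++*calls;
        rc = fake_list(hv, done, nr, &left);
        if ( rc < 0 )
        {
            if ( batch <= 1 )
                break;
            batch >>= 1;
            /* 'left' is still 0 here, so only this flag keeps us looping. */
            adjust = true;
            continue; /* Jumps to the conditional below. */
        }
        done += (unsigned int)rc;
    } while ( adjust || (done < max && left != 0) );

    return done;
}
```

Dropping 'adjust' from the conditional makes the first rejected batch
terminate the loop, since 'continue' in a do/while goes straight to the
test and 'left' has not been updated yet.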
> 
> > +            /* No memory leaks. */
> > +            xc_hypercall_bounce_post(xch, info);
> > +            xc_hypercall_bounce_post(xch, name);
> > +            xc_hypercall_bounce_post(xch, len);
> > +            continue;
> > +        }
> > +
> > +        /* We should never hit this, but just in case. */
> > +        if ( rc > nr )
> > +        {
> > +            errno = EINVAL; /* Overflow! */
> 
> Use EOVERFLOW?

Duh! Yes :-)

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4)
  2016-02-15 12:59   ` Wei Liu
@ 2016-02-19 20:46     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 20:46 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ian Campbell, andrew.cooper3, Stefano Stabellini, Ian Jackson,
	xen-devel, mpohlack, ross.lagerwall, jinsong.liu, xen-devel,
	sasha.levin

On Mon, Feb 15, 2016 at 12:59:02PM +0000, Wei Liu wrote:
> On Fri, Feb 12, 2016 at 01:05:41PM -0500, Konrad Rzeszutek Wilk wrote:
> [...]
> > diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
> > new file mode 100644
> > index 0000000..13f762f
> > --- /dev/null
> > +++ b/tools/misc/xen-xsplice.c
> 
> One gripe I have with this program is that many of its functions mix
> direct return and goto style error handling.

All goto's have been removed.
> 
> > @@ -0,0 +1,470 @@
> > +/*
> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> > + */
> > +
> [...]
> > +static const char *state2str(long state)
> > +{
> > +#define STATE(x) [XSPLICE_STATE_##x] = #x
> > +    static const char *const names[] = {
> > +            STATE(LOADED),
> > +            STATE(CHECKED),
> > +            STATE(APPLIED),
> > +    };
> > +#undef STATE
> > +    if (state >= ARRAY_SIZE(names))
> > +        return "unknown";
> > +
> > +    if (state < 0)
> > +        return "-EXX";
> > +
> 
> This doesn't look very useful.

No it does not. Removed.

Also cleaned up some of the printf() error messages that should have
gone to fprintf(stderr, ...).
> 
> Wei.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-15  8:34   ` Jan Beulich
@ 2016-02-19 21:05     ` Konrad Rzeszutek Wilk
  2016-02-22 10:17       ` Jan Beulich
  2016-02-22 15:19       ` Ross Lagerwall
  0 siblings, 2 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 21:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, andrew.cooper3, xen-devel, sasha.levin

On Mon, Feb 15, 2016 at 01:34:42AM -0700, Jan Beulich wrote:
> >>> On 12.02.16 at 19:05, <konrad.wilk@oracle.com> wrote:
> > --- a/xen/include/xen/elfstructs.h
> > +++ b/xen/include/xen/elfstructs.h
> > @@ -348,6 +348,14 @@ typedef struct {
> >  #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
> >  #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
> >  
> > +/* x86-64 relocation types. We list only the ones we implement. */
> 
> "we implement" is too vague for my taste: This comment should
> have some kind of reference to xSplice.


/* x86-64 relocation types. We list only the ones xSplice implements. */

?

> 
> > +#define R_X86_64_NONE		0	/* No reloc */
> > +#define R_X86_64_64		1	/* Direct 64 bit  */
> > +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
> > +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
> > +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
> > +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
> 
> Is there really a use case for the last two in the hypervisor
> (which doesn't live in the top 2G of address space)? (If the

No. But they are there to catch tools (and developers) by accident
building the payloads with wacky linker options (like I did).

> use case are constants, I suppose R_X86_64_{8,16} ought
> to also be permitted.) Also, is there a reason why at least
> R_X86_64_PC64 shouldn't also be supported?

It hasn't been implemented. Nor has the situation come up
when this was used.

In the previous round of reviews the feedback was that we should only
list the ones the code base was referencing.

Let me add R_X86_64_PC64 on the TODO list.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/23] xsplice: Implement payload loading (v4)
  2016-02-12 20:48   ` Andrew Cooper
@ 2016-02-19 22:03     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-19 22:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, xen-devel, mpohlack,
	ross.lagerwall, Stefano Stabellini, Jan Beulich, xen-devel,
	sasha.levin

> > +         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> > +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> > +         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
> > +         hdr->e_machine != EM_X86_64 ||
> > +         hdr->e_type != ET_REL ||
> > +         hdr->e_phnum != 0 )
> > +    {
> > +        printk(XENLOG_ERR "%s: Invalid ELF file.\n", elf->name);
> 
> Where possible, please avoid punctuation in error messages.  It's just
> wasted characters on the uart.
> 
> I would also suggest the error message be "xpatch '%s': Invalid ELF
> file\n" to give the observer some clue that we are referring to payload
> attached to a specific xsplice patch.

Isn't elf->name doing that already?
> 
> > +        return -EOPNOTSUPP;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +int xsplice_perform_rel(struct xsplice_elf *elf,
> > +                        struct xsplice_elf_sec *base,
> > +                        struct xsplice_elf_sec *rela)
> > +{
> > +    printk(XENLOG_ERR "%s: SHR_REL relocation unsupported\n", elf->name);
> 
> Similarly here.  All the error messages should have some common
> indication that we are in the xsplice subsystem.

I changed that one to dprintk so it will have it. Let me make an xSplice
prefix part of the naming convention in all the printk's that can show up
in the field.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-19 21:05     ` Konrad Rzeszutek Wilk
@ 2016-02-22 10:17       ` Jan Beulich
  2016-02-22 15:19       ` Ross Lagerwall
  1 sibling, 0 replies; 86+ messages in thread
From: Jan Beulich @ 2016-02-22 10:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, andrew.cooper3, xen-devel, sasha.levin

>>> On 19.02.16 at 22:05, <konrad.wilk@oracle.com> wrote:
> On Mon, Feb 15, 2016 at 01:34:42AM -0700, Jan Beulich wrote:
>> >>> On 12.02.16 at 19:05, <konrad.wilk@oracle.com> wrote:
>> > --- a/xen/include/xen/elfstructs.h
>> > +++ b/xen/include/xen/elfstructs.h
>> > @@ -348,6 +348,14 @@ typedef struct {
>> >  #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>> >  #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>> >  
>> > +/* x86-64 relocation types. We list only the ones we implement. */
>> 
>> "we implement" is too vague for my taste: This comment should
>> have some kind of reference to xSplice.
> 
> 
> /* x86-64 relocation types. We list only the ones xSplice implements. */
> 
> ?
> 
>> 
>> > +#define R_X86_64_NONE		0	/* No reloc */
>> > +#define R_X86_64_64		1	/* Direct 64 bit  */
>> > +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
>> > +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
>> > +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
>> > +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
>> 
>> Is there really a use case for the last two in the hypervisor
>> (which doesn't live in the top 2G of address space)? (If the
> 
> No. But they are there to catch tools (and developers) by accident
> building the payloads with wacky linker options (like I did).

Since you will need to refuse any unknown ones anyway, I don't
see a reason to name some unsupported one but not others.

Jan

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2)
  2016-02-12 18:05 ` [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2) Konrad Rzeszutek Wilk
@ 2016-02-22 14:57   ` Ross Lagerwall
  0 siblings, 0 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-22 14:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, sasha.levin, jinsong.liu, Keir Fraser, Jan Beulich,
	xen-devel

On 02/12/2016 06:05 PM, Konrad Rzeszutek Wilk wrote:
snip
>   static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
>   {
>       struct xsplice_elf elf;
> @@ -831,6 +953,10 @@ static int load_payload_data(struct payload *payload, uint8_t *raw, ssize_t len)
>       if ( rc )
>           goto err_payload;
>
> +    rc = build_symbol_table(payload, &elf);
> +    if ( rc )
> +        goto err_payload;
> +
>       rc = find_special_sections(payload, &elf);
>       if ( rc )
>           goto err_payload;
> @@ -1234,6 +1360,31 @@ unsigned long search_module_extables(unsigned long addr)
>   }
>   #endif
>

build_symbol_table() needs to go after find_special_sections() because 
it uses payload->nfuncs which is only calculated in 
find_special_sections(). Why did you reorder it from how I did it?

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-12 18:05 ` [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5) Konrad Rzeszutek Wilk
  2016-02-16 19:11   ` Andrew Cooper
@ 2016-02-22 15:00   ` Ross Lagerwall
  2016-02-22 17:06     ` Ross Lagerwall
  2016-02-23 20:43     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-22 15:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, sasha.levin, jinsong.liu, Ian Campbell,
	Stefano Stabellini, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian, xen-devel

On 02/12/2016 06:05 PM, Konrad Rzeszutek Wilk wrote:
snip
> +static void xsplice_do_single(unsigned int total_cpus)
> +{
> +    nmi_callback_t saved_nmi_callback;
> +    struct payload *data, *tmp;
> +    s_time_t timeout;
> +    int rc;
> +
> +    data = xsplice_work.data;
> +    timeout = xsplice_work.timeout + NOW();
> +    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
> +                         "Timed out on CPU semaphore") )
> +        return;
> +
> +    /* "Mask" NMIs. */
> +    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> +
> +    /* All CPUs are waiting, now signal to disable IRQs. */
> +    xsplice_work.ready = 1;
> +    smp_wmb();
> +
> +    atomic_inc(&xsplice_work.irq_semaphore);
> +    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
> +                         "Timed out on IRQ semaphore.") )
> +        return;
> +
> +    local_irq_disable();
> +    /* Now this function should be the only one on any stack.
> +     * No need to lock the payload list or applied list. */
> +    switch ( xsplice_work.cmd )
> +    {
> +    case XSPLICE_ACTION_APPLY:
> +        rc = apply_payload(data);
> +        if ( rc == 0 )
> +            data->state = XSPLICE_STATE_APPLIED;
> +        break;
> +    case XSPLICE_ACTION_REVERT:
> +        rc = revert_payload(data);
> +        if ( rc == 0 )
> +            data->state = XSPLICE_STATE_CHECKED;
> +        break;
> +    case XSPLICE_ACTION_REPLACE:
> +        list_for_each_entry_safe_reverse ( data, tmp, &applied_list, list )
> +        {
> +            data->rc = revert_payload(data);
> +            if ( data->rc == 0 )
> +                data->state = XSPLICE_STATE_CHECKED;
> +            else
> +            {
> +                rc = -EINVAL;
> +                break;
> +            }
> +        }

You're using data as a loop iterator here but the variable serves 
another purpose outside the loop. That's not gonna end well.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/23] elf: Add relocation types to elfstructs.h
  2016-02-19 21:05     ` Konrad Rzeszutek Wilk
  2016-02-22 10:17       ` Jan Beulich
@ 2016-02-22 15:19       ` Ross Lagerwall
  1 sibling, 0 replies; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-22 15:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, andrew.cooper3, xen-devel, sasha.levin

On 02/19/2016 09:05 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 15, 2016 at 01:34:42AM -0700, Jan Beulich wrote:
>>>>> On 12.02.16 at 19:05, <konrad.wilk@oracle.com> wrote:
>>> --- a/xen/include/xen/elfstructs.h
>>> +++ b/xen/include/xen/elfstructs.h
>>> @@ -348,6 +348,14 @@ typedef struct {
>>>   #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>>>   #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>>>
>>> +/* x86-64 relocation types. We list only the ones we implement. */
>>
>> "we implement" is too vague for my taste: This comment should
>> have some kind of reference to xSplice.
>
>
> /* x86-64 relocation types. We list only the ones xSplice implements. */
>
> ?
>
>>
>>> +#define R_X86_64_NONE		0	/* No reloc */
>>> +#define R_X86_64_64		1	/* Direct 64 bit  */
>>> +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
>>> +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
>>> +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
>>> +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
>>
>> Is there really a use case for the last two in the hypervisor
>> (which doesn't live in the top 2G of address space)? (If the
>
> No. But they are there to catch tools (and developers) by accident
> building the payloads with wacky linker options (like I did).
>
>> use case are constants, I suppose R_X86_64_{8,16} ought
>> to also be permitted.) Also, is there a reason why at least
>> R_X86_64_PC64 shouldn't also be supported?
>
> It hasn't been implemented. Nor has the situation come up
> when this was used.
>
> In the previous round of reviews the feedback was that we should only
> list the ones the code base was referencing.
>
> Let me add R_X86_64_PC64 on the TODO list.
>>

From my testing, GCC generates R_X86_64_64, R_X86_64_PC32, and
R_X86_64_PLT32 relocations so those are the ones we need initially. More 
can be added later as needed, I suppose.
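
A rough sketch of what "handle the ones we need, refuse everything else"
could look like. The function name check_reloc_type() and the choice of
-EOPNOTSUPP are assumptions for illustration; the constants are the ones
from the patch under discussion.

```c
#include <errno.h>
#include <stdint.h>

#define R_X86_64_NONE   0   /* No reloc */
#define R_X86_64_64     1   /* Direct 64 bit */
#define R_X86_64_PC32   2   /* PC relative 32 bit signed */
#define R_X86_64_PLT32  4   /* 32 bit PLT address */

#define ELF64_R_TYPE(info) ((info) & 0xFFFFFFFF)

/* Hypothetical check: 0 if the relocation type is handled, else an error. */
static int check_reloc_type(uint64_t r_info)
{
    switch ( ELF64_R_TYPE(r_info) )
    {
    case R_X86_64_NONE:
    case R_X86_64_64:
    case R_X86_64_PC32:
    case R_X86_64_PLT32:
        return 0;

    default:
        /* Anything not listed is refused, whether or not it has a name. */
        return -EOPNOTSUPP;
    }
}
```

Since the default case already rejects every unlisted type, types such as
R_X86_64_32 and R_X86_64_32S are refused without needing their own
definitions.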

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-22 15:00   ` Ross Lagerwall
@ 2016-02-22 17:06     ` Ross Lagerwall
  2016-02-23 20:47       ` Konrad Rzeszutek Wilk
  2016-02-23 20:43     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 86+ messages in thread
From: Ross Lagerwall @ 2016-02-22 17:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On 02/22/2016 03:00 PM, Ross Lagerwall wrote:
> On 02/12/2016 06:05 PM, Konrad Rzeszutek Wilk wrote:
> snip
>> +static void xsplice_do_single(unsigned int total_cpus)
>> +{
>> +    nmi_callback_t saved_nmi_callback;
>> +    struct payload *data, *tmp;
>> +    s_time_t timeout;
>> +    int rc;
>> +
>> +    data = xsplice_work.data;
>> +    timeout = xsplice_work.timeout + NOW();
>> +    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
>> +                         "Timed out on CPU semaphore") )
>> +        return;
>> +
>> +    /* "Mask" NMIs. */
>> +    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
>> +
>> +    /* All CPUs are waiting, now signal to disable IRQs. */
>> +    xsplice_work.ready = 1;
>> +    smp_wmb();
>> +
>> +    atomic_inc(&xsplice_work.irq_semaphore);
>> +    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout,
>> total_cpus,
>> +                         "Timed out on IRQ semaphore.") )
>> +        return;
>> +
>> +    local_irq_disable();
>> +    /* Now this function should be the only one on any stack.
>> +     * No need to lock the payload list or applied list. */
>> +    switch ( xsplice_work.cmd )
>> +    {
>> +    case XSPLICE_ACTION_APPLY:
>> +        rc = apply_payload(data);
>> +        if ( rc == 0 )
>> +            data->state = XSPLICE_STATE_APPLIED;
>> +        break;
>> +    case XSPLICE_ACTION_REVERT:
>> +        rc = revert_payload(data);
>> +        if ( rc == 0 )
>> +            data->state = XSPLICE_STATE_CHECKED;
>> +        break;
>> +    case XSPLICE_ACTION_REPLACE:
>> +        list_for_each_entry_safe_reverse ( data, tmp, &applied_list,
>> list )
>> +        {
>> +            data->rc = revert_payload(data);
>> +            if ( data->rc == 0 )
>> +                data->state = XSPLICE_STATE_CHECKED;
>> +            else
>> +            {
>> +                rc = -EINVAL;
>> +                break;
>> +            }
>> +        }
>
> You're using data as a loop iterator here but the variable serves
> another purpose outside the loop. That's not gonna end well.
>

Also above, you've got:
list_for_each_entry_safe_reverse ( data, tmp, &applied_list, list )

but it needs to be:
list_for_each_entry_safe_reverse ( data, tmp, &applied_list, applied_list )

I'm not sure why this was changed from how I had it...

rc is also used uninitialized in the replace path.
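
The three points above (a separate loop cursor, walking the applied list,
and an initialized rc) can be sketched stand-alone. This is only an
illustration under assumptions: struct payload is cut down, a plain array in
apply order stands in for Xen's list_head-based applied_list, revert_payload()
is a stub, and replace_revert_all() is a made-up name, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum state { STATE_CHECKED, STATE_APPLIED };

/* Cut-down, hypothetical stand-in for the patch's struct payload. */
struct payload {
    enum state state;
    int rc;
    int fail_revert;   /* test knob: make the stubbed revert fail */
};

/* Stub: the real revert_payload() undoes the function patching. */
static int revert_payload(struct payload *p)
{
    return p->fail_revert ? -EINVAL : 0;
}

/*
 * Revert every applied payload, newest first. 'applied' is in apply order.
 * Returns 0 on success, or -EINVAL at the first payload that fails.
 */
static int replace_revert_all(struct payload **applied, size_t nr)
{
    int rc = 0;                      /* initialized: the list may be empty */
    size_t i;

    for ( i = nr; i-- > 0; )
    {
        struct payload *other = applied[i];   /* separate cursor, not 'data' */

        other->rc = revert_payload(other);
        if ( other->rc != 0 )
        {
            rc = -EINVAL;
            break;
        }
        other->state = STATE_CHECKED;
    }

    return rc;
}
```

With rc starting at 0, an empty applied list falls through to the apply step
with a defined value, and 'data' still points at the payload that is meant to
replace the others.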

-- 
Ross Lagerwall


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-16 19:11   ` Andrew Cooper
                       ` (2 preceding siblings ...)
  2016-02-19  9:30     ` Ross Lagerwall
@ 2016-02-23 20:41     ` Konrad Rzeszutek Wilk
  2016-02-23 20:53       ` Konrad Rzeszutek Wilk
                         ` (2 more replies)
  3 siblings, 3 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-23 20:41 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, ross.lagerwall, Stefano Stabellini,
	Aravind Gopalakrishnan, Jan Beulich, xen-devel, Boris Ostrovsky,
	Suravee Suthikulpanit, sasha.levin

. snip..
> > + * Note that because of this NOP code the do_nmi is not safely patchable.
> > + * Also if we do receive 'real' NMIs we have lost them.
> 
> The MCE path needs consideration as well.  Unlike the NMI path however,
> that one cannot be ignored.
> 
> In both cases, it might be best to see about raising a tasklet or
> softirq to pick up some deferred work.

I will put that in a separate patch as this patch is big enough.

> 
> > + */
> > +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> > +{
> > +    return 1;
> > +}
> > +
> > +static void reschedule_fn(void *unused)
> > +{
> > +    smp_mb(); /* Synchronize with setting do_work */
> > +    raise_softirq(SCHEDULE_SOFTIRQ);
> 
> As you have to IPI each processor to raise a schedule softirq, you can
> set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
> need for the return-to-guest path to poll one single byte.

.. Not sure I follow. The IPI we send to the other CPU is 0xfb - which
makes the smp_call_function_interrupt run, which calls this function:
reschedule_fn(). Then raise_softirq sets the bit on softirq_pending.

Great. Since we caused an IPI, that means we ended up taking a VMEXIT, which
eventually ends up calling process_pending_softirqs(), which calls schedule().
And after that it calls check_for_xsplice_work().

Are you suggesting adding a new softirq that would call check_for_xsplice_work()?

Or are you suggesting to skip the softirq_pending check and all the
code around that and instead have each VMEXIT code path check this
per-cpu "xsplice enter" variable? If so, why not use the existing
softirq infrastructure? 

.. snip..
> 
> > +}
> > +
> > +void do_xsplice(void)
> > +{
> > +    struct payload *p = xsplice_work.data;
> > +    unsigned int cpu = smp_processor_id();
> > +
> > +    /* Fast path: no work to do. */
> > +    if ( likely(!xsplice_work.do_work) )
> > +        return;
> > +    ASSERT(local_irq_is_enabled());
> > +
> > +    /* Set at -1, so will go up to num_online_cpus - 1 */
> > +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> > +    {
> > +        unsigned int total_cpus;
> > +
> > +        if ( !get_cpu_maps() )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
> > +                   p->name, cpu);
> > +            xsplice_work.data->rc = -EBUSY;
> > +            xsplice_work.do_work = 0;
> > +            return;
> 
> This error path leaves a ref in the semaphore.

It does. And it also does so in xsplice_do_single() - if the xsplice_do_wait()
fails, 
> 
> > +        }
> > +
> > +        barrier(); /* MUST do it after get_cpu_maps. */
> > +        total_cpus = num_online_cpus() - 1;
> > +
> > +        if ( total_cpus )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
> > +                   cpu, total_cpus);
> > +            smp_call_function(reschedule_fn, NULL, 0);
> > +        }
> > +        (void)xsplice_do_single(total_cpus);

.. here, we never decrement the semaphore.

Which is a safe-guard (documenting that).

The issue here is that say we have two CPUs:

CPU0				CPU1

semaphore=0			semaphore=1
 !get_cpu_maps()
  do_work = 0;			.. now goes in the 'slave' part below and exits out
                                as do_work=0

Now if we decremented the semaphore back on the error path:

CPU0				CPU1

semaphore=0			
 !get_cpu_maps()
				.. do_work is still set.
  do_work = 0;			
                   
  semaphore=-1
				atomic_inc_and_test(semaphore) == 0
				.. now it assumes the role of a master.

				.. it will fail as the other CPU will never
                                rendezvous (the do_work is set to zero).
				But we waste another 30ms spinning.


The end result is that after patching the semaphore should equal
num_online_cpus-1.
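
The two timelines above can be replayed in a few lines of portable C. This is
a sketch under assumptions, not the hypervisor code: C11 atomics stand in for
Xen's atomic_t, inc_and_test() mimics atomic_inc_and_test(), and
masters_elected() is a hypothetical helper that runs the two CPUs one after
the other instead of concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Equivalent of atomic_inc_and_test(): true iff the new value is 0. */
static bool inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1 == 0;
}

/*
 * Two CPUs arrive one after the other and the first one hits the error
 * path. Returns how many of them ended up believing they are the master.
 */
static int masters_elected(bool decrement_on_error)
{
    atomic_int sem;
    int masters = 0;

    atomic_init(&sem, -1);           /* "Set at -1" as in do_xsplice() */

    if ( inc_and_test(&sem) )        /* CPU0 arrives first: -1 -> 0 */
    {
        masters++;
        if ( decrement_on_error )    /* the variant argued against above */
            atomic_fetch_sub(&sem, 1);
    }
    if ( inc_and_test(&sem) )        /* CPU1 arrives late */
        masters++;

    return masters;
}
```

Leaving the count alone gives one master; undoing the increment lets the late
CPU elect itself as a second master and spin until the timeout.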


> > +
> > +        ASSERT(local_irq_is_enabled());
> > +
> > +        put_cpu_maps();
> > +
> > +        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
> > +    }
> > +    else
> > +    {
> > +        /* Wait for all CPUs to rendezvous. */
> > +        while ( xsplice_work.do_work && !xsplice_work.ready )
> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +
> 
> What happens here if the rendezvous initiator times out?  Looks like we
> will spin forever waiting for do_work which will never drop back to 0.

Ross answered that, but the other code (master) will set do_work to zero so
we will exit this.

> 
> > +        /* Disable IRQs and signal. */
> > +        local_irq_disable();
> > +        atomic_inc(&xsplice_work.irq_semaphore);
> > +
> > +        /* Wait for patching to complete. */
> > +        while ( xsplice_work.do_work )

Ditto for this.
> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +        local_irq_enable();
> 
> Splitting the modification of do_work and ready across multiple
> functions makes it particularly hard to reason about the correctness of
> the rendezvous.  It would be better to have a xsplice_rendezvous()
> function whose purpose was to negotiate the rendezvous only, using local
> static state.  The action can then be just the switch() from
> xsplice_do_single().

The earlier code was like that but it ended up being quite
big. Let me make it happen and leave the actions in xsplice_do_single()
(and rename it to xsplice_do_action()).


> 
> > +    }
> > +}
> > +
> > diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
> > index a60587e..82aff35 100644
> > --- a/xen/include/asm-arm/nmi.h
> > +++ b/xen/include/asm-arm/nmi.h
> > @@ -4,6 +4,19 @@
> >  #define register_guest_nmi_callback(a)  (-ENOSYS)
> >  #define unregister_guest_nmi_callback() (-ENOSYS)
> >  
> > +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
> > +
> > +/**
> > + * set_nmi_callback
> > + *
> > + * Set a handler for an NMI. Only one handler may be
> > + * set. Return the old nmi callback handler.
> > + */
> > +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
> > +{
> > +    return NULL;
> > +}
> > +
> 
> This addition suggests that there should probably be an
> arch_xsplice_prepair_rendezvous() and arch_xsplice_finish_rendezvous().

Yes indeed.
> 
> ~Andrew


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-22 15:00   ` Ross Lagerwall
  2016-02-22 17:06     ` Ross Lagerwall
@ 2016-02-23 20:43     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-23 20:43 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, Stefano Stabellini, Aravind Gopalakrishnan,
	Jan Beulich, andrew.cooper3, xen-devel, Boris Ostrovsky,
	Suravee Suthikulpanit, sasha.levin

On Mon, Feb 22, 2016 at 03:00:31PM +0000, Ross Lagerwall wrote:
> On 02/12/2016 06:05 PM, Konrad Rzeszutek Wilk wrote:
> snip
> >+static void xsplice_do_single(unsigned int total_cpus)
> >+{
> >+    nmi_callback_t saved_nmi_callback;
> >+    struct payload *data, *tmp;
> >+    s_time_t timeout;
> >+    int rc;
> >+
> >+    data = xsplice_work.data;
> >+    timeout = xsplice_work.timeout + NOW();
> >+    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
> >+                         "Timed out on CPU semaphore") )
> >+        return;
> >+
> >+    /* "Mask" NMIs. */
> >+    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> >+
> >+    /* All CPUs are waiting, now signal to disable IRQs. */
> >+    xsplice_work.ready = 1;
> >+    smp_wmb();
> >+
> >+    atomic_inc(&xsplice_work.irq_semaphore);
> >+    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
> >+                         "Timed out on IRQ semaphore.") )
> >+        return;
> >+
> >+    local_irq_disable();
> >+    /* Now this function should be the only one on any stack.
> >+     * No need to lock the payload list or applied list. */
> >+    switch ( xsplice_work.cmd )
> >+    {
> >+    case XSPLICE_ACTION_APPLY:
> >+        rc = apply_payload(data);
> >+        if ( rc == 0 )
> >+            data->state = XSPLICE_STATE_APPLIED;
> >+        break;
> >+    case XSPLICE_ACTION_REVERT:
> >+        rc = revert_payload(data);
> >+        if ( rc == 0 )
> >+            data->state = XSPLICE_STATE_CHECKED;
> >+        break;
> >+    case XSPLICE_ACTION_REPLACE:
> >+        list_for_each_entry_safe_reverse ( data, tmp, &applied_list, list )
> >+        {
> >+            data->rc = revert_payload(data);
> >+            if ( data->rc == 0 )
> >+                data->state = XSPLICE_STATE_CHECKED;
> >+            else
> >+            {
> >+                rc = -EINVAL;
> >+                break;
> >+            }
> >+        }
> 
> You're using data as a loop iterator here but the variable serves another
> purpose outside the loop. That's not gonna end well.

No not at all. I've added another variable: "other" that will be used in the loop.

> 
> -- 
> Ross Lagerwall


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-22 17:06     ` Ross Lagerwall
@ 2016-02-23 20:47       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-23 20:47 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: xen-devel

> 
> Also above, you've got:
> list_for_each_entry_safe_reverse ( data, tmp, &applied_list, list )
> 
> but it needs to be:
> list_for_each_entry_safe_reverse ( data, tmp, &applied_list, applied_list )

Totally missed it.
> 
> I'm not sure why this was changed from how I had it...

I had issues applying the patch so I modified it by hand - which
was of course the wrong thing to do. Let me also document
that we MUST use the 'applied_list' list.
> 
> rc is also used uninitialized in the replace path.
> 
> -- 
> Ross Lagerwall


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-23 20:41     ` Konrad Rzeszutek Wilk
@ 2016-02-23 20:53       ` Konrad Rzeszutek Wilk
  2016-02-23 20:57       ` Konrad Rzeszutek Wilk
  2016-02-23 21:10       ` Andrew Cooper
  2 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-23 20:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, ross.lagerwall, Stefano Stabellini,
	Aravind Gopalakrishnan, Jan Beulich, xen-devel, Boris Ostrovsky,
	Suravee Suthikulpanit, sasha.levin

On Tue, Feb 23, 2016 at 03:41:57PM -0500, Konrad Rzeszutek Wilk wrote:
> .. snip..
> > > + * Note that because of this NOP code the do_nmi is not safely patchable.
> > > + * Also if we do receive 'real' NMIs we have lost them.
> > 
> > The MCE path needs consideration as well.  Unlike the NMI path however,
> > that one cannot be ignored.
> > 
> > In both cases, it might be best to see about raising a tasklet or
> > softirq to pick up some deferred work.
> 
> I will put that in a seperate patch as this is patch is big enough.
> 

.. which will also fix the alternative_asm() usage of it.


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-23 20:41     ` Konrad Rzeszutek Wilk
  2016-02-23 20:53       ` Konrad Rzeszutek Wilk
@ 2016-02-23 20:57       ` Konrad Rzeszutek Wilk
  2016-02-23 21:10       ` Andrew Cooper
  2 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-23 20:57 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, ross.lagerwall, Stefano Stabellini,
	Aravind Gopalakrishnan, Jan Beulich, xen-devel, Boris Ostrovsky,
	Suravee Suthikulpanit, sasha.levin

> > > +static void reschedule_fn(void *unused)
> > > +{
> > > +    smp_mb(); /* Synchronize with setting do_work */
> > > +    raise_softirq(SCHEDULE_SOFTIRQ);
> > 
> > As you have to IPI each processor to raise a schedule softirq, you can
> > set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
> > need for the return-to-guest path to poll one single byte.
> 
> .. Not sure I follow. The IPI we send to the other CPU is 0xfb - which
> makes the smp_call_function_interrupt run, which calls this function:
> reschedule_fn(). Then raise_softirq sets the bit on softirq_pending.
> 
> Great. Since we caused an IPI that means we ended up calling VMEXIT which
> eventually ends calling process_pending_softirqs() which calls schedule().
> And after that it calls check_for_xsplice_work().
> 
> Are you suggesting to add new softirq that would call in check_for_xsplice_work()?
> 
> Or are you suggesting to skip the softirq_pending check and all the
> code around that and instead have each VMEXIT code path check this
> per-cpu "xsplice enter" variable? If so, why not use the existing
> softirq infrastructure? 

N/m.

You were referring to the:

> > > +void do_xsplice(void)
..
> > > +    /* Fast path: no work to do. */
> > > +    if ( likely(!xsplice_work.do_work) )
> > > +        return;

which every CPU is going to do when it calls idle_loop, svm_do_resume,
and vmx_do_resume.

Let me add that in!


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-23 20:41     ` Konrad Rzeszutek Wilk
  2016-02-23 20:53       ` Konrad Rzeszutek Wilk
  2016-02-23 20:57       ` Konrad Rzeszutek Wilk
@ 2016-02-23 21:10       ` Andrew Cooper
  2016-02-24  9:31         ` Jan Beulich
  2 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-23 21:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, ross.lagerwall, Stefano Stabellini,
	Aravind Gopalakrishnan, Jan Beulich, xen-devel, Boris Ostrovsky,
	Suravee Suthikulpanit, sasha.levin

On 23/02/2016 20:41, Konrad Rzeszutek Wilk wrote:
> . snip..
>>> + * Note that because of this NOP code the do_nmi is not safely patchable.
>>> + * Also if we do receive 'real' NMIs we have lost them.
>> The MCE path needs consideration as well.  Unlike the NMI path however,
>> that one cannot be ignored.
>>
>> In both cases, it might be best to see about raising a tasklet or
>> softirq to pick up some deferred work.
> I will put that in a seperate patch as this is patch is big enough.

Actually, after subsequent thought, raising a tasklet won't help.

The biggest risk is the SMAP alternative in the asm entrypoints.  The
only way patching that can be made safe is to play fun and games with
debug traps. i.e.

Patch a 0xcc first
Then patch the rest of the bytes in the replacement
Then replace the 0xcc with the first byte of the replacement

This way if the codepath is hit while patching is in progress, you will
end up in the debug trap handler rather than executing junk.  There then
has to be some scheduling games for the NMI/MCE handler to take over
patching the code if it interrupted the patching pcpu.  Patching in
principle is a short operation, so performing it in the handlers is not too
much of a problem.
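
The three-step write order listed above can be sketched on an ordinary byte
buffer. This is an illustration under assumptions: patch_with_int3() is a
made-up name, the buffer is plain memory rather than live hypervisor text,
and the trap handler plus the serialisation a real implementation needs are
left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xcc   /* x86 one-byte breakpoint opcode */

/*
 * Overwrite 'len' bytes at 'dst' with 'repl' in the order described above.
 * 'dst' is volatile so the compiler keeps all three steps as real stores.
 */
static void patch_with_int3(volatile uint8_t *dst, const uint8_t *repl,
                            size_t len)
{
    size_t i;

    assert(len > 0);

    dst[0] = INT3;                  /* 1: trap anyone who enters */
    for ( i = 1; i < len; i++ )     /* 2: fill in the rest of the bytes */
        dst[i] = repl[i];
    dst[0] = repl[0];               /* 3: swap the trap for the first byte */
}
```

Between steps 1 and 3 a CPU that reaches the start of the region takes the
breakpoint instead of decoding a half-written instruction.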

The tricky part is patching the top of the debug trap handler and not
ending in an infinite loop.  I have a cunning idea, and will see if I
can find some copious free time to experiment with.

For v1 however, the implementation is fine.  It can be documented that
patching functions on the NMI/MCE path is liable to end in sadness, and
the asm entry points will have been taken care of during boot.

>
>>> + */
>>> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>> +{
>>> +    return 1;
>>> +}
>>> +
>>> +static void reschedule_fn(void *unused)
>>> +{
>>> +    smp_mb(); /* Synchronize with setting do_work */
>>> +    raise_softirq(SCHEDULE_SOFTIRQ);
>> As you have to IPI each processor to raise a schedule softirq, you can
>> set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
>> need for the return-to-guest path to poll one single byte.
> .. Not sure I follow. The IPI we send to the other CPU is 0xfb - which
> makes the smp_call_function_interrupt run, which calls this function:
> reschedule_fn(). Then raise_softirq sets the bit on softirq_pending.

Correct

>
> Great. Since we caused an IPI that means we ended up calling VMEXIT which
> eventually ends calling process_pending_softirqs() which calls schedule().
> And after that it calls check_for_xsplice_work().

Correct

> Are you suggesting to add new softirq that would call in check_for_xsplice_work()?

No.  I am concerned that check_for_xsplice_work() is reading a single
global variable which, when patching is occurring, will be in a
repeatedly-dirtied cacheline.

> Or are you suggesting to skip the softirq_pending check and all the
> code around that and instead have each VMEXIT code path check this
> per-cpu "xsplice enter" variable? If so, why not use the existing
> softirq infrastructure? 

What I am suggesting is having reschedule_fn() set
this_cpu(xsplice_work_pending) = 1 and have check_for_xsplice_work()
check this_cpu(xsplice_work_pending) rather than the global semaphore.

This should ease the impact on the cache coherency fabric.
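
A minimal sketch of that suggestion, assuming a plain array indexed by CPU
number in place of Xen's DEFINE_PER_CPU()/this_cpu() machinery. The 'cpu'
parameters and NR_CPUS value are artifacts of the stand-alone form, and the
raise_softirq() call is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* arbitrary for the sketch */

/* Hypothetical per-CPU flag; real per-CPU data lives in separate areas. */
static bool xsplice_work_pending[NR_CPUS];

/* Runs on 'cpu' from the IPI: mark work pending for that CPU only. */
static void reschedule_fn(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    xsplice_work_pending[cpu] = true;
}

/* Fast-path check: reads only this CPU's flag and consumes it. */
static bool check_for_xsplice_work(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if ( !xsplice_work_pending[cpu] )
        return false;

    xsplice_work_pending[cpu] = false;
    return true;
}
```

The common case then touches only CPU-local state; the shared semaphore is
read only by CPUs that were actually asked to rendezvous.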

>
> .. snip..
>>> +}
>>> +
>>> +void do_xsplice(void)
>>> +{
>>> +    struct payload *p = xsplice_work.data;
>>> +    unsigned int cpu = smp_processor_id();
>>> +
>>> +    /* Fast path: no work to do. */
>>> +    if ( likely(!xsplice_work.do_work) )
>>> +        return;
>>> +    ASSERT(local_irq_is_enabled());
>>> +
>>> +    /* Set at -1, so will go up to num_online_cpus - 1 */
>>> +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
>>> +    {
>>> +        unsigned int total_cpus;
>>> +
>>> +        if ( !get_cpu_maps() )
>>> +        {
>>> +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
>>> +                   p->name, cpu);
>>> +            xsplice_work.data->rc = -EBUSY;
>>> +            xsplice_work.do_work = 0;
>>> +            return;
>> This error path leaves a ref in the semaphore.
> It does. And it also does so in xsplice_do_single() - if the xsplice_do_wait()
> fails, 
>>> +        }
>>> +
>>> +        barrier(); /* MUST do it after get_cpu_maps. */
>>> +        total_cpus = num_online_cpus() - 1;
>>> +
>>> +        if ( total_cpus )
>>> +        {
>>> +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
>>> +                   cpu, total_cpus);
>>> +            smp_call_function(reschedule_fn, NULL, 0);
>>> +        }
>>> +        (void)xsplice_do_single(total_cpus);
> .. here, we never decrement the semaphore.
>
> Which is a safe-guard (documenting that).
>
> The issue here is that say we have two CPUs:
>
> CPU0				CPU1
>
> semaphore=0			semaphore=1
>  !get_cpu_maps()
>   do_work = 0;			.. now goes in the 'slave' part below and exits out
>                                 as do_work=0
>
> Now if we decremented the semaphore back on the error path:
>
> CPU0				CPU1
>
> semaphore=0			
>  !get_cpu_maps()
> 				.. do_work is still set.
>   do_work = 0;			
>                    
>   semaphore=-1
> 				atomic_inc_and_test(semaphore) == 0
> 				.. now it assumes the role of a master.
>
> 				.. it will fail as the other CPU will never
>                                 renezvous (the do_work is set to zero).
> 				But we waste another 30ms spinning.
>
>
> The end result is that after patching the semaphore should equal
> num_online_cpus-1.

Yay concurrency!  I am going to have to consider this more closely.

~Andrew


* Re: [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
  2016-02-23 21:10       ` Andrew Cooper
@ 2016-02-24  9:31         ` Jan Beulich
  0 siblings, 0 replies; 86+ messages in thread
From: Jan Beulich @ 2016-02-24  9:31 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Jun Nakajima, jinsong.liu,
	xen-devel, mpohlack, ross.lagerwall, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, xen-devel, Stefano Stabellini,
	Boris Ostrovsky, sasha.levin

>>> On 23.02.16 at 22:10, <andrew.cooper3@citrix.com> wrote:
> On 23/02/2016 20:41, Konrad Rzeszutek Wilk wrote:
>> . snip..
>>>> + * Note that because of this NOP code the do_nmi is not safely patchable.
>>>> + * Also if we do receive 'real' NMIs we have lost them.
>>> The MCE path needs consideration as well.  Unlike the NMI path however,
>>> that one cannot be ignored.
>>>
>>> In both cases, it might be best to see about raising a tasklet or
>>> softirq to pick up some deferred work.
>> I will put that in a seperate patch as this is patch is big enough.
> 
> Actually, after subsequent thought, raising a tasklet wont help.
> 
> The biggest risk is the SMAP alternative in the asm entrypoints.  The
> only way patching that can be made safe is to play fun and games with
> debug traps. i.e.
> 
> Patch a 0xcc first
> Then patch the rest of the bytes in the replacement
> Then replace the 0xcc with the first byte of the replacement
> 
> This way if the codepath is hit while patching is in progress, you will
> end up in the debug trap handler rather than executing junk.  There then
> has to be some scheduling games for the NMI/MCE handler to take over
> patching the code if it interrupted the patching pcpu.  Patching in
> principle is a short operation, so performing it the handlers is not too
> much of a problem.
> 
> The tricky part is patching the top of the debug trap handler and not
> ending in an infinite loop.  I have a cunning idea, and will see if I
> can find some copious free time to experiment with.

For the SMAP patching this isn't the only way to make it safe, but
the alternative isn't suitable for xSplice: As long as the original
code is just NOPs, and as long as the starting address is aligned,
the first two bytes could become a short branch instead, patching
would fiddle with everything except the first two bytes first, and
as its last action (suitably fenced and maybe even cache flushed)
would atomically overwrite the first two bytes.
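
One reading of the scheme above, sketched on a plain buffer. Assumptions:
patch_behind_branch() and store_head() are made-up names, the may_alias
typedef and __atomic builtins are GCC/Clang extensions, the region is at most
129 bytes so a rel8 jump can skip its tail, and the sketch installs the short
branch itself (if it were already there from build time, step 1 would simply
rewrite the same bytes). The cache flush mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: a 16-bit type allowed to alias the byte buffer. */
typedef uint16_t __attribute__((may_alias)) u16_alias;

/* Store two bytes with a single aligned 16-bit write. */
static void store_head(uint8_t *dst, const uint8_t two[2])
{
    uint16_t head;

    memcpy(&head, two, sizeof(head));          /* endian-neutral packing */
    __atomic_store_n((u16_alias *)dst, head, __ATOMIC_RELEASE);
}

static void patch_behind_branch(uint8_t *dst, const uint8_t *repl, size_t len)
{
    /* jmp rel8 that skips the remaining len - 2 bytes of the region. */
    const uint8_t jmp[2] = { 0xeb, (uint8_t)(len - 2) };

    assert(len >= 2 && len - 2 <= 127);
    assert(((uintptr_t)dst & 1) == 0);         /* head must be aligned */

    store_head(dst, jmp);                      /* 1: branch over the tail */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    memcpy(dst + 2, repl + 2, len - 2);        /* 2: patch the tail */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    store_head(dst, repl);                     /* 3: publish the head last */
}
```

At every point a CPU entering at the first byte sees either the short branch
or the finished replacement, never a mix of the two.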

Jan


* Re: [PATCH v3 09/23] xsplice: Add support for bug frames. (v4)
  2016-02-16 19:35   ` Andrew Cooper
@ 2016-02-24 16:22     ` Konrad Rzeszutek Wilk
  2016-02-24 16:30       ` Andrew Cooper
  2016-02-24 16:26     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-24 16:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, xen-devel, mpohlack,
	ross.lagerwall, Stefano Stabellini, Jan Beulich, xen-devel,
	sasha.levin

. snip..
> There is a neater way of doing this, which doesn't involve having "if (
> regular ) else if ( xsplice )" logic chains through the code.

s/chains/chain/

There is only one that uses the 'xsplice' name in it:-)

The other two are wrapped with the 'is_patch'.
> 
> Given a
> 
> struct virtual_region
> {
>     struct list_head list;
>     unsigned long start, size;
> 
>     struct bug_frame *foo;
>     struct exception_table_entry *bar;
> };
> 
> The init code can construct one for the base hypervisor, and xsplice can
> add or remove entries from the list.  Then, the trap routines search the
> virtual region list for [start, size) and follow the appropriate pointers.

You are suggesting that on bootup we parse the __stop_bug_frames_[0-3]
(different on ARM), and create a linked list to contain those.

Then xSplice can call into this API to add its own - and on unload it can
unlink them and free them.

If my understanding is correct - while it is certainly much nicer, it has drawbacks:
 - Increases the code to now handle the linked list and all the code around it
   (and correspondingly we may now have some extra bugs to track).
 - Bigger memory consumption - we now have to consume memory for this list - even
   for the built-in ones.
 - More code to do for v1 of this patchset.

Can we perhaps make this a lesser priority and keep the existing
if ( .. ) else if (xsplice_find_bug()..) code construct for Xen 4.7?

Thanks.


* Re: [PATCH v3 09/23] xsplice: Add support for bug frames. (v4)
  2016-02-16 19:35   ` Andrew Cooper
  2016-02-24 16:22     ` Konrad Rzeszutek Wilk
@ 2016-02-24 16:26     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-24 16:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, xen-devel, mpohlack,
	ross.lagerwall, Stefano Stabellini, Jan Beulich, xen-devel,
	sasha.levin

On Tue, Feb 16, 2016 at 07:35:32PM +0000, Andrew Cooper wrote:
> On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> > diff --git a/xen/common/symbols.c b/xen/common/symbols.c
> > index a59c59d..bf5623f 100644
> > --- a/xen/common/symbols.c
> > +++ b/xen/common/symbols.c
> > @@ -17,6 +17,7 @@
> >  #include <xen/lib.h>
> >  #include <xen/string.h>
> >  #include <xen/spinlock.h>
> > +#include <xen/xsplice.h>
> >  #include <public/platform.h>
> >  #include <xen/guest_access.h>
> >  
> > @@ -101,6 +102,12 @@ bool_t is_active_kernel_text(unsigned long addr)
> >              (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
> >  }
> >  
> > +bool_t is_active_text(unsigned long addr)
> > +{
> > +    return is_active_kernel_text(addr) ||
> > +           is_active_module_text(addr);
> > +}
> 
> This would be better as a static inline in a header file, to avoid a
> call into a separate translation unit.

I stuck it in kernel.h, as so, would that work for you?

diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
index 548b64d..1e8ed68 100644
--- a/xen/include/xen/kernel.h
+++ b/xen/include/xen/kernel.h
@@ -100,5 +100,20 @@ extern enum system_state {
 
 bool_t is_active_kernel_text(unsigned long addr);
 
+#ifdef CONFIG_XSPLICE
+#include <xen/xsplice.h>
+
+static bool_t is_active_text(unsigned long addr)
+{
+    return is_active_kernel_text(addr) ||
+           is_active_patch_text(addr);
+}
+#else
+static bool_t is_active_text(unsigned long addr)
+{
+    return is_active_kernel_text(addr);
+}
+#endif
+
 #endif /* _LINUX_KERNEL_H */


* Re: [PATCH v3 09/23] xsplice: Add support for bug frames. (v4)
  2016-02-24 16:22     ` Konrad Rzeszutek Wilk
@ 2016-02-24 16:30       ` Andrew Cooper
  0 siblings, 0 replies; 86+ messages in thread
From: Andrew Cooper @ 2016-02-24 16:30 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, xen-devel, mpohlack,
	ross.lagerwall, Stefano Stabellini, Jan Beulich, xen-devel,
	sasha.levin

On 24/02/16 16:22, Konrad Rzeszutek Wilk wrote:
> . snip..
>> There is a neater way of doing this, which doesn't involve having "if (
>> regular ) else if ( xsplice )" logic chains through the code.
> s/chains/chain/
>
> There is only one that uses the 'xsplice' name in it:-)
>
> The other two are wrapped with the 'is_patch'.
>> Given a
>>
>> struct virtual_region
>> {
>>     struct list_head list;
>>     unsigned long start, size;
>>
>>     struct bug_frame *foo;
>>     struct exception_table_entry *bar;
>> };
>>
>> The init code can construct one for the base hypervisor, and xsplice can
>> add or remove entries from the list.  Then, the trap routines search the
>> virtual region list for [start, size) and follow the appropriate pointers.
> You are suggesting that on bootup we parse the the __stop_bug_frames_[0-3]
> (different on ARM), and create an linked list to contain those.

Can probably manage this at compile time for the builtin ones.

>
> Then xSplice can call in this API to add their own - and on unload it can
> unlink them and free them.
>
> If m understanding is correct - while it is certainly much nicer, it has drawbacks:
>  - Increases the code to now handle the linked list and all the code around it
>    (And correspondingly we may have now some extra bugs to track).
>  - Bigger memory consumption - we now to have to consume memory for this list - even
>    for the built-in ones.

I mean having one struct virtual_region for Xen itself (perhaps two; one
for .text and one for .init which gets removed later), and one extra for
each xpatch.

The extra memory consumption is 6 pointers, traded against far cleaner
logic for bugframes and extable redirections.
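
A minimal sketch of the lookup being proposed, under assumptions: the list is
singly linked for brevity (the struct quoted above uses struct list_head),
the foo/bar placeholders are given descriptive names, and
register_virtual_region(), unregister_virtual_region() and
find_virtual_region() are hypothetical names rather than an existing Xen API.

```c
#include <assert.h>
#include <stddef.h>

struct bug_frame;                 /* opaque in this sketch */
struct exception_table_entry;     /* opaque in this sketch */

struct virtual_region {
    struct virtual_region *next;
    unsigned long start, size;
    const struct bug_frame *bug_frames;
    const struct exception_table_entry *ex_table;
};

static struct virtual_region *region_list;

/* Called once for the base hypervisor and once per loaded patch. */
static void register_virtual_region(struct virtual_region *r)
{
    assert(r != NULL);
    r->next = region_list;
    region_list = r;
}

/* Called when a patch is unloaded. */
static void unregister_virtual_region(struct virtual_region *r)
{
    struct virtual_region **pp;

    for ( pp = &region_list; *pp; pp = &(*pp)->next )
        if ( *pp == r )
        {
            *pp = r->next;
            return;
        }
}

/* Return the region with start <= addr < start + size, or NULL. */
static const struct virtual_region *find_virtual_region(unsigned long addr)
{
    const struct virtual_region *r;

    for ( r = region_list; r; r = r->next )
        if ( addr - r->start < r->size )   /* unsigned wrap rejects addr < start */
            return r;

    return NULL;
}
```

A trap handler would call find_virtual_region() with the faulting address and
then search only that region's bug frames or exception table, with no special
case for patches.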

~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-16 20:09   ` Andrew Cooper
  2016-02-16 20:22     ` Konrad Rzeszutek Wilk
@ 2016-02-24 18:52     ` Konrad Rzeszutek Wilk
  2016-02-24 19:13       ` Andrew Cooper
  1 sibling, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-24 18:52 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

> > diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> > index f501a2f..5cf180f 100644
> > --- a/xen/arch/arm/xen.lds.S
> > +++ b/xen/arch/arm/xen.lds.S
> > @@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
> >  PHDRS
> >  {
> >    text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
> > +#if defined(BUILD_ID)
> > +  note PT_NOTE ;
> > +#endif
> >  }
> >  SECTIONS
> >  {
> > @@ -53,6 +56,16 @@ SECTIONS
> >          _erodata = .;          /* End of read-only data */
> >    } :text
> >  
> > +#if defined(BUILD_ID)
> > +  .note : {
> > +       __note_gnu_build_id_start = .;
> > +       *(.note.gnu.build-id)
> > +       __note_gnu_build_id_end = .;
> > +       *(.note)
> > +       *(.note.*)
> > +  } :text
> > +#endif
> 
> This data really should be contained inside rodata.

I replaced :text with :rodata and got:
ld: section `.note' assigned to non-existent phdr `rodata'

Which makes sense as there are only two PHDRS. Were you suggesting that
the .note should be part of the .rodata section? Jan wanted this to be
in its own section (.note).

Are you suggesting adding another PHDR? (If so, then mkelf32 has to be modified,
and for EFI I think it will need some #ifdef machinery to make it work).

This is what I have right now in the tree.
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 9fde1db..ee16d22 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -31,6 +31,9 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
+#if defined(BUILD_ID) && !defined(EFI)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -65,8 +68,28 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+#if defined(BUILD_ID) && defined(EFI)
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+#endif
   } :text
 
+#if defined(BUILD_ID) && !defined(EFI)
+/*
+ * No mechanism to put an PT_NOTE in the EFI file - so put
+ * it in .data section.
+ */
+  . = ALIGN(4);
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
+  } :note :text
+#endif
+
   . = ALIGN(SMP_CACHE_BYTES);
   .data.read_mostly : {
        /* Exception table */
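
The linker script above brackets the .note.gnu.build-id contents with
__note_gnu_build_id_start and __note_gnu_build_id_end. As a rough
illustration of how code could locate the build-id bytes between two such
bounds, here is a standalone C sketch that parses one ELF note record
(namesz, descsz, type, then the 4-byte name "GNU\0" and the descriptor).
find_build_id() and struct elf_note_hdr are hypothetical names, not taken
from the patch, and the sketch assumes the build-id note is the first
record in the range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3   /* note type used by ld --build-id */

/* Fixed-size header that precedes every ELF note (hypothetical name). */
struct elf_note_hdr {
    uint32_t namesz;   /* length of the name, including the trailing NUL */
    uint32_t descsz;   /* length of the descriptor (the build-id bytes) */
    uint32_t type;
};

/*
 * Look for a GNU build-id note at the start of [start, end).
 * On success returns 0 and points *id / *len at the build-id bytes.
 * Returns -1 if the range is too small or does not hold such a note.
 */
static int find_build_id(const unsigned char *start, const unsigned char *end,
                         const unsigned char **id, unsigned int *len)
{
    struct elf_note_hdr n;
    size_t size;

    if ( end < start )
        return -1;
    size = (size_t)(end - start);

    /* Need the header plus the 4-byte name "GNU\0". */
    if ( size < sizeof(n) + 4 )
        return -1;

    memcpy(&n, start, sizeof(n));
    if ( n.type != NT_GNU_BUILD_ID || n.namesz != 4 )
        return -1;
    if ( memcmp(start + sizeof(n), "GNU", 4) != 0 )
        return -1;

    /* The descriptor must be non-empty and fit inside the range. */
    if ( n.descsz == 0 || n.descsz > size - sizeof(n) - 4 )
        return -1;

    *id = start + sizeof(n) + 4;
    *len = n.descsz;
    return 0;
}
```

Because the result only points into the note, it stays valid for as long as
the note itself is mapped and unchanged, which is why keeping it inside the
read-only region matters.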


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-24 18:52     ` Konrad Rzeszutek Wilk
@ 2016-02-24 19:13       ` Andrew Cooper
  2016-02-24 20:54         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 86+ messages in thread
From: Andrew Cooper @ 2016-02-24 19:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

On 24/02/16 18:52, Konrad Rzeszutek Wilk wrote:
>>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
>>> index f501a2f..5cf180f 100644
>>> --- a/xen/arch/arm/xen.lds.S
>>> +++ b/xen/arch/arm/xen.lds.S
>>> @@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
>>>  PHDRS
>>>  {
>>>    text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
>>> +#if defined(BUILD_ID)
>>> +  note PT_NOTE ;
>>> +#endif
>>>  }
>>>  SECTIONS
>>>  {
>>> @@ -53,6 +56,16 @@ SECTIONS
>>>          _erodata = .;          /* End of read-only data */
>>>    } :text
>>>  
>>> +#if defined(BUILD_ID)
>>> +  .note : {
>>> +       __note_gnu_build_id_start = .;
>>> +       *(.note.gnu.build-id)
>>> +       __note_gnu_build_id_end = .;
>>> +       *(.note)
>>> +       *(.note.*)
>>> +  } :text
>>> +#endif
>> This data really should be contained inside rodata.
> I get (I replace :text with :rodata) and got:
> ld: section `.note' assigned to non-existent phdr `rodata'
>
> Which makes sense as there are only two PHDRS. Where you suggesting that
> the .note should be part of the .rodata section? Jan wanted this to be
> in its own section (.note).
>
> Are you suggesting to add another one PHDR? (If so, then mkelf32 has to be modified,
> and for EFI I think it will have to have some #ifdef machinery to make it work).

I was just suggesting moving _erodata down a little to cover .note

Whatever happens patching-wise, this build ID is constant and will want
to remain so.

~Andrew


* Re: [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10)
  2016-02-24 19:13       ` Andrew Cooper
@ 2016-02-24 20:54         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-24 20:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Ian Campbell, jinsong.liu, Stefano Stabellini,
	Ian Jackson, xen-devel, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, xen-devel, Daniel De Graaf,
	Keir Fraser, sasha.levin

On Wed, Feb 24, 2016 at 07:13:11PM +0000, Andrew Cooper wrote:
> On 24/02/16 18:52, Konrad Rzeszutek Wilk wrote:
> >>> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> >>> index f501a2f..5cf180f 100644
> >>> --- a/xen/arch/arm/xen.lds.S
> >>> +++ b/xen/arch/arm/xen.lds.S
> >>> @@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
> >>>  PHDRS
> >>>  {
> >>>    text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
> >>> +#if defined(BUILD_ID)
> >>> +  note PT_NOTE ;
> >>> +#endif
> >>>  }
> >>>  SECTIONS
> >>>  {
> >>> @@ -53,6 +56,16 @@ SECTIONS
> >>>          _erodata = .;          /* End of read-only data */
> >>>    } :text
> >>>  
> >>> +#if defined(BUILD_ID)
> >>> +  .note : {
> >>> +       __note_gnu_build_id_start = .;
> >>> +       *(.note.gnu.build-id)
> >>> +       __note_gnu_build_id_end = .;
> >>> +       *(.note)
> >>> +       *(.note.*)
> >>> +  } :text
> >>> +#endif
> >> This data really should be contained inside rodata.
> > I get (I replace :text with :rodata) and got:
> > ld: section `.note' assigned to non-existent phdr `rodata'
> >
> > Which makes sense as there are only two PHDRS. Where you suggesting that
> > the .note should be part of the .rodata section? Jan wanted this to be
> > in its own section (.note).
> >
> > Are you suggesting to add another one PHDR? (If so, then mkelf32 has to be modified,
> > and for EFI I think it will have to have some #ifdef machinery to make it work).
> 
> I was just suggesting moving _erodata down a little to cover .note

Which oddly enough is only in ARM builds. Done!

> 
> Whatever happens patching-wise, this build ID is constant and will want
> to remain so.
> 
> ~Andrew


* Re: [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-02-17 11:10     ` Jan Beulich
@ 2016-02-24 21:54       ` Konrad Rzeszutek Wilk
  2016-02-25  8:47         ` Jan Beulich
  0 siblings, 1 reply; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-02-24 21:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Campbell, Andrew Cooper, Ian Jackson,
	Tim Deegan, mpohlack, ross.lagerwall, jinsong.liu, xen-devel

On Wed, Feb 17, 2016 at 04:10:12AM -0700, Jan Beulich wrote:
> >>> On 16.02.16 at 21:20, <andrew.cooper3@citrix.com> wrote:
> > On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
> >> +static void xsplice_print_build_id(char *id, unsigned int len)
> >> +{
> >> +    unsigned int i;
> >> +
> >> +    if ( !len )
> >> +        return;
> >> +
> >> +    for ( i = 0; i < len; i++ )
> >> +    {
> >> +        uint8_t c = id[i];
> >> +        printk("%02x", c);
> > 
> > What about the already existing %*ph custom format?  If the spaces are a
> > problem we could introduce %*phN from Linux which has no spaces.
> > 
> > The advantage of this is that it is a single call to printk, rather than
> > many, and avoids the ability for a different cpu to interleave in the
> > middle of a line.
> 
> I don't think this ability exists anymore after we've switched to
> per-CPU there. Which isn't to say, though, that I wouldn't also
> like to see this be just a single printk().

Would this work for folks:

commit e1905ecf7034635f2a82cd2bbc4f5ff03ee957b0
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Wed Feb 24 15:06:15 2016 -0500

    xsplice: Print build_id in keyhandler and on bootup.
    
    As it should be an useful debug mechanism.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index da26b58..b15a2de 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/version.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -1079,10 +1080,33 @@ static const char *state2str(uint32_t state)
     return names[state];
 }
 
+static void xsplice_print_build_id(const char *hdr, const char *id,
+                                   unsigned int len)
+{
+    unsigned int i;
+
+    if ( !len )
+        return;
+
+    /* Each byte is two ASCII characters. */
+    if ( len*2 + 1 > sizeof(keyhandler_scratch) )
+        return;
+
+    memset(keyhandler_scratch, 0, len * 2 + 1);
+    for ( i = 0; i < len; i++ )
+        snprintf(&keyhandler_scratch[i * 2], 3, "%02x", (uint8_t)id[i]);
+
+    printk("%s%s\n", hdr, keyhandler_scratch);
+}
+
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
-    unsigned int i;
+    char *binary_id = NULL;
+    unsigned int len = 0, i;
+
+    if ( !xen_build_id(&binary_id, &len) )
+        xsplice_print_build_id("build-id: ", binary_id, len);
 
     spin_lock_recursive(&payload_lock);
 
@@ -1106,8 +1130,14 @@ static void xsplice_printall(unsigned char key)
 
 static int __init xsplice_init(void)
 {
+    char *binary_id = NULL;
+    unsigned int len = 0;
     BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
 
+    if ( !xen_build_id(&binary_id, &len) )
+        xsplice_print_build_id("build-id: ", binary_id, len);
+
+    xsplice_print_build_id("Hypervisor build-id: ", binary_id, len);
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
     return 0;
 }
> 
> Jan
> 


* Re: [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-02-24 21:54       ` Konrad Rzeszutek Wilk
@ 2016-02-25  8:47         ` Jan Beulich
  0 siblings, 0 replies; 86+ messages in thread
From: Jan Beulich @ 2016-02-25  8:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Ian Campbell, jinsong.liu, Ian Jackson, Tim Deegan,
	mpohlack, ross.lagerwall, Andrew Cooper, xen-devel

>>> On 24.02.16 at 22:54, <konrad.wilk@oracle.com> wrote:
> On Wed, Feb 17, 2016 at 04:10:12AM -0700, Jan Beulich wrote:
>> >>> On 16.02.16 at 21:20, <andrew.cooper3@citrix.com> wrote:
>> > On 12/02/16 18:05, Konrad Rzeszutek Wilk wrote:
>> >> +static void xsplice_print_build_id(char *id, unsigned int len)
>> >> +{
>> >> +    unsigned int i;
>> >> +
>> >> +    if ( !len )
>> >> +        return;
>> >> +
>> >> +    for ( i = 0; i < len; i++ )
>> >> +    {
>> >> +        uint8_t c = id[i];
>> >> +        printk("%02x", c);
>> > 
>> > What about the already existing %*ph custom format?  If the spaces are a
>> > problem we could introduce %*phN from Linux which has no spaces.
>> > 
>> > The advantage of this is that it is a single call to printk, rather than
>> > many, and avoids the ability for a different cpu to interleave in the
>> > middle of a line.
>> 
>> I don't think this ability exists anymore after we've switched to
>> per-CPU there. Which isn't to say, though, that I wouldn't also
>> like to see this be just a single printk().
> 
> Would this work for folks:

Well, while it looks like it would work, it doesn't really follow Andrew's
suggestion.

Jan

> commit e1905ecf7034635f2a82cd2bbc4f5ff03ee957b0
> Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date:   Wed Feb 24 15:06:15 2016 -0500
> 
>     xsplice: Print build_id in keyhandler and on bootup.
>     
>     As it should be an useful debug mechanism.
>     
>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index da26b58..b15a2de 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -13,6 +13,7 @@
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/version.h>
>  #include <xen/vmap.h>
>  #include <xen/wait.h>
>  #include <xen/xsplice_elf.h>
> @@ -1079,10 +1080,33 @@ static const char *state2str(uint32_t state)
>      return names[state];
>  }
>  
> +static void xsplice_print_build_id(const char *hdr, const char *id,
> +                                   unsigned int len)
> +{
> +    unsigned int i;
> +
> +    if ( !len )
> +        return;
> +
> +    /* Each byte is two ASCII characters. */
> +    if ( len*2 + 1 > sizeof(keyhandler_scratch) )
> +        return;
> +
> +    memset(keyhandler_scratch, 0, len * 2 + 1);
> +    for ( i = 0; i < len; i++ )
> +        snprintf(&keyhandler_scratch[i * 2], 3, "%02x", (uint8_t)id[i]);
> +
> +    printk("%s%s\n", hdr, keyhandler_scratch);
> +}
> +
>  static void xsplice_printall(unsigned char key)
>  {
>      struct payload *data;
> -    unsigned int i;
> +    char *binary_id = NULL;
> +    unsigned int len = 0, i;
> +
> +    if ( !xen_build_id(&binary_id, &len) )
> +        xsplice_print_build_id("build-id: ", binary_id, len);
>  
>      spin_lock_recursive(&payload_lock);
>  
> @@ -1106,8 +1130,14 @@ static void xsplice_printall(unsigned char key)
>  
>  static int __init xsplice_init(void)
>  {
> +    char *binary_id = NULL;
> +    unsigned int len = 0;
>      BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
>  
> +    if ( !xen_build_id(&binary_id, &len) )
> +        xsplice_print_build_id("build-id: ", binary_id, len);
> +
> +    xsplice_print_build_id("Hypervisor build-id: ", binary_id, len);
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
>      return 0;
>  }
>> 
>> Jan
>> 
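
For reference, the point of the review comment above is to emit the whole
build-id with one printk() call instead of one call per byte. A minimal
standalone C sketch of that idea follows: it formats the bytes into a
caller-supplied buffer so that the printing can be done in a single call.
The helper name build_id_to_hex() is made up for illustration and is not
part of the patch, and the sketch does not use the %*ph format mentioned in
the review.

```c
#include <stddef.h>

/*
 * Format 'len' bytes of 'id' as lowercase hex into 'out'.
 * 'out' must hold 2 * len characters plus the trailing NUL.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int build_id_to_hex(const unsigned char *id, size_t len,
                           char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    /* Each byte becomes two ASCII characters; one more for the NUL. */
    if ( out_size == 0 || len > (out_size - 1) / 2 )
        return -1;

    for ( i = 0; i < len; i++ )
    {
        out[2 * i]     = digits[id[i] >> 4];
        out[2 * i + 1] = digits[id[i] & 0xf];
    }
    out[2 * len] = '\0';

    return 0;
}
```

A caller would then print the header and the formatted string together,
for example with a single "%s%s\n" format, so the line cannot be split
across several print calls.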


* Re: [PATCH v3 21/23] xsplice: Add support for shadow variables
  2016-02-12 18:05 ` [PATCH v3 21/23] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
@ 2016-03-07  7:40   ` Martin Pohlack
  2016-03-15 18:02     ` Konrad Rzeszutek Wilk
  2016-03-07 18:52   ` Martin Pohlack
  1 sibling, 1 reply; 86+ messages in thread
From: Martin Pohlack @ 2016-03-07  7:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Ian Jackson, Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12.02.2016 19:05, Konrad Rzeszutek Wilk wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Shadow variables are a piece of infrastructure to be used by xsplice
> modules. They are used to attach a new piece of data to an existing
> structure in memory.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/common/Makefile             |   1 +
>  xen/common/xsplice_shadow.c     | 105 ++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/xsplice_patch.h |  39 +++++++++++++++
>  3 files changed, 145 insertions(+)
>  create mode 100644 xen/common/xsplice_shadow.c
>  create mode 100644 xen/include/xen/xsplice_patch.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index a8ceaff..f4d54ad 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -75,3 +75,4 @@ subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
>  
>  obj-$(CONFIG_XSPLICE) += xsplice.o
>  obj-$(CONFIG_XSPLICE) += xsplice_elf.o
> +obj-$(CONFIG_XSPLICE) += xsplice_shadow.o
> diff --git a/xen/common/xsplice_shadow.c b/xen/common/xsplice_shadow.c
> new file mode 100644
> index 0000000..619cdee
> --- /dev/null
> +++ b/xen/common/xsplice_shadow.c
> @@ -0,0 +1,105 @@
> +#include <xen/init.h>
> +#include <xen/kernel.h>
> +#include <xen/lib.h>
> +#include <xen/list.h>
> +#include <xen/spinlock.h>
> +#include <xen/xsplice_patch.h>
> +
> +#define SHADOW_SLOTS 256

Using a very round number here gives you a very fast hash computation,
but at the price of lots of hash collisions, as compilers, linkers, and
memory allocators tend to align starting addresses.  I would suggest
using a small prime here, e.g., 257 or 251, as a first approximation
of a simple hash function.

Or use existing hash infrastructure (see below).

> +struct hlist_head shadow_tbl[SHADOW_SLOTS];
> +static DEFINE_SPINLOCK(shadow_lock);
> +
> +struct shadow_var {
> +    struct hlist_node list;         /* Linked to 'shadow_tbl' */
> +    void *data;
> +    const void *obj;
> +    char var[16];
> +};
> +
> +void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size)
> +{
> +    struct shadow_var *shadow;
> +    unsigned int slot;
> +
> +    shadow = xmalloc(struct shadow_var);
> +    if ( !shadow )
> +        return NULL;
> +
> +    shadow->obj = obj;
> +    strlcpy(shadow->var, var, sizeof shadow->var);
> +    shadow->data = xmalloc_bytes(size);
> +    if ( !shadow->data )
> +    {
> +        xfree(shadow);
> +        return NULL;
> +    }
> +
> +    slot = (unsigned long)obj % SHADOW_SLOTS;

hash.h has an earlier import from Linux and provides hash_long().  That
looks like it would not suffer from direct hash collisions.

(also for all other occurrences of "obj % SHADOW_SLOTS" below)

> +    spin_lock(&shadow_lock);
> +    hlist_add_head(&shadow->list, &shadow_tbl[slot]);
> +    spin_unlock(&shadow_lock);
> +
> +    return shadow->data;
> +}
> +
> +void xsplice_shadow_free(const void *obj, const char *var)
> +{
> +    struct shadow_var *entry, *shadow = NULL;
> +    unsigned int slot;
> +    struct hlist_node *next;
> +
> +    slot = (unsigned long)obj % SHADOW_SLOTS;
> +
> +    spin_lock(&shadow_lock);
> +    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
> +    {
> +        if ( entry->obj == obj &&
> +             !strcmp(entry->var, var) )
> +        {
> +            shadow = entry;
> +            break;
> +        }
> +    }
> +    if (shadow)
> +    {
> +        hlist_del(&shadow->list);
> +        xfree(shadow->data);
> +        xfree(shadow);
> +    }
> +    spin_unlock(&shadow_lock);
> +}
> +
> +void *xsplice_shadow_get(const void *obj, const char *var)
> +{
> +    struct shadow_var *entry;
> +    unsigned int slot;
> +    struct hlist_node *next;
> +    void *ret = NULL;
> +
> +    slot = (unsigned long)obj % SHADOW_SLOTS;
> +
> +    spin_lock(&shadow_lock);
> +    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
> +    {
> +        if ( entry->obj == obj &&
> +             !strcmp(entry->var, var) )
> +        {
> +            ret = entry->data;
> +            break;
> +        }
> +    }
> +
> +    spin_unlock(&shadow_lock);
> +    return ret;
> +}
> +
> +static int __init xsplice_shadow_init(void)
> +{
> +    int i;
> +
> +    for ( i = 0; i < SHADOW_SLOTS; i++ )
> +        INIT_HLIST_HEAD(&shadow_tbl[i]);
> +
> +    return 0;
> +}
> +__initcall(xsplice_shadow_init);
> diff --git a/xen/include/xen/xsplice_patch.h b/xen/include/xen/xsplice_patch.h
> new file mode 100644
> index 0000000..e3f344b
> --- /dev/null
> +++ b/xen/include/xen/xsplice_patch.h
> @@ -0,0 +1,39 @@
> +#ifndef __XEN_XSPLICE_PATCH_H__
> +#define __XEN_XSPLICE_PATCH_H__
> +
> +/*
> + * The following definitions are to be used in patches. They are taken
> + * from kpatch.
> + */
> +
> +/*
> + * xsplice shadow variables
> + *
> + * These functions can be used to add new "shadow" fields to existing data
> + * structures.  For example, to allocate a "newpid" variable associated with an
> + * instance of task_struct, and assign it a value of 1000:
> + *
> + * struct task_struct *tsk = current;
> + * int *newpid;
> + * newpid = xsplice_shadow_alloc(tsk, "newpid", sizeof(int));
> + * if (newpid)
> + * 	*newpid = 1000;
> + *
> + * To retrieve a pointer to the variable:
> + *
> + * struct task_struct *tsk = current;
> + * int *newpid;
> + * newpid = xsplice_shadow_get(tsk, "newpid");
> + * if (newpid)
> + * 	printk("task newpid = %d\n", *newpid); // prints "task newpid = 1000"
> + *
> + * To free it:
> + *
> + * xsplice_shadow_free(tsk, "newpid");
> + */
> +
> +void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size);
> +void xsplice_shadow_free(const void *obj, const char *var);
> +void *xsplice_shadow_get(const void *obj, const char *var);
> +
> +#endif /* __XEN_XSPLICE_PATCH_H__ */
> 
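
To make the collision argument above concrete, here is a small standalone C
sketch that counts how many distinct slots are used when the hashed
addresses are aligned, as allocator results usually are. It compares
"addr % 256", "addr % 257" and a multiplicative hash. The multiplicative
constant is a generic golden-ratio value chosen for illustration; it is a
stand-in and not necessarily what hash_long() in Xen's hash.h uses. The
base address, object count and alignment are likewise arbitrary
assumptions.

```c
#include <stddef.h>
#include <stdint.h>

enum hash_kind { HASH_MOD_256, HASH_MOD_257, HASH_MULTIPLICATIVE };

/* Map an address to a slot index in [0, 257). */
static unsigned int slot_for(uintptr_t addr, enum hash_kind kind)
{
    switch ( kind )
    {
    case HASH_MOD_256:
        return (unsigned int)(addr % 256);
    case HASH_MOD_257:
        return (unsigned int)(addr % 257);
    default:
        /* Multiply by an odd constant and keep the top 8 bits. */
        return (unsigned int)(((uint64_t)addr * 0x9E3779B97F4A7C15ULL) >> 56);
    }
}

/*
 * Count the distinct slots hit by 'nr_objs' addresses that start at an
 * aligned base and are spaced 'align' bytes apart.
 */
static unsigned int distinct_slots(enum hash_kind kind, unsigned int nr_objs,
                                   unsigned int align)
{
    unsigned char used[257] = { 0 };
    unsigned int i, count = 0;

    for ( i = 0; i < nr_objs; i++ )
    {
        uintptr_t addr = (uintptr_t)0x100000 + (uintptr_t)i * align;
        unsigned int slot = slot_for(addr, kind);

        if ( !used[slot] )
        {
            used[slot] = 1;
            count++;
        }
    }

    return count;
}
```

With 1024 objects spaced 64 bytes apart, the modulo-256 variant only ever
touches 4 slots, because every address is a multiple of 64. The modulo-257
variant reaches all 257 slots, since 64 and 257 share no common factor.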

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v3 21/23] xsplice: Add support for shadow variables
  2016-02-12 18:05 ` [PATCH v3 21/23] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
  2016-03-07  7:40   ` Martin Pohlack
@ 2016-03-07 18:52   ` Martin Pohlack
  1 sibling, 0 replies; 86+ messages in thread
From: Martin Pohlack @ 2016-03-07 18:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, andrew.cooper3, konrad,
	mpohlack, ross.lagerwall, sasha.levin, jinsong.liu, Ian Campbell,
	Ian Jackson, Jan Beulich, Keir Fraser, Tim Deegan, xen-devel

On 12.02.2016 19:05, Konrad Rzeszutek Wilk wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Shadow variables are a piece of infrastructure to be used by xsplice
> modules. They are used to attach a new piece of data to an existing
> structure in memory.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/common/Makefile             |   1 +
>  xen/common/xsplice_shadow.c     | 105 ++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/xsplice_patch.h |  39 +++++++++++++++
>  3 files changed, 145 insertions(+)
>  create mode 100644 xen/common/xsplice_shadow.c
>  create mode 100644 xen/include/xen/xsplice_patch.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index a8ceaff..f4d54ad 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -75,3 +75,4 @@ subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
>  
>  obj-$(CONFIG_XSPLICE) += xsplice.o
>  obj-$(CONFIG_XSPLICE) += xsplice_elf.o
> +obj-$(CONFIG_XSPLICE) += xsplice_shadow.o
> diff --git a/xen/common/xsplice_shadow.c b/xen/common/xsplice_shadow.c
> new file mode 100644
> index 0000000..619cdee
> --- /dev/null
> +++ b/xen/common/xsplice_shadow.c
> @@ -0,0 +1,105 @@
> +#include <xen/init.h>
> +#include <xen/kernel.h>
> +#include <xen/lib.h>
> +#include <xen/list.h>
> +#include <xen/spinlock.h>
> +#include <xen/xsplice_patch.h>
> +
> +#define SHADOW_SLOTS 256
> +struct hlist_head shadow_tbl[SHADOW_SLOTS];

Thinking about this more, how would a module using this global hash ever
be unloadable again without leaking memory?

For unloading you would need some iterator that walks all the
dynamically created shadow elements and frees them.  The simplest
approach would be if each hotpatch would bring its own instance of the
hash table (if it needs it).  That would allow it to fully walk and
release the hash content on its unload path.

Martin
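
The per-hotpatch table suggested above could look roughly like the
following standalone C sketch. Each patch owns its own struct shadow_table,
and the unload path calls one function that walks every slot and frees
every entry. All names here (struct shadow_table, shadow_alloc(),
shadow_get(), shadow_free_all()) are hypothetical and only mirror the shape
of the posted patch; the sketch uses malloc-style allocation and a plain
singly linked list, and it leaves out the locking that the hypervisor code
would need.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHADOW_SLOTS 257

struct shadow_var {
    struct shadow_var *next;   /* next entry in the same slot */
    const void *obj;           /* object the data is attached to */
    void *data;                /* the shadow data itself */
    char var[16];              /* name of the shadow variable */
};

/* One instance per hotpatch, so its unload path can free everything. */
struct shadow_table {
    struct shadow_var *slots[SHADOW_SLOTS];
};

static unsigned int shadow_slot(const void *obj)
{
    return (unsigned int)((uintptr_t)obj % SHADOW_SLOTS);
}

static void *shadow_alloc(struct shadow_table *t, const void *obj,
                          const char *var, size_t size)
{
    struct shadow_var *s = calloc(1, sizeof(*s));
    unsigned int slot = shadow_slot(obj);

    if ( !s )
        return NULL;
    s->data = calloc(1, size);
    if ( !s->data )
    {
        free(s);
        return NULL;
    }
    s->obj = obj;
    /* calloc() zeroed 'var', so the copy stays NUL-terminated. */
    strncpy(s->var, var, sizeof(s->var) - 1);
    s->next = t->slots[slot];
    t->slots[slot] = s;

    return s->data;
}

static void *shadow_get(const struct shadow_table *t, const void *obj,
                        const char *var)
{
    const struct shadow_var *s;

    for ( s = t->slots[shadow_slot(obj)]; s; s = s->next )
        if ( s->obj == obj && !strcmp(s->var, var) )
            return s->data;

    return NULL;
}

/* Unload path: free every entry and return how many were released. */
static unsigned int shadow_free_all(struct shadow_table *t)
{
    unsigned int i, freed = 0;

    for ( i = 0; i < SHADOW_SLOTS; i++ )
    {
        struct shadow_var *s = t->slots[i];

        while ( s )
        {
            struct shadow_var *next = s->next;

            free(s->data);
            free(s);
            freed++;
            s = next;
        }
        t->slots[i] = NULL;
    }

    return freed;
}
```

The design choice is that ownership follows the patch: nothing outside the
table needs to know which entries exist, so unloading cannot leak them.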





* Re: [PATCH v3 21/23] xsplice: Add support for shadow variables
  2016-03-07  7:40   ` Martin Pohlack
@ 2016-03-15 18:02     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 86+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 18:02 UTC (permalink / raw)
  To: Martin Pohlack
  Cc: Keir Fraser, Ian Campbell, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, ross.lagerwall, Jan Beulich, jinsong.liu,
	xen-devel, xen-devel, sasha.levin

On Mon, Mar 07, 2016 at 08:40:47AM +0100, Martin Pohlack wrote:
> On 12.02.2016 19:05, Konrad Rzeszutek Wilk wrote:
> > From: Ross Lagerwall <ross.lagerwall@citrix.com>
> > 
> > Shadow variables are a piece of infrastructure to be used by xsplice
> > modules. They are used to attach a new piece of data to an existing
> > structure in memory.
> > 
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > ---
> >  xen/common/Makefile             |   1 +
> >  xen/common/xsplice_shadow.c     | 105 ++++++++++++++++++++++++++++++++++++++++
> >  xen/include/xen/xsplice_patch.h |  39 +++++++++++++++
> >  3 files changed, 145 insertions(+)
> >  create mode 100644 xen/common/xsplice_shadow.c
> >  create mode 100644 xen/include/xen/xsplice_patch.h
> > 
> > diff --git a/xen/common/Makefile b/xen/common/Makefile
> > index a8ceaff..f4d54ad 100644
> > --- a/xen/common/Makefile
> > +++ b/xen/common/Makefile
> > @@ -75,3 +75,4 @@ subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
> >  
> >  obj-$(CONFIG_XSPLICE) += xsplice.o
> >  obj-$(CONFIG_XSPLICE) += xsplice_elf.o
> > +obj-$(CONFIG_XSPLICE) += xsplice_shadow.o
> > diff --git a/xen/common/xsplice_shadow.c b/xen/common/xsplice_shadow.c
> > new file mode 100644
> > index 0000000..619cdee
> > --- /dev/null
> > +++ b/xen/common/xsplice_shadow.c
> > @@ -0,0 +1,105 @@
> > +#include <xen/init.h>
> > +#include <xen/kernel.h>
> > +#include <xen/lib.h>
> > +#include <xen/list.h>
> > +#include <xen/spinlock.h>
> > +#include <xen/xsplice_patch.h>
> > +
> > +#define SHADOW_SLOTS 256
> 
> Using something very round here will give you lot's of hash collisions
> at the price of a very fast hash computation as compilers, linkers, and
> memory allocators tend to align starting addresses.  I would suggest to
> use a small prime here, e.g., 257 or 251, to have a first approximation
> of a simple hash function.

Hey!

I totally missed this comment. Let me spool this up in my TODO to
change this to 257 for v5.



end of thread, other threads:[~2016-03-15 18:02 UTC | newest]

Thread overview: 86+ messages
2016-02-12 18:05 [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 01/23] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op (v10) Konrad Rzeszutek Wilk
2016-02-12 20:11   ` Andrew Cooper
2016-02-12 20:40     ` Konrad Rzeszutek Wilk
2016-02-12 20:53       ` Andrew Cooper
2016-02-15  8:16       ` Jan Beulich
2016-02-19 19:36     ` Konrad Rzeszutek Wilk
2016-02-19 19:43       ` Andrew Cooper
2016-02-12 18:05 ` [PATCH v3 02/23] libxc: Implementation of XEN_XSPLICE_op in libxc (v5) Konrad Rzeszutek Wilk
2016-02-15 12:35   ` Wei Liu
2016-02-19 20:04     ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 03/23] xen-xsplice: Tool to manipulate xsplice payloads (v4) Konrad Rzeszutek Wilk
2016-02-15 12:59   ` Wei Liu
2016-02-19 20:46     ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 04/23] elf: Add relocation types to elfstructs.h Konrad Rzeszutek Wilk
2016-02-12 20:13   ` Andrew Cooper
2016-02-15  8:34   ` Jan Beulich
2016-02-19 21:05     ` Konrad Rzeszutek Wilk
2016-02-22 10:17       ` Jan Beulich
2016-02-22 15:19       ` Ross Lagerwall
2016-02-12 18:05 ` [PATCH v3 05/23] xsplice: Add helper elf routines (v4) Konrad Rzeszutek Wilk
2016-02-12 20:24   ` Andrew Cooper
2016-02-12 20:47     ` Konrad Rzeszutek Wilk
2016-02-12 20:52       ` Andrew Cooper
2016-02-12 18:05 ` [PATCH v3 06/23] xsplice: Implement payload loading (v4) Konrad Rzeszutek Wilk
2016-02-12 20:48   ` Andrew Cooper
2016-02-19 22:03     ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5) Konrad Rzeszutek Wilk
2016-02-16 19:11   ` Andrew Cooper
2016-02-17  8:58     ` Ross Lagerwall
2016-02-17 10:50     ` Jan Beulich
2016-02-19  9:30     ` Ross Lagerwall
2016-02-23 20:41     ` Konrad Rzeszutek Wilk
2016-02-23 20:53       ` Konrad Rzeszutek Wilk
2016-02-23 20:57       ` Konrad Rzeszutek Wilk
2016-02-23 21:10       ` Andrew Cooper
2016-02-24  9:31         ` Jan Beulich
2016-02-22 15:00   ` Ross Lagerwall
2016-02-22 17:06     ` Ross Lagerwall
2016-02-23 20:47       ` Konrad Rzeszutek Wilk
2016-02-23 20:43     ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 08/23] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'. (v2) Konrad Rzeszutek Wilk
2016-02-16 11:31   ` Ross Lagerwall
2016-02-12 18:05 ` [PATCH v3 09/23] xsplice: Add support for bug frames. (v4) Konrad Rzeszutek Wilk
2016-02-16 19:35   ` Andrew Cooper
2016-02-24 16:22     ` Konrad Rzeszutek Wilk
2016-02-24 16:30       ` Andrew Cooper
2016-02-24 16:26     ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 10/23] xsplice: Add support for exception tables. (v2) Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 11/23] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
2016-02-16 19:41   ` Andrew Cooper
2016-02-12 18:05 ` [PATCH v3 12/23] xsm/xen_version: Add XSM for the xen_version hypercall (v8) Konrad Rzeszutek Wilk
2016-02-12 21:52   ` Daniel De Graaf
2016-02-12 18:05 ` [PATCH v3 13/23] XENVER_build_id: Provide ld-embedded build-ids (v10) Konrad Rzeszutek Wilk
2016-02-12 21:52   ` Daniel De Graaf
2016-02-16 20:09   ` Andrew Cooper
2016-02-16 20:22     ` Konrad Rzeszutek Wilk
2016-02-16 20:26       ` Andrew Cooper
2016-02-16 20:40         ` Konrad Rzeszutek Wilk
2016-02-24 18:52     ` Konrad Rzeszutek Wilk
2016-02-24 19:13       ` Andrew Cooper
2016-02-24 20:54         ` Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 14/23] libxl: info: Display build_id of the hypervisor Konrad Rzeszutek Wilk
2016-02-15 12:45   ` Wei Liu
2016-02-12 18:05 ` [PATCH v3 15/23] xsplice: Print build_id in keyhandler Konrad Rzeszutek Wilk
2016-02-16 20:13   ` Andrew Cooper
2016-02-12 18:05 ` [PATCH v3 16/23] xsplice: basic build-id dependency checking Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 17/23] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
2016-02-16 20:20   ` Andrew Cooper
2016-02-17 11:10     ` Jan Beulich
2016-02-24 21:54       ` Konrad Rzeszutek Wilk
2016-02-25  8:47         ` Jan Beulich
2016-02-12 18:05 ` [PATCH v3 18/23] xsplice: Prevent duplicate payloads to be loaded Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 19/23] xsplice, symbols: Implement symbol name resolution on address. (v2) Konrad Rzeszutek Wilk
2016-02-22 14:57   ` Ross Lagerwall
2016-02-12 18:05 ` [PATCH v3 20/23] x86, xsplice: Print payload's symbol name and module in backtraces Konrad Rzeszutek Wilk
2016-02-12 18:05 ` [PATCH v3 21/23] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
2016-03-07  7:40   ` Martin Pohlack
2016-03-15 18:02     ` Konrad Rzeszutek Wilk
2016-03-07 18:52   ` Martin Pohlack
2016-02-12 18:06 ` [PATCH v3 22/23] xsplice: Add hooks functions and other macros Konrad Rzeszutek Wilk
2016-02-12 18:06 ` [PATCH v3 23/23] xsplice, hello_world: Use the XSPLICE_[UN|]LOAD_HOOK hooks for two functions Konrad Rzeszutek Wilk
2016-02-12 21:57 ` [PATCH v3] xSplice v1 implementation and design Konrad Rzeszutek Wilk
2016-02-12 21:57   ` [PATCH v3 MISSING/23] xsplice: Design document (v7) Konrad Rzeszutek Wilk
2016-02-18 16:20     ` Jan Beulich
2016-02-19 18:36       ` Konrad Rzeszutek Wilk
