xen-devel.lists.xenproject.org archive mirror
* [PATCH v4] xSplice v1 design and implementation.
@ 2016-03-15 17:56 Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
                   ` (33 more replies)
  0 siblings, 34 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin

Hey!

Changelog:
v3: http://www.gossamer-threads.com/lists/xen/devel/418262
    and 
    http://lists.xenproject.org/archives/html/xen-devel/2016-02/msg04106.html
 - Act on all reviews.
 - Redo the flow of patches
v2: http://lists.xen.org/archives/html/xen-devel/2016-01/msg01597.html
 - Updated code/docs/design with review comments.
 - Make xen also have a PT_NOTE
 - Added more of Ross's patches
 - Combined build-id patchset with this.
(since the RFC and the Seattle Xen presentation)
 - Finished off some of the work around the build-id.
 - Settled on the preemption mechanism.
 - Cleaned up the patches a lot, broke them up to ease
   review for maintainers.
v1: http://lists.xenproject.org/archives/html/xen-devel/2015-09/msg02116.html
  - Put all the design comments in the code
Prototype: http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html
[Posting by Ross]
 - Took all reviews into account.
 - Redid the patches


*Flow of patches*

The first five patches add the necessary components that have been
requested by Andrew. There are also four fixes I've included.

I was not sure if it made sense to split them off as a standalone
patchset (similar to how I did it in v3) or just include them in this patchset.
To make it easier for folks to see the whole thing I've just included
them in this giant patchset:

 compat/x86: Remove unncessary #define.
 libxc: Remove dead code (XENVER_capabilities)
 xsm/xen_version: Add XSM for the xen_version hypercall
 HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
 libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
 x86/arm: Add BUGFRAME_NR define and BUILD checks.
 arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
 vmap: Make the while loop less fishy.
 vmap: ASSERT on NULL.
 vmap: Add vmalloc_cb and vfree_cb

Since they are generic and touch ARM and x86 they have been tested
on x86 (legacy and EFI) and on ARM (CubieTruck). Actually the whole
branch has been tested on those three platforms.

*What is xSplice?*

A mechanism to binary patch the running hypervisor with new
opcodes, primarily to deliver security updates.

*What will this patchset do once I have it?*

Patch the hypervisor.

*Why are you emailing me?*

Please please review as many patches as possible.

*OK, what do you have?*

They are located at a git tree:
  git://xenbits.xen.org/people/konradwilk/xen.git xsplice.v4

(Copying from Ross's email):

Much of the work is implementing a basic version of the Linux kernel module
loader. The code covers:
* Loading of xSplice ELF payloads.
* Copying allocated sections into a new executable region of memory.
* Resolving symbols.
* Applying relocations.
* Patching of altinstructions.
* Special handling of bug frames and exception tables.
* Unloading of xSplice ELF payloads.
* Compiling a sample xSplice ELF payload
* Resolving symbols
* Using build-id dependencies
* Support for shadow variable framework
* Support for executing ELF payload functions on load/unload.

The other main bit of this work is applying and reverting the patches safely.
As implemented, the code is patched with each CPU waiting in the
return-to-guest path (i.e. with no stack) or on the cpu-idle path,
which appears to be the safest way of patching. While it is safe, we should
still (in the next wave of patches) verify that we do not patch certain
critical sections (say, the code doing the patching).
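
The general idea of "park every CPU at a safe point, then patch" can be
sketched with C11 atomics as below. This is only an illustration of a
rendezvous, not the scheme used in the patches: here the last CPU to
arrive applies the patch, and all names (struct rendezvous,
rendezvous_arrive, demo_apply, ...) are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct rendezvous {
    unsigned int nr_cpus;   /* CPUs that must reach a safe point */
    atomic_uint arrived;    /* CPUs parked so far */
    atomic_bool done;       /* set once the patch has been applied */
};

void rendezvous_init(struct rendezvous *r, unsigned int nr_cpus)
{
    r->nr_cpus = nr_cpus;
    atomic_init(&r->arrived, 0);
    atomic_init(&r->done, false);
}

/*
 * Called by a CPU at a safe point (e.g. return-to-guest or idle).
 * Returns true if this CPU was the last to arrive and applied the patch;
 * otherwise the caller is expected to spin in rendezvous_wait().
 */
bool rendezvous_arrive(struct rendezvous *r, void (*apply)(void))
{
    unsigned int n = atomic_fetch_add(&r->arrived, 1) + 1;

    if (n == r->nr_cpus) {
        apply();                      /* every other CPU is parked */
        atomic_store(&r->done, true); /* release the waiters */
        return true;
    }
    return false;
}

/* Spin until the patching CPU signals completion. */
void rendezvous_wait(struct rendezvous *r)
{
    while (!atomic_load(&r->done))
        ; /* busy-wait; a real hypervisor would use cpu_relax() or similar */
}

/* Stand-in for the real patching work, used only for demonstration. */
static unsigned int patches_applied;
static void demo_apply(void)
{
    patches_applied++;
}
```

A production version also needs timeouts, interrupt handling and a way
to back out if some CPU never checks in; those are left out on purpose.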

All of the following should work:
* Applying patches safely.
* Reverting patches safely.
* Replacing patches safely (e.g. reverting any applied patches and applying
   a new patch).
* Bug frames as part of modules. This means adding or
  changing WARN, ASSERT, BUG, and run_in_exception_handler works correctly.
  Line number only changes _are ignored_.
* Exception tables as part of modules. E.g. wrmsr_safe and copy_to_user work
  correctly when used in a patch module.
* Stacking of patches on top of each other
* Resolving symbols (even of patches)

*Limitations*

The above is enough to fully implement an update system where multiple source
patches are combined (using combinediff) and built into a single binary
which then atomically replaces any existing loaded patches
(this is why Ross added a REPLACE operation). This is the approach used
by kPatch and kGraft.

Multiple completely independent patches can also be loaded but unexpected
interactions may occur.

As it stands, the patches are statically linked which means that independent
patches cannot be linked against one another (e.g. if one introduces a
new symbol). Using the combinediff approach above fixes this.

Backtraces containing functions from a patch module do not show the symbol name.

There is no checking that a patch which is loaded is built for the
correct hypervisor (need to use build-id).

Binary patching works at the function level.
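
One common way to patch at the function level on x86 is to overwrite the
start of the old function with a 5-byte relative jump to the new one. The
sketch below shows how such a jump could be encoded; it is an assumption
about the technique for illustration, not a copy of the code in
xen/arch/x86/xsplice.c, and encode_jmp_rel32 is a made-up name.

```c
#include <stdint.h>

#define JMP_INSN_SIZE 5  /* opcode 0xE9 plus a 32-bit displacement */

/*
 * Encode "jmp rel32" so that executing it at old_addr lands on new_addr.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
int encode_jmp_rel32(uint8_t insn[JMP_INSN_SIZE],
                     uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)(new_addr - old_addr - JMP_INSN_SIZE);
    uint32_t rel;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)disp;
    insn[0] = 0xE9;
    /* x86 immediates are little-endian; write the bytes explicitly. */
    insn[1] = rel & 0xff;
    insn[2] = (rel >> 8) & 0xff;
    insn[3] = (rel >> 16) & 0xff;
    insn[4] = (rel >> 24) & 0xff;
    return 0;
}
```

Reverting would then amount to restoring the five original bytes, which
is why the old contents have to be saved before the jump is written.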

*Testing*

You can use the example code included in this patchset:

# xl info | grep extra
xen_extra              : -unstable
# xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
Uploading /usr/lib/debug/xen_hello_world.xsplice (2071 bytes)
Performing check: completed
Performing apply:. completed
# xl info | grep extra
xen_extra              : Hello World
# xen-xsplice revert xen_hello_world
Performing revert:. completed
# xen-xsplice unload xen_hello_world
Performing unload: completed
# xl info | grep extra
xen_extra              : -unstable

Or you can use git://xenbits.xen.org/people/konradwilk/xsplice-build-tools.git
which generates the ELF payloads.

This link has a nice description of how to use the tool:
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02595.html


 .gitignore                                   |    3 +
 Config.mk                                    |   12 +
 MAINTAINERS                                  |   10 +
 docs/misc/xsplice.markdown                   | 1115 +++++++++++++++++++
 tools/flask/policy/policy/modules/xen/xen.te |   21 +
 tools/libxc/include/xenctrl.h                |   86 +-
 tools/libxc/xc_core.c                        |   35 +-
 tools/libxc/xc_dom_boot.c                    |   12 +-
 tools/libxc/xc_dom_x86.c                     |    7 -
 tools/libxc/xc_domain.c                      |    3 +-
 tools/libxc/xc_misc.c                        |  337 ++++++
 tools/libxc/xc_private.c                     |   53 +-
 tools/libxc/xc_private.h                     |    7 +-
 tools/libxc/xc_resume.c                      |    3 +-
 tools/libxc/xc_sr_save.c                     |    9 +-
 tools/libxc/xg_save_restore.h                |    6 +-
 tools/libxl/libxl.c                          |   94 +-
 tools/libxl/libxl.h                          |    5 +
 tools/libxl/libxl_types.idl                  |    1 +
 tools/libxl/xl_cmdimpl.c                     |    1 +
 tools/misc/Makefile                          |    4 +
 tools/misc/xen-xsplice.c                     |  463 ++++++++
 tools/ocaml/libs/xc/xenctrl_stubs.c          |   39 +-
 tools/python/xen/lowlevel/xc/xc.c            |   30 +-
 tools/xenstat/libxenstat/src/xenstat.c       |   12 +-
 tools/xentrace/xenctx.c                      |    3 +-
 xen/Makefile                                 |    2 +
 xen/arch/arm/Makefile                        |    7 +-
 xen/arch/arm/traps.c                         |   46 +-
 xen/arch/arm/xen.lds.S                       |   20 +-
 xen/arch/arm/xsplice.c                       |   76 ++
 xen/arch/x86/Makefile                        |   45 +-
 xen/arch/x86/alternative.c                   |   20 +-
 xen/arch/x86/boot/mkelf32.c                  |  137 ++-
 xen/arch/x86/domain.c                        |    4 +
 xen/arch/x86/extable.c                       |   44 +-
 xen/arch/x86/hvm/hvm.c                       |    1 +
 xen/arch/x86/hvm/svm/svm.c                   |    2 +
 xen/arch/x86/hvm/vmx/vmcs.c                  |    2 +
 xen/arch/x86/setup.c                         |   10 +-
 xen/arch/x86/test/Makefile                   |   87 ++
 xen/arch/x86/test/xen_bye_world.c            |   35 +
 xen/arch/x86/test/xen_bye_world_func.c       |   25 +
 xen/arch/x86/test/xen_hello_world.c          |   54 +
 xen/arch/x86/test/xen_hello_world_func.c     |   26 +
 xen/arch/x86/test/xen_replace_world.c        |   35 +
 xen/arch/x86/test/xen_replace_world_func.c   |   25 +
 xen/arch/x86/traps.c                         |   51 +-
 xen/arch/x86/x86_64/compat/entry.S           |    2 +
 xen/arch/x86/x86_64/entry.S                  |    2 +
 xen/arch/x86/xen.lds.S                       |   23 +
 xen/arch/x86/xsplice.c                       |  295 +++++
 xen/common/Kconfig                           |   16 +
 xen/common/Makefile                          |    4 +
 xen/common/bug_ex_symbols.c                  |  119 +++
 xen/common/compat/kernel.c                   |    4 +-
 xen/common/kernel.c                          |  328 +++++-
 xen/common/symbols.c                         |   62 +-
 xen/common/sysctl.c                          |    7 +
 xen/common/version.c                         |   82 ++
 xen/common/vmap.c                            |   45 +-
 xen/common/vsprintf.c                        |   15 +-
 xen/common/xsplice.c                         | 1476 ++++++++++++++++++++++++++
 xen/common/xsplice_elf.c                     |  385 +++++++
 xen/common/xsplice_shadow.c                  |  109 ++
 xen/include/asm-arm/bug.h                    |    2 +
 xen/include/asm-arm/nmi.h                    |   13 +
 xen/include/asm-x86/alternative.h            |    6 +
 xen/include/asm-x86/bug.h                    |    3 +-
 xen/include/asm-x86/uaccess.h                |    5 +
 xen/include/asm-x86/x86_64/page.h            |    2 +
 xen/include/public/arch-arm.h                |    3 +
 xen/include/public/sysctl.h                  |  169 +++
 xen/include/public/version.h                 |   75 +-
 xen/include/public/xen.h                     |    1 +
 xen/include/xen/bug_ex_symbols.h             |   74 ++
 xen/include/xen/hypercall.h                  |    4 +
 xen/include/xen/kernel.h                     |    2 +
 xen/include/xen/symbols.h                    |   11 +
 xen/include/xen/version.h                    |    6 +
 xen/include/xen/vmap.h                       |   14 +
 xen/include/xen/xsplice.h                    |  129 +++
 xen/include/xen/xsplice_elf.h                |   56 +
 xen/include/xen/xsplice_patch.h              |   95 ++
 xen/include/xsm/dummy.h                      |   41 +
 xen/include/xsm/xsm.h                        |   12 +
 xen/xsm/dummy.c                              |    2 +
 xen/xsm/flask/hooks.c                        |   91 ++
 xen/xsm/flask/policy/access_vectors          |   52 +
 xen/xsm/flask/policy/security_classes        |    1 +
 90 files changed, 6661 insertions(+), 307 deletions(-)

Konrad Rzeszutek Wilk (24):
      compat/x86: Remove unncessary #define.
      libxc: Remove dead code (XENVER_capabilities)
      xsm/xen_version: Add XSM for the xen_version hypercall
      HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
      libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
      x86/arm: Add BUGFRAME_NR define and BUILD checks.
      arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
      vmap: Make the while loop less fishy.
      vmap: ASSERT on NULL.
      vmap: Add vmalloc_cb and vfree_cb
      xsplice: Design document
      xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
      libxc: Implementation of XEN_XSPLICE_op in libxc
      xen-xsplice: Tool to manipulate xsplice payloads
      x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
      xsplice,symbols: Implement symbol name resolution on address.
      build_id: Provide ld-embedded build-ids
      HYPERCALL_version_op: Add VERSION_OP_build_id to retrieve build-id.
      libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id
      xsplice: Print build_id in keyhandler and on bootup.
      xsplice: Stacking build-id dependency checking.
      xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE
      xsplice: Print dependency and payloads build_id in the keyhandler.
      MAINTAINERS/xsplice: Add myself and Ross as the maintainers.

Ross Lagerwall (10):
      xsplice: Add helper elf routines
      xsplice: Implement payload loading
      xsplice: Implement support for applying/reverting/replacing patches.
      x86, xsplice: Print payload's symbol name and payload name in backtraces
      xsplice: Add .xsplice.hooks functions and test-case
      xsplice: Add support for bug frames.
      xsplice: Add support for exception tables.
      xsplice: Add support for alternatives
      xsplice: Prevent duplicate payloads to be loaded.
      xsplice: Add support for shadow variables.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 01/34] compat/x86: Remove unncessary #define.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 18:57   ` Andrew Cooper
  2016-03-16 11:08   ` Jan Beulich
  2016-03-15 17:56 ` [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities) Konrad Rzeszutek Wilk
                   ` (32 subsequent siblings)
  33 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

It is not used.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/compat/kernel.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index df93fdd..e1b9013 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -18,7 +18,6 @@ asm(".file \"" __FILE__ "\"");
 
 extern xen_commandline_t saved_cmdline;
 
-#define xen_extraversion compat_extraversion
 #define xen_extraversion_t compat_extraversion_t
 
 #define xen_compile_info compat_compile_info
-- 
2.5.0



* [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities)
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 18:04   ` Andrew Cooper
  2016-03-16 18:11   ` Wei Liu
  2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
                   ` (31 subsequent siblings)
  33 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

The 'caps' is not used anywhere in there.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
---
 tools/libxc/xc_dom_x86.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bdec40a..021f8a8 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -1259,7 +1259,6 @@ static int meminit_hvm(struct xc_dom_image *dom)
     unsigned long target_pages = dom->target_pages;
     unsigned long cur_pages, cur_pfn;
     int rc;
-    xen_capabilities_info_t caps;
     unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
         stat_1gb_pages = 0;
     unsigned int memflags = 0;
@@ -1339,12 +1338,6 @@ static int meminit_hvm(struct xc_dom_image *dom)
         goto error_out;
     }
 
-    if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
-    {
-        DOMPRINTF("Could not get Xen capabilities");
-        goto error_out;
-    }
-
     dom->p2m_size = p2m_size;
     dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
                                       dom->p2m_size);
-- 
2.5.0



* [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities) Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-18 11:55   ` Jan Beulich
                     ` (2 more replies)
  2016-03-15 17:56 ` [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane Konrad Rzeszutek Wilk
                   ` (30 subsequent siblings)
  33 siblings, 3 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson,
	Konrad Rzeszutek Wilk

All of the XENVER_* sub-ops now have an XSM check.

The subop for XENVER_commandline is now a privileged operation.
To not break guests we still return a string - but it is
just '<denied>\0'.

The rest: XENVER_[version|extraversion|capabilities|
parameters|get_features|page_size|guest_handle|changeset|
compile_info] behave as before - allowed by default for all
guests if using the XSM default policy or with the dummy one.

The admin can choose to change the sub-ops to be denied
as they see fit.
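
As a standalone illustration of the "return a placeholder instead of
failing" behaviour, the sketch below fills a fixed-size, zeroed buffer
with either the real value or "<denied>". It only models the pattern in
user space; fill_version_field and deny_string are hypothetical names,
and the in-tree code uses safe_strcpy and copy_to_guest instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder handed back when the XSM check refuses the sub-op. */
static const char *deny_string(void)
{
    return "<denied>";
}

/*
 * Zero the destination and copy in either the real value or the
 * placeholder, always leaving room for the terminating NUL.
 */
void fill_version_field(char *dst, size_t dst_size,
                        const char *value, bool deny)
{
    const char *src = deny ? deny_string() : value;

    if (dst_size == 0)
        return;
    memset(dst, 0, dst_size);
    strncpy(dst, src, dst_size - 1);
}
```

The point of the design is that the guest-visible layout and return code
stay the same whether or not access is granted; only the contents differ.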

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Do XSM check for all the XENVER_ ops.
v3: Add empty data conditions.
v4: Return <denied> for priv subops.
v5: Move extraversion from priv to normal. Drop the XSM check
    for the non-priv subops.
v6: Add +1 for strlen(xen_deny()) to include NULL. Move changeset,
    compile_info to non-priv subops.
v7: Remove the \0 on xen_deny()
v8: Add new XSM domain for xenver hypercall. Add all subops to it.
v9: Remove the extra line, Add Ack from Daniel
v10: Rename the XSM from xen_version_op to xsm_xen_version.
    Prefix the types with 'xen' to distinguish it from another
    hypercall performing similar operation. Removed Ack from Daniel
    as it was so large. Add local variable block.
---
 tools/flask/policy/policy/modules/xen/xen.te | 15 ++++++++
 xen/common/kernel.c                          | 53 +++++++++++++++++++++-------
 xen/common/version.c                         | 15 ++++++++
 xen/include/xen/version.h                    |  2 +-
 xen/include/xsm/dummy.h                      | 22 ++++++++++++
 xen/include/xsm/xsm.h                        |  5 +++
 xen/xsm/dummy.c                              |  1 +
 xen/xsm/flask/hooks.c                        | 43 ++++++++++++++++++++++
 xen/xsm/flask/policy/access_vectors          | 28 +++++++++++++++
 xen/xsm/flask/policy/security_classes        |  1 +
 10 files changed, 172 insertions(+), 13 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..7e7400d 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -73,6 +73,15 @@ allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
 };
+
+# Allow dom0 to use all XENVER_ subops
+# Note that dom0 is part of domain_type so this has duplicates.
+allow dom0_t xen_t:version {
+    xen_version xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset xen_platform_parameters xen_get_features xen_pagesize
+    xen_guest_handle xen_commandline
+};
+
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -137,6 +146,12 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
+# For normal guests all except XENVER_commandline
+allow domain_type xen_t:version {
+    xen_version xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset xen_platform_parameters xen_get_features xen_pagesize
+    xen_guest_handle
+};
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0618da2..2699ac0 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -13,6 +13,7 @@
 #include <xen/nmi.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xsm/xsm.h>
 #include <asm/current.h>
 #include <public/nmi.h>
 #include <public/version.h>
@@ -223,12 +224,15 @@ void __init do_initcalls(void)
 /*
  * Simple hypercalls.
  */
-
 DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
+
     switch ( cmd )
     {
     case XENVER_version:
+        if ( deny )
+            return 0;
         return (xen_major_version() << 16) | xen_minor_version();
 
     case XENVER_extraversion:
@@ -236,7 +240,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_extraversion_t extraversion;
 
         memset(extraversion, 0, sizeof(extraversion));
-        safe_strcpy(extraversion, xen_extra_version());
+        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
         if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
             return -EFAULT;
         return 0;
@@ -247,10 +251,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_compile_info_t info;
 
         memset(&info, 0, sizeof(info));
-        safe_strcpy(info.compiler,       xen_compiler());
-        safe_strcpy(info.compile_by,     xen_compile_by());
-        safe_strcpy(info.compile_domain, xen_compile_domain());
-        safe_strcpy(info.compile_date,   xen_compile_date());
+        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
+        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
+        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
         if ( copy_to_guest(arg, &info, 1) )
             return -EFAULT;
         return 0;
@@ -261,7 +265,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_capabilities_info_t info;
 
         memset(info, 0, sizeof(info));
-        arch_get_xen_caps(&info);
+        if ( !deny )
+            arch_get_xen_caps(&info);
 
         if ( copy_to_guest(arg, info, ARRAY_SIZE(info)) )
             return -EFAULT;
@@ -274,6 +279,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             .virt_start = HYPERVISOR_VIRT_START
         };
 
+        if ( deny )
+            params.virt_start = 0;
+
         if ( copy_to_guest(arg, &params, 1) )
             return -EFAULT;
         return 0;
@@ -285,7 +293,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_changeset_info_t chgset;
 
         memset(chgset, 0, sizeof(chgset));
-        safe_strcpy(chgset, xen_changeset());
+        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
         if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
             return -EFAULT;
         return 0;
@@ -302,6 +310,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         switch ( fi.submap_idx )
         {
         case 0:
+            if ( deny )
+                break;
             fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
             if ( VM_ASSIST(d, pae_extended_cr3) )
                 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
@@ -342,19 +352,38 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case XENVER_pagesize:
+        if ( deny )
+            return 0;
         return (!guest_handle_is_null(arg) ? -EINVAL : PAGE_SIZE);
 
     case XENVER_guest_handle:
-        if ( copy_to_guest(arg, current->domain->handle,
-                           ARRAY_SIZE(current->domain->handle)) )
+    {
+        xen_domain_handle_t hdl;
+        ssize_t len;
+
+        if ( deny )
+        {
+            len = sizeof(hdl);
+            memset(&hdl, 0, len);
+        } else
+            len = ARRAY_SIZE(current->domain->handle);
+
+        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle, len ) )
             return -EFAULT;
         return 0;
-
+    }
     case XENVER_commandline:
-        if ( copy_to_guest(arg, saved_cmdline, ARRAY_SIZE(saved_cmdline)) )
+    {
+        size_t len = ARRAY_SIZE(saved_cmdline);
+
+        if ( deny )
+            len = strlen(xen_deny()) + 1;
+
+        if ( copy_to_guest(arg, deny ? xen_deny() : saved_cmdline, len) )
             return -EFAULT;
         return 0;
     }
+    }
 
     return -ENOSYS;
 }
diff --git a/xen/common/version.c b/xen/common/version.c
index b152e27..fc9bf42 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -55,3 +55,18 @@ const char *xen_banner(void)
 {
     return XEN_BANNER;
 }
+
+const char *xen_deny(void)
+{
+    return "<denied>";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 81a3c7d..016a56c 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -12,5 +12,5 @@ unsigned int xen_minor_version(void);
 const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
-
+const char *xen_deny(void);
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 1d13826..94b8855 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -727,3 +727,25 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
 }
 
 #endif /* CONFIG_X86 */
+
+#include <public/version.h>
+static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_extraversion:
+    case XENVER_compile_info:
+    case XENVER_capabilities:
+    case XENVER_changeset:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+    case XENVER_pagesize:
+    case XENVER_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3afed70..db440f6 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -193,6 +193,7 @@ struct xsm_operations {
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
+    int (*xen_version) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -731,6 +732,10 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
 
 #endif /* CONFIG_X86 */
 
+static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->xen_version(op);
+}
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0f32636..9791ad4 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -162,4 +162,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
 #endif
+    set_to_dummy_if_null(ops, xen_version);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 4813623..d1bef43 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -26,6 +26,7 @@
 #include <public/xen.h>
 #include <public/physdev.h>
 #include <public/platform.h>
+#include <public/version.h>
 
 #include <public/xsm/flask_op.h>
 
@@ -1620,6 +1621,47 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
 }
 #endif /* CONFIG_X86 */
 
+static int flask_xen_version (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XENVER_version:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_VERSION, NULL);
+    case XENVER_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_EXTRAVERSION, NULL);
+    case XENVER_compile_info:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMPILE_INFO, NULL);
+    case XENVER_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CAPABILITIES, NULL);
+    case XENVER_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CHANGESET, NULL);
+    case XENVER_platform_parameters:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_PLATFORM_PARAMETERS, NULL);
+    case XENVER_get_features:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_GET_FEATURES, NULL);
+    case XENVER_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_PAGESIZE, NULL);
+    case XENVER_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_GUEST_HANDLE, NULL);
+    case XENVER_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1758,6 +1800,7 @@ static struct xsm_operations flask_ops = {
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
 #endif
+    .xen_version = flask_xen_version,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..628dd5c 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -495,3 +495,31 @@ class security
 # remove ocontext label definitions for resources
     del_ocontext
 }
+
+# Class version is used to describe the XENVER_ hypercall.
+# Each sub-ops is described here - in the default case all of them should
+# be allowed except the XENVER_commandline.
+#
+class version
+{
+# Often called by PV kernels to force a callback.
+    xen_version
+# Extra information (-unstable).
+    xen_extraversion
+# Compile information of the hypervisor.
+    xen_compile_info
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    xen_capabilities
+# Such as the virtual address of where the hypervisor resides.
+    xen_platform_parameters
+# Source code changeset.
+    xen_changeset
+# The features the hypervisor supports.
+    xen_get_features
+# Page size the hypervisor uses.
+    xen_pagesize
+# A value that the control stack can choose.
+    xen_guest_handle
+# Xen command line.
+    xen_commandline
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ca191db..cde4e1a 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -18,5 +18,6 @@ class shadow
 class event
 class grant
 class security
+class version
 
 # FLASK
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 18:29   ` Andrew Cooper
  2016-03-22 17:51   ` Daniel De Graaf
  2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
                   ` (29 subsequent siblings)
  33 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Konrad Rzeszutek Wilk, Ian Jackson,
	Julien Grall, Stefano Stabellini, Jan Beulich, Keir Fraser,
	Daniel De Graaf

This hypercall mirrors XENVER_ in that it offers similar functionality.
However, it is designed differently:
 - No compat layer. The data structures are the same size on 32-bit
   as on 64-bit.
 - The hypercall accepts three arguments - the command, a pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size: if the buffer is NULL, the
   hypercall returns the size of the buffer that will be needed.
 - Sub-ops can complete even if the buffer is too small - truncated
   data will be filled in and the hypercall will return -ENOBUFS.
 - VERSION_OP_commandline and VERSION_OP_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM, and toolstacks/OSes are expected
   to deal with it.
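
The probe and truncation rules in the list above can be sketched in plain
C. This is only an illustration of the size contract, not the hypervisor
code: mock_version_op() and fetch_string() are hypothetical names, the
string "-unstable" is made-up sample data, and no real hypercall is issued.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Made-up sample payload standing in for a string sub-op's data. */
static const char mock_extraversion[] = "-unstable";

/*
 * Hypothetical stand-in for the hypercall. It models only the contract
 * described above: a NULL arg probes for the size, a short buffer gets
 * truncated data plus -ENOBUFS, and success returns the byte count.
 */
static long mock_version_op(void *arg, unsigned int len)
{
    size_t sz = sizeof(mock_extraversion);   /* includes the NUL */

    if (arg == NULL)
        return (long)sz;                     /* probe: report size only */

    memcpy(arg, mock_extraversion, len < sz ? len : sz);
    return len < sz ? -ENOBUFS : (long)sz;
}

/*
 * Caller side: probe for the size, allocate, then fetch. On success the
 * caller owns the returned buffer and *out_len holds the byte count; on
 * failure NULL is returned and *out_len holds the negative error.
 */
static char *fetch_string(long *out_len)
{
    long sz = mock_version_op(NULL, 0);
    char *buf;

    if (sz <= 0) {
        *out_len = sz;
        return NULL;
    }

    buf = malloc((size_t)sz);
    if (buf == NULL) {
        *out_len = -ENOMEM;
        return NULL;
    }

    *out_len = mock_version_op(buf, (unsigned int)sz);
    if (*out_len < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

A caller that passes a buffer shorter than the probed size still receives
the leading bytes, but must treat -ENOBUFS as "data incomplete" rather
than as success.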

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
---
 tools/flask/policy/policy/modules/xen/xen.te |   9 +-
 xen/arch/arm/traps.c                         |   1 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/x86_64/compat/entry.S           |   2 +
 xen/arch/x86/x86_64/entry.S                  |   2 +
 xen/common/compat/kernel.c                   |   3 +
 xen/common/kernel.c                          | 265 +++++++++++++++++++++++----
 xen/include/public/arch-arm.h                |   3 +
 xen/include/public/version.h                 |  72 +++++++-
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xsm/dummy.h                      |  19 ++
 xen/include/xsm/xsm.h                        |   7 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  39 ++++
 xen/xsm/flask/policy/access_vectors          |  24 ++-
 16 files changed, 410 insertions(+), 43 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 7e7400d..bea40c1 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,12 +74,14 @@ allow dom0_t xen_t:xen2 {
     get_symbol
 };
 
-# Allow dom0 to use all XENVER_ subops
+# Allow dom0 to use all XENVER_ subops and VERSION_OP subops
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_version xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
     xen_guest_handle xen_commandline
+    version extraversion capabilities changeset platform_parameters
+    get_features pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -146,11 +148,14 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all except XENVER_commandline
+# For normal guests all except XENVER_commandline, VERSION_OP_changeset,
+# and VERSION_OP_commandline
 allow domain_type xen_t:version {
     xen_version xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
     xen_guest_handle
+    version extraversion capabilities platform_parameters
+    get_features pagesize guest_handle
 };
 ###############################################################################
 #
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 83744e8..31d2115 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1235,6 +1235,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
+    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 255a1d6..56b9f6b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5332,6 +5332,7 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
+    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 927439d..8715945 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -391,6 +391,7 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -442,6 +443,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
+        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index dd7f114..178dc3a 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -727,6 +727,7 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -778,6 +779,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
+        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index e1b9013..e98ba7d 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -38,6 +38,9 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
+CHECK_TYPE(version_op_buf);
+CHECK_TYPE(version_op_val);
+
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 2699ac0..f06b3d9 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,6 +221,47 @@ void __init do_initcalls(void)
 
 #endif
 
+static int get_features(struct domain *d, xen_feature_info_t *fi)
+{
+    switch ( fi->submap_idx )
+    {
+    case 0:
+        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
+        if ( VM_ASSIST(d, pae_extended_cr3) )
+            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+        if ( paging_mode_translate(d) )
+            fi->submap |= 
+                (1U << XENFEAT_writable_page_tables) |
+                (1U << XENFEAT_auto_translated_physmap);
+        if ( is_hardware_domain(d) )
+            fi->submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                          (1U << XENFEAT_highmem_assist) |
+                          (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_supervisor_mode_kernel) |
+                          (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_hvm_callback_vector) |
+                          (1U << XENFEAT_hvm_pirqs);
+            break;
+        }
+#endif
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * Simple hypercalls.
  */
@@ -302,50 +343,16 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        struct domain *d = current->domain;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        switch ( fi.submap_idx )
+        if ( !deny )
         {
-        case 0:
-            if ( deny )
-                break;
-            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, pae_extended_cr3) )
-                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-            if ( paging_mode_translate(d) )
-                fi.submap |= 
-                    (1U << XENFEAT_writable_page_tables) |
-                    (1U << XENFEAT_auto_translated_physmap);
-            if ( is_hardware_domain(d) )
-                fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
-            break;
-        default:
-            return -EINVAL;
+            int rc = get_features(current->domain, &fi);
+            if ( rc )
+                return rc;
         }
-
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
         return 0;
@@ -388,6 +395,188 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
+static const char *capabilities_info(ssize_t *len)
+{
+    static xen_capabilities_info_t cached_cap;
+    static unsigned int cached_cap_len;
+    static bool_t cached;
+
+    if ( cached )
+    {
+        *len = cached_cap_len;
+        return cached_cap;
+    }
+    arch_get_xen_caps(&cached_cap);
+    cached_cap_len = strlen(cached_cap) + 1;
+    cached = 1;
+    *len = cached_cap_len;
+    return cached_cap;
+}
+
+static int size_of_subops_data(unsigned int cmd, ssize_t *sz)
+{
+    int rc = 0;
+    /* Compute size. */
+    switch ( cmd )
+    {
+    case XEN_VERSION_OP_version:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_extraversion:
+        *sz = strlen(xen_extra_version()) + 1;
+        break;
+
+    case XEN_VERSION_OP_capabilities:
+        capabilities_info(sz);
+        break;
+
+    case XEN_VERSION_OP_platform_parameters:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_changeset:
+        *sz = strlen(xen_changeset()) + 1;
+        break;
+
+    case XEN_VERSION_OP_get_features:
+        *sz = sizeof(xen_feature_info_t);
+        break;
+
+    case XEN_VERSION_OP_pagesize:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_guest_handle:
+        *sz = ARRAY_SIZE(current->domain->handle);
+        break;
+
+    case XEN_VERSION_OP_commandline:
+        *sz = ARRAY_SIZE(saved_cmdline);
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    return rc;
+}
+
+/*
+ * Similar to HYPERVISOR_xen_version but with a sane interface
+ * (has a length, one can probe for the length) and with one fewer sub-op:
+ * XENVER_compile_info has no equivalent.
+ */
+DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
+               unsigned int len)
+{
+    union {
+        xen_version_op_val_t n;
+        xen_feature_info_t fi;
+    } u;
+    ssize_t sz = 0;
+    const void *ptr = NULL;
+    int rc = xsm_version_op(XSM_OTHER, cmd);
+
+    /* We can safely return -EPERM! */
+    if ( rc )
+        return rc;
+
+    rc = size_of_subops_data(cmd, &sz);
+    if ( rc )
+        return rc;
+
+    /* Some of the subops may have no data. */
+    if ( !sz )
+        return 0;
+    /*
+     * This hypercall also allows the client to probe. If it provides
+     * a NULL arg we will return the size of the space it has to
+     * allocate for the specific sub-op.
+     */
+    if ( guest_handle_is_null(arg) )
+        return sz;
+
+    memset(&u, 0, sizeof(u));
+    /*
+     * HYPERVISOR_xen_version is inconsistent: some sub-ops return the value
+     * and some copy it back into the argument. Here all sub-ops follow the
+     * same rule: return the number of bytes on success (negative on error)
+     * and always copy the result into arg. Yeey sanity!
+     */
+
+    rc = 0;
+    switch ( cmd )
+    {
+    case XEN_VERSION_OP_version:
+        u.n = (xen_major_version() << 16) | xen_minor_version();
+        break;
+
+    case XEN_VERSION_OP_extraversion:
+        ptr = xen_extra_version();
+        break;
+
+    case XEN_VERSION_OP_capabilities:
+        ptr = capabilities_info(&sz);
+        break;
+
+    case XEN_VERSION_OP_platform_parameters:
+        u.n = HYPERVISOR_VIRT_START;
+        break;
+
+    case XEN_VERSION_OP_changeset:
+        ptr = xen_changeset();
+        break;
+
+    case XEN_VERSION_OP_get_features:
+        if ( copy_from_guest(&u.fi, arg, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        rc = get_features(current->domain, &u.fi);
+        break;
+
+    case XEN_VERSION_OP_pagesize:
+        u.n = PAGE_SIZE;
+        break;
+
+    case XEN_VERSION_OP_guest_handle:
+        ptr = current->domain->handle;
+        break;
+
+    case XEN_VERSION_OP_commandline:
+        ptr = saved_cmdline;
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    if ( !rc )
+    {
+        ssize_t bytes;
+
+        if ( sz > len )
+            bytes = len;
+        else
+            bytes = sz;
+
+        if ( copy_to_guest(arg, ptr ? ptr : &u, bytes) )
+            rc = -EFAULT;
+    }
+    if ( !rc )
+    {
+        /*
+         * We copy len bytes (truncated data) even though we report failure.
+         */
+        if ( sz > len )
+            rc = -ENOBUFS;
+    }
+
+    return rc == 0 ? sz : rc;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 870bc3b..c9ae315 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,6 +128,9 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
+ *  HYPERVISOR_version_op
+ *   All generic sub-operations
+ *
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..4ceb97b 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,15 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/*
+ * There are two hypercalls mentioned in here. The XENVER_ are for
+ * HYPERCALL_xen_version (17), while VERSION_OP_ are for the
+ * HYPERCALL_version_op (41).
+ *
+ * The subops are very similar except that the latter hypercall has a
+ * sane interface.
+ */
+
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +95,68 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+
+
+/*
+ * The HYPERCALL_version_op has a set of sub-ops which mirror the
+ * sub-ops of HYPERCALL_xen_version. However this hypercall differs
+ * radically from the former:
+ *  - It returns the number of bytes returned.
+ *  - It will return -XEN_EPERM if the guest is not permitted.
+ *  - It will return the requested data in arg.
+ *  - It requires a third argument (len) for the length of the
+ *    arg. Naturally the arg has to fit the requested data, otherwise
+ *    -XEN_ENOBUFS is returned.
+ *
+ * It also offers a mechanism to probe for the number of bytes a
+ * sub-op will require. Passing a NULL pointer as arg will
+ * return the number of bytes required for the operation, or a
+ * negative value if an error is encountered.
+ */
+
+typedef uint64_t xen_version_op_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
+
+typedef unsigned char xen_version_op_buf_t[];
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
+
+/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */
+#define XEN_VERSION_OP_version      0
+
+/* arg == version_op_buf. */
+#define XEN_VERSION_OP_extraversion 1
+
+/* arg == version_op_buf */
+#define XEN_VERSION_OP_capabilities 3
+
+/* arg == version_op_buf */
+#define XEN_VERSION_OP_changeset 4
+
+/*
+ * arg == xen_version_op_val_t. Contains the virtual address
+ * of the hypervisor encoded as [63..0].
+ */
+#define XEN_VERSION_OP_platform_parameters 5
+
+/*
+ * arg = xen_feature_info_t - shares the same structure
+ * as the XENVER_get_features.
+ */
+#define XEN_VERSION_OP_get_features 6
+
+/* arg == xen_version_op_val_t */
+#define XEN_VERSION_OP_pagesize 7
+
+/* arg == version_op_buf.
+ *
+ * The toolstack fills it out for guest consumption. It is intended to hold
+ * the UUID of the guest.
+ */
+#define XEN_VERSION_OP_guest_handle 8
+
+/* arg = version_op_buf */
+#define XEN_VERSION_OP_commandline 9
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..1a99929 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,6 +115,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
+#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 26cb615..00e4245 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -143,6 +143,10 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
+extern long
+do_version_op(unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 94b8855..8c6ae90 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -749,3 +749,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XEN_VERSION_OP_version:
+    case XEN_VERSION_OP_extraversion:
+    case XEN_VERSION_OP_capabilities:
+    case XEN_VERSION_OP_platform_parameters:
+    case XEN_VERSION_OP_get_features:
+    case XEN_VERSION_OP_pagesize:
+    case XEN_VERSION_OP_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index db440f6..ac80472 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -194,6 +194,7 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -736,6 +737,12 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
 }
+
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9791ad4..776dd09 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,4 +163,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index d1bef43..2510229 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1662,6 +1662,44 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XEN_VERSION_OP_version:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__VERSION, NULL);
+    case XEN_VERSION_OP_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XEN_VERSION_OP_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XEN_VERSION_OP_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XEN_VERSION_OP_platform_parameters:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PLATFORM_PARAMETERS, NULL);
+    case XEN_VERSION_OP_get_features:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GET_FEATURES, NULL);
+    case XEN_VERSION_OP_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XEN_VERSION_OP_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+    case XEN_VERSION_OP_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1801,6 +1839,7 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 628dd5c..59c9f69 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -496,9 +496,10 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ hypercall.
+# Class version is used to describe the XENVER_ and VERSION_OP hypercalls.
 # Each sub-ops is described here - in the default case all of them should
-# be allowed except the XENVER_commandline.
+# be allowed except the XENVER_commandline, VERSION_OP_commandline, and
+# VERSION_OP_changeset.
 #
 class version
 {
@@ -522,4 +523,23 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+
+# Often called by PV kernels to force a callback.
+    version
+# Extra information (-unstable).
+    extraversion
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Such as the virtual address of where the hypervisor resides.
+    platform_parameters
+# Source code changeset.
+    changeset
+# The features the hypervisor supports.
+    get_features
+# Page size the hypervisor uses.
+    pagesize
+# A value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
 }
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 18:45   ` Andrew Cooper
                     ` (2 more replies)
  2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
                   ` (28 subsequent siblings)
  33 siblings, 3 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, George Dunlap,
	Konrad Rzeszutek Wilk, Ian Jackson, Julien Grall, David Scott

We change the xen_version libxc code to use the new hypercall,
which of course means every user in the code base has to
be changed over.

It is important to note that xc_version_op has different
return semantics than the previous one. It returns negative
values on error (like the old one), but it also returns
a positive value on success (unlike the old one). The positive
value is the number of bytes copied in.
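
What the changed return semantics mean for callers can be sketched as
follows. This is a minimal illustration under stated assumptions:
mock_xc_version() is a hypothetical stand-in that always reports version
4.7 and is not the libxc function, and version_val_t merely mirrors the
64-bit value type introduced by the previous patch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Mirrors the 64-bit value type from the previous patch (assumption). */
typedef uint64_t version_val_t;

/*
 * Hypothetical stand-in for the new-style call: on success it returns
 * the number of bytes copied into arg, on error a negative value.
 * The version 4.7 is made-up sample data.
 */
static int mock_xc_version(void *arg, ssize_t len)
{
    version_val_t v = ((version_val_t)4 << 16) | 7;

    if (len < (ssize_t)sizeof(v))
        return -1;

    memcpy(arg, &v, sizeof(v));
    return (int)sizeof(v);
}

/*
 * Caller side: the check must be `rc < 0`. An old-style `if ( rc )`
 * would misread the positive byte count as a failure.
 */
static int get_major_minor(unsigned int *major, unsigned int *minor)
{
    version_val_t val = 0;
    int rc = mock_xc_version(&val, sizeof(val));

    if (rc < 0)
        return rc;

    *major = (unsigned int)(val >> 16);      /* bits 31..16 */
    *minor = (unsigned int)(val & 0xffff);   /* bits 15..0  */
    return 0;
}
```

The value now arrives through the buffer instead of the return code, so
the major:minor decoding moves from rc to val, as the xc_core.c hunk
below does.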

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Scott <dave@recoil.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

v2: Use xc_version_op_val_t instead of uint32 or such
v3: Make sure to check ret < 0 instead of ret (as it returns the size) -
    in Ocaml code. Found by Andrew.
v4: Update comment for xc_version to mention that it returns the size
---
 tools/libxc/include/xenctrl.h          | 24 ++++++++++-
 tools/libxc/xc_core.c                  | 35 +++++++--------
 tools/libxc/xc_dom_boot.c              | 12 +++++-
 tools/libxc/xc_domain.c                |  3 +-
 tools/libxc/xc_private.c               | 53 ++++-------------------
 tools/libxc/xc_private.h               |  7 +--
 tools/libxc/xc_resume.c                |  3 +-
 tools/libxc/xc_sr_save.c               |  9 ++--
 tools/libxc/xg_save_restore.h          |  6 ++-
 tools/libxl/libxl.c                    | 79 ++++++++++++++++++++++------------
 tools/ocaml/libs/xc/xenctrl_stubs.c    | 39 +++++++----------
 tools/python/xen/lowlevel/xc/xc.c      | 30 +++++++------
 tools/xenstat/libxenstat/src/xenstat.c | 12 +++---
 tools/xentrace/xenctx.c                |  3 +-
 14 files changed, 169 insertions(+), 146 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 150d727..379de30 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1477,7 +1477,29 @@ int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
-int xc_version(xc_interface *xch, int cmd, void *arg);
+/**
+ * This function returns the size of buffer to be allocated for
+ * the cmd. The cmd is one of XEN_VERSION_OP_*.
+ */
+int xc_version_len(xc_interface *xch, unsigned int cmd);
+/**
+ * This function retrieves the information from the version_op hypercall.
+ * The len is the size of the arg buffer. If arg is NULL, will not
+ * perform hypercall - instead will just return the size of arg
+ * buffer that is needed.
+ *
+ * Note that prior to Xen 4.7 this would return 0 for success and a
+ * negative value (-1) for error (with the error in errno). In Xen 4.7
+ * and later it returns a positive value on success, which is the
+ * number of bytes copied into arg.
+ *
+ * It can also return -1 with various errno values:
+ *  - EPERM - not permitted.
+ *  - ENOBUFS - the len was too short, output in arg truncated.
+ *  - ENOSYS - not implemented.
+ *
+ */
+int xc_version(xc_interface *xch, unsigned int cmd, void *arg, ssize_t len);
 
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
diff --git a/tools/libxc/xc_core.c b/tools/libxc/xc_core.c
index d792566..58b03d6 100644
--- a/tools/libxc/xc_core.c
+++ b/tools/libxc/xc_core.c
@@ -270,42 +270,43 @@ elfnote_fill_xen_version(xc_interface *xch,
                          *xen_version)
 {
     int rc;
+    xen_version_op_val_t val = 0;
     memset(xen_version, 0, sizeof(*xen_version));
 
-    rc = xc_version(xch, XENVER_version, NULL);
+    rc = xc_version(xch, XEN_VERSION_OP_version, &val, sizeof(val));
     if ( rc < 0 )
         return rc;
-    xen_version->major_version = rc >> 16;
-    xen_version->minor_version = rc & ((1 << 16) - 1);
+    xen_version->major_version = val >> 16;
+    xen_version->minor_version = val & ((1 << 16) - 1);
 
-    rc = xc_version(xch, XENVER_extraversion,
-                    &xen_version->extra_version);
+    rc = xc_version(xch, XEN_VERSION_OP_extraversion,
+                    xen_version->extra_version,
+                    sizeof(xen_version->extra_version));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_compile_info,
-                    &xen_version->compile_info);
+    rc = xc_version(xch, XEN_VERSION_OP_capabilities,
+                    xen_version->capabilities,
+                    sizeof(xen_version->capabilities));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch,
-                    XENVER_capabilities, &xen_version->capabilities);
+    rc = xc_version(xch, XEN_VERSION_OP_changeset, xen_version->changeset,
+                    sizeof(xen_version->changeset));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_changeset, &xen_version->changeset);
+    rc = xc_version(xch, XEN_VERSION_OP_platform_parameters,
+                    &xen_version->platform_parameters,
+                    sizeof(xen_version->platform_parameters));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_platform_parameters,
-                    &xen_version->platform_parameters);
+    val = 0;
+    rc = xc_version(xch, XEN_VERSION_OP_pagesize, &val, sizeof(val));
     if ( rc < 0 )
         return rc;
-
-    rc = xc_version(xch, XENVER_pagesize, NULL);
-    if ( rc < 0 )
-        return rc;
-    xen_version->pagesize = rc;
+    xen_version->pagesize = val;
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 791041b..3f65095 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -112,11 +112,19 @@ int xc_dom_compat_check(struct xc_dom_image *dom)
 
 int xc_dom_boot_xen_init(struct xc_dom_image *dom, xc_interface *xch, domid_t domid)
 {
+    xen_version_op_val_t val = 0;
+
+    if ( xc_version(xch, XEN_VERSION_OP_version, &val, sizeof(val)) < 0 )
+    {
+        xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get Xen version!");
+        return -1;
+    }
+    dom->xen_version = val;
     dom->xch = xch;
     dom->guest_domid = domid;
 
-    dom->xen_version = xc_version(xch, XENVER_version, NULL);
-    if ( xc_version(xch, XENVER_capabilities, &dom->xen_caps) < 0 )
+    if ( xc_version(xch, XEN_VERSION_OP_capabilities, dom->xen_caps,
+                    sizeof(dom->xen_caps)) < 0 )
     {
         xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get xen capabilities");
         return -1;
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 050216e..c214700 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2084,7 +2084,8 @@ int xc_map_domain_meminfo(xc_interface *xch, int domid,
     _di.guest_width = minfo->guest_width;
 
     /* Get page table levels (see get_platform_info() in xg_save_restore.h */
-    if ( xc_version(xch, XENVER_capabilities, &xen_caps) )
+    if ( xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
+                    sizeof(xen_caps)) < 0 )
     {
         PERROR("Could not get Xen capabilities (for page table levels)");
         return -1;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index c41e433..e33ad8f 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -457,58 +457,23 @@ int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl)
     return do_sysctl(xch, sysctl);
 }
 
-int xc_version(xc_interface *xch, int cmd, void *arg)
+int xc_version_len(xc_interface *xch, unsigned int cmd)
 {
-    DECLARE_HYPERCALL_BOUNCE(arg, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT); /* Size unknown until cmd decoded */
-    size_t sz;
-    int rc;
-
-    switch ( cmd )
-    {
-    case XENVER_version:
-        sz = 0;
-        break;
-    case XENVER_extraversion:
-        sz = sizeof(xen_extraversion_t);
-        break;
-    case XENVER_compile_info:
-        sz = sizeof(xen_compile_info_t);
-        break;
-    case XENVER_capabilities:
-        sz = sizeof(xen_capabilities_info_t);
-        break;
-    case XENVER_changeset:
-        sz = sizeof(xen_changeset_info_t);
-        break;
-    case XENVER_platform_parameters:
-        sz = sizeof(xen_platform_parameters_t);
-        break;
-    case XENVER_get_features:
-        sz = sizeof(xen_feature_info_t);
-        break;
-    case XENVER_pagesize:
-        sz = 0;
-        break;
-    case XENVER_guest_handle:
-        sz = sizeof(xen_domain_handle_t);
-        break;
-    case XENVER_commandline:
-        sz = sizeof(xen_commandline_t);
-        break;
-    default:
-        ERROR("xc_version: unknown command %d\n", cmd);
-        return -EINVAL;
-    }
+    return do_version_op(xch, cmd, NULL, 0);
+}
 
-    HYPERCALL_BOUNCE_SET_SIZE(arg, sz);
+int xc_version(xc_interface *xch, unsigned int cmd, void *arg, ssize_t sz)
+{
+    DECLARE_HYPERCALL_BOUNCE(arg, sz, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int rc;
 
-    if ( (sz != 0) && xc_hypercall_bounce_pre(xch, arg) )
+    if ( xc_hypercall_bounce_pre(xch, arg) )
     {
         PERROR("Could not bounce buffer for version hypercall");
         return -ENOMEM;
     }
 
-    rc = do_xen_version(xch, cmd, HYPERCALL_BUFFER(arg));
+    rc = do_version_op(xch, cmd, HYPERCALL_BUFFER(arg), sz);
 
     if ( sz != 0 )
         xc_hypercall_bounce_post(xch, arg);
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index aa8daf1..5be8fdd 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -214,11 +214,12 @@ void xc__hypercall_buffer_cache_release(xc_interface *xch);
  * Hypercall interfaces.
  */
 
-static inline int do_xen_version(xc_interface *xch, int cmd, xc_hypercall_buffer_t *dest)
+static inline long do_version_op(xc_interface *xch, int cmd,
+                                 xc_hypercall_buffer_t *dest, ssize_t len)
 {
     DECLARE_HYPERCALL_BUFFER_ARGUMENT(dest);
-    return xencall2(xch->xcall, __HYPERVISOR_xen_version,
-                    cmd, HYPERCALL_BUFFER_AS_ARG(dest));
+    return xencall3(xch->xcall, __HYPERVISOR_version_op,
+                    cmd, HYPERCALL_BUFFER_AS_ARG(dest), len);
 }
 
 static inline int do_physdev_op(xc_interface *xch, int cmd, void *op, size_t len)
diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e692b81..7dfc3da 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -56,7 +56,8 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
             return 0;
 
         /* HVM guests have host address width. */
-        if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
+        if ( xc_version(xch, XEN_VERSION_OP_capabilities, caps,
+                        sizeof(caps)) < 0 )
         {
             PERROR("Could not get Xen capabilities");
             return -1;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index e258b7c..6daafc4 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -9,7 +9,7 @@
 static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
 {
     xc_interface *xch = ctx->xch;
-    int32_t xen_version = xc_version(xch, XENVER_version, NULL);
+    xen_version_op_val_t xen_version;
     struct xc_sr_ihdr ihdr =
         {
             .marker  = IHDR_MARKER,
@@ -21,15 +21,16 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
         {
             .type       = guest_type,
             .page_shift = XC_PAGE_SHIFT,
-            .xen_major  = (xen_version >> 16) & 0xffff,
-            .xen_minor  = (xen_version)       & 0xffff,
         };
 
-    if ( xen_version < 0 )
+    if ( xc_version(xch, XEN_VERSION_OP_version, &xen_version,
+                    sizeof(xen_version)) < 0 )
     {
         PERROR("Unable to obtain Xen Version");
         return -1;
     }
+    dhdr.xen_major = (xen_version >> 16) & 0xffff;
+    dhdr.xen_minor = (xen_version)       & 0xffff;
 
     if ( write_exact(ctx->fd, &ihdr, sizeof(ihdr)) )
     {
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 303081d..2663969 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -57,10 +57,12 @@ static inline int get_platform_info(xc_interface *xch, uint32_t dom,
     xen_capabilities_info_t xen_caps = "";
     xen_platform_parameters_t xen_params;
 
-    if (xc_version(xch, XENVER_platform_parameters, &xen_params) != 0)
+    if (xc_version(xch, XEN_VERSION_OP_platform_parameters, &xen_params,
+                   sizeof(xen_params)) < 0)
         return 0;
 
-    if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
+    if (xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
+                   sizeof(xen_caps)) < 0)
         return 0;
 
     if (xc_maximum_ram_page(xch, max_mfn))
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 93e228d..dc660b7 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5191,50 +5191,73 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
+
+static int xc_version_wrapper(libxl_ctx *ctx, unsigned int cmd, char *buf, ssize_t len, char **dst)
+{
+    GC_INIT(ctx);
+    int r;
+
+    r = xc_version(ctx->xch, cmd, buf, len);
+    if ( r == -EPERM )
+        buf[0] = '\0';
+    else if ( r < 0 )
+    {
+        GC_FREE;
+        return r;
+    }
+    *dst = libxl__strdup(NOGC, buf);
+    GC_FREE;
+    return 0;
+}
+
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     GC_INIT(ctx);
-    union {
-        xen_extraversion_t xen_extra;
-        xen_compile_info_t xen_cc;
-        xen_changeset_info_t xen_chgset;
-        xen_capabilities_info_t xen_caps;
-        xen_platform_parameters_t p_parms;
-        xen_commandline_t xen_commandline;
-    } u;
-    long xen_version;
+    char *buf;
+    xen_version_op_val_t val = 0;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
         goto out;
 
-    xen_version = xc_version(ctx->xch, XENVER_version, NULL);
-    info->xen_version_major = xen_version >> 16;
-    info->xen_version_minor = xen_version & 0xFF;
+    if (xc_version(ctx->xch, XEN_VERSION_OP_pagesize, &val, sizeof(val)) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_extraversion, &u.xen_extra);
-    info->xen_version_extra = libxl__strdup(NOGC, u.xen_extra);
+    info->pagesize = val;
+    /* 4K buffer. */
+    buf = libxl__zalloc(gc, info->pagesize);
 
-    xc_version(ctx->xch, XENVER_compile_info, &u.xen_cc);
-    info->compiler = libxl__strdup(NOGC, u.xen_cc.compiler);
-    info->compile_by = libxl__strdup(NOGC, u.xen_cc.compile_by);
-    info->compile_domain = libxl__strdup(NOGC, u.xen_cc.compile_domain);
-    info->compile_date = libxl__strdup(NOGC, u.xen_cc.compile_date);
+    val = 0;
+    if (xc_version(ctx->xch, XEN_VERSION_OP_version, &val, sizeof(val)) < 0)
+        goto out;
+    info->xen_version_major = val >> 16;
+    info->xen_version_minor = val & 0xFF;
 
-    xc_version(ctx->xch, XENVER_capabilities, &u.xen_caps);
-    info->capabilities = libxl__strdup(NOGC, u.xen_caps);
+    if (xc_version_wrapper(ctx, XEN_VERSION_OP_extraversion, buf,
+                           info->pagesize, &info->xen_version_extra) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_changeset, &u.xen_chgset);
-    info->changeset = libxl__strdup(NOGC, u.xen_chgset);
+    info->compiler = libxl__strdup(NOGC, "");
+    info->compile_by = libxl__strdup(NOGC, "");
+    info->compile_domain = libxl__strdup(NOGC, "");
+    info->compile_date = libxl__strdup(NOGC, "");
 
-    xc_version(ctx->xch, XENVER_platform_parameters, &u.p_parms);
-    info->virt_start = u.p_parms.virt_start;
+    if (xc_version_wrapper(ctx, XEN_VERSION_OP_capabilities, buf,
+                           info->pagesize, &info->capabilities) < 0)
+        goto out;
 
-    info->pagesize = xc_version(ctx->xch, XENVER_pagesize, NULL);
+    if (xc_version_wrapper(ctx, XEN_VERSION_OP_changeset, buf,
+                           info->pagesize, &info->changeset) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
-    info->commandline = libxl__strdup(NOGC, u.xen_commandline);
+    val = 0;
+    if (xc_version(ctx->xch, XEN_VERSION_OP_platform_parameters, &val,
+                   sizeof(val)) < 0)
+        goto out;
+    info->virt_start = val;
 
+    (void)xc_version_wrapper(ctx, XEN_VERSION_OP_commandline, buf,
+                             info->pagesize, &info->commandline);
  out:
     GC_FREE;
     return info;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 74928e9..623b2d7 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -853,21 +853,21 @@ CAMLprim value stub_xc_version_version(value xch)
 	CAMLparam1(xch);
 	CAMLlocal1(result);
 	xen_extraversion_t extra;
-	long packed;
+	xen_version_op_val_t packed;
 	int retval;
 
 	caml_enter_blocking_section();
-	packed = xc_version(_H(xch), XENVER_version, NULL);
+	retval = xc_version(_H(xch), XEN_VERSION_OP_version, &packed, sizeof(packed));
 	caml_leave_blocking_section();
 
-	if (packed < 0)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XENVER_extraversion, &extra);
+	retval = xc_version(_H(xch), XEN_VERSION_OP_extraversion, &extra, sizeof(extra));
 	caml_leave_blocking_section();
 
-	if (retval)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(3);
@@ -884,37 +884,28 @@ CAMLprim value stub_xc_version_compile_info(value xch)
 {
 	CAMLparam1(xch);
 	CAMLlocal1(result);
-	xen_compile_info_t ci;
-	int retval;
-
-	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XENVER_compile_info, &ci);
-	caml_leave_blocking_section();
-
-	if (retval)
-		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(4);
 
-	Store_field(result, 0, caml_copy_string(ci.compiler));
-	Store_field(result, 1, caml_copy_string(ci.compile_by));
-	Store_field(result, 2, caml_copy_string(ci.compile_domain));
-	Store_field(result, 3, caml_copy_string(ci.compile_date));
+	Store_field(result, 0, caml_copy_string(""));
+	Store_field(result, 1, caml_copy_string(""));
+	Store_field(result, 2, caml_copy_string(""));
+	Store_field(result, 3, caml_copy_string(""));
 
 	CAMLreturn(result);
 }
 
 
-static value xc_version_single_string(value xch, int code, void *info)
+static value xc_version_single_string(value xch, int code, void *info, ssize_t len)
 {
 	CAMLparam1(xch);
 	int retval;
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), code, info);
+	retval = xc_version(_H(xch), code, info, len);
 	caml_leave_blocking_section();
 
-	if (retval)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	CAMLreturn(caml_copy_string((char *)info));
@@ -925,7 +916,8 @@ CAMLprim value stub_xc_version_changeset(value xch)
 {
 	xen_changeset_info_t ci;
 
-	return xc_version_single_string(xch, XENVER_changeset, &ci);
+	return xc_version_single_string(xch, XEN_VERSION_OP_changeset,
+                                    &ci, sizeof(ci));
 }
 
 
@@ -933,7 +925,8 @@ CAMLprim value stub_xc_version_capabilities(value xch)
 {
 	xen_capabilities_info_t ci;
 
-	return xc_version_single_string(xch, XENVER_capabilities, &ci);
+	return xc_version_single_string(xch, XEN_VERSION_OP_capabilities,
+                                    &ci, sizeof(ci));
 }
 
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c40a4e9..23876f0 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1204,34 +1204,40 @@ static PyObject *pyxc_xeninfo(XcObject *self)
     xen_capabilities_info_t xen_caps;
     xen_platform_parameters_t p_parms;
     xen_commandline_t xen_commandline;
-    long xen_version;
-    long xen_pagesize;
+    xen_version_op_val_t xen_version;
+    xen_version_op_val_t xen_pagesize;
     char str[128];
 
-    xen_version = xc_version(self->xc_handle, XENVER_version, NULL);
-
-    if ( xc_version(self->xc_handle, XENVER_extraversion, &xen_extra) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_version, &xen_version,
+                    sizeof(xen_version)) < 0)
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_compile_info, &xen_cc) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_extraversion, &xen_extra,
+                    sizeof(xen_extra)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_changeset, &xen_chgset) != 0 )
+    memset(&xen_cc, 0, sizeof(xen_cc));
+
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_changeset, &xen_chgset,
+                    sizeof(xen_chgset)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_capabilities, &xen_caps) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_capabilities, &xen_caps,
+                   sizeof(xen_caps)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_platform_parameters, &p_parms) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_platform_parameters,
+                    &p_parms, sizeof(p_parms)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_commandline, &xen_commandline) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_commandline,
+                    &xen_commandline, sizeof(xen_commandline)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
     snprintf(str, sizeof(str), "virt_start=0x%"PRI_xen_ulong, p_parms.virt_start);
 
-    xen_pagesize = xc_version(self->xc_handle, XENVER_pagesize, NULL);
-    if (xen_pagesize < 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_pagesize, &xen_pagesize,
+                    sizeof(xen_pagesize)) < 0)
         return pyxc_error_to_exception(self->xc_handle);
 
     return Py_BuildValue("{s:i,s:i,s:s,s:s,s:i,s:s,s:s,s:s,s:s,s:s,s:s,s:s}",
diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c
index 3495f3f..723e46a 100644
--- a/tools/xenstat/libxenstat/src/xenstat.c
+++ b/tools/xenstat/libxenstat/src/xenstat.c
@@ -621,20 +621,18 @@ unsigned long long xenstat_network_tdrop(xenstat_network * network)
 /* Collect Xen version information */
 static int xenstat_collect_xen_version(xenstat_node * node)
 {
-	long vnum = 0;
+	xen_version_op_val_t vnum = 0;
 	xen_extraversion_t version;
 
 	/* Collect Xen version information if not already collected */
 	if (node->handle->xen_version[0] == '\0') {
 		/* Get the Xen version number and extraversion string */
-		vnum = xc_version(node->handle->xc_handle,
-			XENVER_version, NULL);
-
-		if (vnum < 0)
+		if (xc_version(node->handle->xc_handle,
+			           XEN_VERSION_OP_version, &vnum, sizeof(vnum)) < 0 )
 			return 0;
 
-		if (xc_version(node->handle->xc_handle, XENVER_extraversion,
-			&version) < 0)
+		if (xc_version(node->handle->xc_handle, XEN_VERSION_OP_extraversion,
+			           &version, sizeof(version)) < 0)
 			return 0;
 		/* Format the version information as a string and store it */
 		snprintf(node->handle->xen_version, VERSION_SIZE, "%ld.%ld%s",
diff --git a/tools/xentrace/xenctx.c b/tools/xentrace/xenctx.c
index e647179..14d2f8b 100644
--- a/tools/xentrace/xenctx.c
+++ b/tools/xentrace/xenctx.c
@@ -1000,7 +1000,8 @@ static void dump_ctx(int vcpu)
             guest_word_size = (cpuctx.msr_efer & 0x400) ? 8 :
                 guest_protected_mode ? 4 : 2;
             /* HVM guest context records are always host-sized */
-            if (xc_version(xenctx.xc_handle, XENVER_capabilities, &xen_caps) != 0) {
+            if (xc_version(xenctx.xc_handle, XEN_VERSION_OP_capabilities,
+                           &xen_caps, sizeof(xen_caps)) < 0) {
                 perror("xc_version");
                 return;
             }
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 18:54   ` Andrew Cooper
                     ` (2 more replies)
  2016-03-15 17:56 ` [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables Konrad Rzeszutek Wilk
                   ` (27 subsequent siblings)
  33 siblings, 3 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich,
	Konrad Rzeszutek Wilk

So that we have a nice mechanism to figure out the upper
bounds of bug.frames and also fail the build in case
one tries to use a higher frame number.
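
[Editor's sketch, not part of the patch] A minimal illustration of the idea,
assuming a C11 compiler. The BUILD_BUG_ON() below is a stand-in built on
_Static_assert rather than Xen's own macro, and COUNT_FRAME(), frame_hits[]
and count_some_frames() are hypothetical names. The BUGFRAME_* values are
the ARM ones from the diff below.

```c
/* Stand-in for Xen's BUILD_BUG_ON(); assumes C11 _Static_assert. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

#define BUGFRAME_warn   0
#define BUGFRAME_bug    1
#define BUGFRAME_assert 2
#define BUGFRAME_NR     3   /* upper bound: number of frame types */

/* Hypothetical per-type counters, sized by the upper bound. */
static unsigned int frame_hits[BUGFRAME_NR];

/* Reject out-of-range frame types at build time, then count the hit. */
#define COUNT_FRAME(type) do {               \
    BUILD_BUG_ON((type) >= BUGFRAME_NR);     \
    frame_hits[(type)]++;                    \
} while (0)

static unsigned int count_some_frames(void)
{
    COUNT_FRAME(BUGFRAME_warn);
    COUNT_FRAME(BUGFRAME_assert);
    COUNT_FRAME(BUGFRAME_assert);
    /* COUNT_FRAME(3); would stop the build instead of indexing past the array. */

    return frame_hits[BUGFRAME_assert];
}
```

The same define also gives loops over all frame types a single bound to
iterate to, instead of a hand-maintained NULL-terminated array.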

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
---
 xen/include/asm-arm/bug.h | 2 ++
 xen/include/asm-x86/bug.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
index ab9e811..4df6b2a 100644
--- a/xen/include/asm-arm/bug.h
+++ b/xen/include/asm-arm/bug.h
@@ -31,6 +31,7 @@ struct bug_frame {
 #define BUGFRAME_warn   0
 #define BUGFRAME_bug    1
 #define BUGFRAME_assert 2
+#define BUGFRAME_NR     3
 
 /* Many versions of GCC doesn't support the asm %c parameter which would
  * be preferable to this unpleasantness. We use mergeable string
@@ -39,6 +40,7 @@ struct bug_frame {
  */
 #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
     BUILD_BUG_ON((line) >> 16);                                             \
+    BUILD_BUG_ON(type >= BUGFRAME_NR);                                      \
     asm ("1:"BUG_INSTR"\n"                                                  \
          ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"                \
          "2:\t.asciz " __stringify(file) "\n"                               \
diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..bd17ade 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -9,7 +9,7 @@
 #define BUGFRAME_warn   1
 #define BUGFRAME_bug    2
 #define BUGFRAME_assert 3
-
+#define BUGFRAME_NR     4
 #ifndef __ASSEMBLY__
 
 struct bug_frame {
@@ -51,6 +51,7 @@ struct bug_frame {
 
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
     BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
+    BUILD_BUG_ON((type) >= (BUGFRAME_NR));                                   \
     asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)                          \
                    :: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );            \
 } while (0)
-- 
2.5.0



* [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (5 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 19:24   ` Andrew Cooper
  2016-03-18 13:07   ` Jan Beulich
  2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
                   ` (26 subsequent siblings)
  33 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich,
	Konrad Rzeszutek Wilk

lookup.

During execution of the hypervisor we have two regions of
executable code - _stext -> _etext, and _sinittext -> _einittext.

The latter is not needed after bootup.

We also have various built-in macros and functions to search
in between those two swaths depending on the state of the system.

That is either for bug_frames, exceptions (x86) or symbol
names for the instruction.

With xSplice in the picture - we need a mechanism for new payloads
to be searched as well for all of this.

Originally we had extra 'if (xsplice)...' but that gets
a bit tiring and does not hook up nicely.

This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol name entries without having various calls in
other sub-components in the system.

Code which wishes to participate in bug_frames and exception table
entries search has to only use two public APIs:
 - register_virtual_region
 - unregister_virtual_region

to let the core code know. Furthermore there are also overrides
via the .skip function. There are three possible flags that
can be passed in - depending on what kind of search is being
done. A return of 1 means skip this region. If the .skip is
NULL the region will be considered.

The ->lookup_symbol will only be used if ->skip returns 1
for CHECKING_SYMBOLS (and of course if it points to
a function). Otherwise the default internal symbol lookup
mechanism is used.
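
[Editor's sketch, not part of the patch] The register/skip behaviour
described above can be modelled in a few lines of standalone C. This is a
simplification under stated assumptions: a plain singly linked list replaces
Xen's list_head, the enum values for the CHECKING_* flags are guesses, and
struct region, register_region(), unregister_region(), find_region(),
booted and skip_after_boot() are hypothetical names, not the patch's API.

```c
#include <stddef.h>

/* Flag names are from the patch; the numeric values are assumed. */
enum { CHECKING_SYMBOLS, CHECKING_BUG_FRAME, CHECKING_EXCEPTION };

/* Simplified stand-in for struct virtual_region. */
struct region {
    struct region *next;
    unsigned long start, end;   /* covers [start, end) */
    /* Optional override: a non-zero return means "skip this region". */
    int (*skip)(unsigned int flag, unsigned long priv);
    unsigned long priv;
};

static struct region *region_list;

static void register_region(struct region *r)
{
    r->next = region_list;
    region_list = r;
}

static void unregister_region(struct region *r)
{
    struct region **pp;

    for ( pp = &region_list; *pp; pp = &(*pp)->next )
    {
        if ( *pp == r )
        {
            *pp = r->next;
            r->next = NULL;
            return;
        }
    }
}

/* Walk the list; a NULL ->skip means the region is always considered. */
static struct region *find_region(unsigned long addr, unsigned int flag)
{
    struct region *r;

    for ( r = region_list; r; r = r->next )
    {
        if ( r->skip && r->skip(flag, r->priv) )
            continue;

        if ( addr >= r->start && addr < r->end )
            return r;
    }

    return NULL;
}

/* Models an init-text region that is ignored once boot has finished. */
static int booted;

static int skip_after_boot(unsigned int flag, unsigned long priv)
{
    (void)flag;
    (void)priv;
    return booted;
}

static struct region text = { .start = 0x1000, .end = 0x2000 };
static struct region inittext = {
    .start = 0x3000, .end = 0x4000, .skip = skip_after_boot,
};
```

With both regions registered, a lookup in the init-text range succeeds
while booted is 0 and finds nothing once it is set to 1, without the
caller needing any region-specific 'if'.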

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
---
 xen/arch/arm/traps.c             |  45 ++++++++++-----
 xen/arch/x86/extable.c           |  16 +++++-
 xen/arch/x86/setup.c             |   3 +-
 xen/arch/x86/traps.c             |  46 +++++++++------
 xen/common/Makefile              |   1 +
 xen/common/bug_ex_symbols.c      | 119 +++++++++++++++++++++++++++++++++++++++
 xen/common/symbols.c             |  29 +++++++++-
 xen/include/xen/bug_ex_symbols.h |  74 ++++++++++++++++++++++++
 xen/include/xen/kernel.h         |   2 +
 xen/include/xen/symbols.h        |   9 +++
 10 files changed, 307 insertions(+), 37 deletions(-)
 create mode 100644 xen/common/bug_ex_symbols.c
 create mode 100644 xen/include/xen/bug_ex_symbols.h

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 31d2115..b62c91f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -16,6 +16,7 @@
  * GNU General Public License for more details.
  */
 
+#include <xen/bug_ex_symbols.h>
 #include <xen/config.h>
 #include <xen/stdbool.h>
 #include <xen/init.h>
@@ -101,6 +102,8 @@ integer_param("debug_stack_lines", debug_stack_lines);
 
 void init_traps(void)
 {
+    setup_virtual_regions();
+
     /* Setup Hyp vector base */
     WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
 
@@ -1077,27 +1080,39 @@ void do_unexpected_trap(const char *msg, struct cpu_user_regs *regs)
 
 int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     const char *prefix = "", *filename, *predicate;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    list_for_each_entry( region, &virtual_region_list, list )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
+        unsigned int i;
 
-        if ( ((vaddr_t)bug_loc(bug)) == pc )
-            break;
-    }
+        if ( region->skip && region->skip(CHECKING_BUG_FRAME, region->priv) )
+            continue;
+
+        if ( pc < region->start || pc > region->end )
+            continue;
 
-    if ( !stop_frames[id] )
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b = NULL;
+
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( ((vaddr_t)bug_loc(b)) == pc )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
+    }
+ found:
+    if ( !bug )
         return -ENOENT;
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..6e083a8 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -1,6 +1,8 @@
 
+#include <xen/bug_ex_symbols.h>
 #include <xen/config.h>
 #include <xen/init.h>
+#include <xen/list.h>
 #include <xen/perfc.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
@@ -80,8 +82,18 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    struct virtual_region *region;
+
+    list_for_each_entry( region, &virtual_region_list, list )
+    {
+        if ( region->skip && region->skip(CHECKING_EXCEPTION, region->priv) )
+            continue;
+
+        if ( (addr >= region->start) && (addr < region->end) )
+            return search_one_table(region->ex, region->ex_end-1, addr);
+    }
+
+    return 0;
 }
 
 unsigned long
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index a8bf2c9..115e6fd 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1,3 +1,4 @@
+#include <xen/bug_ex_symbols.h>
 #include <xen/config.h>
 #include <xen/init.h>
 #include <xen/lib.h>
@@ -614,8 +615,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     load_system_tables();
 
     smp_prepare_boot_cpu();
-    sort_exception_tables();
 
+    setup_virtual_regions();
     /* Full exception support from here on in. */
 
     loader = (mbi->flags & MBI_LOADERNAME)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 564a107..eeada97 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -24,6 +24,7 @@
  * Gareth Hughes <gareth@valinux.com>, May 2000
  */
 
+#include <xen/bug_ex_symbols.h>
 #include <xen/config.h>
 #include <xen/init.h>
 #include <xen/sched.h>
@@ -1132,18 +1133,12 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 
 void do_invalid_op(struct cpu_user_regs *regs)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     u8 bug_insn[2];
     const char *prefix = "", *filename, *predicate, *eip = (char *)regs->eip;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        __stop_bug_frames_3,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
     DEBUGGER_trap_entry(TRAP_invalid_op, regs);
 
@@ -1160,16 +1155,35 @@ void do_invalid_op(struct cpu_user_regs *regs)
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    list_for_each_entry( region, &virtual_region_list, list )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        unsigned int i;
+
+        if ( region->skip && region->skip(CHECKING_BUG_FRAME, region->priv) )
+            continue;
+
+        if ( regs->eip < region->start || regs->eip > region->end )
+            continue;
+
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b = NULL;
+
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( bug_loc(b) == eip )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-    if ( !stop_frames[id] )
-        goto die;
 
+ found:
+    if ( !bug )
+        goto die;
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
     {
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 82625a5..76d7b07 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,4 +1,5 @@
 obj-y += bitmap.o
+obj-y += bug_ex_symbols.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
 obj-y += cpupool.o
diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
new file mode 100644
index 0000000..77bb72b
--- /dev/null
+++ b/xen/common/bug_ex_symbols.c
@@ -0,0 +1,119 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/bug_ex_symbols.h>
+#include <xen/config.h>
+#include <xen/kernel.h>
+#include <xen/init.h>
+#include <xen/spinlock.h>
+
+extern char __stext[];
+
+struct virtual_region kernel_text = {
+    .list = LIST_HEAD_INIT(kernel_text.list),
+    .start = (unsigned long)_stext,
+    .end = (unsigned long)_etext,
+#ifdef CONFIG_X86
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/*
+ * The kernel_inittext should only be used when system_state
+ * is booting. Otherwise all accesses should be ignored.
+ */
+static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
+{
+    return (system_state >= SYS_STATE_active);
+}
+
+/*
+ * Becomes irrelevant when __init sections are cleared.
+ */
+struct virtual_region kernel_inittext  = {
+    .list = LIST_HEAD_INIT(kernel_inittext.list),
+    .skip = ignore_if_active,
+    .start = (unsigned long)_sinittext,
+    .end = (unsigned long)_einittext,
+#ifdef CONFIG_X86
+    /* Even if they are __init their exception entry still gets stuck here. */
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/*
+ * No locking. Additions are done either at startup (when there is only
+ * one CPU) or when all CPUs are running without IRQs.
+ *
+ * Deletions are a bit tricky. We MUST make sure all but one CPU
+ * are running cpu_relax().
+ *
+ */
+LIST_HEAD(virtual_region_list);
+
+int register_virtual_region(struct virtual_region *r)
+{
+    ASSERT(!local_irq_is_enabled());
+
+    list_add_tail(&r->list, &virtual_region_list);
+    return 0;
+}
+
+void unregister_virtual_region(struct virtual_region *r)
+{
+    ASSERT(!local_irq_is_enabled());
+
+    list_del_init(&r->list);
+}
+
+void __init setup_virtual_regions(void)
+{
+    ssize_t sz;
+    unsigned int i, idx;
+    static const struct bug_frame *const stop_frames[] = {
+        __start_bug_frames,
+        __stop_bug_frames_0,
+        __stop_bug_frames_1,
+        __stop_bug_frames_2,
+#ifdef CONFIG_X86
+        __stop_bug_frames_3,
+#endif
+        NULL
+    };
+
+#ifdef CONFIG_X86
+    sort_exception_tables();
+#endif
+
+    /* N.B. idx != i */
+    for ( idx = 0, i = 1; stop_frames[i]; i++, idx++ )
+    {
+        struct bug_frame *s;
+
+        s = (struct bug_frame *)stop_frames[i-1];
+        sz = stop_frames[i] - s;
+
+        kernel_text.frame[idx].n_bugs = sz;
+        kernel_text.frame[idx].bugs = s;
+
+        kernel_inittext.frame[idx].n_bugs = sz;
+        kernel_inittext.frame[idx].bugs = s;
+    }
+
+    register_virtual_region(&kernel_text);
+    register_virtual_region(&kernel_inittext);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..2cc416e 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -10,6 +10,7 @@
  *      compression (see tools/symbols.c for a more complete description)
  */
 
+#include <xen/bug_ex_symbols.h>
 #include <xen/config.h>
 #include <xen/symbols.h>
 #include <xen/kernel.h>
@@ -95,10 +96,28 @@ static unsigned int get_symbol_offset(unsigned long pos)
     return name - symbols_names;
 }
 
+bool_t __is_active_kernel_text(unsigned long addr, symbols_lookup_t *cb)
+{
+    struct virtual_region *region;
+
+    list_for_each_entry( region, &virtual_region_list, list )
+    {
+        if ( region->skip && region->skip(CHECKING_SYMBOL, region->priv) )
+            continue;
+
+        if ( addr >= region->start && addr < region->end )
+        {
+            if ( cb && region->symbols_lookup )
+                *cb = region->symbols_lookup;
+            return 1;
+        }
+    }
+    return 0;
+}
+
 bool_t is_active_kernel_text(unsigned long addr)
 {
-    return (is_kernel_text(addr) ||
-            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
+    return __is_active_kernel_text(addr, NULL);
 }
 
 const char *symbols_lookup(unsigned long addr,
@@ -108,13 +127,17 @@ const char *symbols_lookup(unsigned long addr,
 {
     unsigned long i, low, high, mid;
     unsigned long symbol_end = 0;
+    symbols_lookup_t symbol_lookup = NULL;
 
     namebuf[KSYM_NAME_LEN] = 0;
     namebuf[0] = 0;
 
-    if (!is_active_kernel_text(addr))
+    if (!__is_active_kernel_text(addr, &symbol_lookup))
         return NULL;
 
+    if (symbol_lookup)
+        return (symbol_lookup)(addr, symbolsize, offset, namebuf);
+
         /* do a binary search on the sorted symbols_addresses array */
     low = 0;
     high = symbols_num_syms;
diff --git a/xen/include/xen/bug_ex_symbols.h b/xen/include/xen/bug_ex_symbols.h
new file mode 100644
index 0000000..6f3401b
--- /dev/null
+++ b/xen/include/xen/bug_ex_symbols.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __BUG_EX_SYMBOL_LIST__
+#define __BUG_EX_SYMBOL_LIST__
+
+#include <xen/config.h>
+#include <xen/list.h>
+#include <xen/symbols.h>
+
+#ifdef CONFIG_X86
+#include <asm/uaccess.h>
+#endif
+#include <asm/bug.h>
+
+struct virtual_region
+{
+    struct list_head list;
+
+#define CHECKING_SYMBOL         (1<<1)
+#define CHECKING_BUG_FRAME      (1<<2)
+#define CHECKING_EXCEPTION      (1<<3)
+    /*
+     * Whether to skip this region for particular searches. The flag
+     * can be CHECKING_[SYMBOL|BUG_FRAME|EXCEPTION].
+     *
+     * If the function returns 1 this region will be skipped.
+     */
+    bool_t (*skip)(unsigned int flag, unsigned long priv);
+
+    unsigned long start;        /* Virtual address start. */
+    unsigned long end;          /* Virtual address end. */
+
+    /*
+     * If ->skip returns false for CHECKING_SYMBOL we will use
+     * 'symbols_lookup' callback to retrieve the name of the
+     * addr between start and end. If this is NULL the
+     * default lookup mechanism is used (the skip value is
+     * ignored).
+     */
+    symbols_lookup_t symbols_lookup;
+
+    struct {
+        struct bug_frame *bugs; /* The pointer to array of bug frames. */
+        ssize_t n_bugs;         /* The number of them. */
+    } frame[BUGFRAME_NR];
+
+#ifdef CONFIG_X86
+    struct exception_table_entry *ex;
+    struct exception_table_entry *ex_end;
+#endif
+
+    unsigned long priv;         /* To be used by the above functions if needed. */
+};
+
+extern struct list_head virtual_region_list;
+
+extern void setup_virtual_regions(void);
+extern int register_virtual_region(struct virtual_region *r);
+extern void unregister_virtual_region(struct virtual_region *r);
+
+#endif /* __BUG_EX_SYMBOL_LIST__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
index 548b64d..8cf7af7 100644
--- a/xen/include/xen/kernel.h
+++ b/xen/include/xen/kernel.h
@@ -65,12 +65,14 @@
 	1;                                      \
 })
 
+
 extern char _start[], _end[], start[];
 #define is_kernel(p) ({                         \
     char *__p = (char *)(unsigned long)(p);     \
     (__p >= _start) && (__p < _end);            \
 })
 
+/* For symbols_lookup usage. */
 extern char _stext[], _etext[];
 #define is_kernel_text(p) ({                    \
     char *__p = (char *)(unsigned long)(p);     \
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 1fa0537..fe9ed8f 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -5,6 +5,15 @@
 
 #define KSYM_NAME_LEN 127
 
+/*
+ * Typedef for the callback functions that symbols_lookup
+ * can call if virtual_region_list has a callback for it.
+ */
+typedef const char *(*symbols_lookup_t)(unsigned long addr,
+                                        unsigned long *symbolsize,
+                                        unsigned long *offset,
+                                        char *namebuf);
+
 /* Lookup an address. */
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
-- 
2.5.0



* [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (6 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 19:33   ` Andrew Cooper
                     ` (2 more replies)
  2016-03-15 17:56 ` [PATCH v4 09/34] vmap: ASSERT on NULL Konrad Rzeszutek Wilk
                   ` (25 subsequent siblings)
  33 siblings, 3 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

At first glance the loop looks like it could underflow: if i is
zero we still reach the while loop and its i--. However, the
postfix decrement yields the value i had before the decrement, so
the condition is false and the body does not execute (with i==0).

However, in the spirit of defensive programming, let's clarify
the loop conditional.
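
For reviewers who want to convince themselves that the two loop forms
behave the same, here is a minimal standalone sketch. It is not part of
the patch: log_free(), struct free_log and MAX_PAGES are hypothetical
stand-ins for free_domheap_page() and the mfn array, used only to record
the order in which entries would be freed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 8 /* hypothetical bound for this sketch only */

/* Records which indices were "freed", and in which order. */
struct free_log {
    size_t idx[MAX_PAGES];
    size_t nr;
};

/* Hypothetical stand-in for free_domheap_page(mfn_to_page(...)). */
static void log_free(struct free_log *log, size_t i)
{
    assert(log->nr < MAX_PAGES);
    log->idx[log->nr++] = i;
}

/* Old form: post-decrement inside the condition. */
static void unwind_old(struct free_log *log, size_t i)
{
    while ( i-- )
        log_free(log, i);
}

/* New form: test first, pre-decrement at the point of use. */
static void unwind_new(struct free_log *log, size_t i)
{
    while ( i )
        log_free(log, --i);
}

/* Returns 1 if both forms free entries i-1 .. 0, in that order. */
static int unwind_forms_agree(size_t i)
{
    struct free_log a = { .nr = 0 }, b = { .nr = 0 };
    size_t k;

    unwind_old(&a, i);
    unwind_new(&b, i);

    if ( a.nr != i || b.nr != i )
        return 0;

    for ( k = 0; k < i; k++ )
        if ( a.idx[k] != i - 1 - k || b.idx[k] != i - 1 - k )
            return 0;

    return 1;
}
```

Calling unwind_forms_agree() with 0 exercises the case discussed above:
neither form enters the loop body. The only difference is that the old
form leaves its local i wrapped around after the loop, which is harmless
here because i is not used afterwards.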

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/vmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index c57239f..be01285 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -246,8 +246,8 @@ void *vmalloc(size_t size)
     return va;
 
  error:
-    while ( i-- )
-        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
+    while ( i )
+        free_domheap_page(mfn_to_page(mfn_x(mfn[--i])));
     xfree(mfn);
     return NULL;
 }
-- 
2.5.0



* [PATCH v4 09/34] vmap: ASSERT on NULL.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (7 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb Konrad Rzeszutek Wilk
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

The vmap_to_page macro (three levels deep!) can come up with
a NULL pointer. Let's add the proper ASSERT to catch this errant
behavior.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/vmap.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index be01285..1f1da1b 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -279,8 +279,12 @@ void vfree(void *va)
     ASSERT(pages);
 
     for ( i = 0; i < pages; i++ )
-        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
+    {
+        struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
 
+        ASSERT(page);
+        page_list_add(page, &pg_list);
+    }
     vunmap(va);
 
     while ( (pg = page_list_remove_head(&pg_list)) != NULL )
-- 
2.5.0



* [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (8 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 09/34] vmap: ASSERT on NULL Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-18 13:20   ` Jan Beulich
  2016-03-15 17:56 ` [PATCH v4 11/34] xsplice: Design document Konrad Rzeszutek Wilk
                   ` (23 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

Add vmalloc_cb and vfree_cb for those users who want to supply
their own vmap callback. The callback is called _after_ the pages
have been allocated, at the point where vmalloc would otherwise
call vmap to hand out virtual addresses.

Instead of calling vmap, vmalloc_cb calls the callback,
which is responsible for generating the virtual
address.

This allows users (such as xSplice) to provide their own
mechanism to set the page flags.
The users (such as the patch titled "xsplice: Implement payload
loading") can wrap the calls to __vmap to accomplish this.

We also provide a mechanism for the caller to squirrel away
the MFN array in case they want to modify the virtual
addresses easily.

We also provide the freeing code path: vfree_cb takes a callback
(vfree_cb_t) that takes care of tearing down the virtual addresses.
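
To illustrate the intended calling pattern, here is a rough userspace
sketch of the "optional callback plus optional out-array" shape. It is
only a model: alloc_cb(), map_cb_t, default_map() and custom_map() are
hypothetical names, the frame numbers are made up, and calloc() stands
in for the real mapping work, so none of this is the Xen API itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analogue of vmap_cb_t: maps 'nr' frames, returns an address. */
typedef void *(*map_cb_t)(const unsigned long *frames, unsigned int nr);

/* Stand-in for vmap(): here just a zeroed buffer, one byte per frame. */
static void *default_map(const unsigned long *frames, unsigned int nr)
{
    (void)frames;
    return calloc(nr, 1);
}

/* A caller-supplied mapper; counts its invocations so they can be observed. */
static unsigned int custom_calls;

static void *custom_map(const unsigned long *frames, unsigned int nr)
{
    assert(frames != NULL && nr > 0);
    custom_calls++;
    return default_map(frames, nr); /* a real user would map differently */
}

/*
 * Model of vmalloc_cb(): build the frame array, map it with the callback
 * if one was given (else the default), and either hand the array to the
 * caller or free it.
 */
static void *alloc_cb(unsigned int nr, map_cb_t cb, unsigned long **frames_out)
{
    unsigned long *frames;
    unsigned int i;
    void *va;

    if ( nr == 0 )
        return NULL;

    frames = malloc(nr * sizeof(*frames));
    if ( frames == NULL )
        return NULL;

    for ( i = 0; i < nr; i++ )
        frames[i] = 0x1000UL + i; /* made-up frame numbers */

    va = cb ? cb(frames, nr) : default_map(frames, nr);
    if ( va == NULL )
    {
        free(frames);
        return NULL;
    }

    if ( frames_out )
        *frames_out = frames; /* caller now owns the array */
    else
        free(frames);

    return va;
}
```

With this shape the plain allocator is just alloc_cb(nr, NULL, NULL),
mirroring how vmalloc() wraps vmalloc_cb() in the patch. Note that a
caller who asks for the array takes over the responsibility for freeing
it.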

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/vmap.c      | 35 ++++++++++++++++++++++++++++-------
 xen/include/xen/vmap.h | 14 ++++++++++++++
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 1f1da1b..d1f18fb 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -216,7 +216,7 @@ void vunmap(const void *va)
     vm_free(va);
 }
 
-void *vmalloc(size_t size)
+void *vmalloc_cb(size_t size, vmap_cb_t vmap_cb, mfn_t **mfn_array)
 {
     mfn_t *mfn;
     size_t pages, i;
@@ -238,11 +238,15 @@ void *vmalloc(size_t size)
         mfn[i] = _mfn(page_to_mfn(pg));
     }
 
-    va = vmap(mfn, pages);
+    va = vmap_cb ? (vmap_cb)(mfn, pages) : vmap(mfn, pages);
     if ( va == NULL )
         goto error;
 
-    xfree(mfn);
+    if ( mfn_array )
+        *mfn_array = mfn;
+    else
+        xfree(mfn);
+
     return va;
 
  error:
@@ -252,6 +256,11 @@ void *vmalloc(size_t size)
     return NULL;
 }
 
+void *vmalloc(size_t size)
+{
+    return vmalloc_cb(size, NULL, NULL);
+}
+
 void *vzalloc(size_t size)
 {
     void *p = vmalloc(size);
@@ -266,7 +275,7 @@ void *vzalloc(size_t size)
     return p;
 }
 
-void vfree(void *va)
+void vfree_cb(void *va, unsigned int nr_pages, vfree_cb_t vfree_cb_fnc)
 {
     unsigned int i, pages;
     struct page_info *pg;
@@ -275,8 +284,12 @@ void vfree(void *va)
     if ( !va )
         return;
 
-    pages = vm_size(va);
-    ASSERT(pages);
+    if ( !vfree_cb_fnc )
+    {
+        pages = vm_size(va);
+        ASSERT(pages);
+    } else
+        pages = nr_pages;
 
     for ( i = 0; i < pages; i++ )
     {
@@ -285,9 +298,17 @@ void vfree(void *va)
         ASSERT(page);
         page_list_add(page, &pg_list);
     }
-    vunmap(va);
+    if ( !vfree_cb_fnc )
+        vunmap(va);
+    else
+        vfree_cb_fnc(va, nr_pages);
 
     while ( (pg = page_list_remove_head(&pg_list)) != NULL )
         free_domheap_page(pg);
 }
+
+void vfree(void *va)
+{
+    vfree_cb(va, 0, NULL);
+}
 #endif
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 5671ac8..054eb25 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -12,9 +12,23 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
 void *vmap(const mfn_t *mfn, unsigned int nr);
 void vunmap(const void *);
 void *vmalloc(size_t size);
+
+/*
+ * Callback for vmalloc_cb to use when vmap-ing.
+ */
+typedef void *(*vmap_cb_t)(const mfn_t *mfn, unsigned int pages);
+void *vmalloc_cb(size_t size, vmap_cb_t vmap_cb, mfn_t **);
+
 void *vzalloc(size_t size);
 void vfree(void *va);
 
+/*
+ * Callback for vfree to use an equivalent of vmap_cb_t
+ * when tearing down.
+ */
+typedef void (*vfree_cb_t)(void *va, unsigned int pages);
+void vfree_cb(void *va, unsigned int pages, vfree_cb_t vfree_cb_fnc);
+
 void __iomem *ioremap(paddr_t, size_t);
 
 static inline void iounmap(void __iomem *va)
-- 
2.5.0



* [PATCH v4 11/34] xsplice: Design document
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (9 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-23 11:18   ` Jan Beulich
  2016-03-15 17:56 ` [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
                   ` (22 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

A mechanism is required to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v1-2: review
v3: Split document in v1 and v2 (todo) to simplify implementation goals.
v4: Add const on some structures. Truncate size to uint16_t where it makes sense.
v5: Convert 'id' to 'name', Add Ross's comments about what is implemented.
v6: Wei's and Ross's reviews.
v7: Jan's review comments.
v8: Jan's review comments.
    s/int32_t state/uint32_t state/ now that return code is in separate
    field (rc). Add various other types, such as R_X86_64_PC64 in the list.
    Mention the need for compiler check.
v9: Drop the LOADED->CHECKED state and go directly to CHECKED state. Drop
    LOADED.
---
 docs/misc/xsplice.markdown | 1027 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1027 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..cb3af6e
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,1027 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binary patch the running hypervisor with new
+opcodes that have come about primarily due to security updates.
+
+This document describes the design of the API that would allow us to
+upload binary patches to the hypervisor.
+
+The document is split in four sections:
+
+ * Detailed descriptions of the problem statement.
+ * Design of the data structures.
+ * Design of the hypercalls.
+ * Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## History
+
+The document has undergone various reviews and only covers the v1 design.
+
+The end of the document has a section titled `Not Yet Done` which
+outlines ideas and design for the future version of this work.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible to patch the hypervisor in multiple ways
+and be as simple as possible. The compiled code is contiguous in memory with
+no gaps - so we have no luxury of 'moving' existing code and must either
+insert a trampoline to the new code to be executed - or only modify in-place
+the code if there is sufficient space. The placement of new code has to be done
+by hypervisor and the virtual address for the new code is allocated dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampolines, etc.
+have to be supported. The API should allow reading/writing any memory within
+the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches when they are available.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+
+## Patching code
+
+The first mechanism to patch that comes to mind is in-place replacement.
+That is, replace the affected code with new code. Unfortunately x86
+instructions are variable size, which places limits on how much space we have
+available to replace the instructions. That is not a problem if the change is
+smaller than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is to replace the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function. N.B. The Xen hypervisor implements the third
+mechanism. See `Trampoline (e9 opcode)` section for more details.
+
+### Example of trampoline and in-place splicing
+
+As an example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks like so:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside `arch_do_domctl`. The new change adds 24 extra
+bytes of code (three 8-byte `movq` instructions), which alters all the offsets
+inside the function. To alter these offsets and add the extra 24 bytes of code
+we might not have enough space in .text to squeeze this in.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+do_domctl:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`
+- which is not called directly by hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+Patching the offset in `hypercall_table` for `do_domctl`
+(ffff82d080103079 <do_domctl>):
+
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+
+with the new address where the new `do_domctl` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `arch_do_domctl`.
+
+In summary this example patched the callers of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+However it has drawbacks - the safety checks, which have to make sure
+the function is not on the stack, must also check every caller. For some
+patches this could mean - if there were a sufficiently large number of
+callers - that we would never be able to apply the update.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+lessens the locations to be patched to one but it puts pressure on the
+CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
+in `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). Patching the offset in `hypercall_table` for the old
+`do_xen_version` (ffff82d080112f9e <do_xen_version>)
+
+<pre>
+ ffff82d08024b270 <hypercall_table>:   
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+
+with the new address where the new `do_xen_version` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to insert a trampoline in the
+old `do_xen_version` function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to a function
+   providing a BUG_ON to catch errant code.
+
+The disadvantage of this is that the unconditional jump incurs a small
+I-cache penalty. However the simplicity of the patching and higher chance
+of passing safety checks make this a worthwhile option.
+
+This patching has a similar drawback as inline patching - the safety
+checks have to make sure the function is not on the stack. However
+since we are replacing at a higher level (a full function as opposed
+to various offsets within functions) the checks are simpler.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem as well.
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and integrity
+was intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking the
+signature checking will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code that will be patched in.
+
+The payload could be constructed using a custom binary format but
+that has severe disadvantages:
+
+ * The format might need to be changed and we need a mechanism to accommodate
+   that.
+ * It has to be platform agnostic.
+ * It has to be easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+An earlier design attempted to ineptly explain the relations of the ELF sections
+to each other without using proper ELF mechanisms (sh_info, sh_link, data
+structures using Elf types, etc). This design will explain the structures
+and how they are used together and not dig into the ELF format - except to
+mention that the section names should match the structure names.
+
+The xSplice payload is a relocatable ELF binary. A typical binary would have:
+
+ * One or more .text sections.
+ * Zero or more read-only data sections.
+ * Zero or more data sections.
+ * Relocations for each of these sections.
+
+It may also have some architecture-specific sections. For example:
+
+ * Alternatives instructions.
+ * Bug frames.
+ * Exception tables.
+ * Relocations for each of these sections.
+
+The xSplice core code loads the payload as a standard ELF binary, relocates it
+and handles the architecture-specific sections as needed. This process is much
+like what the Linux kernel module loader does.
+
+The payload contains a section (xsplice_patch_func) with an array of structures
+describing the functions to be patched:
+
+<pre>
+struct xsplice_patch_func {  
+    const char *name;  
+    Elf64_Xword new_addr;  
+    Elf64_Xword old_addr;  
+    Elf64_Word new_size;  
+    Elf64_Word old_size;  
+    uint8_t pad[32];  
+};  
+</pre>
+
+The size of the structure is 64 bytes.
+
+* `name` is the symbol name of the old function. Only used if `old_addr` is
+   zero, in which case it is used to resolve the address during dynamic
+   linking (when the hypervisor loads the payload).
+
+* `old_addr` is the address of the function to be patched and is filled in at
+  payload generation time if hypervisor function address is known. If unknown,
+  the value *MUST* be zero and the hypervisor will attempt to resolve the address.
+
+* `new_addr` is the address of the function that is replacing the old
+  function. The address is filled in during relocation. The value **MUST** be
+  the address of the new function in the file.
+
+* `old_size` and `new_size` contain the sizes of the respective functions in bytes.
+   The value of `old_size` **MUST NOT** be zero.
+
+* `pad` **MUST** be zero.
+
+The size of the `xsplice_patch_func` array is determined from the ELF section
+size.
+
+When applying the patch the hypervisor iterates over each `xsplice_patch_func`
+structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
+
+When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
+and the core code copies the data from the undo buffer (private internal copy)
+to `old_addr`.
+
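A minimal sketch of how the rules above for the `xsplice_patch_func` section could be checked, assuming a 64-bit build. The names `struct patch_func` and `check_patch_funcs` are hypothetical and not part of the patch, the Elf64 types are spelled as fixed-width integers, and the requirement that `name` be present when `old_addr` is zero is inferred from the field descriptions rather than stated outright.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct xsplice_patch_func. Elf64_Xword and Elf64_Word are
 * written as uint64_t and uint32_t so the sketch builds without ELF headers. */
struct patch_func {
    const char *name;
    uint64_t new_addr;
    uint64_t old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t pad[32];
};

/* The 64-byte size quoted in the text only holds where pointers are 8 bytes. */
_Static_assert(sizeof(void *) != 8 || sizeof(struct patch_func) == 64,
               "unexpected xsplice_patch_func layout");

/*
 * Hypothetical helper: returns the number of entries in a section of
 * sec_size bytes, or -1 if the section or any entry breaks a MUST rule.
 */
long check_patch_funcs(const struct patch_func *funcs, size_t sec_size)
{
    size_t i, j, nr;

    /* The array length is derived from the ELF section size. */
    if ( sec_size % sizeof(*funcs) != 0 )
        return -1;

    nr = sec_size / sizeof(*funcs);
    for ( i = 0; i < nr; i++ )
    {
        /* old_size MUST NOT be zero. */
        if ( funcs[i].old_size == 0 )
            return -1;

        /* Assumption: with no old_addr, a name is needed for resolution. */
        if ( funcs[i].old_addr == 0 && funcs[i].name == NULL )
            return -1;

        /* pad MUST be zero. */
        for ( j = 0; j < sizeof(funcs[i].pad); j++ )
            if ( funcs[i].pad[j] != 0 )
                return -1;
    }

    return (long)nr;
}
```

A loader would presumably run a check of this kind before relocating the payload, so that a malformed section is rejected before any trampoline is written.
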
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload a payload.
+ * list the summaries of the uploaded payloads and their state.
+ * get the summary of a particular payload and its state.
+ * issue a command to apply, delete, or revert the payload.
+
+Most of the actions are asynchronous, therefore the caller is responsible
+for verifying that the action has been applied properly by retrieving the
+summary of the payload and verifying that there are no error codes associated
+with it.
+
+We **MUST** make some of them asynchronous because patching requires
+every physical CPU to be in lock-step with each other.
+The patching mechanism, while an implementation detail, is not a short
+operation and as such the design **MUST** assume it will be a long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such a unique name per payload has to be visible to allow proper
+manipulation.
+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations and one padding field which
+*MUST* always be zero.
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;                   /* IN: XEN_SYSCTL_XSPLICE_*. */  
+    uint32_t pad;                   /* IN: Always zero. */  
+    union {  
+        ... see below ...  
+    } u;  
+};  
+
+</pre>
+while the rest of the hypercall-specific structures are part of this structure.
+
+### Basic type: struct xen_xsplice_name
+
+Most of the hypercalls employ a shared structure called `struct xen_xsplice_name`
+which contains:
+
+ * `name` - pointer where the string for the name is located.
+ * `size` - the size of the string.
+ * `pad` - padding - to be zero.
+
+The structure is as follows:
+
+<pre>
+#define XEN_XSPLICE_NAME_SIZE 128  
+struct xen_xsplice_name {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint16_t size;                          /* IN, size of name. May be up to  
+                                               XEN_XSPLICE_NAME_SIZE. */  
+    uint16_t pad[3];                        /* IN: MUST be zero. */ 
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `name` could be a UUID that stays fixed forever for a given
+payload. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was successfully uploaded.
+Otherwise an -XEN_EXX return value is provided. Duplicate `name` values are
+not supported.
+
+The `payload` is the ELF payload as mentioned in the `Design of payload format` section.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_name_t name;            /* IN, name of the patch. */  
+    uint64_t size;                      /* IN, size of the ELF file. */  
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve the status of a specific payload. The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * A `struct xen_xsplice_status` structure. The member values will
+   be over-written upon completion.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The return value of the hypercall is zero on success and -XEN_EXX on failure.
+(Note that the `rc` value can be different from the return value, as in
+rc=-XEN_EAGAIN and return value can be 0).
+
+For example, supposing there is a payload:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: 0
+</pre>
+
+We apply an action - XSPLICE_ACTION_REVERT - to revert it (which won't work
+as we have not even applied it). Afterwards we will have:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: -XEN_EINVAL
+</pre>
+
+It has failed but it remains loaded.
+
+This operation is synchronous and does not require preemption.
+
+The structure is as follows:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_CHECKED      1  
+#define XSPLICE_STATUS_APPLIED      2  
+    uint32_t state;                 /* OUT: XSPLICE_STATUS_*. */  
+    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
+};  
+
+struct xen_sysctl_xsplice_get {  
+    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Initially (on first hypercall) *MUST* be zero.
+ * `idx` index iterator. On first call *MUST* be zero, on subsequent calls it varies.
+ * `nr` the max number of entries to populate.
+ * `pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. Caller *MUST* allocate up to `nr` of them.
+ * `name` - virtual address of where to write the unique name of the payload.
+   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size.
+ * `len` - virtual address of where to write the length of each unique name
+   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns a positive number, it is the number (up to `nr`
+provided to the hypercall) of the payloads returned, along with `nr` updated
+with the number of remaining payloads, `version` updated (it may be the same
+across hypercalls - if it varies the data is stale and further calls could
+fail). The `status`, `name`, and `len` are updated at their designated index
+value (`idx`) with the returned value of data.
+
+If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
+lowered.
+
+If the hypercall returns a zero value there are no more payloads.
+
+Note that due to the asynchronous nature of hypercalls the control domain might
+have added or removed a number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. If the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
+The `struct xen_xsplice_status` structure contains the status of a payload, which includes:
+
+ * `status` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
+   have changed.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
+                                               On subsequent calls reuse value.  
+                                               If varies between calls, we are  
+                                               getting stale data. */  
+    uint32_t idx;                           /* IN/OUT: Index into array. */  
+    uint32_t nr;                            /* IN: How many status, name, and len  
+                                               entries to fill out.  
+                                               OUT: How many payloads left. */  
+    uint32_t pad;                           /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocated for nr of them. */  
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member  
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.  
+                                               Must have nr of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.  
+                                               Must have nr of them. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `name` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` `name` containing the unique name.
+ * `cmd` the command requested:
+   * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+     This also verifies the payload - which may require SecureBoot firmware
+     calls. This is the initial state a payload is in.
+   * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+     Any further hypercalls against the `name` will result in failure unless
+     the **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is performed with the same `name`.
+   * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+     more time than the upper bound of time the `rc` in `xen_xsplice_status`
+     retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+   * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+     more time than the upper bound of time the `rc` in `xen_xsplice_status`
+     retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+   * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+     payload. If the operation takes more time than the upper bound of time
+     the `rc` in `xen_xsplice_status` retrieved via **XEN_SYSCTL_XSPLICE_GET**
+     will be -XEN_EBUSY.
+ * `time` the upper bound of time (ms) the cmd should take. Zero means infinite.
+   If the operation does not succeed within that time it goes into an
+   error state.
+ * `pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follows:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1  
+#define XSPLICE_ACTION_UNLOAD  2  
+#define XSPLICE_ACTION_REVERT  3  
+#define XSPLICE_ACTION_APPLY   4  
+#define XSPLICE_ACTION_REPLACE 5  
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_name_t name;                /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t time;                          /* IN: Zero if no timeout. */   
+                                            /* Or upper bound of time (ms) */   
+                                            /* for operation to take. */  
+};  
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION commands.
+
+There is a strict ordering of the commands.
+The XSPLICE_ACTION prefix has been dropped to ease reading, and the diagram
+does not include the XSPLICE_STATUS states:
+
+<pre>
+              /->\  
+              \  /  
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\  
+                \                                  |  
+                 \-------------------<-------------/  
+
+</pre>
+## State transition table of XSPLICE_ACTION commands and XSPLICE_STATUS.
+
+Note that:
+
+ - The CHECKED state is the starting one achieved with *XEN_SYSCTL_XSPLICE_UPLOAD* hypercall.
+ - The REVERT operation on success will automatically move to the CHECKED state.
+ - There are two STATES: CHECKED and APPLIED.
+ - There are five actions (aka commands): CHECK, APPLY, REPLACE, REVERT, and UNLOAD.
+
+The state transition table of valid states and action states:
+
+<pre>
+
++---------+---------+--------------------------------+-------+--------+
+| ACTION  | Current | Result                         | Next STATE:    |
+|         | STATE   |                                |CHECKED|APPLIED |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, no   |   x   |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, with |       |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| UNLOAD  | CHECKED | Unload payload. Always works.  |       |        |
+|         |         | No next states.                |       |        |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (success).       |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (error|timeout)  |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |   x    |
+|         |         | payload with success.          |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |   x   |        |
+|         |         | payload with error.            |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (success).      |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (error|timeout) |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+</pre>
+
+All the other state transitions are invalid.
+
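The transition table above can be read as a small pure function. The sketch below is one hedged encoding of it: the enum and function names are hypothetical, `XS_NONE` stands for rows where the table marks no next state (UNLOAD, and CHECK with errors), and `XS_INVALID` stands for any combination the table does not list.

```c
#include <assert.h>
#include <stdbool.h>

/* XS_CHECKED and XS_APPLIED mirror XSPLICE_STATUS_CHECKED (1) and
 * XSPLICE_STATUS_APPLIED (2); the other two values are sketch-only. */
enum xs_state {
    XS_INVALID = -1,   /* transition not listed in the table */
    XS_NONE    = 0,    /* table marks no next state */
    XS_CHECKED = 1,
    XS_APPLIED = 2
};

/* Same numbering as the XSPLICE_ACTION_* defines. */
enum xs_action {
    ACT_CHECK = 1,
    ACT_UNLOAD,
    ACT_REVERT,
    ACT_APPLY,
    ACT_REPLACE
};

/* Return the state the table gives for (action, current state, outcome). */
enum xs_state next_state(enum xs_action action, enum xs_state cur, bool ok)
{
    if ( cur == XS_CHECKED )
    {
        switch ( action )
        {
        case ACT_CHECK:
            /* The error row of the table leaves both columns empty. */
            return ok ? XS_CHECKED : XS_NONE;

        case ACT_UNLOAD:
            /* Always works; the payload has no state afterwards. */
            return XS_NONE;

        case ACT_APPLY:
        case ACT_REPLACE:
            /* On error or timeout the payload stays CHECKED. */
            return ok ? XS_APPLIED : XS_CHECKED;

        default:
            break;
        }
    }
    else if ( cur == XS_APPLIED && action == ACT_REVERT )
    {
        /* A failed revert leaves the payload APPLIED. */
        return ok ? XS_CHECKED : XS_APPLIED;
    }

    return XS_INVALID;
}
```

Keeping the table in one function like this makes it straightforward to reject a request such as REVERT on a payload that is only CHECKED before any work is scheduled.
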
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If in *-XEN_EAGAIN* spin. If zero exit with success.
+
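Steps 2 and 4 of the sequence above amount to polling `rc` until it stops being -XEN_EAGAIN. A rough sketch of that loop follows. The callback type, `wait_for_rc`, and the fake payload are all hypothetical stand-ins for a real XEN_SYSCTL_XSPLICE_GET wrapper, the poll limit is an addition of this sketch (the text simply says to spin), and the errno values are the ones believed to be used by Xen's public errno.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match Xen's public errno.h. */
#define XEN_EAGAIN 11
#define XEN_EINVAL 22

/* Hypothetical stand-in for XEN_SYSCTL_XSPLICE_GET: stores the payload's
 * rc in *rc and returns the hypercall's own return value. */
typedef int (*get_rc_fn)(void *ctx, int32_t *rc);

/*
 * Poll while rc is -XEN_EAGAIN, for at most max_polls attempts.
 * Returns 0 on success, a negative -XEN_EXX value on failure, and
 * -XEN_EAGAIN if the operation is still in progress when we give up.
 */
int wait_for_rc(get_rc_fn get, void *ctx, unsigned int max_polls)
{
    int32_t rc = -XEN_EAGAIN;
    unsigned int i;

    for ( i = 0; i < max_polls; i++ )
    {
        int ret = get(ctx, &rc);

        if ( ret < 0 )
            return ret;          /* the hypercall itself failed */

        if ( rc != -XEN_EAGAIN )
            return rc;           /* 0 on success, -XEN_EXX on error */
    }

    return -XEN_EAGAIN;
}

/* Fake payload, for illustration only: reports "in progress" `pending`
 * times and then settles on final_rc. */
struct fake_payload {
    unsigned int pending;
    int32_t final_rc;
};

int fake_get(void *ctx, int32_t *rc)
{
    struct fake_payload *p = ctx;

    if ( p->pending > 0 )
    {
        p->pending--;
        *rc = -XEN_EAGAIN;
    }
    else
        *rc = p->final_rc;

    return 0;
}
```

Note the two separate failure channels: the return value of the hypercall and the `rc` carried in the status, which can differ as described for XEN_SYSCTL_XSPLICE_GET.
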
+
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+expand the code to put the most generic code in place and emit a special
+ELF .section header to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with better suited opcodes.
+
+Note that patching functions that copy to or from guest memory requires
+support for alternatives. For example this can be due to SMAP
+(specifically *stac* and *clac* operations) which is enabled on Broadwell
+and later architectures. It may be related to other alternative instructions.
+
+### When to patch
+
+During the discussion on the design two candidates bubbled up where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing, such as the check against patching code which
+is on the stack - which can lead to corruption.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using stop_machine to patch can stall the time rendezvous
+code and result in an NMI. As such, doing the patching at the tail
+of the rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To rendezvous all the CPUs, a barrier with a maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to execute lockstep instructions
+on all CPUs.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.
+
+This is implemented in the Xen Project hypervisor.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+### Generation of xSplice ELF payloads
+
+The design of that is not discussed in this document.
+
+This is implemented in a separate tool which lives in a separate
+GIT repo.
+
+Currently it resides at https://github.com/rosslagerwall/xsplice-build
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+We found that almost every patch (XSA) to a non-trivial function requires
+additional entries in the exception table and/or the bug frames.
+
+This is implemented in the Xen Project hypervisor.
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions and switch-style jump tables
+live - and we may need to patch these as well.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If we are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+The trampoline patching is implemented in the Xen Project hypervisor.
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+However, functions which are being patched can bring in changes to strings
+(.data or .rodata section changes), or even to .bss sections.
+
+As such the ELF payload can introduce new .rodata, .bss, and .data sections.
+Patching in the new function will end up also patching in the new .rodata
+section and the new function will reference the new string in the new
+.rodata section.
+
+This is implemented in the Xen Project hypervisor.
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## Goals
+
+The implementation must also have a mechanism for:
+
+ *  A dependency mechanism for the payloads. To use that information to load:
+    - The appropriate payload. To verify that payload is built against the
+      hypervisor. This can be done via the `build-id`
+      or via providing a copy of the old code - so that the hypervisor can
+      verify it against the code in memory.
+    - To construct an appropriate order of payloads to load in case they
+      depend on each other.
+ * Be able to look up in the Xen hypervisor the symbol names of functions from the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, make sure the payload is built with same compiler as hypervisor).
+ * NOP out the code sequence if `new_size` is zero.
+ * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64] in payload file.
+
+### xSplice interdependencies
+
+xSplice patch interdependencies are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or
+   if the old code also contains previously patched code - even if they
+   overlap.
+
+The disadvantage of the first large patch is that it can grow over
+time and not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Handle inlined __LINE__
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonably be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch-by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
+For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
+the function itself.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be same for signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on all of the payload contiguously laid out
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+a signature header then followed by the signature, key identifier, and signer's
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from Linux module signature code.).
+
+
+### .bss and .data sections.
+
+In-place patching of writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+That said we should provide hook functions so that the existing data
+can be changed during payload application.
+
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+the new code must be placed within 2GB of virtual address space of the old
+code. That should not be a problem since the Xen hypervisor has
+a very small footprint.
+
+However if we need to, we can always chain two trampolines: one at the 2GB
+limit that jumps to the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accommodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are required.
+
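To make the 5-byte and 2GB constraints concrete, here is a small sketch of encoding an x86-64 `jmp rel32`. The helper `encode_trampoline` is hypothetical and is not the code in this series. It is deliberately conservative in that it compares against `old_size` only and ignores any trailing padding.

```c
#include <assert.h>
#include <stdint.h>

#define JMP_REL32_LEN 5   /* one 0xe9 opcode byte + 32-bit displacement */

/*
 * Hypothetical helper: write a "jmp rel32" that transfers control from
 * old_addr to new_addr into buf. Returns 0 on success, or -1 if the old
 * function is too small or the target is outside the +-2GB range.
 */
int encode_trampoline(uint8_t buf[JMP_REL32_LEN], uint64_t old_addr,
                      uint64_t new_addr, uint32_t old_size)
{
    int64_t disp;
    uint32_t rel;

    if ( old_size < JMP_REL32_LEN )
        return -1;

    /* The displacement is relative to the instruction after the jmp. */
    disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));
    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -1;

    rel = (uint32_t)disp;
    buf[0] = 0xe9;
    buf[1] = (uint8_t)(rel & 0xff);          /* little-endian rel32 */
    buf[2] = (uint8_t)((rel >> 8) & 0xff);
    buf[3] = (uint8_t)((rel >> 16) & 0xff);
    buf[4] = (uint8_t)((rel >> 24) & 0xff);

    return 0;
}
```

For example, jumping from 0x1000 to 0x2000 gives a displacement of 0x2000 - 0x1005 = 0xffb, so the emitted bytes are `e9 fb 0f 00 00`.
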
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions or a
+runtime check that verifies symbol size (+ padding to next symbols) for
+that in the hypervisor is advised.
+
+The tool for generating payloads currently does perform a compile-time
+check to ensure that the function to be replaced is large enough.
+
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (10 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 11/34] xsplice: Design document Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 12:12   ` Julien Grall
  2016-03-23 13:51   ` Jan Beulich
  2016-03-15 17:56 ` [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
                   ` (21 subsequent siblings)
  33 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson,
	Konrad Rzeszutek Wilk

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.
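
As a rough illustration of the bookkeeping this patch does for those
actions, the sketch below models the state transitions in plain C. It
is a simplified stand-in, not the hypervisor code: model_action() and
MODEL_STATE_GONE are hypothetical names, while the numeric values of the
states and actions are copied from the sysctl.h hunk further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the public/sysctl.h hunk of this patch. */
#define XSPLICE_STATE_CHECKED   1
#define XSPLICE_STATE_APPLIED   2
#define XSPLICE_ACTION_CHECK    1
#define XSPLICE_ACTION_UNLOAD   2
#define XSPLICE_ACTION_REVERT   3
#define XSPLICE_ACTION_APPLY    4
#define XSPLICE_ACTION_REPLACE  5

/* Hypothetical marker (not in the patch): the payload has been freed. */
#define MODEL_STATE_GONE        0

/*
 * Hypothetical model of the action switch: update *state and return 0,
 * or return -1 for an unknown command (the patch uses -EOPNOTSUPP).
 * A command issued in the wrong state leaves *state untouched.
 */
static int model_action(uint32_t *state, uint32_t cmd)
{
    assert(state != NULL);

    switch ( cmd )
    {
    case XSPLICE_ACTION_CHECK:
    case XSPLICE_ACTION_REPLACE:
        /* Nops for now: a CHECKED payload stays CHECKED. */
        break;

    case XSPLICE_ACTION_UNLOAD:
        if ( *state == XSPLICE_STATE_CHECKED )
            *state = MODEL_STATE_GONE;
        break;

    case XSPLICE_ACTION_REVERT:
        if ( *state == XSPLICE_STATE_APPLIED )
            *state = XSPLICE_STATE_CHECKED;
        break;

    case XSPLICE_ACTION_APPLY:
        if ( *state == XSPLICE_STATE_CHECKED )
            *state = XSPLICE_STATE_APPLIED;
        break;

    default:
        return -1;
    }

    return 0;
}
```

In this model an applied payload cannot be unloaded directly; it has to
be reverted to CHECKED first, which matches the state checks in
xsplice_action().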

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Rebased on keyhandler: rework keyhandler infrastructure
v3: Fixed XSM.
v4: Removed REVERTED state.
    Split status and error code.
    Add REPLACE action.
    Separate payload data from the payload structure.
    s/XSPLICE_ID_../XSPLICE_NAME_../
v5: Add xsplice and CONFIG_XSPLICE build option.
    Fix code per Jan's review.
    Update the sysctl.h (change bits to enum like)
v6: Rebase on Kconfig changes.
v7: Add missing pad checks. Re-order keyhandler.h to build on ARM.
v8: Rebase on build: hook the schedulers into Kconfig
v9: s/id/name/
    s/payload_list_lock/payload_lock/
v10: Put #ifdef CONFIG_XSPLICE in header file per Doug review.
v11: Andrew review:
    - use recursive spinlocks, change name to xsplice_op,
      sprinkle new-lines, add local variable block, include
      state diagram, squash two goto labels, use vzalloc instead of
      alloc_xenheap_pages.
    - change 'state' from int32 to uint32_t
    - remove the err label out of xsplice_upload
    - use void* instead of uint8_t
    - move code around to make it easier to read.
    - Add vmap.h to compiler under ARM.
v12: Add missing Copyright in header file
v13: Dropped LOADED state, make the payload go in CHECKED.
---
 tools/flask/policy/policy/modules/xen/xen.te |   1 +
 xen/common/Kconfig                           |  11 +
 xen/common/Makefile                          |   1 +
 xen/common/sysctl.c                          |   7 +
 xen/common/xsplice.c                         | 389 +++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 169 ++++++++++++
 xen/include/xen/xsplice.h                    |  35 +++
 xen/xsm/flask/hooks.c                        |   6 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 9 files changed, 621 insertions(+)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bea40c1..bac0c9e 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -72,6 +72,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
+    xsplice_op
 };
 
 # Allow dom0 to use all XENVER_ subops and VERSION_OP subops
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 8fbc46d..dbe9ccc 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -168,4 +168,15 @@ config SCHED_DEFAULT
 
 endmenu
 
+# Enable/Disable xsplice support
+config XSPLICE
+	bool "xSplice live patching support"
+	default y
+	---help---
+	  Allows a running Xen hypervisor to be dynamically patched using
+	  binary patches without rebooting. This is primarily used to apply
+	  XSA fixes to a hypervisor in the field.
+
+	  If unsure, say Y.
+
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 76d7b07..0c8ba2c 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -60,6 +60,7 @@ obj-y += vsprintf.o
 obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 253b7c8..168a153 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_op(&op->u.xsplice);
+        if ( ret != -ENOSYS )
+            copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..ba2d376
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,389 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/guest_access.h>
+#include <xen/keyhandler.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/smp.h>
+#include <xen/spinlock.h>
+#include <xen/vmap.h>
+#include <xen/xsplice.h>
+
+#include <asm/event.h>
+#include <public/sysctl.h>
+
+static DEFINE_SPINLOCK(payload_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    uint32_t state;                      /* One of the XSPLICE_STATE_*. */
+    int32_t rc;                          /* 0 or -XEN_EXX. */
+    struct list_head list;               /* Linked to 'payload_list'. */
+    char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
+};
+
+static int verify_name(const xen_xsplice_name_t *name)
+{
+    if ( name->size == 0 || name->size > XEN_XSPLICE_NAME_SIZE )
+        return -EINVAL;
+
+    if ( name->pad[0] || name->pad[1] || name->pad[2] )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(name->name, name->size) )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int verify_payload(const xen_sysctl_xsplice_upload_t *upload)
+{
+    if ( verify_name(&upload->name) )
+        return -EINVAL;
+
+    if ( upload->size == 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+static int find_payload(const xen_xsplice_name_t *name, struct payload **f)
+{
+    struct payload *data;
+    XEN_GUEST_HANDLE_PARAM(char) str;
+    char n[XEN_XSPLICE_NAME_SIZE + 1] = { 0 };
+    int rc = -EINVAL;
+
+    rc = verify_name(name);
+    if ( rc )
+        return rc;
+
+    str = guest_handle_cast(name->name, char);
+    if ( copy_from_guest(n, str, name->size) )
+        return -EFAULT;
+
+    spin_lock_recursive(&payload_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strcmp(data->name, n) )
+        {
+            *f = data;
+            rc = 0;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&payload_lock);
+
+    return rc;
+}
+
+/*
+ * We MUST be holding the payload_lock spinlock.
+ */
+static void free_payload(struct payload *data)
+{
+    ASSERT(spin_is_locked(&payload_lock));
+    list_del(&data->list);
+    payload_cnt--;
+    payload_version++;
+    xfree(data);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data = NULL;
+    void *raw_data = NULL;
+    int rc;
+
+    rc = verify_payload(upload);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&upload->name, &data);
+    if ( rc == 0 /* Found. */ )
+        return -EEXIST;
+
+    if ( rc != -ENOENT )
+        return rc;
+
+    data = xzalloc(struct payload);
+    if ( !data )
+        return -ENOMEM;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(data->name, upload->name.name, upload->name.size) )
+        goto out;
+
+    rc = -ENOMEM;
+    raw_data = vzalloc(upload->size);
+    if ( !raw_data )
+        goto out;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(raw_data, upload->payload, upload->size) )
+        goto out;
+
+    data->state = XSPLICE_STATE_CHECKED;
+    data->rc = 0;
+    INIT_LIST_HEAD(&data->list);
+
+    spin_lock_recursive(&payload_lock);
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt++;
+    payload_version++;
+    spin_unlock_recursive(&payload_lock);
+
+ out:
+    vfree(raw_data);
+    if ( rc )
+    {
+        xfree(data);
+    }
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_get_t *get)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_name(&get->name);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&get->name, &data);
+    if ( rc )
+        return rc;
+
+    get->status.state = data->state;
+    get->status.rc = data->rc;
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+
+    if ( list->nr > 1024 )
+        return -E2BIG;
+
+    if ( list->pad != 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
+         !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
+         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
+        return -EINVAL;
+
+    spin_lock_recursive(&payload_lock);
+    if ( list->idx > payload_cnt || !list->nr )
+    {
+        spin_unlock_recursive(&payload_lock);
+        return -EINVAL;
+    }
+
+    list_for_each_entry( data, &payload_list, list )
+    {
+        uint32_t len;
+
+        if ( list->idx > i++ )
+            continue;
+
+        status.state = data->state;
+        status.rc = data->rc;
+        len = strlen(data->name);
+
+        /* N.B. 'idx' != 'i'. */
+        if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
+                                    data->name, len) ||
+             __copy_to_guest_offset(list->len, idx, &len, 1) ||
+             __copy_to_guest_offset(list->status, idx, &status, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        idx++;
+
+        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
+            break;
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    list->version = payload_version;
+    spin_unlock_recursive(&payload_lock);
+
+    /* And how many we have processed. */
+    return rc ? : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_name(&action->name);
+    if ( rc )
+        return rc;
+
+    spin_lock_recursive(&payload_lock);
+    rc = find_payload(&action->name, &data);
+    if ( rc )
+        goto out;
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_UNLOAD:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            free_payload(data);
+            /* No touching 'data' from here on! */
+            rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        if ( data->state == XSPLICE_STATE_APPLIED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_APPLY:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_APPLIED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_REPLACE:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+ out:
+    spin_unlock_recursive(&payload_lock);
+
+    return rc;
+}
+
+int xsplice_op(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    if ( xsplice->pad != 0 )
+        return -EINVAL;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
+}
+
+static const char *state2str(uint32_t state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+
+    if ( state >= ARRAY_SIZE(names) || !names[state] )
+        return "unknown";
+
+    return names[state];
+}
+
+static void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    spin_lock_recursive(&payload_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+        printk(" name=%s state=%s(%u)\n", data->name,
+               state2str(data->state), data->state);
+
+    spin_unlock_recursive(&payload_lock);
+}
+
+static int __init xsplice_init(void)
+{
+    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    return 0;
+}
+__initcall(xsplice_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..487f62c 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,173 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to the docs/unstable/misc/xsplice.markdown
+ * for the design details of this hypercall.
+ *
+ * There are four sub-ops:
+ *  XEN_SYSCTL_XSPLICE_UPLOAD (0)
+ *  XEN_SYSCTL_XSPLICE_GET (1)
+ *  XEN_SYSCTL_XSPLICE_LIST (2)
+ *  XEN_SYSCTL_XSPLICE_ACTION (3)
+ *
+ * The normal sequence of sub-ops is to:
+ *  1) XEN_SYSCTL_XSPLICE_UPLOAD to upload the payload. If errors STOP.
+ *  2) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN spin.
+ *     If zero go to next step.
+ *  3) XEN_SYSCTL_XSPLICE_ACTION with XSPLICE_ACTION_CHECK command to verify
+ *     that the payload can be successfully applied.
+ *  4) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN spin.
+ *     If zero go to next step.
+ *  5) XEN_SYSCTL_XSPLICE_ACTION with XSPLICE_ACTION_APPLY to apply the patch.
+ *  6) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN spin.
+ *     If zero exit with success.
+ */
+
+/*
+ * Structure describing an ELF payload. Uniquely identifies the
+ * payload. Should be human readable.
+ * Recommended length is up to XEN_XSPLICE_NAME_SIZE.
+ */
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_name {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
+    uint16_t size;                          /* IN: size of name. May be up to
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint16_t pad[3];                        /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_name xen_xsplice_name_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_name_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was successfully uploaded.
+ * Otherwise an -XEN_EXX return value is provided. Duplicate names are not
+ * supported.
+ *
+ * The payload at this point is verified against basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+    uint64_t size;                          /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve the status of a specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and XEN_EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATE_CHECKED      1
+#define XSPLICE_STATE_APPLIED      2
+    uint32_t state;                /* OUT: XSPLICE_STATE_*. */
+    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_get {
+    xen_xsplice_name_t name;                /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
+};
+typedef struct xen_sysctl_xsplice_get xen_sysctl_xsplice_get_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_get_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns a positive number, it is the number (up to `nr`)
+ * of the payloads returned, along with `nr` updated with the number of remaining
+ * payloads and `version` updated (it may be the same across hypercalls; if it
+ * varies the data is stale and further calls could fail). The `status`,
+ * `name`, and `len` are updated at their designated index value (`idx`) with
+ * the returned value of data.
+ *
+ * If the hypercall returns E2BIG the `nr` is too big and should be
+ * lowered. The upper limit of `nr` is left to the implementation.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed the number of payloads making this information stale. It is
+ * the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. If the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.
+                                               On subsequent calls reuse value.
+                                               If it varies between calls, we are
+                                               getting stale data. */
+    uint32_t idx;                           /* IN/OUT: Index into array. */
+    uint32_t nr;                            /* IN: How many status, name, and len
+                                               entries to fill out.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocated for nr of them. */
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.
+                                               Must have nr of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.
+                                               Must have nr of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `name` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+#define XSPLICE_ACTION_REPLACE      5
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
+    uint32_t timeout;                       /* IN: Zero if no timeout. */
+                                            /* Or upper bound of time (ms) */
+                                            /* for operation to take. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_*. */
+    uint32_t pad;                           /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_get_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +958,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_xsplice_op                    25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +984,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..b9f08cd
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+
+#ifdef CONFIG_XSPLICE
+
+int xsplice_op(struct xen_sysctl_xsplice_op *);
+
+#else
+
+#include <xen/errno.h> /* For -ENOSYS */
+static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
+{
+    return -ENOSYS;
+}
+
+#endif /* CONFIG_XSPLICE */
+
+#endif /* __XEN_XSPLICE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 2510229..fb5cc4a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -808,6 +808,12 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+#ifdef CONFIG_XSPLICE
+    case XEN_SYSCTL_xsplice_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__XSPLICE_OP, NULL);
+#endif
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 59c9f69..a227f88 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.5.0



* [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (11 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 18:12   ` Wei Liu
  2016-03-15 17:56 ` [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
                   ` (20 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

The underlying toolstack code to do the basic
operations when using the XEN_XSPLICE_op hypercalls:
 - upload the payload,
 - get status of a payload,
 - list all the payloads,
 - apply, check, replace, and revert the payload.
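
The list operation is the least obvious of these, because the
hypervisor may hand back the payloads in chunks. The sketch below is a
toy model of that cursor loop and does not call libxc: fake_list(),
list_all(), fake_payloads and MAX_CHUNK are hypothetical names, and the
model leaves out the `version` staleness check. It only assumes the
behaviour described in sysctl.h: each call returns the number of
entries copied and rewrites `nr` with the number still left.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 128   /* mirrors XEN_XSPLICE_NAME_SIZE */
#define MAX_CHUNK 8     /* hypothetical size of the caller's arrays */

/* Hypothetical stand-in for the hypervisor's payload list. */
static const char *fake_payloads[] = {
    "xsa-1", "xsa-2", "xsa-3", "xsa-4", "xsa-5"
};
#define FAKE_CNT (sizeof(fake_payloads) / sizeof(fake_payloads[0]))

/*
 * Toy model of one list call: copy up to *nr names starting at idx,
 * return how many were copied, and overwrite *nr with the number of
 * payloads still left after this chunk.
 */
static int fake_list(unsigned int idx, unsigned int *nr,
                     char names[][NAME_SIZE])
{
    unsigned int done = 0;

    if ( idx > FAKE_CNT || *nr == 0 )
        return -1;

    while ( idx + done < FAKE_CNT && done < *nr )
    {
        strncpy(names[done], fake_payloads[idx + done], NAME_SIZE - 1);
        names[done][NAME_SIZE - 1] = '\0';
        done++;
    }
    *nr = FAKE_CNT - (idx + done);

    return (int)done;
}

/* Walk the whole list in chunks of 'max'; returns the entries seen. */
static unsigned int list_all(unsigned int max)
{
    char names[MAX_CHUNK][NAME_SIZE];
    unsigned int start = 0, left = 0;

    assert(max >= 1 && max <= MAX_CHUNK);

    do {
        unsigned int nr = max;
        int done = fake_list(start, &nr, names);

        if ( done < 0 )
            break;

        for ( int i = 0; i < done; i++ )
            printf("%u: %s\n", start + i, names[i]);

        start += done;   /* advance the cursor by what was filled in */
        left = nr;       /* and keep going while entries remain */
    } while ( left );

    return start;
}
```

A real caller would additionally compare `version` between calls and
restart from index zero when it changes.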

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Actually set zero for the _pad entries.
v3: Split status into state and error code.
    Add REPLACE action.
v4: Use timeout and utilize pads.
v5: Update per Wei's review.
v6: Update per Wei's review.
v7: Extra space slipped in, remove it
---
 tools/libxc/include/xenctrl.h |  62 ++++++++
 tools/libxc/xc_misc.c         | 337 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 399 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 379de30..2b8a2d7 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2553,6 +2553,68 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            bool *cdp_enabled);
 #endif
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name, unsigned char *payload, uint32_t size);
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status);
+
+/*
+ * The heart of this function is to get an array of xen_xsplice_status_t.
+ *
+ * However it is complex because it has to deal with the hypervisor
+ * returning some of the requested data or data being stale
+ * (another hypercall might alter the list).
+ *
+ * The parameters that the function expects to contain data from
+ * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
+ * 'left' are also updated with the number of entries filled out
+ * and respectively the number of entries left to get from hypervisor.
+ *
+ * It is expected that the caller of this function will take the
+ * 'left' and use the value for 'start'. This way we have a
+ * cursor in the array. Note that the 'info','name', and 'len' will
+ * be updated at the subsequent calls.
+ *
+ * The 'max' is to be provided by the caller with the maximum
+ * number of entries that 'info', 'name', and 'len' arrays can
+ * be filled up with.
+ *
+ * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
+ * length.
+ *
+ * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
+ * structure size.
+ *
+ * Each entry in the 'len' array is expected to be of uint32_t size.
+ *
+ * The return value is zero if the hypercall completed successfully.
+ * Note that the return value is _not_ the amount of entries filled
+ * out - that is saved in 'done'.
+ *
+ * If there was an error performing the operation, the return value
+ * will contain a negative -EXX type value. The 'done' and 'left'
+ * will contain the number of entries that had been successfully
+ * retrieved (if any).
+ */
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info, char *name,
+                    uint32_t *len, unsigned int *done,
+                    unsigned int *left);
+
+/*
+ * The operations are asynchronous and the hypervisor may take a while
+ * to complete them. The `timeout` offers an option to expire the
+ * operation if it could not be completed within the specified time.
+ * Value of 0 means let hypervisor decide the best timeout.
+ */
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout);
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout);
+
 /* Compat shims */
 #include "xenctrl_compat.h"
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index 124537b..3f62bc7 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -693,6 +693,343 @@ int xc_hvm_inject_trap(
     return rc;
 }
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *name,
+                      unsigned char *payload,
+                      uint32_t size)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BUFFER(char, local);
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /* later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name || !payload )
+        return -1;
+
+    def_name.size = strlen(name);
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    local = xc_hypercall_buffer_alloc(xch, local, size);
+    if ( !local )
+    {
+        xc_hypercall_bounce_post(xch, name);
+        return -1;
+    }
+    memcpy(local, payload, size);
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_UPLOAD;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.upload.size = size;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.payload, local);
+
+    sysctl.u.xsplice.u.upload.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_buffer_free(xch, local);
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *name,
+                   xen_xsplice_status_t *status)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(name, 0 /*adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    if ( !name )
+        return -1;
+
+    def_name.size = strlen(name);
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_GET;
+    sysctl.u.xsplice.pad = 0;
+
+    sysctl.u.xsplice.u.get.status.state = 0;
+    sysctl.u.xsplice.u.get.status.rc = 0;
+
+    sysctl.u.xsplice.u.get.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.get.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    memcpy(status, &sysctl.u.xsplice.u.get.status, sizeof(*status));
+
+    return rc;
+}
+
+/*
+ * The heart of this function is to get an array of xen_xsplice_status_t.
+ *
+ * However it is complex because it has to deal with the hypervisor
+ * returning some of the requested data or data being stale
+ * (another hypercall might alter the list).
+ *
+ * The parameters that the function expects to contain data from
+ * the hypervisor are: 'info', 'name', and 'len'. The 'done' and
+ * 'left' are also updated with the number of entries filled out
+ * and respectively the number of entries left to get from hypervisor.
+ *
+ * It is expected that the caller of this function will take the
+ * 'left' and use the value for 'start'. This way we have a
+ * cursor in the array. Note that the 'info', 'name', and 'len' will
+ * be updated at the subsequent calls.
+ *
+ * The 'max' is to be provided by the caller with the maximum
+ * number of entries that 'info', 'name', and 'len' arrays can
+ * be filled up with.
+ *
+ * Each entry in the 'name' array is expected to be of XEN_XSPLICE_NAME_SIZE
+ * length.
+ *
+ * Each entry in the 'info' array is expected to be of xen_xsplice_status_t
+ * structure size.
+ *
+ * Each entry in the 'len' array is expected to be of uint32_t size.
+ *
+ * The return value is zero if the hypercall completed successfully.
+ * Note that the return value is _not_ the amount of entries filled
+ * out - that is saved in 'done'.
+ *
+ * If there was an error performing the operation, the return value
+ * will contain a negative -EXX type value. The 'done' and 'left'
+ * will contain the number of entries that had been successfully
+ * retrieved (if any).
+ */
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info,
+                    char *name, uint32_t *len,
+                    unsigned int *done,
+                    unsigned int *left)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The sizes are adjusted later - hence zero. */
+    DECLARE_HYPERCALL_BOUNCE(info, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(len, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    uint32_t max_batch_sz, nr;
+    uint32_t version = 0, retries = 0;
+    uint32_t adjust = 0;
+    ssize_t sz;
+
+    if ( !max || !info || !name || !len )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.list.version = 0;
+    sysctl.u.xsplice.u.list.idx = start;
+    sysctl.u.xsplice.u.list.pad = 0;
+
+    max_batch_sz = max;
+    /* Convenience value. */
+    sz = sizeof(*name) * XEN_XSPLICE_NAME_SIZE;
+    *done = 0;
+    *left = 0;
+    do {
+        /*
+         * The first time we go in this loop our 'max' may be bigger
+         * than what the hypervisor is comfortable with - hence the first
+         * couple of loops may adjust the number of entries we will
+         * want filled (tracked by 'nr').
+         *
+         * N.B. This is a do { } while loop and the right hand side of
+         * the conditional when adjusting will evaluate to false (as
+         * *left is set to zero before the loop). Hence we need this
+         * adjust - even if we reset it at the start of the loop.
+         */
+        if ( adjust )
+            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
+
+        nr = min(max - *done, max_batch_sz);
+
+        sysctl.u.xsplice.u.list.nr = nr;
+        /* Fix the size (may vary between hypercalls). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
+        HYPERCALL_BOUNCE_SET_SIZE(name, nr * sz);
+        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
+        /* Move the pointer to proper offset into 'info'. */
+        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
+        (HYPERCALL_BUFFER(name))->ubuf = name + (sz * *done);
+        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
+        /* Allocate memory. */
+        rc = xc_hypercall_bounce_pre(xch, info);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, name);
+        if ( rc )
+            break;
+
+        rc = xc_hypercall_bounce_pre(xch, len);
+        if ( rc )
+            break;
+
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.name, name);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
+
+        rc = do_sysctl(xch, &sysctl);
+        /*
+         * From here on we MUST call xc_hypercall_bounce_post. If rc < 0 we
+         * end up doing it (outside the loop), so using a break is OK.
+         */
+        if ( rc < 0 && errno == E2BIG )
+        {
+            if ( max_batch_sz <= 1 )
+                break;
+            max_batch_sz >>= 1;
+            adjust = 1; /* For the loop conditional to let us loop again. */
+            /* No memory leaks! */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+        else if ( rc < 0 ) /* For all other errors we bail out. */
+            break;
+
+        if ( !version )
+            version = sysctl.u.xsplice.u.list.version;
+
+        if ( sysctl.u.xsplice.u.list.version != version )
+        {
+            /* We could make this configurable as parameter? */
+            if ( retries++ > 3 )
+            {
+                rc = -1;
+                errno = EBUSY;
+                break;
+            }
+            *done = 0; /* Retry from scratch. */
+            version = sysctl.u.xsplice.u.list.version;
+            adjust = 1; /* And make sure we continue in the loop. */
+            /* No memory leaks. */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, name);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+
+        /* We should never hit this, but just in case. */
+        if ( rc > nr )
+        {
+            errno = EOVERFLOW; /* Overflow! */
+            rc = -1;
+            break;
+        }
+        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
+        /* Copy only up to 'rc' entries - we could use 'min(rc, nr)' if desired. */
+        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
+        HYPERCALL_BOUNCE_SET_SIZE(name, (rc * sz));
+        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
+        /* Bounce the data and free the bounce buffer. */
+        xc_hypercall_bounce_post(xch, info);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, len);
+        /* And update how many elements of info we have copied into. */
+        *done += rc;
+        /* Update idx. */
+        sysctl.u.xsplice.u.list.idx = start + *done;
+    } while ( adjust || (*done < max && *left != 0) );
+
+    if ( rc < 0 )
+    {
+        xc_hypercall_bounce_post(xch, len);
+        xc_hypercall_bounce_post(xch, name);
+        xc_hypercall_bounce_post(xch, info);
+    }
+
+    return rc > 0 ? 0 : rc;
+}
+
+static int _xc_xsplice_action(xc_interface *xch,
+                              char *name,
+                              unsigned int action,
+                              uint32_t timeout)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    /* The size is figured out when we strlen(name) */
+    DECLARE_HYPERCALL_BOUNCE(name, 0, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    xen_xsplice_name_t def_name = { .pad = { 0, 0, 0 } };
+
+    def_name.size = strlen(name);
+
+    if ( def_name.size > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(name, def_name.size);
+
+    if ( xc_hypercall_bounce_pre(xch, name) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_ACTION;
+    sysctl.u.xsplice.pad = 0;
+    sysctl.u.xsplice.u.action.cmd = action;
+    sysctl.u.xsplice.u.action.timeout = timeout;
+
+    sysctl.u.xsplice.u.action.name = def_name;
+    set_xen_guest_handle(sysctl.u.xsplice.u.action.name.name, name);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, name);
+
+    return rc;
+}
+
+int xc_xsplice_apply(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_APPLY, timeout);
+}
+
+int xc_xsplice_revert(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REVERT, timeout);
+}
+
+int xc_xsplice_unload(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_UNLOAD, timeout);
+}
+
+int xc_xsplice_check(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_CHECK, timeout);
+}
+
+int xc_xsplice_replace(xc_interface *xch, char *name, uint32_t timeout)
+{
+    return _xc_xsplice_action(xch, name, XSPLICE_ACTION_REPLACE, timeout);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.5.0
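
The batching in xc_xsplice_list (ask for as many entries as the caller
has room for, halve the batch on E2BIG, advance a cursor) may be easier
to follow in isolation. Below is a minimal sketch under stated
assumptions, not the libxc code: fake_list, list_all, FAKE_TOTAL and
FAKE_MAX_BATCH are hypothetical names, fake_list stands in for the LIST
sysctl, and 'left' is assumed to mean "entries remaining after this
batch". The version/retry handling and the bounce buffers are left out.

```c
#include <errno.h>

#define FAKE_TOTAL     10  /* Entries held by the pretend hypervisor. */
#define FAKE_MAX_BATCH 3   /* Largest batch it accepts per call. */

/*
 * Hypothetical stand-in for the LIST sysctl: copy up to 'nr' entry ids
 * starting at 'idx' into 'out', report how many remain in '*left', and
 * return the number copied. Oversized batches fail with E2BIG.
 */
static int fake_list(unsigned int idx, unsigned int nr, int *out,
                     unsigned int *left)
{
    unsigned int avail = idx < FAKE_TOTAL ? FAKE_TOTAL - idx : 0;
    unsigned int n = avail < nr ? avail : nr;
    unsigned int i;

    if ( nr > FAKE_MAX_BATCH )
    {
        errno = E2BIG;
        return -1;
    }
    for ( i = 0; i < n; i++ )
        out[i] = (int)(idx + i);
    *left = avail - n;

    return (int)n;
}

/* Fill 'out' with up to 'max' entries starting at 'start'. */
static int list_all(unsigned int max, unsigned int start, int *out,
                    unsigned int *done, unsigned int *left)
{
    unsigned int batch = max;
    int rc;

    *done = 0;
    *left = 0;
    if ( !max )
        return -1;

    for ( ; ; )
    {
        unsigned int room = max - *done;
        unsigned int nr = room < batch ? room : batch;

        rc = fake_list(start + *done, nr, out + *done, left);
        if ( rc < 0 && errno == E2BIG && batch > 1 )
        {
            batch >>= 1; /* Too big: halve and retry the same index. */
            continue;
        }
        if ( rc < 0 )
            return rc;   /* Any other error: bail out. */

        *done += (unsigned int)rc;
        if ( *done >= max || *left == 0 )
            break;
    }

    return 0;
}
```

A caller would keep its own cursor, add 'done' to it after each call,
and stop once 'left' reaches zero, which is what list_func in the
xen-xsplice tool does with its 'idx' variable.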


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (12 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 18:12   ` Wei Liu
  2016-03-15 17:56 ` [PATCH v4 15/34] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
                   ` (19 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

A simple tool that allows a system admin to perform
basic xsplice operations:

 - Upload an xsplice file (with a unique name).
 - List all the xsplice payloads loaded.
 - Apply, revert, replace, unload, or check the payload using the
   unique name.
 - Do all three (upload, check, and apply the payload) in one
   go with 'load'. This uses the name of the file, minus its
   extension, as the <name>.
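
For illustration, the name that 'load' infers from <file> could be
derived as in the sketch below: take the last path component and cut it
at the final '.'. This is a sketch under stated assumptions rather than
the tool's code; name_from_path and NAME_SIZE are hypothetical, NAME_SIZE
only stands in for XEN_XSPLICE_NAME_SIZE, and unlike the tool the sketch
insists on room for a terminating NUL.

```c
#include <libgen.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for XEN_XSPLICE_NAME_SIZE. */
#define NAME_SIZE 128

/*
 * Derive a payload name from a file path: last path component, cut at
 * the final '.'. Returns 0 on success, -1 if the path or the resulting
 * name does not fit.
 */
static int name_from_path(const char *file, char *name, size_t name_len)
{
    char copy[4096]; /* Arbitrary scratch size chosen for the sketch. */
    char *base, *lastdot;

    if ( strlen(file) >= sizeof(copy) )
        return -1;
    strcpy(copy, file); /* basename() may modify its argument. */

    base = basename(copy);
    lastdot = strrchr(base, '.');
    if ( lastdot != NULL )
        *lastdot = '\0';

    if ( strlen(base) >= name_len )
        return -1;
    strcpy(name, base);

    return 0;
}
```

With this sketch a path such as /var/lib/xen/xen_hello_world.xsplice
gives the name xen_hello_world, and fix.v2.xsplice gives fix.v2, since
only the last extension is removed.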

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2:
 - Removed REVERTED state.
 - Fixed bugs handling XSPLICE_STATUS_PROGRESS.
 - Split status into state and error.
   Add REPLACE action.
v3:
 - Utilize the timeout and use the default one (let the hypervisor
   pick it).
 - Change the s/all/load and infer the <id> from name of file.
v4:
 - s/id/name/
 - Don't use hypercall buffer in upload_func, instead do it in libxc
 - Remove the debug printk.
v5:
 - Remove goto's (per Wei's review)
 - Use fprintf(stderr in error paths.
 - Add local variable block.
v6:
 - Syntax, expand comment, and don't overwrite rc if xc_xsplice_upload failed.
v7:
 - Remove LOADED state. Only have CHECKED state.
---
 .gitignore               |   1 +
 tools/misc/Makefile      |   4 +
 tools/misc/xen-xsplice.c | 463 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 468 insertions(+)
 create mode 100644 tools/misc/xen-xsplice.c

diff --git a/.gitignore b/.gitignore
index 91f690c..5cae935 100644
--- a/.gitignore
+++ b/.gitignore
@@ -181,6 +181,7 @@ tools/misc/xc_shadow
 tools/misc/xen_cpuperf
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
+tools/misc/xen-xsplice
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index a2ef0ec..e1956f6 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -31,6 +31,7 @@ INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
 INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
+INSTALL_SBIN                   += xen-xsplice
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -99,6 +100,9 @@ xen-mfndump: xen-mfndump.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-xsplice: xen-xsplice.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenevtchn) $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
new file mode 100644
index 0000000..fb9228e
--- /dev/null
+++ b/tools/misc/xen-xsplice.c
@@ -0,0 +1,463 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ */
+
+#include <fcntl.h>
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <xenctrl.h>
+#include <xenstore.h>
+
+static xc_interface *xch;
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-xsplice: Xsplice test tool\n"
+            "Usage: xen-xsplice <command> [args]\n"
+            " <name> A unique name of payload. Up to %d characters.\n"
+            "Commands:\n"
+            "  help                   display this help\n"
+            "  upload <name> <file>   upload file <file> with <name> name\n"
+            "  list                   list payloads uploaded.\n"
+            "  apply <name>           apply <name> patch.\n"
+            "  revert <name>          revert name <name> patch.\n"
+            "  replace <name>         apply <name> patch and revert all others.\n"
+            "  unload <name>          unload name <name> patch.\n"
+            "  load  <file>           upload, check and apply <file>.\n"
+            "                         name is the <file> name\n",
+            XEN_XSPLICE_NAME_SIZE);
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+static const char *state2str(unsigned int state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+    if (state >= ARRAY_SIZE(names) || !names[state])
+        return "unknown";
+
+    return names[state];
+}
+
+/* This value was chosen ad hoc. It could be 42 too. */
+#define MAX_LEN 11
+static int list_func(int argc, char *argv[])
+{
+    unsigned int idx, done, left, i;
+    xen_xsplice_status_t *info = NULL;
+    char *name = NULL;
+    uint32_t *len = NULL;
+    int rc = ENOMEM;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+    idx = left = 0;
+    info = malloc(sizeof(*info) * MAX_LEN);
+    if ( !info )
+        return rc;
+    name = malloc(sizeof(*name) * XEN_XSPLICE_NAME_SIZE * MAX_LEN);
+    if ( !name )
+    {
+        free(info);
+        return rc;
+    }
+    len = malloc(sizeof(*len) * MAX_LEN);
+    if ( !len ) {
+        free(name);
+        free(info);
+        return rc;
+    }
+
+    fprintf(stdout," ID                                     | status\n"
+                   "----------------------------------------+------------\n");
+    do {
+        done = 0;
+        /* The memset is done to catch errors. */
+        memset(info, 'A', sizeof(*info) * MAX_LEN);
+        memset(name, 'B', sizeof(*name) * MAX_LEN * XEN_XSPLICE_NAME_SIZE);
+        memset(len, 'C', sizeof(*len) * MAX_LEN);
+        rc = xc_xsplice_list(xch, MAX_LEN, idx, info, name, len, &done, &left);
+        if ( rc )
+        {
+            fprintf(stderr, "Failed to list %d/%d: %d(%s)!\n",
+                    idx, left, errno, strerror(errno));
+            break;
+        }
+        for ( i = 0; i < done; i++ )
+        {
+            unsigned int j;
+            uint32_t sz;
+            char *str;
+
+            sz = len[i];
+            str = name + (i * XEN_XSPLICE_NAME_SIZE);
+            for ( j = sz; j < XEN_XSPLICE_NAME_SIZE; j++ )
+                str[j] = '\0';
+
+            printf("%-40s| %s", str, state2str(info[i].state));
+            if ( info[i].rc )
+                printf(" (%d, %s)\n", -info[i].rc, strerror(-info[i].rc));
+            else
+                puts("");
+        }
+        idx += done;
+    } while ( left );
+
+    free(name);
+    free(info);
+    free(len);
+    return rc;
+}
+#undef MAX_LEN
+
+static int get_name(int argc, char *argv[], char *name)
+{
+    ssize_t len = strlen(argv[0]);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+    {
+        fprintf(stderr, "Name must be at most %d characters!\n", XEN_XSPLICE_NAME_SIZE);
+        errno = EINVAL;
+        return errno;
+    }
+    /* Don't want any funny strings from the stack. */
+    memset(name, 0, XEN_XSPLICE_NAME_SIZE);
+    strncpy(name, argv[0], len);
+    return 0;
+}
+
+static int upload_func(int argc, char *argv[])
+{
+    char *filename;
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int fd = 0, rc;
+    struct stat buf;
+    unsigned char *fbuf;
+    ssize_t len;
+
+    if ( argc != 2 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    filename = argv[1];
+    fd = open(filename, O_RDONLY);
+    if ( fd < 0 )
+    {
+        fprintf(stderr, "Could not open %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        return errno;
+    }
+    if ( stat(filename, &buf) != 0 )
+    {
+        fprintf(stderr, "Could not get right size %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close(fd);
+        return errno;
+    }
+
+    len = buf.st_size;
+    fbuf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
+    if ( fbuf == MAP_FAILED )
+    {
+        fprintf(stderr,"Could not map: %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close (fd);
+        return errno;
+    }
+    printf("Uploading %s (%zd bytes)\n", filename, len);
+    rc = xc_xsplice_upload(xch, name, fbuf, len);
+    if ( rc )
+        fprintf(stderr, "Upload failed: %s, error: %d(%s)!\n",
+                filename, errno, strerror(errno));
+
+    if ( munmap( fbuf, len) )
+    {
+        fprintf(stderr, "Could not unmap!? error: %d(%s)!\n",
+                errno, strerror(errno));
+        if ( !rc )
+            rc = errno;
+    }
+    close(fd);
+
+    return rc;
+}
+
+/* These MUST match to the 'action_options[]' array slots. */
+enum {
+    ACTION_APPLY = 0,
+    ACTION_REVERT = 1,
+    ACTION_UNLOAD = 2,
+    ACTION_REPLACE = 3,
+};
+
+struct {
+    int allow; /* State it must be in to call function. */
+    int expected; /* The state to be in after the function. */
+    const char *name;
+    int (*function)(xc_interface *xch, char *name, uint32_t timeout);
+    unsigned int executed; /* Has the function been called? */
+} action_options[] = {
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "apply",
+        .function = xc_xsplice_apply,
+    },
+    {   .allow = XSPLICE_STATE_APPLIED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "revert",
+        .function = xc_xsplice_revert,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = -ENOENT,
+        .name = "unload",
+        .function = xc_xsplice_unload,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "replace",
+        .function = xc_xsplice_replace,
+    },
+};
+
+/* Go around 300 * 0.1 seconds = 30 seconds. */
+#define RETRIES 300
+/* aka 0.1 second */
+#define DELAY 100000
+
+int action_func(int argc, char *argv[], unsigned int idx)
+{
+    char name[XEN_XSPLICE_NAME_SIZE];
+    int rc, original_state;
+    xen_xsplice_status_t status;
+    unsigned int retry = 0;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( idx >= ARRAY_SIZE(action_options) )
+        return -1;
+
+    if ( get_name(argc, argv, name) )
+        return EINVAL;
+
+    /* Check initial status. */
+    rc = xc_xsplice_get(xch, name, &status);
+    if ( rc )
+    {
+        fprintf(stderr, "%s failed to get status (rc=%d, %s)!\n",
+                name, -rc, strerror(-rc));
+        return -1;
+    }
+    if ( status.rc == -EAGAIN )
+    {
+        fprintf(stderr, "%s failed. Operation already in progress\n", name);
+        return -1;
+    }
+
+    if ( status.state == action_options[idx].expected )
+    {
+        printf("No action needed\n");
+        return 0;
+    }
+
+    /* Perform action. */
+    if ( action_options[idx].allow & status.state )
+    {
+        printf("Performing %s:", action_options[idx].name);
+        rc = action_options[idx].function(xch, name, 0);
+        if ( rc )
+        {
+            fprintf(stderr, "%s failed with %d(%s)\n", name, -rc, strerror(-rc));
+            return -1;
+        }
+    }
+    else
+    {
+        printf("%s: in wrong state (%s), expected (%s)\n",
+               name, state2str(status.state),
+               state2str(action_options[idx].expected));
+        return -1;
+    }
+
+    original_state = status.state;
+    do {
+        rc = xc_xsplice_get(xch, name, &status);
+        if ( rc )
+        {
+            rc = -errno;
+            break;
+        }
+
+        if ( status.state != original_state )
+            break;
+        if ( status.rc && status.rc != -EAGAIN )
+        {
+            rc = status.rc;
+            break;
+        }
+
+        printf(".");
+        fflush(stdout);
+        usleep(DELAY);
+    } while ( ++retry < RETRIES );
+
+    if ( retry >= RETRIES )
+    {
+        fprintf(stderr, "%s: Operation didn't complete after 30 seconds.\n", name);
+        return -1;
+    }
+    else
+    {
+        if ( rc == 0 )
+            rc = status.state;
+
+        if ( action_options[idx].expected == rc )
+            printf(" completed\n");
+        else if ( rc < 0 )
+        {
+            fprintf(stderr, "%s failed with %d(%s)\n", name, -rc, strerror(-rc));
+            return -1;
+        }
+        else
+        {
+            fprintf(stderr, "%s: in wrong state (%s), expected (%s)\n",
+               name, state2str(rc),
+               state2str(action_options[idx].expected));
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static int load_func(int argc, char *argv[])
+{
+    int rc;
+    char *new_argv[2];
+    char *path, *name, *lastdot;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+    /* <file> */
+    new_argv[1] = argv[0];
+
+    /* Synthesize the <id> */
+    path = strdup(argv[0]);
+
+    name = basename(path);
+    lastdot = strrchr(name, '.');
+    if ( lastdot != NULL )
+        *lastdot = '\0';
+    new_argv[0] = name;
+
+    rc = upload_func(2 /* <id> <file> */, new_argv);
+    if ( rc )
+        return rc;
+
+    rc = action_func(1 /* only <id> */, new_argv, ACTION_APPLY);
+    if ( rc )
+        action_func(1, new_argv, ACTION_UNLOAD);
+
+    free(path);
+    return rc;
+}
+
+/*
+ * There are also functions in action_options that are called in case
+ * none of the ones in main_options match.
+ */
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "list", list_func },
+    { "upload", upload_func },
+    { "load", load_func },
+};
+
+int main(int argc, char *argv[])
+{
+    int i, j, ret;
+
+    if ( argc  <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
+            if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+                break;
+
+        if ( j == ARRAY_SIZE(action_options) )
+        {
+            fprintf(stderr, "Unrecognised command '%s' -- try "
+                   "'xen-xsplice help'\n", argv[1]);
+            return 1;
+        }
+    } else
+        j = ARRAY_SIZE(action_options);
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get the handler\n");
+        return 1;
+    }
+
+    if ( i == ARRAY_SIZE(main_options) )
+        ret = action_func(argc -2, argv + 2, j);
+    else
+        ret = main_options[i].function(argc -2, argv + 2);
+
+    xc_interface_close(xch);
+
+    return !!ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0



* [PATCH v4 15/34] xsplice: Add helper elf routines
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (13 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 16/34] xsplice: Implement payload loading Konrad Rzeszutek Wilk
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add Elf routines and data structures in preparation for loading an
xSplice payload.

We make an assumption that the max number of sections an ELF payload
can have is 64. We can in future make this be dependent on the
names of the sections and verifying against a list, but for right now
this suffices.

Also we add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that no offsets point past the file.

For most of the checks we print a message if the hypervisor is built
with debug enabled.
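
As an illustration of the kind of check meant here, the sketch below
tests whether a section header, or any other byte range, lies inside the
payload without letting the offset arithmetic wrap around. It is a
sketch under stated assumptions and not the hypervisor code:
range_in_payload and shdr_in_payload are hypothetical helpers, and
'hdr_size' stands in for sizeof(Elf_Shdr).

```c
#include <stdint.h>

/*
 * Return 1 if the byte range [offset, offset + size) lies entirely
 * inside a payload of 'len' bytes, 0 otherwise. Written with a
 * subtraction so that offset + size is never computed and cannot wrap.
 */
static int range_in_payload(uint64_t offset, uint64_t size, uint64_t len)
{
    return offset <= len && size <= len - offset;
}

/*
 * Check that section header 'i' of a table starting at file offset
 * 'shoff', with 'shentsize' bytes per entry, fits inside the payload.
 */
static int shdr_in_payload(uint64_t shoff, uint16_t shentsize,
                           uint32_t i, uint64_t hdr_size, uint64_t len)
{
    /* 32 bit index times 16 bit entry size cannot overflow 64 bits. */
    uint64_t delta = (uint64_t)i * shentsize;

    if ( shoff > UINT64_MAX - delta )
        return 0; /* shoff + delta would wrap. */

    return range_in_payload(shoff + delta, hdr_size, len);
}
```

For a 256 byte payload with a section table at offset 64 and 64 byte
entries, header 2 ends exactly at byte 256 and is accepted, while
header 3 would start at the end of the payload and is rejected.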

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: - With the #define ELFSIZE in the ARM file we can use the common
     #defines instead of using #ifdef CONFIG_ARM_32. Moved to another
    patch.
    - Add checks for ELF file.
    - Add name to be printed.
    - Add len for easier ELF checks.
    - Expand on the checks. Add macro.
v3: Remove the return_ macro
v4: Add return_ macro back but make it depend on debug=y
v5: Per Andrew's review: add local variable. Fix memory leak in
    elf_resolve_sections, Remove macro and use dprintk. Fix alignment.
    Use void* instead of uint8_t to handle raw payload.
v6: Fix memory leak in elf_get_sym
v7: Add XSPLICE to printk/dprintk
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 287 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h     |   3 +
 xen/include/xen/xsplice_elf.h |  51 ++++++++
 4 files changed, 342 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 0c8ba2c..9b7fac7 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -61,6 +61,7 @@ obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
 obj-$(CONFIG_XSPLICE) += xsplice.o
+obj-$(CONFIG_XSPLICE) += xsplice_elf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..ae87361
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,287 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name)
+{
+    unsigned int i;
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_resolve_sections(struct xsplice_elf *elf, const void *data)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+
+    /* xsplice_elf_load has already sanity checked e_shnum. */
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        printk(XENLOG_ERR "%s%s: Could not allocate memory for section table!\n",
+               XSPLICE, elf->name);
+        return -ENOMEM;
+    }
+
+    elf->sec = sec;
+
+    /* N.B. We also will ingest SHN_UNDEF sections. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        ssize_t delta = elf->hdr->e_shoff + i * elf->hdr->e_shentsize;
+
+        if ( delta + sizeof(Elf_Shdr) > elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Section header [%d] is past end of payload!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        sec[i].sec = (Elf_Shdr *)(data + delta);
+        delta = sec[i].sec->sh_offset;
+
+        if ( delta > elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Section [%d] data is past end of payload!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        sec[i].data = data + delta;
+
+        /* Name is populated in xsplice_elf_sections_name. */
+        sec[i].name = NULL;
+
+        if ( sec[i].sec->sh_type == SHT_SYMTAB )
+        {
+            if ( elf->symtab )
+            {
+                dprintk(XENLOG_DEBUG, "%s%s: Multiple symbol tables!\n",
+                        XSPLICE, elf->name);
+                return -EINVAL;
+            }
+            elf->symtab = &sec[i];
+
+            /*
+             * elf->symtab->sec->sh_link would point to the right section
+             * but we hadn't finished parsing all the sections.
+             */
+            if ( elf->symtab->sec->sh_link >= elf->hdr->e_shnum )
+            {
+                dprintk(XENLOG_DEBUG, "%s%s: Symbol table idx (%d) to strtab past end (%d)\n",
+                        XSPLICE, elf->name, elf->symtab->sec->sh_link,
+                        elf->hdr->e_shnum);
+                return -EINVAL;
+            }
+        }
+    }
+
+    if ( !elf->symtab )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: No symbol table found!\n",
+                XSPLICE, elf->name);
+        return -EINVAL;
+    }
+
+    /* There can be multiple SHT_STRTAB so pick the right one. */
+    elf->strtab = &sec[elf->symtab->sec->sh_link];
+
+    if ( !elf->symtab->sec->sh_size || !elf->symtab->sec->sh_entsize ||
+         elf->symtab->sec->sh_entsize != sizeof(Elf_Sym) )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: Symbol table header is corrupted!\n",
+                XSPLICE, elf->name);
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int elf_resolve_section_names(struct xsplice_elf *elf, const void *data)
+{
+    const char *shstrtab;
+    unsigned int i;
+    unsigned int offset, delta;
+
+    /*
+     * The elf->sec[0 -> e_shnum] structures have been verified by
+     * elf_resolve_sections. Find file offset for section string table.
+     */
+    offset = elf->sec[elf->hdr->e_shstrndx].sec->sh_offset;
+
+    if ( offset > elf->len )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: shstrtab section offset (%u) past end of payload!\n",
+                XSPLICE, elf->name, offset);
+        return -EINVAL;
+    }
+    shstrtab = (data + offset);
+
+    /* We could skip the first entry, as it is reserved. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        delta = elf->sec[i].sec->sh_name;
+
+        if ( offset + delta > elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: shstrtab [%d] data is past end of payload!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        elf->sec[i].name = shstrtab + delta;
+    }
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, const void *data)
+{
+    struct xsplice_elf_sec *symtab_sec, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    unsigned int i, delta, offset, nsym;
+
+    symtab_sec = elf->symtab;
+
+    strtab_sec = elf->strtab;
+
+    /* Pointer arithmetic to get the file offset. */
+    offset = strtab_sec->data - data;
+
+    ASSERT( offset == strtab_sec->sec->sh_offset );
+
+    /* symtab_sec->data was computed in elf_resolve_sections. */
+    ASSERT((symtab_sec->sec->sh_offset + data) == symtab_sec->data );
+
+    /* No need to check values as elf_resolve_sections did it. */
+    nsym = symtab_sec->sec->sh_size / symtab_sec->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, nsym);
+    if ( !sym )
+    {
+        printk(XENLOG_ERR "%s%s: Could not allocate memory for symbols\n",
+               XSPLICE, elf->name);
+        return -ENOMEM;
+    }
+
+    /* So we don't leak memory. */
+    elf->sym = sym;
+    for ( i = 0; i < nsym; i++ )
+    {
+        Elf_Sym *s;
+
+        if ( i * sizeof(Elf_Sym) > elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Symbol header [%d] is past end of payload!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        s = &((Elf_Sym *)symtab_sec->data)[i];
+
+        /* If s->st_name is STN_UNDEF (zero), the bounds check below always passes. */
+        delta = s->st_name;
+
+        /* Offset has been computed earlier. */
+        if ( offset + delta > elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Symbol [%u] data is past end of payload!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        sym[i].sym = s;
+        if ( s->st_name == STN_UNDEF )
+            sym[i].name = NULL;
+        else
+            sym[i].name = data + ( delta + offset );
+    }
+    elf->nsym = nsym;
+
+    return 0;
+}
+
+static int xsplice_header_check(const struct xsplice_elf *elf)
+{
+    if ( sizeof(*elf->hdr) >= elf->len )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: ELF header is bigger than payload!\n",
+                XSPLICE, elf->name);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shstrndx == SHN_UNDEF )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: Section name idx is undefined!?\n",
+                XSPLICE, elf->name);
+        return -EINVAL;
+    }
+
+    /* Check that section name index is within the sections. */
+    if ( elf->hdr->e_shstrndx >= elf->hdr->e_shnum )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: Section name idx (%d) is past end of sections (%d)!\n",
+                XSPLICE, elf->name, elf->hdr->e_shstrndx, elf->hdr->e_shnum);
+        return -EINVAL;
+    }
+
+    if ( elf->hdr->e_shnum > 64 )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: Too many (%d) sections!\n",
+                XSPLICE, elf->name, elf->hdr->e_shnum);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, void *data)
+{
+    int rc;
+
+    elf->hdr = data;
+
+    rc = xsplice_header_check(elf);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_resolve_section_names(elf, data);
+    if ( rc )
+        return rc;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    elf->sec = NULL;
+    xfree(elf->sym);
+    elf->sym = NULL;
+    elf->nsym = 0;
+    elf->name = NULL;
+    elf->len = 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index b9f08cd..cd805a8 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -10,6 +10,9 @@ struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
 
+/* Convenience define for printk. */
+#define XSPLICE "xsplice: "
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
 #else
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..e2dea18
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an ELF file as consumed by xSplice. */
+struct xsplice_elf_sec {
+    Elf_Shdr *sec;                 /* Hooked up in elf_resolve_sections. */
+    const char *name;              /* Human readable name hooked in
+                                      elf_resolve_section_names. */
+    const void *data;              /* Pointer to the section (done by
+                                      elf_resolve_sections). */
+};
+
+struct xsplice_elf_sym {
+    Elf_Sym *sym;
+    const char *name;
+};
+
+struct xsplice_elf {
+    const char *name;              /* Pointer to payload->name. */
+    ssize_t len;                   /* Length of the ELF file. */
+    Elf_Ehdr *hdr;                 /* ELF file. */
+    struct xsplice_elf_sec *sec;   /* Array of sections, allocated by us. */
+    struct xsplice_elf_sym *sym;   /* Array of symbols, allocated by us. */
+    unsigned int nsym;
+    struct xsplice_elf_sec *symtab;/* Pointer to .symtab section - aka to sec[x]. */
+    struct xsplice_elf_sec *strtab;/* Pointer to .strtab section - aka to sec[y]. */
+};
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, void *data);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0



* [PATCH v4 16/34] xsplice: Implement payload loading
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (14 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 15/34] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-22 17:25   ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 17/34] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
                   ` (17 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich,
	Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the ELF file.
- Parse the ELF file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that the patch titled "xsplice,symbols: Implement symbol name
   resolution on address" implements that.)
- Perform relocations.
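
As a rough illustration of that last step, here is a standalone sketch of
how one RELA entry could be applied for the two kinds the x86 code below
cares about (absolute 64-bit and PC-relative 32-bit). The names
sketch_apply_rela and enum sketch_reloc are hypothetical and are not part
of this patch. The bounds check and the 32-bit overflow check are additions
made by the sketch; the patch itself does not perform them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for R_X86_64_64 and R_X86_64_PC32. */
enum sketch_reloc { SKETCH_ABS64, SKETCH_PC32 };

/*
 * Apply one RELA-style relocation to a local copy of a section.
 *   buf, buf_len - section contents being patched
 *   load_addr    - address the section will run at (buf[0] lives there)
 *   offset       - r_offset: where in the section the result is stored
 *   sym_value    - resolved address of the referenced symbol (S)
 *   addend       - r_addend from the relocation entry (A)
 * Returns 0 on success, -1 if the store would overrun the section or the
 * PC-relative displacement does not fit in 32 bits.
 */
static int sketch_apply_rela(uint8_t *buf, size_t buf_len, uint64_t load_addr,
                             uint64_t offset, enum sketch_reloc type,
                             uint64_t sym_value, int64_t addend)
{
    uint64_t val = sym_value + (uint64_t)addend;   /* S + A */
    uint64_t dest = load_addr + offset;            /* P */
    size_t width = (type == SKETCH_ABS64) ? sizeof(uint64_t) : sizeof(int32_t);

    assert(buf != NULL);

    /* Sketch-only check: the store must stay inside the section. */
    if ( offset > buf_len || buf_len - offset < width )
        return -1;

    if ( type == SKETCH_ABS64 )
    {
        /* Absolute: store S + A as a 64-bit value. */
        memcpy(buf + offset, &val, sizeof(val));
    }
    else
    {
        /* PC-relative: store S + A - P, which must fit in 32 bits. */
        int64_t rel = (int64_t)(val - dest);
        int32_t rel32;

        if ( rel < INT32_MIN || rel > INT32_MAX )
            return -1;
        rel32 = (int32_t)rel;
        memcpy(buf + offset, &rel32, sizeof(rel32));
    }

    return 0;
}
```

For example, with a section loaded at 0x1000, a PC32 relocation at offset 4
against a symbol at 0x2000 with addend -4 stores 0x2000 - 4 - 0x1004.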

We use the vmalloc callback API to allocate a region of memory for the
code within [xen_virt_end, XEN_VIRT_END]. The .data, .rodata and so on
will be populated in the vmalloc area (not implemented in this patch).
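
To make the "free area" idea concrete, here is a minimal standalone sketch
of a first-fit search for an unused virtual address range among ranges
already taken by loaded payloads. It is not the find_hole() from this
patch: struct sketch_range, sketch_find_hole and the flat array of used
ranges are assumptions made for illustration, whereas the patch walks the
payload list under payload_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of an occupied range: [start, end). */
struct sketch_range {
    unsigned long start, end;
};

/*
 * First-fit search. Start at region_start and slide the candidate window
 * past every used range it overlaps, until it overlaps none or no longer
 * fits before region_end. Returns 0 and sets *hole_start on success,
 * -1 if there is no room.
 */
static int sketch_find_hole(const struct sketch_range *used, size_t nr_used,
                            unsigned long region_start,
                            unsigned long region_end,
                            unsigned long size, unsigned long *hole_start)
{
    unsigned long start = region_start;
    size_t i;
    int moved;

    assert(size > 0 && region_start <= region_end && hole_start != NULL);

    do {
        moved = 0;

        /* The window [start, start + size) must fit in the region. */
        if ( start > region_end || region_end - start < size )
            return -1;

        for ( i = 0; i < nr_used; i++ )
        {
            /* Overlap test for two half-open ranges. */
            if ( start < used[i].end && used[i].start < start + size )
            {
                /* Each move strictly increases start, so this terminates. */
                start = used[i].end;
                moved = 1;
                break;
            }
        }
    } while ( moved );

    *hole_start = start;
    return 0;
}
```

With used ranges [0x1000, 0x3000) and [0x3000, 0x4000) in a region
[0x1000, 0x8000), a request for 0x2000 bytes lands at 0x4000, while a
request for 0x5000 bytes fails.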

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: - Change the 'xsplice_patch_func' structure layout/size.
    - Add more error checking. Fix memory leak.
    - Move elf_resolve and elf_perform relocs in elf file.
    - Print the payload address and pages in keyhandler.
v3:
    - Make it build under ARM
    - Build it without using the return_ macro.
    - Add fixes from Ross.
v4:
    - Add the _return macro back - but only use it during debug builds.
v5:
    - Remove the macro, prefix arch_ on arch specific calls.
    - Move alloc_payload to arch specific file.
    - Use void* instead of uint8_t, use const
v6:
    - Add copyrights
v7:
    - Unroll the vmap code to add ASSERT. Change while to not incur
      potential long error loop
v8:
   - Use vmalloc/vfree cb APIs
   - Secure .text pages to be RX instead of RWX.
---
 xen/arch/arm/Makefile             |   1 +
 xen/arch/arm/xsplice.c            |  56 +++++++++
 xen/arch/x86/Makefile             |   1 +
 xen/arch/x86/setup.c              |   7 ++
 xen/arch/x86/xsplice.c            | 256 ++++++++++++++++++++++++++++++++++++++
 xen/common/xsplice.c              | 186 ++++++++++++++++++++++++++-
 xen/common/xsplice_elf.c          |  85 +++++++++++++
 xen/include/asm-x86/x86_64/page.h |   2 +
 xen/include/xen/xsplice.h         |  44 +++++++
 xen/include/xen/xsplice_elf.h     |   5 +
 10 files changed, 641 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 0328b50..eae5cb3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -40,6 +40,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..b20365c
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,56 @@
+/*
+ *  Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+void *arch_xsplice_alloc_payload(unsigned int pages, enum va_type type,
+                                 mfn_t **mfn)
+{
+    return NULL;
+}
+
+int arch_xsplice_secure(void *va, unsigned int pages, enum va_type type,
+                        const mfn_t *mfn)
+{
+    return -ENOSYS;
+}
+
+void arch_xsplice_register_find_space(find_space_t cb)
+{
+}
+
+void arch_xsplice_free_payload(void *va, unsigned int pages, enum va_type type)
+{
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 1bcb08b..dfcebe8 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -63,6 +63,7 @@ obj-y += vm_event.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 x86_emulate.o: x86_emulate/x86_emulate.c x86_emulate/x86_emulate.h
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 115e6fd..27a4677 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -100,6 +100,9 @@ unsigned long __read_mostly xen_phys_start;
 
 unsigned long __read_mostly xen_virt_end;
 
+unsigned long __read_mostly avail_virt_start;
+unsigned long __read_mostly avail_virt_end;
+
 DEFINE_PER_CPU(struct tss_struct, init_tss);
 
 char __section(".bss.stack_aligned") cpu0_stack[STACK_SIZE];
@@ -1206,6 +1209,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                    ~((1UL << L2_PAGETABLE_SHIFT) - 1);
     destroy_xen_mappings(xen_virt_end, XEN_VIRT_START + BOOTSTRAP_MAP_BASE);
 
+    avail_virt_start = xen_virt_end;
+    avail_virt_end = XEN_VIRT_END - NR_CPUS * PAGE_SIZE;
+    BUG_ON(avail_virt_end <= avail_virt_start);
+
     nr_pages = 0;
     for ( i = 0; i < e820.nr_map; i++ )
         if ( e820.map[i].type == E820_RAM )
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..48a3645
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,256 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/pfn.h>
+#include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data)
+{
+
+    Elf_Ehdr *hdr = data;
+
+    /* Check the length before dereferencing the header. */
+    if ( elf->len < sizeof(*hdr) || !IS_ELF(*hdr) )
+    {
+        printk(XENLOG_ERR "%s%s: Not an ELF payload!\n", XSPLICE, elf->name);
+        return -EINVAL;
+    }
+
+    if ( hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
+         hdr->e_ident[EI_OSABI] != ELFOSABI_SYSV ||
+         hdr->e_machine != EM_X86_64 ||
+         hdr->e_type != ET_REL ||
+         hdr->e_phnum != 0 )
+    {
+        printk(XENLOG_ERR "%s%s: Invalid ELF payload!\n", XSPLICE, elf->name);
+        return -EOPNOTSUPP;
+    }
+
+    return 0;
+}
+
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela)
+{
+    dprintk(XENLOG_ERR, "%s%s: SHT_REL relocation unsupported\n",
+            XSPLICE, elf->name);
+    return -ENOSYS;
+}
+
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela)
+{
+    Elf_RelA *r;
+    unsigned int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    if ( !rela->sec->sh_entsize || !rela->sec->sh_size ||
+         rela->sec->sh_entsize != sizeof(Elf_RelA) )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: Relocation section header is corrupted!\n",
+                XSPLICE, elf->name);
+        return -EINVAL;
+    }
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = (Elf_RelA *)(rela->data + i * rela->sec->sh_entsize);
+        if ( (unsigned long)r > (unsigned long)elf->hdr + elf->len )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Relocation entry %u is past end!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        symndx = ELF64_R_SYM(r->r_info);
+        if ( symndx >= elf->nsym )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Relative symbol wants symbol@%u which is past end!\n",
+                    XSPLICE, elf->name, symndx);
+            return -EINVAL;
+        }
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+            case R_X86_64_NONE:
+                break;
+            case R_X86_64_64:
+                *(uint64_t *)dest = val;
+                break;
+            case R_X86_64_PLT32:
+                /*
+                 * Xen uses -fpic which normally uses PLT relocations
+                 * except that it sets visibility to hidden which means
+                 * that they are not used.  However, when gcc cannot
+                 * inline memcpy it emits memcpy with default visibility
+                 * which then creates a PLT relocation.  It can just be
+                 * treated the same as R_X86_64_PC32.
+                 */
+                /* Fall through */
+            case R_X86_64_PC32:
+                *(uint32_t *)dest = val - (uint64_t)dest;
+                break;
+            default:
+                printk(XENLOG_ERR "%s%s: Unhandled relocation %lu\n",
+                       XSPLICE, elf->name, ELF64_R_TYPE(r->r_info));
+                return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
+static find_space_t find_space_fnc = NULL;
+
+void arch_xsplice_register_find_space(find_space_t cb)
+{
+    ASSERT(!find_space_fnc);
+
+    find_space_fnc = cb;
+}
+
+static void* xsplice_map_rx(const mfn_t *mfn, unsigned int pages)
+{
+    unsigned long cur;
+    unsigned long start, end;
+
+    start = (unsigned long)avail_virt_start;
+    end = start + pages * PAGE_SIZE;
+
+    ASSERT(find_space_fnc);
+
+    if ( (find_space_fnc)(pages, &start, &end) )
+        return NULL;
+
+    if ( end >= avail_virt_end )
+        return NULL;
+
+    for ( cur = start; pages--; ++mfn, cur += PAGE_SIZE )
+    {
+        /*
+         * We would like to map these RX, but we need to copy data in first.
+         * See arch_xsplice_secure for how we lock down.
+         */
+        if ( map_pages_to_xen(cur, mfn_x(*mfn), 1, PAGE_HYPERVISOR_RWX) )
+        {
+            if ( cur != start )
+                destroy_xen_mappings(start, cur);
+            return NULL;
+        }
+    }
+    return (void*)start;
+}
+
+/*
+ * The function prepares an xSplice payload by allocating space which
+ * then can be used for loading the allocated sections, resolving symbols,
+ * performing relocations, etc.
+ */
+void *arch_xsplice_alloc_payload(unsigned int pages, enum va_type type,
+                                 mfn_t **mfn)
+{
+    vmap_cb_t cb = NULL;
+    unsigned int i;
+    void *p;
+
+    ASSERT(pages); /* A page count, not bytes. */
+    ASSERT(mfn && !*mfn);
+
+    /*
+     * Initially the pages allocated must have W otherwise we can't
+     * put anything in them.
+     */
+    if ( type == XSPLICE_VA_RX )
+        cb = xsplice_map_rx;
+    else
+        cb = vmap;
+
+    *mfn = NULL;
+    /*
+     * We let the vmalloc allocate the pages we need, and use
+     * our callback.
+     */
+    p = vmalloc_cb(pages * PAGE_SIZE, cb, mfn);
+    WARN_ON(!p);
+    if ( !p )
+        return NULL;
+    for ( i = 0; i < pages; i++ )
+        clear_page(p + (i * PAGE_SIZE) );
+
+    /* Note that we do not free mfn. The caller is responsible for that. */
+    return p;
+}
+
+static void arch_xsplice_vfree_cb(void *va, unsigned int pages)
+{
+    unsigned long addr = (unsigned long)va;
+
+    destroy_xen_mappings(addr, addr + pages * PAGE_SIZE);
+}
+
+/*
+ * Once resolving symbols, performing relocations, etc. is complete,
+ * we secure the memory by applying the proper page table attributes
+ * for the desired type.
+ *
+ */
+int arch_xsplice_secure(void *va, unsigned int pages, enum va_type type,
+                        const mfn_t *mfn)
+{
+    unsigned long cur;
+    unsigned long start = (unsigned long)va;
+
+    ASSERT(va);
+    ASSERT(pages);
+
+    if ( type != XSPLICE_VA_RX )
+        return 0;
+
+    /*
+     * We could walk the pagetable and do the pagetable manipulations
+     * (strip the _PAGE_RW), which would mean also not needing the mfn
+     * array, but there is no generic code for this yet (TODO).
+     *
+     * For right now tear down the pagetables and recreate them.
+     */
+    arch_xsplice_vfree_cb(va, pages);
+
+    for ( cur = start; pages--; ++mfn, cur += PAGE_SIZE )
+    {
+        if ( map_pages_to_xen(cur, mfn_x(*mfn), 1, PAGE_HYPERVISOR_RX) )
+        {
+            if ( cur != start )
+                destroy_xen_mappings(start, cur);
+            return -EINVAL;
+        }
+    }
+    return 0;
+}
+
+void arch_xsplice_free_payload(void *va, unsigned int pages, enum va_type type)
+{
+    if ( type == XSPLICE_VA_RX )
+        vfree_cb(va, pages, arch_xsplice_vfree_cb);
+    else
+        vfree(va);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index ba2d376..5bbb5bb 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -12,6 +12,7 @@
 #include <xen/smp.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
@@ -27,6 +28,9 @@ struct payload {
     uint32_t state;                      /* One of the XSPLICE_STATE_*. */
     int32_t rc;                          /* 0 or -XEN_EXX. */
     struct list_head list;               /* Linked to 'payload_list'. */
+    void *payload_address;               /* Virtual address mapped. */
+    size_t payload_pages;                /* Nr of the pages. */
+    mfn_t *mfn;                          /* Array of MFNs of the pages. */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -92,6 +96,136 @@ static int find_payload(const xen_xsplice_name_t *name, struct payload **f)
 }
 
 /*
+ * Functions related to XEN_SYSCTL_XSPLICE_UPLOAD (see xsplice_upload), and
+ * freeing payload (XEN_SYSCTL_XSPLICE_ACTION:XSPLICE_ACTION_UNLOAD).
+ */
+
+static void free_payload_data(struct payload *payload)
+{
+    /* Set to zero until "move_payload". */
+    if ( !payload->payload_address )
+        return;
+
+    xfree(payload->mfn);
+    payload->mfn = NULL;
+    arch_xsplice_free_payload(payload->payload_address,
+                              payload->payload_pages, XSPLICE_VA_RX);
+
+    payload->payload_address = NULL;
+    payload->payload_pages = 0;
+}
+
+static void calc_section(struct xsplice_elf_sec *sec, size_t *size)
+{
+    size_t align_size = ROUNDUP(*size, sec->sec->sh_addralign);
+    sec->sec->sh_entsize = align_size;
+    *size = sec->sec->sh_size + align_size;
+}
+
+static int find_hole(ssize_t pages, unsigned long *hole_start,
+                      unsigned long *hole_end)
+{
+    struct payload *data, *data2;
+
+    spin_lock_recursive(&payload_lock);
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        list_for_each_entry ( data2, &payload_list, list )
+        {
+            unsigned long start, end;
+
+            start = (unsigned long)data2->payload_address;
+            end = start + data2->payload_pages * PAGE_SIZE;
+            if ( *hole_end > start && *hole_start < end )
+            {
+                *hole_start = end;
+                *hole_end = end + pages * PAGE_SIZE;
+                break;
+            }
+        }
+        if ( &data2->list == &payload_list )
+            break;
+    }
+    spin_unlock_recursive(&payload_lock);
+
+    return 0;
+}
+
+static int move_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    uint8_t *buf;
+    unsigned int i;
+    size_t size = 0;
+
+    /* Compute text regions. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
+             (SHF_ALLOC|SHF_EXECINSTR) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    /* Compute rw data. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    /* Compute ro data. */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            calc_section(&elf->sec[i], &size);
+    }
+
+    size = PFN_UP(size);
+    buf = arch_xsplice_alloc_payload(size, XSPLICE_VA_RX, &payload->mfn);
+    if ( !buf ) {
+        printk(XENLOG_ERR "%s%s: Could not allocate memory for payload!\n",
+               XSPLICE, elf->name);
+        return -ENOMEM;
+    }
+    payload->payload_address = buf;
+    payload->payload_pages = size;
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
+
+            /* Don't copy NOBITS - such as BSS. */
+            if ( elf->sec[i].sec->sh_type != SHT_NOBITS )
+            {
+                memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                       elf->sec[i].sec->sh_size);
+                dprintk(XENLOG_DEBUG, "%s%s: Loaded %s at 0x%p\n", XSPLICE,
+                        elf->name, elf->sec[i].name, elf->sec[i].load_addr);
+            }
+        }
+    }
+    return 0;
+}
+
+static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
+{
+    int rc;
+
+    ASSERT(payload->mfn);
+
+    rc = arch_xsplice_secure(payload->payload_address, payload->payload_pages,
+                             XSPLICE_VA_RX, payload->mfn);
+    xfree(payload->mfn);
+    payload->mfn = NULL;
+    return rc;
+}
+
+/*
  * We MUST be holding the payload_lock spinlock.
  */
 static void free_payload(struct payload *data)
@@ -100,9 +234,51 @@ static void free_payload(struct payload *data)
     list_del(&data->list);
     payload_cnt--;
     payload_version++;
+    free_payload_data(data);
     xfree(data);
 }
 
+static int load_payload_data(struct payload *payload, void *raw, ssize_t len)
+{
+    struct xsplice_elf elf;
+    int rc = 0;
+
+    memset(&elf, 0, sizeof(elf));
+    elf.name = payload->name;
+    elf.len = len;
+
+    rc = arch_xsplice_verify_elf(&elf, raw);
+    if ( rc )
+        return rc;
+
+    rc = xsplice_elf_load(&elf, raw);
+    if ( rc )
+        goto out;
+
+    rc = move_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_resolve_symbols(&elf);
+    if ( rc )
+        goto out;
+
+    rc = xsplice_elf_perform_relocs(&elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
+
+ out:
+    if ( rc )
+        free_payload_data(payload);
+
+    /* Free our temporary data structure. */
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
 {
     struct payload *data = NULL;
@@ -137,6 +313,10 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
     if ( copy_from_guest(raw_data, upload->payload, upload->size) )
         goto out;
 
+    rc = load_payload_data(data, raw_data, upload->size);
+    if ( rc )
+        goto out;
+
     data->state = XSPLICE_STATE_CHECKED;
     data->rc = 0;
     INIT_LIST_HEAD(&data->list);
@@ -365,8 +545,9 @@ static void xsplice_printall(unsigned char key)
     spin_lock_recursive(&payload_lock);
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d)\n", data->name,
-               state2str(data->state), data->state);
+        printk(" name=%s state=%s(%d) %p using %zu pages.\n", data->name,
+               state2str(data->state), data->state, data->payload_address,
+               data->payload_pages);
 
     spin_unlock_recursive(&payload_lock);
 }
@@ -374,6 +555,7 @@ static void xsplice_printall(unsigned char key)
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    arch_xsplice_register_find_space(&find_hole);
     return 0;
 }
 __initcall(xsplice_init);
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index ae87361..6128726 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -206,6 +206,91 @@ static int elf_get_sym(struct xsplice_elf *elf, const void *data)
     return 0;
 }
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
+{
+    unsigned int i;
+
+    /*
+     * The first entry of an ELF symbol table is the "undefined symbol index".
+     * aka reserved so we skip it.
+     */
+    ASSERT( elf->sym );
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        switch ( elf->sym[i].sym->st_shndx )
+        {
+            case SHN_COMMON:
+                printk(XENLOG_ERR "%s%s: Unexpected common symbol: %s\n",
+                       XSPLICE, elf->name, elf->sym[i].name);
+                return -EINVAL;
+                break;
+            case SHN_UNDEF:
+                printk(XENLOG_ERR "%s%s: Unknown symbol: %s\n",
+                       XSPLICE, elf->name, elf->sym[i].name);
+                return -ENOENT;
+                break;
+            case SHN_ABS:
+                dprintk(XENLOG_DEBUG, "%s%s: Absolute symbol: %s => 0x%"PRIx64"\n",
+                      XSPLICE, elf->name, elf->sym[i].name,
+                      elf->sym[i].sym->st_value);
+                break;
+            default:
+                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
+                {
+                    elf->sym[i].sym->st_value +=
+                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
+                    if ( elf->sym[i].name )
+                        printk(XENLOG_DEBUG "%s%s: Symbol resolved: %s => 0x%"PRIx64"\n",
+                               XSPLICE, elf->name, elf->sym[i].name,
+                               elf->sym[i].sym->st_value);
+                }
+        }
+    }
+
+    return 0;
+}
+
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *rela, *base;
+    unsigned int i;
+    int rc;
+
+    /*
+     * The first entry of the ELF section header table is reserved
+     * (index SHN_UNDEF), so we skip it.
+     */
+    ASSERT( elf->sym );
+    for ( i = 1; i < elf->hdr->e_shnum; i++ )
+    {
+        rela = &elf->sec[i];
+
+        if ( (rela->sec->sh_type != SHT_RELA ) &&
+             (rela->sec->sh_type != SHT_REL ) )
+            continue;
+
+        /* Is it a valid relocation section? */
+        if ( rela->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[rela->sec->sh_info];
+
+        /* Don't relocate non-allocated sections. */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( elf->sec[i].sec->sh_type == SHT_RELA )
+            rc = arch_xsplice_perform_rela(elf, base, rela);
+        else /* SHT_REL */
+            rc = arch_xsplice_perform_rel(elf, base, rela);
+
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
 static int xsplice_header_check(const struct xsplice_elf *elf)
 {
     if ( sizeof(*elf->hdr) >= elf->len )
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 86abb94..a854e05 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -38,6 +38,8 @@
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern unsigned long avail_virt_start;
+extern unsigned long avail_virt_end;
 
 #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
 #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index cd805a8..343e59f 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -6,6 +6,9 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
 #ifdef CONFIG_XSPLICE
@@ -15,6 +18,47 @@ struct xen_sysctl_xsplice_op;
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 
+/* Arch hooks. */
+int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data);
+int arch_xsplice_perform_rel(struct xsplice_elf *elf,
+                             const struct xsplice_elf_sec *base,
+                             const struct xsplice_elf_sec *rela);
+int arch_xsplice_perform_rela(struct xsplice_elf *elf,
+                              const struct xsplice_elf_sec *base,
+                              const struct xsplice_elf_sec *rela);
+enum va_type {
+    XSPLICE_VA_RX, /* .text */
+    XSPLICE_VA_RW, /* Everything else. */
+};
+
+#include <xen/mm.h>
+void *arch_xsplice_alloc_payload(unsigned int pages, enum va_type, mfn_t **mfn);
+
+/*
+ * Function to secure the allocated pages (from arch_xsplice_alloc_payload)
+ * with the right page permissions.
+ */
+int arch_xsplice_secure(void *va, unsigned int pages, enum va_type type,
+                        const mfn_t *mfn);
+
+void arch_xsplice_free_payload(void *va, unsigned int pages, enum va_type);
+
+/*
+ * Callback to find available virtual address space in which the
+ * payload could be put in.
+ *
+ * The arguments are:
+ *  - The size of the payload in bytes.
+ *  - The starting virtual address to search. To be updated by
+ *    callback if space found.
+ *  - The ending virtual address to search. To be updated by
+ *    callback if space found.
+ *
+ * The return value is zero if search was done. -EXX values
+ * if errors were encountered.
+ */
+typedef int (*find_space_t)(ssize_t, unsigned long *, unsigned long *);
+void arch_xsplice_register_find_space(find_space_t cb);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
index e2dea18..f6ffcbe 100644
--- a/xen/include/xen/xsplice_elf.h
+++ b/xen/include/xen/xsplice_elf.h
@@ -15,6 +15,8 @@ struct xsplice_elf_sec {
                                       elf_resolve_section_names. */
     const void *data;              /* Pointer to the section (done by
                                       elf_resolve_sections). */
+    uint8_t *load_addr;            /* A pointer to the allocated destination.
+                                      Done by load_payload_data. */
 };
 
 struct xsplice_elf_sym {
@@ -38,6 +40,9 @@ struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
 int xsplice_elf_load(struct xsplice_elf *elf, void *data);
 void xsplice_elf_free(struct xsplice_elf *elf);
 
+int xsplice_elf_resolve_symbols(struct xsplice_elf *elf);
+int xsplice_elf_perform_relocs(struct xsplice_elf *elf);
+
 #endif /* __XEN_XSPLICE_ELF_H__ */
 
 /*
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 17/34] xsplice: Implement support for applying/reverting/replacing patches.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (15 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 16/34] xsplice: Implement payload loading Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 18/34] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Konrad Rzeszutek Wilk, Jun Nakajima, Julien Grall,
	Stefano Stabellini, Jan Beulich, Boris Ostrovsky

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in all the
return-to-guest paths to check for work to do and execute it if needed.
In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to enter check_for_xsplice_work() becomes the
master and triggers a reschedule softirq to make all the other CPUs
enter check_for_xsplice_work() with no stack.  Once all CPUs have
rendezvoused, all CPUs disable IRQs and NMIs are ignored.  The system is
then quiescent and the master performs the action.  After this, all CPUs
enable IRQs and NMIs are re-enabled.
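
As a rough illustration of how the master is elected, here is a minimal
user-space sketch of the counter. It assumes C11 atomics instead of Xen's
atomic_t, and the names rendezvous_sketch, rendezvous_enter and
rendezvous_complete are made up for this example. The counter starts at -1,
the caller whose increment yields zero is the master, and the master then
waits until the counter equals the number of other online CPUs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of the rendezvous counter; not the Xen API. */
struct rendezvous_sketch {
    atomic_int semaphore;   /* Starts at -1, ends at (online CPUs - 1). */
};

static void rendezvous_init(struct rendezvous_sketch *r)
{
    atomic_init(&r->semaphore, -1);
}

/* True for exactly one caller: the one whose increment yields zero. */
static bool rendezvous_enter(struct rendezvous_sketch *r)
{
    return atomic_fetch_add(&r->semaphore, 1) + 1 == 0;
}

/* Master's check: 'others' is the number of online CPUs minus one. */
static bool rendezvous_complete(struct rendezvous_sketch *r, int others)
{
    return atomic_load(&r->semaphore) == others;
}
```

The real code additionally bounds the master's wait with a timeout and uses a
second counter of the same shape to confirm that every CPU has disabled IRQs.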

Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on the NMI/MCE path is liable to end in disaster.

The action to perform is one of:
- APPLY: For each function in the module, store the first 5 bytes of the
  old function and replace it with a jump to the new function.
- REVERT: Copy the previously stored bytes into the first 5 bytes of the
  old function.
- REPLACE: Revert each applied module and then apply the new module.
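
The APPLY and REVERT steps above can be sketched in plain C as follows. This
is only an illustration operating on an ordinary byte buffer: the struct
patch_sketch type and the sketch_apply/sketch_revert names are hypothetical,
and a little-endian host (as on x86) is assumed for the displacement bytes.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5   /* 0xe9 opcode plus a 32-bit displacement. */

/* Hypothetical stand-in for the patch record; only the fields used here. */
struct patch_sketch {
    uint8_t *old_code;      /* Writable view of the old function's bytes. */
    uint64_t old_addr;      /* Address the old function executes at. */
    uint64_t new_addr;      /* Address of the replacement function. */
    uint8_t undo[8];        /* Saved original bytes. */
};

/* APPLY: save the first 5 bytes, then overwrite them with "jmp rel32". */
static void sketch_apply(struct patch_sketch *p)
{
    /* The displacement is relative to the end of the 5-byte jump. */
    uint32_t rel = (uint32_t)(p->new_addr - p->old_addr - PATCH_INSN_SIZE);

    memcpy(p->undo, p->old_code, PATCH_INSN_SIZE);
    p->old_code[0] = 0xe9;
    memcpy(&p->old_code[1], &rel, sizeof(rel));
}

/* REVERT: put the saved bytes back. */
static void sketch_revert(struct patch_sketch *p)
{
    memcpy(p->old_code, p->undo, PATCH_INSN_SIZE);
}
```

Only the first five bytes of the old function are touched, which is why the
undo area only needs to hold that many bytes.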

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms (the default timeout) before timing out.
Measurements found patch application to take about 100 μs on a
72 CPU system, whether idle or fully loaded.

We also add a BUILD_BUG_ON to make sure that the size of the structure
of the payload is not inadvertently changed.

Lastly we unroll the 'vmap_to_page' on x86 as inside the macro there
is a possibility of a NULL pointer. Hence we unroll it with extra
ASSERTs. Note that asserts are compiled out on non-debug builds, hence
the extra checks that will just return (and leak memory).

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

--
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2: - Pluck the 'struct xsplice_patch_func' in this patch.
    - Modify code per review comments.
    - Add more data in the keyboard handler.
    - Redo the patching code, split it in functions.
v3: - Add return_ macro for debug builds.
    - Move s/payload_list_lock/payload_list/ to earlier patch
    - Remove const and use ELF types for xsplice_patch_func
v4: - Add check routine to do simple sanity checks for various
      sections.
v5: - s/%p/PRIx64/ as ARM builds complain.
v6: - Move code around. Add more dprintk. Add XSPLICE in front of all
      printks/dprintk.
      Put the NMIs back if we fail patching.
      Add per-cpu to lessen contention for global structure.
      Extract from xsplice_do_single patching code into xsplice_do_action
      Squash xsplice_do_single and check_for_xsplice_work together to
      have all rendezvous in one place.
      Made XSPLICE_ACTION_REPLACE work again (wrong list iterator)
      s/find_special_sections/prepare_payload/
      Use list_del_init and INIT_LIST_HEAD for applied_list
v7: - Add comment, adjust spacing for "Timed out on CPU semaphore"
v8: - Added CR0.WP manipulations when altering the .text of hypervisor.
v9: - Added fix from Andrew for CR0.WP manipulation.
---
---
 docs/misc/xsplice.markdown  |   3 +-
 xen/arch/arm/xsplice.c      |  20 ++
 xen/arch/x86/domain.c       |   4 +
 xen/arch/x86/hvm/svm/svm.c  |   2 +
 xen/arch/x86/hvm/vmx/vmcs.c |   2 +
 xen/arch/x86/xsplice.c      |  39 ++++
 xen/common/xsplice.c        | 453 ++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/nmi.h   |  13 ++
 xen/include/xen/xsplice.h   |  31 ++-
 9 files changed, 551 insertions(+), 16 deletions(-)

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index cb3af6e..1940e29 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -829,7 +829,8 @@ The implementation must also have a mechanism for:
  * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
  * Be able to patch .rodata, .bss, and .data sections.
  * Further safety checks (blacklist of which functions cannot be patched, check
-   the stack, make sure the payload is built with same compiler as hypervisor).
+   the stack, make sure the payload is built with same compiler as hypervisor,
+   and NMI/MCE handlers and do_nmi for right now - until a safe solution is found).
  * NOP out the code sequence if `new_size` is zero.
  * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64] in payload file.
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
index b20365c..8f997b0 100644
--- a/xen/arch/arm/xsplice.c
+++ b/xen/arch/arm/xsplice.c
@@ -6,6 +6,26 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+void arch_xsplice_patching_enter(void)
+{
+}
+
+void arch_xsplice_patching_leave(void)
+{
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void arch_xsplice_post_action(void)
+{
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data)
 {
     return -ENOSYS;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a6d721b..ecea694 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -36,6 +36,7 @@
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/guest_access.h>
+#include <xen/xsplice.h>
 #include <public/sysctl.h>
 #include <public/hvm/hvm_vcpu.h>
 #include <asm/regs.h>
@@ -121,6 +122,7 @@ static void idle_loop(void)
         (*pm_idle)();
         do_tasklet();
         do_softirq();
+        check_for_xsplice_work(); /* Must be last. */
     }
 }
 
@@ -137,6 +139,7 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
+    check_for_xsplice_work();
     reset_stack_and_jump(idle_loop);
 }
 
@@ -144,6 +147,7 @@ static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
     mark_regs_dirty(guest_cpu_user_regs());
+    check_for_xsplice_work();
     reset_stack_and_jump(ret_from_intr);
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 7634c3f..bbb0a73 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -26,6 +26,7 @@
 #include <xen/hypercall.h>
 #include <xen/domain_page.h>
 #include <xen/xenoprof.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/paging.h>
@@ -1096,6 +1097,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
 
     hvm_do_resume(v);
 
+    check_for_xsplice_work();
     reset_stack_and_jump(svm_asm_do_resume);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index fd4d876..d31f2a4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -25,6 +25,7 @@
 #include <xen/kernel.h>
 #include <xen/keyhandler.h>
 #include <xen/vm_event.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -1722,6 +1723,7 @@ void vmx_do_resume(struct vcpu *v)
     }
 
     hvm_do_resume(v);
+    check_for_xsplice_work();
     reset_stack_and_jump(vmx_asm_do_vmentry);
 }
 
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
index 48a3645..f4d2610 100644
--- a/xen/arch/x86/xsplice.c
+++ b/xen/arch/x86/xsplice.c
@@ -10,6 +10,45 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+#define PATCH_INSN_SIZE 5
+
+void arch_xsplice_patching_enter(void)
+{
+    /* Disable WP to allow changes to read-only pages. */
+    write_cr0(read_cr0() & ~X86_CR0_WP);
+}
+
+void arch_xsplice_patching_leave(void)
+{
+    /* Reinstate WP. */
+    write_cr0(read_cr0() | X86_CR0_WP);
+}
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+    uint32_t val;
+    uint8_t *old_ptr;
+
+    BUILD_BUG_ON(PATCH_INSN_SIZE > sizeof(func->undo));
+
+    old_ptr = (uint8_t *)func->old_addr;
+    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
+
+    *old_ptr++ = 0xe9; /* Relative jump */
+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
+    memcpy(old_ptr, &val, sizeof val);
+}
+
+void arch_xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
+}
+
+void arch_xsplice_post_action(void)
+{
+    cpuid_eax(0);
+}
+
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data)
 {
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 5bbb5bb..d456671 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -3,6 +3,7 @@
  *
  */
 
+#include <xen/cpu.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -10,17 +11,29 @@
 #include <xen/mm.h>
 #include <xen/sched.h>
 #include <xen/smp.h>
+#include <xen/softirq.h>
 #include <xen/spinlock.h>
 #include <xen/vmap.h>
+#include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
 #include <asm/event.h>
+#include <asm/nmi.h>
 #include <public/sysctl.h>
 
+/*
+ * Protects against payload_list operations and also allows only one
+ * caller in schedule_work.
+ */
 static DEFINE_SPINLOCK(payload_lock);
 static LIST_HEAD(payload_list);
 
+/*
+ * Patches which have been applied.
+ */
+static LIST_HEAD(applied_list);
+
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
@@ -31,9 +44,31 @@ struct payload {
     void *payload_address;               /* Virtual address mapped. */
     size_t payload_pages;                /* Nr of the pages. */
     mfn_t *mfn;                          /* Array of MFNs of the pages. */
+    struct list_head applied_list;       /* Linked to 'applied_list'. */
+    struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
+    unsigned int nfuncs;                 /* Nr of functions to patch. */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
+/* Defines an outstanding patching action. */
+struct xsplice_work
+{
+    atomic_t semaphore;          /* Used for rendezvous. First to grab it will
+                                    do the patching. */
+    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled. */
+    uint32_t timeout;                    /* Timeout to do the operation. */
+    struct payload *data;        /* The payload on which to act. */
+    volatile bool_t do_work;     /* Signals work to do. */
+    volatile bool_t ready;       /* Signals all CPUs synchronized. */
+    uint32_t cmd;                /* Action request: XSPLICE_ACTION_* */
+};
+
+/* There can be only one outstanding patching action. */
+static struct xsplice_work xsplice_work;
+
+/* Indicate whether the CPU needs to consult xsplice_work structure. */
+static DEFINE_PER_CPU(bool_t, work_to_do);
+
 static int verify_name(const xen_xsplice_name_t *name)
 {
     if ( name->size == 0 || name->size > XEN_XSPLICE_NAME_SIZE )
@@ -225,6 +260,72 @@ static int secure_payload(struct payload *payload, struct xsplice_elf *elf)
     return rc;
 }
 
+static int check_special_sections(struct payload *payload,
+                                  struct xsplice_elf *elf)
+{
+    unsigned int i;
+    static const char *const names[] = { ".xsplice.funcs" };
+
+    for ( i = 0; i < ARRAY_SIZE(names); i++ )
+    {
+        struct xsplice_elf_sec *sec;
+
+        sec = xsplice_elf_sec_by_name(elf, names[i]);
+        if ( !sec )
+        {
+            printk(XENLOG_ERR "%s%s: %s is missing!\n",
+                   XSPLICE, elf->name, names[i]);
+            return -EINVAL;
+        }
+        if ( !sec->sec->sh_size )
+            return -EINVAL;
+    }
+    return 0;
+}
+
+static int prepare_payload(struct payload *payload,
+                           struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *sec;
+    unsigned int i;
+    struct xsplice_patch_func *f;
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
+    if ( sec )
+    {
+        if ( sec->sec->sh_size % sizeof *payload->funcs )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .xsplice.funcs!\n",
+                    XSPLICE, elf->name);
+            return -EINVAL;
+        }
+        payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
+        payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
+    }
+
+    for ( i = 0; i < payload->nfuncs; i++ )
+    {
+        unsigned int j;
+
+        f = &(payload->funcs[i]);
+
+        if ( !f->new_addr || !f->old_addr || !f->old_size || !f->new_size )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Address or size fields are zero!\n",
+                    XSPLICE, elf->name);
+            return -EINVAL;
+        }
+        for ( j = 0; j < 8; j++ )
+            if ( f->undo[j] )
+                return -EINVAL;
+
+        for ( j = 0; j < 24; j++ )
+            if ( f->pad[j] )
+                return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * We MUST be holding the payload_lock spinlock.
  */
@@ -267,8 +368,15 @@ static int load_payload_data(struct payload *payload, void *raw, ssize_t len)
     if ( rc )
         goto out;
 
-    rc = secure_payload(payload, &elf);
+    rc = check_special_sections(payload, &elf);
+    if ( rc )
+        goto out;
 
+    rc = prepare_payload(payload, &elf);
+    if ( rc )
+        goto out;
+
+    rc = secure_payload(payload, &elf);
  out:
     if ( rc )
         free_payload_data(payload);
@@ -320,6 +428,7 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
     data->state = XSPLICE_STATE_CHECKED;
     data->rc = 0;
     INIT_LIST_HEAD(&data->list);
+    INIT_LIST_HEAD(&data->applied_list);
 
     spin_lock_recursive(&payload_lock);
     list_add_tail(&data->list, &payload_list);
@@ -414,6 +523,315 @@ static int xsplice_list(xen_sysctl_xsplice_list_t *list)
     return rc ? : idx;
 }
 
+/*
+ * The following functions get the CPUs into an appropriate state and
+ * apply (or revert) each of the payload's functions. This is needed
+ * for XEN_SYSCTL_XSPLICE_ACTION operation (see xsplice_action).
+ */
+
+static int apply_payload(struct payload *data)
+{
+    unsigned int i;
+
+    dprintk(XENLOG_DEBUG, "%s%s: Applying %u functions.\n", XSPLICE,
+            data->name, data->nfuncs);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_apply_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    list_add_tail(&data->applied_list, &applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed having all other CPUs with no stack (we may
+ * have cpu_idle on it) and IRQs disabled.
+ */
+static int revert_payload(struct payload *data)
+{
+    unsigned int i;
+
+    dprintk(XENLOG_DEBUG, "%s%s: Reverting.\n", XSPLICE, data->name);
+
+    arch_xsplice_patching_enter();
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        arch_xsplice_revert_jmp(&data->funcs[i]);
+
+    arch_xsplice_patching_leave();
+
+    list_del_init(&data->applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed having all other CPUs with no stack (we may
+ * have cpu_idle on it) and IRQs disabled. We guard against NMI by temporarily
+ * installing our NOP NMI handler.
+ */
+static void xsplice_do_action(void)
+{
+    int rc;
+    struct payload *data, *other, *tmp;
+
+    data = xsplice_work.data;
+    /* Now this function should be the only one on any stack.
+     * No need to lock the payload list or applied list. */
+    switch ( xsplice_work.cmd )
+    {
+    case XSPLICE_ACTION_APPLY:
+        rc = apply_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_APPLIED;
+        break;
+    case XSPLICE_ACTION_REVERT:
+        rc = revert_payload(data);
+        if ( rc == 0 )
+            data->state = XSPLICE_STATE_CHECKED;
+        break;
+    case XSPLICE_ACTION_REPLACE:
+        rc = 0;
+        /* N.B: Use 'applied_list' member, not 'list'. */
+        list_for_each_entry_safe_reverse ( other, tmp, &applied_list, applied_list )
+        {
+            other->rc = revert_payload(other);
+            if ( other->rc == 0 )
+                other->state = XSPLICE_STATE_CHECKED;
+            else
+            {
+                rc = -EINVAL;
+                break;
+            }
+        }
+        if ( rc != -EINVAL )
+        {
+            rc = apply_payload(data);
+            if ( rc == 0 )
+                data->state = XSPLICE_STATE_APPLIED;
+        }
+        break;
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+    data->rc = rc;
+}
+
+/*
+ * MUST be holding the payload_lock.
+ */
+static int schedule_work(struct payload *data, uint32_t cmd, uint32_t timeout)
+{
+    unsigned int cpu;
+
+    ASSERT(spin_is_locked(&payload_lock));
+
+    /* Fail if an operation is already scheduled. */
+    if ( xsplice_work.do_work )
+        return -EBUSY;
+
+    if ( !get_cpu_maps() )
+    {
+        printk(XENLOG_ERR "%s%s: unable to get cpu_maps lock!\n",
+               XSPLICE, data->name);
+        return -EBUSY;
+    }
+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    xsplice_work.timeout = timeout ?: MILLISECS(30);
+
+    dprintk(XENLOG_DEBUG, "%s%s: timeout is %"PRI_stime"ms\n",
+            XSPLICE, data->name, xsplice_work.timeout / MILLISECS(1));
+
+    /*
+     * Once the patching has been completed, the semaphore value will
+     * be num_online_cpus()-1.
+     */
+    atomic_set(&xsplice_work.semaphore, -1);
+    atomic_set(&xsplice_work.irq_semaphore, -1);
+
+    xsplice_work.ready = 0;
+    smp_wmb();
+    xsplice_work.do_work = 1;
+    smp_wmb();
+    /*
+     * Above smp_wmb() gives us a compiler barrier, as we MUST do this
+     * after setting the global structure.
+     */
+    for_each_online_cpu ( cpu )
+        per_cpu(work_to_do, cpu) = 1;
+
+    put_cpu_maps();
+
+    return 0;
+}
+
+/*
+ * Note that because of this NOP code the do_nmi is not safely patchable.
+ * Also if we do receive 'real' NMIs we have lost them. Ditto for MCE.
+ */
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    /* TODO: Handle missing NMI/MCE.*/
+    return 1;
+}
+
+static void reschedule_fn(void *unused)
+{
+    smp_mb(); /* Synchronize with setting do_work */
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+static int xsplice_do_wait(atomic_t *counter, s_time_t timeout,
+                           unsigned int total_cpus, const char *s)
+{
+    int rc = 0;
+
+    while ( atomic_read(counter) != total_cpus && NOW() < timeout )
+        cpu_relax();
+
+    /* Log & abort. */
+    if ( atomic_read(counter) != total_cpus )
+    {
+        printk(XENLOG_ERR "%s%s: %s %u/%u\n", XSPLICE,
+               xsplice_work.data->name, s, atomic_read(counter), total_cpus);
+        rc = -EBUSY;
+        xsplice_work.data->rc = rc;
+        xsplice_work.do_work = 0;
+        smp_wmb();
+    }
+    return rc;
+}
+
+/*
+ * The main function which manages the work of quiescing the system and
+ * patching code.
+ */
+void check_for_xsplice_work(void)
+{
+    unsigned int cpu = smp_processor_id();
+    nmi_callback_t saved_nmi_callback;
+    s_time_t timeout;
+    unsigned long flags;
+
+    /* Fast path: no work to do. */
+    if ( !per_cpu(work_to_do, cpu ) )
+        return;
+
+    /* In case we aborted, other CPUs can skip right away. */
+    if ( (!xsplice_work.do_work) )
+    {
+        per_cpu(work_to_do, cpu) = 0;
+        return;
+    }
+
+    ASSERT(local_irq_is_enabled());
+
+    /* Set at -1, so will go up to num_online_cpus - 1. */
+    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
+    {
+        struct payload *p;
+        unsigned int total_cpus;
+
+        p = xsplice_work.data;
+        if ( !get_cpu_maps() )
+        {
+            printk(XENLOG_ERR "%s%s: CPU%u - unable to get cpu_maps lock!\n",
+                   XSPLICE, p->name, cpu);
+            per_cpu(work_to_do, cpu) = 0;
+            xsplice_work.data->rc = -EBUSY;
+            xsplice_work.do_work = 0;
+            /*
+             * Do NOT decrement semaphore down - as that may cause the other
+             * CPU (which may be at this exact moment checking the ASSERT)
+             * to assume the role of master and then needlessly time out
+             * (as do_work is zero).
+             */
+            return;
+        }
+
+        barrier(); /* MUST do it after get_cpu_maps. */
+        total_cpus = num_online_cpus() - 1;
+
+        if ( total_cpus )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: CPU%u - IPIing the %u CPUs\n",
+                    XSPLICE, p->name, cpu, total_cpus);
+            smp_call_function(reschedule_fn, NULL, 0);
+        }
+
+        timeout = xsplice_work.timeout + NOW();
+        if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
+                             "Timed out on CPU semaphore") )
+            goto abort;
+
+        /* "Mask" NMIs. */
+        saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+
+        /* All CPUs are waiting, now signal to disable IRQs. */
+        xsplice_work.ready = 1;
+        smp_wmb();
+
+        atomic_inc(&xsplice_work.irq_semaphore);
+        if ( !xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
+                              "Timed out on IRQ semaphore") )
+        {
+            local_irq_save(flags);
+            /* Do the patching. */
+            xsplice_do_action();
+            /* To flush out pipeline. */
+            arch_xsplice_post_action();
+            local_irq_restore(flags);
+        }
+        set_nmi_callback(saved_nmi_callback);
+
+ abort:
+        per_cpu(work_to_do, cpu) = 0;
+        xsplice_work.do_work = 0;
+
+        smp_wmb(); /* Synchronize with waiting CPUs. */
+        ASSERT(local_irq_is_enabled());
+
+        put_cpu_maps();
+
+        printk(XENLOG_INFO "%s%s finished with rc=%d\n", XSPLICE,
+               p->name, p->rc);
+    }
+    else
+    {
+        /* Wait for all CPUs to rendezvous. */
+        while ( xsplice_work.do_work && !xsplice_work.ready )
+        {
+            cpu_relax();
+            smp_rmb();
+        }
+
+        /* Disable IRQs and signal. */
+        local_irq_save(flags);
+        atomic_inc(&xsplice_work.irq_semaphore);
+
+        /* Wait for patching to complete. */
+        while ( xsplice_work.do_work )
+        {
+            cpu_relax();
+            smp_rmb();
+        }
+        /* To flush out pipeline. */
+        arch_xsplice_post_action();
+        local_irq_restore(flags);
+
+        per_cpu(work_to_do, cpu) = 0;
+    }
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -452,30 +870,24 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_APPLY:
         if ( (data->state == XSPLICE_STATE_CHECKED) )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_APPLIED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd, action->timeout);
         }
         break;
 
@@ -541,19 +953,32 @@ static const char *state2str(uint32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    unsigned int i;
 
     spin_lock_recursive(&payload_lock);
 
     list_for_each_entry ( data, &payload_list, list )
-        printk(" name=%s state=%s(%d) %p using %zu pages.\n", data->name,
+    {
+        printk(" name=%s state=%s(%d) %p using %zu pages:\n", data->name,
                state2str(data->state), data->state, data->payload_address,
                data->payload_pages);
 
+        for ( i = 0; i < data->nfuncs; i++ )
+        {
+            struct xsplice_patch_func *f = &(data->funcs[i]);
+            printk("    %s patch 0x%"PRIx64"(%u) with 0x%"PRIx64"(%u)\n",
+                   f->name, f->old_addr, f->old_size, f->new_addr, f->new_size);
+            if ( !(i % 100) )
+                process_pending_softirqs();
+        }
+    }
     spin_unlock_recursive(&payload_lock);
 }
 
 static int __init xsplice_init(void)
 {
+    BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
+
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
     arch_xsplice_register_find_space(&find_hole);
     return 0;
diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
index a60587e..82aff35 100644
--- a/xen/include/asm-arm/nmi.h
+++ b/xen/include/asm-arm/nmi.h
@@ -4,6 +4,19 @@
 #define register_guest_nmi_callback(a)  (-ENOSYS)
 #define unregister_guest_nmi_callback() (-ENOSYS)
 
+typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
+
+/**
+ * set_nmi_callback
+ *
+ * Set a handler for an NMI. Only one handler may be
+ * set. Return the old nmi callback handler.
+ */
+static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
+{
+    return NULL;
+}
+
 #endif /* ASM_NMI_H */
 /*
  * Local variables:
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 343e59f..b48a811 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -11,12 +11,30 @@ struct xsplice_elf_sec;
 struct xsplice_elf_sym;
 struct xen_sysctl_xsplice_op;
 
+#include <xen/elfstructs.h>
+/*
+ * The structure which defines the patching. This is what the hypervisor
+ * expects in the '.xsplice.funcs' section of the ELF file.
+ *
+ * This MUST be in sync with what the tools generate.
+ */
+struct xsplice_patch_func {
+    const char *name;
+    Elf64_Xword new_addr;
+    Elf64_Xword old_addr;
+    Elf64_Word new_size;
+    Elf64_Word old_size;
+    uint8_t undo[8];
+    uint8_t pad[24];
+};
+
 #ifdef CONFIG_XSPLICE
 
 /* Convenience define for printk. */
 #define XSPLICE "xsplice: "
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
+void check_for_xsplice_work(void);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data);
@@ -59,6 +77,17 @@ void arch_xsplice_free_payload(void *va, unsigned int pages, enum va_type);
  */
 typedef int (*find_space_t)(ssize_t, unsigned long *, unsigned long *);
 void arch_xsplice_register_find_space(find_space_t cb);
+
+/*
+ * These functions are called around the critical region patching live code,
+ * for an architecture to make appropriate global state adjustments.
+ */
+void arch_xsplice_patching_enter(void);
+void arch_xsplice_patching_leave(void);
+
+void arch_xsplice_apply_jmp(struct xsplice_patch_func *func);
+void arch_xsplice_revert_jmp(struct xsplice_patch_func *func);
+void arch_xsplice_post_action(void);
 #else
 
 #include <xen/errno.h> /* For -ENOSYS */
@@ -66,7 +95,7 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
 {
     return -ENOSYS;
 }
-
+static inline void check_for_xsplice_work(void) { }
 #endif /* CONFIG_XSPLICE */
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.5.0




* [PATCH v4 18/34] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (16 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 17/34] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 19/34] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich,
	Konrad Rzeszutek Wilk

This change demonstrates how to generate an xSplice ELF payload.

The idea is to patch the hypervisor's 'xen_extra_version'
function with a function that returns 'Hello World'.
After patching, 'xl info | grep extra' reflects the
new value.
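
For readers new to this kind of redirection, here is a minimal user-space
sketch of how a jump from the old function to the new one could be encoded.
It assumes an x86-64 5-byte near jump (opcode 0xE9 plus a 32-bit relative
displacement) and a little-endian host. The names encode_jmp and
JMP_INSN_SIZE are hypothetical and are not taken from this patch.

```c
#include <stdint.h>
#include <string.h>

#define JMP_INSN_SIZE 5 /* Assumption: opcode 0xE9 followed by a rel32. */

/*
 * Encode "jmp new_addr" as it would appear at old_addr into buf.
 * Returns 0 on success, -1 if the displacement does not fit in 32 bits.
 */
static int encode_jmp(uint64_t old_addr, uint64_t new_addr,
                      uint8_t buf[JMP_INSN_SIZE])
{
    /* The displacement is relative to the end of the jmp instruction. */
    int64_t disp = (int64_t)(new_addr - old_addr - JMP_INSN_SIZE);
    int32_t rel32;

    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -1;

    rel32 = (int32_t)disp;
    buf[0] = 0xe9;
    /* Copies host byte order; little-endian is assumed here. */
    memcpy(&buf[1], &rel32, sizeof(rel32));

    return 0;
}
```

Reverting would then amount to copying the saved original bytes back over
these five bytes.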

To generate this ELF payload file we need:
 - C code of the new function (xen_hello_world_func.c).
 - C code generating the .xsplice.funcs structure
   (xen_hello_world.c).
 - The address of the old code (xen_extra_version). We
   retrieve it by using 'nm --defined' on xen-syms.
 - The size of the new and old code, for which we use
   'nm --defined -S' on our code and xen-syms respectively.

There are two C files and one header file (config.h), which
is generated during the build. This could be a single C file
if the size of the new function were known in advance (or a
random value were chosen).

There is also a strict order of compiling:
 1) xen_hello_world_func.c
 2) config.h - extract the size of the new function,
    the old function and the old function address.
 3) xen_hello_world.c - which contains the .xsplice.funcs
    structure.
 4) Link the object files into the xen_hello_world.xsplice file.
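
Step 2 boils down to picking the address and size columns out of the
'nm --defined -S' output. As a rough illustration only, the same extraction
could be done in C as below. parse_nm_line is a hypothetical helper, and the
assumed line layout is "<addr> <size> <type> <name>". Unlike the grep in the
Makefile, this sketch matches the symbol name exactly rather than as a
substring.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of `nm --defined -S` output, for instance
 *   "ffff82d08013963c 000000000000000d T xen_extra_version"
 * and, if it names `symbol` exactly, store its address and size.
 * Returns 1 on a match, 0 otherwise.
 */
static int parse_nm_line(const char *line, const char *symbol,
                         uint64_t *addr, uint64_t *size)
{
    uint64_t a, s;
    char type;
    char name[128];

    /* Lines lacking a size column fail to parse and are skipped. */
    if ( sscanf(line, "%" SCNx64 " %" SCNx64 " %c %127s",
                &a, &s, &type, name) != 4 )
        return 0;

    if ( strcmp(name, symbol) )
        return 0;

    *addr = a;
    *size = s;

    return 1;
}
```

The resulting values are what end up as OLD_CODE, OLD_CODE_SZ and
NEW_CODE_SZ in config.h.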

The use-case is simple:

$xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
$xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                           APPLIED
$xl info | grep extra
xen_extra              : Hello World
$xen-xsplice revert xen_hello_world
Performing revert: completed
$xen-xsplice unload xen_hello_world
Performing unload: completed
$xl info | grep extra
xen_extra              : -unstable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Do it using hypervisor Makefiles
v3: Remove the stale linker file.
v4: Add Copyright and local definition block
v5: s/name/xen_hello_world_name/
---
 .gitignore                               |  2 ++
 docs/misc/xsplice.markdown               | 36 +++++++++++++++++++++++
 xen/Makefile                             |  2 ++
 xen/arch/arm/Makefile                    |  4 +++
 xen/arch/x86/Makefile                    |  6 ++++
 xen/arch/x86/test/Makefile               | 50 ++++++++++++++++++++++++++++++++
 xen/arch/x86/test/xen_hello_world.c      | 30 +++++++++++++++++++
 xen/arch/x86/test/xen_hello_world_func.c | 23 +++++++++++++++
 8 files changed, 153 insertions(+)
 create mode 100644 xen/arch/x86/test/Makefile
 create mode 100644 xen/arch/x86/test/xen_hello_world.c
 create mode 100644 xen/arch/x86/test/xen_hello_world_func.c

diff --git a/.gitignore b/.gitignore
index 5cae935..075800d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -245,6 +245,8 @@ xen/arch/x86/efi.lds
 xen/arch/x86/efi/check.efi
 xen/arch/x86/efi/disabled
 xen/arch/x86/efi/mkreloc
+xen/arch/x86/test/config.h
+xen/arch/x86/test/xen_hello_world.xsplice
 xen/arch/*/efi/boot.c
 xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 1940e29..e7c41da 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -321,11 +321,47 @@ size.
 
 When applying the patch the hypervisor iterates over each `xsplice_patch_func`
 structure and the core code inserts a trampoline at `old_addr` to `new_addr`.
+The `new_addr` is altered when the ELF payload is loaded.
 
 When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
 and the core code copies the data from the undo buffer (private internal copy)
 to `old_addr`.
 
+### Example
+
+A simple example of what a payload file can be:
+
+<pre>
+/* MUST be in sync with hypervisor. */  
+struct xsplice_patch_func {  
+    const char *name;  
+    uint64_t new_addr;  
+    uint64_t old_addr;  
+    uint32_t new_size;  
+    uint32_t old_size;  
+    uint8_t pad[32];  
+};  
+
+/* Our replacement function for xen_extra_version. */  
+const char *xen_hello_world(void)  
+{  
+    return "Hello World";  
+}  
+
+static unsigned char name[] = "xen_hello_world";  
+
+struct xsplice_patch_func xsplice_hello_world = {  
+    .name = name,  
+    .new_addr = (unsigned long)(xen_hello_world),  
+    .old_addr = 0xffff82d08013963c, /* Extracted from xen-syms. */  
+    .new_size = 13, /* To be computed by scripts. */  
+    .old_size = 13, /* -----------""---------------  */  
+} __attribute__((__section__(".xsplice.funcs")));  
+
+</pre>
+
+Code must be compiled with -fPIC.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
diff --git a/xen/Makefile b/xen/Makefile
index c908544..2be7deb 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -75,6 +75,7 @@ _install: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 			echo 'EFI installation only partially done (EFI_VENDOR not set)' >&2; \
 		fi; \
 	fi
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) install
 
 .PHONY: _uninstall
 _uninstall: D=$(DESTDIR)
@@ -92,6 +93,7 @@ _uninstall:
 	rm -f $(D)$(EFI_DIR)/$(T)-$(XEN_VERSION).efi
 	rm -f $(D)$(EFI_DIR)/$(T).efi
 	rm -f $(D)$(EFI_MOUNTPOINT)/efi/$(EFI_VENDOR)/$(T)-$(XEN_FULLVERSION).efi
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C arch/$(TARGET_ARCH) uninstall
 
 .PHONY: _debug
 _debug:
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index eae5cb3..17e9e3a 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -57,6 +57,10 @@ ifeq ($(CONFIG_ARM_64),y)
 	ln -sf $(notdir $@)  ../../$(notdir $@).efi
 endif
 
+install:
+
+uninstall:
+
 $(TARGET).axf: $(TARGET)-syms
 	# XXX: VE model loads by VMA so instead of
 	# making a proper ELF we link with LMA == VMA and adjust crudely
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index dfcebe8..5f40e63 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -74,7 +74,12 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
 	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test
 
+install:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test install
+uninstall:
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test uninstall
 
 ALL_OBJS := $(BASEDIR)/arch/x86/boot/built_in.o $(BASEDIR)/arch/x86/efi/built_in.o $(ALL_OBJS)
 
@@ -178,3 +183,4 @@ clean::
 	rm -f $(BASEDIR)/.xen-syms.[0-9]* boot/.*.d
 	rm -f $(BASEDIR)/.xen.efi.[0-9]* efi/*.o efi/.*.d efi/*.efi efi/disabled efi/mkreloc
 	rm -f boot/reloc.S boot/reloc.lnk boot/reloc.bin
+	$(MAKE) -f $(BASEDIR)/Rules.mk -C test clean
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
new file mode 100644
index 0000000..941f586
--- /dev/null
+++ b/xen/arch/x86/test/Makefile
@@ -0,0 +1,50 @@
+include $(XEN_ROOT)/Config.mk
+
+CODE_ADDR=$(shell nm --defined $(1) | grep $(2) | awk '{print "0x"$$1}')
+CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
+
+.PHONY: default
+ifdef CONFIG_XSPLICE
+
+XSPLICE := xen_hello_world.xsplice
+
+default: xsplice
+
+install: xsplice
+	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+uninstall:
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+else
+default:
+install:
+uninstall:
+endif
+
+.PHONY: clean
+clean::
+	rm -f *.o .*.o.d $(XSPLICE) config.h
+
+#
+# To compute these values we need the binary files: xen-syms
+# and xen_hello_world_func.o to be already compiled.
+#
+# We can be assured that xen-syms is already built as we are
+# the last entry in the build target.
+#
+.PHONY: config.h
+config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
+config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
+config.h: xen_hello_world_func.o
+	(set -e; \
+	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
+	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+
+.PHONY: xsplice
+xsplice: config.h
+	# Need to have these done in sequential order
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world_func.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world.o
+	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o xen_hello_world.o
+
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
new file mode 100644
index 0000000..f6ac098
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/xsplice.h>
+#include "config.h"
+
+static char xen_hello_world_name[] = "xen_hello_world";
+extern const char *xen_hello_world(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
+    .name = xen_hello_world_name,
+    .new_addr = (unsigned long)(xen_hello_world),
+    .old_addr = OLD_CODE,
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
new file mode 100644
index 0000000..81380a6
--- /dev/null
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -0,0 +1,23 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_hello_world(void)
+{
+    return "Hello World";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0




* [PATCH v4 19/34] xsplice, symbols: Implement symbol name resolution on address.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (17 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 18/34] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 20/34] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

If the payload does not provide the old_addr, we can resolve
the virtual address based on the UNDEFined symbols.

We also use a boolean flag, new_symbol, to track symbols. It
supports the following rules:

* A payload may introduce a new symbol
* A payload may override an existing symbol (introduced in Xen or another
  payload)
* Overriding symbols must exist in the symtab for backtraces.
* A payload must always link against the object which defines the new symbol.

Considering that payloads may be loaded in any order it would be incorrect to
link against a payload which simply overrides a symbol because you could end
up with a chain of jumps which is inefficient and may result in the expected
function not being executed.
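
The resulting lookup order can be sketched as follows. This is a simplified,
self-contained illustration and not the code in this patch: struct sym,
lookup and resolve are hypothetical names, and the flat arrays stand in for
the core symbol table and the list of loaded payloads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one symbol table entry. */
struct sym {
    const char *name;
    uint64_t value;
    int new_symbol; /* Non-zero if a payload introduced this symbol. */
};

/* Look up `name` in `tab`, optionally considering only new symbols. */
static uint64_t lookup(const struct sym *tab, size_t n,
                       const char *name, int new_only)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        if ( new_only && !tab[i].new_symbol )
            continue;
        if ( !strcmp(tab[i].name, name) )
            return tab[i].value;
    }

    return 0; /* As in the patch, 0 means "not found". */
}

/*
 * Resolve against the core table first, then against payload symbols,
 * skipping payload entries that merely override an existing symbol.
 */
static uint64_t resolve(const struct sym *core, size_t ncore,
                        const struct sym *payload, size_t npayload,
                        const char *name)
{
    uint64_t addr = lookup(core, ncore, name, 0);

    if ( !addr )
        addr = lookup(payload, npayload, name, 1);

    return addr;
}
```

With this ordering a later payload can never link against an overriding
function such as xen_hello_world, only against the original symbol or a
genuinely new one.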

Also we include a local definition block in the symbols.c file.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Ross original version.
v2: Include test-case and document update.
v3: s/size_t/ssize_t/
v4: Include core_text_size, core_text calculation
---
 xen/arch/x86/Makefile               |   6 +-
 xen/arch/x86/test/Makefile          |   4 +-
 xen/arch/x86/test/xen_hello_world.c |   5 +-
 xen/common/symbols.c                |  33 ++++++++
 xen/common/xsplice.c                | 163 ++++++++++++++++++++++++++++++++++++
 xen/common/xsplice_elf.c            |  19 ++++-
 xen/include/xen/symbols.h           |   2 +
 xen/include/xen/xsplice.h           |   8 ++
 8 files changed, 231 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 5f40e63..a2e3017 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -112,12 +112,14 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
-		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0.S
+		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort \
+		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
-		| $(BASEDIR)/tools/symbols --sysv --sort --warn-dup >$(@D)/.$(@F).1.S
+		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort --warn-dup \
+		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).1.o -o $@
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index 941f586..45df301 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -32,14 +32,12 @@ clean::
 # the last entry in the build target.
 #
 .PHONY: config.h
-config.h: OLD_CODE=$(call CODE_ADDR,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
 config.h: xen_hello_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
-	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)"; \
-	 echo "#define OLD_CODE $(OLD_CODE)") > $@
+	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
 
 .PHONY: xsplice
 xsplice: config.h
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index f6ac098..243eb3f 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -11,10 +11,13 @@
 static char xen_hello_world_name[] = "xen_hello_world";
 extern const char *xen_hello_world(void);
 
+/* External symbol. */
+extern const char *xen_extra_version(void);
+
 struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
     .name = xen_hello_world_name,
     .new_addr = (unsigned long)(xen_hello_world),
-    .old_addr = OLD_CODE,
+    .old_addr = (unsigned long)(xen_extra_version),
     .new_size = NEW_CODE_SZ,
     .old_size = OLD_CODE_SZ,
 };
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 2cc416e..4fe7a87 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -225,3 +225,36 @@ int xensyms_read(uint32_t *symnum, char *type,
 
     return 0;
 }
+
+uint64_t symbols_lookup_by_name(const char *symname)
+{
+    uint32_t symnum = 0;
+    uint64_t addr = 0, outaddr = 0;
+    int rc;
+    char type;
+    char name[KSYM_NAME_LEN + 1] = {0};
+
+    do {
+        rc = xensyms_read(&symnum, &type, &addr, name);
+        if ( rc )
+            break;
+
+        if ( !strcmp(name, symname) )
+        {
+            outaddr = addr;
+            break;
+        }
+    } while ( name[0] != '\0' );
+
+    return outaddr;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index d456671..54120bb 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -13,6 +13,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/symbols.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -44,9 +45,14 @@ struct payload {
     void *payload_address;               /* Virtual address mapped. */
     size_t payload_pages;                /* Nr of the pages. */
     mfn_t *mfn;                          /* Array of MFNs of the pages. */
+    size_t core_size;                    /* Everything else - .data,.rodata, etc. */
+    size_t core_text_size;               /* Only .text size. */
     struct list_head applied_list;       /* Linked to 'applied_list'. */
     struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
     unsigned int nfuncs;                 /* Nr of functions to patch. */
+    struct xsplice_symbol *symtab;       /* All symbols. */
+    char *strtab;                        /* Pointer to .strtab. */
+    unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -97,6 +103,34 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload)
     return 0;
 }
 
+uint64_t xsplice_symbols_lookup_by_name(const char *symname)
+{
+    struct payload *data;
+    unsigned int i;
+    uint64_t value = 0;
+
+    spin_lock_recursive(&payload_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( !data->symtab[i].new_symbol )
+                continue;
+
+            if ( !strcmp(data->symtab[i].name, symname) )
+            {
+                value = data->symtab[i].value;
+                goto out;
+            }
+        }
+    }
+
+out:
+    spin_unlock_recursive(&payload_lock);
+    return value;
+}
+
 static int find_payload(const xen_xsplice_name_t *name, struct payload **f)
 {
     struct payload *data;
@@ -199,6 +233,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
              (SHF_ALLOC|SHF_EXECINSTR) )
             calc_section(&elf->sec[i], &size);
     }
+    payload->core_text_size = size;
 
     /* Compute rw data. */
     for ( i = 0; i < elf->hdr->e_shnum; i++ )
@@ -217,6 +252,7 @@ static int move_payload(struct payload *payload, struct xsplice_elf *elf)
              !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
             calc_section(&elf->sec[i], &size);
     }
+    payload->core_size = size;
 
     size = PFN_UP(size);
     buf = arch_xsplice_alloc_payload(size, XSPLICE_VA_RX, &payload->mfn);
@@ -322,10 +358,131 @@ static int prepare_payload(struct payload *payload,
         for ( j = 0; j < 24; j++ )
             if ( f->pad[j] )
                 return -EINVAL;
+
+        /* Lookup function's old address if not already resolved. */
+        if ( !f->old_addr )
+        {
+            f->old_addr = symbols_lookup_by_name(f->name);
+            if ( !f->old_addr )
+            {
+                f->old_addr = xsplice_symbols_lookup_by_name(f->name);
+                if ( !f->old_addr )
+                {
+                    printk(XENLOG_ERR "%s%s: Could not resolve old address of %s\n",
+                           XSPLICE, elf->name, f->name);
+                    return -ENOENT;
+                }
+            }
+            dprintk(XENLOG_DEBUG, "%s%s: Resolved old address %s => 0x%"PRIx64"\n",
+                   XSPLICE, elf->name, f->name, f->old_addr);
+        }
     }
     return 0;
 }
 
+static bool_t is_core_symbol(const struct xsplice_elf *elf,
+                             const struct xsplice_elf_sym *sym)
+{
+    if ( sym->sym->st_shndx == SHN_UNDEF ||
+         sym->sym->st_shndx >= elf->hdr->e_shnum )
+        return 0;
+
+    return !!( (elf->sec[sym->sym->st_shndx].sec->sh_flags & SHF_ALLOC) &&
+               (ELF64_ST_TYPE(sym->sym->st_info) == STT_OBJECT ||
+                ELF64_ST_TYPE(sym->sym->st_info) == STT_FUNC) );
+}
+
+/*
+ * MUST be called after prepare_payload as we depend on payload->nfuncs.
+ */
+static int build_symbol_table(struct payload *payload,
+                              const struct xsplice_elf *elf)
+{
+    unsigned int i, j, nsyms = 0;
+    size_t strtab_len = 0;
+    struct xsplice_symbol *symtab;
+    char *strtab;
+
+    ASSERT(payload->nfuncs);
+
+    /* Recall that 0 is always NULL. */
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_core_symbol(elf, elf->sym + i) )
+        {
+            nsyms++;
+            strtab_len += strlen(elf->sym[i].name) + 1;
+        }
+    }
+
+    symtab = xmalloc_array(struct xsplice_symbol, nsyms);
+    if ( !symtab )
+        return -ENOMEM;
+
+    strtab = xmalloc_bytes(strtab_len);
+    if ( !strtab )
+    {
+        xfree(symtab);
+        return -ENOMEM;
+    }
+
+    nsyms = 0;
+    strtab_len = 0;
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        if ( is_core_symbol(elf, elf->sym + i) )
+        {
+            symtab[nsyms].name = strtab + strtab_len;
+            symtab[nsyms].size = elf->sym[i].sym->st_size;
+            symtab[nsyms].value = elf->sym[i].sym->st_value;
+            symtab[nsyms].new_symbol = 0; /* To be checked below. */
+            strtab_len += strlcpy(strtab + strtab_len, elf->sym[i].name,
+                                  KSYM_NAME_LEN) + 1;
+            nsyms++;
+        }
+    }
+
+    for ( i = 0; i < nsyms; i++ )
+    {
+        bool_t found = 0;
+
+        for ( j = 0; j < payload->nfuncs; j++ )
+        {
+            if ( symtab[i].value == payload->funcs[j].new_addr )
+            {
+                found = 1;
+                break;
+            }
+        }
+
+        if ( !found )
+        {
+            if ( xsplice_symbols_lookup_by_name(symtab[i].name) )
+            {
+                printk(XENLOG_ERR "%s%s: duplicate new symbol: %s\n",
+                       XSPLICE, elf->name, symtab[i].name);
+                xfree(symtab);
+                xfree(strtab);
+                return -EEXIST;
+            }
+            symtab[i].new_symbol = 1;
+            dprintk(XENLOG_DEBUG, "%s%s: new symbol %s\n",
+                    XSPLICE, elf->name, symtab[i].name);
+        }
+        else
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: overriding symbol %s\n",
+                    XSPLICE, elf->name, symtab[i].name);
+        }
+    }
+
+    payload->symtab = symtab;
+    payload->strtab = strtab;
+    payload->nsyms = nsyms;
+
+    return 0;
+}
+
 /*
  * We MUST be holding the payload_lock spinlock.
  */
@@ -336,6 +493,8 @@ static void free_payload(struct payload *data)
     payload_cnt--;
     payload_version++;
     free_payload_data(data);
+    xfree(data->symtab);
+    xfree(data->strtab);
     xfree(data);
 }
 
@@ -376,6 +535,8 @@ static int load_payload_data(struct payload *payload, void *raw, ssize_t len)
     if ( rc )
         goto out;
 
+    rc = build_symbol_table(payload, &elf);
+
     rc = secure_payload(payload, &elf);
  out:
     if ( rc )
@@ -440,6 +601,8 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
     vfree(raw_data);
     if ( rc )
     {
+        xfree(data->symtab);
+        xfree(data->strtab);
         xfree(data);
     }
     return rc;
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
index 6128726..1ed133c 100644
--- a/xen/common/xsplice_elf.c
+++ b/xen/common/xsplice_elf.c
@@ -4,6 +4,7 @@
 
 #include <xen/errno.h>
 #include <xen/lib.h>
+#include <xen/symbols.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
@@ -225,9 +226,21 @@ int xsplice_elf_resolve_symbols(struct xsplice_elf *elf)
                 return -EINVAL;
                 break;
             case SHN_UNDEF:
-                printk(XENLOG_ERR "%s%s: Unknown symbol: %s\n",
-                       XSPLICE, elf->name, elf->sym[i].name);
-                return -ENOENT;
+                elf->sym[i].sym->st_value = symbols_lookup_by_name(elf->sym[i].name);
+                if ( !elf->sym[i].sym->st_value )
+                {
+                    elf->sym[i].sym->st_value =
+                        xsplice_symbols_lookup_by_name(elf->sym[i].name);
+                    if ( !elf->sym[i].sym->st_value )
+                    {
+                        printk(XENLOG_ERR "%s%s: Unknown symbol: %s\n",
+                               XSPLICE, elf->name, elf->sym[i].name);
+                        return -ENOENT;
+                    }
+                }
+                dprintk(XENLOG_DEBUG, "%s%s: Undefined symbol resolved: %s => 0x%"PRIx64"\n",
+                       XSPLICE, elf->name, elf->sym[i].name,
+                       elf->sym[i].sym->st_value);
                 break;
             case SHN_ABS:
                 dprintk(XENLOG_DEBUG, "%s%s: Absolute symbol: %s => 0x%"PRIx64"\n",
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index fe9ed8f..e6101d1 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -23,4 +23,6 @@ const char *symbols_lookup(unsigned long addr,
 int xensyms_read(uint32_t *symnum, char *type,
                  uint64_t *address, char *name);
 
+uint64_t symbols_lookup_by_name(const char *symname);
+
 #endif /*_XEN_SYMBOLS_H*/
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index b48a811..2e2fb78 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -33,8 +33,16 @@ struct xsplice_patch_func {
 /* Convenience define for printk. */
 #define XSPLICE "xsplice: "
 
+struct xsplice_symbol {
+    const char *name;
+    uint64_t value;
+    ssize_t size;
+    bool_t new_symbol;
+};
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
+uint64_t xsplice_symbols_lookup_by_name(const char *symname);
 
 /* Arch hooks. */
 int arch_xsplice_verify_elf(const struct xsplice_elf *elf, void *data);
-- 
2.5.0




* [PATCH v4 20/34] x86, xsplice: Print payload's symbol name and payload name in backtraces
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (18 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 19/34] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 21/34] xsplice: Add .xsplice.hooks functions and test-case Konrad Rzeszutek Wilk
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Naturally the backtrace is presented when an instruction
hits a bug_frame or %p is used.

The payloads do not support bug_frames yet - however the functions
the payloads call could hit a BUG() or WARN().
The traps.c has logic to scan for this - and eventually it will
find the correct bug_frame and then walk the stack using %p to print
the backtrace. For %p and symbols to print a string, the
'is_active_kernel_text' is consulted, which uses a 'struct virtual_region'.

Therefore we register our start->end addresses so that
'is_active_kernel_text' will include our payload address.

We also register our symbol lookup table function so that it can
scan the list of payloads and retrieve the correct name.
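
The per-payload part of that lookup is a nearest-symbol search: pick the
symbol with the highest start address that is not above the queried address.
A minimal sketch, assuming a flat unsorted table; struct sym_entry and
find_best are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol entry. */
struct sym_entry {
    const char *name;
    uint64_t value; /* Start address of the symbol. */
    uint64_t size;
};

/*
 * Return the index of the symbol with the highest start address that is
 * still <= addr, or -1 if every symbol starts above addr.
 */
static int find_best(const struct sym_entry *tab, size_t n, uint64_t addr)
{
    int best = -1;
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        if ( tab[i].value <= addr &&
             (best == -1 || tab[best].value < tab[i].value) )
            best = (int)i;
    }

    return best;
}
```

The offset printed in a backtrace is then addr minus the value of the
returned entry.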

Lastly we change vsprintf to take into account s and namebuf.
For core code they are the same, but for payloads they are different.
This gets us:

Xen call trace:
   [<ffff82d080a00041>] revert_hook+0x31/0x35 [xen_hello_world]
   [<ffff82d0801431bd>] xsplice.c#revert_payload+0x86/0xc6
   [<ffff82d080143502>] check_for_xsplice_work+0x233/0x3cd
   [<ffff82d08017a0b2>] domain.c#continue_idle_domain+0x9/0x1f

This is useful when payloads have similar or identical symbol names.
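
The formatting rule can be sketched in plain C as below. This is an
illustration under assumptions, not the vsprintf.c change itself:
format_symbol is a hypothetical helper, it uses snprintf rather than Xen's
internal string()/number() helpers, and it compares with strcmp where the
patch uses strncmp bounded by KSYM_NAME_LEN.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "sym+offset/size" and, when the symbol name differs from
 * namebuf (the payload case), append " [namebuf]".
 * Returns the snprintf-style length.
 */
static int format_symbol(char *out, size_t len, const char *sym,
                         const char *namebuf, unsigned long offset,
                         unsigned long size)
{
    int n = snprintf(out, len, "%s+%#lx/%#lx", sym, offset, size);

    /* Stop on error or truncation. */
    if ( n < 0 || (size_t)n >= len )
        return n;

    /* For core code both names match, so nothing is appended. */
    if ( strcmp(sym, namebuf) )
        n += snprintf(out + n, len - (size_t)n, " [%s]", namebuf);

    return n;
}
```

Called with "revert_hook" as the symbol and "xen_hello_world" as namebuf,
this produces the same shape as the first line of the call trace above.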

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Add missing full stop.
v3: s/module/payload/
v4: Expand comment and include registration of 'virtual_region'
v5: Redo the vsprintf handling of payload name.
---
 xen/common/vsprintf.c | 15 +++++++++---
 xen/common/xsplice.c  | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/xen/common/vsprintf.c b/xen/common/vsprintf.c
index 18d2634..f0b743f 100644
--- a/xen/common/vsprintf.c
+++ b/xen/common/vsprintf.c
@@ -20,6 +20,7 @@
 #include <xen/symbols.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
+#include <xen/xsplice.h>
 #include <asm/div64.h>
 #include <asm/page.h>
 
@@ -331,16 +332,17 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
     {
         unsigned long sym_size, sym_offset;
         char namebuf[KSYM_NAME_LEN+1];
+        bool_t payload = 0;
 
         /* Advance parents fmt string, as we have consumed 's' or 'S' */
         ++*fmt_ptr;
 
         s = symbols_lookup((unsigned long)arg, &sym_size, &sym_offset, namebuf);
-
-        /* If the symbol is not found, fall back to printing the address */
+        /* If the symbol is not found, fall back to printing the address. */
         if ( !s )
             break;
-
+        if ( strncmp(namebuf, s, KSYM_NAME_LEN) )
+            payload = 1;
         /* Print symbol name */
         str = string(str, end, s, -1, -1, 0);
 
@@ -354,6 +356,13 @@ static char *pointer(char *str, char *end, const char **fmt_ptr,
             str = number(str, end, sym_size, 16, -1, -1, SPECIAL);
         }
 
+        if ( payload )
+        {
+            str = string(str, end, " [", -1, -1, 0);
+            str = string(str, end, namebuf, -1, -1, 0);
+            str = string(str, end, "]", -1, -1, 0);
+        }
+
         return str;
     }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 54120bb..a3366ef 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -3,6 +3,7 @@
  *
  */
 
+#include <xen/bug_ex_symbols.h>
 #include <xen/cpu.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
@@ -13,6 +14,7 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/string.h>
 #include <xen/symbols.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
@@ -50,6 +52,7 @@ struct payload {
     struct list_head applied_list;       /* Linked to 'applied_list'. */
     struct xsplice_patch_func *funcs;    /* The array of functions to patch. */
     unsigned int nfuncs;                 /* Nr of functions to patch. */
+    struct virtual_region bug_ex_region; /* .bug.frame patching and exception table. */
     struct xsplice_symbol *symtab;       /* All symbols. */
     char *strtab;                        /* Pointer to .strtab. */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
@@ -103,6 +106,12 @@ static int verify_payload(const xen_sysctl_xsplice_upload_t *upload)
     return 0;
 }
 
+static bool_t ignore_region(unsigned int flag, unsigned long priv)
+{
+    /* See CHECKING_[SYMBOL|BUG_FRAME|EXCEPTION]. */
+    return !(flag & priv);
+}
+
 uint64_t xsplice_symbols_lookup_by_name(const char *symname)
 {
     struct payload *data;
@@ -131,6 +140,51 @@ out:
     return value;
 }
 
+static const char *xsplice_symbols_lookup(unsigned long addr,
+                                          unsigned long *symbolsize,
+                                          unsigned long *offset,
+                                          char *namebuf)
+{
+    struct payload *data;
+    unsigned int i;
+    int best;
+
+    /*
+     * No locking since this list is only ever changed during apply or revert
+     * context.
+     */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( !((void *)addr >= data->payload_address &&
+               (void *)addr < (data->payload_address + data->core_text_size)) )
+            continue;
+
+        best = -1;
+
+        for ( i = 0; i < data->nsyms; i++ )
+        {
+            if ( data->symtab[i].value <= addr &&
+                 ( best == -1 ||
+                   data->symtab[best].value < data->symtab[i].value) )
+                best = i;
+        }
+
+        if ( best == -1 )
+            return NULL;
+
+        if ( symbolsize )
+            *symbolsize = data->symtab[best].size;
+        if ( offset )
+            *offset = addr - data->symtab[best].value;
+        if ( namebuf )
+            strlcpy(namebuf, data->name, KSYM_NAME_LEN);
+
+        return data->symtab[best].name;
+    }
+
+    return NULL;
+}
+
 static int find_payload(const xen_xsplice_name_t *name, struct payload **f)
 {
     struct payload *data;
@@ -325,6 +379,7 @@ static int prepare_payload(struct payload *payload,
     struct xsplice_elf_sec *sec;
     unsigned int i;
     struct xsplice_patch_func *f;
+    struct virtual_region *region;
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
     if ( sec )
@@ -377,6 +432,15 @@ static int prepare_payload(struct payload *payload,
                    XSPLICE, elf->name, f->name, f->old_addr);
         }
     }
+
+    /* Setup the virtual region with proper data. */
+    region = &payload->bug_ex_region;
+    region->skip = ignore_region;
+    region->symbols_lookup = xsplice_symbols_lookup;
+    region->priv = CHECKING_SYMBOL;
+    region->start = (unsigned long)payload->payload_address;
+    region->end = (unsigned long)(payload->payload_address + payload->core_text_size);
+
     return 0;
 }
 
@@ -479,7 +543,6 @@ static int build_symbol_table(struct payload *payload,
     payload->symtab = symtab;
     payload->strtab = strtab;
     payload->nsyms = nsyms;
-
     return 0;
 }
 
@@ -707,6 +770,7 @@ static int apply_payload(struct payload *data)
     arch_xsplice_patching_leave();
 
     list_add_tail(&data->applied_list, &applied_list);
+    register_virtual_region(&data->bug_ex_region);
 
     return 0;
 }
@@ -729,6 +793,7 @@ static int revert_payload(struct payload *data)
     arch_xsplice_patching_leave();
 
     list_del_init(&data->applied_list);
+    unregister_virtual_region(&data->bug_ex_region);
 
     return 0;
 }
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 21/34] xsplice: Add .xsplice.hooks functions and test-case
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (19 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 20/34] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 22/34] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add hook functions which run during patch apply and patch revert.
Hook functions are used by xsplice payloads to manipulate data structures
during patching, etc.

Also add macros to be used by payloads for excluding functions or
sections from being included in a patch.

Furthermore include a test-case for it.
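
For readers who want the gist without the Xen headers, here is a minimal
standalone sketch of the hook table handling. It assumes the section has
already been loaded into memory as an array of function pointers. The names
hook_fn_t, demo_hooks, hook_count and run_hooks are hypothetical and do not
appear in the patch; only the size check and the in-order loop mirror what
prepare_payload() and apply_payload() do below.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as xsplice_loadcall_t in the patch: no arguments, no result. */
typedef void (*hook_fn_t)(void);

/* Hypothetical stand-in for the state a real hook would manipulate. */
static int hook_calls;

static void first_hook(void)  { hook_calls += 1; }
static void second_hook(void) { hook_calls += 10; }

/* Stand-in for the contents of a loaded .xsplice.hooks.load section. */
static const hook_fn_t demo_hooks[] = { first_hook, second_hook };

/*
 * Derive the number of hooks from the section size in bytes.
 * Returns 0 on success, or -1 if the section is empty or its size is not a
 * whole number of function pointers (the patch returns -EINVAL there).
 */
static int hook_count(size_t sh_size, size_t *count)
{
    assert(count != NULL);

    if ( sh_size == 0 || (sh_size % sizeof(hook_fn_t)) != 0 )
        return -1;

    *count = sh_size / sizeof(hook_fn_t);
    return 0;
}

/* Call every hook in table order. */
static void run_hooks(const hook_fn_t *hooks, size_t count)
{
    size_t i;

    for ( i = 0; i < count; i++ )
        hooks[i]();
}
```

In a real payload the table is not written by hand; the XSPLICE_LOAD_HOOK and
XSPLICE_UNLOAD_HOOK macros place the pointers in their sections.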

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Style guide changes
v3: Include the test-case - and also re-order this patch
---
 docs/misc/xsplice.markdown          | 21 +++++++++++++
 xen/arch/x86/test/xen_hello_world.c | 15 ++++++++++
 xen/common/xsplice.c                | 37 +++++++++++++++++++++++
 xen/include/xen/xsplice_patch.h     | 59 +++++++++++++++++++++++++++++++++++++
 4 files changed, 132 insertions(+)
 create mode 100644 xen/include/xen/xsplice_patch.h

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index e7c41da..d1f2a5b 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -285,6 +285,12 @@ like what the Linux kernel module loader does.
 
 The payload contains a section (xsplice_patch_func) with an array of structures
 describing the functions to be patched:
+It may optionally contain the addresses of functions to be called right before
+the payload is applied and after it is reverted:
+
+ * `.xsplice.hooks.load` - an array of function pointers.
+ * `.xsplice.hooks.unload` - an array of function pointers.
+
 
 <pre>
 struct xsplice_patch_func {  
@@ -362,6 +368,21 @@ struct xsplice_patch_func xsplice_hello_world = {
 
 Code must be compiled with -fPIC.
 
+
+### .xsplice.hooks.load and .xsplice.hooks.unload
+
+These sections contain arrays of function pointers to be executed
+before the payload is applied (.xsplice.funcs) or after the payload
+is reverted.
+
+Each entry in this array is eight bytes.
+
+The type definitions of the functions are as follows:
+
+<pre>
+typedef void (*xsplice_loadcall_t)(void);  
+typedef void (*xsplice_unloadcall_t)(void);   
+</pre>
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index 243eb3f..d2b3cc2 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -5,8 +5,10 @@
 
 #include <xen/config.h>
 #include <xen/types.h>
+#include <xen/xsplice_patch.h>
 #include <xen/xsplice.h>
 #include "config.h"
+#include <xen/lib.h>
 
 static char xen_hello_world_name[] = "xen_hello_world";
 extern const char *xen_hello_world(void);
@@ -14,6 +16,19 @@ extern const char *xen_hello_world(void);
 /* External symbol. */
 extern const char *xen_extra_version(void);
 
+void apply_hook(void)
+{
+    printk(KERN_DEBUG "Hook executing.\n");
+}
+
+void revert_hook(void)
+{
+    printk(KERN_DEBUG "Hook unloaded.\n");
+}
+
+XSPLICE_LOAD_HOOK(apply_hook);
+XSPLICE_UNLOAD_HOOK(revert_hook);
+
 struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_hello_world = {
     .name = xen_hello_world_name,
     .new_addr = (unsigned long)(xen_hello_world),
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index a3366ef..5fb1867 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -20,6 +20,7 @@
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
+#include <xen/xsplice_patch.h>
 
 #include <asm/event.h>
 #include <asm/nmi.h>
@@ -56,6 +57,10 @@ struct payload {
     struct xsplice_symbol *symtab;       /* All symbols. */
     char *strtab;                        /* Pointer to .strtab. */
     unsigned int nsyms;                  /* Nr of entries in .strtab and symbols. */
+    xsplice_loadcall_t *load_funcs;      /* The array of funcs to call after */
+    xsplice_unloadcall_t *unload_funcs;  /* load and unload of the payload. */
+    unsigned int n_load_funcs;           /* Nr of the funcs to load and execute. */
+    unsigned int n_unload_funcs;         /* Nr of funcs to call during unload. */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -433,6 +438,28 @@ static int prepare_payload(struct payload *payload,
         }
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.hooks.load");
+    if ( sec )
+    {
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof (*payload->load_funcs)) )
+            return -EINVAL;
+
+        payload->load_funcs = (xsplice_loadcall_t *)sec->load_addr;
+        payload->n_load_funcs = sec->sec->sh_size / (sizeof *payload->load_funcs);
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.hooks.unload");
+    if ( sec )
+    {
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof (*payload->unload_funcs)) )
+            return -EINVAL;
+
+        payload->unload_funcs = (xsplice_unloadcall_t *)sec->load_addr;
+        payload->n_unload_funcs = sec->sec->sh_size / (sizeof *payload->unload_funcs);
+    }
+
     /* Setup the virtual region with proper data. */
     region = &payload->bug_ex_region;
     region->skip = ignore_region;
@@ -769,6 +796,11 @@ static int apply_payload(struct payload *data)
 
     arch_xsplice_patching_leave();
 
+    spin_debug_disable();
+    for ( i = 0; i < data->n_load_funcs; i++ )
+        data->load_funcs[i]();
+    spin_debug_enable();
+
     list_add_tail(&data->applied_list, &applied_list);
     register_virtual_region(&data->bug_ex_region);
 
@@ -792,6 +824,11 @@ static int revert_payload(struct payload *data)
 
     arch_xsplice_patching_leave();
 
+    spin_debug_disable();
+    for ( i = 0; i < data->n_unload_funcs; i++ )
+        data->unload_funcs[i]();
+    spin_debug_enable();
+
     list_del_init(&data->applied_list);
     unregister_virtual_region(&data->bug_ex_region);
 
diff --git a/xen/include/xen/xsplice_patch.h b/xen/include/xen/xsplice_patch.h
new file mode 100644
index 0000000..19d3f76
--- /dev/null
+++ b/xen/include/xen/xsplice_patch.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#ifndef __XEN_XSPLICE_PATCH_H__
+#define __XEN_XSPLICE_PATCH_H__
+
+/*
+ * The following definitions are to be used in patches. They are taken
+ * from kpatch.
+ */
+typedef void (*xsplice_loadcall_t)(void);
+typedef void (*xsplice_unloadcall_t)(void);
+
+/* This definition is taken from Linux. */
+#define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
+/*
+ * XSPLICE_IGNORE_SECTION macro
+ *
+ * This macro is for ignoring sections that may change as a side effect of
+ * another change or might be a non-bundlable section; that is one that does
+ * not honor -ffunction-sections and create a one-to-one relation from function
+ * symbol to section.
+ */
+#define XSPLICE_IGNORE_SECTION(_sec) \
+	char *__UNIQUE_ID(xsplice_ignore_section_) __section(".xsplice.ignore.sections") = _sec;
+
+/*
+ * XSPLICE_IGNORE_FUNCTION macro
+ *
+ * This macro is for ignoring functions that may change as a side effect of a
+ * change in another function.
+ */
+#define XSPLICE_IGNORE_FUNCTION(_fn) \
+	void *__xsplice_ignore_func_##_fn __section(".xsplice.ignore.functions") = _fn;
+
+/*
+ * XSPLICE_LOAD_HOOK macro
+ *
+ * Declares a function pointer to be allocated in a new
+ * .xsplice.hooks.load section.  This xsplice_load_data symbol is later
+ * stripped by create-diff-object so that it can be declared in multiple
+ * objects that are later linked together, avoiding global symbol
+ * collision.  Since multiple hooks can be registered, the
+ * .xsplice.hooks.load section is a table of functions that will be
+ * executed in series by the xsplice infrastructure at patch load time.
+ */
+#define XSPLICE_LOAD_HOOK(_fn) \
+	xsplice_loadcall_t __attribute__((weak)) xsplice_load_data __section(".xsplice.hooks.load") = _fn;
+
+/*
+ * XSPLICE_UNLOAD_HOOK macro
+ *
+ * Same as LOAD hook with s/load/unload/
+ */
+#define XSPLICE_UNLOAD_HOOK(_fn) \
+	xsplice_unloadcall_t __attribute__((weak)) xsplice_unload_data __section(".xsplice.hooks.unload") = _fn;
+
+#endif /* __XEN_XSPLICE_PATCH_H__ */
-- 
2.5.0



* [PATCH v4 22/34] xsplice: Add support for bug frames.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (20 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 21/34] xsplice: Add .xsplice.hooks functions and test-case Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 23/34] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for handling bug frames contained within xsplice payloads. If a
trap occurs, search either the kernel bug table or an applied payload's
bug table depending on the instruction pointer.

We also include a test-case which checks that the patched function
returns the expected value. Its BUG() will only be triggered if
something has gone horribly wrong.

P.S.
If one really wants to test, insert a WARN_ON(1) at the end of
the revert_hook.
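
The address check that decides whether a pointer belongs to an applied
payload can be sketched standalone as below. This is only an illustration:
struct demo_region and find_region are hypothetical names, and a plain array
stands in for the applied_list that is_patch() walks in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an applied payload's address range. */
struct demo_region {
    uintptr_t start;   /* First byte of the payload. */
    size_t size;       /* Payload size in bytes. */
};

/*
 * Return the index of the region containing addr, or -1 if no region does.
 * The range is half-open: [start, start + size).
 */
static int find_region(const struct demo_region *regions, size_t nr,
                       uintptr_t addr)
{
    size_t i;

    assert(regions != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        if ( addr >= regions[i].start &&
             addr - regions[i].start < regions[i].size )
            return (int)i;
    }

    return -1;
}
```

A result of -1 corresponds to the case where do_invalid_op() would fall back
to the is_kernel() check alone.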

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2:- s/module/payload/
   - add build time check in case amount of bug frames expands.
   - add define for the number of bug-frames.
v3:
  - add missing BUGFRAME_NR, squash s/core_size/core/ in earlier patch.
v4:- Add comment about it being optional.
  - Moved code around.
  - Changed per Andrew's recommendation.
  - Fixed style changes.
  - Made it compile under ARM (PRIu32,PRIu64)
v5: Use 'struct virtual_region'
  - Rip more of the is_active_text code.
  - Use one function for the ->skip
v6: Include test-case
---
 xen/arch/x86/test/xen_hello_world.c |  6 ++++++
 xen/arch/x86/traps.c                |  5 +++--
 xen/common/xsplice.c                | 42 +++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice.h           |  5 +++++
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/test/xen_hello_world.c b/xen/arch/x86/test/xen_hello_world.c
index d2b3cc2..5364114 100644
--- a/xen/arch/x86/test/xen_hello_world.c
+++ b/xen/arch/x86/test/xen_hello_world.c
@@ -19,11 +19,17 @@ extern const char *xen_extra_version(void);
 void apply_hook(void)
 {
     printk(KERN_DEBUG "Hook executing.\n");
+    /* The hook is called  _after_ the patching. */
+    if ( strcmp(xen_extra_version(), "Hello World") )
+        BUG();
 }
 
 void revert_hook(void)
 {
     printk(KERN_DEBUG "Hook unloaded.\n");
+    /* The hook is called  _after_ the unpatching. */
+    if ( !strcmp(xen_extra_version(), "Hello World") )
+        BUG();
 }
 
 XSPLICE_LOAD_HOOK(apply_hook);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index eeada97..f35fd1a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -49,6 +49,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/xsplice.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1196,7 +1197,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
     filename = bug_ptr(bug);
-    if ( !is_kernel(filename) )
+    if ( !is_kernel(filename) && !is_patch(filename) )
         goto die;
     fixup = strlen(filename);
     if ( fixup > 50 )
@@ -1223,7 +1224,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     case BUGFRAME_assert:
         /* ASSERT: decode the predicate string pointer. */
         predicate = bug_msg(bug);
-        if ( !is_kernel(predicate) )
+        if ( !is_kernel(predicate) && !is_patch(predicate) )
             predicate = "<unknown>";
 
         printk("Assertion '%s' failed at %s%s:%d\n",
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 5fb1867..49f0d6e 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -117,6 +117,25 @@ static bool_t ignore_region(unsigned int flag, unsigned long priv)
     return !(flag & priv);
 }
 
+
+bool_t is_patch(const void *ptr)
+{
+    struct payload *data;
+
+    /*
+     * No locking since this list is only ever changed during apply or revert
+     * context.
+     */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( ptr >= data->payload_address &&
+             ptr < (data->payload_address + data->core_size) )
+            return 1;
+    }
+
+    return 0;
+}
+
 uint64_t xsplice_symbols_lookup_by_name(const char *symname)
 {
     struct payload *data;
@@ -468,6 +487,29 @@ static int prepare_payload(struct payload *payload,
     region->start = (unsigned long)payload->payload_address;
     region->end = (unsigned long)(payload->payload_address + payload->core_text_size);
 
+
+    /* Optional sections. */
+    for ( i = 0; i < BUGFRAME_NR; i++ )
+    {
+        char str[14];
+
+        snprintf(str, sizeof str, ".bug_frames.%u", i);
+        sec = xsplice_elf_sec_by_name(elf, str);
+        if ( !sec )
+            continue;
+
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof (struct bug_frame)) )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .bug_frames.%u!\n",
+                    XSPLICE, elf->name, i);
+            return -EINVAL;
+        }
+        region->frame[i].bugs = (struct bug_frame *)sec->load_addr;
+        region->frame[i].n_bugs = sec->sec->sh_size / sizeof(struct bug_frame);
+        if ( !(region->priv & CHECKING_BUG_FRAME) )
+            region->priv |= CHECKING_BUG_FRAME;
+    }
     return 0;
 }
 
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 2e2fb78..16d35b8 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -42,6 +42,7 @@ struct xsplice_symbol {
 
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
+bool_t is_patch(const void *addr);
 uint64_t xsplice_symbols_lookup_by_name(const char *symname);
 
 /* Arch hooks. */
@@ -104,6 +105,10 @@ static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
     return -ENOSYS;
 }
 static inline void check_for_xsplice_work(void) { };
+static inline bool_t is_patch(const void *addr)
+{
+    return 0;
+}
 #endif /* CONFIG_XSPLICE */
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.5.0



* [PATCH v4 23/34] xsplice: Add support for exception tables.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (21 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 22/34] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 24/34] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for exception tables contained within xSplice payloads. If an
exception occurs, search either the main exception table or a particular
active payload's exception table depending on the instruction pointer.
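
As a rough standalone illustration of why the payload's table is sorted at
load time, the sketch below sorts a table and then binary-searches it for a
faulting address. It makes simplifying assumptions: entries hold absolute
addresses rather than the relative encoding behind EX_FIELD(), and qsort()
stands in for Xen's sort(). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified entry: absolute addresses, unlike Xen's relative encoding. */
struct demo_ex_entry {
    unsigned long addr;   /* Address of the instruction that may fault. */
    unsigned long cont;   /* Address to resume at after the fault. */
};

/* Order entries by faulting address. */
static int demo_cmp_ex(const void *a, const void *b)
{
    const struct demo_ex_entry *l = a, *r = b;

    if ( l->addr < r->addr )
        return -1;
    if ( l->addr > r->addr )
        return 1;
    return 0;
}

/* Sort the table [start, stop) so that it can be binary-searched. */
static void demo_sort_extable(struct demo_ex_entry *start,
                              struct demo_ex_entry *stop)
{
    assert(start != NULL && stop >= start);
    qsort(start, (size_t)(stop - start), sizeof(*start), demo_cmp_ex);
}

/* Return the fixup address for value, or 0 if the table has no entry. */
static unsigned long demo_search_extable(const struct demo_ex_entry *table,
                                         size_t nr, unsigned long value)
{
    size_t lo = 0, hi = nr;   /* Search the half-open range [lo, hi). */

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( table[mid].addr == value )
            return table[mid].cont;
        if ( table[mid].addr < value )
            lo = mid + 1;
        else
            hi = mid;
    }

    return 0;
}
```

The search only works on a sorted table, which is why an unsorted payload
table has to be sorted before it is registered.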

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2:
 - s/module/payload/
 - sanity checks.
 - Move code around.
v3: Use 'struct virtual_region'
---
 xen/arch/x86/extable.c        | 30 +++++++++++++++++-------------
 xen/common/xsplice.c          | 19 +++++++++++++++++++
 xen/include/asm-x86/uaccess.h |  5 +++++
 3 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 6e083a8..4a9016b 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -20,7 +20,7 @@ static inline unsigned long ex_cont(const struct exception_table_entry *x)
 	return EX_FIELD(x, cont);
 }
 
-static int __init cmp_ex(const void *a, const void *b)
+static int cmp_ex(const void *a, const void *b)
 {
 	const struct exception_table_entry *l = a, *r = b;
 	unsigned long lip = ex_addr(l);
@@ -35,7 +35,7 @@ static int __init cmp_ex(const void *a, const void *b)
 }
 
 #ifndef swap_ex
-static void __init swap_ex(void *a, void *b, int size)
+static void swap_ex(void *a, void *b, int size)
 {
 	struct exception_table_entry *l = a, *r = b, tmp;
 	long delta = b - a;
@@ -48,19 +48,23 @@ static void __init swap_ex(void *a, void *b, int size)
 }
 #endif
 
-void __init sort_exception_tables(void)
+void sort_exception_table(struct exception_table_entry *start,
+                          struct exception_table_entry *stop)
 {
-    sort(__start___ex_table, __stop___ex_table - __start___ex_table,
-         sizeof(struct exception_table_entry), cmp_ex, swap_ex);
-    sort(__start___pre_ex_table,
-         __stop___pre_ex_table - __start___pre_ex_table,
+    sort(start, stop - start,
          sizeof(struct exception_table_entry), cmp_ex, swap_ex);
 }
 
-static inline unsigned long
-search_one_table(const struct exception_table_entry *first,
-                 const struct exception_table_entry *last,
-                 unsigned long value)
+void __init sort_exception_tables(void)
+{
+    sort_exception_table(__start___ex_table, __stop___ex_table);
+    sort_exception_table(__start___pre_ex_table, __stop___pre_ex_table);
+}
+
+unsigned long
+search_one_extable(const struct exception_table_entry *first,
+                   const struct exception_table_entry *last,
+                   unsigned long value)
 {
     const struct exception_table_entry *mid;
     long diff;
@@ -90,7 +94,7 @@ search_exception_table(unsigned long addr)
             continue;
 
         if ( (addr >= region->start) && (addr < region->end) )
-            return search_one_table(region->ex, region->ex_end-1, addr);
+            return search_one_extable(region->ex, region->ex_end-1, addr);
     }
 
     return 0;
@@ -100,7 +104,7 @@ unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
     unsigned long addr = (unsigned long)regs->eip;
-    unsigned long fixup = search_one_table(
+    unsigned long fixup = search_one_extable(
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 49f0d6e..834fda3 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -510,6 +510,25 @@ static int prepare_payload(struct payload *payload,
         if ( !(region->priv & CHECKING_BUG_FRAME) )
             region->priv |= CHECKING_BUG_FRAME;
     }
+#ifdef CONFIG_X86
+    sec = xsplice_elf_sec_by_name(elf, ".ex_table");
+    if ( sec )
+    {
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof (struct exception_table_entry)) )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .ex_table (exp:%lu vs %lu)!\n",
+                    XSPLICE, elf->name, sizeof (struct exception_table_entry),
+                    sec->sec->sh_size);
+            return -EINVAL;
+        }
+        region->ex = (struct exception_table_entry *)sec->load_addr;
+        region->ex_end = (struct exception_table_entry *)(sec->load_addr + sec->sec->sh_size);
+
+        sort_exception_table(region->ex, region->ex_end);
+        region->priv |= CHECKING_EXCEPTION;
+    }
+#endif
     return 0;
 }
 
diff --git a/xen/include/asm-x86/uaccess.h b/xen/include/asm-x86/uaccess.h
index 947470d..9e67bf0 100644
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -276,6 +276,11 @@ extern struct exception_table_entry __start___pre_ex_table[];
 extern struct exception_table_entry __stop___pre_ex_table[];
 
 extern unsigned long search_exception_table(unsigned long);
+extern unsigned long search_one_extable(const struct exception_table_entry *first,
+                                        const struct exception_table_entry *last,
+                                        unsigned long value);
 extern void sort_exception_tables(void);
+extern void sort_exception_table(struct exception_table_entry *start,
+                                 struct exception_table_entry *stop);
 
 #endif /* __X86_UACCESS_H__ */
-- 
2.5.0



* [PATCH v4 24/34] xsplice: Add support for alternatives
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (22 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 23/34] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 25/34] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Add support for applying alternative sections within an xsplice payload.
At payload load time, apply any alternative sections that are found.

We also add a test-case exercising a rather useless alternative
(patching a NOP with a NOP) - but it does exercise the code-path.
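
For orientation, the core of applying one table of alternatives can be
sketched standalone as follows. This is a simplification under stated
assumptions, not the Xen implementation: struct demo_alt uses plain pointers
instead of the relative offsets in struct alt_instr, a feature_present flag
stands in for the CPU feature test, padding uses the single-byte x86 NOP
(0x90) rather than the ideal multi-byte NOPs, and the CR0.WP handling is left
out. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NOP 0x90   /* x86 single-byte NOP. */

/* Hypothetical, simplified stand-in for struct alt_instr. */
struct demo_alt {
    unsigned char *instr;              /* Original instruction bytes. */
    const unsigned char *replacement;  /* Replacement instruction bytes. */
    unsigned char instrlen;            /* Length of the original. */
    unsigned char replacementlen;      /* Length of replacement, <= instrlen. */
    int feature_present;               /* Stand-in for the CPU feature test. */
};

/*
 * Patch every entry in [start, end) whose feature is present: copy the
 * replacement over the original and pad the remainder with NOPs.
 * Returns the number of entries that were patched.
 */
static size_t demo_apply_alternatives(struct demo_alt *start,
                                      struct demo_alt *end)
{
    struct demo_alt *a;
    size_t patched = 0;

    assert(start != NULL && end >= start);

    for ( a = start; a < end; a++ )
    {
        assert(a->replacementlen <= a->instrlen);

        if ( !a->feature_present )
            continue;

        memcpy(a->instr, a->replacement, a->replacementlen);
        memset(a->instr + a->replacementlen, DEMO_NOP,
               (size_t)(a->instrlen - a->replacementlen));
        patched++;
    }

    return patched;
}
```

Entries whose feature is absent are left untouched, so the original
instruction keeps running on CPUs without that feature.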

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Make a new alternative function that does not ASSERT on IRQs and
    don't disable IRQs in the code when loading payload.
v3: Include test-case
v4: Include check for size of alternatives and that it is not a 0 size
    section.
---
 xen/arch/x86/Makefile                    |  2 +-
 xen/arch/x86/alternative.c               | 20 ++++++++++++--------
 xen/arch/x86/test/xen_hello_world_func.c |  3 +++
 xen/common/xsplice.c                     | 16 ++++++++++++++++
 xen/include/asm-x86/alternative.h        |  6 ++++++
 5 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index a2e3017..8a100be 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -6,7 +6,7 @@ subdir-y += mm
 subdir-$(CONFIG_XENOPROF) += oprofile
 subdir-y += x86_64
 
-obj-bin-y += alternative.init.o
+obj-bin-y += alternative.o
 obj-y += apic.o
 obj-y += bitops.o
 obj-bin-y += bzimage.init.o
diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 26ad2b9..e423d3a 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -28,7 +28,7 @@
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 
 #ifdef K8_NOP1
-static const unsigned char k8nops[] __initconst = {
+static const unsigned char k8nops[] = {
     K8_NOP1,
     K8_NOP2,
     K8_NOP3,
@@ -52,7 +52,7 @@ static const unsigned char * const k8_nops[ASM_NOP_MAX+1] __initconstrel = {
 #endif
 
 #ifdef P6_NOP1
-static const unsigned char p6nops[] __initconst = {
+static const unsigned char p6nops[] = {
     P6_NOP1,
     P6_NOP2,
     P6_NOP3,
@@ -75,7 +75,7 @@ static const unsigned char * const p6_nops[ASM_NOP_MAX+1] __initconstrel = {
 };
 #endif
 
-static const unsigned char * const *ideal_nops __initdata = k8_nops;
+static const unsigned char * const *ideal_nops = k8_nops;
 
 static int __init mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
@@ -100,7 +100,7 @@ static void __init arch_init_ideal_nops(void)
 }
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init add_nops(void *insns, unsigned int len)
+static void add_nops(void *insns, unsigned int len)
 {
     while ( len > 0 )
     {
@@ -127,7 +127,7 @@ static void __init add_nops(void *insns, unsigned int len)
  *
  * This routine is called with local interrupt disabled.
  */
-static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
+static void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
     memcpy(addr, opcode, len);
     sync_core();
@@ -142,15 +142,13 @@ static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
  * APs have less capabilities than the boot processor are not handled.
  * Tough. Make sure you disable such features by hand.
  */
-static void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void apply_alternatives_nocheck(struct alt_instr *start, struct alt_instr *end)
 {
     struct alt_instr *a;
     u8 *instr, *replacement;
     u8 insnbuf[MAX_PATCH_LEN];
     unsigned long cr0 = read_cr0();
 
-    ASSERT(!local_irq_is_enabled());
-
     printk(KERN_INFO "alt table %p -> %p\n", start, end);
 
     /* Disable WP to allow application of alternatives to read-only pages. */
@@ -190,6 +188,12 @@ static void __init apply_alternatives(struct alt_instr *start, struct alt_instr
     write_cr0(cr0);
 }
 
+void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+{
+    ASSERT(!local_irq_is_enabled());
+    apply_alternatives_nocheck(start, end);
+}
+
 void __init alternative_instructions(void)
 {
     nmi_callback_t saved_nmi_callback;
diff --git a/xen/arch/x86/test/xen_hello_world_func.c b/xen/arch/x86/test/xen_hello_world_func.c
index 81380a6..2465ce9 100644
--- a/xen/arch/x86/test/xen_hello_world_func.c
+++ b/xen/arch/x86/test/xen_hello_world_func.c
@@ -5,10 +5,13 @@
 
 #include <xen/config.h>
 #include <xen/types.h>
+#include <asm/nops.h>
+#include <asm/alternative.h>
 
 /* Our replacement function for xen_extra_version. */
 const char *xen_hello_world(void)
 {
+    alternative(ASM_NOP1, ASM_NOP1, 1);
     return "Hello World";
 }
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 834fda3..e5867bd 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -526,8 +526,24 @@ static int prepare_payload(struct payload *payload,
         region->ex_end = (struct exception_table_entry *)(sec->load_addr + sec->sec->sh_size);
 
         sort_exception_table(region->ex, region->ex_end);
+
         region->priv |= CHECKING_EXCEPTION;
     }
+    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
+    if ( sec )
+    {
+        if ( !sec->sec->sh_size ||
+             (sec->sec->sh_size % sizeof (struct alt_instr)) )
+        {
+            dprintk(XENLOG_DEBUG, "%s%s: Wrong size of .alt_instr (exp:%lu vs %lu)!\n",
+                    XSPLICE, elf->name, sizeof (struct alt_instr),
+                    sec->sec->sh_size);
+            return -EINVAL;
+        }
+        apply_alternatives_nocheck((struct alt_instr *)sec->load_addr,
+                                   (struct alt_instr *)(sec->load_addr +
+                                   sec->sec->sh_size));
+    }
 #endif
     return 0;
 }
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 1056630..d50c0b5 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -23,6 +23,12 @@ struct alt_instr {
     u8  replacementlen;     /* length of new instruction, <= instrlen */
 };
 
+/*
+ * A variant to be used on code that can be patched without many checks.
+ */
+extern void apply_alternatives_nocheck(struct alt_instr *start,
+                                       struct alt_instr *end);
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void alternative_instructions(void);
 
 #define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (23 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 24/34] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 18:34   ` Julien Grall
  2016-03-15 17:56 ` [PATCH v4 26/34] HYPERCALL_version_op: Add VERSION_OP_build_id to retrieve build-id Konrad Rzeszutek Wilk
                   ` (8 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich,
	Konrad Rzeszutek Wilk

This patch builds the Xen ELF binary with a build-id and adds
the code to extract it in the Xen hypervisor.
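
As a rough illustration of the extraction step, the sketch below walks
a single ELF note held in a buffer and returns a pointer to the
build-id bytes. It is not the hypervisor's xen_build_id(): the
struct elf_note layout, NOTE_ALIGN and find_build_id() are names
assumed here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic ELF note header: three 32-bit words, then name and descriptor. */
struct elf_note {
    uint32_t namesz;   /* length of the name, including the trailing NUL */
    uint32_t descsz;   /* length of the descriptor (the build-id bytes) */
    uint32_t type;     /* NT_GNU_BUILD_ID for a build-id note */
};

#define NT_GNU_BUILD_ID 3
#define NOTE_ALIGN(x) (((x) + 3u) & ~3u)   /* name and desc are 4-byte padded */

/*
 * Hypothetical helper: locate the build-id inside a buffer holding one
 * note. Returns 0 on success and -1 if the buffer does not hold a
 * well-formed GNU build-id note.
 */
static int find_build_id(const unsigned char *buf, size_t size,
                         const unsigned char **id, size_t *len)
{
    struct elf_note n;
    size_t name_off = sizeof(n), desc_off;

    if ( size < sizeof(n) )
        return -1;                      /* no room for a full header */
    memcpy(&n, buf, sizeof(n));         /* avoid unaligned access */

    /* The name must be exactly "GNU" plus its NUL terminator. */
    if ( n.type != NT_GNU_BUILD_ID || n.namesz != 4 )
        return -1;
    if ( size - name_off < 4 || memcmp(buf + name_off, "GNU", 4) != 0 )
        return -1;

    desc_off = name_off + NOTE_ALIGN(n.namesz);
    if ( n.descsz > size - desc_off )
        return -1;                      /* descriptor runs past the buffer */

    *id = buf + desc_off;
    *len = n.descsz;
    return 0;
}
```

A caller would pass the bytes between the start and end of the .note
section; on success *id points into that same buffer, so nothing needs
to be freed.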

One can also retrieve the value of the build-id by doing
'readelf -n xen-syms'.

For EFI builds we re-use the same build-id that xen-syms
was built with.

The first version of ld to implement --build-id is v2.18.
Hence we probe whether the linker accepts the option. With an
older ld we do not build the hypervisor with a build-id
(and the xen_build_id() call returns -ENODATA).

For x86 we have two binaries: xen-syms and xen, a smaller
version with lots of sections removed. To make 'readelf -n xen'
work we also modify mkelf32 and xen.lds.S to include the
PT_NOTE program header and its .note section.
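
The mkelf32 change below names the extra section by appending ".note"
to the existing section-name string table and using 17 as its
sh_name. A minimal sketch of why that index works, reusing the two
string literals from the diff; note_name_offset() and build_strtab()
are hypothetical helpers, not part of mkelf32:

```c
#include <stddef.h>
#include <string.h>

/* Same literals as in mkelf32.c; sizeof counts the implicit trailing NUL. */
static const char shstrtab[] = "\0.text\0.shstrtab";
static const char shstrtab_extra[] = ".note\0";

/* Offset at which the extra table starts once both are written out. */
static size_t note_name_offset(void)
{
    return sizeof(shstrtab);
}

/* Copy both tables back to back, as the output file lays them out. */
static size_t build_strtab(char *out, size_t out_len)
{
    size_t total = sizeof(shstrtab) + sizeof(shstrtab_extra);

    if ( out_len < total )
        return 0;
    memcpy(out, shstrtab, sizeof(shstrtab));
    memcpy(out + sizeof(shstrtab), shstrtab_extra, sizeof(shstrtab_extra));
    return total;
}
```

The first table is 1 + 5 + 1 + 9 + 1 = 17 bytes long, so the name of
the new section starts at offset 17 in the combined table.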

The EFI binary is more complicated. Having any unrecognized
sections (.note, .data.note, etc) causes the boot to hang.
Moving the .note into the .data section makes it work. It is also
worth noting that, as far as the author knows, PE/COFF does not
have any "comment" sections.

Lastly, we MUST pass --build-id=sha1 on every linker invocation so that
symbol offsets don't change (which means we produce multiple build-ids,
of which only the last one is final). Without this change,
the symbol table embedded in Xen is incorrect: some of the values it
contains are offset by the size of the included build-id.
This obviously causes problems when resolving symbols.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Rebase it on Martin's initial patch
v2: Move it to XENVER hypercall
v3: Fix EFI building (Ross's fix)
v4: Don't use the third argument for length.
v5: Use new structure for XENVER_build_id with variable buf.
v6: Include Ross's fix.
v7: Include detection of bin-utils for build-id support, add
    probing for size, and return -EPERM for XSM denied calls.
v8: Build xen_build_id under ARM, required adding ELFSIZE in proper file.
v9: Rebase on top XSM version class.
v10: Include the build-id .note in the xen ELF binary.
     s/build_id/build_id_linker/
    For EFI build, moved the --build-id values in .data section
v11: Rebase on staging.
v12: Split patch in two. Always do --build-id call. Include the .note in
    .rodata. Use const void * and ssize_t
v13: Use -S to make build_id.o and objcopy differently (Andrew suggested)
---
 Config.mk                   |  11 ++++
 xen/arch/arm/Makefile       |   2 +-
 xen/arch/arm/xen.lds.S      |  20 +++++--
 xen/arch/x86/Makefile       |  30 ++++++++--
 xen/arch/x86/boot/mkelf32.c | 137 ++++++++++++++++++++++++++++++++++++++------
 xen/arch/x86/xen.lds.S      |  23 ++++++++
 xen/common/version.c        |  51 +++++++++++++++++
 xen/include/xen/version.h   |   3 +
 8 files changed, 249 insertions(+), 28 deletions(-)

diff --git a/Config.mk b/Config.mk
index 79eb2bd..c8e89fe 100644
--- a/Config.mk
+++ b/Config.mk
@@ -126,6 +126,17 @@ endef
 check-$(gcc) = $(call cc-ver-check,CC,0x040100,"Xen requires at least gcc-4.1")
 $(eval $(check-y))
 
+ld-ver-build-id = $(shell $(1) --build-id 2>&1 | \
+					grep -q unrecognized && echo n || echo y)
+
+# binutils 2.18 implements build-id.
+ifeq ($(call ld-ver-build-id,$(LD)),n)
+build_id_linker :=
+else
+CFLAGS += -DBUILD_ID
+build_id_linker := --build-id=sha1
+endif
+
 # as-insn: Check whether assembler supports an instruction.
 # Usage: cflags-y += $(call as-insn "insn",option-yes,option-no)
 as-insn = $(if $(shell echo 'void _(void) { asm volatile ( $(2) ); }' \
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 17e9e3a..a3319ab 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -94,7 +94,7 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 9909595..187ef73 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
 PHDRS
 {
   text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
+#if defined(BUILD_ID)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -50,16 +53,21 @@ SECTIONS
        __stop_bug_frames_2 = .;
        *(.rodata)
        *(.rodata.*)
-
-#ifdef LOCK_PROFILE
-       . = ALIGN(POINTER_ALIGN);
-       __lock_profile_start = .;
-       *(.lockprofile.data)
-       __lock_profile_end = .;
+#if !defined(BUILD_ID)
+        _erodata = .;          /* End of read-only data */
 #endif
+  } :text
 
+#if defined(BUILD_ID)
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
         _erodata = .;          /* End of read-only data */
   } :text
+#endif
 
   .data : {                    /* Data */
        . = ALIGN(PAGE_SIZE);
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 8a100be..7db2e53 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -71,9 +71,16 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+ifdef build_id_linker
+num_phdrs = 2
+else
+num_phdrs = 1
+endif
+
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
-	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'` \
+	$(num_phdrs)
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C test
 
 install:
@@ -109,22 +116,28 @@ $(BASEDIR)/common/symbols-dummy.o:
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C $(BASEDIR)/common symbols-dummy.o
 
 $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
 		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort \
 		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort --warn-dup \
 		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
+build_id.o: $(TARGET)-syms
+	$(OBJCOPY) -O binary --only-section=.note $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		--rename-section=.data=.note.gnu.build-id -S $@.bin $@
+	rm -f $@.bin
+
 EFI_LDFLAGS = $(patsubst -m%,-mi386pep,$(LDFLAGS)) --subsystem=10
 EFI_LDFLAGS += --image-base=$(1) --stack=0,0 --heap=0,0 --strip-debug
 EFI_LDFLAGS += --section-alignment=0x200000 --file-alignment=0x20
@@ -137,6 +150,13 @@ $(TARGET).efi: VIRT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A VIR
 $(TARGET).efi: ALT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A ALT_START$$,,p')
 # Don't use $(wildcard ...) here - at least make 3.80 expands this too early!
 $(TARGET).efi: guard = $(if $(shell echo efi/dis* | grep disabled),:)
+ifdef build_id_linker
+$(TARGET).efi: build_id.o
+build_id_file := build_id.o
+else
+build_id_file :=
+endif
+
 $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbols-dummy.o efi/mkreloc
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \
@@ -153,7 +173,7 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol
 		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
-	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
+	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(build_id_file) -o $@
 	if $(guard) false; then rm -f $@; echo 'EFI support disabled'; fi
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/x86/boot/mkelf32.c b/xen/arch/x86/boot/mkelf32.c
index 993a7ee..d230e4c 100644
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -45,9 +45,9 @@ static Elf32_Ehdr out_ehdr = {
     0,                                       /* e_flags */
     sizeof(Elf32_Ehdr),                      /* e_ehsize */
     sizeof(Elf32_Phdr),                      /* e_phentsize */
-    1,                                       /* e_phnum */
+    1,  /* modify based on num_phdrs */      /* e_phnum */
     sizeof(Elf32_Shdr),                      /* e_shentsize */
-    3,                                       /* e_shnum */
+    3,  /* modify based on num_phdrs */      /* e_shnum */
     2                                        /* e_shstrndx */
 };
 
@@ -61,8 +61,20 @@ static Elf32_Phdr out_phdr = {
     PF_R|PF_W|PF_X,                          /* p_flags */
     64                                       /* p_align */
 };
+static Elf32_Phdr note_phdr = {
+    PT_NOTE,                                 /* p_type */
+    DYNAMICALLY_FILLED,                      /* p_offset */
+    DYNAMICALLY_FILLED,                      /* p_vaddr */
+    DYNAMICALLY_FILLED,                      /* p_paddr */
+    DYNAMICALLY_FILLED,                      /* p_filesz */
+    DYNAMICALLY_FILLED,                      /* p_memsz */
+    PF_R,                                    /* p_flags */
+    4                                        /* p_align */
+};
 
 static u8 out_shstrtab[] = "\0.text\0.shstrtab";
+/* If num_phdrs >= 2, we need to tack the .note. */
+static u8 out_shstrtab_extra[] = ".note\0";
 
 static Elf32_Shdr out_shdr[] = {
     { 0 },
@@ -90,6 +102,23 @@ static Elf32_Shdr out_shdr[] = {
     }
 };
 
+/*
+ * The 17 points to the '.note' in the out_shstrtab and out_shstrtab_extra
+ * laid out in the file.
+ */
+static Elf32_Shdr out_shdr_extra = {
+      17,                                    /* sh_name */
+      SHT_NOTE,                              /* sh_type */
+      0,                                     /* sh_flags */
+      DYNAMICALLY_FILLED,                    /* sh_addr */
+      DYNAMICALLY_FILLED,                    /* sh_offset */
+      DYNAMICALLY_FILLED,                    /* sh_size */
+      0,                                     /* sh_link */
+      0,                                     /* sh_info */
+      4,                                     /* sh_addralign */
+      0                                      /* sh_entsize */
+};
+
 /* Some system header files define these macros and pollute our namespace. */
 #undef swap16
 #undef swap32
@@ -228,21 +257,22 @@ static void do_read(int fd, void *data, int len)
 int main(int argc, char **argv)
 {
     u64        final_exec_addr;
-    u32        loadbase, dat_siz, mem_siz;
+    u32        loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
     char      *inimage, *outimage;
     int        infd, outfd;
     char       buffer[1024];
     int        bytes, todo, i;
+    int        num_phdrs;
 
     Elf32_Ehdr in32_ehdr;
 
     Elf64_Ehdr in64_ehdr;
     Elf64_Phdr in64_phdr;
 
-    if ( argc != 5 )
+    if ( argc != 6 )
     {
         fprintf(stderr, "Usage: mkelf32 <in-image> <out-image> "
-                "<load-base> <final-exec-addr>\n");
+                "<load-base> <final-exec-addr> <number of program headers>\n");
         return 1;
     }
 
@@ -250,7 +280,13 @@ int main(int argc, char **argv)
     outimage = argv[2];
     loadbase = strtoul(argv[3], NULL, 16);
     final_exec_addr = strtoull(argv[4], NULL, 16);
-
+    num_phdrs = atoi(argv[5]);
+    if ( num_phdrs > 2 || num_phdrs < 1 )
+    {
+        fprintf(stderr, "Number of program headers MUST be 1 or 2, got %d!\n",
+                num_phdrs);
+        return 1;
+    }
     infd = open(inimage, O_RDONLY);
     if ( infd == -1 )
     {
@@ -285,11 +321,10 @@ int main(int argc, char **argv)
                 (int)in64_ehdr.e_phentsize, (int)sizeof(in64_phdr));
         return 1;
     }
-
-    if ( in64_ehdr.e_phnum != 1 )
+    if ( in64_ehdr.e_phnum != num_phdrs )
     {
-        fprintf(stderr, "Expect precisly 1 program header; found %d.\n",
-                (int)in64_ehdr.e_phnum);
+        fprintf(stderr, "Expect precisely %d program header; found %d.\n",
+                num_phdrs, (int)in64_ehdr.e_phnum);
         return 1;
     }
 
@@ -299,11 +334,36 @@ int main(int argc, char **argv)
 
     (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
     dat_siz = (u32)in64_phdr.p_filesz;
-
     /* Do not use p_memsz: it does not include BSS alignment padding. */
     /*mem_siz = (u32)in64_phdr.p_memsz;*/
     mem_siz = (u32)(final_exec_addr - in64_phdr.p_vaddr);
 
+    note_sz = note_base = offset = 0;
+    if ( num_phdrs > 1 )
+    {
+        offset = in64_phdr.p_offset;
+        note_base = in64_phdr.p_vaddr;
+
+        (void)lseek(infd, in64_ehdr.e_phoff+sizeof(in64_phdr), SEEK_SET);
+        do_read(infd, &in64_phdr, sizeof(in64_phdr));
+        endianadjust_phdr64(&in64_phdr);
+
+        (void)lseek(infd, offset, SEEK_SET);
+
+        note_sz = in64_phdr.p_memsz;
+        note_base = in64_phdr.p_vaddr - note_base;
+
+        if ( in64_phdr.p_offset > dat_siz || offset > in64_phdr.p_offset )
+        {
+            fprintf(stderr, "Expected .note section within .text section!\n" \
+                    "Offset %ld not within %d!\n",
+                    in64_phdr.p_offset, dat_siz);
+            return 1;
+        }
+        /* Gets us the absolute offset within the .text section. */
+        offset = in64_phdr.p_offset - offset;
+    }
+
     /*
      * End the image on a page boundary. This gets round alignment bugs
      * in the boot- or chain-loader (e.g., kexec on the XenoBoot CD).
@@ -322,6 +382,31 @@ int main(int argc, char **argv)
     out_shdr[1].sh_size   = dat_siz;
     out_shdr[2].sh_offset = RAW_OFFSET + dat_siz + sizeof(out_shdr);
 
+    if ( num_phdrs > 1 )
+    {
+        /* We have two of them! */
+        out_ehdr.e_phnum = num_phdrs;
+        /* Extra .note section. */
+        out_ehdr.e_shnum++;
+
+        /* Fill out the PT_NOTE program header. */
+        note_phdr.p_vaddr   = note_base;
+        note_phdr.p_paddr   = note_base;
+        note_phdr.p_filesz  = note_sz;
+        note_phdr.p_memsz   = note_sz;
+        note_phdr.p_offset  = offset;
+
+        /* Tack on the .note\0 */
+        out_shdr[2].sh_size += sizeof(out_shstrtab_extra);
+        /* And move it past the .note section. */
+        out_shdr[2].sh_offset += sizeof(out_shdr_extra);
+
+        /* Fill out the .note section. */
+        out_shdr_extra.sh_size = note_sz;
+        out_shdr_extra.sh_addr = note_base;
+        out_shdr_extra.sh_offset = RAW_OFFSET + offset;
+    }
+
     outfd = open(outimage, O_WRONLY|O_CREAT|O_TRUNC, 0775);
     if ( outfd == -1 )
     {
@@ -335,8 +420,15 @@ int main(int argc, char **argv)
 
     endianadjust_phdr32(&out_phdr);
     do_write(outfd, &out_phdr, sizeof(out_phdr));
-    
-    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr)) < 0 )
+
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_phdr32(&note_phdr);
+        do_write(outfd, &note_phdr, sizeof(note_phdr));
+    }
+
+    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr) -
+          ( num_phdrs > 1 ? sizeof(note_phdr) : 0 ) ) < 0 )
     {
         fprintf(stderr, "Header overflow.\n");
         return 1;
@@ -355,9 +447,22 @@ int main(int argc, char **argv)
         endianadjust_shdr32(&out_shdr[i]);
     do_write(outfd, &out_shdr[0], sizeof(out_shdr));
 
-    do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
-    do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
-
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_shdr32(&out_shdr_extra);
+        /* Append the .note section. */
+        do_write(outfd, &out_shdr_extra, sizeof(out_shdr_extra));
+        /* The normal strings - .text\0.. */
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        /* Our .note */
+        do_write(outfd, out_shstrtab_extra, sizeof(out_shstrtab_extra));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+sizeof(out_shstrtab_extra)+dat_siz)&3));
+    }
+    else
+    {
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
+    }
     close(infd);
     close(outfd);
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 961f48f..705fa98 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -31,6 +31,9 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
+#if defined(BUILD_ID) && !defined(EFI)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -75,6 +78,11 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+#if defined(BUILD_ID) && defined(EFI)
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+#endif
 
        . = ALIGN(8);
        /* Exception table */
@@ -96,6 +104,21 @@ SECTIONS
        _erodata = .;
   } :text
 
+#if defined(BUILD_ID) && !defined(EFI)
+/*
+ * No mechanism to put a PT_NOTE in the EFI file - so put
+ * it in .data section.
+ */
+  . = ALIGN(4);
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
+  } :note :text
+#endif
+
 #ifdef EFI
   . = ALIGN(MB(2));
 #endif
diff --git a/xen/common/version.c b/xen/common/version.c
index fc9bf42..af87371 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -1,5 +1,9 @@
 #include <xen/compile.h>
+#include <xen/errno.h>
+#include <xen/string.h>
+#include <xen/types.h>
 #include <xen/version.h>
+#include <xen/elf.h>
 
 const char *xen_compile_date(void)
 {
@@ -61,6 +65,53 @@ const char *xen_deny(void)
     return "<denied>";
 }
 
+#ifdef BUILD_ID
+#define NT_GNU_BUILD_ID 3
+/* Defined in linker script. */
+extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
+
+int xen_build_id(const void **p, ssize_t *len)
+{
+    const Elf_Note *n = __note_gnu_build_id_start;
+    static bool_t checked = 0;
+
+    if ( checked )
+    {
+        *len = n->descsz;
+        *p = ELFNOTE_DESC(n);
+        return 0;
+    }
+    /* --build-id invoked with wrong parameters. */
+    if ( __note_gnu_build_id_end <= __note_gnu_build_id_start )
+        return -ENODATA;
+
+    /* Check for full Note header. */
+    if ( &n[1] > __note_gnu_build_id_end )
+        return -ENODATA;
+
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    *len = n->descsz;
+    *p = ELFNOTE_DESC(n);
+
+    checked = 1;
+    return 0;
+}
+
+#else
+
+int xen_build_id(const void **p, ssize_t *len)
+{
+    return -ENODATA;
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 016a56c..a461c85 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -13,4 +13,7 @@ const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
 const char *xen_deny(void);
+#include <xen/types.h>
+int xen_build_id(const void **p, ssize_t *len);
+
 #endif /* __XEN_VERSION_H__ */
-- 
2.5.0



* [PATCH v4 26/34] HYPERCALL_version_op: Add VERSION_OP_build_id to retrieve build-id.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (24 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 25/34] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id Konrad Rzeszutek Wilk
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Daniel De Graaf, Stefano Stabellini, Ian Jackson,
	Konrad Rzeszutek Wilk

The VERSION_OP hypercall can report the size of the build-id,
so callers can allocate a buffer of the proper size before
trying to retrieve it. It can also copy the hypervisor build-id
into the provided buffer in a single call.
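
The probe-then-fetch pattern described above can be sketched as
follows. This is only an illustration under stated assumptions:
fake_build_id_op() is a hypothetical stand-in for the hypercall, and
its convention (a NULL buffer returns the required size) is assumed
here rather than taken from the interface definition.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in build-id so the sketch runs without a hypervisor. */
static const unsigned char fake_id[] = { 0xde, 0xad, 0xbe, 0xef };

/*
 * Hypothetical stand-in for the hypercall: with a NULL buffer it
 * reports the size needed, otherwise it copies the id and returns the
 * number of bytes written. Negative values are -errno style errors.
 */
static long fake_build_id_op(void *buf, size_t len)
{
    if ( buf == NULL )
        return (long)sizeof(fake_id);
    if ( len < sizeof(fake_id) )
        return -ENOBUFS;
    memcpy(buf, fake_id, sizeof(fake_id));
    return (long)sizeof(fake_id);
}

/* Probe for the size, allocate, then fetch. Caller frees *out. */
static long fetch_build_id(unsigned char **out)
{
    long sz = fake_build_id_op(NULL, 0);
    long rc;

    *out = NULL;
    if ( sz <= 0 )
        return sz;
    *out = malloc((size_t)sz);
    if ( *out == NULL )
        return -ENOMEM;
    rc = fake_build_id_op(*out, (size_t)sz);
    if ( rc < 0 )
    {
        free(*out);
        *out = NULL;
    }
    return rc;
}
```

Probing first avoids guessing a buffer size up front, at the cost of
one extra call.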

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 tools/flask/policy/policy/modules/xen/xen.te |  4 ++--
 xen/common/kernel.c                          | 14 ++++++++++++++
 xen/include/public/version.h                 |  3 +++
 xen/xsm/flask/hooks.c                        |  3 +++
 xen/xsm/flask/policy/access_vectors          |  2 ++
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index bac0c9e..e5eb666 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -82,7 +82,7 @@ allow dom0_t xen_t:version {
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
     xen_guest_handle xen_commandline
     version extraversion capabilities changeset platform_parameters
-    get_features pagesize guest_handle commandline
+    get_features pagesize guest_handle commandline build_id
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -150,7 +150,7 @@ if (guest_writeconsole) {
 allow domain_type xen_t:xen2 pmu_use;
 
 # For normal guests all except XENVER_commandline, VERSION_OP_changeset,
-# and VERSION_OP_commandline
+# VERSION_OP_commandline, and VERSION_OP_build_id
 allow domain_type xen_t:version {
     xen_version xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index f06b3d9..96d08ed 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -390,6 +390,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -EFAULT;
         return 0;
     }
+
     }
 
     return -ENOSYS;
@@ -455,6 +456,13 @@ static int size_of_subops_data(unsigned int cmd, ssize_t *sz)
         *sz = ARRAY_SIZE(saved_cmdline);
         break;
 
+    case XEN_VERSION_OP_build_id:
+    {
+        const void *p;
+        rc = xen_build_id(&p, sz);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
     }
@@ -549,6 +557,12 @@ DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
         ptr = saved_cmdline;
         break;
 
+    case XEN_VERSION_OP_build_id:
+    {
+        rc = xen_build_id(&ptr, &sz);
+        break;
+    }
+
     default:
         rc = -ENOSYS;
     }
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 4ceb97b..ca0ffca 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -157,6 +157,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
 /* arg = version_op_buf */
 #define XEN_VERSION_OP_commandline 9
 
+/* arg = version_op_buf */
+#define XEN_VERSION_OP_build_id 10
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index fb5cc4a..29debc4 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1701,6 +1701,9 @@ static int flask_version_op (uint32_t op)
     case XEN_VERSION_OP_commandline:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__COMMANDLINE, NULL);
+    case XEN_VERSION_OP_build_id:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__BUILD_ID, NULL);
     default:
         return -EPERM;
     }
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index a227f88..5ff47c2 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -544,4 +544,6 @@ class version
     guest_handle
 # Xen command line.
     commandline
+# Build id of the hypervisor
+    build_id
 }
-- 
2.5.0



* [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (25 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 26/34] HYPERCALL_version_op: Add VERSION_OP_build_id to retrieve build-id Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 18:12   ` Wei Liu
  2016-03-15 17:56 ` [PATCH v4 28/34] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
                   ` (6 subsequent siblings)
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Konrad Rzeszutek Wilk

If the hypervisor is built with a build-id, we will display it.
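
The build-id is raw bytes, so it has to be rendered as hex before it
can be printed. A minimal sketch of that conversion, assuming a
hypothetical helper build_id_to_hex() that uses plain calloc() rather
than libxl's allocator:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: render len raw bytes as a NUL-terminated
 * lowercase hex string. Caller frees the result; returns NULL on
 * allocation failure.
 */
static char *build_id_to_hex(const unsigned char *id, size_t len)
{
    char *hex = calloc(len * 2 + 1, 1);
    size_t i;

    if ( hex == NULL )
        return NULL;
    /* Two hex digits per byte; snprintf also writes the trailing NUL. */
    for ( i = 0; i < len; i++ )
        snprintf(&hex[i * 2], 3, "%02x", id[i]);
    return hex;
}
```

Taking the bytes as unsigned char keeps values above 0x7f from being
sign-extended when they are passed to snprintf().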

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Include HAVE_*, use libxl_zalloc, s/rc/ret/
v3: Retry with different size if 1020 is not enough.
v4: Use VERSION_OP subops instead of the XENVER_ subops
---
 tools/libxl/libxl.c         | 19 +++++++++++++++++--
 tools/libxl/libxl.h         |  5 +++++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c    |  1 +
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index dc660b7..f20b926 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5215,6 +5215,7 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
     GC_INIT(ctx);
     char *buf;
     xen_version_op_val_t val = 0;
+    int r;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
@@ -5256,8 +5257,22 @@ const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
         goto out;
     info->virt_start = val;
 
-    (void)xc_version_wrapper(ctx, XEN_VERSION_OP_commandline, buf,
-                             info->pagesize, &info->commandline);
+    if (xc_version_wrapper(ctx, XEN_VERSION_OP_commandline, buf,
+                           info->pagesize, &info->commandline) < 0)
+        goto out;
+
+    r = xc_version(ctx->xch, XEN_VERSION_OP_build_id, buf, info->pagesize);
+    if (r < 0)
+        info->build_id = libxl__strdup(NOGC, "");
+    else if (r > 0)
+    {
+        unsigned int i;
+
+        info->build_id = libxl__zalloc(NOGC, (r * 2) + 1);
+
+        for (i = 0; i < r; i++)
+            snprintf(&info->build_id[i * 2], 3, "%02hhx", buf[i]);
+    }
  out:
     GC_FREE;
     return info;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index f9e3ef5..18258ef 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -218,6 +218,11 @@
 #define LIBXL_HAVE_SOFT_RESET 1
 
 /*
+ * LIBXL_HAVE_BUILD_ID means that libxl_version_info has the extra
+ * field for the hypervisor build_id.
+ */
+#define LIBXL_HAVE_BUILD_ID 1
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 632c009..9e2ef1a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -362,6 +362,7 @@ libxl_version_info = Struct("version_info", [
     ("virt_start",        uint64),
     ("pagesize",          integer),
     ("commandline",       string),
+    ("build_id",          string),
     ], dir=DIR_OUT)
 
 libxl_domain_create_info = Struct("domain_create_info",[
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 990d3c9..f90a50c 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -5566,6 +5566,7 @@ static void output_xeninfo(void)
     printf("cc_compile_by          : %s\n", info->compile_by);
     printf("cc_compile_domain      : %s\n", info->compile_domain);
     printf("cc_compile_date        : %s\n", info->compile_date);
+    printf("build_id               : %s\n", info->build_id);
 
     return;
 }
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* [PATCH v4 28/34] xsplice: Print build_id in keyhandler and on bootup.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (26 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 29/34] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

As it should be a useful debug mechanism.
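
For readers unfamiliar with the `%*phN` format used below: assuming it follows the Linux semantics, it prints the buffer as contiguous lower-case hex with no separators. A minimal userspace sketch of that formatting follows; `build_id_to_hex` is a hypothetical helper for illustration and is not part of Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Xen): write the bytes of 'id' as
 * contiguous lower-case hex into 'out', which must have room for
 * 2 * len + 1 bytes (two hex digits per byte plus the terminator).
 */
static void build_id_to_hex(const unsigned char *id, size_t len, char *out)
{
    size_t i;

    assert(id != NULL && out != NULL);

    for ( i = 0; i < len; i++ )
        snprintf(&out[i * 2], 3, "%02x", (unsigned int)id[i]);

    /* Terminate explicitly so that len == 0 yields an empty string. */
    out[len * 2] = '\0';
}
```

A 20-byte SHA1 build-id therefore needs a 41-byte output buffer.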

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: s/char */const void *
---
 xen/common/xsplice.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index e5867bd..effae86 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -16,6 +16,7 @@
 #include <xen/spinlock.h>
 #include <xen/string.h>
 #include <xen/symbols.h>
+#include <xen/version.h>
 #include <xen/vmap.h>
 #include <xen/wait.h>
 #include <xen/xsplice_elf.h>
@@ -1295,8 +1296,13 @@ static const char *state2str(uint32_t state)
 static void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    const void *binary_id = NULL;
+    ssize_t len = 0;
     unsigned int i;
 
+    if ( !xen_build_id(&binary_id, &len) )
+        printk("build-id: %*phN\n", (int)len, binary_id);
+
     spin_lock_recursive(&payload_lock);
 
     list_for_each_entry ( data, &payload_list, list )
@@ -1319,8 +1325,14 @@ static void xsplice_printall(unsigned char key)
 
 static int __init xsplice_init(void)
 {
+    const void *binary_id = NULL;
+    ssize_t len = 0;
+
     BUILD_BUG_ON( sizeof(struct xsplice_patch_func) != 64 );
 
+    if ( !xen_build_id(&binary_id, &len) )
+        printk(XENLOG_INFO "%s: build-id: %*phN\n", XSPLICE, (int)len, binary_id);
+
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
     arch_xsplice_register_find_space(&find_hole);
     return 0;
-- 
2.5.0



* [PATCH v4 29/34] xsplice: Stacking build-id dependency checking.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (27 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 28/34] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 30/34] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

We now expect ELF payloads to be built with
--build-id.

Also the .xsplice.depends section has to have the contents
of the hypervisor (or a preceding payload) build-id.

We already have the code to verify the Elf_Note build-id
so export parts of it.
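
As a rough illustration of what such a check involves, here is a userspace sketch that validates a GNU build-id note in a raw buffer. It is not the Xen implementation: `struct note_hdr` and `build_id_from_note` are hypothetical names, the buffer is assumed to be in host byte order, and the name comparison is stricter than the `strncmp` used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3

/* ELF note header; the name and descriptor follow, each padded to 4 bytes. */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/*
 * Hypothetical helper: check that 'buf' (of 'size' bytes) starts with a
 * GNU build-id note and hand back its descriptor (the build-id bytes).
 * Returns 0 on success and -1 on any malformed or unexpected input.
 */
static int build_id_from_note(const unsigned char *buf, size_t size,
                              const unsigned char **desc, size_t *len)
{
    struct note_hdr hdr;
    const size_t name_sz = 4; /* "GNU" plus NUL, already 4-byte aligned. */

    assert(buf != NULL && desc != NULL && len != NULL);

    if ( size < sizeof(hdr) )
        return -1;
    /* Copy the header out so that 'buf' need not be aligned. */
    memcpy(&hdr, buf, sizeof(hdr));

    if ( hdr.type != NT_GNU_BUILD_ID || hdr.namesz != name_sz )
        return -1;

    if ( size - sizeof(hdr) < name_sz ||
         memcmp(buf + sizeof(hdr), "GNU", name_sz) != 0 )
        return -1;

    /* The descriptor must be non-empty and fit inside the buffer. */
    if ( hdr.descsz == 0 || hdr.descsz > size - sizeof(hdr) - name_sz )
        return -1;

    *desc = buf + sizeof(hdr) + name_sz;
    *len = hdr.descsz;
    return 0;
}
```

The same routine would apply to both the payload's own `.note.gnu.build-id` and its `.xsplice.depends` section, since both carry the ELF Note layout.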

This dependency means the hypervisor MUST be compiled with
--build-id - so we gate the build of xSplice on the availability
of said functionality.

This does not impact the order in which the payloads can
be loaded, but it does enforce a STRICT ordering when the
payloads are applied. REPLACE is special - we need to
check its dependency against the hypervisor, not against
the last applied patch.

To make this easier to test we also add an extra test-case
to be used - which can only be applied on top of the
xen_hello_world payload.

As in, one can apply xen_hello_world and then xen_bye_world
on top of that. Not the other way.
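
The stacking rule can be modelled outside the hypervisor as follows. This is only a sketch under simplifying assumptions: the applied payloads live in a fixed-size array rather than a linked list, and `struct patch`, `struct patch_stack`, `stack_apply` and `stack_revert` are hypothetical names, not the ones used in xsplice.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct build_id {
    const void *p;
    size_t len;
};

struct patch {
    const char *name;
    struct build_id id;  /* Build-id of the payload itself. */
    struct build_id dep; /* Build-id the payload depends on. */
};

#define MAX_APPLIED 8

struct patch_stack {
    struct build_id hypervisor;
    const struct patch *applied[MAX_APPLIED];
    size_t depth;
};

static int id_equal(const struct build_id *a, const struct build_id *b)
{
    return a->len == b->len && memcmp(a->p, b->p, a->len) == 0;
}

/*
 * Apply: the dependency must match the build-id of the last applied
 * payload, or that of the hypervisor if nothing is applied yet.
 */
static int stack_apply(struct patch_stack *s, const struct patch *p)
{
    const struct build_id *top;

    assert(s != NULL && p != NULL);

    if ( s->depth == MAX_APPLIED )
        return -1;

    top = s->depth ? &s->applied[s->depth - 1]->id : &s->hypervisor;
    if ( !id_equal(&p->dep, top) )
        return -1;

    s->applied[s->depth++] = p;
    return 0;
}

/* Revert: only the most recently applied payload may be removed. */
static int stack_revert(struct patch_stack *s, const struct patch *p)
{
    assert(s != NULL && p != NULL);

    if ( !s->depth || s->applied[s->depth - 1] != p )
        return -1;

    s->depth--;
    return 0;
}
```

With xen_hello_world depending on the hypervisor and xen_bye_world depending on xen_hello_world, applying in the other order fails, and xen_hello_world cannot be reverted while xen_bye_world sits on top of it.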

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Andrew fix against the build_id.o mutilations.
v3: Andrew fix to not include extra symbols in binary.id
---
 Config.mk                              |   1 +
 docs/misc/xsplice.markdown             |  84 ++++++++++++++++++---------
 xen/arch/x86/test/Makefile             |  41 +++++++++++--
 xen/arch/x86/test/xen_bye_world.c      |  35 +++++++++++
 xen/arch/x86/test/xen_bye_world_func.c |  25 ++++++++
 xen/common/Kconfig                     |   5 ++
 xen/common/version.c                   |  40 +++++++++----
 xen/common/xsplice.c                   | 102 ++++++++++++++++++++++++++++++++-
 xen/include/xen/version.h              |   3 +
 xen/include/xen/xsplice.h              |   5 ++
 10 files changed, 297 insertions(+), 44 deletions(-)
 create mode 100644 xen/arch/x86/test/xen_bye_world.c
 create mode 100644 xen/arch/x86/test/xen_bye_world_func.c

diff --git a/Config.mk b/Config.mk
index c8e89fe..80629e3 100644
--- a/Config.mk
+++ b/Config.mk
@@ -134,6 +134,7 @@ ifeq ($(call ld-ver-build-id,$(LD)),n)
 build_id_linker :=
 else
 CFLAGS += -DBUILD_ID
+export XEN_HAS_BUILD_ID=y
 build_id_linker := --build-id=sha1
 endif
 
diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index d1f2a5b..c453b67 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -283,8 +283,7 @@ The xSplice core code loads the payload as a standard ELF binary, relocates it
 and handles the architecture-specifc sections as needed. This process is much
 like what the Linux kernel module loader does.
 
-The payload contains a section (xsplice_patch_func) with an array of structures
-describing the functions to be patched:
+The payload contains at least three sections:
 It optionally may contain the address of functions to be called right before
 being applied and after being reverted:
 
@@ -292,6 +291,15 @@ being applied and after being reverted:
  * `.xsplice.hooks.unload` - an array of function pointers.
 
 
+ * `.xsplice.funcs` - which is an array of xsplice_patch_func structures.
+ * `.xsplice.depends` - which is an ELF Note that describes what the payload
+    depends on.
+ *  `.note.gnu.build-id` - the build-id of this payload.
+
+### .xsplice.funcs
+
+The `.xsplice.funcs` contains an array of xsplice_patch_func structures
+which describe the functions to be patched:
 <pre>
 struct xsplice_patch_func {  
     const char *name;  
@@ -333,7 +341,7 @@ When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
 and the core code copies the data from the undo buffer (private internal copy)
 to `old_addr`.
 
-### Example
+### Example of .xsplice.funcs
 
 A simple example of what a payload file can be:
 
@@ -379,10 +387,29 @@ Each entry in this array is eight bytes.
 
 The type definition of the function are as follow:
 
+
 <pre>
 typedef void (*xsplice_loadcall_t)(void);  
 typedef void (*xsplice_unloadcall_t)(void);   
 </pre>
+
+### .xsplice.depends and .note.gnu.build-id
+
+To support dependency checking and safe loading (to load the
+appropriate payload against the right hypervisor) there is a need
+to embed a build-id dependency.
+
+This is done by the payload containing a section `.xsplice.depends`
+which follows the format of an ELF Note. The contents of this
+(name, and description) are specific to the linker utilized to
+build the hypervisor and payload.
+
+If the GNU linker is used then the name is `GNU` and the description
+is an NT_GNU_BUILD_ID type ID. The description can be a SHA1
+checksum, MD5 checksum or any unique value.
+
+The size of these structures varies with the --build-id linker option.
+
 ## Hypercalls
 
 We will employ the sub operations of the system management hypercall (sysctl).
@@ -867,30 +894,6 @@ This is implemented in the Xen Project hypervisor.
 
 Only the privileged domain should be allowed to do this operation.
 
-
-# Not Yet Done
-
-This is for further development of xSplice.
-
-## Goals
-
-The implementation must also have a mechanism for:
-
- *  An dependency mechanism for the payloads. To use that information to load:
-    - The appropiate payload. To verify that payload is built against the
-      hypervisor. This can be done via the `build-id`
-      or via providing an copy of the old code - so that the hypervisor can
-       verify it against the code in memory.
-    - To construct an appropiate order of payloads to load in case they
-      depend on each other.
- * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
- * Be able to patch .rodata, .bss, and .data sections.
- * Further safety checks (blacklist of which functions cannot be patched, check
-   the stack, make sure the payload is built with same compiler as hypervisor,
-   and NMI/MCE handlers and do_nmi for right now - until an safe solution is found).
- * NOP out the code sequence if `new_size` is zero.
- * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64] in payload file.
-
 ### xSplice interdependencies
 
 xSplice patches interdependencies are tricky.
@@ -917,6 +920,33 @@ being loaded and requires an hypervisor build-id to match against.
 The old code allows much more flexibility and an additional guard,
 but is more complex to implement.
 
+The second option, which requires a build-id of the hypervisor,
+is implemented in the Xen Project hypervisor.
+
+Specifically each payload has two build-id ELF notes:
+ * The build-id of the payload itself (generated via --build-id).
+ * The build-id of the payload it depends on (extracted from the
+   previous payload or hypervisor during build time).
+
+This means that the very first payload depends on the hypervisor
+build-id.
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## Goals
+
+The implementation must also have a mechanism for:
+
+ * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, make sure the payload is built with same compiler as hypervisor,
+   and NMI/MCE handlers and do_nmi for right now - until an safe solution is found).
+ * NOP out the code sequence if `new_size` is zero.
+ * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64] in payload file.
+
 ### Handle inlined __LINE__
 
 This problem is related to hotpatch construction
diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index 45df301..a84c609 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -7,13 +7,16 @@ CODE_SZ=$(shell nm --defined -S $(1) | grep $(2) | awk '{ print "0x"$$2}')
 ifdef CONFIG_XSPLICE
 
 XSPLICE := xen_hello_world.xsplice
+XSPLICE_BYE := xen_bye_world.xsplice
 
 default: xsplice
 
 install: xsplice
 	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	$(INSTALL_DATA) $(XSPLICE_BYE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 uninstall:
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
 else
 default:
 install:
@@ -22,7 +25,7 @@ endif
 
 .PHONY: clean
 clean::
-	rm -f *.o .*.o.d $(XSPLICE) config.h
+	rm -f *.o .*.o.d $(XSPLICE) config.h build_id.o
 
 #
 # To compute these values we need the binary files: xen-syms
@@ -34,15 +37,45 @@ clean::
 .PHONY: config.h
 config.h: OLD_CODE_SZ=$(call CODE_SZ,$(BASEDIR)/xen-syms,xen_extra_version)
 config.h: NEW_CODE_SZ=$(call CODE_SZ,$<,xen_hello_world)
-config.h: xen_hello_world_func.o
+config.h: xen_hello_world_func.o xen_bye_world_func.o
 	(set -e; \
 	 echo "#define NEW_CODE_SZ $(NEW_CODE_SZ)"; \
 	 echo "#define OLD_CODE_SZ $(OLD_CODE_SZ)") > $@
 
+#
+# This target is only accessible if CONFIG_XSPLICE is defined, which
+# depends on $(build_id_linker) being available. Hence we do not
+# need any checks.
+#
+.PHONY: build_id.o
+build_id.o:
+	$(OBJCOPY) -O binary --only-section=.note $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
+
+#
+# Extract the build-id of the xen_hello_world.xsplice
+# (which xen_bye_world will depend on).
+#
+.PHONY: hello_world_build_id.o
+hello_world_build_id.o:
+	$(OBJCOPY) -O binary --only-section=.note.gnu.build-id $(XSPLICE) $@.bin
+	$(OBJCOPY)  -I binary -O elf64-x86-64 -B i386:x86-64 \
+		   --rename-section=.data=.xsplice.depends -S $@.bin $@
+	rm -f $@.bin
+
 .PHONY: xsplice
-xsplice: config.h
+xsplice: config.h build_id.o
 	# Need to have these done in sequential order
 	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world_func.o
 	$(MAKE) -f $(BASEDIR)/Rules.mk xen_hello_world.o
-	$(LD) $(LDFLAGS) -r -o $(XSPLICE) xen_hello_world_func.o xen_hello_world.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE) \
+		xen_hello_world_func.o xen_hello_world.o build_id.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_bye_world_func.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_bye_world.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk hello_world_build_id.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
+		xen_bye_world_func.o xen_bye_world.o hello_world_build_id.o
 
diff --git a/xen/arch/x86/test/xen_bye_world.c b/xen/arch/x86/test/xen_bye_world.c
new file mode 100644
index 0000000..d26641c
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world.c
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/xsplice_patch.h>
+#include <xen/xsplice.h>
+#include "config.h"
+#include <xen/lib.h>
+
+static char xen_bye_world_name[] = "xen_bye_world";
+extern const char *xen_bye_world(void);
+
+/* External symbol. */
+extern const char *xen_extra_version(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_bye_world = {
+    .name = xen_bye_world_name,
+    .new_addr = (unsigned long)(xen_bye_world),
+    .old_addr = (unsigned long)(xen_extra_version),
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_bye_world_func.c b/xen/arch/x86/test/xen_bye_world_func.c
new file mode 100644
index 0000000..574268c
--- /dev/null
+++ b/xen/arch/x86/test/xen_bye_world_func.c
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <asm/nops.h>
+#include <asm/alternative.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_bye_world(void)
+{
+    return "Bye World!";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index dbe9ccc..d153c4a 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -54,6 +54,10 @@ config HAS_GDBSX
 config HAS_IOPORTS
 	bool
 
+config HAS_BUILD_ID
+	string
+	option env="XEN_HAS_BUILD_ID"
+
 # Enable/Disable kexec support
 config KEXEC
 	bool "kexec support"
@@ -172,6 +176,7 @@ endmenu
 config XSPLICE
 	bool "xSplice live patching support"
 	default y
+	depends on HAS_BUILD_ID = "y"
 	---help---
 	  Allows a running Xen hypervisor to be dynamically patched using
 	  binary patches without rebooting. This is primarily used to binarily
diff --git a/xen/common/version.c b/xen/common/version.c
index af87371..50906b1 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -70,10 +70,29 @@ const char *xen_deny(void)
 /* Defined in linker script. */
 extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
 
+int xen_build_id_check(const Elf_Note *n, const void **p, ssize_t *len)
+{
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    if ( len )
+        *len = n->descsz;
+    if ( p )
+        *p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+
 int xen_build_id(const void **p, ssize_t *len)
 {
     const Elf_Note *n = __note_gnu_build_id_start;
     static bool_t checked = 0;
+    int rc;
 
     if ( checked )
     {
@@ -89,23 +108,20 @@ int xen_build_id(const void **p, ssize_t *len)
     if ( &n[1] > __note_gnu_build_id_end )
         return -ENODATA;
 
-    /* Check if we really have a build-id. */
-    if ( NT_GNU_BUILD_ID != n->type )
-        return -ENODATA;
-
-    /* Sanity check, name should be "GNU" for ld-generated build-id. */
-    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
-        return -ENODATA;
-
-    *len = n->descsz;
-    *p = ELFNOTE_DESC(n);
+    rc = xen_build_id_check(n, p, len);
+    if ( !rc )
+        checked = 1;
 
-    checked = 1;
-    return 0;
+    return rc;
 }
 
 #else
 
+int xen_build_id_check(const Elf_Note *n, const void **p, ssize_t *len)
+{
+    return -ENODATA;
+}
+
 int xen_build_id(const void **p, ssize_t *len)
 {
     return -ENODATA;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index effae86..e963ccd 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -5,6 +5,7 @@
 
 #include <xen/bug_ex_symbols.h>
 #include <xen/cpu.h>
+#include <xen/elf.h>
 #include <xen/guest_access.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -62,6 +63,8 @@ struct payload {
     xsplice_unloadcall_t *unload_funcs;  /* load and unload of the payload. */
     unsigned int n_load_funcs;           /* Nr of the funcs to load and execute. */
     unsigned int n_unload_funcs;         /* Nr of funcs to call durung unload. */
+    struct xsplice_build_id id;          /* ELFNOTE_DESC(.note.gnu.build-id) of the payload. */
+    struct xsplice_build_id dep;         /* ELFNOTE_DESC(.xsplice.depends). */
     char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */
 };
 
@@ -379,7 +382,9 @@ static int check_special_sections(struct payload *payload,
                                   struct xsplice_elf *elf)
 {
     unsigned int i;
-    static const char *const names[] = { ".xsplice.funcs" };
+    static const char *const names[] = { ".xsplice.funcs" ,
+                                         ".xsplice.depends",
+                                         ".note.gnu.build-id"};
 
     for ( i = 0; i < ARRAY_SIZE(names); i++ )
     {
@@ -398,6 +403,8 @@ static int check_special_sections(struct payload *payload,
     return 0;
 }
 
+#define NT_GNU_BUILD_ID 3
+
 static int prepare_payload(struct payload *payload,
                            struct xsplice_elf *elf)
 {
@@ -405,6 +412,7 @@ static int prepare_payload(struct payload *payload,
     unsigned int i;
     struct xsplice_patch_func *f;
     struct virtual_region *region;
+    Elf_Note *n;
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
     if ( sec )
@@ -480,6 +488,33 @@ static int prepare_payload(struct payload *payload,
         payload->n_unload_funcs = sec->sec->sh_size / (sizeof *payload->unload_funcs);
     }
 
+    sec = xsplice_elf_sec_by_name(elf, ".note.gnu.build-id");
+    if ( sec )
+    {
+        n = (Elf_Note *)sec->load_addr;
+        if ( sec->sec->sh_size <= sizeof *n )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, &payload->id.p, &payload->id.len) )
+            return -EINVAL;
+
+        if ( !payload->id.len || !payload->id.p )
+            return -EINVAL;
+    }
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.depends");
+    {
+        n = (Elf_Note *)sec->load_addr;
+        if ( sec->sec->sh_size <= sizeof *n )
+            return -EINVAL;
+
+        if ( xen_build_id_check(n, &payload->dep.p, &payload->dep.len) )
+            return -EINVAL;
+
+        if ( !payload->dep.len || !payload->dep.p )
+            return -EINVAL;
+    }
+
     /* Setup the virtual region with proper data. */
     region = &payload->bug_ex_region;
     region->skip = ignore_region;
@@ -1175,6 +1210,53 @@ void check_for_xsplice_work(void)
     }
 }
 
+/*
+ * Only allow a dependent payload to be applied on top of the correct
+ * build-id.
+ *
+ * This enforces a stacking order - the first payload MUST be against the
+ * hypervisor. The second against the first payload, and so on.
+ *
+ * Unless the 'ignore' parameter is used - in which case we only
+ * check against the hypervisor.
+ */
+static int build_id_dep(struct payload *payload, bool_t ignore)
+{
+    const void *id = NULL;
+    ssize_t len = 0;
+    int rc;
+    const char *name = "hypervisor";
+
+    ASSERT(payload->dep.len && payload->dep.p);
+
+    /* First time user is against hypervisor. */
+    if ( ignore || list_empty(&applied_list) )
+    {
+        rc = xen_build_id(&id, &len);
+        if ( rc )
+            return rc;
+    }
+    else
+    {
+        /* We should be against the last applied one. */
+        struct payload *data = list_last_entry(&applied_list, struct payload,
+                                               applied_list);
+
+        id = data->id.p;
+        len = data->id.len;
+        name = data->name;
+    }
+
+    if ( payload->dep.len != len ||
+         memcmp(id, payload->dep.p, len) )
+    {
+        dprintk(XENLOG_DEBUG, "%s%s: check against %s build-id failed!\n",
+                XSPLICE, payload->name, name);
+        return -EINVAL;
+    }
+    return 0;
+}
+
 static int xsplice_action(xen_sysctl_xsplice_action_t *action)
 {
     struct payload *data;
@@ -1213,6 +1295,18 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
+            struct payload *p = list_last_entry(&applied_list, struct payload,
+                                                   applied_list);
+
+            ASSERT(p);
+            /* We should be the last applied one. */
+            if ( p != data )
+            {
+                dprintk(XENLOG_DEBUG, "%s%s: can't unload. Top is %s!\n",
+                        XSPLICE, data->name, p->name);
+                rc = -EBUSY;
+                break;
+            }
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1221,6 +1315,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_APPLY:
         if ( (data->state == XSPLICE_STATE_CHECKED) )
         {
+            rc = build_id_dep(data, 0 /* against top one. */);
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
@@ -1229,6 +1326,9 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
+            rc = build_id_dep(data, 1 /* against hypervisor. */);
+            if ( rc )
+                break;
             data->rc = -EAGAIN;
             rc = schedule_work(data, action->cmd, action->timeout);
         }
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index a461c85..bc92370 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -16,4 +16,7 @@ const char *xen_deny(void);
 #include <xen/types.h>
 int xen_build_id(const void **p, ssize_t *len);
 
+#include <xen/elfstructs.h>
+int xen_build_id_check(const Elf_Note *n, const void **p, ssize_t *len);
+
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 16d35b8..82b20ca 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -40,6 +40,11 @@ struct xsplice_symbol {
     bool_t new_symbol;
 };
 
+struct xsplice_build_id {
+   const void *p;
+   ssize_t len;
+};
+
 int xsplice_op(struct xen_sysctl_xsplice_op *);
 void check_for_xsplice_work(void);
 bool_t is_patch(const void *addr);
-- 
2.5.0



* [PATCH v4 30/34] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (28 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 29/34] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 31/34] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Jan Beulich, Konrad Rzeszutek Wilk

With this third payload one can do:

-bash-4.1# xen-xsplice load xen_hello_world.xsplice
Uploading xen_hello_world.xsplice (10148 bytes)
Performing check: completed
Performing apply:. completed

[xen_hello_world depends on hypervisor build-id]
-bash-4.1# xen-xsplice load xen_bye_world.xsplice
Uploading xen_bye_world.xsplice (7076 bytes)
Performing check: completed
Performing apply:. completed
[xen_bye_world depends on xen_hello_world build-id]
-bash-4.1# xen-xsplice upload xen_replace_world xen_replace_world.xsplice
Uploading xen_replace_world.xsplice (7148 bytes)
-bash-4.1# xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                         | APPLIED
xen_bye_world                           | APPLIED
xen_replace_world                       | CHECKED
-bash-4.1# xen-xsplice replace xen_replace_world
Performing replace:. completed
-bash-4.1# xl info | grep extra
xen_extra              : Hello Again World!
-bash-4.1# xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                         | CHECKED
xen_bye_world                           | CHECKED
xen_replace_world                       | APPLIED

which reverts both of the previous payloads and applies
xen_replace_world.

All the magic of this is in the Makefile - we extract
the build-id from the hypervisor (xen-syms) and jam it
in the xen_replace_world as .xsplice.depends.
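
The state changes in the listing above can be modelled with a small sketch. It is an illustration only: build-ids are reduced to strings for brevity, and `struct payload_model` and `replace_model` are hypothetical names rather than anything in xsplice.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum payload_state { CHECKED, APPLIED };

struct payload_model {
    const char *name;
    const char *dep; /* Build-id the payload depends on, as a string. */
    enum payload_state state;
};

/*
 * Model of REPLACE: the dependency of 'target' is compared with the
 * hypervisor build-id only, regardless of what is currently applied.
 * On success every applied payload drops back to CHECKED and 'target'
 * becomes the sole APPLIED payload. Returns 0 on success, -1 otherwise.
 */
static int replace_model(struct payload_model *all, size_t n, size_t target,
                         const char *xen_id)
{
    size_t i;

    assert(all != NULL && xen_id != NULL && target < n);

    if ( all[target].state != CHECKED )
        return -1;

    if ( strcmp(all[target].dep, xen_id) != 0 )
        return -1;

    for ( i = 0; i < n; i++ )
        all[i].state = CHECKED;

    all[target].state = APPLIED;
    return 0;
}
```

A payload built against xen_hello_world rather than the hypervisor would be rejected here, even though a plain apply on top of xen_hello_world would accept it.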

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Make the objcopy use -S to strip the name.
---
---
 xen/arch/x86/test/Makefile                 | 10 +++++++--
 xen/arch/x86/test/xen_replace_world.c      | 35 ++++++++++++++++++++++++++++++
 xen/arch/x86/test/xen_replace_world_func.c | 25 +++++++++++++++++++++
 3 files changed, 68 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/test/xen_replace_world.c
 create mode 100644 xen/arch/x86/test/xen_replace_world_func.c

diff --git a/xen/arch/x86/test/Makefile b/xen/arch/x86/test/Makefile
index a84c609..c9bcb1c 100644
--- a/xen/arch/x86/test/Makefile
+++ b/xen/arch/x86/test/Makefile
@@ -8,15 +8,18 @@ ifdef CONFIG_XSPLICE
 
 XSPLICE := xen_hello_world.xsplice
 XSPLICE_BYE := xen_bye_world.xsplice
+XSPLICE_REPLACE := xen_replace_world.xsplice
 
 default: xsplice
 
 install: xsplice
 	$(INSTALL_DATA) $(XSPLICE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
 	$(INSTALL_DATA) $(XSPLICE_BYE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
+	$(INSTALL_DATA) $(XSPLICE_REPLACE) $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_REPLACE)
 uninstall:
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE)
 	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_BYE)
+	rm -f $(DESTDIR)$(DEBUG_DIR)/$(XSPLICE_REPLACE)
 else
 default:
 install:
@@ -54,7 +57,6 @@ build_id.o:
 		   --rename-section=.data=.xsplice.depends -S $@.bin $@
 	rm -f $@.bin
 
-
 #
 # Extract the build-id of the xen_hello_world.xsplice
 # (which xen_bye_world will depend on).
@@ -78,4 +80,8 @@ xsplice: config.h build_id.o
 	$(MAKE) -f $(BASEDIR)/Rules.mk hello_world_build_id.o
 	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_BYE) \
 		xen_bye_world_func.o xen_bye_world.o hello_world_build_id.o
-
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_replace_world_func.o
+	$(MAKE) -f $(BASEDIR)/Rules.mk xen_replace_world.o
+	$(LD) $(LDFLAGS) $(build_id_linker) -r -o $(XSPLICE_REPLACE) \
+		 xen_replace_world_func.o \
+		 xen_replace_world.o build_id.o
diff --git a/xen/arch/x86/test/xen_replace_world.c b/xen/arch/x86/test/xen_replace_world.c
new file mode 100644
index 0000000..72871f7
--- /dev/null
+++ b/xen/arch/x86/test/xen_replace_world.c
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <xen/xsplice_patch.h>
+#include <xen/xsplice.h>
+#include "config.h"
+#include <xen/lib.h>
+
+static char xen_replace_world_name[] = "xen_replace_world";
+extern const char *xen_replace_world(void);
+
+/* External symbol. */
+extern const char *xen_extra_version(void);
+
+struct xsplice_patch_func __section(".xsplice.funcs") xsplice_xen_replace_world = {
+    .name = xen_replace_world_name,
+    .new_addr = (unsigned long)(xen_replace_world),
+    .old_addr = (unsigned long)(xen_extra_version),
+    .new_size = NEW_CODE_SZ,
+    .old_size = OLD_CODE_SZ,
+};
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/test/xen_replace_world_func.c b/xen/arch/x86/test/xen_replace_world_func.c
new file mode 100644
index 0000000..75ee3b5
--- /dev/null
+++ b/xen/arch/x86/test/xen_replace_world_func.c
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/types.h>
+#include <asm/nops.h>
+#include <asm/alternative.h>
+
+/* Our replacement function for xen_extra_version. */
+const char *xen_replace_world(void)
+{
+    return "Hello Again World!";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0



* [PATCH v4 31/34] xsplice: Print dependency and payloads build_id in the keyhandler.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (29 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 30/34] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 32/34] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/xsplice.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index e963ccd..e8af051 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -1419,6 +1419,11 @@ static void xsplice_printall(unsigned char key)
             if ( !(i % 100) )
                 process_pending_softirqs();
         }
+        if ( data->id.len )
+            printk("build-id=%*phN\n", (int)data->id.len, data->id.p);
+
+        if ( data->dep.len )
+            printk("depend-on=%*phN\n", (int)data->dep.len, data->dep.p);
     }
     spin_unlock_recursive(&payload_lock);
 }
-- 
2.5.0
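
[Editorial sketch, not part of the patch.] The two printk() calls above rely
on the "%*phN" format to dump the build-id bytes as hex. Assuming it behaves
like the Linux format of the same name (contiguous lower-case hex with no
separator), the output can be modelled in plain C as below. The helper name
build_id_to_hex() is hypothetical and does not exist in Xen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render 'len' bytes of 'id' as contiguous lower-case
 * hex into 'out' (NUL-terminated). Returns the number of characters written
 * (excluding the NUL), or -1 if 'out' is too small.
 */
static int build_id_to_hex(const unsigned char *id, size_t len,
                           char *out, size_t out_sz)
{
    size_t i;

    assert(id != NULL && out != NULL);

    /* Two hex digits per byte plus the terminating NUL. */
    if ( out_sz < 2 * len + 1 )
        return -1;

    for ( i = 0; i < len; i++ )
        snprintf(out + 2 * i, 3, "%02x", (unsigned int)id[i]);

    out[2 * len] = '\0';
    return (int)(2 * len);
}
```

A 20-byte SHA-1 build-id would therefore need a 41-byte buffer.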


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 32/34] xsplice: Prevent duplicate payloads from being loaded.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (30 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 31/34] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 33/34] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 xen/common/xsplice.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index e8af051..b745c1b 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -501,6 +501,27 @@ static int prepare_payload(struct payload *payload,
         if ( !payload->id.len || !payload->id.p )
             return -EINVAL;
     }
+    /* Make sure it is not a duplicate. */
+    if ( payload->id.len )
+    {
+        struct payload *data;
+
+        spin_lock_recursive(&payload_lock);
+        list_for_each_entry ( data, &payload_list, list )
+        {
+            /* No way payload is on the list. */
+            ASSERT( data != payload );
+            if ( data->id.len == payload->id.len &&
+                 !memcmp(data->id.p, payload->id.p, data->id.len) )
+            {
+                spin_unlock_recursive(&payload_lock);
+                dprintk(XENLOG_DEBUG, "%s%s: Already loaded as %s!\n",
+                        XSPLICE, elf->name, data->name);
+                return -EEXIST;
+            }
+        }
+        spin_unlock_recursive(&payload_lock);
+    }
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.depends");
     {
-- 
2.5.0
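
[Editorial sketch, not part of the patch.] The duplicate check boils down to
"same length and same bytes". Comparing the lengths first keeps memcmp() from
reading past the end of the shorter build-id and stops a short id from
matching the prefix of a longer one. The struct and function names below are
hypothetical stand-ins for the id fields of struct payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the build-id fields of 'struct payload'. */
struct build_id {
    const unsigned char *p;
    size_t len;
};

/* True only if both ids are non-empty, equally long and byte-identical. */
static bool build_id_equal(const struct build_id *a, const struct build_id *b)
{
    assert(a != NULL && b != NULL);

    if ( !a->len || a->len != b->len )
        return false;

    return !memcmp(a->p, b->p, a->len);
}
```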


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 33/34] xsplice: Add support for shadow variables.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (31 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 32/34] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-15 17:56 ` [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
  33 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

From: Ross Lagerwall <ross.lagerwall@citrix.com>

Shadow variables are a piece of infrastructure to be used by xsplice
modules. They are used to attach a new piece of data to an existing
structure in memory.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Add Copyright
---
---
 xen/common/Makefile             |   1 +
 xen/common/xsplice_shadow.c     | 109 ++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice_patch.h |  36 +++++++++++++
 3 files changed, 146 insertions(+)
 create mode 100644 xen/common/xsplice_shadow.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9b7fac7..200a544 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
 obj-$(CONFIG_XSPLICE) += xsplice.o
 obj-$(CONFIG_XSPLICE) += xsplice_elf.o
+obj-$(CONFIG_XSPLICE) += xsplice_shadow.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_shadow.c b/xen/common/xsplice_shadow.c
new file mode 100644
index 0000000..4196a0a
--- /dev/null
+++ b/xen/common/xsplice_shadow.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2016 Citrix Systems R&D Ltd.
+ */
+
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/spinlock.h>
+#include <xen/xsplice_patch.h>
+
+#define SHADOW_SLOTS 256
+struct hlist_head shadow_tbl[SHADOW_SLOTS];
+static DEFINE_SPINLOCK(shadow_lock);
+
+struct shadow_var {
+    struct hlist_node list;         /* Linked to 'shadow_tbl' */
+    void *data;
+    const void *obj;
+    char var[16];
+};
+
+void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size)
+{
+    struct shadow_var *shadow;
+    unsigned int slot;
+
+    shadow = xmalloc(struct shadow_var);
+    if ( !shadow )
+        return NULL;
+
+    shadow->obj = obj;
+    strlcpy(shadow->var, var, sizeof shadow->var);
+    shadow->data = xmalloc_bytes(size);
+    if ( !shadow->data )
+    {
+        xfree(shadow);
+        return NULL;
+    }
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+    spin_lock(&shadow_lock);
+    hlist_add_head(&shadow->list, &shadow_tbl[slot]);
+    spin_unlock(&shadow_lock);
+
+    return shadow->data;
+}
+
+void xsplice_shadow_free(const void *obj, const char *var)
+{
+    struct shadow_var *entry, *shadow = NULL;
+    unsigned int slot;
+    struct hlist_node *next;
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+
+    spin_lock(&shadow_lock);
+    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
+    {
+        if ( entry->obj == obj &&
+             !strcmp(entry->var, var) )
+        {
+            shadow = entry;
+            break;
+        }
+    }
+    if ( shadow )
+    {
+        hlist_del(&shadow->list);
+        xfree(shadow->data);
+        xfree(shadow);
+    }
+    spin_unlock(&shadow_lock);
+}
+
+void *xsplice_shadow_get(const void *obj, const char *var)
+{
+    struct shadow_var *entry;
+    unsigned int slot;
+    struct hlist_node *next;
+    void *ret = NULL;
+
+    slot = (unsigned long)obj % SHADOW_SLOTS;
+
+    spin_lock(&shadow_lock);
+    hlist_for_each_entry(entry, next, &shadow_tbl[slot], list)
+    {
+        if ( entry->obj == obj &&
+             !strcmp(entry->var, var) )
+        {
+            ret = entry->data;
+            break;
+        }
+    }
+
+    spin_unlock(&shadow_lock);
+    return ret;
+}
+
+static int __init xsplice_shadow_init(void)
+{
+    int i;
+
+    for ( i = 0; i < SHADOW_SLOTS; i++ )
+        INIT_HLIST_HEAD(&shadow_tbl[i]);
+
+    return 0;
+}
+__initcall(xsplice_shadow_init);
diff --git a/xen/include/xen/xsplice_patch.h b/xen/include/xen/xsplice_patch.h
index 19d3f76..e297fe1 100644
--- a/xen/include/xen/xsplice_patch.h
+++ b/xen/include/xen/xsplice_patch.h
@@ -56,4 +56,40 @@ typedef void (*xsplice_unloadcall_t)(void);
 #define XSPLICE_UNLOAD_HOOK(_fn) \
 	xsplice_unloadcall_t __attribute__((weak)) xsplice_unload_data __section(".xsplice.hooks.unload") = _fn;
 
+
+/*
+ * The following definitions are to be used in patches. They are taken
+ * from kpatch.
+ */
+
+/*
+ * xsplice shadow variables
+ *
+ * These functions can be used to add new "shadow" fields to existing data
+ * structures.  For example, to allocate a "newpid" variable associated with an
+ * instance of task_struct, and assign it a value of 1000:
+ *
+ * struct task_struct *tsk = current;
+ * int *newpid;
+ * newpid = xsplice_shadow_alloc(tsk, "newpid", sizeof(int));
+ * if (newpid)
+ * 	*newpid = 1000;
+ *
+ * To retrieve a pointer to the variable:
+ *
+ * struct task_struct *tsk = current;
+ * int *newpid;
+ * newpid = xsplice_shadow_get(tsk, "newpid");
+ * if (newpid)
+ * 	printk("task newpid = %d\n", *newpid); // prints "task newpid = 1000"
+ *
+ * To free it:
+ *
+ * xsplice_shadow_free(tsk, "newpid");
+ */
+
+void *xsplice_shadow_alloc(const void *obj, const char *var, size_t size);
+void xsplice_shadow_free(const void *obj, const char *var);
+void *xsplice_shadow_get(const void *obj, const char *var);
+
 #endif /* __XEN_XSPLICE_PATCH_H__ */
-- 
2.5.0
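
[Editorial sketch, not part of the patch.] For readers who want to experiment
with the shadow variable idea outside the hypervisor, here is a minimal
user-space model. It keeps the same shape (a fixed number of buckets indexed
by the object address, entries matched on object pointer plus name) but it is
an assumption-laden simplification: it uses malloc-family calls instead of
xmalloc, a plain singly linked list instead of hlist, and no locking. The
shadow_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHADOW_SLOTS 256

struct shadow_var {
    struct shadow_var *next;   /* Next entry in the same bucket. */
    const void *obj;           /* Object the shadow data is attached to. */
    void *data;                /* The shadow data itself. */
    char var[16];              /* Name of the shadow variable. */
};

static struct shadow_var *shadow_tbl[SHADOW_SLOTS];

static unsigned int shadow_slot(const void *obj)
{
    return (unsigned int)((uintptr_t)obj % SHADOW_SLOTS);
}

/* Returns the link pointing at the matching entry, or at NULL if none. */
static struct shadow_var **shadow_find(const void *obj, const char *var)
{
    struct shadow_var **pp = &shadow_tbl[shadow_slot(obj)];

    for ( ; *pp; pp = &(*pp)->next )
        if ( (*pp)->obj == obj && !strcmp((*pp)->var, var) )
            break;

    return pp;
}

static void *shadow_alloc(const void *obj, const char *var, size_t size)
{
    unsigned int slot = shadow_slot(obj);
    struct shadow_var *s;

    assert(var != NULL);

    s = calloc(1, sizeof(*s));
    if ( !s )
        return NULL;

    s->data = calloc(1, size);
    if ( !s->data )
    {
        free(s);
        return NULL;
    }

    s->obj = obj;
    /* calloc() zeroed the entry, so the name stays NUL-terminated. */
    strncpy(s->var, var, sizeof(s->var) - 1);

    s->next = shadow_tbl[slot];
    shadow_tbl[slot] = s;

    return s->data;
}

static void *shadow_get(const void *obj, const char *var)
{
    struct shadow_var *s = *shadow_find(obj, var);

    return s ? s->data : NULL;
}

static void shadow_free(const void *obj, const char *var)
{
    struct shadow_var **pp = shadow_find(obj, var);
    struct shadow_var *s = *pp;

    if ( !s )
        return;

    *pp = s->next;
    free(s->data);
    free(s);
}
```

As in the patch, names are stored truncated to 15 characters while lookups
compare against the caller's full string, so in this model a longer name can
be allocated but never found again. Callers would want to keep names short.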


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers.
  2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
                   ` (32 preceding siblings ...)
  2016-03-15 17:56 ` [PATCH v4 33/34] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
@ 2016-03-15 17:56 ` Konrad Rzeszutek Wilk
  2016-03-16 11:10   ` Jan Beulich
  33 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 17:56 UTC (permalink / raw)
  To: xen-devel, ross.lagerwall, konrad, andrew.cooper3, mpohlack, sasha.levin
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, Jan Beulich, Konrad Rzeszutek Wilk

If you have a patch for xSplice send it our way!

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
---
 MAINTAINERS | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 52cc538..dc7a929 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -420,6 +420,16 @@ F:  xen/include/xsm/
 F:  xen/xsm/
 F:  docs/misc/xsm-flask.txt
 
+XSPLICE
+M:  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+M:  Ross Lagerwall <ross.lagerwall@citrix.com>
+S:  Supported
+F:  xen/common/xsplice*
+F:  xen/include/xen/xsplice*
+F:  xen/arch/*/xsplice*
+F:  docs/misc/xsplice.markdown
+F:  tools/misc/xen-xsplice.c
+
 THE REST
 M:	Ian Jackson <ian.jackson@eu.citrix.com>
 M:	Jan Beulich <jbeulich@suse.com>
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities)
  2016-03-15 17:56 ` [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities) Konrad Rzeszutek Wilk
@ 2016-03-15 18:04   ` Andrew Cooper
  2016-03-15 18:08     ` Konrad Rzeszutek Wilk
  2016-03-16 18:11   ` Wei Liu
  1 sibling, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 18:04 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Wei Liu, Ian Jackson, Stefano Stabellini

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> The 'caps' is not used anywhere in there.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities)
  2016-03-15 18:04   ` Andrew Cooper
@ 2016-03-15 18:08     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 18:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, ross.lagerwall, Stefano Stabellini, Ian Jackson,
	mpohlack, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 06:04:16PM +0000, Andrew Cooper wrote:
> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> > The 'caps' is not used anywhere in there.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

And the winner for fastest review time goes to Andrew.

"In less than 60 seconds and he pounces on the patch! He tags it!!! And
off he goes for the next one!"

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-15 17:56 ` [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane Konrad Rzeszutek Wilk
@ 2016-03-15 18:29   ` Andrew Cooper
  2016-03-15 20:19     ` Konrad Rzeszutek Wilk
  2016-03-22 17:51   ` Daniel De Graaf
  1 sibling, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 18:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Julien Grall,
	Stefano Stabellini, Jan Beulich, Keir Fraser, Daniel De Graaf

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> @@ -388,6 +395,188 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      return -ENOSYS;
>  }
>  
> +static const char *capabilities_info(ssize_t *len)
> +{
> +    static xen_capabilities_info_t cached_cap;
> +    static unsigned int cached_cap_len;
> +    static bool_t cached;
> +
> +    if ( cached )

I am surprised that Coverity didn't complain about this being unused...

> +    {
> +        *len = cached_cap_len;
> +        return cached_cap;
> +    }
> +    arch_get_xen_caps(&cached_cap);
> +    cached_cap_len = strlen(cached_cap) + 1;
> +
> +    *len = cached_cap_len;
> +    return cached_cap;

You can turn the logic around as

if ( unlikely(!cached) )
{
    arch_get_xen_caps(&cached_cap);
    cached_cap_len = strlen(cached_cap) + 1;
    cached = 1;
}

and have a single return path.

> +}
> +
> +static int size_of_subops_data(unsigned int cmd, ssize_t *sz)
> +{
> +    int rc = 0;
> +    /* Compute size. */
> +    switch ( cmd )
> +    {
> +    case XEN_VERSION_OP_version:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;
> +
> +    case XEN_VERSION_OP_extraversion:
> +        *sz = strlen(xen_extra_version()) + 1;
> +        break;
> +
> +    case XEN_VERSION_OP_capabilities:
> +        capabilities_info(sz);
> +        break;
> +
> +    case XEN_VERSION_OP_platform_parameters:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;
> +
> +    case XEN_VERSION_OP_changeset:
> +        *sz = strlen(xen_changeset()) + 1;
> +        break;
> +
> +    case XEN_VERSION_OP_get_features:
> +        *sz = sizeof(xen_feature_info_t);
> +        break;
> +
> +    case XEN_VERSION_OP_pagesize:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;
> +
> +    case XEN_VERSION_OP_guest_handle:
> +        *sz = ARRAY_SIZE(current->domain->handle);
> +        break;
> +
> +    case XEN_VERSION_OP_commandline:
> +        *sz = ARRAY_SIZE(saved_cmdline);
> +        break;
> +
> +    default:
> +        rc = -ENOSYS;
> +    }
> +
> +    return rc;
> +}
> +
> +/*
> + * Similar to HYPERVISOR_xen_version but with a sane interface
> + * (has a length, one can probe for the length) and with one less sub-ops:
> + * missing XENVER_compile_info.
> + */
> +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> +               unsigned int len)
> +{
> +    union {
> +        xen_version_op_val_t n;
> +        xen_feature_info_t fi;
> +    } u;

= {}; and you can forgo the explicit memset() below.

> +    ssize_t sz = 0;
> +    const void *ptr = NULL;
> +    int rc = xsm_version_op(XSM_OTHER, cmd);
> +
> +    /* We can safely return -EPERM! */
> +    if ( rc )
> +        return rc;
> +
> +    rc = size_of_subops_data(cmd, &sz);
> +    if ( rc )
> +        return rc;
> +
> +    /* Some of the subops may have no data. */
> +    if ( !sz )
> +        return 0;

Really? I would have thought it would be reasonable to assert that
sz != 0 after the rc != 0 return.

> +    /*
> +     * This hypercall also allows the client to probe. If it provides
> +     * a NULL arg we will return the size of the space it has to
> +     * allocate for the specific sub-op.
> +     */
> +    if ( guest_handle_is_null(arg) )
> +        return sz;
> +
> +    memset(&u, 0, sizeof(u));
> +    /*
> +     * The HYPERVISOR_xen_version differs in that some return the value,
> +     * and some copy it on back on argument. We follow the same rule for all
> +     * sub-ops: return 0 on success, positive value of bytes returned, and
> +     * always copy the result in arg. Yeey sanity!
> +     */
> +
> +    rc = 0;

rc is guaranteed to be 0 at this point.

> +    switch ( cmd )
> +    {
> +    case XEN_VERSION_OP_version:
> +        u.n = (xen_major_version() << 16) | xen_minor_version();
> +        break;
> +
> +    case XEN_VERSION_OP_extraversion:
> +        ptr = xen_extra_version();
> +        break;
> +
> +    case XEN_VERSION_OP_capabilities:
> +        ptr = capabilities_info(&sz);
> +        break;
> +
> +    case XEN_VERSION_OP_platform_parameters:
> +        u.n = HYPERVISOR_VIRT_START;
> +        break;
> +
> +    case XEN_VERSION_OP_changeset:
> +        ptr = xen_changeset();
> +        break;
> +
> +    case XEN_VERSION_OP_get_features:
> +        if ( copy_from_guest(&u.fi, arg, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        rc = get_features(current->domain, &u.fi);
> +        break;
> +
> +    case XEN_VERSION_OP_pagesize:
> +        u.n = PAGE_SIZE;
> +        break;
> +
> +    case XEN_VERSION_OP_guest_handle:
> +        ptr = current->domain->handle;
> +        break;
> +
> +    case XEN_VERSION_OP_commandline:
> +        ptr = saved_cmdline;
> +        break;
> +
> +    default:
> +        rc = -ENOSYS;
> +    }
> +
> +    if ( !rc )
> +    {
> +        ssize_t bytes;
> +
> +        if ( sz > len )
> +            bytes = len;
> +        else
> +            bytes = sz;
> +
> +        if ( copy_to_guest(arg, ptr ? ptr : &u, bytes) )

Can be shortened to ptr ?: &u

> +            rc = -EFAULT;
> +    }
> +    if ( !rc )
> +    {
> +        /*
> +         * We return len (truncate) worth of data even if we fail.
> +         */
> +        if ( sz > len )
> +            rc = -ENOBUFS;

This needs to be in the previous if() clause to avoid overriding -EFAULT
with -ENOBUFS.

> +
> +/*
> + * The HYPERCALL_version_op has a set of sub-ops which mirror the
> + * sub-ops of HYPERCALL_xen_version. However this hypercall differs
> + * radically from the former:
> + *  - It returns the amount of bytes returned.
> + *  - It will return -XEN_EPERM if the guest is not permitted.
> + *  - It will return the requested data in arg.
> + *  - It requires an third argument (len) for the length of the
> + *    arg. Naturally the arg has to fit the requested data otherwise
> + *    -XEN_ENOBUFS is returned.
> + *
> + * It also offers an mechanism to probe for the amount of bytes an
> + * sub-op will require. Having the arg have an NULL pointer will
> + * return the number of bytes requested for the operation. Or an
> + * negative value if an error is encountered.
> + */
> +
> +typedef uint64_t xen_version_op_val_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> +
> +typedef unsigned char xen_version_op_buf_t[];
> +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);

Strictly speaking this should be a void* guest handle, as not all data
returned via this mechanism is unsigned char.

> +
> +/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */
> +#define XEN_VERSION_OP_version      0
> +
> +/* arg == version_op_buf. */
> +#define XEN_VERSION_OP_extraversion 1
> +
> +/* arg == version_op_buf */
> +#define XEN_VERSION_OP_capabilities 3
> +
> +/* arg == version_op_buf */
> +#define XEN_VERSION_OP_changeset 4

Might be worth stating that these return NUL-terminated UTF-8 strings?

~Andrew
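
[Editorial sketch, not part of the thread.] The calling convention under
review (a NULL arg probes for the required size; otherwise up to len bytes
are copied and truncation is reported as -ENOBUFS) can be modelled in user
space as below. This is only an illustration for a single string sub-op:
version_op_model() is a hypothetical name, memcpy() stands in for
copy_to_guest(), and the -EFAULT path is therefore noted only in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical model of one string sub-op. 'data' stands in for the
 * hypervisor-side string, e.g. what xen_extra_version() returns.
 */
static ssize_t version_op_model(const char *data, void *arg, size_t len)
{
    size_t sz, bytes;

    assert(data != NULL);

    sz = strlen(data) + 1;            /* Include the terminating NUL. */

    /* Probe: a NULL arg returns the size the caller must allocate. */
    if ( arg == NULL )
        return (ssize_t)sz;

    bytes = sz > len ? len : sz;

    /*
     * In the hypervisor this is copy_to_guest(); a failure there would
     * return -EFAULT and must not be overridden by -ENOBUFS below.
     */
    memcpy(arg, data, bytes);

    /* The data was copied but truncated: report it. */
    if ( sz > len )
        return -ENOBUFS;

    return (ssize_t)bytes;
}
```

A caller would typically probe first, allocate the returned number of bytes,
and then call again with the real buffer.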

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
  2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
@ 2016-03-15 18:45   ` Andrew Cooper
  2016-03-16 12:31   ` George Dunlap
  2016-03-16 18:11   ` Wei Liu
  2 siblings, 0 replies; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 18:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Ian Jackson,
	Julien Grall, David Scott

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> We change the xen_version libxc code to use the new hypercall.
> Which of course means every user in the code base has to
> be changed over.
>
> It is important to note that the xc_version_op has a different
> return semantic than the previous one. It returns negative
> values on error (like the old one), but it also returns
> an positive value on success (unlike the old one). The positive
> value is the number of bytes copied in.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: David Scott <dave@recoil.org>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>
> v2: Use xc_version_op_val_t instead of uint32 or such
> v3: Make sure to check ret < 0 instead of ret (as it returns the size) -
>     in Ocaml code. Found by Andrew.
> v4: Update comment for xc_version to mention the return the size
> ---
>  tools/libxc/include/xenctrl.h          | 24 ++++++++++-
>  tools/libxc/xc_core.c                  | 35 +++++++--------
>  tools/libxc/xc_dom_boot.c              | 12 +++++-
>  tools/libxc/xc_domain.c                |  3 +-
>  tools/libxc/xc_private.c               | 53 ++++-------------------
>  tools/libxc/xc_private.h               |  7 +--
>  tools/libxc/xc_resume.c                |  3 +-
>  tools/libxc/xc_sr_save.c               |  9 ++--
>  tools/libxc/xg_save_restore.h          |  6 ++-
>  tools/libxl/libxl.c                    | 79 ++++++++++++++++++++++------------
>  tools/ocaml/libs/xc/xenctrl_stubs.c    | 39 +++++++----------
>  tools/python/xen/lowlevel/xc/xc.c      | 30 +++++++------
>  tools/xenstat/libxenstat/src/xenstat.c | 12 +++---
>  tools/xentrace/xenctx.c                |  3 +-
>  14 files changed, 169 insertions(+), 146 deletions(-)
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 150d727..379de30 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1477,7 +1477,29 @@ int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
>  int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
>  int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
>  
> -int xc_version(xc_interface *xch, int cmd, void *arg);
> +/**
> + * This function returns the size of buffer to be allocated for
> + * the cmd. The cmd are XEN_VERSION_OP_*.
> + */

This should return ssize_t as it could also plausibly return -1 and set
errno.

> +int xc_version_len(xc_interface *xch, unsigned int cmd);
> +/**
> + * This function retrieves the information from the version_op hypercall.
> + * The len is the size of the arg buffer. If arg is NULL, will not
> + * perform hypercall - instead will just return the size of arg
> + * buffer that is needed.
> + *
> + * Note that prior to Xen 4.7 this would return 0 for success and
> + * negative value (-1) for error (with the error in errno). In Xen 4.7
> + * and later for success it will return an positive value which is the
> + * number of bytes copied in arg.
> + *
> + * It can also return -1 with various errno values:
> + *  - EPERM - not permitted.
> + *  - ENOBUFS - the len was to short, output in arg truncated.
> + *  - ENOSYS - not implemented.
> + *
> + */
> +int xc_version(xc_interface *xch, unsigned int cmd, void *arg, ssize_t len);

This can get away with taking a size_t len.  I am not sure how much we
care about people trying to claim negative length for *arg.

> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index c40a4e9..23876f0 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1204,34 +1204,40 @@ static PyObject *pyxc_xeninfo(XcObject *self)
>      xen_capabilities_info_t xen_caps;
>      xen_platform_parameters_t p_parms;
>      xen_commandline_t xen_commandline;
> -    long xen_version;
> -    long xen_pagesize;
> +    xen_version_op_val_t xen_version;
> +    xen_version_op_val_t xen_pagesize;
>      char str[128];
>  
> -    xen_version = xc_version(self->xc_handle, XENVER_version, NULL);
> -
> -    if ( xc_version(self->xc_handle, XENVER_extraversion, &xen_extra) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_version, &xen_version,
> +                    sizeof(xen_version)) < 0)

Xen Style.

> --- a/tools/xenstat/libxenstat/src/xenstat.c
> +++ b/tools/xenstat/libxenstat/src/xenstat.c
> @@ -621,20 +621,18 @@ unsigned long long xenstat_network_tdrop(xenstat_network * network)
>  /* Collect Xen version information */
>  static int xenstat_collect_xen_version(xenstat_node * node)
>  {
> -	long vnum = 0;
> +	xen_version_op_val_t vnum = 0;
>  	xen_extraversion_t version;
>  
>  	/* Collect Xen version information if not already collected */
>  	if (node->handle->xen_version[0] == '\0') {
>  		/* Get the Xen version number and extraversion string */
> -		vnum = xc_version(node->handle->xc_handle,
> -			XENVER_version, NULL);
> -
> -		if (vnum < 0)
> +		if (xc_version(node->handle->xc_handle,
> +			           XEN_VERSION_OP_version, &vnum, sizeof(vnum)) < 0 )
>  			return 0;

Curiously, the opposite style bug here.

With these trivial bits fixed, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>, and a Tested-by: for the Ocaml stubs
(eventually).

~Andrew

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
@ 2016-03-15 18:54   ` Andrew Cooper
  2016-03-16 11:49   ` Julien Grall
  2016-03-18 12:40   ` Jan Beulich
  2 siblings, 0 replies; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 18:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini, Jan Beulich

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> So that we have a nice mechansim to figure out the upper
> bounds of bug.frames and also catch compiler errors in case
> one tries to use a higher frame number.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> ---
> Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> ---
>  xen/include/asm-arm/bug.h | 2 ++
>  xen/include/asm-x86/bug.h | 3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
> index ab9e811..4df6b2a 100644
> --- a/xen/include/asm-arm/bug.h
> +++ b/xen/include/asm-arm/bug.h
> @@ -31,6 +31,7 @@ struct bug_frame {
>  #define BUGFRAME_warn   0
>  #define BUGFRAME_bug    1
>  #define BUGFRAME_assert 2
> +#define BUGFRAME_NR     3
>  
>  /* Many versions of GCC doesn't support the asm %c parameter which would
>   * be preferable to this unpleasantness. We use mergeable string
> @@ -39,6 +40,7 @@ struct bug_frame {
>   */
>  #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
>      BUILD_BUG_ON((line) >> 16);                                             \
> +    BUILD_BUG_ON(type >= BUGFRAME_NR);                                      \
>      asm ("1:"BUG_INSTR"\n"                                                  \
>           ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"                \
>           "2:\t.asciz " __stringify(file) "\n"                               \
> diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
> index e868e85..bd17ade 100644
> --- a/xen/include/asm-x86/bug.h
> +++ b/xen/include/asm-x86/bug.h
> @@ -9,7 +9,7 @@
>  #define BUGFRAME_warn   1
>  #define BUGFRAME_bug    2
>  #define BUGFRAME_assert 3
> -
> +#define BUGFRAME_NR     4
>  #ifndef __ASSEMBLY__
>  
>  struct bug_frame {
> @@ -51,6 +51,7 @@ struct bug_frame {
>  
>  #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
>      BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
> +    BUILD_BUG_ON((type) >= (BUGFRAME_NR));                                   \
>      asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)                          \
>                     :: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );            \
>  } while (0)

Please fold this hunk in as well,

diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..5f9032e 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -83,6 +83,11 @@ extern const struct bug_frame __start_bug_frames[],
  * in .rodata
  */
     .macro BUG_FRAME type, line, file_str, second_frame, msg
+
+    .if \type >= BUGFRAME_NR
+         .error "Invalid BUGFRAME index"
+    .endif
+
     .L\@ud: ud2a
 
     .pushsection .rodata.str1, "aMS", @progbits, 1

Which is an equivalent check for ASM bugframes.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
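
[Editorial sketch, not part of the thread.] The C-side check in the quoted
patch is a compile-time bound on the frame type. A stripped-down model,
assuming a C11 compiler, is shown below. BUILD_BUG_ON() here is a simplified
stand-in built on _Static_assert rather than Xen's real macro, BUG_FRAME()
merely counts instead of emitting a bug frame, and frame_count() is a
hypothetical helper.

```c
#include <assert.h>

#define BUGFRAME_warn   1
#define BUGFRAME_bug    2
#define BUGFRAME_assert 3
#define BUGFRAME_NR     4

/* Simplified stand-in: refuses to compile when 'cond' is true. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON failed")

static int frames_emitted[BUGFRAME_NR];

/* Model of BUG_FRAME(): the type must be a compile-time constant < NR. */
#define BUG_FRAME(type) do {                 \
    BUILD_BUG_ON((type) >= BUGFRAME_NR);     \
    frames_emitted[(type)]++;                \
} while ( 0 )

static int frame_count(int type)
{
    assert(type >= 0 && type < BUGFRAME_NR);
    return frames_emitted[type];
}
```

With these definitions BUG_FRAME(BUGFRAME_NR) is rejected at build time,
which is the same effect the .if/.error hunk above gives the assembly macro.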

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 01/34] compat/x86: Remove unncessary #define.
  2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
@ 2016-03-15 18:57   ` Andrew Cooper
  2016-03-16 11:08   ` Jan Beulich
  1 sibling, 0 replies; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 18:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> It is not used.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 17:56 ` [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables Konrad Rzeszutek Wilk
@ 2016-03-15 19:24   ` Andrew Cooper
  2016-03-15 19:34     ` Konrad Rzeszutek Wilk
  2016-03-18 13:07   ` Jan Beulich
  1 sibling, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 19:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Stefano Stabellini, Julien Grall, Keir Fraser, Jan Beulich

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 31d2115..b62c91f 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -16,6 +16,7 @@
>   * GNU General Public License for more details.
>   */
>  
> +#include <xen/bug_ex_symbols.h>

how about just <xen/virtual_region.h> ? It contains more than just
bugframes.

> diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
> new file mode 100644
> index 0000000..77bb72b
> --- /dev/null
> +++ b/xen/common/bug_ex_symbols.c
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#include <xen/bug_ex_symbols.h>
> +#include <xen/config.h>
> +#include <xen/kernel.h>
> +#include <xen/init.h>
> +#include <xen/spinlock.h>
> +
> +extern char __stext[];

There is no such symbol.  _stext comes in via kernel.h

> +
> +struct virtual_region kernel_text = {

How about just "compiled" ? This is more than just .text.

> +    .list = LIST_HEAD_INIT(kernel_text.list),
> +    .start = (unsigned long)_stext,
> +    .end = (unsigned long)_etext,
> +#ifdef CONFIG_X86
> +    .ex = (struct exception_table_entry *)__start___ex_table,
> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> +#endif
> +};
> +
> +/*
> + * The kernel_inittext should only be used when system_state
> + * is booting. Otherwise all accesses should be ignored.
> + */
> +static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
> +{
> +    return (system_state >= SYS_STATE_active);
> +}
> +
> +/*
> + * Becomes irrelevant when __init sections are cleared.
> + */
> +struct virtual_region kernel_inittext  = {
> +    .list = LIST_HEAD_INIT(kernel_inittext.list),
> +    .skip = ignore_if_active,
> +    .start = (unsigned long)_sinittext,
> +    .end = (unsigned long)_einittext,
> +#ifdef CONFIG_X86
> +    /* Even if they are __init their exception entry still gets stuck here. */
> +    .ex = (struct exception_table_entry *)__start___ex_table,
> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> +#endif
> +};

This can live in .init.data and be taken off the linked list in
init_done(), which performs other bits of cleanup relating to .init

> +
> +/*
> + * No locking. Additions are done either at startup (when there is only
> + * one CPU) or when all CPUs are running without IRQs.
> + *
> + * Deletions are big tricky. We MUST make sure all but one CPU
> + * are running cpu_relax().

It should still be possible to lock this properly.  We expect no
contention, at which point acquiring and releasing the locks will always
hit fastpaths, but it will avoid accidental corruption if something goes
wrong.

In each of register or deregister, take the lock, then confirm whether
the current region is in a list or not, by looking at r->list.  With the
single virtual_region_lock held, that can safely avoid repeatedly adding
the region to the region list.
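
The lock-then-check approach described above can be sketched as follows.
This is a minimal, self-contained illustration and not code from the
series: struct region, region_register(), region_unregister() and the
atomic_flag based lock are hypothetical stand-ins for struct
virtual_region, the register/unregister functions and
virtual_region_lock.  It assumes that a region which is not on the list
has its list node pointing at itself (the list_del_init() convention).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the list_head idiom. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct virtual_region. */
struct region {
    struct list_head list;
    unsigned long start, end;
};

static struct list_head region_list = { &region_list, &region_list };

/* Toy spinlock on a C11 atomic flag (stand-in for virtual_region_lock). */
static atomic_flag region_lock = ATOMIC_FLAG_INIT;

static void region_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&region_lock,
                                             memory_order_acquire))
        ;
}

static void region_lock_release(void)
{
    atomic_flag_clear_explicit(&region_lock, memory_order_release);
}

void region_init(struct region *r, unsigned long start, unsigned long end)
{
    r->list.next = r->list.prev = &r->list;  /* self-linked: not on a list */
    r->start = start;
    r->end = end;
}

/* Returns true if r was added, false if it was already registered. */
bool region_register(struct region *r)
{
    bool added = false;

    region_lock_acquire();
    if (r->list.next == &r->list) {          /* confirm it is not listed */
        r->list.next = &region_list;
        r->list.prev = region_list.prev;
        region_list.prev->next = &r->list;
        region_list.prev = &r->list;
        added = true;
    }
    region_lock_release();

    return added;
}

/* Returns true if r was removed, false if it was not registered. */
bool region_unregister(struct region *r)
{
    bool removed = false;

    region_lock_acquire();
    if (r->list.next != &r->list) {          /* confirm it is listed */
        r->list.prev->next = r->list.next;
        r->list.next->prev = r->list.prev;
        r->list.next = r->list.prev = &r->list;
        removed = true;
    }
    region_lock_release();

    return removed;
}

/* Number of registered regions (walks the list under the lock). */
unsigned int region_count(void)
{
    unsigned int n = 0;

    region_lock_acquire();
    for (struct list_head *p = region_list.next; p != &region_list;
         p = p->next)
        n++;
    region_lock_release();

    return n;
}
```

Registering the same region twice is then a no-op rather than a source
of list corruption.  The sketch only serialises writers against each
other; it says nothing about readers that walk the list without the
lock.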

> + *
> + */
> +LIST_HEAD(virtual_region_list);
> +
> +int register_virtual_region(struct virtual_region *r)
> +{
> +    ASSERT(!local_irq_is_enabled());
> +
> +    list_add_tail(&r->list, &virtual_region_list);
> +    return 0;
> +}
> +
> +void unregister_virtual_region(struct virtual_region *r)
> +{
> +    ASSERT(!local_irq_is_enabled());
> +
> +    list_del_init(&r->list);
> +}
> +
> +void __init setup_virtual_regions(void)
> +{
> +    ssize_t sz;
> +    unsigned int i, idx;
> +    static const struct bug_frame *const stop_frames[] = {
> +        __start_bug_frames,
> +        __stop_bug_frames_0,
> +        __stop_bug_frames_1,
> +        __stop_bug_frames_2,
> +#ifdef CONFIG_X86
> +        __stop_bug_frames_3,
> +#endif
> +        NULL
> +    };
> +
> +#ifdef CONFIG_X86
> +    sort_exception_tables();
> +#endif

Any reason why this needs moving out of setup.c ?

> diff --git a/xen/include/xen/bug_ex_symbols.h b/xen/include/xen/bug_ex_symbols.h
> new file mode 100644
> index 0000000..6f3401b
> --- /dev/null
> +++ b/xen/include/xen/bug_ex_symbols.h
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#ifndef __BUG_EX_SYMBOL_LIST__
> +#define __BUG_EX_SYMBOL_LIST__
> +
> +#include <xen/config.h>
> +#include <xen/list.h>
> +#include <xen/symbols.h>
> +
> +#ifdef CONFIG_X86
> +#include <asm/uaccess.h>
> +#endif
> +#include <asm/bug.h>
> +
> +struct virtual_region
> +{
> +    struct list_head list;
> +
> +#define CHECKING_SYMBOL         (1<<1)
> +#define CHECKING_BUG_FRAME      (1<<2)
> +#define CHECKING_EXCEPTION      (1<<3)
> +    /*
> +     * Whether to skip this region for particular searches. The flag
> +     * can be CHECKING_[SYMBOL|BUG_FRAMES|EXCEPTION].
> +     *
> +     * If the function returns 1 this region will be skipped.
> +     */
> +    bool_t (*skip)(unsigned int flag, unsigned long priv);

Why do we need infrastructure like this?  A virtual region is either
active and in use (in which case it should be on the list and fully
complete), or not in use and never available to query.

If it was only to deal with .init, I would recommend dropping it all.

> +
> +    unsigned long start;        /* Virtual address start. */
> +    unsigned long end;          /* Virtual address start. */
> +
> +    /*
> +     * If ->skip returns false for CHECKING_SYMBOL we will use
> +     * 'symbols_lookup' callback to retrieve the name of the
> +     * addr between start and end. If this is NULL the
> +     * default lookup mechanism is used (the skip value is
> +     * ignored).
> +     */
> +    symbols_lookup_t symbols_lookup;
> +
> +    struct {
> +        struct bug_frame *bugs; /* The pointer to array of bug frames. */
> +        ssize_t n_bugs;         /* The number of them. */
> +    } frame[BUGFRAME_NR];
> +
> +#ifdef CONFIG_X86
> +    struct exception_table_entry *ex;
> +    struct exception_table_entry *ex_end;
> +#endif
> +
> +    unsigned long priv;         /* To be used by above funcionts if need to. */
> +};
> +
> +extern struct list_head virtual_region_list;
> +
> +extern void setup_virtual_regions(void);
> +extern int register_virtual_region(struct virtual_region *r);
> +extern void unregister_virtual_region(struct virtual_region *r);
> +
> +#endif /* __BUG_EX_SYMBOL_LIST__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
> index 548b64d..8cf7af7 100644
> --- a/xen/include/xen/kernel.h
> +++ b/xen/include/xen/kernel.h
> @@ -65,12 +65,14 @@
>  	1;                                      \
>  })
>  
> +

Spurious change.

~Andrew


* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
@ 2016-03-15 19:33   ` Andrew Cooper
  2016-03-17 11:49     ` Jan Beulich
  2016-03-17 11:48   ` Jan Beulich
  2016-03-17 16:08   ` Ian Jackson
  2 siblings, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 19:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	mpohlack, sasha.levin
  Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> It looks like it could underflow at first glance. That is
> if i is zero and you get in the while loop with the
> i--. However the postfix expression is evaluated after the
> conditional so the loop is fine and won't execute (with i==0).
>
> However in spirit of defense programming lets clarify
> the loop conditional.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

This looks as if it will quieten Coverity, even though it is no
functional change.
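
To illustrate the point in the commit message, here is a small sketch,
assuming the loop in question has the shape "while ( i-- )" with an
unsigned counter.  The function names are made up for illustration and
are not taken from the patch.  Both forms run the body the same number
of times, including zero times when i starts at 0; the difference is
only that the postfix form leaves i wrapped around once the loop exits.

```c
#include <assert.h>

/* How often the body of "while ( i-- )" runs when starting from i. */
unsigned int count_postfix(unsigned int i)
{
    unsigned int runs = 0;

    while ( i-- )       /* tests the old value of i, then decrements */
        runs++;

    /* Here i has wrapped to UINT_MAX; harmless unless i is reused. */
    return runs;
}

/* Same iteration count with the decrement moved into the body. */
unsigned int count_explicit(unsigned int i)
{
    unsigned int runs = 0;

    while ( i )
    {
        --i;            /* i never goes below zero */
        runs++;
    }

    return runs;
}
```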


* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 19:24   ` Andrew Cooper
@ 2016-03-15 19:34     ` Konrad Rzeszutek Wilk
  2016-03-15 19:51       ` Andrew Cooper
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 19:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, ross.lagerwall, mpohlack, Julien Grall,
	Stefano Stabellini, Jan Beulich, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 07:24:30PM +0000, Andrew Cooper wrote:
> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> > index 31d2115..b62c91f 100644
> > --- a/xen/arch/arm/traps.c
> > +++ b/xen/arch/arm/traps.c
> > @@ -16,6 +16,7 @@
> >   * GNU General Public License for more details.
> >   */
> >  
> > +#include <xen/bug_ex_symbols.h>
> 
> how about just <xen/virtual_region.h> ? It contains more than just
> bugframes.

/me nods.
> 
> > diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
> > new file mode 100644
> > index 0000000..77bb72b
> > --- /dev/null
> > +++ b/xen/common/bug_ex_symbols.c
> > @@ -0,0 +1,119 @@
> > +/*
> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> > + *
> > + */
> > +
> > +#include <xen/bug_ex_symbols.h>
> > +#include <xen/config.h>
> > +#include <xen/kernel.h>
> > +#include <xen/init.h>
> > +#include <xen/spinlock.h>
> > +
> > +extern char __stext[];
> 
> There is no such symbol.  _stext comes in via kernel.h

Argh.

> 
> > +
> > +struct virtual_region kernel_text = {
> 
> How about just "compiled" ? This is more than just .text.
> 
> > +    .list = LIST_HEAD_INIT(kernel_text.list),
> > +    .start = (unsigned long)_stext,
> > +    .end = (unsigned long)_etext,
> > +#ifdef CONFIG_X86
> > +    .ex = (struct exception_table_entry *)__start___ex_table,
> > +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> > +};
> > +
> > +/*
> > + * The kernel_inittext should only be used when system_state
> > + * is booting. Otherwise all accesses should be ignored.
> > + */
> > +static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
> > +{
> > +    return (system_state >= SYS_STATE_active);
> > +}
> > +
> > +/*
> > + * Becomes irrelevant when __init sections are cleared.
> > + */
> > +struct virtual_region kernel_inittext  = {
> > +    .list = LIST_HEAD_INIT(kernel_inittext.list),
> > +    .skip = ignore_if_active,
> > +    .start = (unsigned long)_sinittext,
> > +    .end = (unsigned long)_einittext,
> > +#ifdef CONFIG_X86
> > +    /* Even if they are __init their exception entry still gets stuck here. */
> > +    .ex = (struct exception_table_entry *)__start___ex_table,
> > +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> > +};
> 
> This can live in .init.data and be taken off the linked list in
> init_done(), which performs other bits of cleanup relating to .init

Unfortunately, at that point in time it is SMP, so if we clean it up
we need to use a spin_lock.

> 
> > +
> > +/*
> > + * No locking. Additions are done either at startup (when there is only
> > + * one CPU) or when all CPUs are running without IRQs.
> > + *
> > + * Deletions are big tricky. We MUST make sure all but one CPU
> > + * are running cpu_relax().
> 
> It should still be possible to lock this properly.  We expect no
> contention, at which point acquiring and releasing the locks will always
> hit fastpaths, but it will avoid accidental corruption if something goes
> wrong.
> 
> In each of register or deregister, take the lock, then confirm whether
> the current region is in a list or not, by looking at r->list.  With the
> single virtual_region_lock held, that can safely avoid repeatedly adding
> the region to the region list.

Yeah. I don't know why I was thinking we can't. Ah, I was thinking about
traversing the list - and we don't want the spin_lock as this is in
the do_traps or other code that really really should not take any spinlocks.

But if the adding/removing is done under a spinlock then that is OK.

Let me do that.

> 
> > + *
> > + */
> > +LIST_HEAD(virtual_region_list);
> > +
> > +int register_virtual_region(struct virtual_region *r)
> > +{
> > +    ASSERT(!local_irq_is_enabled());
> > +
> > +    list_add_tail(&r->list, &virtual_region_list);
> > +    return 0;
> > +}
> > +
> > +void unregister_virtual_region(struct virtual_region *r)
> > +{
> > +    ASSERT(!local_irq_is_enabled());
> > +
> > +    list_del_init(&r->list);
> > +}
> > +
> > +void __init setup_virtual_regions(void)
> > +{
> > +    ssize_t sz;
> > +    unsigned int i, idx;
> > +    static const struct bug_frame *const stop_frames[] = {
> > +        __start_bug_frames,
> > +        __stop_bug_frames_0,
> > +        __stop_bug_frames_1,
> > +        __stop_bug_frames_2,
> > +#ifdef CONFIG_X86
> > +        __stop_bug_frames_3,
> > +#endif
> > +        NULL
> > +    };
> > +
> > +#ifdef CONFIG_X86
> > +    sort_exception_tables();
> > +#endif
> 
> Any reason why this needs moving out of setup.c ?

None at all.
> 
> > diff --git a/xen/include/xen/bug_ex_symbols.h b/xen/include/xen/bug_ex_symbols.h
> > new file mode 100644
> > index 0000000..6f3401b
> > --- /dev/null
> > +++ b/xen/include/xen/bug_ex_symbols.h
> > @@ -0,0 +1,74 @@
> > +/*
> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> > + *
> > + */
> > +
> > +#ifndef __BUG_EX_SYMBOL_LIST__
> > +#define __BUG_EX_SYMBOL_LIST__
> > +
> > +#include <xen/config.h>
> > +#include <xen/list.h>
> > +#include <xen/symbols.h>
> > +
> > +#ifdef CONFIG_X86
> > +#include <asm/uaccess.h>
> > +#endif
> > +#include <asm/bug.h>
> > +
> > +struct virtual_region
> > +{
> > +    struct list_head list;
> > +
> > +#define CHECKING_SYMBOL         (1<<1)
> > +#define CHECKING_BUG_FRAME      (1<<2)
> > +#define CHECKING_EXCEPTION      (1<<3)
> > +    /*
> > +     * Whether to skip this region for particular searches. The flag
> > +     * can be CHECKING_[SYMBOL|BUG_FRAMES|EXCEPTION].
> > +     *
> > +     * If the function returns 1 this region will be skipped.
> > +     */
> > +    bool_t (*skip)(unsigned int flag, unsigned long priv);
> 
> Why do we need infrastructure like this?  A virtual region is either
> active and in use (in which case it should be on the list and fully
> complete), or not in use and never available to query.

> 
> If it was only to deal with .init, I would recommend dropping it all.

That was the reason.



* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 19:34     ` Konrad Rzeszutek Wilk
@ 2016-03-15 19:51       ` Andrew Cooper
  2016-03-15 20:02         ` Andrew Cooper
  0 siblings, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 19:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, mpohlack, Julien Grall,
	Stefano Stabellini, Jan Beulich, sasha.levin, xen-devel

On 15/03/16 19:34, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 15, 2016 at 07:24:30PM +0000, Andrew Cooper wrote:
>> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index 31d2115..b62c91f 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -16,6 +16,7 @@
>>>   * GNU General Public License for more details.
>>>   */
>>>  
>>> +#include <xen/bug_ex_symbols.h>
>> how about just <xen/virtual_region.h> ? It contains more than just
>> bugframes.
> /me nods.
>>> diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
>>> new file mode 100644
>>> index 0000000..77bb72b
>>> --- /dev/null
>>> +++ b/xen/common/bug_ex_symbols.c
>>> @@ -0,0 +1,119 @@
>>> +/*
>>> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
>>> + *
>>> + */
>>> +
>>> +#include <xen/bug_ex_symbols.h>
>>> +#include <xen/config.h>
>>> +#include <xen/kernel.h>
>>> +#include <xen/init.h>
>>> +#include <xen/spinlock.h>
>>> +
>>> +extern char __stext[];
>> There is no such symbol.  _stext comes in via kernel.h
> Argh.
>
>>> +
>>> +struct virtual_region kernel_text = {
>> How about just "compiled" ? This is more than just .text.
>>
>>> +    .list = LIST_HEAD_INIT(kernel_text.list),
>>> +    .start = (unsigned long)_stext,
>>> +    .end = (unsigned long)_etext,
>>> +#ifdef CONFIG_X86
>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>> +#endif
>>> +};
>>> +
>>> +/*
>>> + * The kernel_inittext should only be used when system_state
>>> + * is booting. Otherwise all accesses should be ignored.
>>> + */
>>> +static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
>>> +{
>>> +    return (system_state >= SYS_STATE_active);
>>> +}
>>> +
>>> +/*
>>> + * Becomes irrelevant when __init sections are cleared.
>>> + */
>>> +struct virtual_region kernel_inittext  = {
>>> +    .list = LIST_HEAD_INIT(kernel_inittext.list),
>>> +    .skip = ignore_if_active,
>>> +    .start = (unsigned long)_sinittext,
>>> +    .end = (unsigned long)_einittext,
>>> +#ifdef CONFIG_X86
>>> +    /* Even if they are __init their exception entry still gets stuck here. */
>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>> +#endif
>>> +};
>> This can live in .init.data and be taken off the linked list in
>> init_done(), which performs other bits of cleanup relating to .init
> Unfortunately, at that point in time it is SMP, so if we clean it up
> we need to use a spin_lock.
>
>>> +
>>> +/*
>>> + * No locking. Additions are done either at startup (when there is only
>>> + * one CPU) or when all CPUs are running without IRQs.
>>> + *
>>> + * Deletions are big tricky. We MUST make sure all but one CPU
>>> + * are running cpu_relax().
>> It should still be possible to lock this properly.  We expect no
>> contention, at which point acquiring and releasing the locks will always
>> hit fastpaths, but it will avoid accidental corruption if something goes
>> wrong.
>>
>> In each of register or deregister, take the lock, then confirm whether
>> the current region is in a list or not, by looking at r->list.  With the
>> single virtual_region_lock held, that can safely avoid repeatedly adding
>> the region to the region list.
> Yeah. I don't know why I was thinking we can't. Ah, I was thinking about
> traversing the list - and we don't want the spin_lock as this is in
> the do_traps or other code that really really should not take any spinlocks.
>
> But if the adding/removing is done under a spinlock then that is OK.
>
> Let me do that.

Actually, that isn't sufficient.  Sorry for misleading you.

You have to exclude modifications to the list against other cpus walking
it in an exception handler, which might include NMI and MCE context.

Now I think about it, going lockless here is probably a bonus, as we
don't want to be messing around with locks in fatal contexts.  In which
case, it would be better to use a single linked list and cmpxchg to
insert/remove elements.  It generally wants to be walked forwards, and
will only have a handful of elements, so searching forwards to delete
will be ok.
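
A rough sketch of what such a cmpxchg-based singly linked list could
look like, written with C11 atomics so that it builds standalone.  All
names here (struct region, region_insert(), region_find(),
region_remove()) are hypothetical and not from the series.  The sketch
assumes removals are serialised against each other, and that the memory
of a removed region is not reused while a reader may still be standing
on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct virtual_region. */
struct region {
    struct region *_Atomic next;
    unsigned long start, end;
};

static struct region *_Atomic region_head = NULL;

/* Push r at the head; retry if another CPU changed the head meanwhile. */
void region_insert(struct region *r)
{
    struct region *old = atomic_load_explicit(&region_head,
                                              memory_order_relaxed);

    do {
        atomic_store_explicit(&r->next, old, memory_order_relaxed);
    } while (!atomic_compare_exchange_weak_explicit(&region_head, &old, r,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Lockless forward walk, as an exception handler would do it. */
struct region *region_find(unsigned long addr)
{
    struct region *r = atomic_load_explicit(&region_head,
                                            memory_order_acquire);

    while (r != NULL) {
        if (addr >= r->start && addr < r->end)
            return r;
        r = atomic_load_explicit(&r->next, memory_order_acquire);
    }

    return NULL;
}

/* Search forwards for r and unlink it.  Returns 1 if it was found. */
int region_remove(struct region *r)
{
    for ( ; ; ) {
        struct region *_Atomic *link = &region_head;
        struct region *cur = atomic_load_explicit(link,
                                                  memory_order_acquire);

        while (cur != NULL && cur != r) {
            link = &cur->next;
            cur = atomic_load_explicit(link, memory_order_acquire);
        }
        if (cur == NULL)
            return 0;

        /* r->next stays intact so a reader currently on r can carry on. */
        struct region *next = atomic_load_explicit(&r->next,
                                                   memory_order_relaxed);

        if (atomic_compare_exchange_strong_explicit(link, &cur, next,
                                                    memory_order_release,
                                                    memory_order_relaxed))
            return 1;
        /* The link changed under us (concurrent insert): rescan. */
    }
}
```

With only a handful of regions the forward search on removal is cheap,
which matches the reasoning above.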

~Andrew


* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 19:51       ` Andrew Cooper
@ 2016-03-15 20:02         ` Andrew Cooper
  2016-03-16 10:33           ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-15 20:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, mpohlack, ross.lagerwall, Julien Grall,
	Stefano Stabellini, Jan Beulich, xen-devel, sasha.levin

On 15/03/16 19:51, Andrew Cooper wrote:
> On 15/03/16 19:34, Konrad Rzeszutek Wilk wrote:
>> On Tue, Mar 15, 2016 at 07:24:30PM +0000, Andrew Cooper wrote:
>>> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>> index 31d2115..b62c91f 100644
>>>> --- a/xen/arch/arm/traps.c
>>>> +++ b/xen/arch/arm/traps.c
>>>> @@ -16,6 +16,7 @@
>>>>   * GNU General Public License for more details.
>>>>   */
>>>>  
>>>> +#include <xen/bug_ex_symbols.h>
>>> how about just <xen/virtual_region.h> ? It contains more than just
>>> bugframes.
>> /me nods.
>>>> diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
>>>> new file mode 100644
>>>> index 0000000..77bb72b
>>>> --- /dev/null
>>>> +++ b/xen/common/bug_ex_symbols.c
>>>> @@ -0,0 +1,119 @@
>>>> +/*
>>>> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
>>>> + *
>>>> + */
>>>> +
>>>> +#include <xen/bug_ex_symbols.h>
>>>> +#include <xen/config.h>
>>>> +#include <xen/kernel.h>
>>>> +#include <xen/init.h>
>>>> +#include <xen/spinlock.h>
>>>> +
>>>> +extern char __stext[];
>>> There is no such symbol.  _stext comes in via kernel.h
>> Argh.
>>
>>>> +
>>>> +struct virtual_region kernel_text = {
>>> How about just "compiled" ? This is more than just .text.
>>>
>>>> +    .list = LIST_HEAD_INIT(kernel_text.list),
>>>> +    .start = (unsigned long)_stext,
>>>> +    .end = (unsigned long)_etext,
>>>> +#ifdef CONFIG_X86
>>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>>> +#endif
>>>> +};
>>>> +
>>>> +/*
>>>> + * The kernel_inittext should only be used when system_state
>>>> + * is booting. Otherwise all accesses should be ignored.
>>>> + */
>>>> +static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
>>>> +{
>>>> +    return (system_state >= SYS_STATE_active);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Becomes irrelevant when __init sections are cleared.
>>>> + */
>>>> +struct virtual_region kernel_inittext  = {
>>>> +    .list = LIST_HEAD_INIT(kernel_inittext.list),
>>>> +    .skip = ignore_if_active,
>>>> +    .start = (unsigned long)_sinittext,
>>>> +    .end = (unsigned long)_einittext,
>>>> +#ifdef CONFIG_X86
>>>> +    /* Even if they are __init their exception entry still gets stuck here. */
>>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>>> +#endif
>>>> +};
>>> This can live in .init.data and be taken off the linked list in
>>> init_done(), which performs other bits of cleanup relating to .init
>> Unfortunately, at that point in time it is SMP, so if we clean it up
>> we need to use a spin_lock.
>>
>>>> +
>>>> +/*
>>>> + * No locking. Additions are done either at startup (when there is only
>>>> + * one CPU) or when all CPUs are running without IRQs.
>>>> + *
>>>> + * Deletions are big tricky. We MUST make sure all but one CPU
>>>> + * are running cpu_relax().
>>> It should still be possible to lock this properly.  We expect no
>>> contention, at which point acquiring and releasing the locks will always
>>> hit fastpaths, but it will avoid accidental corruption if something goes
>>> wrong.
>>>
>>> In each of register or deregister, take the lock, then confirm whether
>>> the current region is in a list or not, by looking at r->list.  With the
>>> single virtual_region_lock held, that can safely avoid repeatedly adding
>>> the region to the region list.
>> Yeah. I don't know why I was thinking we can't. Ah, I was thinking about
>> traversing the list - and we don't want the spin_lock as this is in
>> the do_traps or other code that really really should not take any spinlocks.
>>
>> But if the adding/removing is done under a spinlock then that is OK.
>>
>> Let me do that.
> Actually, that isn't sufficient.  Sorry for misleading you.
>
> You have to exclude modifications to the list against other cpus walking
> it in an exception handler, which might include NMI and MCE context.
>
> Now I think about it, going lockless here is probably a bonus, as we
> don't want to be messing around with locks in fatal contexts.  In which
> case, it would be better to use a single linked list and cmpxchg to
> insert/remove elements.  It generally wants to be walked forwards, and
> will only have a handful of elements, so searching forwards to delete
> will be ok.

Actually, knowing that the list is only ever walked forwards by the
exception handlers, and with some regular spinlocks around mutation,
judicious use of list_add_tail_rcu() and list_del_rcu() should suffice
(I think), and will definitely be better than handrolling a single
linked list.
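
For readers unfamiliar with the RCU list helpers, the sketch below
mimics, in standalone C11, the two properties being relied on: a new
entry is fully initialised before it is published through the
predecessor's next pointer, and a deleted entry keeps its own next
pointer so that a reader already standing on it can continue forwards.
It is an approximation under assumptions and not the list.h
implementation: struct node, add_tail(), del() and contains() are
made-up names, the writer lock is a toy atomic_flag, and no grace
period is modelled, so the sketch does not say when a deleted entry may
be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *_Atomic next;   /* followed by lockless readers */
    struct node *prev;           /* only used by writers, under the lock */
    int id;
};

static struct node head = { &head, &head, -1 };
static atomic_flag writer_lock = ATOMIC_FLAG_INIT;

static void writer_enter(void)
{
    while (atomic_flag_test_and_set_explicit(&writer_lock,
                                             memory_order_acquire))
        ;
}

static void writer_exit(void)
{
    atomic_flag_clear_explicit(&writer_lock, memory_order_release);
}

/* In the spirit of list_add_tail_rcu(): initialise, then publish. */
void add_tail(struct node *n)
{
    writer_enter();
    struct node *last = head.prev;

    atomic_store_explicit(&n->next, &head, memory_order_relaxed);
    n->prev = last;
    /* Release store: readers that see n also see its initialised fields. */
    atomic_store_explicit(&last->next, n, memory_order_release);
    head.prev = n;
    writer_exit();
}

/* In the spirit of list_del_rcu(): unlink but leave n->next untouched. */
void del(struct node *n)
{
    writer_enter();
    struct node *next = atomic_load_explicit(&n->next,
                                             memory_order_relaxed);

    atomic_store_explicit(&n->prev->next, next, memory_order_release);
    next->prev = n->prev;
    n->prev = NULL;              /* n->next deliberately stays valid */
    writer_exit();
}

/* Reader side: forward walk only, no lock taken. */
int contains(int id)
{
    struct node *n = atomic_load_explicit(&head.next,
                                          memory_order_acquire);

    while (n != &head) {
        if (n->id == id)
            return 1;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }

    return 0;
}
```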

~Andrew


* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-15 18:29   ` Andrew Cooper
@ 2016-03-15 20:19     ` Konrad Rzeszutek Wilk
  2016-03-17  1:38       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-15 20:19 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, mpohlack,
	ross.lagerwall, Julien Grall, Stefano Stabellini, Jan Beulich,
	sasha.levin, xen-devel, Daniel De Graaf, Keir Fraser

.. snip ..
> > +    case XEN_VERSION_OP_guest_handle:
> > +        *sz = ARRAY_SIZE(current->domain->handle);
> > +        break;
> > +
> > +    case XEN_VERSION_OP_commandline:
> > +        *sz = ARRAY_SIZE(saved_cmdline);
> > +        break;
> > +
> > +    default:
> > +        rc = -ENOSYS;
> > +    }
> > +
> > +    return rc;
> > +}
> > +
> > +/*
> > + * Similar to HYPERVISOR_xen_version but with a sane interface
> > + * (has a length, one can probe for the length) and with one less sub-ops:
> > + * missing XENVER_compile_info.
> > + */
> > +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> > +               unsigned int len)
> > +{
> > +    union {
> > +        xen_version_op_val_t n;
> > +        xen_feature_info_t fi;
> > +    } u;
> 
> = {}; and you can forgo the explicit memset() below.

Done!
> 
> > +    ssize_t sz = 0;
> > +    const void *ptr = NULL;
> > +    int rc = xsm_version_op(XSM_OTHER, cmd);
> > +
> > +    /* We can safely return -EPERM! */
> > +    if ( rc )
> > +        return rc;
> > +
> > +    rc = size_of_subops_data(cmd, &sz);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    /* Some of the subops may have no data. */
> > +    if ( !sz )
> > +        return 0;
> 
> Really? I would have thought it would be reasonable to assert that
> either sz != 0 after the rc != 0 return.

Commandline and guest_handle may be empty. Ah, they aren't, as
they are arrays.

ARRAY_SIZE(saved_cmdline) is always 1024. Ugh.

.. snip..

> > +
> > +    if ( !rc )
> > +    {
> > +        ssize_t bytes;
> > +
> > +        if ( sz > len )
> > +            bytes = len;
> > +        else
> > +            bytes = sz;
> > +
> > +        if ( copy_to_guest(arg, ptr ? ptr : &u, bytes) )
> 
> Can be shortened to ptr ?: &u
> 
> > +            rc = -EFAULT;
> > +    }
> > +    if ( !rc )

         ^^^^^^^^^ - here
> > +    {
> > +        /*
> > +         * We return len (truncate) worth of data even if we fail.
> > +         */
> > +        if ( sz > len )
> > +            rc = -ENOBUFS;
> 
> This needs to be in the previous if() clause to avoid overriding -EFAULT
> with -ENOBUFS.

That is exactly why it is in its own 'if ( !rc )' - so it won't
overwrite -EFAULT. See above for 'here'
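
The ordering being argued about can be shown with a small standalone
sketch.  It is only an illustration: copy_out() is a hypothetical
stand-in for copy_to_guest() that fails when the destination is NULL,
return_data() is a made-up name, and returning 0 on success is a
simplification for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_guest(): non-zero means failure. */
static int copy_out(void *dst, const void *src, size_t bytes)
{
    if (dst == NULL)
        return 1;
    memcpy(dst, src, bytes);
    return 0;
}

/* Copy at most len bytes of the sz bytes available, then report status. */
int return_data(void *dst, size_t len, const void *src, size_t sz)
{
    int rc = 0;
    size_t bytes = sz > len ? len : sz;   /* truncate to the caller's buffer */

    if (copy_out(dst, src, bytes))
        rc = -EFAULT;

    /* Only report truncation if the copy itself succeeded. */
    if (!rc && sz > len)
        rc = -ENOBUFS;

    return rc;
}
```

Because the second check is guarded by !rc, a failed copy keeps
reporting -EFAULT even when the buffer was also too small.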


> 
> > +
> > +/*
> > + * The HYPERCALL_version_op has a set of sub-ops which mirror the
> > + * sub-ops of HYPERCALL_xen_version. However this hypercall differs
> > + * radically from the former:
> > + *  - It returns the amount of bytes returned.
> > + *  - It will return -XEN_EPERM if the guest is not permitted.
> > + *  - It will return the requested data in arg.
> > + *  - It requires an third argument (len) for the length of the
> > + *    arg. Naturally the arg has to fit the requested data otherwise
> > + *    -XEN_ENOBUFS is returned.
> > + *
> > + * It also offers an mechanism to probe for the amount of bytes an
> > + * sub-op will require. Having the arg have an NULL pointer will
> > + * return the number of bytes requested for the operation. Or an
> > + * negative value if an error is encountered.
> > + */
> > +
> > +typedef uint64_t xen_version_op_val_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> > +
> > +typedef unsigned char xen_version_op_buf_t[];
> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
> 
> Strictly speaking this should be a void* guest handle, as not all data
> is returned via this mechanism is unsigned char.

Done!
> 
> > +
> > +/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */
> > +#define XEN_VERSION_OP_version      0
> > +
> > +/* arg == version_op_buf. */
> > +#define XEN_VERSION_OP_extraversion 1
> > +
> > +/* arg == version_op_buf */
> > +#define XEN_VERSION_OP_capabilities 3
> > +
> > +/* arg == version_op_buf */
> > +#define XEN_VERSION_OP_changeset 4
> 
> Might be worth stating that these return NUL terminated utf-8 strings?

Done!
> 
> ~Andrew


* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 20:02         ` Andrew Cooper
@ 2016-03-16 10:33           ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-16 10:33 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, mpohlack, ross.lagerwall, Julien Grall,
	Stefano Stabellini, xen-devel, sasha.levin

>>> On 15.03.16 at 21:02, <andrew.cooper3@citrix.com> wrote:
> On 15/03/16 19:51, Andrew Cooper wrote:
>> On 15/03/16 19:34, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Mar 15, 2016 at 07:24:30PM +0000, Andrew Cooper wrote:
>>>> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>>> index 31d2115..b62c91f 100644
>>>>> --- a/xen/arch/arm/traps.c
>>>>> +++ b/xen/arch/arm/traps.c
>>>>> @@ -16,6 +16,7 @@
>>>>>   * GNU General Public License for more details.
>>>>>   */
>>>>>  
>>>>> +#include <xen/bug_ex_symbols.h>
>>>> how about just <xen/virtual_region.h> ? It contains more than just
>>>> bugframes.
>>> /me nods.
>>>>> diff --git a/xen/common/bug_ex_symbols.c b/xen/common/bug_ex_symbols.c
>>>>> new file mode 100644
>>>>> index 0000000..77bb72b
>>>>> --- /dev/null
>>>>> +++ b/xen/common/bug_ex_symbols.c
>>>>> @@ -0,0 +1,119 @@
>>>>> +/*
>>>>> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
>>>>> + *
>>>>> + */
>>>>> +
>>>>> +#include <xen/bug_ex_symbols.h>
>>>>> +#include <xen/config.h>
>>>>> +#include <xen/kernel.h>
>>>>> +#include <xen/init.h>
>>>>> +#include <xen/spinlock.h>
>>>>> +
>>>>> +extern char __stext[];
>>>> There is no such symbol.  _stext comes in via kernel.h
>>> Argh.
>>>
>>>>> +
>>>>> +struct virtual_region kernel_text = {
>>>> How about just "compiled" ? This is more than just .text.
>>>>
>>>>> +    .list = LIST_HEAD_INIT(kernel_text.list),
>>>>> +    .start = (unsigned long)_stext,
>>>>> +    .end = (unsigned long)_etext,
>>>>> +#ifdef CONFIG_X86
>>>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>>>> +#endif
>>>>> +};
>>>>> +
>>>>> +/*
>>>>> + * The kernel_inittext should only be used when system_state
>>>>> + * is booting. Otherwise all accesses should be ignored.
>>>>> + */
>>>>> +static bool_t ignore_if_active(unsigned int flag, unsigned long priv)
>>>>> +{
>>>>> +    return (system_state >= SYS_STATE_active);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Becomes irrelevant when __init sections are cleared.
>>>>> + */
>>>>> +struct virtual_region kernel_inittext  = {
>>>>> +    .list = LIST_HEAD_INIT(kernel_inittext.list),
>>>>> +    .skip = ignore_if_active,
>>>>> +    .start = (unsigned long)_sinittext,
>>>>> +    .end = (unsigned long)_einittext,
>>>>> +#ifdef CONFIG_X86
>>>>> +    /* Even if they are __init their exception entry still gets stuck here. */
>>>>> +    .ex = (struct exception_table_entry *)__start___ex_table,
>>>>> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
>>>>> +#endif
>>>>> +};
>>>> This can live in .init.data and be taken off the linked list in
>>>> init_done(), which performs other bits of cleanup relating to .init
>>> Unfortunately, at that point in time it is SMP - so if we clean it up
>>> we need to use a spin_lock.
>>>
>>>>> +
>>>>> +/*
>>>>> + * No locking. Additions are done either at startup (when there is only
>>>>> + * one CPU) or when all CPUs are running without IRQs.
>>>>> + *
>>>>> + * Deletions are a bit tricky. We MUST make sure all but one CPU
>>>>> + * are running cpu_relax().
>>>> It should still be possible to lock this properly.  We expect no
>>>> contention, at which point acquiring and releasing the locks will always
>>>> hit fastpaths, but it will avoid accidental corruption if something goes
>>>> wrong.
>>>>
>>>> In each of register or deregister, take the lock, then confirm whether
>>>> the current region is in a list or not, by looking at r->list.  With the
>>>> single virtual_region_lock held, that can safely avoid repeatedly adding
>>>> the region to the region list.
>>> Yeah. I don't know why I was thinking we can't. Ah, I was thinking about
>>> traversing the list - and we don't want the spin_lock as this is in
>>> the do_traps or other code that really really should not take any spinlocks.
>>>
>>> But if the adding/removing is done under a spinlock then that is OK.
>>>
>>> Let me do that.
>> Actually, that isn't sufficient.  Sorry for misleading you.
>>
>> You have to exclude modifications to the list against other cpus walking
>> it in an exception handler, which might include NMI and MCE context.
>>
>> Now I think about it, going lockless here is probably a bonus, as we
>> don't want to be messing around with locks in fatal contexts.  In which
>> case, it would be better to use a singly linked list and cmpxchg to
>> insert/remove elements.  It generally wants to be walked forwards, and
>> will only have a handful of elements, so searching forwards to delete
>> will be ok.
> 
> Actually, knowing that the list is only ever walked forwards by the
> exception handlers, and with some regular spinlocks around mutation,
> judicious use of list_add_tail_rcu() and list_del_rcu() should suffice
> (I think), and will definitely be better than hand-rolling a singly
> linked list.

Good that I went to the end of this sub-thread, before replying to
suggest just this.

Jan
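
To make the agreed scheme concrete (writers serialised by a regular lock, readers walking forwards with no lock at all), here is a minimal sketch. It deliberately swaps in C11 atomics and a toy spinlock for Xen's list_add_tail_rcu() and list_del_rcu(), so it illustrates the publication ordering only and is not the hypervisor's implementation. The struct and every function name below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified region; not the struct from the patch. */
struct region {
    unsigned long start, end;
    struct region *_Atomic next;
};

static struct region *_Atomic region_head;
static atomic_flag region_lock = ATOMIC_FLAG_INIT;   /* writers only */

static void lock(void)
{
    while ( atomic_flag_test_and_set_explicit(&region_lock,
                                              memory_order_acquire) )
        ;                                            /* spin */
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&region_lock, memory_order_release);
}

/* Writer: append at the tail; the release store publishes the node. */
static void region_register(struct region *r)
{
    struct region *_Atomic *pp = &region_head;
    struct region *cur;

    lock();
    while ( (cur = atomic_load_explicit(pp, memory_order_relaxed)) != NULL )
        pp = &cur->next;
    atomic_store_explicit(&r->next, NULL, memory_order_relaxed);
    atomic_store_explicit(pp, r, memory_order_release);
    unlock();
}

/*
 * Writer: unlink r. r->next is left intact so that a reader currently
 * standing on r can still move forwards. The memory behind r must not be
 * reused until no reader can still hold a pointer to it (the grace period
 * that real RCU provides and this sketch does not).
 */
static bool region_unregister(struct region *r)
{
    struct region *_Atomic *pp = &region_head;
    struct region *cur, *after;
    bool found = false;

    lock();
    while ( (cur = atomic_load_explicit(pp, memory_order_relaxed)) != NULL )
    {
        if ( cur == r )
        {
            after = atomic_load_explicit(&r->next, memory_order_relaxed);
            atomic_store_explicit(pp, after, memory_order_release);
            found = true;
            break;
        }
        pp = &cur->next;
    }
    unlock();
    return found;
}

/* Reader: takes no lock, so it is usable from exception context. */
static const struct region *region_find(unsigned long addr)
{
    const struct region *cur;

    for ( cur = atomic_load_explicit(&region_head, memory_order_acquire);
          cur != NULL;
          cur = atomic_load_explicit(&cur->next, memory_order_acquire) )
        if ( addr >= cur->start && addr < cur->end )
            return cur;
    return NULL;
}
```

The reader only ever follows next pointers, which is why forward-only traversal matters here: there is no prev pointer that a concurrent writer could leave half-updated.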

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 01/34] compat/x86: Remove unncessary #define.
  2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
  2016-03-15 18:57   ` Andrew Cooper
@ 2016-03-16 11:08   ` Jan Beulich
  2016-03-17  0:44     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-16 11:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> It is not used.

Consistently please - either keep them all (just to cover the case
that they might get used) or remove them all: xen_compile_info,
xen_changeset_info, etc are all unused too. Otoh
xennmi_callback is used, but xennmi_callback_t isn't. Which to me
suggests that we should leave this alone.

Jan

> --- a/xen/common/compat/kernel.c
> +++ b/xen/common/compat/kernel.c
> @@ -18,7 +18,6 @@ asm(".file \"" __FILE__ "\"");
>  
>  extern xen_commandline_t saved_cmdline;
>  
> -#define xen_extraversion compat_extraversion
>  #define xen_extraversion_t compat_extraversion_t
>  
>  #define xen_compile_info compat_compile_info
> -- 
> 2.5.0




^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers.
  2016-03-15 17:56 ` [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
@ 2016-03-16 11:10   ` Jan Beulich
  2016-03-17  0:44     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-16 11:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> If you have a patch for xSplice send it our way!
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Tim Deegan <tim@xen.org>
> ---
> ---
>  MAINTAINERS | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 52cc538..dc7a929 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -420,6 +420,16 @@ F:  xen/include/xsm/
>  F:  xen/xsm/
>  F:  docs/misc/xsm-flask.txt
>  
> +XSPLICE
> +M:  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> +M:  Ross Lagerwall <ross.lagerwall@citrix.com>
> +S:  Supported
> +F:  xen/common/xsplice*
> +F:  xen/include/xen/xsplice*
> +F:  arch/*/xsplice*

xen/arch/*/xsplice*

> +F:  docs/misc/xsplice.markdown
> +F:  tools/misc/xen-xsplice.c

I also wonder whether it wouldn't be nice to sort the entries by
name.

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
  2016-03-15 18:54   ` Andrew Cooper
@ 2016-03-16 11:49   ` Julien Grall
  2016-03-18 12:40   ` Jan Beulich
  2 siblings, 0 replies; 124+ messages in thread
From: Julien Grall @ 2016-03-16 11:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	andrew.cooper3, mpohlack, sasha.levin
  Cc: Stefano Stabellini, Keir Fraser, Jan Beulich

Hi Konrad,

On 15/03/2016 17:56, Konrad Rzeszutek Wilk wrote:
> So that we have a nice mechanism to figure out the upper
> bounds of bug.frames and also catch compiler errors in case
> one tries to use a higher frame number.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

For the ARM part:

Acked-by: Julien Grall <julien.grall@arm.com>

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-15 17:56 ` [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2016-03-16 12:12   ` Julien Grall
  2016-03-16 19:58     ` Konrad Rzeszutek Wilk
  2016-03-23 13:51   ` Jan Beulich
  1 sibling, 1 reply; 124+ messages in thread
From: Julien Grall @ 2016-03-16 12:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, Ross Lagerwall, konrad,
	andrew.cooper3, mpohlack, Stefano Stabellini
  Cc: Ian Jackson, Daniel De Graaf, Wei Liu, Stefano Stabellini

Hi Konrad,

On 15/03/2016 17:56, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index 8fbc46d..dbe9ccc 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -168,4 +168,15 @@ config SCHED_DEFAULT
>
>   endmenu
>
> +# Enable/Disable xsplice support
> +config XSPLICE
> +	bool "xSplice live patching support"
> +	default y

I think it would be better to disable xSplice on ARM until we 
effectively support it.

It will avoid people asking on the mailing list why xSplice doesn't work on
ARM platforms.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
  2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
  2016-03-15 18:45   ` Andrew Cooper
@ 2016-03-16 12:31   ` George Dunlap
  2016-03-16 18:11   ` Wei Liu
  2 siblings, 0 replies; 124+ messages in thread
From: George Dunlap @ 2016-03-16 12:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, George Dunlap, Ian Jackson,
	Julien Grall, David Scott

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> We change the xen_version libxc code to use the new hypercall.
> Which of course means every user in the code base has to
> be changed over.
> 
> It is important to note that the xc_version_op has a different
> return semantic than the previous one. It returns negative
> values on error (like the old one), but it also returns
> a positive value on success (unlike the old one). The positive
> value is the number of bytes copied in.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: David Scott <dave@recoil.org>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>

xenctx bit:

Acked-by: George Dunlap <george.dunlap@citrix.com>
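
Since the changed return convention is the part most likely to trip up callers, here is a rough sketch of a caller written for the new semantics (negative on error, bytes copied on success), combined with the major:minor decoding from the header. mock_xc_version() is a hypothetical stand-in, not libxc's xc_version(), and the 4.7 value, the sub-op number 0 and the get_xen_version() helper are made up for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t xen_version_op_val_t;   /* as in the patch under review */

/*
 * Hypothetical stand-in for xc_version() with the new semantics:
 * negative errno on error, number of bytes copied on success.
 */
static int mock_xc_version(unsigned int cmd, void *arg, size_t len)
{
    xen_version_op_val_t val = (4u << 16) | 7u;   /* made-up: Xen 4.7 */

    if ( cmd != 0 )               /* only the "version" sub-op is modelled */
        return -ENOSYS;
    if ( len < sizeof(val) )
        return -ENOBUFS;
    memcpy(arg, &val, sizeof(val));
    return sizeof(val);
}

/* Returns 0 on success and fills major/minor, negative errno otherwise. */
static int get_xen_version(unsigned int *major, unsigned int *minor)
{
    xen_version_op_val_t val = 0;
    int rc = mock_xc_version(0, &val, sizeof(val));

    if ( rc < 0 )                     /* an old-style "!= 0" test would misfire */
        return rc;
    if ( (size_t)rc != sizeof(val) )  /* short copy: treat as an error */
        return -EIO;
    *major = (val >> 16) & 0xffff;    /* bits 31..16 */
    *minor = val & 0xffff;            /* bits 15..0 */
    return 0;
}
```

Callers that previously tested "!= 0" would treat every successful call as a failure under the new convention, which is why each call site in the patch switches to "< 0".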


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities)
  2016-03-15 17:56 ` [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities) Konrad Rzeszutek Wilk
  2016-03-15 18:04   ` Andrew Cooper
@ 2016-03-16 18:11   ` Wei Liu
  1 sibling, 0 replies; 124+ messages in thread
From: Wei Liu @ 2016-03-16 18:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, ross.lagerwall, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, mpohlack, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 01:56:24PM -0400, Konrad Rzeszutek Wilk wrote:
> The 'caps' is not used anywhere in there.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

> ---
> ---
>  tools/libxc/xc_dom_x86.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index bdec40a..021f8a8 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -1259,7 +1259,6 @@ static int meminit_hvm(struct xc_dom_image *dom)
>      unsigned long target_pages = dom->target_pages;
>      unsigned long cur_pages, cur_pfn;
>      int rc;
> -    xen_capabilities_info_t caps;
>      unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
>          stat_1gb_pages = 0;
>      unsigned int memflags = 0;
> @@ -1339,12 +1338,6 @@ static int meminit_hvm(struct xc_dom_image *dom)
>          goto error_out;
>      }
>  
> -    if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
> -    {
> -        DOMPRINTF("Could not get Xen capabilities");
> -        goto error_out;
> -    }
> -
>      dom->p2m_size = p2m_size;
>      dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
>                                        dom->p2m_size);
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
  2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
  2016-03-15 18:45   ` Andrew Cooper
  2016-03-16 12:31   ` George Dunlap
@ 2016-03-16 18:11   ` Wei Liu
  2016-03-17  1:08     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 124+ messages in thread
From: Wei Liu @ 2016-03-16 18:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, ross.lagerwall, George Dunlap, andrew.cooper3,
	Stefano Stabellini, Ian Jackson, mpohlack, Julien Grall,
	sasha.levin, David Scott, xen-devel

On Tue, Mar 15, 2016 at 01:56:27PM -0400, Konrad Rzeszutek Wilk wrote:
[...]
> diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> index 303081d..2663969 100644
> --- a/tools/libxc/xg_save_restore.h
> +++ b/tools/libxc/xg_save_restore.h
> @@ -57,10 +57,12 @@ static inline int get_platform_info(xc_interface *xch, uint32_t dom,
>      xen_capabilities_info_t xen_caps = "";
>      xen_platform_parameters_t xen_params;
>  
> -    if (xc_version(xch, XENVER_platform_parameters, &xen_params) != 0)
> +    if (xc_version(xch, XEN_VERSION_OP_platform_parameters, &xen_params,
> +                   sizeof(xen_params)) < 0)
>          return 0;
>  
> -    if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
> +    if (xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
> +                   sizeof(xen_caps)) < 0)
>          return 0;
>  
>      if (xc_maximum_ram_page(xch, max_mfn))
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 93e228d..dc660b7 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -5191,50 +5191,73 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
>      return ret;
>  }
>  
> +
> +static int xc_version_wrapper(libxl_ctx *ctx, unsigned int cmd, char *buf, ssize_t len, char **dst)
> +{

This function should accept libxl__gc *gc instead of libxl_ctx.

Preferably it should be renamed to

libxl__xc_version_wrapper

> +    GC_INIT(ctx);
> +    int r;
> +
> +    r = xc_version(ctx->xch, cmd, buf, len);

Then here CTX->xch

> +    if ( r == -EPERM )
> +        buf[0] = '\0';
> +    else if ( r < 0 )
> +    {
> +        GC_FREE;
> +        return r;
> +    }
> +    *dst = libxl__strdup(NOGC, buf);
> +    GC_FREE;

Then get rid of all GC_* macros.

> +    return 0;
> +}
> +
>  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
[...]
> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index c40a4e9..23876f0 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -1204,34 +1204,40 @@ static PyObject *pyxc_xeninfo(XcObject *self)
>      xen_capabilities_info_t xen_caps;
>      xen_platform_parameters_t p_parms;
>      xen_commandline_t xen_commandline;
> -    long xen_version;
> -    long xen_pagesize;
> +    xen_version_op_val_t xen_version;
> +    xen_version_op_val_t xen_pagesize;
>      char str[128];
>  
> -    xen_version = xc_version(self->xc_handle, XENVER_version, NULL);
> -
> -    if ( xc_version(self->xc_handle, XENVER_extraversion, &xen_extra) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_version, &xen_version,
> +                    sizeof(xen_version)) < 0)
>          return pyxc_error_to_exception(self->xc_handle);
>  
> -    if ( xc_version(self->xc_handle, XENVER_compile_info, &xen_cc) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_extraversion, &xen_extra,
> +                    sizeof(xen_extra)) < 0 )
>          return pyxc_error_to_exception(self->xc_handle);
>  
> -    if ( xc_version(self->xc_handle, XENVER_changeset, &xen_chgset) != 0 )
> +    memset(&xen_cc, 0, sizeof(xen_cc));
> +
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_changeset, &xen_chgset,
> +                    sizeof(xen_chgset)) < 0 )
>          return pyxc_error_to_exception(self->xc_handle);
>  
> -    if ( xc_version(self->xc_handle, XENVER_capabilities, &xen_caps) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_capabilities, &xen_caps,
> +                   sizeof(xen_caps)) < 0 )

Indentation.

>          return pyxc_error_to_exception(self->xc_handle);
>  
> -    if ( xc_version(self->xc_handle, XENVER_platform_parameters, &p_parms) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_platform_parameters,
> +                    &p_parms, sizeof(p_parms)) < 0 )
>          return pyxc_error_to_exception(self->xc_handle);
>  
> -    if ( xc_version(self->xc_handle, XENVER_commandline, &xen_commandline) != 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_commandline,
> +                    &xen_commandline, sizeof(xen_commandline)) < 0 )
>          return pyxc_error_to_exception(self->xc_handle);
>  
>      snprintf(str, sizeof(str), "virt_start=0x%"PRI_xen_ulong, p_parms.virt_start);
>  
> -    xen_pagesize = xc_version(self->xc_handle, XENVER_pagesize, NULL);
> -    if (xen_pagesize < 0 )
> +    if ( xc_version(self->xc_handle, XEN_VERSION_OP_pagesize, &xen_pagesize,
> +                    sizeof(xen_pagesize)) < 0)
>          return pyxc_error_to_exception(self->xc_handle);
>  
>      return Py_BuildValue("{s:i,s:i,s:s,s:s,s:i,s:s,s:s,s:s,s:s,s:s,s:s,s:s}",
> diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c
> index 3495f3f..723e46a 100644
> --- a/tools/xenstat/libxenstat/src/xenstat.c
> +++ b/tools/xenstat/libxenstat/src/xenstat.c
> @@ -621,20 +621,18 @@ unsigned long long xenstat_network_tdrop(xenstat_network * network)
>  /* Collect Xen version information */
>  static int xenstat_collect_xen_version(xenstat_node * node)
>  {
> -	long vnum = 0;
> +	xen_version_op_val_t vnum = 0;
>  	xen_extraversion_t version;
>  
>  	/* Collect Xen version information if not already collected */
>  	if (node->handle->xen_version[0] == '\0') {
>  		/* Get the Xen version number and extraversion string */
> -		vnum = xc_version(node->handle->xc_handle,
> -			XENVER_version, NULL);
> -
> -		if (vnum < 0)
> +		if (xc_version(node->handle->xc_handle,
> +			           XEN_VERSION_OP_version, &vnum, sizeof(vnum)) < 0 )

Indentation.

>  			return 0;
>  
> -		if (xc_version(node->handle->xc_handle, XENVER_extraversion,
> -			&version) < 0)
> +		if (xc_version(node->handle->xc_handle, XEN_VERSION_OP_extraversion,
> +			           &version, sizeof(version)) < 0)

Indentation.


Wei.

>  			return 0;
>  		/* Format the version information as a string and store it */
>  		snprintf(node->handle->xen_version, VERSION_SIZE, "%ld.%ld%s",
> diff --git a/tools/xentrace/xenctx.c b/tools/xentrace/xenctx.c
> index e647179..14d2f8b 100644
> --- a/tools/xentrace/xenctx.c
> +++ b/tools/xentrace/xenctx.c
> @@ -1000,7 +1000,8 @@ static void dump_ctx(int vcpu)
>              guest_word_size = (cpuctx.msr_efer & 0x400) ? 8 :
>                  guest_protected_mode ? 4 : 2;
>              /* HVM guest context records are always host-sized */
> -            if (xc_version(xenctx.xc_handle, XENVER_capabilities, &xen_caps) != 0) {
> +            if (xc_version(xenctx.xc_handle, XEN_VERSION_OP_capabilities,
> +                           &xen_caps, sizeof(xen_caps)) < 0) {
>                  perror("xc_version");
>                  return;
>              }
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc
  2016-03-15 17:56 ` [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
@ 2016-03-16 18:12   ` Wei Liu
  2016-03-16 20:36     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Wei Liu @ 2016-03-16 18:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, ross.lagerwall, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, mpohlack, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 01:56:35PM -0400, Konrad Rzeszutek Wilk wrote:
> The underlying toolstack code to do the basic
> operations when using the XEN_XSPLICE_op syscalls:
>  - upload the payload,
>  - get status of a payload,
>  - list all the payloads,
>  - apply, check, replace, and revert the payload.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> 
> v2: Actually set zero for the _pad entries.
> v3: Split status into state and error code.
>     Add REPLACE action.
> v4: Use timeout and utilize pads.
> v5: Update per Wei's review.
> v6: Update per Wei's review.
> v7: Extra space slipped in, remove it

Huh, the title says  v4 but here it is v7.

I believe issues I mentioned in previous iterations are fixed.

Acked-by: Wei Liu <wei.liu2@citrix.com>

Only one nitpick below.

> +/*
> + * The operations are asynchronous and the hypervisor may take a while
> + * to complete them. The `timeout` offers an option to expire the
> + * operation if it could not be completed within the specified time.
> + * Value of 0 means let hypervisor decide the best timeout.
> + */

Might be useful to specify the unit of timeout (ms?).


Wei.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads
  2016-03-15 17:56 ` [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
@ 2016-03-16 18:12   ` Wei Liu
  0 siblings, 0 replies; 124+ messages in thread
From: Wei Liu @ 2016-03-16 18:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, ross.lagerwall, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, mpohlack, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 01:56:36PM -0400, Konrad Rzeszutek Wilk wrote:
> A simple tool that allows a system admin to perform
> basic xsplice operations:
> 
>  - Upload a xsplice file (with a unique name)
>  - List all the xsplice payloads loaded.
>  - Apply, revert, replace, unload, or check the payload using the
>    unique name.
>  - Do all three - upload, check, and apply the
>    payload in one go (load). Also will use the name of the
>    file as the <name>
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id
  2016-03-15 17:56 ` [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id Konrad Rzeszutek Wilk
@ 2016-03-16 18:12   ` Wei Liu
  0 siblings, 0 replies; 124+ messages in thread
From: Wei Liu @ 2016-03-16 18:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, ross.lagerwall, andrew.cooper3, Stefano Stabellini,
	Ian Jackson, mpohlack, sasha.levin, xen-devel

On Tue, Mar 15, 2016 at 01:56:49PM -0400, Konrad Rzeszutek Wilk wrote:
> If the hypervisor is built with a build-id we will display it.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-15 17:56 ` [PATCH v4 25/34] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
@ 2016-03-16 18:34   ` Julien Grall
  2016-03-16 21:02     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Julien Grall @ 2016-03-16 18:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	andrew.cooper3, mpohlack, sasha.levin
  Cc: Stefano Stabellini, Keir Fraser, Jan Beulich

Hi Konrad,

On 15/03/2016 17:56, Konrad Rzeszutek Wilk wrote:
> diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> index 9909595..187ef73 100644
> --- a/xen/arch/arm/xen.lds.S
> +++ b/xen/arch/arm/xen.lds.S
> @@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
>   PHDRS
>   {
>     text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
> +#if defined(BUILD_ID)
> +  note PT_NOTE ;
> +#endif
>   }
>   SECTIONS
>   {
> @@ -50,16 +53,21 @@ SECTIONS
>          __stop_bug_frames_2 = .;
>          *(.rodata)
>          *(.rodata.*)
> -
> -#ifdef LOCK_PROFILE
> -       . = ALIGN(POINTER_ALIGN);
> -       __lock_profile_start = .;
> -       *(.lockprofile.data)
> -       __lock_profile_end = .;

I think this is a spurious change.


> +#if !defined(BUILD_ID)
> +        _erodata = .;          /* End of read-only data */
>   #endif

Is it possible to move _erodata out of the section?

Something like:

.ALIGN(PAGE_SIZE);
_srodata = .; /* Read-only data */
.rodata : {
[...]
} :text

#if defined(BUILD_ID)
.note : {
} :text
#endif

_erodata = .; /* End of read-only data */

> +  } :text
>
> +#if defined(BUILD_ID)

No alignment required?

> +  .note : {
> +       __note_gnu_build_id_start = .;
> +       *(.note.gnu.build-id)
> +       __note_gnu_build_id_end = .;
> +       *(.note)
> +       *(.note.*)
>           _erodata = .;          /* End of read-only data */
>     } :text
> +#endif
>
>     .data : {                    /* Data */
>          . = ALIGN(PAGE_SIZE);

Regards,

-- 
Julien Grall
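
For context on what the __note_gnu_build_id_start and __note_gnu_build_id_end symbols are bracketing: a consumer walks the ELF notes in that range looking for the GNU build-id note (name "GNU", type 3). The following is a minimal sketch of such a walk, not the code from this series; struct elf_note_hdr, note_align() and find_build_id() are hypothetical names, and the sketch assumes the 4-byte padding normally used in note sections.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3          /* note type emitted by ld --build-id */

struct elf_note_hdr {              /* layout of an ELF note header */
    uint32_t namesz;               /* length of name, including NUL */
    uint32_t descsz;               /* length of descriptor (the ID) */
    uint32_t type;
};

/* Round up to the 4-byte padding assumed for note sections. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Scan [start, end) for a GNU build-id note. On success returns 0 and
 * points *id at the ID bytes; returns -1 if none is found or a note is
 * truncated.
 */
static int find_build_id(const unsigned char *start, const unsigned char *end,
                         const unsigned char **id, uint32_t *len)
{
    const unsigned char *p = start;

    while ( (size_t)(end - p) >= sizeof(struct elf_note_hdr) )
    {
        struct elf_note_hdr h;
        size_t avail, name_len, desc_len;

        memcpy(&h, p, sizeof(h));          /* avoid unaligned access */
        p += sizeof(h);
        avail = (size_t)(end - p);
        if ( h.namesz > avail || h.descsz > avail )
            return -1;                     /* sizes exceed the range */
        name_len = note_align(h.namesz);
        desc_len = note_align(h.descsz);
        if ( name_len > avail || desc_len > avail - name_len )
            return -1;                     /* truncated note */
        if ( h.type == NT_GNU_BUILD_ID && h.namesz == 4 &&
             memcmp(p, "GNU", 4) == 0 )
        {
            *id = p + name_len;
            *len = h.descsz;
            return 0;
        }
        p += name_len + desc_len;          /* skip to the next note */
    }
    return -1;
}
```

In the hypervisor the two linker symbols would supply start and end; with only .note.gnu.build-id between them the loop normally terminates on the first note.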

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-16 12:12   ` Julien Grall
@ 2016-03-16 19:58     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-16 19:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ross Lagerwall,
	Ian Jackson, mpohlack, xen-devel, Daniel De Graaf

On Wed, Mar 16, 2016 at 12:12:00PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 15/03/2016 17:56, Konrad Rzeszutek Wilk wrote:
> >diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> >index 8fbc46d..dbe9ccc 100644
> >--- a/xen/common/Kconfig
> >+++ b/xen/common/Kconfig
> >@@ -168,4 +168,15 @@ config SCHED_DEFAULT
> >
> >  endmenu
> >
> >+# Enable/Disable xsplice support
> >+config XSPLICE
> >+	bool "xSplice live patching support"
> >+	default y
> 
> I think it would be better to disable xSplice on ARM until we effectively
> support it.

If you would like. I made it dependent on x86.
> 
> It will avoid people asking on the mailing list why xSplice doesn't work on
> ARM platforms.
> 
> Regards,
> 
> -- 
> Julien Grall

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc
  2016-03-16 18:12   ` Wei Liu
@ 2016-03-16 20:36     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-16 20:36 UTC (permalink / raw)
  To: Wei Liu
  Cc: ross.lagerwall, andrew.cooper3, Stefano Stabellini, Ian Jackson,
	mpohlack, sasha.levin, xen-devel

On Wed, Mar 16, 2016 at 06:12:02PM +0000, Wei Liu wrote:
> On Tue, Mar 15, 2016 at 01:56:35PM -0400, Konrad Rzeszutek Wilk wrote:
> > The underlying toolstack code to do the basic
> > operations when using the XEN_XSPLICE_op syscalls:
> >  - upload the payload,
> >  - get status of a payload,
> >  - list all the payloads,
> >  - apply, check, replace, and revert the payload.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > 
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > 
> > v2: Actually set zero for the _pad entries.
> > v3: Split status into state and error code.
> >     Add REPLACE action.
> > v4: Use timeout and utilize pads.
> > v5: Update per Wei's review.
> > v6: Update per Wei's review.
> > v7: Extra space slipped in, remove it
> 
> Huh, the title says  v4 but here it is v7.
> 
> I believe issues I mentioned in previous iterations are fixed.

Yes. Thanks.
> 
> Acked-by: Wei Liu <wei.liu2@citrix.com>
> 
> Only one nitpick below.
> 
> > +/*
> > + * The operations are asynchronous and the hypervisor may take a while
> > + * to complete them. The `timeout` offers an option to expire the
> > + * operation if it could not be completed within the specified time.
> > + * Value of 0 means let hypervisor decide the best timeout.
> > + */
> 
> Might be useful to specify the unit of timeout (ms?).

Done! Thanks!
> 
> 
> Wei.

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-16 18:34   ` Julien Grall
@ 2016-03-16 21:02     ` Konrad Rzeszutek Wilk
  2016-03-17  1:12       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-16 21:02 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Stefano Stabellini, Jan Beulich, sasha.levin, xen-devel

On Wed, Mar 16, 2016 at 06:34:24PM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 15/03/2016 17:56, Konrad Rzeszutek Wilk wrote:
> >diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
> >index 9909595..187ef73 100644
> >--- a/xen/arch/arm/xen.lds.S
> >+++ b/xen/arch/arm/xen.lds.S
> >@@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
> >  PHDRS
> >  {
> >    text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
> >+#if defined(BUILD_ID)
> >+  note PT_NOTE ;
> >+#endif
> >  }
> >  SECTIONS
> >  {
> >@@ -50,16 +53,21 @@ SECTIONS
> >         __stop_bug_frames_2 = .;
> >         *(.rodata)
> >         *(.rodata.*)
> >-
> >-#ifdef LOCK_PROFILE
> >-       . = ALIGN(POINTER_ALIGN);
> >-       __lock_profile_start = .;
> >-       *(.lockprofile.data)
> >-       __lock_profile_end = .;
> 
> I think this is a spurious change.

Indeed! git mergetool and git rerere really did a number on me.
> 
> 
> >+#if !defined(BUILD_ID)
> >+        _erodata = .;          /* End of read-only data */
> >  #endif
> 
> Is it possible to move _erodata out of the section?

Should be. _end does that.

> 
> Something like:
> 
> . = ALIGN(PAGE_SIZE);
> _srodata = .; /* Read-only data */
> .rodata : {
> [...]
> } :text
> 
> #if defined(BUILD_ID)
> .note : {
> } :text
> #endif
> 
> _erodata = .; /* End of read-only data */

Let me try that.

> 
> >+  } :text
> >
> >+#if defined(BUILD_ID)
> 
> No alignment required?

None needed. Can do byte granularity.
> 
> >+  .note : {
> >+       __note_gnu_build_id_start = .;
> >+       *(.note.gnu.build-id)
> >+       __note_gnu_build_id_end = .;
> >+       *(.note)
> >+       *(.note.*)
> >          _erodata = .;          /* End of read-only data */
> >    } :text
> >+#endif
> >
> >    .data : {                    /* Data */
> >         . = ALIGN(PAGE_SIZE);
> 
> Regards,
> 
> -- 
> Julien Grall
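
For readers following the .note discussion above, here is a minimal sketch
in C of how code might locate the build-id between the
__note_gnu_build_id_start and __note_gnu_build_id_end symbols defined in the
linker script. It is not taken from the patch: find_build_id, struct elf_note
and note_align are hypothetical names, and the layout assumed is the standard
ELF note (namesz, descsz, type, then name and desc each padded to 4 bytes),
with type 3 and name "GNU" for a build-id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID 3   /* note type emitted by ld --build-id */

/* Generic ELF note header; name and desc follow, each padded to 4 bytes. */
struct elf_note {
    uint32_t namesz;   /* length of name, including the trailing NUL */
    uint32_t descsz;   /* length of desc (the build-id bytes) */
    uint32_t type;
};

/* Round up to the 4-byte padding used inside ELF notes. */
size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Scan [start, end) for a GNU build-id note (hypothetical helper).
 * On success store a pointer to the id bytes in *id, their length in *len,
 * and return 0. Return -1 if no such note is found or a note is truncated.
 */
int find_build_id(const uint8_t *start, const uint8_t *end,
                  const uint8_t **id, size_t *len)
{
    const uint8_t *p = start;

    while ( (size_t)(end - p) >= sizeof(struct elf_note) )
    {
        struct elf_note n;
        size_t name_off = sizeof(n), desc_off, total;

        memcpy(&n, p, sizeof(n));           /* avoid unaligned loads */
        desc_off = name_off + note_align(n.namesz);
        total = desc_off + note_align(n.descsz);
        if ( total > (size_t)(end - p) )
            return -1;                      /* truncated note */

        if ( n.type == NT_GNU_BUILD_ID && n.namesz == 4 &&
             memcmp(p + name_off, "GNU", 4) == 0 )
        {
            *id = p + desc_off;
            *len = n.descsz;
            return 0;
        }
        p += total;                         /* skip to the next note */
    }
    return -1;
}
```

A caller would pass the two linker symbols as start and end. Since *(.note)
and *(.note.*) are collected into the same output section, the loop skips
notes of other types rather than assuming the build-id comes first.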

* Re: [PATCH v4 01/34] compat/x86: Remove unncessary #define.
  2016-03-16 11:08   ` Jan Beulich
@ 2016-03-17  0:44     ` Konrad Rzeszutek Wilk
  2016-03-17  7:45       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17  0:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	ross.lagerwall, sasha.levin, xen-devel

On Wed, Mar 16, 2016 at 05:08:30AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > It is not used.
> 
> Consistently please - either keep them all (just to cover the case
> that they might get used) or remove them all: xen_compile_info,
> xen_changeset_info, etc are all unused too. Otoh
> xennmi_callback is used, but xennmi_callback_t isn't. Which to me
> suggests that we should leave this alone.

Oddly enough taking a cleaver to it was OK.

From 7e3ed6faed6e083f27ad6be947ac528c3eaba9a1 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 2 Mar 2016 12:50:32 -0500
Subject: [PATCH v4 02/35] compat/x86: Remove unncessary #defines.

They are not used.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Remove a lot more of them.
---
---
 xen/common/compat/kernel.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index df93fdd..dc898ae 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -18,30 +18,22 @@ asm(".file \"" __FILE__ "\"");
 
 extern xen_commandline_t saved_cmdline;
 
-#define xen_extraversion compat_extraversion
 #define xen_extraversion_t compat_extraversion_t
 
-#define xen_compile_info compat_compile_info
 #define xen_compile_info_t compat_compile_info_t
 
 CHECK_TYPE(capabilities_info);
 
-#define xen_platform_parameters compat_platform_parameters
 #define xen_platform_parameters_t compat_platform_parameters_t
 #undef HYPERVISOR_VIRT_START
 #define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
 
-#define xen_changeset_info compat_changeset_info
 #define xen_changeset_info_t compat_changeset_info_t
 
-#define xen_feature_info compat_feature_info
 #define xen_feature_info_t compat_feature_info_t
 
 CHECK_TYPE(domain_handle);
 
-#define xennmi_callback compat_nmi_callback
-#define xennmi_callback_t compat_nmi_callback_t
-
 #ifdef COMPAT_VM_ASSIST_VALID
 #undef VM_ASSIST_VALID
 #define VM_ASSIST_VALID COMPAT_VM_ASSIST_VALID
-- 
2.5.0


* Re: [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers.
  2016-03-16 11:10   ` Jan Beulich
@ 2016-03-17  0:44     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17  0:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	ross.lagerwall, sasha.levin, xen-devel

On Wed, Mar 16, 2016 at 05:10:29AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > If you have a patch for xSplice send it our way!
> > 
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > 
> > ---
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Keir Fraser <keir@xen.org>
> > Cc: Tim Deegan <tim@xen.org>
> > ---
> > ---
> >  MAINTAINERS | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 52cc538..dc7a929 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -420,6 +420,16 @@ F:  xen/include/xsm/
> >  F:  xen/xsm/
> >  F:  docs/misc/xsm-flask.txt
> >  
> > +XSPLICE
> > +M:  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > +M:  Ross Lagerwall <ross.lagerwall@citrix.com>
> > +S:  Supported
> > +F:  xen/common/xsplice*
> > +F:  xen/include/xen/xsplice*
> > +F:  arch/*/xsplice*
> 
> xen/arch/*/xsplice*
> 
> > +F:  docs/misc/xsplice.markdown
> > +F:  tools/misc/xen-xsplice.c
> 
> I also wonder whether it wouldn't be nice to sort the entries by
> name.

Done!
From 1ef0b51f29de904515098470c49fa8c7ad868648 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon, 15 Feb 2016 16:24:58 -0500
Subject: [PATCH v4 35/35] MAINTAINERS/xsplice: Add myself and Ross as the
 maintainers.

If you have a patch for xSplice send it our way!

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v2: Sort the F: fields (Jan)
---
---
 MAINTAINERS | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 52cc538..64036e7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -420,6 +420,16 @@ F:  xen/include/xsm/
 F:  xen/xsm/
 F:  docs/misc/xsm-flask.txt
 
+XSPLICE
+M:  Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+M:  Ross Lagerwall <ross.lagerwall@citrix.com>
+S:  Supported
+F:  docs/misc/xsplice.markdown
+F:  tools/misc/xen-xsplice.c
+F:  xen/arch/*/xsplice*
+F:  xen/common/xsplice*
+F:  xen/include/xen/xsplice*
+
 THE REST
 M:	Ian Jackson <ian.jackson@eu.citrix.com>
 M:	Jan Beulich <jbeulich@suse.com>
-- 
2.5.0


* Re: [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall
  2016-03-16 18:11   ` Wei Liu
@ 2016-03-17  1:08     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17  1:08 UTC (permalink / raw)
  To: Wei Liu
  Cc: George Dunlap, andrew.cooper3, Stefano Stabellini, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, sasha.levin, David Scott,
	xen-devel

On Wed, Mar 16, 2016 at 06:11:57PM +0000, Wei Liu wrote:
> On Tue, Mar 15, 2016 at 01:56:27PM -0400, Konrad Rzeszutek Wilk wrote:
> [...]
> > diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
> > index 303081d..2663969 100644
> > --- a/tools/libxc/xg_save_restore.h
> > +++ b/tools/libxc/xg_save_restore.h
> > @@ -57,10 +57,12 @@ static inline int get_platform_info(xc_interface *xch, uint32_t dom,
> >      xen_capabilities_info_t xen_caps = "";
> >      xen_platform_parameters_t xen_params;
> >  
> > -    if (xc_version(xch, XENVER_platform_parameters, &xen_params) != 0)
> > +    if (xc_version(xch, XEN_VERSION_OP_platform_parameters, &xen_params,
> > +                   sizeof(xen_params)) < 0)
> >          return 0;
> >  
> > -    if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
> > +    if (xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
> > +                   sizeof(xen_caps)) < 0)
> >          return 0;
> >  
> >      if (xc_maximum_ram_page(xch, max_mfn))
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 93e228d..dc660b7 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -5191,50 +5191,73 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
> >      return ret;
> >  }
> >  
> > +
> > +static int xc_version_wrapper(libxl_ctx *ctx, unsigned int cmd, char *buf, ssize_t len, char **dst)
> > +{
> 
> This function should accept libxl__gc *gc instead of libxl_ctx.
> 
> Preferably it should be renamed to
> 
> libxl__xc_version_wrapper
> 
> > +    GC_INIT(ctx);
> > +    int r;
> > +
> > +    r = xc_version(ctx->xch, cmd, buf, len);
> 
> Then here CTX->xch
> 
> > +    if ( r == -EPERM )
> > +        buf[0] = '\0';
> > +    else if ( r < 0 )
> > +    {
> > +        GC_FREE;
> > +        return r;
> > +    }
> > +    *dst = libxl__strdup(NOGC, buf);
> > +    GC_FREE;
> 
> Then get rid of all GC_* macros.
> 
> > +    return 0;
> > +}
> > +
> >  const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
> [...]

Would this be to your liking?


Note that the OCaml and xenstat code uses real tabs instead of four
spaces, so those hunks look odd.
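
The patch that follows changes the return convention of xc_version: success
is a positive byte count rather than zero, so callers move from "!= 0" to
"< 0". A minimal sketch of that pattern in C, under stated assumptions:
mock_version_op is a hypothetical stand-in for xc_version (it is not the
libxc function), get_version is a hypothetical caller, and the value is
assumed to be 32 bits wide purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for xc_version(): copies a packed version value
 * into arg and returns the number of bytes copied, or -1 with errno set
 * to ENOBUFS when the caller's buffer is too small.
 */
int mock_version_op(uint32_t packed, void *arg, size_t len)
{
    if ( len < sizeof(packed) )
    {
        errno = ENOBUFS;
        return -1;
    }
    memcpy(arg, &packed, sizeof(packed));
    return (int)sizeof(packed);
}

/* Hypothetical caller: fetch the value, then split major and minor. */
int get_version(uint32_t packed, unsigned int *major, unsigned int *minor)
{
    uint32_t val = 0;

    /* Success is a positive byte count, so test "< 0", never "!= 0". */
    if ( mock_version_op(packed, &val, sizeof(val)) < 0 )
        return -1;

    *major = (val >> 16) & 0xffff;   /* upper 16 bits */
    *minor = val & 0xffff;           /* lower 16 bits */
    return 0;
}
```

The shift-and-mask decoding mirrors what the xc_sr_save.c hunk does with the
value it reads back. A check written as "!= 0" would treat every successful
call as a failure under the new convention, which is the issue the v3
changelog entry attributes to the OCaml stubs.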


From 36f39388c262123a0bdcddffdf565a032d7ca2cb Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 10 Mar 2016 16:11:59 -0500
Subject: [PATCH v4 06/35] libxc/libxl/python/xenstat/ocaml: Use new
 XEN_VERSION_OP hypercall

We change the xen_version libxc code to use the new hypercall,
which of course means every user in the code base has to
be changed over.

It is important to note that the xc_version_op has different
return semantics than the previous one. It returns negative
values on error (like the old one), but it also returns
a positive value on success (unlike the old one). The positive
value is the number of bytes copied in.

Note that both OCaml and xenstat use tabs instead of four
spaces so they look quite odd.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> [for the Ocaml stubs]
Acked-by: George Dunlap <george.dunlap@eu.citrix.com> [xenctx bits]
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Scott <dave@recoil.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

v2: Use xc_version_op_val_t instead of uint32 or such
v3: Make sure to check ret < 0 instead of ret (as it returns the size) -
    in Ocaml code. Found by Andrew.
v4: Update comment for xc_version to mention that it returns the size
v5: Wei's review
---
 tools/libxc/include/xenctrl.h          | 32 +++++++++++++-
 tools/libxc/xc_core.c                  | 35 +++++++--------
 tools/libxc/xc_dom_boot.c              | 12 +++++-
 tools/libxc/xc_domain.c                |  3 +-
 tools/libxc/xc_private.c               | 53 ++++-------------------
 tools/libxc/xc_private.h               |  7 +--
 tools/libxc/xc_resume.c                |  3 +-
 tools/libxc/xc_sr_save.c               |  9 ++--
 tools/libxc/xg_save_restore.h          |  6 ++-
 tools/libxl/libxl.c                    | 79 ++++++++++++++++++++++------------
 tools/ocaml/libs/xc/xenctrl_stubs.c    | 39 +++++++----------
 tools/python/xen/lowlevel/xc/xc.c      | 30 +++++++------
 tools/xenstat/libxenstat/src/xenstat.c | 12 +++---
 tools/xentrace/xenctx.c                |  3 +-
 14 files changed, 177 insertions(+), 146 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 150d727..05ca440 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1477,7 +1477,37 @@ int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
-int xc_version(xc_interface *xch, int cmd, void *arg);
+/**
+ * This function returns the size of buffer to be allocated for
+ * the cmd. The cmd are XEN_VERSION_OP_*.
+ */
+ssize_t xc_version_len(xc_interface *xch, unsigned int cmd);
+
+/**
+ * This function retrieves the information from the version_op hypercall.
+ * The len is the size of the arg buffer. If arg is NULL, will not
+ * perform hypercall - instead will just return the size of arg
+ * buffer that is needed.
+ *
+ * Note that prior to Xen 4.7 this would return 0 for success and
+ * negative value (-1) for error (with the error in errno). In Xen 4.7
+ * and later for success it will return a positive value which is the
+ * number of bytes copied in arg.
+ *
+ * It can also return -1 with various errno values:
+ *  - EPERM - not permitted.
+ *  - ENOBUFS - the len was too short, output in arg truncated.
+ *  - ENOSYS - not implemented.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm cmd XEN_VERSION_OP_* value
+ * @param arg Pointer to xen_version_op_buf_t or xen_version_op_val_t
+ * @param len Size of arg
+ * @return size of bytes copied in arg on success, -1 on failure (and
+ * errno will contain the error)
+ *
+ */
+int xc_version(xc_interface *xch, unsigned int cmd, void *arg, size_t len);
 
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
diff --git a/tools/libxc/xc_core.c b/tools/libxc/xc_core.c
index d792566..58b03d6 100644
--- a/tools/libxc/xc_core.c
+++ b/tools/libxc/xc_core.c
@@ -270,42 +270,43 @@ elfnote_fill_xen_version(xc_interface *xch,
                          *xen_version)
 {
     int rc;
+    xen_version_op_val_t val = 0;
     memset(xen_version, 0, sizeof(*xen_version));
 
-    rc = xc_version(xch, XENVER_version, NULL);
+    rc = xc_version(xch, XEN_VERSION_OP_version, &val, sizeof(val));
     if ( rc < 0 )
         return rc;
-    xen_version->major_version = rc >> 16;
-    xen_version->minor_version = rc & ((1 << 16) - 1);
+    xen_version->major_version = val >> 16;
+    xen_version->minor_version = val & ((1 << 16) - 1);
 
-    rc = xc_version(xch, XENVER_extraversion,
-                    &xen_version->extra_version);
+    rc = xc_version(xch, XEN_VERSION_OP_extraversion,
+                    xen_version->extra_version,
+                    sizeof(xen_version->extra_version));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_compile_info,
-                    &xen_version->compile_info);
+    rc = xc_version(xch, XEN_VERSION_OP_capabilities,
+                    xen_version->capabilities,
+                    sizeof(xen_version->capabilities));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch,
-                    XENVER_capabilities, &xen_version->capabilities);
+    rc = xc_version(xch, XEN_VERSION_OP_changeset, xen_version->changeset,
+                    sizeof(xen_version->changeset));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_changeset, &xen_version->changeset);
+    rc = xc_version(xch, XEN_VERSION_OP_platform_parameters,
+                    &xen_version->platform_parameters,
+                    sizeof(xen_version->platform_parameters));
     if ( rc < 0 )
         return rc;
 
-    rc = xc_version(xch, XENVER_platform_parameters,
-                    &xen_version->platform_parameters);
+    val = 0;
+    rc = xc_version(xch, XEN_VERSION_OP_pagesize, &val, sizeof(val));
     if ( rc < 0 )
         return rc;
-
-    rc = xc_version(xch, XENVER_pagesize, NULL);
-    if ( rc < 0 )
-        return rc;
-    xen_version->pagesize = rc;
+    xen_version->pagesize = val;
 
     return 0;
 }
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 791041b..3f65095 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -112,11 +112,19 @@ int xc_dom_compat_check(struct xc_dom_image *dom)
 
 int xc_dom_boot_xen_init(struct xc_dom_image *dom, xc_interface *xch, domid_t domid)
 {
+    xen_version_op_val_t val = 0;
+
+    if ( xc_version(xch, XEN_VERSION_OP_version, &val, sizeof(val)) < 0 )
+    {
+        xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get Xen version!");
+        return -1;
+    }
+    dom->xen_version = val;
     dom->xch = xch;
     dom->guest_domid = domid;
 
-    dom->xen_version = xc_version(xch, XENVER_version, NULL);
-    if ( xc_version(xch, XENVER_capabilities, &dom->xen_caps) < 0 )
+    if ( xc_version(xch, XEN_VERSION_OP_capabilities, dom->xen_caps,
+                    sizeof(dom->xen_caps)) < 0 )
     {
         xc_dom_panic(xch, XC_INTERNAL_ERROR, "can't get xen capabilities");
         return -1;
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 050216e..c214700 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -2084,7 +2084,8 @@ int xc_map_domain_meminfo(xc_interface *xch, int domid,
     _di.guest_width = minfo->guest_width;
 
     /* Get page table levels (see get_platform_info() in xg_save_restore.h */
-    if ( xc_version(xch, XENVER_capabilities, &xen_caps) )
+    if ( xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
+                    sizeof(xen_caps)) < 0 )
     {
         PERROR("Could not get Xen capabilities (for page table levels)");
         return -1;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index c41e433..631ad91 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -457,58 +457,23 @@ int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl)
     return do_sysctl(xch, sysctl);
 }
 
-int xc_version(xc_interface *xch, int cmd, void *arg)
+ssize_t xc_version_len(xc_interface *xch, unsigned int cmd)
 {
-    DECLARE_HYPERCALL_BOUNCE(arg, 0, XC_HYPERCALL_BUFFER_BOUNCE_OUT); /* Size unknown until cmd decoded */
-    size_t sz;
-    int rc;
-
-    switch ( cmd )
-    {
-    case XENVER_version:
-        sz = 0;
-        break;
-    case XENVER_extraversion:
-        sz = sizeof(xen_extraversion_t);
-        break;
-    case XENVER_compile_info:
-        sz = sizeof(xen_compile_info_t);
-        break;
-    case XENVER_capabilities:
-        sz = sizeof(xen_capabilities_info_t);
-        break;
-    case XENVER_changeset:
-        sz = sizeof(xen_changeset_info_t);
-        break;
-    case XENVER_platform_parameters:
-        sz = sizeof(xen_platform_parameters_t);
-        break;
-    case XENVER_get_features:
-        sz = sizeof(xen_feature_info_t);
-        break;
-    case XENVER_pagesize:
-        sz = 0;
-        break;
-    case XENVER_guest_handle:
-        sz = sizeof(xen_domain_handle_t);
-        break;
-    case XENVER_commandline:
-        sz = sizeof(xen_commandline_t);
-        break;
-    default:
-        ERROR("xc_version: unknown command %d\n", cmd);
-        return -EINVAL;
-    }
+    return do_version_op(xch, cmd, NULL, 0);
+}
 
-    HYPERCALL_BOUNCE_SET_SIZE(arg, sz);
+int xc_version(xc_interface *xch, unsigned int cmd, void *arg, size_t sz)
+{
+    DECLARE_HYPERCALL_BOUNCE(arg, sz, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    int rc;
 
-    if ( (sz != 0) && xc_hypercall_bounce_pre(xch, arg) )
+    if ( xc_hypercall_bounce_pre(xch, arg) )
     {
         PERROR("Could not bounce buffer for version hypercall");
         return -ENOMEM;
     }
 
-    rc = do_xen_version(xch, cmd, HYPERCALL_BUFFER(arg));
+    rc = do_version_op(xch, cmd, HYPERCALL_BUFFER(arg), sz);
 
     if ( sz != 0 )
         xc_hypercall_bounce_post(xch, arg);
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index aa8daf1..5be8fdd 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -214,11 +214,12 @@ void xc__hypercall_buffer_cache_release(xc_interface *xch);
  * Hypercall interfaces.
  */
 
-static inline int do_xen_version(xc_interface *xch, int cmd, xc_hypercall_buffer_t *dest)
+static inline long do_version_op(xc_interface *xch, int cmd,
+                                 xc_hypercall_buffer_t *dest, ssize_t len)
 {
     DECLARE_HYPERCALL_BUFFER_ARGUMENT(dest);
-    return xencall2(xch->xcall, __HYPERVISOR_xen_version,
-                    cmd, HYPERCALL_BUFFER_AS_ARG(dest));
+    return xencall3(xch->xcall, __HYPERVISOR_version_op,
+                    cmd, HYPERCALL_BUFFER_AS_ARG(dest), len);
 }
 
 static inline int do_physdev_op(xc_interface *xch, int cmd, void *op, size_t len)
diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index e692b81..7dfc3da 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -56,7 +56,8 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
             return 0;
 
         /* HVM guests have host address width. */
-        if ( xc_version(xch, XENVER_capabilities, &caps) != 0 )
+        if ( xc_version(xch, XEN_VERSION_OP_capabilities, caps,
+                        sizeof(caps)) < 0 )
         {
             PERROR("Could not get Xen capabilities");
             return -1;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index e258b7c..6daafc4 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -9,7 +9,7 @@
 static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
 {
     xc_interface *xch = ctx->xch;
-    int32_t xen_version = xc_version(xch, XENVER_version, NULL);
+    xen_version_op_val_t xen_version;
     struct xc_sr_ihdr ihdr =
         {
             .marker  = IHDR_MARKER,
@@ -21,15 +21,16 @@ static int write_headers(struct xc_sr_context *ctx, uint16_t guest_type)
         {
             .type       = guest_type,
             .page_shift = XC_PAGE_SHIFT,
-            .xen_major  = (xen_version >> 16) & 0xffff,
-            .xen_minor  = (xen_version)       & 0xffff,
         };
 
-    if ( xen_version < 0 )
+    if ( xc_version(xch, XEN_VERSION_OP_version, &xen_version,
+                    sizeof(xen_version)) < 0 )
     {
         PERROR("Unable to obtain Xen Version");
         return -1;
     }
+    dhdr.xen_major = (xen_version >> 16) & 0xffff;
+    dhdr.xen_minor = (xen_version)       & 0xffff;
 
     if ( write_exact(ctx->fd, &ihdr, sizeof(ihdr)) )
     {
diff --git a/tools/libxc/xg_save_restore.h b/tools/libxc/xg_save_restore.h
index 303081d..2663969 100644
--- a/tools/libxc/xg_save_restore.h
+++ b/tools/libxc/xg_save_restore.h
@@ -57,10 +57,12 @@ static inline int get_platform_info(xc_interface *xch, uint32_t dom,
     xen_capabilities_info_t xen_caps = "";
     xen_platform_parameters_t xen_params;
 
-    if (xc_version(xch, XENVER_platform_parameters, &xen_params) != 0)
+    if (xc_version(xch, XEN_VERSION_OP_platform_parameters, &xen_params,
+                   sizeof(xen_params)) < 0)
         return 0;
 
-    if (xc_version(xch, XENVER_capabilities, &xen_caps) != 0)
+    if (xc_version(xch, XEN_VERSION_OP_capabilities, xen_caps,
+                   sizeof(xen_caps)) < 0)
         return 0;
 
     if (xc_maximum_ram_page(xch, max_mfn))
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 93e228d..e6b191d 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -5191,50 +5191,73 @@ libxl_numainfo *libxl_get_numainfo(libxl_ctx *ctx, int *nr)
     return ret;
 }
 
+
+static int libxl__xc_version_wrapper(libxl__gc *gc, unsigned int cmd, char *buf, ssize_t len, char **dst)
+{
+    int r;
+
+    r = xc_version(CTX->xch, cmd, buf, len);
+    if ( r == -EPERM )
+    {
+        buf[0] = '\0';
+    }
+    else if ( r < 0 )
+    {
+        return r;
+    }
+    *dst = libxl__strdup(NOGC, buf);
+    return 0;
+}
+
 const libxl_version_info* libxl_get_version_info(libxl_ctx *ctx)
 {
     GC_INIT(ctx);
-    union {
-        xen_extraversion_t xen_extra;
-        xen_compile_info_t xen_cc;
-        xen_changeset_info_t xen_chgset;
-        xen_capabilities_info_t xen_caps;
-        xen_platform_parameters_t p_parms;
-        xen_commandline_t xen_commandline;
-    } u;
-    long xen_version;
+    char *buf;
+    xen_version_op_val_t val = 0;
     libxl_version_info *info = &ctx->version_info;
 
     if (info->xen_version_extra != NULL)
         goto out;
 
-    xen_version = xc_version(ctx->xch, XENVER_version, NULL);
-    info->xen_version_major = xen_version >> 16;
-    info->xen_version_minor = xen_version & 0xFF;
+    if (xc_version(CTX->xch, XEN_VERSION_OP_pagesize, &val, sizeof(val)) < 0)
+        goto out;
+
+    info->pagesize = val;
+    /* 4K buffer. */
+    buf = libxl__zalloc(gc, info->pagesize);
 
-    xc_version(ctx->xch, XENVER_extraversion, &u.xen_extra);
-    info->xen_version_extra = libxl__strdup(NOGC, u.xen_extra);
+    val = 0;
+    if (xc_version(CTX->xch, XEN_VERSION_OP_version, &val, sizeof(val)) < 0)
+        goto out;
+    info->xen_version_major = val >> 16;
+    info->xen_version_minor = val & 0xFF;
 
-    xc_version(ctx->xch, XENVER_compile_info, &u.xen_cc);
-    info->compiler = libxl__strdup(NOGC, u.xen_cc.compiler);
-    info->compile_by = libxl__strdup(NOGC, u.xen_cc.compile_by);
-    info->compile_domain = libxl__strdup(NOGC, u.xen_cc.compile_domain);
-    info->compile_date = libxl__strdup(NOGC, u.xen_cc.compile_date);
+    if (libxl__xc_version_wrapper(gc, XEN_VERSION_OP_extraversion, buf,
+                                  info->pagesize, &info->xen_version_extra) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_capabilities, &u.xen_caps);
-    info->capabilities = libxl__strdup(NOGC, u.xen_caps);
+    info->compiler = libxl__strdup(NOGC, "");
+    info->compile_by = libxl__strdup(NOGC, "");
+    info->compile_domain = libxl__strdup(NOGC, "");
+    info->compile_date = libxl__strdup(NOGC, "");
 
-    xc_version(ctx->xch, XENVER_changeset, &u.xen_chgset);
-    info->changeset = libxl__strdup(NOGC, u.xen_chgset);
+    if (libxl__xc_version_wrapper(gc, XEN_VERSION_OP_capabilities, buf,
+                                  info->pagesize, &info->capabilities) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_platform_parameters, &u.p_parms);
-    info->virt_start = u.p_parms.virt_start;
+    if (libxl__xc_version_wrapper(gc, XEN_VERSION_OP_changeset, buf,
+                                  info->pagesize, &info->changeset) < 0)
+        goto out;
 
-    info->pagesize = xc_version(ctx->xch, XENVER_pagesize, NULL);
+    val = 0;
+    if (xc_version(CTX->xch, XEN_VERSION_OP_platform_parameters, &val,
+                   sizeof(val)) < 0)
+        goto out;
 
-    xc_version(ctx->xch, XENVER_commandline, &u.xen_commandline);
-    info->commandline = libxl__strdup(NOGC, u.xen_commandline);
+    info->virt_start = val;
 
+    (void)libxl__xc_version_wrapper(gc, XEN_VERSION_OP_commandline, buf,
+                                    info->pagesize, &info->commandline);
  out:
     GC_FREE;
     return info;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 74928e9..44dccc9 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -853,21 +853,21 @@ CAMLprim value stub_xc_version_version(value xch)
 	CAMLparam1(xch);
 	CAMLlocal1(result);
 	xen_extraversion_t extra;
-	long packed;
+	xen_version_op_val_t packed;
 	int retval;
 
 	caml_enter_blocking_section();
-	packed = xc_version(_H(xch), XENVER_version, NULL);
+	retval = xc_version(_H(xch), XEN_VERSION_OP_version, &packed, sizeof(packed));
 	caml_leave_blocking_section();
 
-	if (packed < 0)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XENVER_extraversion, &extra);
+	retval = xc_version(_H(xch), XEN_VERSION_OP_extraversion, &extra, sizeof(extra));
 	caml_leave_blocking_section();
 
-	if (retval)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(3);
@@ -884,37 +884,28 @@ CAMLprim value stub_xc_version_compile_info(value xch)
 {
 	CAMLparam1(xch);
 	CAMLlocal1(result);
-	xen_compile_info_t ci;
-	int retval;
-
-	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), XENVER_compile_info, &ci);
-	caml_leave_blocking_section();
-
-	if (retval)
-		failwith_xc(_H(xch));
 
 	result = caml_alloc_tuple(4);
 
-	Store_field(result, 0, caml_copy_string(ci.compiler));
-	Store_field(result, 1, caml_copy_string(ci.compile_by));
-	Store_field(result, 2, caml_copy_string(ci.compile_domain));
-	Store_field(result, 3, caml_copy_string(ci.compile_date));
+	Store_field(result, 0, caml_copy_string(""));
+	Store_field(result, 1, caml_copy_string(""));
+	Store_field(result, 2, caml_copy_string(""));
+	Store_field(result, 3, caml_copy_string(""));
 
 	CAMLreturn(result);
 }
 
 
-static value xc_version_single_string(value xch, int code, void *info)
+static value xc_version_single_string(value xch, int code, void *info, ssize_t len)
 {
 	CAMLparam1(xch);
 	int retval;
 
 	caml_enter_blocking_section();
-	retval = xc_version(_H(xch), code, info);
+	retval = xc_version(_H(xch), code, info, len);
 	caml_leave_blocking_section();
 
-	if (retval)
+	if (retval < 0)
 		failwith_xc(_H(xch));
 
 	CAMLreturn(caml_copy_string((char *)info));
@@ -925,7 +916,8 @@ CAMLprim value stub_xc_version_changeset(value xch)
 {
 	xen_changeset_info_t ci;
 
-	return xc_version_single_string(xch, XENVER_changeset, &ci);
+	return xc_version_single_string(xch, XEN_VERSION_OP_changeset,
+					&ci, sizeof(ci));
 }
 
 
@@ -933,7 +925,8 @@ CAMLprim value stub_xc_version_capabilities(value xch)
 {
 	xen_capabilities_info_t ci;
 
-	return xc_version_single_string(xch, XENVER_capabilities, &ci);
+	return xc_version_single_string(xch, XEN_VERSION_OP_capabilities,
+					&ci, sizeof(ci));
 }
 
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c40a4e9..d2029c7 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1204,34 +1204,40 @@ static PyObject *pyxc_xeninfo(XcObject *self)
     xen_capabilities_info_t xen_caps;
     xen_platform_parameters_t p_parms;
     xen_commandline_t xen_commandline;
-    long xen_version;
-    long xen_pagesize;
+    xen_version_op_val_t xen_version;
+    xen_version_op_val_t xen_pagesize;
     char str[128];
 
-    xen_version = xc_version(self->xc_handle, XENVER_version, NULL);
-
-    if ( xc_version(self->xc_handle, XENVER_extraversion, &xen_extra) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_version, &xen_version,
+                    sizeof(xen_version)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_compile_info, &xen_cc) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_extraversion, &xen_extra,
+                    sizeof(xen_extra)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_changeset, &xen_chgset) != 0 )
+    memset(&xen_cc, 0, sizeof(xen_cc));
+
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_changeset, &xen_chgset,
+                    sizeof(xen_chgset)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_capabilities, &xen_caps) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_capabilities, &xen_caps,
+                    sizeof(xen_caps)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_platform_parameters, &p_parms) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_platform_parameters,
+                    &p_parms, sizeof(p_parms)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
-    if ( xc_version(self->xc_handle, XENVER_commandline, &xen_commandline) != 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_commandline,
+                    &xen_commandline, sizeof(xen_commandline)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
     snprintf(str, sizeof(str), "virt_start=0x%"PRI_xen_ulong, p_parms.virt_start);
 
-    xen_pagesize = xc_version(self->xc_handle, XENVER_pagesize, NULL);
-    if (xen_pagesize < 0 )
+    if ( xc_version(self->xc_handle, XEN_VERSION_OP_pagesize, &xen_pagesize,
+                    sizeof(xen_pagesize)) < 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
     return Py_BuildValue("{s:i,s:i,s:s,s:s,s:i,s:s,s:s,s:s,s:s,s:s,s:s,s:s}",
diff --git a/tools/xenstat/libxenstat/src/xenstat.c b/tools/xenstat/libxenstat/src/xenstat.c
index 3495f3f..a57d5e1 100644
--- a/tools/xenstat/libxenstat/src/xenstat.c
+++ b/tools/xenstat/libxenstat/src/xenstat.c
@@ -621,20 +621,18 @@ unsigned long long xenstat_network_tdrop(xenstat_network * network)
 /* Collect Xen version information */
 static int xenstat_collect_xen_version(xenstat_node * node)
 {
-	long vnum = 0;
+	xen_version_op_val_t vnum = 0;
 	xen_extraversion_t version;
 
 	/* Collect Xen version information if not already collected */
 	if (node->handle->xen_version[0] == '\0') {
 		/* Get the Xen version number and extraversion string */
-		vnum = xc_version(node->handle->xc_handle,
-			XENVER_version, NULL);
-
-		if (vnum < 0)
+		if (xc_version(node->handle->xc_handle,
+			       XEN_VERSION_OP_version, &vnum, sizeof(vnum)) < 0)
 			return 0;
 
-		if (xc_version(node->handle->xc_handle, XENVER_extraversion,
-			&version) < 0)
+		if (xc_version(node->handle->xc_handle, XEN_VERSION_OP_extraversion,
+			       &version, sizeof(version)) < 0)
 			return 0;
 		/* Format the version information as a string and store it */
 		snprintf(node->handle->xen_version, VERSION_SIZE, "%ld.%ld%s",
diff --git a/tools/xentrace/xenctx.c b/tools/xentrace/xenctx.c
index e647179..14d2f8b 100644
--- a/tools/xentrace/xenctx.c
+++ b/tools/xentrace/xenctx.c
@@ -1000,7 +1000,8 @@ static void dump_ctx(int vcpu)
             guest_word_size = (cpuctx.msr_efer & 0x400) ? 8 :
                 guest_protected_mode ? 4 : 2;
             /* HVM guest context records are always host-sized */
-            if (xc_version(xenctx.xc_handle, XENVER_capabilities, &xen_caps) != 0) {
+            if (xc_version(xenctx.xc_handle, XEN_VERSION_OP_capabilities,
+                           &xen_caps, sizeof(xen_caps)) < 0) {
                 perror("xc_version");
                 return;
             }
-- 
2.5.0



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-16 21:02     ` Konrad Rzeszutek Wilk
@ 2016-03-17  1:12       ` Konrad Rzeszutek Wilk
  2016-03-17 11:08         ` Julien Grall
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17  1:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, andrew.cooper3, mpohlack, ross.lagerwall,
	Julien Grall, Stefano Stabellini, Jan Beulich, sasha.levin,
	xen-devel

> Let me try that.

Please see inline patch which has your suggestion:


From 20ddfe00c72bebd20f5f8c02283bdca5e1459616 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 15 Mar 2016 09:51:03 -0400
Subject: [PATCH v4 26/35] build_id: Provide ld-embedded build-ids

This patch enables the ELF binary to be built with the build-id
and provides, in the Xen hypervisor, the code to extract it.

One can also retrieve the value of the build-id by doing
'readelf -n xen-syms'.

For EFI builds we re-use the same build-id that the xen-syms
was built with.

The version of ld that first implemented --build-id is v2.18.
Hence we check for that or a later version - if an older version is
found we do not build the hypervisor with the build-id
(and the return code is -ENODATA for the xen_build_id() call).

For x86 we have two binaries - the xen-syms and the xen - a
smaller version with lots of sections removed. To make 'readelf -n xen'
work as well, we also modify mkelf32 and xen.lds.S to include the
PT_NOTE program header and the .note section.

The EFI binary is more complicated. Having any non-recognizable
sections (.note, .data.note, etc) causes the boot to hang.
Moving the .note into the .data section makes it work. It is also
worth noting that, to the author's knowledge, PE/COFF does not have
any "comment" sections.

Lastly, we MUST pass --build-id=sha1 on all linker invocations so that
symbol offsets don't change (which means we have multiple build
ids - except that the last one is the final one). Without this change,
the symbol table embedded in Xen is incorrect - some of the values it
contains are offset by the size of the included build id.
This obviously causes problems when resolving symbols.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1: Rebase it on Martin's initial patch
v2: Move it to XENVER hypercall
v3: Fix EFI building (Ross's fix)
v4: Don't use the third argument for length.
v5: Use new structure for XENVER_build_id with variable buf.
v6: Include Ross's fix.
v7: Include detection of bin-utils for build-id support, add
    probing for size, and return -EPERM for XSM denied calls.
v8: Build xen_build_id under ARM, required adding ELFSIZE in proper file.
v9: Rebase on top XSM version class.
v10: Include the build-id .note in the xen ELF binary.
     s/build_id/build_id_linker/
    For EFI build, moved the --build-id values in .data section
v11: Rebase on staging.
v12: Split patch in two. Always do --build-id call. Include the .note in
    .rodata. Use const void * and ssize_t
v13: Use -S to make build_id.o and objcopy differently (Andrew suggested)
v14: Put back the #ifdef LOCK_PROFILE on ARM. (Bad change). Move the _erodata
     around.
---
 Config.mk                   |  11 ++++
 xen/arch/arm/Makefile       |   2 +-
 xen/arch/arm/xen.lds.S      |  14 ++++-
 xen/arch/x86/Makefile       |  30 ++++++++--
 xen/arch/x86/boot/mkelf32.c | 137 ++++++++++++++++++++++++++++++++++++++------
 xen/arch/x86/xen.lds.S      |  23 ++++++++
 xen/common/version.c        |  51 +++++++++++++++++
 xen/include/xen/version.h   |   3 +
 8 files changed, 248 insertions(+), 23 deletions(-)

diff --git a/Config.mk b/Config.mk
index 79eb2bd..c8e89fe 100644
--- a/Config.mk
+++ b/Config.mk
@@ -126,6 +126,17 @@ endef
 check-$(gcc) = $(call cc-ver-check,CC,0x040100,"Xen requires at least gcc-4.1")
 $(eval $(check-y))
 
+ld-ver-build-id = $(shell $(1) --build-id 2>&1 | \
+					grep -q unrecognized && echo n || echo y)
+
+# binutils 2.18 and later implement build-id.
+ifeq ($(call ld-ver-build-id,$(LD)),n)
+build_id_linker :=
+else
+CFLAGS += -DBUILD_ID
+build_id_linker := --build-id=sha1
+endif
+
 # as-insn: Check whether assembler supports an instruction.
 # Usage: cflags-y += $(call as-insn "insn",option-yes,option-no)
 as-insn = $(if $(shell echo 'void _(void) { asm volatile ( $(2) ); }' \
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 17e9e3a..a3319ab 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -94,7 +94,7 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 9909595..aad26e3 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -22,6 +22,9 @@ OUTPUT_ARCH(FORMAT)
 PHDRS
 {
   text PT_LOAD /* XXX should be AT ( XEN_PHYS_START ) */ ;
+#if defined(BUILD_ID)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -57,9 +60,18 @@ SECTIONS
        *(.lockprofile.data)
        __lock_profile_end = .;
 #endif
+  } :text
 
-        _erodata = .;          /* End of read-only data */
+#if defined(BUILD_ID)
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
   } :text
+#endif
+  _erodata = .;                /* End of read-only data */
 
   .data : {                    /* Data */
        . = ALIGN(PAGE_SIZE);
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 8a100be..7db2e53 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -71,9 +71,16 @@ efi-y := $(shell if [ ! -r $(BASEDIR)/include/xen/compile.h -o \
                       -O $(BASEDIR)/include/xen/compile.h ]; then \
                          echo '$(TARGET).efi'; fi)
 
+ifdef build_id_linker
+num_phdrs = 2
+else
+num_phdrs = 1
+endif
+
 $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
 	./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000 \
-	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'`
+	`$(NM) -nr $(TARGET)-syms | head -n 1 | sed -e 's/^\([^ ]*\).*/0x\1/'` \
+	$(num_phdrs)
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C test
 
 install:
@@ -109,22 +116,28 @@ $(BASEDIR)/common/symbols-dummy.o:
 	$(MAKE) -f $(BASEDIR)/Rules.mk -C $(BASEDIR)/common symbols-dummy.o
 
 $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
 		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort \
 		>$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
 		| $(BASEDIR)/tools/symbols --all-symbols --sysv --sort --warn-dup \
 		>$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
+build_id.o: $(TARGET)-syms
+	$(OBJCOPY) -O binary --only-section=.note $(BASEDIR)/xen-syms $@.bin
+	$(OBJCOPY) -I binary -O elf64-x86-64 -B i386:x86-64 \
+		--rename-section=.data=.note.gnu.build-id -S $@.bin $@
+	rm -f $@.bin
+
 EFI_LDFLAGS = $(patsubst -m%,-mi386pep,$(LDFLAGS)) --subsystem=10
 EFI_LDFLAGS += --image-base=$(1) --stack=0,0 --heap=0,0 --strip-debug
 EFI_LDFLAGS += --section-alignment=0x200000 --file-alignment=0x20
@@ -137,6 +150,13 @@ $(TARGET).efi: VIRT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A VIR
 $(TARGET).efi: ALT_BASE = 0x$(shell $(NM) efi/relocs-dummy.o | sed -n 's, A ALT_START$$,,p')
 # Don't use $(wildcard ...) here - at least make 3.80 expands this too early!
 $(TARGET).efi: guard = $(if $(shell echo efi/dis* | grep disabled),:)
+ifdef build_id_linker
+$(TARGET).efi: build_id.o
+build_id_file := build_id.o
+else
+build_id_file :=
+endif
+
 $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbols-dummy.o efi/mkreloc
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \
@@ -153,7 +173,7 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol
 		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
-	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
+	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(build_id_file) -o $@
 	if $(guard) false; then rm -f $@; echo 'EFI support disabled'; fi
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/x86/boot/mkelf32.c b/xen/arch/x86/boot/mkelf32.c
index 993a7ee..d230e4c 100644
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -45,9 +45,9 @@ static Elf32_Ehdr out_ehdr = {
     0,                                       /* e_flags */
     sizeof(Elf32_Ehdr),                      /* e_ehsize */
     sizeof(Elf32_Phdr),                      /* e_phentsize */
-    1,                                       /* e_phnum */
+    1,  /* modify based on num_phdrs */      /* e_phnum */
     sizeof(Elf32_Shdr),                      /* e_shentsize */
-    3,                                       /* e_shnum */
+    3,  /* modify based on num_phdrs */      /* e_shnum */
     2                                        /* e_shstrndx */
 };
 
@@ -61,8 +61,20 @@ static Elf32_Phdr out_phdr = {
     PF_R|PF_W|PF_X,                          /* p_flags */
     64                                       /* p_align */
 };
+static Elf32_Phdr note_phdr = {
+    PT_NOTE,                                 /* p_type */
+    DYNAMICALLY_FILLED,                      /* p_offset */
+    DYNAMICALLY_FILLED,                      /* p_vaddr */
+    DYNAMICALLY_FILLED,                      /* p_paddr */
+    DYNAMICALLY_FILLED,                      /* p_filesz */
+    DYNAMICALLY_FILLED,                      /* p_memsz */
+    PF_R,                                    /* p_flags */
+    4                                        /* p_align */
+};
 
 static u8 out_shstrtab[] = "\0.text\0.shstrtab";
+/* If num_phdrs >= 2, we need to tack the .note. */
+static u8 out_shstrtab_extra[] = ".note\0";
 
 static Elf32_Shdr out_shdr[] = {
     { 0 },
@@ -90,6 +102,23 @@ static Elf32_Shdr out_shdr[] = {
     }
 };
 
+/*
+ * The 17 points to the '.note' in the out_shstrtab and out_shstrtab_extra
+ * laid out in the file.
+ */
+static Elf32_Shdr out_shdr_extra = {
+      17,                                    /* sh_name */
+      SHT_NOTE,                              /* sh_type */
+      0,                                     /* sh_flags */
+      DYNAMICALLY_FILLED,                    /* sh_addr */
+      DYNAMICALLY_FILLED,                    /* sh_offset */
+      DYNAMICALLY_FILLED,                    /* sh_size */
+      0,                                     /* sh_link */
+      0,                                     /* sh_info */
+      4,                                     /* sh_addralign */
+      0                                      /* sh_entsize */
+};
+
 /* Some system header files define these macros and pollute our namespace. */
 #undef swap16
 #undef swap32
@@ -228,21 +257,22 @@ static void do_read(int fd, void *data, int len)
 int main(int argc, char **argv)
 {
     u64        final_exec_addr;
-    u32        loadbase, dat_siz, mem_siz;
+    u32        loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
     char      *inimage, *outimage;
     int        infd, outfd;
     char       buffer[1024];
     int        bytes, todo, i;
+    int        num_phdrs;
 
     Elf32_Ehdr in32_ehdr;
 
     Elf64_Ehdr in64_ehdr;
     Elf64_Phdr in64_phdr;
 
-    if ( argc != 5 )
+    if ( argc != 6 )
     {
         fprintf(stderr, "Usage: mkelf32 <in-image> <out-image> "
-                "<load-base> <final-exec-addr>\n");
+                "<load-base> <final-exec-addr> <number of program headers>\n");
         return 1;
     }
 
@@ -250,7 +280,13 @@ int main(int argc, char **argv)
     outimage = argv[2];
     loadbase = strtoul(argv[3], NULL, 16);
     final_exec_addr = strtoull(argv[4], NULL, 16);
-
+    num_phdrs = atoi(argv[5]);
+    if ( num_phdrs > 2 || num_phdrs < 1 )
+    {
+        fprintf(stderr, "Number of program headers MUST be 1 or 2, got %d!\n",
+                num_phdrs);
+        return 1;
+    }
     infd = open(inimage, O_RDONLY);
     if ( infd == -1 )
     {
@@ -285,11 +321,10 @@ int main(int argc, char **argv)
                 (int)in64_ehdr.e_phentsize, (int)sizeof(in64_phdr));
         return 1;
     }
-
-    if ( in64_ehdr.e_phnum != 1 )
+    if ( in64_ehdr.e_phnum != num_phdrs )
     {
-        fprintf(stderr, "Expect precisly 1 program header; found %d.\n",
-                (int)in64_ehdr.e_phnum);
+        fprintf(stderr, "Expect precisely %d program header(s); found %d.\n",
+                num_phdrs, (int)in64_ehdr.e_phnum);
         return 1;
     }
 
@@ -299,11 +334,36 @@ int main(int argc, char **argv)
 
     (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
     dat_siz = (u32)in64_phdr.p_filesz;
-
     /* Do not use p_memsz: it does not include BSS alignment padding. */
     /*mem_siz = (u32)in64_phdr.p_memsz;*/
     mem_siz = (u32)(final_exec_addr - in64_phdr.p_vaddr);
 
+    note_sz = note_base = offset = 0;
+    if ( num_phdrs > 1 )
+    {
+        offset = in64_phdr.p_offset;
+        note_base = in64_phdr.p_vaddr;
+
+        (void)lseek(infd, in64_ehdr.e_phoff+sizeof(in64_phdr), SEEK_SET);
+        do_read(infd, &in64_phdr, sizeof(in64_phdr));
+        endianadjust_phdr64(&in64_phdr);
+
+        (void)lseek(infd, offset, SEEK_SET);
+
+        note_sz = in64_phdr.p_memsz;
+        note_base = in64_phdr.p_vaddr - note_base;
+
+        if ( in64_phdr.p_offset > dat_siz || offset > in64_phdr.p_offset )
+        {
+            fprintf(stderr, "Expected .note section within .text section!\n"
+                    "Offset %llu not within %u!\n",
+                    (unsigned long long)in64_phdr.p_offset, dat_siz);
+            return 1;
+        }
+        /* Gets us the absolute offset within the .text section. */
+        offset = in64_phdr.p_offset - offset;
+    }
+
     /*
      * End the image on a page boundary. This gets round alignment bugs
      * in the boot- or chain-loader (e.g., kexec on the XenoBoot CD).
@@ -322,6 +382,31 @@ int main(int argc, char **argv)
     out_shdr[1].sh_size   = dat_siz;
     out_shdr[2].sh_offset = RAW_OFFSET + dat_siz + sizeof(out_shdr);
 
+    if ( num_phdrs > 1 )
+    {
+        /* We have two of them! */
+        out_ehdr.e_phnum = num_phdrs;
+        /* Extra .note section. */
+        out_ehdr.e_shnum++;
+
+        /* Fill out the PT_NOTE program header. */
+        note_phdr.p_vaddr   = note_base;
+        note_phdr.p_paddr   = note_base;
+        note_phdr.p_filesz  = note_sz;
+        note_phdr.p_memsz   = note_sz;
+        note_phdr.p_offset  = offset;
+
+        /* Tack on the .note\0 */
+        out_shdr[2].sh_size += sizeof(out_shstrtab_extra);
+        /* And move it past the .note section. */
+        out_shdr[2].sh_offset += sizeof(out_shdr_extra);
+
+        /* Fill out the .note section. */
+        out_shdr_extra.sh_size = note_sz;
+        out_shdr_extra.sh_addr = note_base;
+        out_shdr_extra.sh_offset = RAW_OFFSET + offset;
+    }
+
     outfd = open(outimage, O_WRONLY|O_CREAT|O_TRUNC, 0775);
     if ( outfd == -1 )
     {
@@ -335,8 +420,15 @@ int main(int argc, char **argv)
 
     endianadjust_phdr32(&out_phdr);
     do_write(outfd, &out_phdr, sizeof(out_phdr));
-    
-    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr)) < 0 )
+
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_phdr32(&note_phdr);
+        do_write(outfd, &note_phdr, sizeof(note_phdr));
+    }
+
+    if ( (bytes = RAW_OFFSET - sizeof(out_ehdr) - sizeof(out_phdr) -
+          ( num_phdrs > 1 ? sizeof(note_phdr) : 0 ) ) < 0 )
     {
         fprintf(stderr, "Header overflow.\n");
         return 1;
@@ -355,9 +447,22 @@ int main(int argc, char **argv)
         endianadjust_shdr32(&out_shdr[i]);
     do_write(outfd, &out_shdr[0], sizeof(out_shdr));
 
-    do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
-    do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
-
+    if ( num_phdrs > 1 )
+    {
+        endianadjust_shdr32(&out_shdr_extra);
+        /* Append the .note section. */
+        do_write(outfd, &out_shdr_extra, sizeof(out_shdr_extra));
+        /* The normal strings - .text\0.. */
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        /* Our .note */
+        do_write(outfd, out_shstrtab_extra, sizeof(out_shstrtab_extra));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+sizeof(out_shstrtab_extra)+dat_siz)&3));
+    }
+    else
+    {
+        do_write(outfd, out_shstrtab, sizeof(out_shstrtab));
+        do_write(outfd, buffer, 4-((sizeof(out_shstrtab)+dat_siz)&3));
+    }
     close(infd);
     close(outfd);
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 961f48f..705fa98 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -31,6 +31,9 @@ OUTPUT_ARCH(i386:x86-64)
 PHDRS
 {
   text PT_LOAD ;
+#if defined(BUILD_ID) && !defined(EFI)
+  note PT_NOTE ;
+#endif
 }
 SECTIONS
 {
@@ -75,6 +78,11 @@ SECTIONS
 
        *(.rodata)
        *(.rodata.*)
+#if defined(BUILD_ID) && defined(EFI)
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+#endif
 
        . = ALIGN(8);
        /* Exception table */
@@ -96,6 +104,21 @@ SECTIONS
        _erodata = .;
   } :text
 
+#if defined(BUILD_ID) && !defined(EFI)
+/*
+ * No mechanism to put a PT_NOTE in the EFI file - so put
+ * it in .data section.
+ */
+  . = ALIGN(4);
+  .note : {
+       __note_gnu_build_id_start = .;
+       *(.note.gnu.build-id)
+       __note_gnu_build_id_end = .;
+       *(.note)
+       *(.note.*)
+  } :note :text
+#endif
+
 #ifdef EFI
   . = ALIGN(MB(2));
 #endif
diff --git a/xen/common/version.c b/xen/common/version.c
index fc9bf42..af87371 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -1,5 +1,9 @@
 #include <xen/compile.h>
+#include <xen/errno.h>
+#include <xen/string.h>
+#include <xen/types.h>
 #include <xen/version.h>
+#include <xen/elf.h>
 
 const char *xen_compile_date(void)
 {
@@ -61,6 +65,53 @@ const char *xen_deny(void)
     return "<denied>";
 }
 
+#ifdef BUILD_ID
+#define NT_GNU_BUILD_ID 3
+/* Defined in linker script. */
+extern const Elf_Note __note_gnu_build_id_start[], __note_gnu_build_id_end[];
+
+int xen_build_id(const void **p, ssize_t *len)
+{
+    const Elf_Note *n = __note_gnu_build_id_start;
+    static bool_t checked = 0;
+
+    if ( checked )
+    {
+        *len = n->descsz;
+        *p = ELFNOTE_DESC(n);
+        return 0;
+    }
+    /* --build-id invoked with wrong parameters. */
+    if ( __note_gnu_build_id_end <= __note_gnu_build_id_start )
+        return -ENODATA;
+
+    /* Check for full Note header. */
+    if ( &n[1] > __note_gnu_build_id_end )
+        return -ENODATA;
+
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    *len = n->descsz;
+    *p = ELFNOTE_DESC(n);
+
+    checked = 1;
+    return 0;
+}
+
+#else
+
+int xen_build_id(const void **p, ssize_t *len)
+{
+    return -ENODATA;
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 016a56c..a461c85 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -13,4 +13,7 @@ const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
 const char *xen_deny(void);
+#include <xen/types.h>
+int xen_build_id(const void **p, ssize_t *len);
+
 #endif /* __XEN_VERSION_H__ */
-- 
2.5.0




* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-15 20:19     ` Konrad Rzeszutek Wilk
@ 2016-03-17  1:38       ` Konrad Rzeszutek Wilk
  2016-03-17 14:28         ` Andrew Cooper
  2016-03-18 12:36         ` Jan Beulich
  0 siblings, 2 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17  1:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	Jan Beulich, xen-devel, sasha.levin, Daniel De Graaf,
	Keir Fraser

> > > +
> > > +    if ( !rc )
> > > +    {
> > > +        ssize_t bytes;
> > > +
> > > +        if ( sz > len )
> > > +            bytes = len;
> > > +        else
> > > +            bytes = sz;
> > > +
> > > +        if ( copy_to_guest(arg, ptr ? ptr : &u, bytes) )
> > 
> > Can be shortened to ptr ?: &u
> > 
> > > +            rc = -EFAULT;
> > > +    }
> > > +    if ( !rc )
> 
>          ^^^^^^^^^ - here
> > > +    {
> > > +        /*
> > > +         * We return len (truncate) worth of data even if we fail.
> > > +         */
> > > +        if ( sz > len )
> > > +            rc = -ENOBUFS;
> > 
> > This needs to be in the previous if() clause to avoid overriding -EFAULT
> > with -ENOBUFS.
> 
> That is exactly why it is in its own 'if ( !rc )' - so it won't
> overwrite -EFAULT. See above for 'here'

All changes incorporated. This is what the patch ends up looking like:


From 534f9277aebb9b89b937a79dd33c0a7016ce00a2 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 16 Mar 2016 15:48:56 -0400
Subject: [PATCH v4 05/35] HYPERCALL_version_op. New hypercall mirroring
 XENVER_ but sane.

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32
   as on 64-bit.
 - The hypercall accepts three arguments - the command, pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size by returning the size of
   buffer that will be needed - if the buffer is NULL.
 - Subops can complete even if the buffer is too small - truncated
   data will be filled and hypercall will return -ENOBUFS.
 - VERSION_OP_commandline, VERSION_OP_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM and toolstack/OSes are expected
   to deal with it.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2: Remove memset and use {}. Tweak copy_to_guest and capabilities_info,
    add ASSERT(sz) per Andrew's review.
---
 tools/flask/policy/policy/modules/xen/xen.te |   9 +-
 xen/arch/arm/traps.c                         |   1 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/x86_64/compat/entry.S           |   2 +
 xen/arch/x86/x86_64/entry.S                  |   2 +
 xen/common/compat/kernel.c                   |   3 +
 xen/common/kernel.c                          | 259 +++++++++++++++++++++++----
 xen/include/public/arch-arm.h                |   3 +
 xen/include/public/version.h                 |  72 +++++++-
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xsm/dummy.h                      |  19 ++
 xen/include/xsm/xsm.h                        |   7 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  39 ++++
 xen/xsm/flask/policy/access_vectors          |  24 ++-
 16 files changed, 404 insertions(+), 43 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 7e7400d..bea40c1 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,12 +74,14 @@ allow dom0_t xen_t:xen2 {
     get_symbol
 };
 
-# Allow dom0 to use all XENVER_ subops
+# Allow dom0 to use all XENVER_ subops and VERSION_OP subops
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_version xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
     xen_guest_handle xen_commandline
+    version extraversion capabilities changeset platform_parameters
+    get_features pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -146,11 +148,14 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all except XENVER_commandline
+# For normal guests all except XENVER_commandline, VERSION_OP_changeset,
+# and VERSION_OP_commandline
 allow domain_type xen_t:version {
     xen_version xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_platform_parameters xen_get_features xen_pagesize
     xen_guest_handle
+    version extraversion capabilities  platform_parameters
+    get_features pagesize guest_handle
 };
 ###############################################################################
 #
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 83744e8..31d2115 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1235,6 +1235,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
+    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 255a1d6..56b9f6b 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5332,6 +5332,7 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
+    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 5218f8a..cc49f4a 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -395,6 +395,7 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -446,6 +447,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
+        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index cab9763..3a350e0 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -731,6 +731,7 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -782,6 +783,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
+        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index dc898ae..b763318 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -34,6 +34,9 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
+CHECK_TYPE(version_op_buf);
+CHECK_TYPE(version_op_val);
+
 #ifdef COMPAT_VM_ASSIST_VALID
 #undef VM_ASSIST_VALID
 #define VM_ASSIST_VALID COMPAT_VM_ASSIST_VALID
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 2699ac0..4ab4640 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,6 +221,47 @@ void __init do_initcalls(void)
 
 #endif
 
+static int get_features(struct domain *d, xen_feature_info_t *fi)
+{
+    switch ( fi->submap_idx )
+    {
+    case 0:
+        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
+        if ( VM_ASSIST(d, pae_extended_cr3) )
+            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+        if ( paging_mode_translate(d) )
+            fi->submap |= 
+                (1U << XENFEAT_writable_page_tables) |
+                (1U << XENFEAT_auto_translated_physmap);
+        if ( is_hardware_domain(d) )
+            fi->submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                          (1U << XENFEAT_highmem_assist) |
+                          (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_supervisor_mode_kernel) |
+                          (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_hvm_callback_vector) |
+                          (1U << XENFEAT_hvm_pirqs);
+           break;
+        }
+#endif
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * Simple hypercalls.
  */
@@ -302,50 +343,16 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        struct domain *d = current->domain;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        switch ( fi.submap_idx )
+        if ( !deny )
         {
-        case 0:
-            if ( deny )
-                break;
-            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, pae_extended_cr3) )
-                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-            if ( paging_mode_translate(d) )
-                fi.submap |= 
-                    (1U << XENFEAT_writable_page_tables) |
-                    (1U << XENFEAT_auto_translated_physmap);
-            if ( is_hardware_domain(d) )
-                fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
-            break;
-        default:
-            return -EINVAL;
+            int rc = get_features(current->domain, &fi);
+            if ( rc )
+                return rc;
         }
-
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
         return 0;
@@ -388,6 +395,182 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
+static const char *capabilities_info(ssize_t *len)
+{
+    static xen_capabilities_info_t cached_cap;
+    static unsigned int cached_cap_len;
+    static bool_t cached;
+
+    if ( unlikely(!cached) )
+    {
+        arch_get_xen_caps(&cached_cap);
+        cached_cap_len = strlen(cached_cap) + 1;
+    }
+
+    *len = cached_cap_len;
+    return cached_cap;
+}
+
+static int size_of_subops_data(unsigned int cmd, ssize_t *sz)
+{
+    int rc = 0;
+    /* Compute size. */
+    switch ( cmd )
+    {
+    case XEN_VERSION_OP_version:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_extraversion:
+        *sz = strlen(xen_extra_version()) + 1;
+        break;
+
+    case XEN_VERSION_OP_capabilities:
+        capabilities_info(sz);
+        break;
+
+    case XEN_VERSION_OP_platform_parameters:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_changeset:
+        *sz = strlen(xen_changeset()) + 1;
+        break;
+
+    case XEN_VERSION_OP_get_features:
+        *sz = sizeof(xen_feature_info_t);
+        break;
+
+    case XEN_VERSION_OP_pagesize:
+        *sz = sizeof(xen_version_op_val_t);
+        break;
+
+    case XEN_VERSION_OP_guest_handle:
+        *sz = ARRAY_SIZE(current->domain->handle);
+        break;
+
+    case XEN_VERSION_OP_commandline:
+        *sz = ARRAY_SIZE(saved_cmdline);
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    return rc;
+}
+
+/*
+ * Similar to HYPERVISOR_xen_version but with a sane interface
+ * (has a length, one can probe for the length) and with one fewer sub-op:
+ * XENVER_compile_info has no counterpart.
+ */
+DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
+               unsigned int len)
+{
+    union {
+        xen_version_op_val_t n;
+        xen_feature_info_t fi;
+    } u = {};
+    ssize_t sz = 0;
+    const void *ptr = NULL;
+    int rc = xsm_version_op(XSM_OTHER, cmd);
+
+    /* We can safely return -EPERM! */
+    if ( rc )
+        return rc;
+
+    rc = size_of_subops_data(cmd, &sz);
+    if ( rc )
+        return rc;
+
+    ASSERT(sz);
+    /*
+     * This hypercall also allows the client to probe. If it provides
+     * a NULL arg we will return the size of the space it has to
+     * allocate for the specific sub-op.
+     */
+    if ( guest_handle_is_null(arg) )
+        return sz;
+
+    /*
+     * HYPERVISOR_xen_version is inconsistent: some sub-ops return the value,
+     * while others copy it back into the argument. Here all sub-ops follow
+     * the same rule: return the number of bytes on success (a negative errno
+     * on failure) and always copy the result into arg. Yeey sanity!
+     */
+
+    switch ( cmd )
+    {
+    case XEN_VERSION_OP_version:
+        u.n = (xen_major_version() << 16) | xen_minor_version();
+        break;
+
+    case XEN_VERSION_OP_extraversion:
+        ptr = xen_extra_version();
+        break;
+
+    case XEN_VERSION_OP_capabilities:
+        ptr = capabilities_info(&sz);
+        break;
+
+    case XEN_VERSION_OP_platform_parameters:
+        u.n = HYPERVISOR_VIRT_START;
+        break;
+
+    case XEN_VERSION_OP_changeset:
+        ptr = xen_changeset();
+        break;
+
+    case XEN_VERSION_OP_get_features:
+        if ( copy_from_guest(&u.fi, arg, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        rc = get_features(current->domain, &u.fi);
+        break;
+
+    case XEN_VERSION_OP_pagesize:
+        u.n = PAGE_SIZE;
+        break;
+
+    case XEN_VERSION_OP_guest_handle:
+        ptr = current->domain->handle;
+        break;
+
+    case XEN_VERSION_OP_commandline:
+        ptr = saved_cmdline;
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    if ( !rc )
+    {
+        ssize_t bytes;
+
+        if ( sz > len )
+            bytes = len;
+        else
+            bytes = sz;
+
+        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
+            rc = -EFAULT;
+    }
+    if ( !rc )
+    {
+        /*
+         * We return len (truncate) worth of data even if we fail.
+         */
+        if ( sz > len )
+            rc = -ENOBUFS;
+    }
+
+    return rc == 0 ? sz : rc;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 870bc3b..c9ae315 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,6 +128,9 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
+ *  HYPERVISOR_version_op
+ *   All generic sub-operations
+ *
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..a025489 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,15 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/*
+ * There are two hypercalls mentioned in here. The XENVER_ are for
+ * HYPERCALL_xen_version (17), while VERSION_OP_ are for the
+ * HYPERCALL_version_op (41).
+ *
+ * The subops are very similar except that the latter hypercall has a
+ * sane interface.
+ */
+
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +95,68 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+
+
+/*
+ * The HYPERCALL_version_op has a set of sub-ops which mirror the
+ * sub-ops of HYPERCALL_xen_version. However this hypercall differs
+ * radically from the former:
+ *  - It returns the number of bytes written to arg.
+ *  - It will return -XEN_EPERM if the guest is not permitted.
+ *  - It will return the requested data in arg.
+ *  - It requires a third argument (len) for the length of the
+ *    arg. Naturally the arg has to fit the requested data otherwise
+ *    -XEN_ENOBUFS is returned.
+ *
+ * It also offers a mechanism to probe for the number of bytes a
+ * sub-op will require. Passing a NULL pointer as arg will
+ * return the number of bytes required for the operation, or a
+ * negative value if an error is encountered.
+ */
+
+typedef uint64_t xen_version_op_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
+
+typedef void xen_version_op_buf_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
+
+/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */
+#define XEN_VERSION_OP_version      0
+
+/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_OP_extraversion 1
+
+/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_OP_capabilities 3
+
+/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_OP_changeset 4
+
+/*
+ * arg == xen_version_op_val_t. Contains the virtual address
+ * of the hypervisor encoded as [63..0].
+ */
+#define XEN_VERSION_OP_platform_parameters 5
+
+/*
+ * arg = xen_feature_info_t - shares the same structure
+ * as the XENVER_get_features.
+ */
+#define XEN_VERSION_OP_get_features 6
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_OP_pagesize 7
+
+/* arg == version_op_buf.
+ *
+ * The toolstack fills it out for guest consumption. It is intended to hold
+ * the UUID of the guest.
+ */
+#define XEN_VERSION_OP_guest_handle 8
+
+/* arg = version_op_buf. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_OP_commandline 9
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..1a99929 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,6 +115,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
+#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 0c8ae0e..e8d2b81 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -147,6 +147,10 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
+extern long
+do_version_op(unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 94b8855..8c6ae90 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -749,3 +749,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XEN_VERSION_OP_version:
+    case XEN_VERSION_OP_extraversion:
+    case XEN_VERSION_OP_capabilities:
+    case XEN_VERSION_OP_platform_parameters:
+    case XEN_VERSION_OP_get_features:
+    case XEN_VERSION_OP_pagesize:
+    case XEN_VERSION_OP_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index db440f6..ac80472 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -194,6 +194,7 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -736,6 +737,12 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
 }
+
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9791ad4..776dd09 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,4 +163,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index d1bef43..2510229 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1662,6 +1662,44 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XEN_VERSION_OP_version:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__VERSION, NULL);
+    case XEN_VERSION_OP_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XEN_VERSION_OP_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XEN_VERSION_OP_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XEN_VERSION_OP_platform_parameters:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PLATFORM_PARAMETERS, NULL);
+    case XEN_VERSION_OP_get_features:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GET_FEATURES, NULL);
+    case XEN_VERSION_OP_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XEN_VERSION_OP_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+    case XEN_VERSION_OP_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1801,6 +1839,7 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 628dd5c..59c9f69 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -496,9 +496,10 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ hypercall.
+# Class version is used to describe the XENVER_ and VERSION_OP hypercall.
 # Each sub-ops is described here - in the default case all of them should
-# be allowed except the XENVER_commandline.
+# be allowed except the XENVER_commandline, VERSION_OP_commandline, and
+# VERSION_OP_changeset.
 #
 class version
 {
@@ -522,4 +523,23 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+
+# Often called by PV kernels to force a callback.
+    version
+# Extra information (-unstable).
+    extraversion
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Such as the virtual address of where the hypervisor resides.
+    platform_parameters
+# Source code changeset.
+    changeset
+# The features the hypervisor supports.
+    get_features
+# Page size the hypervisor uses.
+    pagesize
+# A value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
 }
-- 
2.5.0
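
The probe-then-fetch convention documented in version.h above can be sketched in plain C. This is a minimal illustration and not code from the patch: mock_version_op() and fetch_string() are hypothetical names, the "-unstable" payload is made up, and the mock only imitates the return-value rules (required size when arg is NULL, -ENOBUFS with truncated data when len is too small, size on success).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the hypercall. It is NOT the real guest-side
 * wrapper; it only mimics the documented return-value convention for the
 * extraversion sub-op (value 1 in the patch).
 */
static long mock_version_op(unsigned int cmd, void *arg, unsigned int len)
{
    static const char extra[] = "-unstable";  /* made-up payload */
    size_t sz = sizeof(extra);                /* includes the trailing NUL */
    size_t n;

    if (cmd != 1)
        return -ENOSYS;

    if (arg == NULL)
        return (long)sz;                      /* probe: report required size */

    n = len < sz ? len : sz;
    memcpy(arg, extra, n);                    /* truncated copy if len < sz */

    return len < sz ? -ENOBUFS : (long)sz;
}

/*
 * Probe for the size, allocate, then fetch. Returns a malloc()ed buffer
 * (caller frees) or NULL, with the size or negative errno in *out_len.
 */
static char *fetch_string(unsigned int cmd, long *out_len)
{
    long sz = mock_version_op(cmd, NULL, 0);
    char *buf;

    if (sz <= 0) {
        *out_len = sz;
        return NULL;
    }

    buf = malloc((size_t)sz);
    if (buf == NULL) {
        *out_len = -ENOMEM;
        return NULL;
    }

    *out_len = mock_version_op(cmd, buf, (unsigned int)sz);
    if (*out_len < 0) {
        free(buf);
        return NULL;
    }

    return buf;
}
```

A caller would invoke fetch_string(1, &n) and free the result. Probing first avoids guessing a buffer size, at the cost of a second call.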


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v4 01/34] compat/x86: Remove unncessary #define.
  2016-03-17  0:44     ` Konrad Rzeszutek Wilk
@ 2016-03-17  7:45       ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-17  7:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, andrew.cooper3, Ian Jackson, Tim Deegan, mpohlack,
	ross.lagerwall, sasha.levin, xen-devel

>>> On 17.03.16 at 01:44, <konrad@kernel.org> wrote:
> On Wed, Mar 16, 2016 at 05:08:30AM -0600, Jan Beulich wrote:
>> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
>> > It is not used.
>> 
>> Consistently please - either keep them all (just to cover the case
>> that they might get used) or remove them all: xen_compile_info,
>> xen_changeset_info, etc are all unused too. Otoh
>> xennmi_callback is used, but xennmi_callback_t isn't. Which to me
>> suggests that we should leave this alone.
> 
> Oddly enough taking an cleaver to it was OK.
> 
> From 7e3ed6faed6e083f27ad6be947ac528c3eaba9a1 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Wed, 2 Mar 2016 12:50:32 -0500
> Subject: [PATCH v4 02/35] compat/x86: Remove unncessary #defines.
> 
> They are not used.

This now goes too far.

> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Hence this can't stay.

> ---
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Tim Deegan <tim@xen.org>
> 
> v2: Remove a lot more of them.

(Side note: this is not the first time I have noticed a versioning
mix-up in your patches: the subject says v4 here, yet the changelog
entry above says v2. In the previous round of the xSplice series - v3 -
I seem to recall there were patches showing a "history" up to
something like v9. I think in cases of such heavy mixing it should
be the patch with the oldest history that determines the version
of the entire series.)

> --- a/xen/common/compat/kernel.c
> +++ b/xen/common/compat/kernel.c
> @@ -18,30 +18,22 @@ asm(".file \"" __FILE__ "\"");
>  
>  extern xen_commandline_t saved_cmdline;
>  
> -#define xen_extraversion compat_extraversion
>  #define xen_extraversion_t compat_extraversion_t
>  
> -#define xen_compile_info compat_compile_info
>  #define xen_compile_info_t compat_compile_info_t
>  
>  CHECK_TYPE(capabilities_info);
>  
> -#define xen_platform_parameters compat_platform_parameters
>  #define xen_platform_parameters_t compat_platform_parameters_t
>  #undef HYPERVISOR_VIRT_START
>  #define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
>  
> -#define xen_changeset_info compat_changeset_info
>  #define xen_changeset_info_t compat_changeset_info_t
>  
> -#define xen_feature_info compat_feature_info
>  #define xen_feature_info_t compat_feature_info_t
>  
>  CHECK_TYPE(domain_handle);
>  
> -#define xennmi_callback compat_nmi_callback
> -#define xennmi_callback_t compat_nmi_callback_t

The former definitely is being used; not getting a compilation
error here is not a sign of things being right. That's another
reason why - as I had suggested already - we'd probably
better leave things as they are: Introduction of uses of the
now removed identifiers might otherwise break compat code
without anyone noticing right away.

Jan
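
The point about a silent breakage can be illustrated with a small sketch. It assumes the usual arrangement in which the compat file defines these aliases and then compiles the shared source a second time. The struct layouts below are simplified stand-ins and not the real Xen definitions, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the native and 32-bit guest layouts. */
struct xennmi_callback     { uint64_t handler_address; uint64_t pad; };
struct compat_nmi_callback { uint32_t handler_address; uint32_t pad; };

/* With the alias in place, shared code picks up the compat layout. */
#define xennmi_callback compat_nmi_callback

static size_t cb_size_with_alias(void)
{
    struct xennmi_callback cb;  /* expands to struct compat_nmi_callback */
    return sizeof(cb);
}

#undef xennmi_callback

/* Without the alias the same source line still compiles, but it now
 * refers to the native layout. No diagnostic is produced. */
static size_t cb_size_without_alias(void)
{
    struct xennmi_callback cb;  /* native struct again */
    return sizeof(cb);
}
```

Both functions build cleanly, yet they disagree on the structure size, which is why a successful compile does not show that removing the alias was safe.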



* Re: [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-17  1:12       ` Konrad Rzeszutek Wilk
@ 2016-03-17 11:08         ` Julien Grall
  2016-03-17 13:39           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Julien Grall @ 2016-03-17 11:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, andrew.cooper3, mpohlack, ross.lagerwall,
	Stefano Stabellini, Jan Beulich, sasha.levin, xen-devel

Hi Konrad,

On 17/03/16 01:12, Konrad Rzeszutek Wilk wrote:
>> Let me try that.
>
> Please see inline patch which has your suggestion:

The ARM part looks good to me.

Regards,

-- 
Julien Grall


* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
  2016-03-15 19:33   ` Andrew Cooper
@ 2016-03-17 11:48   ` Jan Beulich
  2016-03-17 16:08   ` Ian Jackson
  2 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-17 11:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> It looks like it could underflow at first glance. That is
> if i is zero and you get in the while loop with the
> i--. However the postfix expression is evaluated after the
> conditional so the loop is fine and won't execute (with i==0).

I don't think this is the only place we do this, and with this
being well defined behavior I also don't see why we need to
change any such uses.

(And btw., what does this have to do with the already large
xSplice series?)

Jan



* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-15 19:33   ` Andrew Cooper
@ 2016-03-17 11:49     ` Jan Beulich
  2016-03-17 14:37       ` Andrew Cooper
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-17 11:49 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, mpohlack, ross.lagerwall,
	sasha.levin, xen-devel

>>> On 15.03.16 at 20:33, <andrew.cooper3@citrix.com> wrote:
> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>> It looks like it could underflow at first glance. That is
>> if i is zero and you get in the while loop with the
>> i--. However the postfix expression is evaluated after the
>> conditional so the loop is fine and won't execute (with i==0).
>>
>> However in spirit of defense programming lets clarify
>> the loop conditional.
>>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> This looks as if it will quieten Coverity, even though it is no
> functional change.

Quieten Coverity? In what way? And why would it complain in
the first place? As just in reply to Konrad, this is well defined
behavior.

Jan



* Re: [PATCH v4 25/34] build_id: Provide ld-embedded build-ids
  2016-03-17 11:08         ` Julien Grall
@ 2016-03-17 13:39           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-17 13:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Stefano Stabellini, Jan Beulich, xen-devel, sasha.levin

On Thu, Mar 17, 2016 at 11:08:02AM +0000, Julien Grall wrote:
> Hi Konrad,
> 
> On 17/03/16 01:12, Konrad Rzeszutek Wilk wrote:
> >>Let me try that.
> >
> >Please see inline patch which has your suggestion:
> 
> The ARM part looks good to me.

Thank you! I am going to take that as Acked-by on the ARM parts.


> 
> Regards,
> 
> -- 
> Julien Grall


* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-17  1:38       ` Konrad Rzeszutek Wilk
@ 2016-03-17 14:28         ` Andrew Cooper
  2016-03-18 12:36         ` Jan Beulich
  1 sibling, 0 replies; 124+ messages in thread
From: Andrew Cooper @ 2016-03-17 14:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, mpohlack,
	ross.lagerwall, Julien Grall, Stefano Stabellini, Jan Beulich,
	sasha.levin, xen-devel, Daniel De Graaf, Keir Fraser

On 17/03/16 01:38, Konrad Rzeszutek Wilk wrote:
> +static const char *capabilities_info(ssize_t *len)
> +{
> +    static xen_capabilities_info_t cached_cap;
> +    static unsigned int cached_cap_len;
> +    static bool_t cached;
> +
> +    if ( unlikely(!cached) )
> +    {
> +        arch_get_xen_caps(&cached_cap);
> +        cached_cap_len = strlen(cached_cap) + 1;

cached = 1;

With this coherency bug fixed, Reviewed-by: Andrew Cooper
<andrew.cooper3@citrix.com>

> +    }
> +
> +    *len = cached_cap_len;
> +    return cached_cap;
> +}
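
The fix requested above amounts to setting the flag once the buffer has been filled. A minimal sketch of the corrected lazy-caching pattern follows. fake_get_caps() is a hypothetical stand-in for arch_get_xen_caps(), the capability string is made up, fill_calls exists only to make the effect observable, and concurrency is ignored.

```c
#include <stddef.h>
#include <string.h>

static unsigned int fill_calls;  /* instrumentation for this sketch only */

/* Hypothetical stand-in for arch_get_xen_caps(). */
static void fake_get_caps(char *buf, size_t size)
{
    fill_calls++;
    strncpy(buf, "xen-3.0-x86_64", size - 1);
    buf[size - 1] = '\0';
}

static const char *capabilities_cached(size_t *len)
{
    static char cached_cap[1024];
    static size_t cached_cap_len;
    static int cached;

    if (!cached) {
        fake_get_caps(cached_cap, sizeof(cached_cap));
        cached_cap_len = strlen(cached_cap) + 1;  /* count the NUL too */
        cached = 1;  /* without this the buffer is refilled on every call */
    }

    *len = cached_cap_len;
    return cached_cap;
}
```

Calling capabilities_cached() twice returns the same pointer and length, and the fill routine runs only once.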



* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-17 11:49     ` Jan Beulich
@ 2016-03-17 14:37       ` Andrew Cooper
  2016-03-17 15:30         ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-17 14:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Ian Jackson, Tim Deegan, mpohlack, ross.lagerwall,
	xen-devel, sasha.levin


On 17/03/16 11:49, Jan Beulich wrote:
>>>> On 15.03.16 at 20:33, <andrew.cooper3@citrix.com> wrote:
>> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>>> It looks like it could underflow at first glance. That is
>>> if i is zero and you get in the while loop with the
>>> i--. However the postfix expression is evaluated after the
>>> conditional so the loop is fine and won't execute (with i==0).
>>>
>>> However in spirit of defense programming lets clarify
>>> the loop conditional.
>>>
>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> This looks as if it will quieten Coverity, even though it is no
>> functional change.
> Quieten Coverity? In what way? And why would it complain in
> the first place? As just in reply to Konrad, this is well defined
> behavior.

213 error:
        CID 63648: Overflowed constant (INTEGER_OVERFLOW)
        7. overflow_const: Decrement (--) operation overflows on operand
           i, whose value is an unsigned constant, 0.
214    while ( i-- )
215        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
216    xfree(mfn);
217    return NULL;

By flipping the location of the postfix decrement, the problematic case
of getting to error: with i as 0 will not enter the loop, and won't
decrement i to UINT32_MAX.

It is arguable as to whether this is a Coverity bug or not.  Unsigned
integer overflow is defined under the C spec.  On the other hand, I
really don't blame Coverity for raising an issue here saying "did you
really mean for this underflow to happen".

~Andrew
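
The two loop shapes under discussion can be compared with a small self-contained sketch. The function names are hypothetical and the guarded form is only one possible rewrite; the actual change in the patch is not reproduced here. Both variants run the body the same number of times and differ only in the final value of i.

```c
#include <limits.h>

/* Original pattern: the body is skipped when n == 0, but the final
 * (failing) test still decrements i, so it wraps to UINT_MAX. */
static unsigned int cleanup_postfix(unsigned int n, unsigned int *visited)
{
    unsigned int i = n;

    *visited = 0;
    while (i--)
        (*visited)++;

    return i;  /* always UINT_MAX here; well defined for unsigned types */
}

/* Assumed rewrite: decrement only after the test has succeeded, so i
 * never wraps. */
static unsigned int cleanup_guarded(unsigned int n, unsigned int *visited)
{
    unsigned int i = n;

    *visited = 0;
    while (i > 0) {
        i--;
        (*visited)++;
    }

    return i;  /* always 0 here */
}
```

Since i is not read after the loop in the quoted code, the wrap is harmless there, which matches the "no functional change" remark.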


* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-17 14:37       ` Andrew Cooper
@ 2016-03-17 15:30         ` Jan Beulich
  2016-03-17 16:06           ` Ian Jackson
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-17 15:30 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Tim Deegan, Ian Jackson, mpohlack, ross.lagerwall,
	xen-devel, sasha.levin

>>> On 17.03.16 at 15:37, <andrew.cooper3@citrix.com> wrote:
> On 17/03/16 11:49, Jan Beulich wrote:
>>>>> On 15.03.16 at 20:33, <andrew.cooper3@citrix.com> wrote:
>>> On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
>>>> It looks like it could underflow at first glance. That is
>>>> if i is zero and you get in the while loop with the
>>>> i--. However the postfix expression is evaluated after the
>>>> conditional so the loop is fine and won't execute (with i==0).
>>>>
>>>> However in spirit of defense programming lets clarify
>>>> the loop conditional.
>>>>
>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> This looks as if it will quieten Coverity, even though it is no
>>> functional change.
>> Quieten Coverity? In what way? And why would it complain in
>> the first place? As just in reply to Konrad, this is well defined
>> behavior.
> 
> 213 error:
>         CID 63648: Overflowed constant (INTEGER_OVERFLOW)
>         7. overflow_const: Decrement (--) operation overflows on operand
> i, whose value is an unsigned constant, 0.
> 214    while ( i-- )
> 215        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
> 216    xfree(mfn);
> 217    return NULL;
> 
> By flipping the location of the postfix decrement, the problematic case
> of getting to error: with i as 0 will not enter the loop, and won't
> decrement i to UINT32_MAX.

But (as alluded to before) this is a pretty common cleanup pattern,
and I really don't see us (a) fix all instances just because Coverity
complains and (b) avoid introducing any new instances.

> It is arguable as to whether this is a Coverity bug or not.  Unsigned
> integer overflow is defined under the C spec.  On the other hand, I
> really don't blame Coverity for raising an issue here saying "did you
> really mean for this underflow to happen".

Since this is defined behavior, I personally view it as a Coverity issue.
Which is not to say that such a warning may not help some people in
certain cases.

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-17 15:30         ` Jan Beulich
@ 2016-03-17 16:06           ` Ian Jackson
  0 siblings, 0 replies; 124+ messages in thread
From: Ian Jackson @ 2016-03-17 16:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Andrew Cooper, Tim Deegan, mpohlack, ross.lagerwall,
	sasha.levin, xen-devel

Jan Beulich writes ("Re: [Xen-devel] [PATCH v4 08/34] vmap: Make the while loop less fishy."):
> On 17.03.16 at 15:37, <andrew.cooper3@citrix.com> wrote:
> > 213 error:
> >         CID 63648: Overflowed constant (INTEGER_OVERFLOW)
> >         7. overflow_const: Decrement (--) operation overflows on operand
> > i, whose value is an unsigned constant, 0.
> > 214    while ( i-- )
> > 215        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
> > 216    xfree(mfn);
> > 217    return NULL;
> > 
> > By flipping the location of the postfix decrement, the problematic case
> > of getting to error: with i as 0 will not enter the loop, and won't
> > decrement i to UINT32_MAX.
> 
> But (as alluded to before) this is a pretty common cleanup pattern,
> and I really don't see us (a) fix all instances just because Coverity
> complains and (b) avoid introducing any new instances.

I'm inclined to agree.

> > It is arguable as to whether this is a Coverity bug or not.  Unsigned
> > integer overflow is defined under the C spec.  On the other hand, I
> > really don't blame Coverity for raising an issue here saying "did you
> > really mean for this underflow to happen".
> 
> Since this is defined behavior, I personally view it as a Coverity issue.
> Which is not to say that such a warning may not help some people in
> certain cases.

I think this should be marked as a false positive in Coverity.

Coverity ought to be re-educated so that it can see that:
  * The decrement result is defined as ~(unsigned)0
  * So there is no UB at this stage
  * ~0 will be written to i, which is potentially hazardous
  * But this is a dead store so in fact it is harmless

The problem is that Coverity is failing to distinguish this from cases
where an unsigned value is decreased and wraps, and then _the
resulting value with huge magnitude is used_.

These latter situations are often serious security bugs.  But if the
huge value is never used there is clearly no bug.
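
As a rough illustration of that distinction (a standalone sketch with
made-up function names, not code from the tree): in the first function the
wrapped value is a dead store, in the second the wrapped value escapes to
the caller and can end up being used as a bound.

```c
#include <limits.h>
#include <stddef.h>

/* Harmless: i wraps when the loop exits, but is never read again. */
size_t release_all(int *slots, unsigned int i)
{
    size_t released = 0;

    while ( i-- )
    {
        slots[i] = 0;
        released++;
    }

    return released;
}

/* Hazardous: for count == 0 the wrapped result is handed to the caller. */
unsigned int last_valid_index(unsigned int count)
{
    return count - 1;   /* UINT_MAX when count is 0 */
}
```

A checker that flagged only the second shape would catch the cases that
tend to matter without complaining about the common cleanup loop.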

Ian.


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
  2016-03-15 19:33   ` Andrew Cooper
  2016-03-17 11:48   ` Jan Beulich
@ 2016-03-17 16:08   ` Ian Jackson
  2016-03-21 12:04     ` George Dunlap
  2 siblings, 1 reply; 124+ messages in thread
From: Ian Jackson @ 2016-03-17 16:08 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Tim Deegan,
	mpohlack, Jan Beulich, sasha.levin, xen-devel

Konrad Rzeszutek Wilk writes ("[PATCH v4 08/34] vmap: Make the while loop less fishy."):
>   error:
> -    while ( i-- )
> -        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
> +    while ( i )
> +        free_domheap_page(mfn_to_page(mfn_x(mfn[--i])));

I quite strongly dislike this.  It is good practice to keep the loop
control code together where this is reasonably convenient.

I wouldn't quibble on such a stylistic matter (particularly outside my
bailiwick) but (a) I would like to reinforce Jan's position and
(b) it seems worth writing an email as there will be many occurrences.

Ian.


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
@ 2016-03-18 11:55   ` Jan Beulich
  2016-03-18 17:26     ` Konrad Rzeszutek Wilk
  2016-03-22 17:49   ` Daniel De Graaf
  2016-03-24 15:34   ` anshul makkar
  2 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-18 11:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> @@ -223,12 +224,15 @@ void __init do_initcalls(void)
>  /*
>   * Simple hypercalls.
>   */
> -
>  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)

Please retain the blank line, as it relates to more than just this
one function.

>  {
> +    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
> +
>      switch ( cmd )
>      {
>      case XENVER_version:
> +        if ( deny )
> +            return 0;
>          return (xen_major_version() << 16) | xen_minor_version();

To be honest, I'm now rather uncertain about this one: If a guest
can't figure out the hypervisor version, how would it be able to
adjust its behavior accordingly (e.g. use deprecated hypercalls as
needed)? IOW, unlike for most/all of the other stuff here (the
get-features and platform-parameters sub-ops may be considered
similar to this one, see also below), I don't think allowing the
"permitted" default to be overridden makes sense here.

> @@ -274,6 +279,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>              .virt_start = HYPERVISOR_VIRT_START
>          };
>  
> +        if ( deny )
> +            params.virt_start = 0;

Guests may (validly imo) assume to get a valid address here. If you
mean to not expose the non-constant address in the compat mode
case, I could accept that. But you would then need to set the ABI
mandated __HYPERVISOR_COMPAT_VIRT_START (and retain the
constant value in the non-compat case). Our old 32-bit PV guests
would crash extremely early on boot if they got back zero here
(that's for 2.6.30 and later, and I think both you and Citrix had
derived some of their kernels from our 2.6.32 based one).

> @@ -302,6 +310,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          switch ( fi.submap_idx )
>          {
>          case 0:
> +            if ( deny )
> +                break;

I think that, if it is to be put here at all, this should go ahead of
the switch(), so that guests wouldn't be able to guess from the valid
index values which features may be available. And of course you should
clear fi.submap if you deny access, instead of leaving in it what has
been there before.
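
One possible reading of this suggestion, as a self-contained sketch: the
struct is a simplified stand-in for xen_feature_info_t, the function name is
hypothetical, and the single feature bit is a placeholder. Returning success
for every index when access is denied is an assumption about the intended
behaviour, not something the patch specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for xen_feature_info_t. */
struct feature_info_sketch {
    unsigned int submap_idx;
    uint32_t submap;
};

int get_features_sketch(bool deny, struct feature_info_sketch *fi)
{
    if ( deny )
    {
        fi->submap = 0;     /* don't hand back whatever the caller put there */
        return 0;           /* same answer for valid and invalid indexes */
    }

    switch ( fi->submap_idx )
    {
    case 0:
        fi->submap = 1U << 0;   /* placeholder feature bit */
        return 0;

    default:
        return -EINVAL;
    }
}
```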

>      case XENVER_guest_handle:
> -        if ( copy_to_guest(arg, current->domain->handle,
> -                           ARRAY_SIZE(current->domain->handle)) )
> +    {
> +        xen_domain_handle_t hdl;
> +        ssize_t len;
> +
> +        if ( deny )
> +        {
> +            len = sizeof(hdl);
> +            memset(&hdl, 0, len);
> +        } else
> +            len = ARRAY_SIZE(current->domain->handle);
> +
> +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle, len ) )
>              return -EFAULT;
>          return 0;

What is this "len" handling here about? Aren't both the same type
and hence size? Perhaps, if you feel unsure about that, simply add
a respective BUILD_BUG_ON()?
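
A sketch of what such a check could look like outside the hypervisor, with
C11 _Static_assert standing in for BUILD_BUG_ON(). The 16-byte handle layout
and the struct and function names are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t xen_domain_handle_t[16];    /* assumed layout */

struct domain_sketch {
    xen_domain_handle_t handle;
};

/* Compile-time check replacing the run-time "len" juggling. */
_Static_assert(sizeof(xen_domain_handle_t) ==
               sizeof(((struct domain_sketch *)0)->handle),
               "handle types differ in size");

/* memcpy() stands in for copy_to_guest(); one length serves both cases. */
void copy_handle(uint8_t *dst, const struct domain_sketch *d, int deny)
{
    xen_domain_handle_t zero = { 0 };

    memcpy(dst, deny ? zero : d->handle, sizeof(zero));
}
```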

> --- a/xen/include/xen/version.h
> +++ b/xen/include/xen/version.h
> @@ -12,5 +12,5 @@ unsigned int xen_minor_version(void);
>  const char *xen_extra_version(void);
>  const char *xen_changeset(void);
>  const char *xen_banner(void);
> -
> +const char *xen_deny(void);
>  #endif /* __XEN_VERSION_H__ */

Please retain the blank line.

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-17  1:38       ` Konrad Rzeszutek Wilk
  2016-03-17 14:28         ` Andrew Cooper
@ 2016-03-18 12:36         ` Jan Beulich
  2016-03-18 19:22           ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-18 12:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

> Subject: [PATCH v4 05/35] HYPERCALL_version_op. New hypercall mirroring
>  XENVER_ but sane.
> 
> This hypercall mirrors the XENVER_ in that it has similar functionality.
> However it is designed differently:
>  - No compat layer. The data structures are the same size on 32
>    as on 64-bit.
>  - The hypercall accepts three arguments - the command, pointer to
>    an buffer, and the length of the buffer.
>  - Each sub-ops can be "probed" for size by returning the size of
>    buffer that will be needed - if the buffer is NULL.
>  - Subops can complete even if the buffer is too slow - truncated
>    data will be filled and hypercall will return -ENOBUFS.

s/too slow/too small/ ?

>  - VERSION_OP_commandline, VERSION_OP_changeset are privileged.

AIUI this is no longer a difference from the old one, assuming the
patches get committed in the order they're being presented in
this series.

> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -221,6 +221,47 @@ void __init do_initcalls(void)
>  
>  #endif
>  
> +static int get_features(struct domain *d, xen_feature_info_t *fi)
> +{
> +    switch ( fi->submap_idx )
> +    {
> +    case 0:
> +        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
> +        if ( VM_ASSIST(d, pae_extended_cr3) )
> +            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);

Since you already move this code, I think the two lines above
would better go into the x86-specific section below.

> @@ -302,50 +343,16 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case XENVER_get_features:
>      {
>          xen_feature_info_t fi;
> -        struct domain *d = current->domain;
>  
>          if ( copy_from_guest(&fi, arg, 1) )
>              return -EFAULT;
>  
> -        switch ( fi.submap_idx )
> +        if ( !deny )
>          {
> -        case 0:
> -            if ( deny )
> -                break;
> -            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
> -            if ( VM_ASSIST(d, pae_extended_cr3) )
> -                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
> -            if ( paging_mode_translate(d) )
> -                fi.submap |= 
> -                    (1U << XENFEAT_writable_page_tables) |
> -                    (1U << XENFEAT_auto_translated_physmap);
> -            if ( is_hardware_domain(d) )
> -                fi.submap |= 1U << XENFEAT_dom0;
> -#ifdef CONFIG_X86
> -            switch ( d->guest_type )
> -            {
> -            case guest_type_pv:
> -                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
> -                             (1U << XENFEAT_highmem_assist) |
> -                             (1U << XENFEAT_gnttab_map_avail_bits);
> -                break;
> -            case guest_type_pvh:
> -                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> -                             (1U << XENFEAT_supervisor_mode_kernel) |
> -                             (1U << XENFEAT_hvm_callback_vector);
> -                break;
> -            case guest_type_hvm:
> -                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> -                             (1U << XENFEAT_hvm_callback_vector) |
> -                             (1U << XENFEAT_hvm_pirqs);
> -                break;
> -            }
> -#endif
> -            break;
> -        default:
> -            return -EINVAL;
> +            int rc = get_features(current->domain, &fi);
> +            if ( rc )

Blank line between declaration(s) and statement(s) please.

> @@ -388,6 +395,182 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
> arg)
>      return -ENOSYS;
>  }
>  
> +static const char *capabilities_info(ssize_t *len)

Why ssize_t?

> +{
> +    static xen_capabilities_info_t cached_cap;

__read_mostly?

> +static int size_of_subops_data(unsigned int cmd, ssize_t *sz)
> +{
> +    int rc = 0;
> +    /* Compute size. */
> +    switch ( cmd )
> +    {
> +    case XEN_VERSION_OP_version:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;
> +
> +    case XEN_VERSION_OP_extraversion:
> +        *sz = strlen(xen_extra_version()) + 1;
> +        break;
> +
> +    case XEN_VERSION_OP_capabilities:
> +        capabilities_info(sz);
> +        break;
> +
> +    case XEN_VERSION_OP_platform_parameters:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;
> +
> +    case XEN_VERSION_OP_changeset:
> +        *sz = strlen(xen_changeset()) + 1;
> +        break;
> +
> +    case XEN_VERSION_OP_get_features:
> +        *sz = sizeof(xen_feature_info_t);
> +        break;
> +
> +    case XEN_VERSION_OP_pagesize:
> +        *sz = sizeof(xen_version_op_val_t);
> +        break;

Please combine all the cases producing this value.

> +    case XEN_VERSION_OP_guest_handle:
> +        *sz = ARRAY_SIZE(current->domain->handle);
> +        break;
> +
> +    case XEN_VERSION_OP_commandline:
> +        *sz = ARRAY_SIZE(saved_cmdline);

strlen()?
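
To spell out the difference between the two sizes (the buffer name and its
contents below are placeholders, not the real saved_cmdline):

```c
#include <string.h>

/* Placeholder for the 1024-byte command line buffer. */
static char saved_cmdline_sketch[1024] = "console=com1 loglvl=all";

/* What ARRAY_SIZE() reports: the full buffer, padding included. */
size_t cmdline_size_array(void)
{
    return sizeof(saved_cmdline_sketch);
}

/* What the other string sub-ops report: the string plus its NUL. */
size_t cmdline_size_string(void)
{
    return strlen(saved_cmdline_sketch) + 1;
}
```

With the string-based size a probing caller would only need to allocate as
much as the command line actually occupies.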

> +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> +               unsigned int len)
> +{
> +    union {
> +        xen_version_op_val_t n;

Would "v" or "val" be the more natural name here?

> +        xen_feature_info_t fi;
> +    } u = {};
> +    ssize_t sz = 0;
> +    const void *ptr = NULL;
> +    int rc = xsm_version_op(XSM_OTHER, cmd);
> +
> +    /* We can safely return -EPERM! */
> +    if ( rc )
> +        return rc;
> +
> +    rc = size_of_subops_data(cmd, &sz);
> +    if ( rc )
> +        return rc;
> +
> +    ASSERT(sz);
> +    /*
> +     * This hypercall also allows the client to probe. If it provides
> +     * a NULL arg we will return the size of the space it has to
> +     * allocate for the specific sub-op.
> +     */
> +    if ( guest_handle_is_null(arg) )
> +        return sz;
> +
> +    /*
> +     * The HYPERVISOR_xen_version differs in that some return the value,
> +     * and some copy it on back on argument. We follow the same rule for all
> +     * sub-ops: return 0 on success, positive value of bytes returned, and
> +     * always copy the result in arg. Yeey sanity!
> +     */
> +
> +    switch ( cmd )
> +    {
> +    case XEN_VERSION_OP_version:
> +        u.n = (xen_major_version() << 16) | xen_minor_version();
> +        break;
> +
> +    case XEN_VERSION_OP_extraversion:
> +        ptr = xen_extra_version();
> +        break;
> +
> +    case XEN_VERSION_OP_capabilities:
> +        ptr = capabilities_info(&sz);
> +        break;
> +
> +    case XEN_VERSION_OP_platform_parameters:
> +        u.n = HYPERVISOR_VIRT_START;
> +        break;
> +
> +    case XEN_VERSION_OP_changeset:
> +        ptr = xen_changeset();
> +        break;
> +
> +    case XEN_VERSION_OP_get_features:
> +        if ( copy_from_guest(&u.fi, arg, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        rc = get_features(current->domain, &u.fi);
> +        break;
> +
> +    case XEN_VERSION_OP_pagesize:
> +        u.n = PAGE_SIZE;
> +        break;
> +
> +    case XEN_VERSION_OP_guest_handle:
> +        ptr = current->domain->handle;
> +        break;
> +
> +    case XEN_VERSION_OP_commandline:
> +        ptr = saved_cmdline;
> +        break;
> +
> +    default:
> +        rc = -ENOSYS;
> +    }

Seeing this long switch() a second time I wonder why
size_of_subops_data() doesn't just get folded here, with the null
handle being taken care of below instead of above.
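
Roughly what I have in mind, as a standalone sketch: one switch() yields
both the data and its size, and the null-handle probe is dealt with
afterwards. The sub-op numbers, the placeholder values and all names here
are invented for the example, and the copy-out step is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_VERSION = 0, OP_EXTRAVERSION = 1 };   /* illustrative subset */

struct subop_result {
    const void *ptr;    /* where the data lives */
    size_t sz;          /* how many bytes it occupies */
    uint64_t val;       /* backing store for value-type sub-ops */
};

static int lookup_subop(unsigned int cmd, struct subop_result *r)
{
    switch ( cmd )
    {
    case OP_VERSION:
        r->val = (4u << 16) | 7u;       /* placeholder version 4.7 */
        r->ptr = &r->val;
        r->sz = sizeof(r->val);
        break;

    case OP_EXTRAVERSION:
        r->ptr = "-unstable";           /* placeholder string */
        r->sz = strlen("-unstable") + 1;
        break;

    default:
        return -ENOSYS;
    }

    return 0;
}

long version_op_sketch(unsigned int cmd, const void *arg,
                       struct subop_result *r)
{
    int rc = lookup_subop(cmd, r);

    if ( rc )
        return rc;

    if ( arg == NULL )      /* probe: report the size that is needed */
        return (long)r->sz;

    return 0;               /* copy-out omitted in this sketch */
}
```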

> +    if ( !rc )
> +    {
> +        ssize_t bytes;
> +
> +        if ( sz > len )
> +            bytes = len;
> +        else
> +            bytes = sz;

min() (for which sz and bytes being "unsigned int" would help)?

> +        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
> +            rc = -EFAULT;
> +    }
> +    if ( !rc )
> +    {
> +        /*
> +         * We return len (truncate) worth of data even if we fail.
> +         */
> +        if ( sz > len )
> +            rc = -ENOBUFS;

Perhaps worth moving this up into the previous if(), such that
-EFAULT would also take precedence over -ENOBUFS?
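
A small sketch of the resulting tail, under a few assumptions: memcpy()
plus a fault flag stands in for copy_to_guest(), the function names are
made up, and returning the byte count on success is just one reading of the
header comment further down.

```c
#include <errno.h>
#include <string.h>

/* Stand-in for copy_to_guest(): non-zero means a (simulated) fault. */
static int copy_out(void *dst, const void *src, unsigned int n, int fault)
{
    if ( fault )
        return 1;

    memcpy(dst, src, n);
    return 0;
}

int finish_subop(void *dst, unsigned int len,
                 const void *src, unsigned int sz, int fault)
{
    unsigned int bytes = sz < len ? sz : len;   /* min(sz, len) */

    if ( copy_out(dst, src, bytes, fault) )
        return -EFAULT;             /* a fault wins over truncation */

    return sz > len ? -ENOBUFS : (int)bytes;
}
```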

> @@ -87,6 +95,68 @@ typedef struct xen_feature_info xen_feature_info_t;
>  #define XENVER_commandline 9
>  typedef char xen_commandline_t[1024];
>  
> +
> +
> +/*
> + * The HYPERCALL_version_op has a set of sub-ops which mirror the
> + * sub-ops of HYPERCALL_xen_version. However this hypercall differs
> + * radically from the former:
> + *  - It returns the amount of bytes returned.
> + *  - It will return -XEN_EPERM if the guest is not permitted.
> + *  - It will return the requested data in arg.
> + *  - It requires an third argument (len) for the length of the
> + *    arg. Naturally the arg has to fit the requested data otherwise
> + *    -XEN_ENOBUFS is returned.
> + *
> + * It also offers an mechanism to probe for the amount of bytes an
> + * sub-op will require. Having the arg have an NULL pointer will

s/pointer/handle/

> + * return the number of bytes requested for the operation. Or an
> + * negative value if an error is encountered.
> + */
> +
> +typedef uint64_t xen_version_op_val_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> +
> +typedef void xen_version_op_buf_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);

Are these actually useful for anything? And for the various strings,
wouldn't a "char" handle be more natural?

> +/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */

Please make explicit that 63...32 are zero (or whatever they really
are).
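
For reference, the layout being described would look like this (helper
names are hypothetical; the point is only that widening before the shift
leaves bits 63..32 zero):

```c
#include <stdint.h>

/* major in bits 31..16, minor in bits 15..0, bits 63..32 zero. */
uint64_t encode_version(uint16_t major, uint16_t minor)
{
    return ((uint64_t)major << 16) | minor;
}

unsigned int version_major(uint64_t v)
{
    return (unsigned int)((v >> 16) & 0xffff);
}

unsigned int version_minor(uint64_t v)
{
    return (unsigned int)(v & 0xffff);
}
```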

> +#define XEN_VERSION_OP_version      0
> +
> +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> +#define XEN_VERSION_OP_extraversion 1
> +
> +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> +#define XEN_VERSION_OP_capabilities 3
> +
> +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> +#define XEN_VERSION_OP_changeset 4
> +
> +/*
> + * arg == xen_version_op_val_t. Contains the virtual address
> + * of the hypervisor encoded as [63..0].

I'd say the encoding info here is unnecessary and could - just like
you already do for pagesize below - be omitted.

> + */
> +#define XEN_VERSION_OP_platform_parameters 5
> +
> +/*
> + * arg = xen_feature_info_t - shares the same structure
> + * as the XENVER_get_features.
> + */
> +#define XEN_VERSION_OP_get_features 6
> +
> +/* arg == xen_version_op_val_t. */
> +#define XEN_VERSION_OP_pagesize 7
> +
> +/* arg == version_op_buf.
> + *
> + * The toolstack fills it out for guest consumption. It is intended to hold
> + * the UUID of the guest.
> + */
> +#define XEN_VERSION_OP_guest_handle 8
> +
> +/* arg = version_op_buf. Contains NUL terminated utf-8 string. */
> +#define XEN_VERSION_OP_commandline 9
> +
>  #endif /* __XEN_PUBLIC_VERSION_H__ */

Would leaving out the _OP and _op everywhere result in any
name collisions with the old one?

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
  2016-03-15 18:54   ` Andrew Cooper
  2016-03-16 11:49   ` Julien Grall
@ 2016-03-18 12:40   ` Jan Beulich
  2016-03-18 19:59     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-18 12:40 UTC (permalink / raw)
  To: mpohlack, andrew.cooper3, ross.lagerwall, konrad, xen-devel,
	Konrad Rzeszutek Wilk, sasha.levin
  Cc: Keir Fraser, Julien Grall, Stefano Stabellini

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> --- a/xen/include/asm-arm/bug.h
> +++ b/xen/include/asm-arm/bug.h
> @@ -31,6 +31,7 @@ struct bug_frame {
>  #define BUGFRAME_warn   0
>  #define BUGFRAME_bug    1
>  #define BUGFRAME_assert 2
> +#define BUGFRAME_NR     3
>  
>  /* Many versions of GCC doesn't support the asm %c parameter which would
>   * be preferable to this unpleasantness. We use mergeable string
> @@ -39,6 +40,7 @@ struct bug_frame {
>   */
>  #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
>      BUILD_BUG_ON((line) >> 16);                                             \
> +    BUILD_BUG_ON(type >= BUGFRAME_NR);                                      \

The x86 variant has type properly parenthesized - why not here?

> --- a/xen/include/asm-x86/bug.h
> +++ b/xen/include/asm-x86/bug.h
> @@ -9,7 +9,7 @@
>  #define BUGFRAME_warn   1
>  #define BUGFRAME_bug    2
>  #define BUGFRAME_assert 3
> -
> +#define BUGFRAME_NR     4
>  #ifndef __ASSEMBLY__

Please retain the blank line.

> @@ -51,6 +51,7 @@ struct bug_frame {
>  
>  #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
>      BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
> +    BUILD_BUG_ON((type) >= (BUGFRAME_NR));                                   \

The ARM variant has BUGFRAME_NR properly un-parenthesized -
why here?
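
To illustrate why the macro parameter wants parentheses while the plain
constant doesn't: in the sketch below (macro and function names invented,
BUGFRAME_NR taken as the ARM value of 3) the argument is an expression with
lower precedence than >=, so the unparenthesized form parses differently.

```c
#define BUGFRAME_NR 3

/* Parameter not parenthesized: the argument's operators can leak out. */
#define TYPE_INVALID_UNPAREN(type) (type >= BUGFRAME_NR)

/* Parameter parenthesized; BUGFRAME_NR is a plain constant, needs none. */
#define TYPE_INVALID_PAREN(type)   ((type) >= BUGFRAME_NR)

int check_unparen(int want_bug)
{
    /* Expands to: want_bug ? 1 : (2 >= BUGFRAME_NR) */
    return TYPE_INVALID_UNPAREN(want_bug ? 1 : 2);
}

int check_paren(int want_bug)
{
    /* Expands to: (want_bug ? 1 : 2) >= BUGFRAME_NR */
    return TYPE_INVALID_PAREN(want_bug ? 1 : 2);
}
```

With want_bug set, the unparenthesized variant reports a perfectly valid
type as out of range.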

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-15 17:56 ` [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables Konrad Rzeszutek Wilk
  2016-03-15 19:24   ` Andrew Cooper
@ 2016-03-18 13:07   ` Jan Beulich
  2016-03-22 20:18     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-18 13:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> lookup.

Was this meant to be part of $subject?

> @@ -1077,27 +1080,39 @@ void do_unexpected_trap(const char *msg, struct 
> cpu_user_regs *regs)
>  
>  int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
>  {
> -    const struct bug_frame *bug;
> +    const struct bug_frame *bug = NULL;
>      const char *prefix = "", *filename, *predicate;
>      unsigned long fixup;
> -    int id, lineno;
> -    static const struct bug_frame *const stop_frames[] = {
> -        __stop_bug_frames_0,
> -        __stop_bug_frames_1,
> -        __stop_bug_frames_2,
> -        NULL
> -    };
> +    int id = -1, lineno;
> +    struct virtual_region *region;
>  
> -    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
> +    list_for_each_entry( region, &virtual_region_list, list )
>      {
> -        while ( unlikely(bug == stop_frames[id]) )
> -            ++id;
> +        unsigned int i;
>  
> -        if ( ((vaddr_t)bug_loc(bug)) == pc )
> -            break;
> -    }
> +        if ( region->skip && region->skip(CHECKING_BUG_FRAME, region->priv) )
> +            continue;
> +
> +        if ( pc < region->start || pc > region->end )
> +            continue;
>  
> -    if ( !stop_frames[id] )
> +        for ( id = 0; id < BUGFRAME_NR; id++ )
> +        {
> +            const struct bug_frame *b = NULL;

Pointless initializer.

> --- a/xen/arch/x86/extable.c
> +++ b/xen/arch/x86/extable.c
> @@ -1,6 +1,8 @@
>  
> +#include <xen/bug_ex_symbols.h>
>  #include <xen/config.h>

In cases like this please take the opportunity to get rid of the
explicit inclusion of xen/config.h.

> --- /dev/null
> +++ b/xen/common/bug_ex_symbols.c
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#include <xen/bug_ex_symbols.h>
> +#include <xen/config.h>
> +#include <xen/kernel.h>
> +#include <xen/init.h>
> +#include <xen/spinlock.h>
> +
> +extern char __stext[];
> +
> +struct virtual_region kernel_text = {

static

> +    .list = LIST_HEAD_INIT(kernel_text.list),
> +    .start = (unsigned long)_stext,
> +    .end = (unsigned long)_etext,
> +#ifdef CONFIG_X86
> +    .ex = (struct exception_table_entry *)__start___ex_table,
> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> +#endif

Is this together with ...

> +/*
> + * Becomes irrelevant when __init sections are cleared.
> + */
> +struct virtual_region kernel_inittext  = {
> +    .list = LIST_HEAD_INIT(kernel_inittext.list),
> +    .skip = ignore_if_active,
> +    .start = (unsigned long)_sinittext,
> +    .end = (unsigned long)_einittext,
> +#ifdef CONFIG_X86
> +    /* Even if they are __init their exception entry still gets stuck here. */
> +    .ex = (struct exception_table_entry *)__start___ex_table,
> +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> +#endif

... this really a good idea? I.e. are there not going to be any
odd side effects because of that redundancy?

Also note that the comment preceding this object is a single line one.

> +/*
> + * No locking. Additions are done either at startup (when there is only
> + * one CPU) or when all CPUs are running without IRQs.
> + *
> + * Deletions are big tricky. We MUST make sure all but one CPU
> + * are running cpu_relax().
> + *
> + */
> +LIST_HEAD(virtual_region_list);

I wonder whether this wouldn't better be static, with the iterator
that the various parties need getting put here as an out-of-line
function (instead of getting open coded in a couple of places).
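
Something along these lines, as a standalone sketch: a plain singly linked
list replaces Xen's list_head, all names are hypothetical, and locking is
left out entirely.

```c
#include <stddef.h>

struct region_sketch {
    struct region_sketch *next;
    unsigned long start, end;       /* inclusive address range */
};

/* Private to this file; users go through the functions below. */
static struct region_sketch *region_list;

void register_region(struct region_sketch *r)
{
    r->next = region_list;
    region_list = r;
}

/* The one out-of-line iterator, instead of open-coded list walks. */
const struct region_sketch *find_region(unsigned long addr)
{
    const struct region_sketch *r;

    for ( r = region_list; r; r = r->next )
        if ( addr >= r->start && addr <= r->end )
            return r;

    return NULL;
}
```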

> +void __init setup_virtual_regions(void)
> +{
> +    ssize_t sz;
> +    unsigned int i, idx;
> +    static const struct bug_frame *const stop_frames[] = {
> +        __start_bug_frames,
> +        __stop_bug_frames_0,
> +        __stop_bug_frames_1,
> +        __stop_bug_frames_2,
> +#ifdef CONFIG_X86
> +        __stop_bug_frames_3,
> +#endif
> +        NULL
> +    };
> +
> +#ifdef CONFIG_X86
> +    sort_exception_tables();
> +#endif
> +
> +    /* N.B. idx != i */
> +    for ( idx = 0, i = 1; stop_frames[i]; i++, idx++ )

Irrespective of the comment - why two loop variables when they get
incremented in lockstep.

> +    {
> +        struct bug_frame *s;
> +
> +        s = (struct bug_frame *)stop_frames[i-1];

Bogus cast, all the more so since it discards constness.
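
For comparison, a sketch with a single index and no cast. It assumes the
per-type pointer in the region structure can itself be made const; the
struct, the NR_TYPES constant and the function name are placeholders.

```c
#include <stddef.h>

struct bug_frame_sketch {
    int loc;
};

#define NR_TYPES 3

/*
 * bounds[] holds the start of the first section followed by the end of
 * each section, so type i spans bounds[i] .. bounds[i + 1].
 */
void count_frames(const struct bug_frame_sketch *const bounds[NR_TYPES + 1],
                  const struct bug_frame_sketch *start[NR_TYPES],
                  size_t n[NR_TYPES])
{
    unsigned int i;

    for ( i = 0; i < NR_TYPES; i++ )
    {
        start[i] = bounds[i];       /* no cast, constness preserved */
        n[i] = (size_t)(bounds[i + 1] - bounds[i]);
    }
}
```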

> @@ -95,10 +96,28 @@ static unsigned int get_symbol_offset(unsigned long pos)
>      return name - symbols_names;
>  }
>  
> +bool_t __is_active_kernel_text(unsigned long addr, symbols_lookup_t *cb)

No new (and even more so global) symbols with double underscores
at their beginning please.

> @@ -108,13 +127,17 @@ const char *symbols_lookup(unsigned long addr,
>  {
>      unsigned long i, low, high, mid;
>      unsigned long symbol_end = 0;
> +    symbols_lookup_t symbol_lookup = NULL;

Pointless initializer.

>      namebuf[KSYM_NAME_LEN] = 0;
>      namebuf[0] = 0;
>  
> -    if (!is_active_kernel_text(addr))
> +    if (!__is_active_kernel_text(addr, &symbol_lookup))
>          return NULL;
>  
> +    if (symbol_lookup)
> +        return (symbol_lookup)(addr, symbolsize, offset, namebuf);

Note that there are a few coding style issues here (missing blanks,
superfluous parentheses).

> --- /dev/null
> +++ b/xen/include/xen/bug_ex_symbols.h
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#ifndef __BUG_EX_SYMBOL_LIST__
> +#define __BUG_EX_SYMBOL_LIST__
> +
> +#include <xen/config.h>
> +#include <xen/list.h>
> +#include <xen/symbols.h>
> +
> +#ifdef CONFIG_X86
> +#include <asm/uaccess.h>
> +#endif

Why?

> +#include <asm/bug.h>
> +
> +struct virtual_region
> +{
> +    struct list_head list;
> +
> +#define CHECKING_SYMBOL         (1<<1)
> +#define CHECKING_BUG_FRAME      (1<<2)
> +#define CHECKING_EXCEPTION      (1<<3)
> +    /*
> +     * Whether to skip this region for particular searches. The flag
> +     * can be CHECKING_[SYMBOL|BUG_FRAMES|EXCEPTION].
> +     *
> +     * If the function returns 1 this region will be skipped.
> +     */
> +    bool_t (*skip)(unsigned int flag, unsigned long priv);
> +
> +    unsigned long start;        /* Virtual address start. */
> +    unsigned long end;          /* Virtual address start. */
> +
> +    /*
> +     * If ->skip returns false for CHECKING_SYMBOL we will use
> +     * 'symbols_lookup' callback to retrieve the name of the
> +     * addr between start and end. If this is NULL the
> +     * default lookup mechanism is used (the skip value is
> +     * ignored).
> +     */
> +    symbols_lookup_t symbols_lookup;
> +
> +    struct {
> +        struct bug_frame *bugs; /* The pointer to array of bug frames. */
> +        ssize_t n_bugs;         /* The number of them. */
> +    } frame[BUGFRAME_NR];
> +
> +#ifdef CONFIG_X86
> +    struct exception_table_entry *ex;
> +    struct exception_table_entry *ex_end;
> +#endif

The bug frame and exception related data are kind of odd to be
placed in a structure with this name. Would that not better be
accessed through ...

> +    unsigned long priv;         /* To be used by above funcionts if need to. */

... this by the interested parties?

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb
  2016-03-15 17:56 ` [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb Konrad Rzeszutek Wilk
@ 2016-03-18 13:20   ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-18 13:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> For those users who want to supply their own vmap callback.
> To be called _after_ the pages have been allocated and
> the vmap API is ready to hand out virtual addresses.
> 
> Instead of using the vmap ones it can call the callback
> which will be responsible for generating the virtual
> address.
> 
> This allows users (such as xSplice) to provide their own
> mechanism to set the page flags.
> The users (such as patch titled "xsplice: Implement payload
> loading") can wrap the calls to __vmap to accomplish this.
> 
> We also provide a mechanism for the caller to squirrel
> the MFN array in case they want to modify the virtual
> addresses easily.
> 
> We also provide the free-ing code path - to use the vunmap_cb
> to take care of tearing down the virtual addresses.
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

To be honest, looking at this alone I'm not convinced. But I'll make
my final opinion dependent on seeing the actual use. Nevertheless
a few comments right away.

> @@ -238,11 +238,15 @@ void *vmalloc(size_t size)
>          mfn[i] = _mfn(page_to_mfn(pg));
>      }
>  
> -    va = vmap(mfn, pages);
> +    va = vmap_cb ? (vmap_cb)(mfn, pages) : vmap(mfn, pages);

Stray parentheses.

> @@ -266,7 +275,7 @@ void *vzalloc(size_t size)
>      return p;
>  }
>  
> -void vfree(void *va)
> +void vfree_cb(void *va, unsigned int nr_pages, vfree_cb_t vfree_cb_fnc)
>  {
>      unsigned int i, pages;
>      struct page_info *pg;
> @@ -275,8 +284,12 @@ void vfree(void *va)
>      if ( !va )
>          return;
>  
> -    pages = vm_size(va);
> -    ASSERT(pages);
> +    if ( !vfree_cb_fnc )
> +    {
> +        pages = vm_size(va);
> +        ASSERT(pages);
> +    } else

Coding style.

> +        pages = nr_pages;

So this "caller provides size" worries me in particular, all the more so
because it doesn't mirror anything the allocation side does. And if this
is indeed needed for usability - why two variables with the same
purpose but different names (nr_pages and pages)?

> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -12,9 +12,23 @@ void *__vmap(const mfn_t *mfn, unsigned int granularity,
>  void *vmap(const mfn_t *mfn, unsigned int nr);
>  void vunmap(const void *);
>  void *vmalloc(size_t size);
> +
> +/*
> + * Callback for vmalloc_cb to use when vmap-ing.
> + */
> +typedef void *(*vmap_cb_t)(const mfn_t *mfn, unsigned int pages);
> +void *vmalloc_cb(size_t size, vmap_cb_t vmap_cb, mfn_t **);

I think it is generally better to typedef such to be functions, not
pointers to functions, as that allows function declarations to be
made using the typedef (which in turn avoids having to touch
those declarations when the typedef changes).

> +void *vmalloc_cb(size_t size, vmap_cb_t vmap_cb, mfn_t **);

Doing so then also makes more obvious in such declarations that
the respective parameter is a pointer.
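
To illustrate the typedef point with a standalone sketch (map_fn_t,
identity_map and call_map are made-up names for this example and are not
part of the patch; plain unsigned long stands in for mfn_t):

```c
#include <stddef.h>

/* A function type, not a pointer-to-function type. */
typedef void *map_fn_t(const unsigned long *mfn, unsigned int pages);

/* The typedef can be used to declare a function directly ... */
map_fn_t identity_map;

/* ... and a parameter of this type visibly has to be a pointer. */
void *call_map(map_fn_t *cb, const unsigned long *mfn, unsigned int pages)
{
    return cb ? cb(mfn, pages) : NULL;
}

/* The definition still spells out the full signature. */
void *identity_map(const unsigned long *mfn, unsigned int pages)
{
    (void)pages;
    return (void *)mfn; /* Illustration only: hand back the input pointer. */
}
```

If the signature in the typedef changes, the `map_fn_t identity_map;`
declaration follows along without being touched; only the definition
needs updating.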

Jan



* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-18 11:55   ` Jan Beulich
@ 2016-03-18 17:26     ` Konrad Rzeszutek Wilk
  2016-03-21 11:22       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-18 17:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > @@ -223,12 +224,15 @@ void __init do_initcalls(void)
> >  /*
> >   * Simple hypercalls.
> >   */
> > -
> >  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> 
> Please retain the blank line, as it relates to more than just this
> one function.

Done! (stray change).
> 
> >  {
> > +    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
> > +
> >      switch ( cmd )
> >      {
> >      case XENVER_version:
> > +        if ( deny )
> > +            return 0;
> >          return (xen_major_version() << 16) | xen_minor_version();
> 
> To be honest, I'm now rather uncertain about this one: If a guest
> can't figure out the hypervisor version, how would it be able to
> adjust its behavior accordingly (e.g. use deprecated hypercalls as
> needed)? IOW, other than for most/all other stuff here (the
> get-features and platform-parameters sub-ops may be considered
> similar to this one, see also below), I don't think allowing the
> "permitted" default to be overridden makes sense here.

I don't want to crash old guests or lead them astray. Removed the 'deny' here.
Also removed the XSM checks for this sub-op (and the others below)
as they are ignored.
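
For reference, the value XENVER_version returns is the major version in
the upper 16 bits and the minor version in the lower 16 bits, as in the
return statement quoted above. A minimal sketch of packing and unpacking
it (the helper names are hypothetical and exist neither in Xen nor in
Linux):

```c
#include <stdint.h>

/* Pack major:minor the way XENVER_version reports it (31..16:15..0). */
static inline uint32_t pack_version(unsigned int major, unsigned int minor)
{
    return ((uint32_t)major << 16) | (minor & 0xffffu);
}

/* Upper 16 bits hold the major version. */
static inline unsigned int version_major(uint32_t v)
{
    return v >> 16;
}

/* Lower 16 bits hold the minor version. */
static inline unsigned int version_minor(uint32_t v)
{
    return v & 0xffffu;
}
```

A guest that got 0 back here would decode it as version 0.0, which is
why denying this sub-op is not useful.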
> 
> > @@ -274,6 +279,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >              .virt_start = HYPERVISOR_VIRT_START
> >          };
> >  
> > +        if ( deny )
> > +            params.virt_start = 0;
> 
> Guests may (validly imo) assume to get a valid address here. If you
> mean to not expose the non-constant address in the compat mode
> case, I could accept that. But you would then need to set the ABI
> mandated __HYPERVISOR_COMPAT_VIRT_START (and retain the
> constant value in the non-compat case). Our old 32-bit PV guests
> would crash extremely early on boot if they got back zero here
> (that's for 2.6.30 and later, and I think both you and Citrix had
> derived some of their kernels from our 2.6.32 based one).

OK. Let me also relax this one and always return a value.
> 
> > @@ -302,6 +310,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >          switch ( fi.submap_idx )
> >          {
> >          case 0:
> > +            if ( deny )
> > +                break;
> 
> I think if to be put here at all, this should go ahead of the switch(),

I am OK not acting on the XSM check. It really throws a wrench in Linux
(upstream Linux hangs when initializing the XenBus frontend driver).

> so that guests wouldn't be able to guess from the valid index values
> which features may be available. And of course you should clear
> fi.submap if you deny access, instead of leaving in it what has been
> there before.
> 
> >      case XENVER_guest_handle:
> > -        if ( copy_to_guest(arg, current->domain->handle,
> > -                           ARRAY_SIZE(current->domain->handle)) )
> > +    {
> > +        xen_domain_handle_t hdl;
> > +        ssize_t len;
> > +
> > +        if ( deny )
> > +        {
> > +            len = sizeof(hdl);
> > +            memset(&hdl, 0, len);
> > +        } else
> > +            len = ARRAY_SIZE(current->domain->handle);
> > +
> > +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle, len ) )
> >              return -EFAULT;
> >          return 0;
> 
> What is this "len" handling here about? Aren't both the same type
> and hence size? Perhaps, if you feel unsure about that, simply add
> a respective BUILD_BUG_ON()?

Yes they are. Used a BUILD_BUG_ON just in case somebody mucks
around.
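
A standalone sketch of that compile-time check follows. It assumes a C11
compiler; the BUILD_BUG_ON below is a stand-in built on _Static_assert
rather than the hypervisor's own macro, and handle_t, struct fake_domain
and get_handle are hypothetical names that only mimic the 16-byte
domain handle:

```c
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
/* Stand-in for BUILD_BUG_ON: fails the build when cond is true. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "BUILD_BUG_ON: " #cond)

typedef unsigned char handle_t[16];   /* Hypothetical handle type. */

struct fake_domain {
    handle_t handle;
};

/* Copy either the real handle or all zeros into out. */
void get_handle(const struct fake_domain *d, int deny, handle_t out)
{
    handle_t zero;

    /* Both sources must have the same size, so one length serves both. */
    BUILD_BUG_ON(ARRAY_SIZE(d->handle) != ARRAY_SIZE(zero));

    memset(zero, 0, sizeof(zero));
    memcpy(out, deny ? zero : d->handle, sizeof(zero));
}
```

With the sizes pinned at build time there is no need for a separate
"len" variable on the two paths.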
> 
> > --- a/xen/include/xen/version.h
> > +++ b/xen/include/xen/version.h
> > @@ -12,5 +12,5 @@ unsigned int xen_minor_version(void);
> >  const char *xen_extra_version(void);
> >  const char *xen_changeset(void);
> >  const char *xen_banner(void);
> > -
> > +const char *xen_deny(void);
> >  #endif /* __XEN_VERSION_H__ */
> 
> Please retain the blank line.

Yes. 
> 
> Jan
> 

Inline is what the patch now looks like:

From 0d5d62a9f15b8306e0c62fb00af193a733af435c Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 11 Mar 2016 21:40:43 -0500
Subject: [PATCH] xsm/xen_version: Add XSM for most of xen_version hypercall

Most of the XENVER_* sub-ops now have an XSM check.

The XENVER_commandline sub-op is now a privileged operation.
To not break guests we still return a string - but it is
just '<denied>\0'.

The XENVER_[version|platform_parameters|get_features] sub-ops will
always return a value to the guest.

The rest: XENVER_[extraversion|capabilities|pagesize|
guest_handle|changeset|compile_info] behave as before -
allowed by default for all guests if using the XSM default
policy or with the dummy one. And if the system admin
wants to curtail access to some of them - they can do
that now with a non-default XSM policy.
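
The three groups described above, as the dummy (default) policy treats
them, can be summarised in a small sketch. The enum and function names
are hypothetical; the numeric values follow the XENVER_* defines in
xen/include/public/version.h:

```c
/* Sub-op numbers mirroring the XENVER_* defines. */
enum {
    VER_version = 0, VER_extraversion = 1, VER_compile_info = 2,
    VER_capabilities = 3, VER_changeset = 4, VER_platform_parameters = 5,
    VER_get_features = 6, VER_pagesize = 7, VER_guest_handle = 8,
    VER_commandline = 9
};

enum access {
    ACCESS_ALWAYS,         /* No check: data is always returned. */
    ACCESS_DEFAULT_ALLOW,  /* Allowed by default, policy may deny. */
    ACCESS_PRIV            /* Privileged domains only by default. */
};

/* Classify a sub-op the way the default policy in this patch does. */
enum access classify_subop(int cmd)
{
    switch ( cmd )
    {
    case VER_version:
    case VER_platform_parameters:
    case VER_get_features:
        return ACCESS_ALWAYS;
    case VER_extraversion:
    case VER_compile_info:
    case VER_capabilities:
    case VER_changeset:
    case VER_pagesize:
    case VER_guest_handle:
        return ACCESS_DEFAULT_ALLOW;
    default:               /* Includes VER_commandline. */
        return ACCESS_PRIV;
    }
}
```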

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Do XSM check for all the XENVER_ ops.
 - Add empty data conditions.
 - Return <denied> for priv subops.
 - Move extraversion from priv to normal. Drop the XSM check
    for the non-priv subops.
v3:
 - Add +1 for strlen(xen_deny()) to include the NUL. Move changeset,
    compile_info to non-priv subops.
 - Remove the \0 on xen_deny()
 - Add new XSM domain for xenver hypercall. Add all subops to it.
 - Remove the extra line, Add Ack from Daniel
v4:
 - Rename the XSM from xen_version_op to xsm_xen_version.
   Prefix the types with 'xen' to distinguish it from another
   hypercall performing similar operation. Removed Ack from Daniel
   as it was so large. Add local variable block.
v5:
 - Make XENVER_platform_parameters,get_features,version be excluded
   from the XSM check per Jan's review. Add BUILD_BUG_ON and fix
   odd line removals.
---
 tools/flask/policy/policy/modules/xen/xen.te | 14 +++++++++
 xen/common/kernel.c                          | 43 +++++++++++++++++++++-------
 xen/common/version.c                         | 15 ++++++++++
 xen/include/xen/version.h                    |  1 +
 xen/include/xsm/dummy.h                      | 24 ++++++++++++++++
 xen/include/xsm/xsm.h                        |  5 ++++
 xen/xsm/dummy.c                              |  1 +
 xen/xsm/flask/hooks.c                        | 39 +++++++++++++++++++++++++
 xen/xsm/flask/policy/access_vectors          | 25 ++++++++++++++++
 xen/xsm/flask/policy/security_classes        |  1 +
 10 files changed, 157 insertions(+), 11 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..18f49b5 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -73,6 +73,14 @@ allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
 };
+
+# Allow dom0 to use all XENVER_ subops that have checks.
+# Note that dom0 is part of domain_type so this has duplicates.
+allow dom0_t xen_t:version {
+    xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset xen_pagesize xen_guest_handle xen_commandline
+};
+
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -137,6 +145,12 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
+# For normal guests all possible except XENVER_commandline.
+allow domain_type xen_t:version {
+    xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset  xen_pagesize xen_guest_handle
+};
+
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0618da2..06ecf26 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -13,6 +13,7 @@
 #include <xen/nmi.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xsm/xsm.h>
 #include <asm/current.h>
 #include <public/nmi.h>
 #include <public/version.h>
@@ -226,6 +227,8 @@ void __init do_initcalls(void)
 
 DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
+
     switch ( cmd )
     {
     case XENVER_version:
@@ -236,7 +239,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_extraversion_t extraversion;
 
         memset(extraversion, 0, sizeof(extraversion));
-        safe_strcpy(extraversion, xen_extra_version());
+        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
         if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
             return -EFAULT;
         return 0;
@@ -247,10 +250,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_compile_info_t info;
 
         memset(&info, 0, sizeof(info));
-        safe_strcpy(info.compiler,       xen_compiler());
-        safe_strcpy(info.compile_by,     xen_compile_by());
-        safe_strcpy(info.compile_domain, xen_compile_domain());
-        safe_strcpy(info.compile_date,   xen_compile_date());
+        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
+        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
+        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
         if ( copy_to_guest(arg, &info, 1) )
             return -EFAULT;
         return 0;
@@ -261,7 +264,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_capabilities_info_t info;
 
         memset(info, 0, sizeof(info));
-        arch_get_xen_caps(&info);
+        if ( !deny )
+            arch_get_xen_caps(&info);
 
         if ( copy_to_guest(arg, info, ARRAY_SIZE(info)) )
             return -EFAULT;
@@ -285,7 +289,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_changeset_info_t chgset;
 
         memset(chgset, 0, sizeof(chgset));
-        safe_strcpy(chgset, xen_changeset());
+        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
         if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
             return -EFAULT;
         return 0;
@@ -342,19 +346,36 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case XENVER_pagesize:
+        if ( deny )
+            return 0;
         return (!guest_handle_is_null(arg) ? -EINVAL : PAGE_SIZE);
 
     case XENVER_guest_handle:
-        if ( copy_to_guest(arg, current->domain->handle,
-                           ARRAY_SIZE(current->domain->handle)) )
+    {
+        xen_domain_handle_t hdl;
+
+        if ( deny )
+            memset(&hdl, 0, ARRAY_SIZE(hdl));
+
+        BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
+
+        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
+                           ARRAY_SIZE(hdl) ) )
             return -EFAULT;
         return 0;
-
+    }
     case XENVER_commandline:
-        if ( copy_to_guest(arg, saved_cmdline, ARRAY_SIZE(saved_cmdline)) )
+    {
+        size_t len = ARRAY_SIZE(saved_cmdline);
+
+        if ( deny )
+            len = strlen(xen_deny()) + 1;
+
+        if ( copy_to_guest(arg, deny ? xen_deny() : saved_cmdline, len) )
             return -EFAULT;
         return 0;
     }
+    }
 
     return -ENOSYS;
 }
diff --git a/xen/common/version.c b/xen/common/version.c
index b152e27..fc9bf42 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -55,3 +55,18 @@ const char *xen_banner(void)
 {
     return XEN_BANNER;
 }
+
+const char *xen_deny(void)
+{
+    return "<denied>";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 81a3c7d..2015c0b 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -12,5 +12,6 @@ unsigned int xen_minor_version(void);
 const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
+const char *xen_deny(void);
 
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 1d13826..87be9e5 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
 }
 
 #endif /* CONFIG_X86 */
+
+#include <public/version.h>
+static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+        /* These sub-ops ignore the permission check and always return data. */
+        return 0;
+    case XENVER_extraversion:
+    case XENVER_compile_info:
+    case XENVER_capabilities:
+    case XENVER_changeset:
+    case XENVER_pagesize:
+    case XENVER_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3afed70..db440f6 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -193,6 +193,7 @@ struct xsm_operations {
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
+    int (*xen_version) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -731,6 +732,10 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
 
 #endif /* CONFIG_X86 */
 
+static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->xen_version(op);
+}
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0f32636..9791ad4 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -162,4 +162,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
 #endif
+    set_to_dummy_if_null(ops, xen_version);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 4813623..1a95689 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -26,6 +26,7 @@
 #include <public/xen.h>
 #include <public/physdev.h>
 #include <public/platform.h>
+#include <public/version.h>
 
 #include <public/xsm/flask_op.h>
 
@@ -1620,6 +1621,43 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
 }
 #endif /* CONFIG_X86 */
 
+static int flask_xen_version (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+        /* The sub-ops ignore the permission check and always return data. */
+        return 0;
+    case XENVER_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_EXTRAVERSION, NULL);
+    case XENVER_compile_info:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMPILE_INFO, NULL);
+    case XENVER_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CAPABILITIES, NULL);
+    case XENVER_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CHANGESET, NULL);
+    case XENVER_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_PAGESIZE, NULL);
+    case XENVER_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_GUEST_HANDLE, NULL);
+    case XENVER_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1758,6 +1796,7 @@ static struct xsm_operations flask_ops = {
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
 #endif
+    .xen_version = flask_xen_version,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..badcf1c 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -495,3 +495,28 @@ class security
 # remove ocontext label definitions for resources
     del_ocontext
 }
+
+# Class version is used to describe the XENVER_ hypercall.
+# Almost all sub-ops are described here - in the default case all of them should
+# be allowed except the XENVER_commandline.
+#
+# The ones that are omitted are XENVER_version, XENVER_platform_parameters,
+# and XENVER_get_features  - as they MUST always be returned to a guest.
+#
+class version
+{
+# Extra information (-unstable).
+    xen_extraversion
+# Compile information of the hypervisor.
+    xen_compile_info
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    xen_capabilities
+# Source code changeset.
+    xen_changeset
+# Page size the hypervisor uses.
+    xen_pagesize
+# A value that the control stack can choose.
+    xen_guest_handle
+# Xen command line.
+    xen_commandline
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ca191db..cde4e1a 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -18,5 +18,6 @@ class shadow
 class event
 class grant
 class security
+class version
 
 # FLASK
-- 
2.5.0



* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-18 12:36         ` Jan Beulich
@ 2016-03-18 19:22           ` Konrad Rzeszutek Wilk
  2016-03-21 12:45             ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-18 19:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

. snip..
> >  - VERSION_OP_commandline, VERSION_OP_changeset are privileged.
> 
> Aiui this is no difference to the old one anymore if we assume
> patches get committed in the order they're being presented in
> this series.

The old one (XENVER) would disallow XENVER_commandline for guests. The
rest are allowed.

.. snip..
> > @@ -87,6 +95,68 @@ typedef struct xen_feature_info xen_feature_info_t;
> >  #define XENVER_commandline 9
> >  typedef char xen_commandline_t[1024];
> >  
> > +
> > +
> > +/*
> > + * The HYPERCALL_version_op has a set of sub-ops which mirror the
> > + * sub-ops of HYPERCALL_xen_version. However this hypercall differs
> > + * radically from the former:
> > + *  - It returns the amount of bytes returned.
> > + *  - It will return -XEN_EPERM if the guest is not permitted.
> > + *  - It will return the requested data in arg.
> > + *  - It requires an third argument (len) for the length of the
> > + *    arg. Naturally the arg has to fit the requested data otherwise
> > + *    -XEN_ENOBUFS is returned.
> > + *
> > + * It also offers an mechanism to probe for the amount of bytes an
> > + * sub-op will require. Having the arg have an NULL pointer will
> 
> s/pointer/handle/
> 
> > + * return the number of bytes requested for the operation. Or an
> > + * negative value if an error is encountered.
> > + */
> > +
> > +typedef uint64_t xen_version_op_val_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> > +
> > +typedef void xen_version_op_buf_t;
> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
> 
> Are these actually useful for anything? And for the various strings,

The xen_version_op_val_t is definitely used by the toolstack.

> wouldn't a "char" handle be more natural?

Heh. It was char[] before but Andrew liked it as void.

The xen_version_op_buf_t is not used but I do reference it below in
a comment.  Let me put s/arg == version_op_buf/arg == char/ ?

Let me do that.

> 
> > +/* arg == version_op_val_t. Encoded as major:minor (31..16:15..0) */
> 
> Please make explicit that 63...32 are zero (or whatever they really
> are).
> 
> > +#define XEN_VERSION_OP_version      0
> > +
> > +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> > +#define XEN_VERSION_OP_extraversion 1
> > +
> > +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> > +#define XEN_VERSION_OP_capabilities 3
> > +
> > +/* arg == version_op_buf. Contains NUL terminated utf-8 string. */
> > +#define XEN_VERSION_OP_changeset 4
> > +
> > +/*
> > + * arg == xen_version_op_val_t. Contains the virtual address
> > + * of the hypervisor encoded as [63..0].
> 
> I'd say the encoding info here is unnecessary and could - just like
> you already do for pagesize below - be omitted.
> 
> > + */
> > +#define XEN_VERSION_OP_platform_parameters 5
> > +
> > +/*
> > + * arg = xen_feature_info_t - shares the same structure
> > + * as the XENVER_get_features.
> > + */
> > +#define XEN_VERSION_OP_get_features 6
> > +
> > +/* arg == xen_version_op_val_t. */
> > +#define XEN_VERSION_OP_pagesize 7
> > +
> > +/* arg == version_op_buf.
> > + *
> > + * The toolstack fills it out for guest consumption. It is intended to hold
> > + * the UUID of the guest.
> > + */
> > +#define XEN_VERSION_OP_guest_handle 8
> > +
> > +/* arg = version_op_buf. Contains NUL terminated utf-8 string. */
> > +#define XEN_VERSION_OP_commandline 9
> > +
> >  #endif /* __XEN_PUBLIC_VERSION_H__ */
> 
> Would leaving out the _OP and _op everywhere result in any
> name collisions with the old one?

It worked out fine. Made it XEN_VERSION_[..]

See inline new patch:

From aa7ba11778dd3ffb2a53ebf6123ddfb196f203d8 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 18 Mar 2016 13:41:07 -0400
Subject: [PATCH] HYPERCALL_version_op. New hypercall mirroring XENVER_ but
 sane.

This hypercall mirrors XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32-bit
   as on 64-bit.
 - The hypercall accepts three arguments - the command, a pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size: if the buffer is NULL, it
   returns the size of the buffer that will be needed.
 - Sub-ops can complete even if the buffer is too small - truncated
   data will be filled in and the hypercall will return -ENOBUFS.
 - VERSION_commandline, VERSION_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM and toolstack/OSes are expected
   to deal with it.
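
The probe and truncation behaviour listed above can be modelled in
user space roughly as follows. This is only a sketch: version_copy is a
hypothetical helper, not the hypervisor code, it assumes the returned
byte count includes the terminating NUL, and it uses the host's ENOBUFS
in place of XEN_ENOBUFS:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of a string sub-op:
 *  - buf == NULL: return the number of bytes that would be needed.
 *  - len too small: fill in truncated data and return -ENOBUFS.
 *  - otherwise: copy everything and return the number of bytes copied.
 */
long version_copy(const char *src, char *buf, size_t len)
{
    size_t need = strlen(src) + 1;   /* Include the NUL terminator. */

    if ( buf == NULL )
        return (long)need;           /* Size probe. */

    if ( len < need )
    {
        memcpy(buf, src, len);       /* Truncated, not NUL terminated. */
        return -ENOBUFS;
    }

    memcpy(buf, src, need);
    return (long)need;
}
```

A caller would typically probe first with a NULL buffer, allocate the
returned size, and then repeat the call with the real buffer.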

While combining some of the common code between XENVER_ and VERSION_,
take the liberty of moving pae_extended_cr3 into the x86 area.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1-v3: Was not part of the series.
v4: New posting.
v5: Remove memset and use {}. Tweak copy_to_guest and capabilities_info,
    add ASSERT(sz) per Andrew's review. Add cached=1 back in.
    Per Jan, s/VERSION_OP/VERSION/, squash size check with do_version_op,
    update the comments. Dropped Andrew's Review-by.
---
---
 tools/flask/policy/policy/modules/xen/xen.te |   9 +-
 xen/arch/arm/traps.c                         |   1 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/x86_64/compat/entry.S           |   2 +
 xen/arch/x86/x86_64/entry.S                  |   2 +
 xen/common/compat/kernel.c                   |   2 +
 xen/common/kernel.c                          | 209 ++++++++++++++++++++++-----
 xen/include/public/arch-arm.h                |   3 +
 xen/include/public/version.h                 |  70 ++++++++-
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xsm/dummy.h                      |  19 +++
 xen/include/xsm/xsm.h                        |   7 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  39 +++++
 xen/xsm/flask/policy/access_vectors          |  24 ++-
 16 files changed, 352 insertions(+), 42 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 18f49b5..d61e697 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,11 +74,13 @@ allow dom0_t xen_t:xen2 {
     get_symbol
 };
 
-# Allow dom0 to use all XENVER_ subops that have checks.
+# Allow dom0 to use all XENVER_ subops that have checks and VERSION subops
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle xen_commandline
+    version extraversion capabilities changeset platform_parameters
+    get_features pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -145,10 +147,13 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all possible except XENVER_commandline.
+# For normal guests all possible except XENVER_commandline, VERSION_changeset,
+# and VERSION_commandline
 allow domain_type xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset  xen_pagesize xen_guest_handle
+    version extraversion capabilities  platform_parameters
+    get_features pagesize guest_handle
 };
 
 ###############################################################################
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 83744e8..31d2115 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1235,6 +1235,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
+    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 80d59ff..f16b590 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5322,6 +5322,7 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
+    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 33e2c12..fd25e84 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -394,6 +394,7 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -445,6 +446,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
+        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 07ef096..b0e7257 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -730,6 +730,7 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -781,6 +782,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
+        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index df93fdd..7a7ca53 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -39,6 +39,8 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
+CHECK_TYPE(version_op_val);
+
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 06ecf26..3ea35cc 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,6 +221,47 @@ void __init do_initcalls(void)
 
 #endif
 
+static int get_features(struct domain *d, xen_feature_info_t *fi)
+{
+    switch ( fi->submap_idx )
+    {
+    case 0:
+        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
+        if ( paging_mode_translate(d) )
+            fi->submap |=
+                (1U << XENFEAT_writable_page_tables) |
+                (1U << XENFEAT_auto_translated_physmap);
+        if ( is_hardware_domain(d) )
+            fi->submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+        if ( VM_ASSIST(d, pae_extended_cr3) )
+            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                          (1U << XENFEAT_highmem_assist) |
+                          (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_supervisor_mode_kernel) |
+                          (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_hvm_callback_vector) |
+                          (1U << XENFEAT_hvm_pirqs);
+            break;
+        }
+#endif
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * Simple hypercalls.
  */
@@ -298,47 +339,14 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        struct domain *d = current->domain;
+        int rc;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        switch ( fi.submap_idx )
-        {
-        case 0:
-            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, pae_extended_cr3) )
-                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-            if ( paging_mode_translate(d) )
-                fi.submap |= 
-                    (1U << XENFEAT_writable_page_tables) |
-                    (1U << XENFEAT_auto_translated_physmap);
-            if ( is_hardware_domain(d) )
-                fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
-            break;
-        default:
-            return -EINVAL;
-        }
+        rc = get_features(current->domain, &fi);
+        if ( rc )
+            return rc;
 
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
@@ -380,6 +388,133 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
+static const char *capabilities_info(unsigned int *len)
+{
+    static xen_capabilities_info_t __read_mostly cached_cap;
+    static unsigned int __read_mostly cached_cap_len;
+    static bool_t cached;
+
+    if ( unlikely(!cached) )
+    {
+        arch_get_xen_caps(&cached_cap);
+        cached_cap_len = strlen(cached_cap) + 1;
+        cached = 1;
+    }
+
+    *len = cached_cap_len;
+    return cached_cap;
+}
+
+/*
+ * Similar to HYPERVISOR_xen_version but with a sane interface
+ * (has a length, one can probe for the length) and with one less sub-ops:
+ * missing XENVER_compile_info.
+ */
+DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
+               unsigned int len)
+{
+    union {
+        xen_version_op_val_t val;
+        xen_feature_info_t fi;
+    } u = {};
+    unsigned int sz = 0;
+    const void *ptr = NULL;
+    int rc = xsm_version_op(XSM_OTHER, cmd);
+
+    /* We can safely return -EPERM! */
+    if ( rc )
+        return rc;
+
+    /*
+     * The HYPERVISOR_xen_version differs in that some return the value,
+     * and some copy it on back on argument. We follow the same rule for all
+     * sub-ops: return 0 on success, positive value of bytes returned, and
+     * always copy the result in arg. Yeey sanity!
+     */
+    switch ( cmd )
+    {
+    case XEN_VERSION_version:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = (xen_major_version() << 16) | xen_minor_version();
+        break;
+
+    case XEN_VERSION_extraversion:
+        sz = strlen(xen_extra_version()) + 1;
+        ptr = xen_extra_version();
+        break;
+
+    case XEN_VERSION_capabilities:
+        ptr = capabilities_info(&sz);
+        break;
+
+    case XEN_VERSION_changeset:
+        sz = strlen(xen_changeset()) + 1;
+        ptr = xen_changeset();
+        break;
+
+    case XEN_VERSION_platform_parameters:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = HYPERVISOR_VIRT_START;
+        break;
+
+    case XEN_VERSION_get_features:
+        if ( copy_from_guest(&u.fi, arg, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        sz = sizeof(xen_feature_info_t);
+        rc = get_features(current->domain, &u.fi);
+        break;
+
+    case XEN_VERSION_pagesize:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = PAGE_SIZE;
+        break;
+
+    case XEN_VERSION_guest_handle:
+        sz = ARRAY_SIZE(current->domain->handle);
+        ptr = current->domain->handle;
+        break;
+
+    case XEN_VERSION_commandline:
+        sz = strlen(saved_cmdline) + 1;
+        ptr = saved_cmdline;
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    if ( rc )
+        return rc;
+
+    /*
+     * This hypercall also allows the client to probe. If it provides
+     * a NULL arg we will return the size of the space it has to
+     * allocate for the specific sub-op.
+     */
+    ASSERT(sz);
+    if ( guest_handle_is_null(arg) )
+        return sz;
+
+    if ( !rc )
+    {
+        unsigned int bytes = min(sz, len);
+
+        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
+            rc = -EFAULT;
+
+        /*
+         * We return len (truncate) worth of data even if we fail.
+         */
+        if ( !rc && sz > len )
+            rc = -ENOBUFS;
+    }
+
+    return rc == 0 ? sz : rc;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 870bc3b..c9ae315 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,6 +128,9 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
+ *  HYPERVISOR_version_op
+ *   All generic sub-operations
+ *
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..ebb0664 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,15 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/*
+ * There are two hypercalls mentioned in here. The XENVER_ are for
+ * HYPERCALL_xen_version (17), while VERSION_ are for the
+ * HYPERCALL_version_op (41).
+ *
+ * The subops are very similar except that the later hypercall has a
+ * sane interface.
+ */
+
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +95,66 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+
+
+/*
+ * The HYPERCALL_version_op has a set of sub-ops which mirror the
+ * sub-ops of HYPERCALL_xen_version. However this hypercall differs
+ * radically from the former:
+ *  - It returns the amount of bytes returned.
+ *  - It will return -XEN_EPERM if the guest is not permitted.
+ *  - It will return the requested data in arg.
+ *  - It requires an third argument (len) for the length of the
+ *    arg. Naturally the arg has to fit the requested data otherwise
+ *    -XEN_ENOBUFS is returned.
+ *
+ * It also offers an mechanism to probe for the amount of bytes an
+ * sub-op will require. Having the arg have an NULL handle will
+ * return the number of bytes requested for the operation. Or an
+ * negative value if an error is encountered.
+ */
+
+typedef uint64_t xen_version_op_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
+
+/*
+ * arg == xen_version_op_val_t. Encoded as major:minor (31..16:15..0), while
+ * 63..32 are zero.
+ */
+#define XEN_VERSION_version             0
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_extraversion        1
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_capabilities        3
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_changeset           4
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_platform_parameters 5
+
+/*
+ * arg = xen_feature_info_t - shares the same structure
+ * as the XENVER_get_features.
+ */
+#define XEN_VERSION_get_features        6
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_pagesize            7
+
+/*
+ * arg == char.
+ *
+ * The toolstack fills it out for guest consumption. It is intended to hold
+ * the UUID of the guest.
+ */
+#define XEN_VERSION_guest_handle        8
+
+/* arg = char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_commandline         9
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..1a99929 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,6 +115,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
+#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 0c8ae0e..e8d2b81 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -147,6 +147,10 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
+extern long
+do_version_op(unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 87be9e5..71d9509 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+    case XEN_VERSION_extraversion:
+    case XEN_VERSION_capabilities:
+    case XEN_VERSION_platform_parameters:
+    case XEN_VERSION_get_features:
+    case XEN_VERSION_pagesize:
+    case XEN_VERSION_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index db440f6..ac80472 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -194,6 +194,7 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -736,6 +737,12 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
 }
+
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9791ad4..776dd09 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,4 +163,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1a95689..7569086 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1658,6 +1658,44 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__VERSION, NULL);
+    case XEN_VERSION_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XEN_VERSION_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XEN_VERSION_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XEN_VERSION_platform_parameters:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PLATFORM_PARAMETERS, NULL);
+    case XEN_VERSION_get_features:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GET_FEATURES, NULL);
+    case XEN_VERSION_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XEN_VERSION_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+    case XEN_VERSION_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1797,6 +1835,7 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index badcf1c..7b098db 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -496,9 +496,10 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ hypercall.
+# Class version is used to describe the XENVER_ and VERSION hypercall.
 # Almost all sub-ops are described here - in the default case all of them should
-# be allowed except the XENVER_commandline.
+# be allowed except the XENVER_commandline, VERSION_commandline, and
+# VERSION_changeset.
 #
 # The ones that are omitted are XENVER_version, XENVER_platform_parameters,
 # and XENVER_get_features  - as they MUST always be returned to a guest.
@@ -519,4 +520,23 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+# --- VERSION hypercall ---
+# Often called by PV kernels to force an callback.
+    version
+# Extra informations (-unstable).
+    extraversion
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Such as the virtual address of where the hypervisor resides.
+    platform_parameters
+# Source code changeset.
+    changeset
+# The features the hypervisor supports.
+    get_features
+# Page size the hypervisor uses.
+    pagesize
+# An value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
 }
-- 
2.5.0
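
The return convention of the new hypercall (probe with a NULL arg, truncated copy, -ENOBUFS when the buffer is too short) can be sketched outside the hypervisor. This is only an illustration: version_op_tail() is a hypothetical stand-in for the copy-out tail of do_version_op(), and memcpy() and a plain pointer take the place of copy_to_guest() and the guest handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the tail of do_version_op(); not Xen code.
 * src/sz: the data the sub-op produced and its size in bytes.
 * dst/len: the caller's buffer and its size; dst == NULL means "probe".
 */
static long version_op_tail(const void *src, unsigned int sz,
                            void *dst, unsigned int len)
{
    unsigned int bytes = sz < len ? sz : len;

    assert(src != NULL);

    if ( dst == NULL )          /* probe: report how many bytes are needed */
        return (long)sz;

    memcpy(dst, src, bytes);    /* copy what fits, even if truncated */

    /* Too small a buffer is an error; otherwise return the byte count. */
    return sz > len ? -ENOBUFS : (long)sz;
}
```

A caller would normally probe first, allocate that many bytes, and then call again with the real buffer.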


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-18 12:40   ` Jan Beulich
@ 2016-03-18 19:59     ` Konrad Rzeszutek Wilk
  2016-03-21 12:49       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-18 19:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

On Fri, Mar 18, 2016 at 06:40:31AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > --- a/xen/include/asm-arm/bug.h
> > +++ b/xen/include/asm-arm/bug.h
> > @@ -31,6 +31,7 @@ struct bug_frame {
> >  #define BUGFRAME_warn   0
> >  #define BUGFRAME_bug    1
> >  #define BUGFRAME_assert 2
> > +#define BUGFRAME_NR     3
> >  
> >  /* Many versions of GCC doesn't support the asm %c parameter which would
> >   * be preferable to this unpleasantness. We use mergeable string
> > @@ -39,6 +40,7 @@ struct bug_frame {
> >   */
> >  #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
> >      BUILD_BUG_ON((line) >> 16);                                             \
> > +    BUILD_BUG_ON(type >= BUGFRAME_NR);                                      \
> 
> The x86 variant has type properly parenthesized - why not here?
> 
> > --- a/xen/include/asm-x86/bug.h
> > +++ b/xen/include/asm-x86/bug.h
> > @@ -9,7 +9,7 @@
> >  #define BUGFRAME_warn   1
> >  #define BUGFRAME_bug    2
> >  #define BUGFRAME_assert 3
> > -
> > +#define BUGFRAME_NR     4
> >  #ifndef __ASSEMBLY__
> 
> Please retain the blank line.
> 
> > @@ -51,6 +51,7 @@ struct bug_frame {
> >  
> >  #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
> >      BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
> > +    BUILD_BUG_ON((type) >= (BUGFRAME_NR));                                   \
> 
> The ARM variant has BUGFRAME_NR properly un-parenthesized -
> why here?

I know I copied and pasted it and I must have done something uncanny.

Anyhow this is what the change looks like now (I've retained the Reviewed-by
and Acked-by as I think this change is mostly cosmetic in nature?)

From 123ad665b283f8c59688bd86be0bbe79ce5723bb Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 10 Mar 2016 16:45:31 -0500
Subject: [PATCH] x86/arm: Add BUGFRAME_NR define and BUILD checks.

So that we have a nice mechanism to figure out the upper
bounds of bug.frames and also get a build error in case
one tries to use a higher frame number.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v3: First time included.
v4: Add BUG_FRAME check also in the assembler version of the macro.
v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
    BUGFRAME_NR properly.
---
---
 xen/include/asm-arm/bug.h | 3 +++
 xen/include/asm-x86/bug.h | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
index ab9e811..68353e1 100644
--- a/xen/include/asm-arm/bug.h
+++ b/xen/include/asm-arm/bug.h
@@ -32,6 +32,8 @@ struct bug_frame {
 #define BUGFRAME_bug    1
 #define BUGFRAME_assert 2
 
+#define BUGFRAME_NR     3
+
 /* Many versions of GCC doesn't support the asm %c parameter which would
  * be preferable to this unpleasantness. We use mergeable string
  * sections to avoid multiple copies of the string appearing in the
@@ -39,6 +41,7 @@ struct bug_frame {
  */
 #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
     BUILD_BUG_ON((line) >> 16);                                             \
+    BUILD_BUG_ON((type) >= BUGFRAME_NR);                                    \
     asm ("1:"BUG_INSTR"\n"                                                  \
          ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"                \
          "2:\t.asciz " __stringify(file) "\n"                               \
diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..7825565 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -10,6 +10,7 @@
 #define BUGFRAME_bug    2
 #define BUGFRAME_assert 3
 
+#define BUGFRAME_NR     4
 #ifndef __ASSEMBLY__
 
 struct bug_frame {
@@ -51,6 +52,7 @@ struct bug_frame {
 
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
     BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
+    BUILD_BUG_ON((type) >= BUGFRAME_NR);                                     \
     asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)                          \
                    :: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );            \
 } while (0)
@@ -83,6 +85,11 @@ extern const struct bug_frame __start_bug_frames[],
  * in .rodata
  */
     .macro BUG_FRAME type, line, file_str, second_frame, msg
+
+    .if \type >= BUGFRAME_NR
+        .error "Invalid BUGFRAME index"
+    .endif
+
     .L\@ud: ud2a
 
     .pushsection .rodata.str1, "aMS", @progbits, 1
-- 
2.5.0
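
The parenthesization point from the review can be shown with a small stand-alone sketch. The macro names, NR, and the ternary argument are all hypothetical; the only claim is the C precedence rule that >= binds tighter than ?:, so an unparenthesized macro parameter can change what gets compared.

```c
#include <assert.h>

#define NR 4    /* plays the role of BUGFRAME_NR */

/* Unparenthesized parameter, in the style of the v4 ARM hunk. */
#define OUT_OF_RANGE_BAD(type)   (type >= NR)
/* Parenthesized parameter, in the style of the revised patch. */
#define OUT_OF_RANGE_GOOD(type)  ((type) >= NR)

/*
 * With the argument "c ? 1 : 5" the bad form expands to
 * "c ? 1 : 5 >= NR", which parses as "c ? 1 : (5 >= NR)".
 */
static int bad_check(int c)
{
    return OUT_OF_RANGE_BAD(c ? 1 : 5);
}

static int good_check(int c)
{
    return OUT_OF_RANGE_GOOD(c ? 1 : 5);
}
```

For c == 1 the selected value is 1, which is in range, yet the unparenthesized form still evaluates to non-zero. BUGFRAME_NR itself is a plain integer constant, so it needs no extra parentheses.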



* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-18 17:26     ` Konrad Rzeszutek Wilk
@ 2016-03-21 11:22       ` Jan Beulich
  2016-03-22 16:10         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-21 11:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

>>> On 18.03.16 at 18:26, <konrad.wilk@oracle.com> wrote:
> On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:
>> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
>> > @@ -223,12 +224,15 @@ void __init do_initcalls(void)
>> >  /*
>> >   * Simple hypercalls.
>> >   */
>> > -
>> >  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> 
>> Please retain the blank line, as it relates to more than just this
>> one function.
> 
> Done! (stray change).

Considering this I'm not puzzled by ...

>      case XENVER_guest_handle:
> -        if ( copy_to_guest(arg, current->domain->handle,
> -                           ARRAY_SIZE(current->domain->handle)) )
> +    {
> +        xen_domain_handle_t hdl;
> +
> +        if ( deny )
> +            memset(&hdl, 0, ARRAY_SIZE(hdl));
> +
> +        BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
> +
> +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
> +                           ARRAY_SIZE(hdl) ) )
>              return -EFAULT;
>          return 0;
> -
> +    }
>      case XENVER_commandline:

... this.

> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>  }
>  
>  #endif /* CONFIG_X86 */
> +
> +#include <public/version.h>
> +static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> +{
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +    switch ( op )
> +    {
> +    case XENVER_version:
> +    case XENVER_platform_parameters:
> +    case XENVER_get_features:
> +        /* The sub-ops ignores the permission check and returns data. */

ignore ... and return ...

With those minor things addressed I think the patch can have my ack.

Jan



* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-17 16:08   ` Ian Jackson
@ 2016-03-21 12:04     ` George Dunlap
  2016-03-21 13:26       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: George Dunlap @ 2016-03-21 12:04 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Keir Fraser, Andrew Cooper, Tim Deegan, mpohlack, Ross Lagerwall,
	Jan Beulich, xen-devel, sasha.levin

On Thu, Mar 17, 2016 at 4:08 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Konrad Rzeszutek Wilk writes ("[PATCH v4 08/34] vmap: Make the while loop less fishy."):
>>   error:
>> -    while ( i-- )
>> -        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
>> +    while ( i )
>> +        free_domheap_page(mfn_to_page(mfn_x(mfn[--i])));
>
> I quite strongly dislike this.  It is good practice to keep the loop
> control code together where this is reasonably convenient.
>
> I wouldn't quibble on such a stylistic matter (particularly outside my
> bailiwick) but (a) I would like to reinforce Jan's position and
> (b) it seems worth writing an email as there will be many occurrences.

Since we're talking about general principles (and I've been referred
here from a similar discussion elsewhere [1]), let me weigh in as
well.

I can see the point of not wanting the decrement to be in the middle
of the expression here.  But I also entirely agree with Konrad's
assessment that this code is likely to be confusing; and the fact that
a computer program following a list of rules *developed by
professional bug-finders* is confused by this kind of semantics I
think supports this assessment.  At the very least it has the potential to
waste a lot of mental energy figuring out why code that looks wrong
isn't wrong; and at worst there's a risk that at some point someone
will "fix" it incorrectly.

The fact that there are already many instances of this pattern in the
source tree would be relevant if we expect nobody but people currently
familiar with the code to ever try to read or modify it.  But since
on the contrary we hope that others will contribute to the codebase,
and even that they may eventually become maintainers, I think there is
sense in addressing them, at least as they come up.

In my case I've suggested adding a comment to clue people into the
fact that the postfix semantics are in operation; I think that
balances "reducing cognitive load" with "avoids unnecessarily verbose
code".

Other options would be things like this:

do {
 i--;
 [cleanup]
} while ( i > 0 );

or

while ( i > 0 ) {
 i--;
 [cleanup]
}

The first one I think is the clearest, but neither one is very concise.

 -George

[1] marc.info/?i=<56EBC5E102000078000DE376@prv-mh.provo.novell.com>
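
For reference, the loop forms under discussion can be compared side by side in a stand-alone sketch. The helper names and the freed[] array are hypothetical; marking a slot stands in for free_domheap_page(). The do/while variant is left out on purpose, because as written it would run its body once even when i is 0, so it seems to be equivalent only when at least one page was allocated.

```c
#include <assert.h>

/* Each helper "frees" slots i-1 .. 0 and returns how many it touched. */

/* Original form: test and decrement together in the loop condition. */
static unsigned int unwind_postfix(unsigned int i, int *freed)
{
    unsigned int calls = 0;

    while ( i-- )
    {
        freed[i] = 1;
        calls++;
    }
    return calls;
}

/* Form from the patch: decrement inside the array index. */
static unsigned int unwind_prefix(unsigned int i, int *freed)
{
    unsigned int calls = 0;

    while ( i )
    {
        freed[--i] = 1;
        calls++;
    }
    return calls;
}

/* Explicit form: decrement as its own statement. */
static unsigned int unwind_explicit(unsigned int i, int *freed)
{
    unsigned int calls = 0;

    while ( i > 0 )
    {
        i--;
        freed[i] = 1;
        calls++;
    }
    return calls;
}
```

All three visit the same slots in the same order, including the case where i is 0 and nothing is freed.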


* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-18 19:22           ` Konrad Rzeszutek Wilk
@ 2016-03-21 12:45             ` Jan Beulich
  2016-03-22 15:52               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-21 12:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

>>> On 18.03.16 at 20:22, <konrad.wilk@oracle.com> wrote:
>> > + * return the number of bytes requested for the operation. Or an
>> > + * negative value if an error is encountered.
>> > + */
>> > +
>> > +typedef uint64_t xen_version_op_val_t;
>> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
>> > +
>> > +typedef void xen_version_op_buf_t;
>> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
>> 
>> Are these actually useful for anything? And for the various strings,
> 
> The xen_version_op_val_t is definitly used by the toolstack.
> 
>> wouldn't a "char" handle be more natural?
> 
> Heh. It was char[] before but Andrew liked it as void.

But that was because you used it for non string types too,
wasn't it?

> @@ -380,6 +388,133 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      return -ENOSYS;
>  }
>  
> +static const char *capabilities_info(unsigned int *len)
> +{
> +    static xen_capabilities_info_t __read_mostly cached_cap;
> +    static unsigned int __read_mostly cached_cap_len;
> +    static bool_t cached;
> +
> +    if ( unlikely(!cached) )
> +    {
> +        arch_get_xen_caps(&cached_cap);
> +        cached_cap_len = strlen(cached_cap) + 1;
> +        cached = 1;
> +    }

I'm sorry for noticing this only now, but without any locking this is
unsafe: x86's arch_get_xen_caps() using safe_strcat() to fill the
buffer, simultaneous invocations would possibly produce garbled
output to all (i.e. also subsequently started) guests. Either use a
real lock here, or make the guard a tristate one, which gets
transitioned e.g. from 0 to -1 by the first one coming here (doing
the initialization), with everyone else waiting for it to become +1
(to which the initializing party sets it once it is done).
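
A minimal sketch of the tristate guard described above, assuming C11 atomics rather than Xen's own cmpxchg() and cpu_relax() primitives. fill_caps(), fill_count, and the buffer size are hypothetical stand-ins (fill_caps() for arch_get_xen_caps()); this is an illustration of the idea, not the code that should go into the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

static char cached_cap[64];
static unsigned int cached_cap_len;
static atomic_int cap_state;    /* 0: untouched, -1: being filled, 1: ready */
static unsigned int fill_count; /* how many times the buffer was filled */

/* Hypothetical stand-in for arch_get_xen_caps(). */
static void fill_caps(char *buf, size_t n)
{
    strncpy(buf, "xen-3.0-x86_64", n - 1);
    buf[n - 1] = '\0';
    fill_count++;
}

static const char *capabilities_info(unsigned int *len)
{
    int expected = 0;

    /* Only the caller that moves the guard from 0 to -1 initializes. */
    if ( atomic_compare_exchange_strong(&cap_state, &expected, -1) )
    {
        fill_caps(cached_cap, sizeof(cached_cap));
        cached_cap_len = strlen(cached_cap) + 1;
        atomic_store(&cap_state, 1);    /* publish the finished buffer */
    }
    else
        while ( atomic_load(&cap_state) != 1 )
            ;   /* wait for the initializing party to finish */

    *len = cached_cap_len;
    return cached_cap;
}
```

Later callers see the guard at +1, fail the compare-exchange, and fall straight through the wait loop without touching the buffer.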

> +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> +               unsigned int len)
> +{
> +    union {
> +        xen_version_op_val_t val;
> +        xen_feature_info_t fi;
> +    } u = {};
> +    unsigned int sz = 0;
> +    const void *ptr = NULL;
> +    int rc = xsm_version_op(XSM_OTHER, cmd);
> +
> +    /* We can safely return -EPERM! */
> +    if ( rc )
> +        return rc;
> +
> +    /*
> +     * The HYPERVISOR_xen_version differs in that some return the value,
> +     * and some copy it on back on argument. We follow the same rule for all
> +     * sub-ops: return 0 on success, positive value of bytes returned, and
> +     * always copy the result in arg. Yeey sanity!
> +     */
> +    switch ( cmd )
> +    {
> +    case XEN_VERSION_version:
> +        sz = sizeof(xen_version_op_val_t);
> +        u.val = (xen_major_version() << 16) | xen_minor_version();
> +        break;
> +
> +    case XEN_VERSION_extraversion:
> +        sz = strlen(xen_extra_version()) + 1;
> +        ptr = xen_extra_version();
> +        break;
> +
> +    case XEN_VERSION_capabilities:
> +        ptr = capabilities_info(&sz);
> +        break;
> +
> +    case XEN_VERSION_changeset:
> +        sz = strlen(xen_changeset()) + 1;
> +        ptr = xen_changeset();
> +        break;
> +
> +    case XEN_VERSION_platform_parameters:
> +        sz = sizeof(xen_version_op_val_t);
> +        u.val = HYPERVISOR_VIRT_START;
> +        break;
> +
> +    case XEN_VERSION_get_features:
> +        if ( copy_from_guest(&u.fi, arg, 1) )

Afaict this is incompatible with the null handle check further down (i.e.
you also need to check for a null handle here).

> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -128,6 +128,9 @@
>   *    * VCPUOP_register_vcpu_info
>   *    * VCPUOP_register_runstate_memory_area
>   *
> + *  HYPERVISOR_version_op
> + *   All generic sub-operations
> + *
>   *
>   * Other notes on the ARM ABI:

I don't think the extra almost blank line is warranted here.

> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -30,7 +30,15 @@
>  
>  #include "xen.h"
>  
> -/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
> +/*
> + * There are two hypercalls mentioned in here. The XENVER_ are for
> + * HYPERCALL_xen_version (17), while VERSION_ are for the
> + * HYPERCALL_version_op (41).
> + *
> + * The subops are very similar except that the later hypercall has a
> + * sane interface.
> + */
> +
>  
>  /* arg == NULL; returns major:minor (16:16). */

Nor is the extra blank one here.

> @@ -87,6 +95,66 @@ typedef struct xen_feature_info xen_feature_info_t;
>  #define XENVER_commandline 9
>  typedef char xen_commandline_t[1024];
>  
> +
> +
> +/*
> + * The HYPERCALL_version_op has a set of sub-ops which mirror the

And three consecutive blank lines are too much in any event. (If
for no other reason than because that provides extremely bad
patch context if a later change happened right next to these three
lines.)

> +/*
> + * arg == char.
> + *
> + * The toolstack fills it out for guest consumption. It is intended to hold
> + * the UUID of the guest.
> + */
> +#define XEN_VERSION_guest_handle        8

So this is the place where I agree with Andrew char is not an
appropriate type. A void or uint8 handle seems like what you
want here.

> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
>          return xsm_default_action(XSM_PRIV, current->domain, NULL);
>      }
>  }
> +
> +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> +{
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +    switch ( op )
> +    {
> +    case XEN_VERSION_version:
> +    case XEN_VERSION_extraversion:
> +    case XEN_VERSION_capabilities:
> +    case XEN_VERSION_platform_parameters:
> +    case XEN_VERSION_get_features:
> +    case XEN_VERSION_pagesize:
> +    case XEN_VERSION_guest_handle:
> +        /* These MUST always be accessible to any guest by default. */
> +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
> +    default:
> +        return xsm_default_action(XSM_PRIV, current->domain, NULL);

Considering that we seem to have settled on some exceptions here
for the change adding XSM check to the legacy version op, do you
really think going with no exception at all here is the right approach?
Because if we do, that'll prevent guests getting fully converted over
to the new interface. Of course, we could also make this conversion
specifically a non-goal, and omit e.g. XEN_VERSION_version from
this new interface.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-18 19:59     ` Konrad Rzeszutek Wilk
@ 2016-03-21 12:49       ` Jan Beulich
  2016-03-22 15:39         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-21 12:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

>>> On 18.03.16 at 20:59, <konrad.wilk@oracle.com> wrote:
> I know I copied and pasted it and I must have done something uncanny.
> 
> Anyhow this is what the change looks like now (I've retained the Reviewed
> and Ack as I think this change is mostly cosmetical in nature?)

I think that's okay.

> v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
>     BUGFRAME_NR properly.

Almost, that is.

> --- a/xen/include/asm-x86/bug.h
> +++ b/xen/include/asm-x86/bug.h
> @@ -10,6 +10,7 @@
>  #define BUGFRAME_bug    2
>  #define BUGFRAME_assert 3
>  
> +#define BUGFRAME_NR     4
>  #ifndef __ASSEMBLY__

The insertion wants to go _before_ the blank line. (And in the
ARM case you then may consider removing the preceding blank
line too; in any event the ARM and x86 ones should look similar
in the end.)

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-21 12:04     ` George Dunlap
@ 2016-03-21 13:26       ` Jan Beulich
  2016-03-21 14:22         ` George Dunlap
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-21 13:26 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Andrew Cooper, Ian Jackson, Tim Deegan, mpohlack,
	Ross Lagerwall, sasha.levin, xen-devel

>>> On 21.03.16 at 13:04, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 17, 2016 at 4:08 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>> Konrad Rzeszutek Wilk writes ("[PATCH v4 08/34] vmap: Make the while loop less fishy."):
>>>   error:
>>> -    while ( i-- )
>>> -        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
>>> +    while ( i )
>>> +        free_domheap_page(mfn_to_page(mfn_x(mfn[--i])));
>>
>> I quite strongly dislike this.  It is good practice to keep the loop
>> control code together where this is reasonably convenient.
>>
>> I wouldn't quibble on such a stylistic matter (particularly outside my
>> bailiwick) but (a) I would like to reinforce Jan's position and
>> (b) it seems worth writing an email as there will be many occurrences.
> 
> Since we're talking about general principle (and I've been referred to
> here from a similar discussion elsewhere [1]), let me weigh in as
> well.
> 
> I can see the point of not wanting the decrement to be in the middle
> of the expression here.  But I also entirely agree with Konrad's
> assessment that this code is likely to be confusing; and the fact that
> a computer program following a list of rules *developed by
> professional bug-finders* is confused by this kind of semantics I
> think supports this assessment.  At very least it has the potential to
> waste a lot of mental energy figuring out why code that looks wrong
> isn't wrong; and at worst there's a risk that at some point someone
> will "fix" it incorrectly.
> 
> The fact that there are already many instances of this pattern in the
> source tree would be relevant if we expect nobody but people currently
> familiar with the code to ever try to read or modify it.  But since
> on the contrary we hope that others will contribute to the codebase,
> and even that they may eventually become maintainers, I think there is
> sense in addressing them, at least as they come up.

Well, if talk was about something really complex here, I might
agree. But unary prefix and postfix operators are an integral
part of the C language, and while code reading indeed shouldn't
require overly much mental energy, I think we should be
permitted to make full knowledge of the base programming
language a prereq to reading our code. Otherwise - where do
you want to draw the boundary of what is permitted and what
is not? (Yes, I know I'm guilty in occasionally writing rather
complex expressions, with at times not immediately obvious side
effects, and I'm trying to do better irrespective of all such also
falling in the above "basic language" features category. That's
because I can see doing so being past the boundary of
reasonably understandable code.)

> In my case I've suggested adding a comment to clue people into the
> fact that the postfix semantics are in operation; I think that
> balances "reducing cognitive load" with "avoids unnecessarily verbose
> code".
> 
> Other options would be things like this:
> 
> do {
>  i--;
>  [cleanup]
> } while ( i > 0 );
> 
> or
> 
> while ( i > 0 ) {
>  i--;
>  [cleanup]
> }
> 
> The first one I think is the clearest, but neither one are very concise.

But you realize that the first (but not the second) one is wrong for
the i == 0 case?
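
The difference can be checked with a small sketch. The helpers below
are hypothetical: instead of freeing pages they only record how many
iterations ran and whether an index below zero was touched, and they
use a signed counter so that the i == 0 case stays observable.

```c
/* What an unwind loop did: iteration count, and whether index -1 was hit. */
struct trace {
    int iterations;
    int touched_negative;
};

static void visit(struct trace *t, int index)
{
    t->iterations++;
    if ( index < 0 )
        t->touched_negative = 1;
}

/* Original form: test i, then decrement, then run the body. */
static struct trace unwind_postfix(int i)
{
    struct trace t = { 0, 0 };

    while ( i-- )
        visit(&t, i);
    return t;
}

/* Second proposal: explicit test, decrement inside the body. */
static struct trace unwind_while(int i)
{
    struct trace t = { 0, 0 };

    while ( i > 0 )
    {
        i--;
        visit(&t, i);
    }
    return t;
}

/* First proposal: the body runs once even when nothing was allocated. */
static struct trace unwind_do_while(int i)
{
    struct trace t = { 0, 0 };

    do {
        i--;
        visit(&t, i);
    } while ( i > 0 );
    return t;
}
```

For i == 3 all three variants visit indexes 2, 1 and 0. For i == 0 the
first two do nothing, while the do/while form runs its body once with
index -1.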

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-21 13:26       ` Jan Beulich
@ 2016-03-21 14:22         ` George Dunlap
  2016-03-21 15:05           ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: George Dunlap @ 2016-03-21 14:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, Andrew Cooper, Ian Jackson, Tim Deegan, mpohlack,
	Ross Lagerwall, xen-devel, sasha.levin

On Mon, Mar 21, 2016 at 1:26 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 21.03.16 at 13:04, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Mar 17, 2016 at 4:08 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>>> Konrad Rzeszutek Wilk writes ("[PATCH v4 08/34] vmap: Make the while loop less fishy."):
>>>>   error:
>>>> -    while ( i-- )
>>>> -        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
>>>> +    while ( i )
>>>> +        free_domheap_page(mfn_to_page(mfn_x(mfn[--i])));
>>>
>>> I quite strongly dislike this.  It is good practice to keep the loop
>>> control code together where this is reasonably convenient.
>>>
>>> I wouldn't quibble on such a stylistic matter (particularly outside my
>>> bailiwick) but (a) I would like to reinforce Jan's position and
>>> (b) it seems worth writing an email as there will be many occurrences.
>>
>> Since we're talking about general principle (and I've been referred to
>> here from a similar discussion elsewhere [1]), let me weigh in as
>> well.
>>
>> I can see the point of not wanting the decrement to be in the middle
>> of the expression here.  But I also entirely agree with Konrad's
>> assessment that this code is likely to be confusing; and the fact that
>> a computer program following a list of rules *developed by
>> professional bug-finders* is confused by this kind of semantics I
>> think supports this assessment.  At very least it has the potential to
>> waste a lot of mental energy figuring out why code that looks wrong
>> isn't wrong; and at worst there's a risk that at some point someone
>> will "fix" it incorrectly.
>>
>> The fact that there are already many instances of this pattern in the
>> source tree would be relevant if we expect nobody but people currently
>> familiar with the code to ever try to read or modify it.  But since
>> on the contrary we hope that others will contribute to the codebase,
>> and even that they may eventually become maintainers, I think there is
>> sense in addressing them, at least as they come up.
>
> Well, if talk was about something really complex here, I might
> agree. But unary prefix and postfix operators are an integral
> part of the C language, and while code reading indeed shouldn't
> require overly much mental energy, I think we should be
> permitted to make full knowledge of the base programming
> language a prereq to reading our code. Otherwise - where do
> you want to draw the boundary of what is permitted and what
> is not? (Yes, I know I'm guilty in occasionally writing rather
> complex expressions, with at times not immediately obvious side
> effects, and I'm trying to do better irrespective of all such also
> falling in the above "basic language" features category. That's
> because I can see doing so being past the boundary of
> reasonably understandable code.)

And I'm not saying that we can't take advantage of postfix operators
in this case; I'm just saying that if we are going to, it would be
better to point it out in a comment.

There are lots of "features" of C that standard practice still demands
that we add a comment for; the default fall-through for switch
statements comes to mind.  Of course programmers should be required to
know that switch statements default to fallthrough in C; but it's
still a common mistake to forget that, and so we point them in the
right direction by adding comments.

Similarly, of course programmers should know the difference between
prefix and postfix operators.  But this is a case where there's a risk
of tripping over something, so it makes sense to point them in the
right direction by adding comments.
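
As an illustration of the two commenting conventions being compared,
here is a short sketch. The enum, the flag values and release_all() are
made up for the example and are not taken from the Xen tree.

```c
enum level { LVL_DEBUG, LVL_INFO, LVL_ERROR };

/* Each level enables its own flag plus those of all lower levels. */
static int flags_for(enum level l)
{
    int flags = 0;

    switch ( l )
    {
    case LVL_ERROR:
        flags |= 4;
        /* Fall through. */
    case LVL_INFO:
        flags |= 2;
        /* Fall through. */
    case LVL_DEBUG:
        flags |= 1;
        break;
    }

    return flags;
}

/* Clears slots [0, i) in reverse order and returns how many were cleared. */
static unsigned int release_all(int *slots, unsigned int i)
{
    unsigned int released = 0;

    /* Postfix decrement: i is tested first, then decremented before the body. */
    while ( i-- )
    {
        slots[i] = 0;
        released++;
    }

    return released;
}
```

In both functions the comment changes nothing about the generated
code; it only tells the reader that the behaviour is intentional.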

Or to take a different tack: I understand that you think there's
no particular benefit to adding a comment in cases like this; could
you explain to me why you think it would have a significant cost?

>> do {
>>  i--;
>>  [cleanup]
>> } while ( i > 0 );
>>
>
> But you realize that the first (but not the second) one is wrong for
> the i == 0 case?

So it is.  I think I was thinking of a case where it was known that at
least one iteration had succeeded; but obviously the second is more
safe in general.

 -George


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 08/34] vmap: Make the while loop less fishy.
  2016-03-21 14:22         ` George Dunlap
@ 2016-03-21 15:05           ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-21 15:05 UTC (permalink / raw)
  To: George Dunlap
  Cc: Keir Fraser, Andrew Cooper, Ian Jackson, Tim Deegan, mpohlack,
	Ross Lagerwall, sasha.levin, xen-devel

>>> On 21.03.16 at 15:22, <George.Dunlap@eu.citrix.com> wrote:
> Or to take a different tack: I understand that you think there's
> no particular benefit to adding a comment in cases like this; could
> you explain to me why you think it would have a significant cost?

There's no significant cost here. Yet I do think that commenting the
obvious is not really helpful (or else we end up with more comments
than there is actual code; some may consider this a good thing, but
I'm of the opinion that this would only serve to obfuscate the code).
Plus - as said in various other contexts - I'm in favor of consistency,
i.e. I would think that if these constructs warrant a comment, all
of them should get one. The fall through example you gave for
switch statements is actually a good one: There the comments
silence Coverity. If comments here had the same effect, I think I'd
have no reservations against adding such, yet I don't think they
do.

In the end it boils down to: I'm not going to ack any such change
myself, but I'm also not going to stand in the way of other
maintainers ack-ing such comments getting added. What I would
object to though is calling such constructs "fishy" in commit titles,
commit messages, or the comments being added.

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-21 12:49       ` Jan Beulich
@ 2016-03-22 15:39         ` Konrad Rzeszutek Wilk
  2016-03-22 15:58           ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 15:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

On Mon, Mar 21, 2016 at 06:49:03AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 20:59, <konrad.wilk@oracle.com> wrote:
> > I know I copied and pasted it and I must have done something uncanny.
> > 
> > Anyhow this is what the change looks like now (I've retained the Reviewed
> > and Ack as I think this change is mostly cosmetical in nature?)
> 
> I think that's okay.
> 
> > v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
> >     BUGFRAME_NR properly.
> 
> Almost, that is.
> 
> > --- a/xen/include/asm-x86/bug.h
> > +++ b/xen/include/asm-x86/bug.h
> > @@ -10,6 +10,7 @@
> >  #define BUGFRAME_bug    2
> >  #define BUGFRAME_assert 3
> >  
> > +#define BUGFRAME_NR     4
> >  #ifndef __ASSEMBLY__
> 
> The insertion wants to go _before_ the blank line. (And in the
> ARM case you then may consider removing the preceding blank
> line too; in any event the ARM and x86 ones should look similar
> in the end.)
> 

Here it is. Last call :-)

From f97548200461b9eb4d8187eb9e1f021c74160759 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 10 Mar 2016 16:45:31 -0500
Subject: [PATCH] x86/arm: Add BUGFRAME_NR define and BUILD checks.

So that we have a nice mechanism to figure out the upper
bounds of bug.frames and also catch compiler errors in case
one tries to use a higher frame number.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v3: First time included.
v4: Add BUG_FRAME check also in the assembler version of the macro.
v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
    BUGFRAME_NR properly. Reposition the BUGFRAME_NR again.
---
---
 xen/include/asm-arm/bug.h | 3 +++
 xen/include/asm-x86/bug.h | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/xen/include/asm-arm/bug.h b/xen/include/asm-arm/bug.h
index ab9e811..68353e1 100644
--- a/xen/include/asm-arm/bug.h
+++ b/xen/include/asm-arm/bug.h
@@ -32,6 +32,8 @@ struct bug_frame {
 #define BUGFRAME_bug    1
 #define BUGFRAME_assert 2
 
+#define BUGFRAME_NR     3
+
 /* Many versions of GCC doesn't support the asm %c parameter which would
  * be preferable to this unpleasantness. We use mergeable string
  * sections to avoid multiple copies of the string appearing in the
@@ -39,6 +41,7 @@ struct bug_frame {
  */
 #define BUG_FRAME(type, line, file, has_msg, msg) do {                      \
     BUILD_BUG_ON((line) >> 16);                                             \
+    BUILD_BUG_ON((type) >= BUGFRAME_NR);                                    \
     asm ("1:"BUG_INSTR"\n"                                                  \
          ".pushsection .rodata.str, \"aMS\", %progbits, 1\n"                \
          "2:\t.asciz " __stringify(file) "\n"                               \
diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h
index e868e85..c5d2d4c 100644
--- a/xen/include/asm-x86/bug.h
+++ b/xen/include/asm-x86/bug.h
@@ -10,6 +10,8 @@
 #define BUGFRAME_bug    2
 #define BUGFRAME_assert 3
 
+#define BUGFRAME_NR     4
+
 #ifndef __ASSEMBLY__
 
 struct bug_frame {
@@ -51,6 +53,7 @@ struct bug_frame {
 
 #define BUG_FRAME(type, line, ptr, second_frame, msg) do {                   \
     BUILD_BUG_ON((line) >> (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH));         \
+    BUILD_BUG_ON((type) >= BUGFRAME_NR);                                     \
     asm volatile ( _ASM_BUGFRAME_TEXT(second_frame)                          \
                    :: _ASM_BUGFRAME_INFO(type, line, ptr, msg) );            \
 } while (0)
@@ -83,6 +86,11 @@ extern const struct bug_frame __start_bug_frames[],
  * in .rodata
  */
     .macro BUG_FRAME type, line, file_str, second_frame, msg
+
+    .if \type >= BUGFRAME_NR
+        .error "Invalid BUGFRAME index"
+    .endif
+
     .L\@ud: ud2a
 
     .pushsection .rodata.str1, "aMS", @progbits, 1
-- 
2.5.0
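
For readers unfamiliar with the pattern, here is a standalone sketch of
the compile-time bound check the patch adds. BUILD_BUG_ON below is a
simplified stand-in (Xen's real macro is defined differently), and
CHECKED_FRAME_TYPE() is a hypothetical helper that exists only for this
example; the BUGFRAME_* values are the x86 ones from the hunk above.

```c
#define BUGFRAME_bug    2
#define BUGFRAME_assert 3
#define BUGFRAME_NR     4

/*
 * Simplified stand-in for BUILD_BUG_ON: when cond is a true constant the
 * array size becomes negative and the translation unit fails to compile.
 */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical helper: yields type, but only compiles for valid frame types. */
#define CHECKED_FRAME_TYPE(type) \
    (BUILD_BUG_ON((type) >= BUGFRAME_NR), (type))

static int frame_type_bug(void)
{
    return CHECKED_FRAME_TYPE(BUGFRAME_bug);
}

/* CHECKED_FRAME_TYPE(4) would be rejected at build time. */
```

The check costs nothing at run time, because the sizeof operand is
never evaluated.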



^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-21 12:45             ` Jan Beulich
@ 2016-03-22 15:52               ` Konrad Rzeszutek Wilk
  2016-03-22 16:06                 ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 15:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

On Mon, Mar 21, 2016 at 06:45:28AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 20:22, <konrad.wilk@oracle.com> wrote:
> >> > + * return the number of bytes requested for the operation. Or an
> >> > + * negative value if an error is encountered.
> >> > + */
> >> > +
> >> > +typedef uint64_t xen_version_op_val_t;
> >> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
> >> > +
> >> > +typedef void xen_version_op_buf_t;
> >> > +DEFINE_XEN_GUEST_HANDLE(xen_version_op_buf_t);
> >> 
> >> Are these actually useful for anything? And for the various strings,
> > 
> > The xen_version_op_val_t is definitely used by the toolstack.
> > 
> >> wouldn't a "char" handle be more natural?
> > 
> > Heh. It was char[] before but Andrew liked it as void.
> 
> But that was because you used it for non string types too,
> wasn't it?

Yes. For the build-id which is a binary blob. And as you noticed  - also
the domain handle which can be anything.

> 
> > @@ -380,6 +388,133 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >      return -ENOSYS;
> >  }
> >  
> > +static const char *capabilities_info(unsigned int *len)
> > +{
> > +    static xen_capabilities_info_t __read_mostly cached_cap;
> > +    static unsigned int __read_mostly cached_cap_len;
> > +    static bool_t cached;
> > +
> > +    if ( unlikely(!cached) )
> > +    {
> > +        arch_get_xen_caps(&cached_cap);
> > +        cached_cap_len = strlen(cached_cap) + 1;
> > +        cached = 1;
> > +    }
> 
> I'm sorry for noticing this only now, but without any locking this is
> unsafe: x86's arch_get_xen_caps() using safe_strcat() to fill the
> buffer, simultaneous invocations would possibly produce garbled
> output to all (i.e. also subsequently started) guests. Either use a
> real lock here, or make the guard a tristate one, which gets
> transitioned e.g. from 0 to -1 by the first one coming here (doing
> the initialization), with everyone else waiting for it to become +1
> (to which the initializing party sets it once it is done).

That would indeed be bad.

What if an _init_ code called this to 'pre-cache' it?

> 
> > +DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
> > +               unsigned int len)
> > +{
> > +    union {
> > +        xen_version_op_val_t val;
> > +        xen_feature_info_t fi;
> > +    } u = {};
> > +    unsigned int sz = 0;
> > +    const void *ptr = NULL;
> > +    int rc = xsm_version_op(XSM_OTHER, cmd);
> > +
> > +    /* We can safely return -EPERM! */
> > +    if ( rc )
> > +        return rc;
> > +
> > +    /*
> > +     * The HYPERVISOR_xen_version differs in that some return the value,
> > +     * and some copy it on back on argument. We follow the same rule for all
> > +     * sub-ops: return 0 on success, positive value of bytes returned, and
> > +     * always copy the result in arg. Yeey sanity!
> > +     */
> > +    switch ( cmd )
> > +    {
> > +    case XEN_VERSION_version:
> > +        sz = sizeof(xen_version_op_val_t);
> > +        u.val = (xen_major_version() << 16) | xen_minor_version();
> > +        break;
> > +
> > +    case XEN_VERSION_extraversion:
> > +        sz = strlen(xen_extra_version()) + 1;
> > +        ptr = xen_extra_version();
> > +        break;
> > +
> > +    case XEN_VERSION_capabilities:
> > +        ptr = capabilities_info(&sz);
> > +        break;
> > +
> > +    case XEN_VERSION_changeset:
> > +        sz = strlen(xen_changeset()) + 1;
> > +        ptr = xen_changeset();
> > +        break;
> > +
> > +    case XEN_VERSION_platform_parameters:
> > +        sz = sizeof(xen_version_op_val_t);
> > +        u.val = HYPERVISOR_VIRT_START;
> > +        break;
> > +
> > +    case XEN_VERSION_get_features:
> > +        if ( copy_from_guest(&u.fi, arg, 1) )
> 
> Afaict this is incompatible with the null handle check further down (i.e.
> you also need to check for a null handle here).

Oh my. Indeed.

> 
> > --- a/xen/include/public/arch-arm.h
> > +++ b/xen/include/public/arch-arm.h
> > @@ -128,6 +128,9 @@
> >   *    * VCPUOP_register_vcpu_info
> >   *    * VCPUOP_register_runstate_memory_area
> >   *
> > + *  HYPERVISOR_version_op
> > + *   All generic sub-operations
> > + *
> >   *
> >   * Other notes on the ARM ABI:
> 
> I don't think the extra almost blank line is warranted here.
> 
> > --- a/xen/include/public/version.h
> > +++ b/xen/include/public/version.h
> > @@ -30,7 +30,15 @@
> >  
> >  #include "xen.h"
> >  
> > -/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
> > +/*
> > + * There are two hypercalls mentioned in here. The XENVER_ are for
> > + * HYPERCALL_xen_version (17), while VERSION_ are for the
> > + * HYPERCALL_version_op (41).
> > + *
> > + * The subops are very similar except that the later hypercall has a
> > + * sane interface.
> > + */
> > +
> >  
> >  /* arg == NULL; returns major:minor (16:16). */
> 
> Nor is the extra blank one here.
> 
> > @@ -87,6 +95,66 @@ typedef struct xen_feature_info xen_feature_info_t;
> >  #define XENVER_commandline 9
> >  typedef char xen_commandline_t[1024];
> >  
> > +
> > +
> > +/*
> > + * The HYPERCALL_version_op has a set of sub-ops which mirror the
> 
> And three consecutive blank lines are too much in any event. (If
> for no other reason that because that provides extremely bad
> patch context if a later change happened right next to these three
> lines.)
> 
> > +/*
> > + * arg == char.
> > + *
> > + * The toolstack fills it out for guest consumption. It is intended to hold
> > + * the UUID of the guest.
> > + */
> > +#define XEN_VERSION_guest_handle        8
> 
> So this is the place where I agree with Andrew char is not an
> appropriate type. A void or uint8 handle seems like what you
> want here.

/me nods.
> 
> > --- a/xen/include/xsm/dummy.h
> > +++ b/xen/include/xsm/dummy.h
> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> >          return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >      }
> >  }
> > +
> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> > +{
> > +    XSM_ASSERT_ACTION(XSM_OTHER);
> > +    switch ( op )
> > +    {
> > +    case XEN_VERSION_version:
> > +    case XEN_VERSION_extraversion:
> > +    case XEN_VERSION_capabilities:
> > +    case XEN_VERSION_platform_parameters:
> > +    case XEN_VERSION_get_features:
> > +    case XEN_VERSION_pagesize:
> > +    case XEN_VERSION_guest_handle:
> > +        /* These MUST always be accessible to any guest by default. */
> > +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
> > +    default:
> > +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
> 
> Considering that we seem to have settled on some exceptions here
> for the change adding XSM check to the legacy version op, do you
> really think going with no exception at all here is the right approach?

> Because if we do, that'll prevent guests getting fully converted over
> to the new interface. Of course, we could also make this conversion
> specifically a non-goal, and omit e.g. XEN_VERSION_version from
> this new interface.

No no. I think conversion is the right long-term goal.

However the nice thing about this hypercall is that it can return -EPERM.

Making it always return a value for XEN_VERSION_version,
XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
there are some exceptions to this "can return -EPERM" as they will
be guaranteed a positive return value. They can ignore the -EPERM
case.

And it means that guests can still take shortcuts.

I agree with you that guests need these hypercalls but at the same
time I am not sure what can be done so they don't fall flat on their
faces if they are presented with -EPERM. The Linux xenbus_init can be
modified to deal with this returning -EPERM where it makes some assumptions.

But in all likelihood it is the bad assumption!
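
To make the -EPERM concern concrete, here is a sketch of a guest-side
caller that handles a denial explicitly instead of assuming success.
version_op() is a hypothetical stand-in for the hypercall, the sub-op
number 0 and the reported version 4.7 are assumptions, and deny_version
simulates an XSM policy that refuses the request.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define XEN_VERSION_version 0   /* assumed sub-op number, for illustration */

static int deny_version;        /* non-zero: simulate a policy denial */

/* Hypothetical hypercall stub: returns bytes copied, or a negative errno. */
static int version_op(unsigned int cmd, void *arg, unsigned int len)
{
    uint64_t val = (4u << 16) | 7u;     /* major:minor packed as 16:16 */

    if ( cmd != XEN_VERSION_version )
        return -ENOSYS;
    if ( deny_version )
        return -EPERM;
    if ( len < sizeof(val) )
        return -ENOBUFS;

    memcpy(arg, &val, sizeof(val));
    return sizeof(val);
}

/* Returns 0 and fills major/minor, or a negative errno with both zeroed. */
static int guest_xen_version(unsigned int *major, unsigned int *minor)
{
    uint64_t val = 0;
    int rc = version_op(XEN_VERSION_version, &val, sizeof(val));

    if ( rc < 0 )
    {
        /* Denied or unsupported: report "unknown" rather than guessing. */
        *major = *minor = 0;
        return rc;
    }

    *major = (unsigned int)(val >> 16);
    *minor = (unsigned int)(val & 0xffff);
    return 0;
}
```

A caller can then choose its own fallback when it sees -EPERM, which
is the decision the paragraph above says a guest would otherwise skip.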

Andrew, what do you think?


> 
> Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks.
  2016-03-22 15:39         ` Konrad Rzeszutek Wilk
@ 2016-03-22 15:58           ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-22 15:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

>>> On 22.03.16 at 16:39, <konrad.wilk@oracle.com> wrote:
> On Mon, Mar 21, 2016 at 06:49:03AM -0600, Jan Beulich wrote:
>> >>> On 18.03.16 at 20:59, <konrad.wilk@oracle.com> wrote:
>> > I know I copied and pasted it and I must have done something uncanny.
>> > 
>> > Anyhow this is what the change looks like now (I've retained the Reviewed
>> > and Ack as I think this change is mostly cosmetical in nature?)
>> 
>> I think that's okay.
>> 
>> > v5: Add Acks, make BUILD_BUG_ON checks look correct. Position the
>> >     BUGFRAME_NR properly.
>> 
>> Almost, that is.
>> 
>> > --- a/xen/include/asm-x86/bug.h
>> > +++ b/xen/include/asm-x86/bug.h
>> > @@ -10,6 +10,7 @@
>> >  #define BUGFRAME_bug    2
>> >  #define BUGFRAME_assert 3
>> >  
>> > +#define BUGFRAME_NR     4
>> >  #ifndef __ASSEMBLY__
>> 
>> The insertion wants to go _before_ the blank line. (And in the
>> ARM case you then may consider removing the preceding blank
>> line too; in any event the ARM and x86 ones should look similar
>> in the end.)
>> 
> 
> Here it is. Last call :-)

Thanks, looks fine now.

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-22 15:52               ` Konrad Rzeszutek Wilk
@ 2016-03-22 16:06                 ` Jan Beulich
  2016-03-22 18:57                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-22 16:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

>>> On 22.03.16 at 16:52, <konrad.wilk@oracle.com> wrote:
> On Mon, Mar 21, 2016 at 06:45:28AM -0600, Jan Beulich wrote:
>> >>> On 18.03.16 at 20:22, <konrad.wilk@oracle.com> wrote:
>> > @@ -380,6 +388,133 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> >      return -ENOSYS;
>> >  }
>> >  
>> > +static const char *capabilities_info(unsigned int *len)
>> > +{
>> > +    static xen_capabilities_info_t __read_mostly cached_cap;
>> > +    static unsigned int __read_mostly cached_cap_len;
>> > +    static bool_t cached;
>> > +
>> > +    if ( unlikely(!cached) )
>> > +    {
>> > +        arch_get_xen_caps(&cached_cap);
>> > +        cached_cap_len = strlen(cached_cap) + 1;
>> > +        cached = 1;
>> > +    }
>> 
>> I'm sorry for noticing this only now, but without any locking this is
>> unsafe: x86's arch_get_xen_caps() using safe_strcat() to fill the
>> buffer, simultaneous invocations would possibly produce garbled
>> output to all (i.e. also subsequently started) guests. Either use a
>> real lock here, or make the guard a tristate one, which gets
>> transitioned e.g. from 0 to -1 by the first one coming here (doing
>> the initialization), with everyone else waiting for it to become +1
>> (to which the initializing party sets it once it is done).
> 
> That would indeed be bad.
> 
> What if an _init_ code called this to 'pre-cache' it?

That's one of the options you have.
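
For reference, the tristate guard described above could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses C11 atomics rather than Xen's own cmpxchg helpers, and fill_caps() is a hypothetical stand-in for arch_get_xen_caps(). The first caller moves the guard from 0 to -1 and fills the buffer; everyone else waits until the guard becomes +1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for arch_get_xen_caps(); not the Xen function. */
static void fill_caps(char *buf, size_t size)
{
    strncpy(buf, "xen-3.0-x86_64 hvm-3.0-x86_64", size - 1);
    buf[size - 1] = '\0';
}

/* Guard states: 0 = untouched, -1 = initialisation in progress, +1 = ready. */
static atomic_int cap_state;
static char cached_cap[1024];
static unsigned int cached_cap_len;

const char *capabilities_info(unsigned int *len)
{
    int expected = 0;

    assert(len != NULL);

    if ( atomic_compare_exchange_strong(&cap_state, &expected, -1) )
    {
        /* We won the 0 -> -1 transition, so we alone fill the buffer. */
        fill_caps(cached_cap, sizeof(cached_cap));
        cached_cap_len = strlen(cached_cap) + 1;
        /* Publish the result; the default seq_cst store orders the writes. */
        atomic_store(&cap_state, 1);
    }
    else
    {
        /* Somebody else is (or was) initialising: wait for +1. */
        while ( atomic_load(&cap_state) != 1 )
            ; /* a hypervisor would relax the CPU here */
    }

    *len = cached_cap_len;
    return cached_cap;
}
```

Pre-caching from init code avoids the spin entirely, since initialisation then finishes before any guest can issue the hypercall.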

>> > --- a/xen/include/xsm/dummy.h
>> > +++ b/xen/include/xsm/dummy.h
>> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
>> >          return xsm_default_action(XSM_PRIV, current->domain, NULL);
>> >      }
>> >  }
>> > +
>> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
>> > +{
>> > +    XSM_ASSERT_ACTION(XSM_OTHER);
>> > +    switch ( op )
>> > +    {
>> > +    case XEN_VERSION_version:
>> > +    case XEN_VERSION_extraversion:
>> > +    case XEN_VERSION_capabilities:
>> > +    case XEN_VERSION_platform_parameters:
>> > +    case XEN_VERSION_get_features:
>> > +    case XEN_VERSION_pagesize:
>> > +    case XEN_VERSION_guest_handle:
>> > +        /* These MUST always be accessible to any guest by default. */
>> > +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
>> > +    default:
>> > +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
>> 
>> Considering that we seem to have settled on some exceptions here
>> for the change adding XSM check to the legacy version op, do you
>> really think going with no exception at all here is the right approach?
> 
>> Because if we do, that'll prevent guests getting fully converted over
>> to the new interface. Of course, we could also make this conversion
>> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
>> this new interface.
> 
> No no. I think conversion is the right long-term goal.
> 
> However the nice thing about this hypercall is that it can return -EPERM.
> 
> Making it always return a value for XEN_VERSION_version,
> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> there are some exceptions to this "can return -EPERM" as they will
> be guaranteed a positive return value. They can ignore the -EPERM
> case.
> 
> And it means that guests can still take shortcuts.
> 
> I agree with you that guests need these hypercalls but at the same
> time I am not sure what can be done so they don't fall flat on their
> faces if they are presented with -EPERM. The Linux xenbus_init can be
> modified to deal with this returning -EPERM where it makes some assumptions.
> 
> But in all likelihood it is the bad assumption!

I'm afraid I can't conclude what you mean to say with all of the
above.

Jan



^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-21 11:22       ` Jan Beulich
@ 2016-03-22 16:10         ` Konrad Rzeszutek Wilk
  2016-03-22 17:54           ` Daniel De Graaf
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 16:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

On Mon, Mar 21, 2016 at 05:22:09AM -0600, Jan Beulich wrote:
> >>> On 18.03.16 at 18:26, <konrad.wilk@oracle.com> wrote:
> > On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:
> >> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> >> > @@ -223,12 +224,15 @@ void __init do_initcalls(void)
> >> >  /*
> >> >   * Simple hypercalls.
> >> >   */
> >> > -
> >> >  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >> 
> >> Please retain the blank line, as it relates to more than just this
> >> one function.
> > 
> > Done! (stray change).
> 
> Considering this I'm not puzzled by ...
> 
> >      case XENVER_guest_handle:
> > -        if ( copy_to_guest(arg, current->domain->handle,
> > -                           ARRAY_SIZE(current->domain->handle)) )
> > +    {
> > +        xen_domain_handle_t hdl;
> > +
> > +        if ( deny )
> > +            memset(&hdl, 0, ARRAY_SIZE(hdl));
> > +
> > +        BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
> > +
> > +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
> > +                           ARRAY_SIZE(hdl) ) )
> >              return -EFAULT;
> >          return 0;
> > -
> > +    }
> >      case XENVER_commandline:
> 
> ... this.

Wow. That is some sharp eyes!
> 
> > --- a/xen/include/xsm/dummy.h
> > +++ b/xen/include/xsm/dummy.h
> > @@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
> >  }
> >  
> >  #endif /* CONFIG_X86 */
> > +
> > +#include <public/version.h>
> > +static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> > +{
> > +    XSM_ASSERT_ACTION(XSM_OTHER);
> > +    switch ( op )
> > +    {
> > +    case XENVER_version:
> > +    case XENVER_platform_parameters:
> > +    case XENVER_get_features:
> > +        /* The sub-ops ignores the permission check and returns data. */
> 
> ignore ... and return ...
> 
> With those minor things addressed I think the patch can have my ack.

Thank you!

Now I just need Daniel's Ack again.

From 1ccf59abdd2cd9228f0159dce77fe404d98c7300 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 11 Mar 2016 21:40:43 -0500
Subject: [PATCH] xsm/xen_version: Add XSM for most of xen_version hypercall

Most of the XENVER_* sub-ops now have an XSM check.

The XENVER_commandline sub-op is now a privileged operation.
To avoid breaking guests we still return a string - but it is
just '<denied>\0'.

The XENVER_[version|platform_parameters|get_features] sub-ops
will always return a value to the guest.

The rest: XENVER_[extraversion|capabilities|pagesize|
guest_handle|changeset|compile_info] behave as before -
allowed by default for all guests when using the default XSM
policy or the dummy one. If the system admin wants to curtail
access to some of them, they can now do that with a
non-default XSM policy.

Also we add a local variable block.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Do XSM check for all the XENVER_ ops.
 - Add empty data conditions.
 - Return <denied> for priv subops.
 - Move extraversion from priv to normal. Drop the XSM check
    for the non-priv subops.
v3:
 - Add +1 for strlen(xen_deny()) to include the NUL. Move changeset,
    compile_info to non-priv subops.
 - Remove the \0 on xen_deny()
 - Add new XSM domain for xenver hypercall. Add all subops to it.
 - Remove the extra line, Add Ack from Daniel
v4:
 - Rename the XSM from xen_version_op to xsm_xen_version.
   Prefix the types with 'xen' to distinguish it from another
   hypercall performing similar operation. Removed Ack from Daniel
   as it was so large. Add local variable block.
v5:
 - Make XENVER_platform_parameters,get_features,version be excluded
   from the XSM check per Jan's review. Add BUILD_BUG_ON and fix
   odd line removals. Remove stray changes and fix spelling.
---
 tools/flask/policy/policy/modules/xen/xen.te | 14 ++++++++++
 xen/common/kernel.c                          | 42 +++++++++++++++++++++-------
 xen/common/version.c                         | 15 ++++++++++
 xen/include/xen/version.h                    |  1 +
 xen/include/xsm/dummy.h                      | 24 ++++++++++++++++
 xen/include/xsm/xsm.h                        |  6 ++++
 xen/xsm/dummy.c                              |  1 +
 xen/xsm/flask/hooks.c                        | 39 ++++++++++++++++++++++++++
 xen/xsm/flask/policy/access_vectors          | 25 +++++++++++++++++
 xen/xsm/flask/policy/security_classes        |  1 +
 10 files changed, 158 insertions(+), 10 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..18f49b5 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -73,6 +73,14 @@ allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
 };
+
+# Allow dom0 to use all XENVER_ subops that have checks.
+# Note that dom0 is part of domain_type so this has duplicates.
+allow dom0_t xen_t:version {
+    xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset xen_pagesize xen_guest_handle xen_commandline
+};
+
 allow dom0_t xen_t:mmu memorymap;
 
 # Allow dom0 to use these domctls on itself. For domctls acting on other
@@ -137,6 +145,12 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
+# For normal guests allow all sub-ops except XENVER_commandline.
+allow domain_type xen_t:version {
+    xen_extraversion xen_compile_info xen_capabilities
+    xen_changeset  xen_pagesize xen_guest_handle
+};
+
 ###############################################################################
 #
 # Domain creation
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0618da2..a4a3c36 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -13,6 +13,7 @@
 #include <xen/nmi.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xsm/xsm.h>
 #include <asm/current.h>
 #include <public/nmi.h>
 #include <public/version.h>
@@ -226,6 +227,8 @@ void __init do_initcalls(void)
 
 DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
+    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
+
     switch ( cmd )
     {
     case XENVER_version:
@@ -236,7 +239,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_extraversion_t extraversion;
 
         memset(extraversion, 0, sizeof(extraversion));
-        safe_strcpy(extraversion, xen_extra_version());
+        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
         if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
             return -EFAULT;
         return 0;
@@ -247,10 +250,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_compile_info_t info;
 
         memset(&info, 0, sizeof(info));
-        safe_strcpy(info.compiler,       xen_compiler());
-        safe_strcpy(info.compile_by,     xen_compile_by());
-        safe_strcpy(info.compile_domain, xen_compile_domain());
-        safe_strcpy(info.compile_date,   xen_compile_date());
+        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
+        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
+        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
+        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
         if ( copy_to_guest(arg, &info, 1) )
             return -EFAULT;
         return 0;
@@ -261,7 +264,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_capabilities_info_t info;
 
         memset(info, 0, sizeof(info));
-        arch_get_xen_caps(&info);
+        if ( !deny )
+            arch_get_xen_caps(&info);
 
         if ( copy_to_guest(arg, info, ARRAY_SIZE(info)) )
             return -EFAULT;
@@ -285,7 +289,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         xen_changeset_info_t chgset;
 
         memset(chgset, 0, sizeof(chgset));
-        safe_strcpy(chgset, xen_changeset());
+        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
         if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
             return -EFAULT;
         return 0;
@@ -342,19 +346,37 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     }
 
     case XENVER_pagesize:
+        if ( deny )
+            return 0;
         return (!guest_handle_is_null(arg) ? -EINVAL : PAGE_SIZE);
 
     case XENVER_guest_handle:
-        if ( copy_to_guest(arg, current->domain->handle,
-                           ARRAY_SIZE(current->domain->handle)) )
+    {
+        xen_domain_handle_t hdl;
+
+        if ( deny )
+            memset(&hdl, 0, ARRAY_SIZE(hdl));
+
+        BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
+
+        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
+                           ARRAY_SIZE(hdl) ) )
             return -EFAULT;
         return 0;
+    }
 
     case XENVER_commandline:
-        if ( copy_to_guest(arg, saved_cmdline, ARRAY_SIZE(saved_cmdline)) )
+    {
+        size_t len = ARRAY_SIZE(saved_cmdline);
+
+        if ( deny )
+            len = strlen(xen_deny()) + 1;
+
+        if ( copy_to_guest(arg, deny ? xen_deny() : saved_cmdline, len) )
             return -EFAULT;
         return 0;
     }
+    }
 
     return -ENOSYS;
 }
diff --git a/xen/common/version.c b/xen/common/version.c
index b152e27..fc9bf42 100644
--- a/xen/common/version.c
+++ b/xen/common/version.c
@@ -55,3 +55,18 @@ const char *xen_banner(void)
 {
     return XEN_BANNER;
 }
+
+const char *xen_deny(void)
+{
+    return "<denied>";
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 81a3c7d..2015c0b 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -12,5 +12,6 @@ unsigned int xen_minor_version(void);
 const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
+const char *xen_deny(void);
 
 #endif /* __XEN_VERSION_H__ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 1d13826..abbe282 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
 }
 
 #endif /* CONFIG_X86 */
+
+#include <public/version.h>
+static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+        /* These sub-ops ignore the permission checks and return data. */
+        return 0;
+    case XENVER_extraversion:
+    case XENVER_compile_info:
+    case XENVER_capabilities:
+    case XENVER_changeset:
+    case XENVER_pagesize:
+    case XENVER_guest_handle:
+        /* These MUST always be accessible to any guest by default. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 3afed70..5ecbee0 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -193,6 +193,7 @@ struct xsm_operations {
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
+    int (*xen_version) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -731,6 +732,11 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
 
 #endif /* CONFIG_X86 */
 
+static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->xen_version(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 0f32636..9791ad4 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -162,4 +162,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
 #endif
+    set_to_dummy_if_null(ops, xen_version);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 4813623..2069cb3 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -26,6 +26,7 @@
 #include <public/xen.h>
 #include <public/physdev.h>
 #include <public/platform.h>
+#include <public/version.h>
 
 #include <public/xsm/flask_op.h>
 
@@ -1620,6 +1621,43 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
 }
 #endif /* CONFIG_X86 */
 
+static int flask_xen_version (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XENVER_version:
+    case XENVER_platform_parameters:
+    case XENVER_get_features:
+        /* These sub-ops ignore the permission checks and return data. */
+        return 0;
+    case XENVER_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_EXTRAVERSION, NULL);
+    case XENVER_compile_info:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMPILE_INFO, NULL);
+    case XENVER_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CAPABILITIES, NULL);
+    case XENVER_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_CHANGESET, NULL);
+    case XENVER_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_PAGESIZE, NULL);
+    case XENVER_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_GUEST_HANDLE, NULL);
+    case XENVER_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__XEN_COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1758,6 +1796,7 @@ static struct xsm_operations flask_ops = {
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
 #endif
+    .xen_version = flask_xen_version,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..badcf1c 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -495,3 +495,28 @@ class security
 # remove ocontext label definitions for resources
     del_ocontext
 }
+
+# Class version is used to describe the XENVER_ hypercall.
+# Almost all sub-ops are described here - in the default case all of them should
+# be allowed except the XENVER_commandline.
+#
+# The ones that are omitted are XENVER_version, XENVER_platform_parameters,
+# and XENVER_get_features  - as they MUST always be returned to a guest.
+#
+class version
+{
+# Extra information (-unstable).
+    xen_extraversion
+# Compile information of the hypervisor.
+    xen_compile_info
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    xen_capabilities
+# Source code changeset.
+    xen_changeset
+# Page size the hypervisor uses.
+    xen_pagesize
+# A value that the control stack can choose.
+    xen_guest_handle
+# Xen command line.
+    xen_commandline
+}
diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
index ca191db..cde4e1a 100644
--- a/xen/xsm/flask/policy/security_classes
+++ b/xen/xsm/flask/policy/security_classes
@@ -18,5 +18,6 @@ class shadow
 class event
 class grant
 class security
+class version
 
 # FLASK
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 16/34] xsplice: Implement payload loading
  2016-03-15 17:56 ` [PATCH v4 16/34] xsplice: Implement payload loading Konrad Rzeszutek Wilk
@ 2016-03-22 17:25   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 17:25 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, Andrew Cooper, Martin Pohlack, Ross Lagerwall,
	Julien Grall, Stefano Stabellini, Jan Beulich, sasha.levin,
	Xen-devel

.. snip..
> +static void* xsplice_map_rx(const mfn_t *mfn, unsigned int pages)
> +{
> +    unsigned long cur;
> +    unsigned long start, end;
> +
> +    start = (unsigned long)avail_virt_start;
> +    end = start + pages * PAGE_SIZE;
> +
> +    ASSERT(find_space_fnc);
> +
> +    if ( (find_space_fnc)(pages, &start, &end) )
> +        return NULL;
> +
> +    if ( end >= avail_virt_end )
> +        return NULL;
> +
> +    for ( cur = start; pages--; ++mfn, cur += PAGE_SIZE )
> +    {
> +        /*
> +         * We would like to map RX, but we need to copy data in it first.
> +         * See arch_xsplice_secure for how we lock down.
> +         */
> +        if ( map_pages_to_xen(start, mfn_x(*mfn), 1, PAGE_HYPERVISOR_RWX) )

s/start/cur/
That sneaked in; I can see that earlier versions had the right iterator.
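
To make the effect of the wrong iterator concrete, here is a small stand-alone sketch. record_mapping() is a hypothetical recorder standing in for map_pages_to_xen(), and map_range() is likewise only illustrative; the loop itself has the same shape as the one in the patch, with 'cur' as the address being mapped.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define MAX_PAGES 8

/* Hypothetical recorder standing in for map_pages_to_xen(). */
static unsigned long mapped_virt[MAX_PAGES];
static unsigned long mapped_mfn[MAX_PAGES];
static unsigned int nr_mapped;

static int record_mapping(unsigned long virt, unsigned long mfn)
{
    if ( nr_mapped >= MAX_PAGES )
        return -1;
    mapped_virt[nr_mapped] = virt;
    mapped_mfn[nr_mapped] = mfn;
    nr_mapped++;
    return 0;
}

/* Map 'pages' frames at consecutive virtual addresses starting at 'start'. */
static int map_range(unsigned long start, const unsigned long *mfn,
                     unsigned int pages)
{
    unsigned long cur;

    assert(mfn != NULL);

    nr_mapped = 0;
    for ( cur = start; pages--; ++mfn, cur += PAGE_SIZE )
    {
        /* Passing 'start' here would map every frame at the same address. */
        if ( record_mapping(cur, *mfn) )
            return -1;
    }
    return 0;
}
```

With 'start' in place of 'cur', each iteration would overwrite the mapping of the first page and the remaining virtual range would stay unmapped.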


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
  2016-03-18 11:55   ` Jan Beulich
@ 2016-03-22 17:49   ` Daniel De Graaf
  2016-03-24 15:34   ` anshul makkar
  2 siblings, 0 replies; 124+ messages in thread
From: Daniel De Graaf @ 2016-03-22 17:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Ian Jackson, Stefano Stabellini

On 03/15/2016 01:56 PM, Konrad Rzeszutek Wilk wrote:
> All of the XENVER_* sub-ops now have an XSM check.
>
> The XENVER_commandline sub-op is now a privileged operation.
> To avoid breaking guests we still return a string - but it is
> just '<denied>\0'.
>
> The rest: XENVER_[version|extraversion|capabilities|
> parameters|get_features|page_size|guest_handle|changeset|
> compile_info] behave as before - allowed by default for all
> guests if using the XSM default policy or with the dummy one.
>
> The admin can choose to change the sub-ops to be denied
> as they see fit.
>
> Also we add a local variable block.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-15 17:56 ` [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane Konrad Rzeszutek Wilk
  2016-03-15 18:29   ` Andrew Cooper
@ 2016-03-22 17:51   ` Daniel De Graaf
  1 sibling, 0 replies; 124+ messages in thread
From: Daniel De Graaf @ 2016-03-22 17:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, ross.lagerwall, konrad,
	andrew.cooper3, mpohlack, sasha.levin
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Julien Grall,
	Stefano Stabellini, Jan Beulich, Keir Fraser

On 03/15/2016 01:56 PM, Konrad Rzeszutek Wilk wrote:
> This hypercall mirrors XENVER_ in that it has similar functionality.
> However it is designed differently:
>   - No compat layer. The data structures are the same size on 32-bit
>     as on 64-bit.
>   - The hypercall accepts three arguments - the command, a pointer to
>     a buffer, and the length of the buffer.
>   - Each sub-op can be "probed" for size: if the buffer is NULL, it
>     returns the size of the buffer that will be needed.
>   - Sub-ops can complete even if the buffer is too small - truncated
>     data will be filled in and the hypercall will return -ENOBUFS.
>   - VERSION_OP_commandline, VERSION_OP_changeset are privileged.
>   - There is no XENVER_compile_info equivalent.
>   - The hypercall can return -EPERM and toolstacks/OSes are expected
>     to deal with it.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
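
The probe-then-fetch calling convention from the commit message above can be modelled in plain C as follows. Everything named demo_* is hypothetical (the sub-op number, the data, and the function names are not taken from the Xen headers), and returning the number of bytes written on success is an assumption; only the NULL-buffer probe and the truncate-and-return -ENOBUFS behaviour come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sub-op number and data; not taken from the Xen headers. */
#define DEMO_OP_extraversion 1
static const char demo_extraversion[] = "-unstable";

/*
 * Hypothetical model of the described semantics:
 *  - NULL buffer: return the size a caller needs,
 *  - buffer too small: copy what fits and return -ENOBUFS,
 *  - otherwise: copy everything and return the number of bytes written
 *    (assumed; the description does not spell out the success value).
 */
static long demo_version_op(unsigned int cmd, void *buf, size_t len)
{
    size_t need = sizeof(demo_extraversion);

    if ( cmd != DEMO_OP_extraversion )
        return -ENOSYS;
    if ( buf == NULL )
        return (long)need;
    if ( len < need )
    {
        memcpy(buf, demo_extraversion, len);
        return -ENOBUFS;
    }
    memcpy(buf, demo_extraversion, need);
    return (long)need;
}

/* Caller side: probe first, then fetch into a large enough buffer. */
static long demo_fetch(char *out, size_t out_len)
{
    long need;

    assert(out != NULL);

    need = demo_version_op(DEMO_OP_extraversion, NULL, 0);
    if ( need < 0 )
        return need;        /* e.g. -EPERM or -ENOSYS: the caller must cope */
    if ( (size_t)need > out_len )
        return -ENOBUFS;
    return demo_version_op(DEMO_OP_extraversion, out, out_len);
}
```

A caller that cannot tolerate truncation probes with a NULL buffer first; one that can may pass a fixed buffer and treat -ENOBUFS as "data present but cut short".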


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-22 16:10         ` Konrad Rzeszutek Wilk
@ 2016-03-22 17:54           ` Daniel De Graaf
  0 siblings, 0 replies; 124+ messages in thread
From: Daniel De Graaf @ 2016-03-22 17:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, sasha.levin

On 03/22/2016 12:10 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 21, 2016 at 05:22:09AM -0600, Jan Beulich wrote:
>>>>> On 18.03.16 at 18:26, <konrad.wilk@oracle.com> wrote:
>>> On Fri, Mar 18, 2016 at 05:55:55AM -0600, Jan Beulich wrote:
>>>>>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
>>>>> @@ -223,12 +224,15 @@ void __init do_initcalls(void)
>>>>>   /*
>>>>>    * Simple hypercalls.
>>>>>    */
>>>>> -
>>>>>   DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>
>>>> Please retain the blank line, as it relates to more than just this
>>>> one function.
>>>
>>> Done! (stray change).
>>
>> Considering this I'm not puzzled by ...
>>
>>>       case XENVER_guest_handle:
>>> -        if ( copy_to_guest(arg, current->domain->handle,
>>> -                           ARRAY_SIZE(current->domain->handle)) )
>>> +    {
>>> +        xen_domain_handle_t hdl;
>>> +
>>> +        if ( deny )
>>> +            memset(&hdl, 0, ARRAY_SIZE(hdl));
>>> +
>>> +        BUILD_BUG_ON(ARRAY_SIZE(current->domain->handle) != ARRAY_SIZE(hdl));
>>> +
>>> +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle,
>>> +                           ARRAY_SIZE(hdl) ) )
>>>               return -EFAULT;
>>>           return 0;
>>> -
>>> +    }
>>>       case XENVER_commandline:
>>
>> ... this.
>
> Wow. That is some sharp eyes!
>>
>>> --- a/xen/include/xsm/dummy.h
>>> +++ b/xen/include/xsm/dummy.h
>>> @@ -727,3 +727,27 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>>>   }
>>>
>>>   #endif /* CONFIG_X86 */
>>> +
>>> +#include <public/version.h>
>>> +static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
>>> +{
>>> +    XSM_ASSERT_ACTION(XSM_OTHER);
>>> +    switch ( op )
>>> +    {
>>> +    case XENVER_version:
>>> +    case XENVER_platform_parameters:
>>> +    case XENVER_get_features:
>>> +        /* The sub-ops ignores the permission check and returns data. */
>>
>> ignore ... and return ...
>>
>> With those minor things addressed I think the patch can have my ack.
>
> Thank you!
>
> Now I just need Daniel's Ack again.
>
> From 1ccf59abdd2cd9228f0159dce77fe404d98c7300 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Date: Fri, 11 Mar 2016 21:40:43 -0500
> Subject: [PATCH] xsm/xen_version: Add XSM for most of xen_version hypercall
>
> Most of the XENVER_* sub-ops now have an XSM check.
>
> The XENVER_commandline sub-op is now a privileged operation.
> To avoid breaking guests we still return a string - but it is
> just '<denied>\0'.
>
> The XENVER_[version|platform_parameters|get_features] sub-ops
> will always return a value to the guest.
>
> The rest: XENVER_[extraversion|capabilities|pagesize|
> guest_handle|changeset|compile_info] behave as before -
> allowed by default for all guests when using the default XSM
> policy or the dummy one. If the system admin wants to curtail
> access to some of them, they can now do that with a
> non-default XSM policy.
>
> Also we add a local variable block.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

Replied to the wrong email before; this one is actually:

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-22 16:06                 ` Jan Beulich
@ 2016-03-22 18:57                   ` Konrad Rzeszutek Wilk
  2016-03-22 19:28                     ` Andrew Cooper
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 18:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

> >> > --- a/xen/include/xsm/dummy.h
> >> > +++ b/xen/include/xsm/dummy.h
> >> > @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> >> >          return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >> >      }
> >> >  }
> >> > +
> >> > +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> >> > +{
> >> > +    XSM_ASSERT_ACTION(XSM_OTHER);
> >> > +    switch ( op )
> >> > +    {
> >> > +    case XEN_VERSION_version:
> >> > +    case XEN_VERSION_extraversion:
> >> > +    case XEN_VERSION_capabilities:
> >> > +    case XEN_VERSION_platform_parameters:
> >> > +    case XEN_VERSION_get_features:
> >> > +    case XEN_VERSION_pagesize:
> >> > +    case XEN_VERSION_guest_handle:
> >> > +        /* These MUST always be accessible to any guest by default. */
> >> > +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
> >> > +    default:
> >> > +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >> 
> >> Considering that we seem to have settled on some exceptions here
> >> for the change adding XSM check to the legacy version op, do you
> >> really think going with no exception at all here is the right approach?
> > 
> >> Because if we do, that'll prevent guests getting fully converted over
> >> to the new interface. Of course, we could also make this conversion
> >> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
> >> this new interface.
> > 
> > No no. I think convesion is the right long-term goal. 
> > 
> > However the nice thing about this hypercall is that it can return -EPERM.
> > 
> > Making it always return an value for XEN_VERSION_version,
> > XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> > there are some exceptions to this "can return -EPERM" as they will
> > be guaranteed an postive return value. They can ignore the -EPERM
> > case.
> > 
> > And means that guest can still take shortcuts.
> > 
> > I agree with you that guests need these hypercalls but at the same
> > time I am not sure what can be done so they don't fall flat on their
> > faces if they are presented with -EPERM. The Linux xenbus_init can be
> > modified to deal with this returning -EPERM where it makes some assumptions.
> > 
> > But in all likelihood it is the bad assumption!
> 
> I'm afraid I can't conclude what you mean to say with all of the
> above.

That I am waffling.

Andrew, what is your opinion?
> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-22 18:57                   ` Konrad Rzeszutek Wilk
@ 2016-03-22 19:28                     ` Andrew Cooper
  2016-03-22 20:39                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Andrew Cooper @ 2016-03-22 19:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, mpohlack,
	ross.lagerwall, Julien Grall, Stefano Stabellini, xen-devel,
	Daniel De Graaf, Keir Fraser, sasha.levin

On 22/03/16 18:57, Konrad Rzeszutek Wilk wrote:
>>>>> --- a/xen/include/xsm/dummy.h
>>>>> +++ b/xen/include/xsm/dummy.h
>>>>> @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
>>>>>          return xsm_default_action(XSM_PRIV, current->domain, NULL);
>>>>>      }
>>>>>  }
>>>>> +
>>>>> +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
>>>>> +{
>>>>> +    XSM_ASSERT_ACTION(XSM_OTHER);
>>>>> +    switch ( op )
>>>>> +    {
>>>>> +    case XEN_VERSION_version:
>>>>> +    case XEN_VERSION_extraversion:
>>>>> +    case XEN_VERSION_capabilities:
>>>>> +    case XEN_VERSION_platform_parameters:
>>>>> +    case XEN_VERSION_get_features:
>>>>> +    case XEN_VERSION_pagesize:
>>>>> +    case XEN_VERSION_guest_handle:
>>>>> +        /* These MUST always be accessible to any guest by default. */
>>>>> +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
>>>>> +    default:
>>>>> +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
>>>> Considering that we seem to have settled on some exceptions here
>>>> for the change adding XSM check to the legacy version op, do you
>>>> really think going with no exception at all here is the right approach?
>>>> Because if we do, that'll prevent guests getting fully converted over
>>>> to the new interface. Of course, we could also make this conversion
>>>> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
>>>> this new interface.
>>> No no. I think conversion is the right long-term goal.
>>>
>>> However the nice thing about this hypercall is that it can return -EPERM.
>>>
>>> Making it always return a value for XEN_VERSION_version,
>>> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
>>> there are some exceptions to this "can return -EPERM" as they will
>>> be guaranteed a positive return value. They can ignore the -EPERM
>>> case.
>>>
>>> And means that guest can still take shortcuts.
>>>
>>> I agree with you that guests need these hypercalls but at the same
>>> time I am not sure what can be done so they don't fall flat on their
>>> faces if they are presented with -EPERM. The Linux xenbus_init can be
>>> modified to deal with this returning -EPERM where it makes some assumptions.
>>>
>>> But in all likelihood it is the bad assumption!
>> I'm afraid I can't conclude what you mean to say with all of the
>> above.
> That I am waffling.
>
> Andrew, what is your opinion?

Nothing good can come from failing a XEN_VERSION_version hypercall. 
There are a number of easy ways for a guest to infer such information.

XEN_VERSION_platform_parameters is only useful for 32bit PV guests, and
the toolstack.  Given that it is returning a fixed number in the ABI,
nothing good can come of failing this either.

get_features can effectively be failed for permission reasons by
returning 0.  As such, explicitly failing with -EPERM is similarly
pointless.
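
A minimal sketch of the policy this implies, for illustration only. It
is not code from the patch: the sketch_* names and the sub-op numbering
are made up, and policy_allows stands in for whatever the XSM check
would decide.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sub-op numbers, for illustration only. */
enum {
    SKETCH_VERSION_version,
    SKETCH_VERSION_platform_parameters,
    SKETCH_VERSION_get_features,
    SKETCH_VERSION_commandline,
};

/* Returns 0 if the caller may proceed, -EPERM otherwise. */
static int sketch_version_check(uint32_t op, bool policy_allows)
{
    switch ( op )
    {
    case SKETCH_VERSION_version:
    case SKETCH_VERSION_platform_parameters:
    case SKETCH_VERSION_get_features:
        /* Never fail these, whatever the policy says. */
        return 0;

    default:
        return policy_allows ? 0 : -EPERM;
    }
}
```

With this shape, only the remaining sub-ops ever see -EPERM.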

~Andrew

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-18 13:07   ` Jan Beulich
@ 2016-03-22 20:18     ` Konrad Rzeszutek Wilk
  2016-03-23  8:19       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 20:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

.. snip..
> > +struct virtual_region kernel_text = {
> 
> static
> 
> > +    .list = LIST_HEAD_INIT(kernel_text.list),
> > +    .start = (unsigned long)_stext,
> > +    .end = (unsigned long)_etext,
> > +#ifdef CONFIG_X86
> > +    .ex = (struct exception_table_entry *)__start___ex_table,
> > +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> 
> Is this together with ...
> 
> > +/*
> > + * Becomes irrelevant when __init sections are cleared.
> > + */
> > +struct virtual_region kernel_inittext  = {
> > +    .list = LIST_HEAD_INIT(kernel_inittext.list),
> > +    .skip = ignore_if_active,
> > +    .start = (unsigned long)_sinittext,
> > +    .end = (unsigned long)_einittext,
> > +#ifdef CONFIG_X86
> > +    /* Even if they are __init their exception entry still gets stuck here. */
> > +    .ex = (struct exception_table_entry *)__start___ex_table,
> > +    .ex_end = (struct exception_table_entry *)__stop___ex_table,
> > +#endif
> 
> ... this really a good idea? I.e. are there not going to be any
> odd side effects because of that redundancy?

None. If the EIP falls within _stext and _etext then its 'ex' table
will be scanned. If it falls within _sinittext and _einittext then this
one is. Both of them end up scanning the exact same exception table
(which is kind of silly) - but there are no side effects.

It would be good to have only the __init-related exceptions in the
init-text region (and also ditch the __init exception tables once boot
has completed), but I am not exactly sure how to make the macros
automatically resolve which section each entry should go in. That is a
further TODO though.
> 
> Also note that the comment preceding this object is a single line one.
> 
> > +/*
> > + * No locking. Additions are done either at startup (when there is only
> > + * one CPU) or when all CPUs are running without IRQs.
> > + *
> > + * Deletions are a bit tricky. We MUST make sure all but one CPU
> > + * are running cpu_relax().
> > + *
> > + */
> > +LIST_HEAD(virtual_region_list);
> 
> I wonder whether this wouldn't better be static, with the iterator
> that the various parties need getting put here as an out-of-line
> function (instead of getting open coded in a couple of places).

There are three users of this list:
 * search_exception_table - which ends up calling search_one_table
    if within start->end and if region->ex is defined.
 * do_invalid_trap (x86 and ARM) - where both scan start->end.
 * symbols_lookup - where we scan start->end.

The last two could be unified in a:

struct virtual_region *search_virtual_region(unsigned long addr);

And the first one can use this and then check if region->ex is set as well.
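
Roughly what I have in mind, as a standalone sketch. It assumes
half-open [start, end) ranges and uses a simplified region_sketch type
with a plain singly linked list instead of the real struct and
list_head; all names are made up for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for struct virtual_region (illustration only). */
struct region_sketch {
    struct region_sketch *next;   /* singly linked here, not a list_head */
    unsigned long start;          /* inclusive */
    unsigned long end;            /* exclusive */
    const void *ex;               /* NULL if there is no exception table */
};

/* Return the region containing addr, or NULL if none does. */
static struct region_sketch *search_region_sketch(struct region_sketch *head,
                                                  unsigned long addr)
{
    struct region_sketch *r;

    for ( r = head; r != NULL; r = r->next )
        if ( addr >= r->start && addr < r->end )
            return r;

    return NULL;
}

/* Exception-table users additionally require ->ex to be set. */
static int region_has_extable(struct region_sketch *head, unsigned long addr)
{
    const struct region_sketch *r = search_region_sketch(head, addr);

    return r != NULL && r->ex != NULL;
}
```

The bug-frame and symbol users would only need the first function; the
exception-table user layers the ->ex check on top.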

[trying it out]

.. snip..
> > @@ -108,13 +127,17 @@ const char *symbols_lookup(unsigned long addr,
> >  {
> >      unsigned long i, low, high, mid;
> >      unsigned long symbol_end = 0;
> > +    symbols_lookup_t symbol_lookup = NULL;
> 
> Pointless initializer.

I believe we need it. That is the contents on the stack can be garbage
and the __is_active_kernel_text won't update symbol_lookup unless
it finds a match. Ah, and also the compiler is unhappy:

symbols.c: In function ‘symbols_lookup’:
symbols.c:136:8: error: ‘symbol_lookup’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
     if (symbol_lookup)
        ^
cc1: all warnings being treated as errors
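
To illustrate one way this can arise (a made-up standalone example, not
the real helper, which is not shown here): the callee writes the
out-parameter on only one of its successful paths, so the caller has to
start from NULL. All *_sketch names and the address ranges are
hypothetical.

```c
#include <stddef.h>

typedef const char *lookup_fn_sketch(unsigned long addr);

static const char *payload_lookup_sketch(unsigned long addr)
{
    (void)addr;
    return "payload_symbol";
}

/* Writes *fn only when the matching range carries its own callback. */
static int is_active_text_sketch(unsigned long addr, lookup_fn_sketch **fn)
{
    if ( addr >= 0x1000 && addr < 0x2000 )
    {
        *fn = payload_lookup_sketch;
        return 1;
    }

    if ( addr >= 0x2000 && addr < 0x3000 )
        return 1;                 /* match, but *fn is left untouched */

    return 0;
}

static const char *lookup_sketch(unsigned long addr)
{
    lookup_fn_sketch *fn = NULL;  /* needed: the callee may not write it */

    if ( !is_active_text_sketch(addr, &fn) )
        return NULL;

    if ( fn )
        return fn(addr);

    return "builtin_symbol";
}
```

Without the initializer, the second range would leave fn holding
whatever was on the stack.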

> 
> >      namebuf[KSYM_NAME_LEN] = 0;
> >      namebuf[0] = 0;
> >  
> > -    if (!is_active_kernel_text(addr))
> > +    if (!__is_active_kernel_text(addr, &symbol_lookup))
> >          return NULL;
> >  
> > +    if (symbol_lookup)
> > +        return (symbol_lookup)(addr, symbolsize, offset, namebuf);
> 
> Note that there are few coding style issues here (missing blanks,
> superfluous parentheses).

That file uses a different StyleGuide. The Linux one. Which reminds me
that __is_active_kernel_text needs to adhere to different StyleGuide.

> 
> > --- /dev/null
> > +++ b/xen/include/xen/bug_ex_symbols.h
> > @@ -0,0 +1,74 @@
> > +/*
> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> > + *
> > + */
> > +
> > +#ifndef __BUG_EX_SYMBOL_LIST__
> > +#define __BUG_EX_SYMBOL_LIST__
> > +
> > +#include <xen/config.h>
> > +#include <xen/list.h>
> > +#include <xen/symbols.h>
> > +
> > +#ifdef CONFIG_X86
> > +#include <asm/uaccess.h>
> > +#endif
> 
> Why?

Otherwise the compilation will fail on ARM as they do not have exceptions
(and no asm/uaccess.h file)
> 
> > +#include <asm/bug.h>
> > +
> > +struct virtual_region
> > +{
> > +    struct list_head list;
> > +
> > +#define CHECKING_SYMBOL         (1<<1)
> > +#define CHECKING_BUG_FRAME      (1<<2)
> > +#define CHECKING_EXCEPTION      (1<<3)
> > +    /*
> > +     * Whether to skip this region for particular searches. The flag
> > +     * can be CHECKING_[SYMBOL|BUG_FRAMES|EXCEPTION].
> > +     *
> > +     * If the function returns 1 this region will be skipped.
> > +     */
> > +    bool_t (*skip)(unsigned int flag, unsigned long priv);
> > +
> > +    unsigned long start;        /* Virtual address start. */
> > +    unsigned long end;          /* Virtual address end. */
> > +
> > +    /*
> > +     * If ->skip returns false for CHECKING_SYMBOL we will use
> > +     * 'symbols_lookup' callback to retrieve the name of the
> > +     * addr between start and end. If this is NULL the
> > +     * default lookup mechanism is used (the skip value is
> > +     * ignored).
> > +     */
> > +    symbols_lookup_t symbols_lookup;
> > +
> > +    struct {
> > +        struct bug_frame *bugs; /* The pointer to array of bug frames. */
> > +        ssize_t n_bugs;         /* The number of them. */
> > +    } frame[BUGFRAME_NR];
> > +
> > +#ifdef CONFIG_X86
> > +    struct exception_table_entry *ex;
> > +    struct exception_table_entry *ex_end;
> > +#endif
> 
> The bug frame and exception related data are kind of odd to be
> placed in a structure with this name. Would that not better be
> accessed through ...
> 
> > +    unsigned long priv;         /* To be used by above functions if needed. */
> 
> ... this by the interested parties?

This has been reworked following Andrew's suggestions and yours. Please see:


From d542ae2b6de421c5402888519d4d11a09abe8d46 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 10 Mar 2016 16:35:50 -0500
Subject: [PATCH] arm/x86: Use struct virtual_region to do bug, symbol, and
 (x86) exception tables lookup.

During execution of the hypervisor we have two regions of
executable code - _stext -> _etext, and _sinittext -> _einittext.

The latter is not needed after bootup.

We also have various built-in macros and functions to search
in between those two swaths depending on the state of the system.

That is either for bug_frames, exceptions (x86) or symbol
names for the instruction.

With xSplice in the picture - we need a mechanism for new payloads
to be searched as well for all of this.

Originally we had extra 'if (xsplice)...' but that gets
a bit tiring and does not hook up nicely.

This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol names entries without having various calls in
other sub-components in the system.

Code which wishes to participate in bug_frames and exception table
entries search has to only use two public APIs:
 - register_virtual_region
 - unregister_virtual_region

to let the core code know.

If ->symbols_lookup is not set then the default internal symbol lookup
mechanism is used.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v4: New patch.
v5:
 - Rename to virtual_region.
 - Ditch the 'skip' function.
 - Remove the _stext.
 - Use RCU lists.
 - Add a search function.
---
---
 xen/arch/arm/setup.c             |   4 ++
 xen/arch/arm/traps.c             |  39 ++++++----
 xen/arch/x86/extable.c           |  12 +++-
 xen/arch/x86/setup.c             |   6 ++
 xen/arch/x86/traps.c             |  40 ++++++-----
 xen/common/Makefile              |   1 +
 xen/common/symbols.c             |  11 ++-
 xen/common/virtual_region.c      | 151 +++++++++++++++++++++++++++++++++++++++
 xen/include/xen/symbols.h        |   9 +++
 xen/include/xen/virtual_region.h |  56 +++++++++++++++
 10 files changed, 292 insertions(+), 37 deletions(-)
 create mode 100644 xen/common/virtual_region.c
 create mode 100644 xen/include/xen/virtual_region.h

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 6d205a9..09ff1ea 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -34,6 +34,7 @@
 #include <xen/keyhandler.h>
 #include <xen/cpu.h>
 #include <xen/pfn.h>
+#include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/acpi.h>
@@ -860,6 +861,9 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     system_state = SYS_STATE_active;
 
+    /* Must be done past setting system_state. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(dom0);
 
     /* Switch on to the dynamically allocated stack for the idle vcpu
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 31d2115..97e40bb 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -31,6 +31,7 @@
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
 #include <xen/perfc.h>
+#include <xen/virtual_region.h>
 #include <public/sched.h>
 #include <public/xen.h>
 #include <asm/debugger.h>
@@ -101,6 +102,8 @@ integer_param("debug_stack_lines", debug_stack_lines);
 
 void init_traps(void)
 {
+    setup_virtual_regions();
+
     /* Setup Hyp vector base */
     WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
 
@@ -1077,27 +1080,33 @@ void do_unexpected_trap(const char *msg, struct cpu_user_regs *regs)
 
 int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     const char *prefix = "", *filename, *predicate;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = search_virtual_regions(pc);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
 
-        if ( ((vaddr_t)bug_loc(bug)) == pc )
-            break;
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( ((vaddr_t)bug_loc(b)) == pc )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-
-    if ( !stop_frames[id] )
+ found:
+    if ( !bug )
         return -ENOENT;
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..d0f1361 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -1,10 +1,12 @@
 
-#include <xen/config.h>
 #include <xen/init.h>
+#include <xen/list.h>
 #include <xen/perfc.h>
+#include <xen/rcupdate.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
 #include <asm/uaccess.h>
+#include <xen/virtual_region.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
 
@@ -80,8 +82,12 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    struct virtual_region *region = search_virtual_regions(addr);
+
+    if ( region && region->ex )
+        return search_one_table(region->ex, region->ex_end-1, addr);
+
+    return 0;
 }
 
 unsigned long
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 1876a28..20cd9b3 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -26,6 +26,7 @@
 #include <xen/pfn.h>
 #include <xen/nodemask.h>
 #include <xen/tmem_xen.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <public/version.h>
 #include <compat/platform.h>
@@ -514,6 +515,9 @@ static void noinline init_done(void)
 
     system_state = SYS_STATE_active;
 
+    /* MUST be done prior to removing .init data. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(hardware_domain);
 
     /* Zero the .init code and data. */
@@ -616,6 +620,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     smp_prepare_boot_cpu();
     sort_exception_tables();
 
+    setup_virtual_regions();
+
     /* Full exception support from here on in. */
 
     loader = (mbi->flags & MBI_LOADERNAME)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6fbb1cf..f90d6ac 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -48,6 +48,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1132,18 +1133,12 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 
 void do_invalid_op(struct cpu_user_regs *regs)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     u8 bug_insn[2];
     const char *prefix = "", *filename, *predicate, *eip = (char *)regs->eip;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        __stop_bug_frames_3,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
     DEBUGGER_trap_entry(TRAP_invalid_op, regs);
 
@@ -1160,16 +1155,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = search_virtual_regions(regs->eip);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
+
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( bug_loc(b) == eip )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-    if ( !stop_frames[id] )
-        goto die;
 
+ found:
+    if ( !bug )
+        goto die;
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
     {
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 77de27e..e43ec49 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -51,6 +51,7 @@ obj-y += time.o
 obj-y += timer.o
 obj-y += trace.o
 obj-y += version.o
+obj-y += virtual_region.o
 obj-y += vm_event.o
 obj-y += vmap.o
 obj-y += vsprintf.o
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..bba0f2e 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,7 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <xen/virtual_region.h>
 #include <public/platform.h>
 #include <xen/guest_access.h>
 
@@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
 
 bool_t is_active_kernel_text(unsigned long addr)
 {
-    return (is_kernel_text(addr) ||
-            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
+    return !!search_virtual_regions(addr);
 }
 
 const char *symbols_lookup(unsigned long addr,
@@ -108,13 +108,18 @@ const char *symbols_lookup(unsigned long addr,
 {
     unsigned long i, low, high, mid;
     unsigned long symbol_end = 0;
+    struct virtual_region *region;
 
     namebuf[KSYM_NAME_LEN] = 0;
     namebuf[0] = 0;
 
-    if (!is_active_kernel_text(addr))
+    region = search_virtual_regions(addr);
+    if (!region)
         return NULL;
 
+    if (region->symbols_lookup)
+        return region->symbols_lookup(addr, symbolsize, offset, namebuf);
+
         /* do a binary search on the sorted symbols_addresses array */
     low = 0;
     high = symbols_num_syms;
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
new file mode 100644
index 0000000..618a312
--- /dev/null
+++ b/xen/common/virtual_region.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/rcupdate.h>
+#include <xen/spinlock.h>
+#include <xen/virtual_region.h>
+
+static struct virtual_region compiled = {
+    .list = LIST_HEAD_INIT(compiled.list),
+    .start = (unsigned long)_stext,
+    .end = (unsigned long)_etext,
+#ifdef CONFIG_X86
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/* Becomes irrelevant when __init sections are cleared. */
+static struct virtual_region compiled_init __initdata = {
+    .list = LIST_HEAD_INIT(compiled_init.list),
+    .start = (unsigned long)_sinittext,
+    .end = (unsigned long)_einittext,
+#ifdef CONFIG_X86
+    /* Even if they are __init their exception entry still gets stuck here. */
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/*
+ * RCU locking. Additions are done either at startup (when there is only
+ * one CPU) or when all CPUs are running without IRQs.
+ *
+ * Deletions are a bit tricky. We do it when xSplicing (all CPUs running
+ * without IRQs) or during bootup (when clearing the init).
+ *
+ * Hence we use list_del_rcu (which sports a memory fence) and a spinlock
+ * on deletion.
+ *
+ * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
+ *
+ */
+static LIST_HEAD(virtual_region_list);
+static DEFINE_SPINLOCK(virtual_region_lock);
+
+struct virtual_region* search_virtual_regions(unsigned long addr)
+{
+    struct virtual_region *region;
+
+    list_for_each_entry_rcu( region, &virtual_region_list, list )
+    {
+        if ( addr >= region->start && addr < region->end )
+            return region;
+    }
+
+    return NULL;
+}
+
+int register_virtual_region(struct virtual_region *r)
+{
+    ASSERT(!local_irq_is_enabled());
+
+    list_add_tail_rcu(&r->list, &virtual_region_list);
+
+    return 0;
+}
+
+static void __unregister_virtual_region(struct virtual_region *r)
+{
+    unsigned long flags;
+
+    spin_lock_irqsave(&virtual_region_lock, flags);
+    list_del_rcu(&r->list);
+    spin_unlock_irqrestore(&virtual_region_lock, flags);
+    /*
+     * We do not need to invoke call_rcu.
+     *
+     * This is due to the fact that on the deletion we have made sure
+     * to use spinlocks (to guard against somebody else calling
+     * unregister_virtual_region) and list deletion spiced with a memory
+     * barrier - which will flush out the cache lines in other CPUs.
+     *
+     * That protects us from corrupting the list as the readers all
+     * use list_for_each_entry_rcu which is safe against concurrent
+     * deletions.
+     */
+}
+
+void unregister_virtual_region(struct virtual_region *r)
+{
+    /* Expected to be called from xSplice - which has IRQs disabled. */
+    ASSERT(!local_irq_is_enabled());
+
+    __unregister_virtual_region(r);
+}
+
+void unregister_init_virtual_region(void)
+{
+    BUG_ON(system_state != SYS_STATE_active);
+
+    __unregister_virtual_region(&compiled_init);
+}
+
+void __init setup_virtual_regions(void)
+{
+    ssize_t sz;
+    unsigned int i;
+    static const struct bug_frame *const stop_frames[] = {
+        __start_bug_frames,
+        __stop_bug_frames_0,
+        __stop_bug_frames_1,
+        __stop_bug_frames_2,
+#ifdef CONFIG_X86
+        __stop_bug_frames_3,
+#endif
+        NULL
+    };
+
+    /* N.B. idx != i */
+    for ( i = 1; stop_frames[i]; i++ )
+    {
+        const struct bug_frame *s;
+
+        s = stop_frames[i-1];
+        sz = stop_frames[i] - s;
+
+        compiled.frame[i-1].n_bugs = sz;
+        compiled.frame[i-1].bugs = s;
+
+        compiled_init.frame[i-1].n_bugs = sz;
+        compiled_init.frame[i-1].bugs = s;
+    }
+
+    register_virtual_region(&compiled_init);
+    register_virtual_region(&compiled);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 1fa0537..fa353f8 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -5,6 +5,15 @@
 
 #define KSYM_NAME_LEN 127
 
+/*
+ * Typedef for the callback functions that symbols_lookup
+ * can call if virtual_region_list has a callback for it.
+ */
+typedef const char *(symbols_lookup_t)(unsigned long addr,
+                                       unsigned long *symbolsize,
+                                       unsigned long *offset,
+                                       char *namebuf);
+
 /* Lookup an address. */
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
new file mode 100644
index 0000000..bf15fe9
--- /dev/null
+++ b/xen/include/xen/virtual_region.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __BUG_EX_SYMBOL_LIST__
+#define __BUG_EX_SYMBOL_LIST__
+
+#include <xen/config.h>
+#include <xen/list.h>
+#include <xen/symbols.h>
+
+#ifdef CONFIG_X86
+#include <asm/uaccess.h>
+#endif
+#include <asm/bug.h>
+
+struct virtual_region
+{
+    struct list_head list;
+    unsigned long start;        /* Virtual address start. */
+    unsigned long end;          /* Virtual address end. */
+
+    /*
+     * If this is NULL the default lookup mechanism is used.
+     */
+    symbols_lookup_t *symbols_lookup;
+
+    struct {
+        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
+        ssize_t n_bugs;         /* The number of them. */
+    } frame[BUGFRAME_NR];
+
+#ifdef CONFIG_X86
+    struct exception_table_entry *ex;
+    struct exception_table_entry *ex_end;
+#endif
+};
+
+extern struct virtual_region *search_virtual_regions(unsigned long addr);
+extern void setup_virtual_regions(void);
+extern void unregister_init_virtual_region(void);
+extern int register_virtual_region(struct virtual_region *r);
+extern void unregister_virtual_region(struct virtual_region *r);
+
+#endif /* __BUG_EX_SYMBOL_LIST__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0




^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-22 19:28                     ` Andrew Cooper
@ 2016-03-22 20:39                       ` Konrad Rzeszutek Wilk
  2016-03-23  8:56                         ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-22 20:39 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, mpohlack,
	ross.lagerwall, Julien Grall, Stefano Stabellini, Jan Beulich,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

On Tue, Mar 22, 2016 at 07:28:57PM +0000, Andrew Cooper wrote:
> On 22/03/16 18:57, Konrad Rzeszutek Wilk wrote:
> >>>>> --- a/xen/include/xsm/dummy.h
> >>>>> +++ b/xen/include/xsm/dummy.h
> >>>>> @@ -751,3 +751,22 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> >>>>>          return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >>>>>      }
> >>>>>  }
> >>>>> +
> >>>>> +static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
> >>>>> +{
> >>>>> +    XSM_ASSERT_ACTION(XSM_OTHER);
> >>>>> +    switch ( op )
> >>>>> +    {
> >>>>> +    case XEN_VERSION_version:
> >>>>> +    case XEN_VERSION_extraversion:
> >>>>> +    case XEN_VERSION_capabilities:
> >>>>> +    case XEN_VERSION_platform_parameters:
> >>>>> +    case XEN_VERSION_get_features:
> >>>>> +    case XEN_VERSION_pagesize:
> >>>>> +    case XEN_VERSION_guest_handle:
> >>>>> +        /* These MUST always be accessible to any guest by default. */
> >>>>> +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
> >>>>> +    default:
> >>>>> +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
> >>>> Considering that we seem to have settled on some exceptions here
> >>>> for the change adding XSM check to the legacy version op, do you
> >>>> really think going with no exception at all here is the right approach?
> >>>> Because if we do, that'll prevent guests getting fully converted over
> >>>> to the new interface. Of course, we could also make this conversion
> >>>> specifically a non-goal, and omit e.g. XEN_VERSION_VERSION from
> >>>> this new interface.
> >>> No no. I think conversion is the right long-term goal.
> >>>
> >>> However the nice thing about this hypercall is that it can return -EPERM.
> >>>
> >>> Making it always return a value for XEN_VERSION_version,
> >>> XEN_VERSION_platform_parameters, XEN_VERSION_get_features means that
> >>> there are some exceptions to this "can return -EPERM" as they will
> >>> be guaranteed a positive return value. They can ignore the -EPERM
> >>> case.
> >>>
> >>> And means that guest can still take shortcuts.
> >>>
> >>> I agree with you that guests need these hypercalls but at the same
> >>> time I am not sure what can be done so they don't fall flat on their
> >>> faces if they are presented with -EPERM. The Linux xenbus_init can be
> >>> modified to deal with this returning -EPERM where it makes some assumptions.
> >>>
> >>> But in all likelihood it is the bad assumption!
> >> I'm afraid I can't conclude what you mean to say with all of the
> >> above.
> > That I am waffling.
> >
> > Andrew, what is your opinion?
> 
> Nothing good can come from failing a XEN_VERSION_version hypercall. 
> There are a number of easy ways for a guest to infer such information.
> 
> XEN_VERSION_platform_parameters is only useful for 32bit PV guests, and
> the toolstack.  Given that it is returning a fixed number in the ABI,
> nothing good can come of failing this either.
> 
> get_features can effectively be failed for permission reasons by
> returning 0.  As such, explicitly failing with -EPERM is similarly
> pointless.

All right. So here it is:

From 0f82c5f4cd1537ab8555f2b94bb3fe738df69733 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 22 Mar 2016 12:03:21 -0400
Subject: [PATCH] HYPERCALL_version_op. New hypercall mirroring XENVER_ but
 sane.

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32
   as on 64-bit.
 - The hypercall accepts three arguments - the command, pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size by returning the size of
   buffer that will be needed - if the buffer is NULL.
 - Subops can complete even if the buffer is too small - truncated
   data will be filled and hypercall will return -ENOBUFS.
 - VERSION_commandline, VERSION_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM and toolstack/OSes are expected
   to deal with it. However there are three subops: XEN_VERSION_version,
   XEN_VERSION_platform_parameters and XEN_VERSION_get_features
   that will always return a value as guests cannot survive without them.
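
To make the size-probe and truncation rules listed above concrete, here is a
minimal user-space sketch. It is an illustration only: model_version_op() and
fetch_string() are hypothetical names that do not exist in Xen, and the model
covers a single string-valued sub-op rather than the real hypercall path
(no XSM check, no guest handles).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the buffer contract; NOT the Xen entry point.
 * 'value' stands in for the string a sub-op would return.
 */
static long model_version_op(const char *value, void *buf, size_t len)
{
    size_t sz = strlen(value) + 1;       /* bytes needed, including NUL */
    size_t bytes = sz < len ? sz : len;  /* never write past the buffer */

    assert(sz);                          /* every sub-op has a non-zero size */

    if ( buf == NULL )                   /* probe: report the required size */
        return (long)sz;

    memcpy(buf, value, bytes);           /* possibly truncated copy */

    return sz > len ? -ENOBUFS : (long)sz;
}

/* Caller-side pattern: probe first, then fetch into a big-enough buffer. */
static long fetch_string(const char *value, char *buf, size_t buflen)
{
    long need = model_version_op(value, NULL, 0);

    if ( need < 0 || (size_t)need > buflen )
        return -ENOBUFS;

    return model_version_op(value, buf, (size_t)need);
}
```

In this model a caller that skips the probe and passes a short buffer still
gets the leading bytes, but must treat -ENOBUFS as "data incomplete" and must
not assume the buffer is NUL terminated.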

While combining some of the common code between XENVER_ and VERSION_,
take the liberty of moving the pae_extended_cr3 check into the x86 area.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> [XSM bits]

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1-v3: Was not part of the series.
v4: New posting.
v5: Remove memset and use {}. Tweak copy_to_guest and capabilities_info,
    add ASSERT(sz) per Andrew's review. Add cached=1 back in.
    Per Jan, s/VERSION_OP/VERSION/, squash size check with do_version_op,
    update the comments. Dropped Andrew's Review-by. Ate newlines.
    Added initcall to guard against garbage being set in cached data.
---
---
 tools/flask/policy/policy/modules/xen/xen.te |   7 +-
 xen/arch/arm/traps.c                         |   1 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/x86_64/compat/entry.S           |   2 +
 xen/arch/x86/x86_64/entry.S                  |   2 +
 xen/common/compat/kernel.c                   |   2 +
 xen/common/kernel.c                          | 228 ++++++++++++++++++++++-----
 xen/include/public/arch-arm.h                |   2 +
 xen/include/public/version.h                 |  70 +++++++-
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xsm/dummy.h                      |  21 +++
 xen/include/xsm/xsm.h                        |   6 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  35 ++++
 xen/xsm/flask/policy/access_vectors          |  21 ++-
 16 files changed, 361 insertions(+), 43 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 18f49b5..b528797 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,11 +74,12 @@ allow dom0_t xen_t:xen2 {
     get_symbol
 };
 
-# Allow dom0 to use all XENVER_ subops that have checks.
+# Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle xen_commandline
+    extraversion capabilities changeset pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -145,10 +146,12 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all possible except XENVER_commandline.
+# For normal guests all possible except XENVER_commandline, VERSION_changeset,
+# and VERSION_commandline
 allow domain_type xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset  xen_pagesize xen_guest_handle
+    extraversion capabilities pagesize guest_handle
 };
 
 ###############################################################################
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 83744e8..31d2115 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1235,6 +1235,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
+    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 80d59ff..f16b590 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5322,6 +5322,7 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
+    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 33e2c12..fd25e84 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -394,6 +394,7 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -445,6 +446,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
+        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 07ef096..b0e7257 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -730,6 +730,7 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -781,6 +782,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
+        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index df93fdd..7a7ca53 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -39,6 +39,8 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
+CHECK_TYPE(version_op_val);
+
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index a4a3c36..d614067 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,6 +221,47 @@ void __init do_initcalls(void)
 
 #endif
 
+static int get_features(struct domain *d, xen_feature_info_t *fi)
+{
+    switch ( fi->submap_idx )
+    {
+    case 0:
+        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
+        if ( paging_mode_translate(d) )
+            fi->submap |=
+                (1U << XENFEAT_writable_page_tables) |
+                (1U << XENFEAT_auto_translated_physmap);
+        if ( is_hardware_domain(d) )
+            fi->submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+        if ( VM_ASSIST(d, pae_extended_cr3) )
+            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                          (1U << XENFEAT_highmem_assist) |
+                          (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_supervisor_mode_kernel) |
+                          (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_hvm_callback_vector) |
+                          (1U << XENFEAT_hvm_pirqs);
+           break;
+        }
+#endif
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * Simple hypercalls.
  */
@@ -298,47 +339,14 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        struct domain *d = current->domain;
+        int rc;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        switch ( fi.submap_idx )
-        {
-        case 0:
-            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, pae_extended_cr3) )
-                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-            if ( paging_mode_translate(d) )
-                fi.submap |= 
-                    (1U << XENFEAT_writable_page_tables) |
-                    (1U << XENFEAT_auto_translated_physmap);
-            if ( is_hardware_domain(d) )
-                fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
-            break;
-        default:
-            return -EINVAL;
-        }
+        rc = get_features(current->domain, &fi);
+        if ( rc )
+            return rc;
 
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
@@ -381,6 +389,137 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
+static const char *capabilities_info(unsigned int *len)
+{
+    static xen_capabilities_info_t __read_mostly cached_cap;
+    static unsigned int __read_mostly cached_cap_len;
+    static bool_t cached;
+
+    if ( unlikely(!cached) )
+    {
+        arch_get_xen_caps(&cached_cap);
+        cached_cap_len = strlen(cached_cap) + 1;
+        cached = 1;
+    }
+
+    *len = cached_cap_len;
+    return cached_cap;
+}
+
+/*
+ * Similar to HYPERVISOR_xen_version but with a sane interface
+ * (has a length, one can probe for the length) and with one less sub-ops:
+ * missing XENVER_compile_info.
+ */
+DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
+               unsigned int len)
+{
+    union {
+        xen_version_op_val_t val;
+        xen_feature_info_t fi;
+    } u = {};
+    unsigned int sz = 0;
+    const void *ptr = NULL;
+    int rc = xsm_version_op(XSM_OTHER, cmd);
+
+    /* We can safely return -EPERM! */
+    if ( rc )
+        return rc;
+
+    /*
+     * The HYPERVISOR_xen_version differs in that some sub-ops return the
+     * value and some copy it back via the argument. We follow one rule for
+     * all sub-ops: return the number of bytes on success, a negative error
+     * code otherwise, and always copy the result into arg. Yeey sanity!
+     */
+    switch ( cmd )
+    {
+    case XEN_VERSION_version:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = (xen_major_version() << 16) | xen_minor_version();
+        break;
+
+    case XEN_VERSION_extraversion:
+        sz = strlen(xen_extra_version()) + 1;
+        ptr = xen_extra_version();
+        break;
+
+    case XEN_VERSION_capabilities:
+        ptr = capabilities_info(&sz);
+        break;
+
+    case XEN_VERSION_changeset:
+        sz = strlen(xen_changeset()) + 1;
+        ptr = xen_changeset();
+        break;
+
+    case XEN_VERSION_platform_parameters:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = HYPERVISOR_VIRT_START;
+        break;
+
+    case XEN_VERSION_get_features:
+        sz = sizeof(xen_feature_info_t);
+
+        if ( guest_handle_is_null(arg) )
+            break;
+
+        if ( copy_from_guest(&u.fi, arg, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        rc = get_features(current->domain, &u.fi);
+        break;
+
+    case XEN_VERSION_pagesize:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = PAGE_SIZE;
+        break;
+
+    case XEN_VERSION_guest_handle:
+        sz = ARRAY_SIZE(current->domain->handle);
+        ptr = current->domain->handle;
+        break;
+
+    case XEN_VERSION_commandline:
+        sz = strlen(saved_cmdline) + 1;
+        ptr = saved_cmdline;
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    if ( rc )
+        return rc;
+
+    /*
+     * This hypercall also allows the client to probe. If it provides
+     * a NULL arg we will return the size of the space it has to
+     * allocate for the specific sub-op.
+     */
+    ASSERT(sz);
+    if ( guest_handle_is_null(arg) )
+        return sz;
+
+    if ( !rc )
+    {
+        unsigned int bytes = min(sz, len);
+
+        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
+            rc = -EFAULT;
+
+        /*
+         * We return len (truncate) worth of data even if we fail.
+         */
+        if ( !rc && sz > len )
+            rc = -ENOBUFS;
+    }
+
+    return rc == 0 ? sz : rc;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
@@ -418,6 +557,21 @@ DO(ni_hypercall)(void)
     return -ENOSYS;
 }
 
+static int __init kernel_cache_init(void)
+{
+    unsigned int len;
+
+    /*
+     * Pre-allocate the cache so we do not have to worry about
+     * simultaneous invocations on safe_strcat by guests and the cache
+     * data becoming garbage.
+     */
+    (void)capabilities_info(&len);
+
+    return 0;
+}
+__initcall(kernel_cache_init);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 870bc3b..5f90718 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,6 +128,8 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
+ *  HYPERVISOR_version_op
+ *   All generic sub-operations
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..6e327bb 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,14 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/*
+ * There are two hypercalls mentioned in here. The XENVER_ are for
+ * HYPERCALL_xen_version (17), while VERSION_ are for the
+ * HYPERCALL_version_op (41).
+ *
+ * The subops are very similar except that the latter hypercall has a
+ * sane interface.
+ */
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +94,67 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+/*
+ * The HYPERCALL_version_op has a set of sub-ops which mirror the
+ * sub-ops of HYPERCALL_xen_version. However this hypercall differs
+ * radically from the former:
+ *  - It returns the amount of bytes returned.
+ *  - It will return -XEN_EPERM if the guest is not permitted
+ *    (Albeit XEN_VERSION_version, XEN_VERSION_platform_parameters, and
+ *    XEN_VERSION_get_features will always return a value as guest cannot
+ *    survive without this information).
+ *  - It will return the requested data in arg.
+ *  - It requires a third argument (len) for the length of the
+ *    arg. Naturally the arg has to fit the requested data otherwise
+ *    -XEN_ENOBUFS is returned.
+ *
+ * It also offers a mechanism to probe for the amount of bytes a
+ * sub-op will require. Having the arg have a NULL handle will
+ * return the number of bytes requested for the operation. Or a
+ * negative value if an error is encountered.
+ */
+
+typedef uint64_t xen_version_op_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
+
+/*
+ * arg == xen_version_op_val_t. Encoded as major:minor (31..16:15..0), while
+ * 63..32 are zero.
+ */
+#define XEN_VERSION_version             0
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_extraversion        1
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_capabilities        3
+
+/* arg == char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_changeset           4
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_platform_parameters 5
+
+/*
+ * arg = xen_feature_info_t - shares the same structure
+ * as the XENVER_get_features.
+ */
+#define XEN_VERSION_get_features        6
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_pagesize            7
+
+/*
+ * arg == void.
+ *
+ * The toolstack fills it out for guest consumption. It is intended to hold
+ * the UUID of the guest.
+ */
+#define XEN_VERSION_guest_handle        8
+
+/* arg = char. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_commandline         9
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..1a99929 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,6 +115,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
+#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 0c8ae0e..e8d2b81 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -147,6 +147,10 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
+extern long
+do_version_op(unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index abbe282..e5dad35 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -751,3 +751,24 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+    case XEN_VERSION_platform_parameters:
+    case XEN_VERSION_get_features:
+        /* These MUST always be accessible to any guest by default. */
+        return 0;
+    case XEN_VERSION_extraversion:
+    case XEN_VERSION_capabilities:
+    case XEN_VERSION_pagesize:
+    case XEN_VERSION_guest_handle:
+        /* These can be accessible to a guest. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 5ecbee0..ac80472 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -194,6 +194,7 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -737,6 +738,11 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9791ad4..776dd09 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,4 +163,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 2069cb3..1eaec58 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+    case XEN_VERSION_platform_parameters:
+    case XEN_VERSION_get_features:
+        /* These MUST always be accessible to any guest by default. */
+        return 0;
+    case XEN_VERSION_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XEN_VERSION_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XEN_VERSION_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XEN_VERSION_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XEN_VERSION_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+    case XEN_VERSION_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1797,6 +1831,7 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index badcf1c..56600bb 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -496,12 +496,14 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ hypercall.
+# Class version is used to describe the XENVER_ and VERSION hypercall.
 # Almost all sub-ops are described here - in the default case all of them should
-# be allowed except the XENVER_commandline.
+# be allowed except the XENVER_commandline, VERSION_commandline, and
+# VERSION_changeset.
 #
 # The ones that are omitted are XENVER_version, XENVER_platform_parameters,
-# and XENVER_get_features  - as they MUST always be returned to a guest.
+# XENVER_get_features, XEN_VERSION_version, XEN_VERSION_platform_parameters,
+# and XEN_VERSION_get_features - as they MUST always be returned to a guest.
 #
 class version
 {
@@ -519,4 +521,17 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+# --- VERSION hypercall ---
+# Extra information (-unstable).
+    extraversion
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Source code changeset.
+    changeset
+# Page size the hypervisor uses.
+    pagesize
+# A value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
 }
-- 
2.5.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-22 20:18     ` Konrad Rzeszutek Wilk
@ 2016-03-23  8:19       ` Jan Beulich
  2016-03-23 11:17         ` Julien Grall
  2016-03-24  2:49         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-23  8:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

>>> On 22.03.16 at 21:18, <konrad.wilk@oracle.com> wrote:
> It would be good to only have __init related exceptions on the __inittext
> (and also ditch the __init exception tables once the boot is completed)
> but I am not exactly sure how to automatically make the macros resolve
> what sections it should go in. That is a further TODO though.

I'm pretty confident that this can't be reasonably addressed without
the compiler's help, by extending the section attribute to also allow
for templates rather than only fixed names.

>> > @@ -108,13 +127,17 @@ const char *symbols_lookup(unsigned long addr,
>> >  {
>> >      unsigned long i, low, high, mid;
>> >      unsigned long symbol_end = 0;
>> > +    symbols_lookup_t symbol_lookup = NULL;
>> 
>> Pointless initializer.
> 
> I believe we need it. That is the contents on the stack can be garbage
> and the __is_active_kernel_text won't update symbol_lookup unless
> it finds a match. Ah, and also the compiler is unhappy:
> 
> symbols.c: In function ‘symbols_lookup’:
> symbols.c:136:8: error: ‘symbol_lookup’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>      if (symbol_lookup)
>         ^
> cc1: all warnings being treated as errors

That's unfortunate, since ...

>> >      namebuf[KSYM_NAME_LEN] = 0;
>> >      namebuf[0] = 0;
>> >  
>> > -    if (!is_active_kernel_text(addr))
>> > +    if (!__is_active_kernel_text(addr, &symbol_lookup))
>> >          return NULL;

... this return ensures that ...

>> > +    if (symbol_lookup)
>> > +        return (symbol_lookup)(addr, symbolsize, offset, namebuf);

... this won't be reached with symbol_lookup uninitialized.
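
The disagreement above can be reproduced in a small stand-alone sketch. All
names here (find_region, resolve, patch_lookup, the address ranges) are
hypothetical stand-ins, not the code from the patch. The out-parameter is
written only on a match, and the caller returns early on a miss, so the
initializer is logically redundant; it is kept only because some compilers
cannot prove that across the call and emit -Wmaybe-uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef const char *lookup_fn(unsigned long addr);

static const char *patch_lookup(unsigned long addr)
{
    (void)addr;
    return "patched_symbol";
}

/*
 * Hypothetical stand-in for __is_active_kernel_text(): *fn is written
 * only when addr falls inside one of the two modelled regions.
 */
static bool find_region(unsigned long addr, lookup_fn **fn)
{
    assert(fn != NULL);                 /* caller must supply an out-pointer */

    if ( addr >= 0x1000 && addr < 0x2000 )
    {
        *fn = patch_lookup;             /* region with its own callback */
        return true;
    }
    if ( addr >= 0x2000 && addr < 0x3000 )
    {
        *fn = NULL;                     /* region using the default lookup */
        return true;
    }
    return false;                       /* *fn deliberately left untouched */
}

static const char *resolve(unsigned long addr)
{
    lookup_fn *fn = NULL;               /* redundant, but silences the warning */

    if ( !find_region(addr, &fn) )
        return NULL;                    /* fn is never read on this path */

    return fn ? fn(addr) : "builtin_symbol";
}
```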

>> Note that there are a few coding style issues here (missing blanks,
>> superfluous parentheses).
> 
> That file uses a different StyleGuide. The Linux one. Which reminds me
> that __is_active_kernel_text needs to adhere to different StyleGuide.

Linux style would not be using spaces for indentation. The file is
really a bad mixture of styles, but yes, you're right that you don't
violate that pre-existing mixture.

>> > --- /dev/null
>> > +++ b/xen/include/xen/bug_ex_symbols.h
>> > @@ -0,0 +1,74 @@
>> > +/*
>> > + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
>> > + *
>> > + */
>> > +
>> > +#ifndef __BUG_EX_SYMBOL_LIST__
>> > +#define __BUG_EX_SYMBOL_LIST__
>> > +
>> > +#include <xen/config.h>
>> > +#include <xen/list.h>
>> > +#include <xen/symbols.h>
>> > +
>> > +#ifdef CONFIG_X86
>> > +#include <asm/uaccess.h>
>> > +#endif
>> 
>> Why?
> 
> Otherwise the compilation will fail on ARM as they do not have exceptions
> (and no asm/uaccess.h file)

Well, the question was for the #include, not the #ifdef.

> --- a/xen/common/symbols.c
> +++ b/xen/common/symbols.c
> @@ -17,6 +17,7 @@
>  #include <xen/lib.h>
>  #include <xen/string.h>
>  #include <xen/spinlock.h>
> +#include <xen/virtual_region.h>
>  #include <public/platform.h>
>  #include <xen/guest_access.h>
>  
> @@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
>  
>  bool_t is_active_kernel_text(unsigned long addr)
>  {
> -    return (is_kernel_text(addr) ||
> -            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
> +    return !!search_virtual_regions(addr);

search_virtual_regions() doesn't sound like it would be looking for
text addresses only.

> --- /dev/null
> +++ b/xen/common/virtual_region.c
> @@ -0,0 +1,151 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#include <xen/config.h>

No new explicit inclusion of xen/config.h please. We're actually in
the process of getting rid of such elsewhere.

> +#include <xen/init.h>
> +#include <xen/kernel.h>
> +#include <xen/rcupdate.h>
> +#include <xen/spinlock.h>
> +#include <xen/virtual_region.h>
> +
> +static struct virtual_region compiled = {

I think "compiled" is a bad name, as patch modules are equally
compiled - "core" or "builtin" perhaps?

> +struct virtual_region* search_virtual_regions(unsigned long addr)

Misplaced *. And probably should return a pointer to const.

> +{
> +    struct virtual_region *region;
> +
> +    list_for_each_entry_rcu( region, &virtual_region_list, list )

Where's the rcu_read_lock() whose use the comment preceding the
#define of this macro mandates?
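
For readers unfamiliar with the convention being asked for, the sketch below
models it in plain C. It is not RCU: model_rcu_read_lock() and
model_rcu_read_unlock() are hypothetical stubs that only count nesting depth,
and struct region is a simplified stand-in for struct virtual_region. The
point is purely structural: the list walk is bracketed by the read-side
markers.

```c
#include <assert.h>
#include <stddef.h>

/* Model only: counts nested read-side sections, provides no protection. */
static int read_depth;

static void model_rcu_read_lock(void)
{
    read_depth++;
}

static void model_rcu_read_unlock(void)
{
    assert(read_depth > 0);
    read_depth--;
}

struct region {
    unsigned long start, end;           /* [start, end) */
    const struct region *next;
};

static const struct region *search_regions(const struct region *head,
                                           unsigned long addr)
{
    const struct region *r, *found = NULL;

    model_rcu_read_lock();
    for ( r = head; r != NULL; r = r->next )
    {
        assert(read_depth > 0);         /* traversal happens inside the section */
        if ( addr >= r->start && addr < r->end )
        {
            found = r;
            break;
        }
    }
    model_rcu_read_unlock();

    return found;
}
```

With real RCU, handing the pointer back after the unlock is only safe if the
region's lifetime is guaranteed by some other means, which is a separate
question from the missing lock raised here.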

> +static void __unregister_virtual_region(struct virtual_region *r)

As I think I have said before - no double underscore prefixes on
new identifiers please.

> +{
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&virtual_region_lock, flags);
> +    list_del_rcu(&r->list);
> +    spin_unlock_irqrestore(&virtual_region_lock, flags);
> +    /*
> +     * We do not need to invoke call_rcu.
> +     *
> +     * This is due to the fact that on the deletion we have made sure
> +     * to use spinlocks (to guard against somebody else calling
> +     * unregister_virtual_region) and list_deletion spiced with a memory
> +     * barrier - which will flush out the cache lines in other CPUs.

I don't think barriers do any kind of cache flushing on remote CPUs
(not even on the local one).

> +void __init setup_virtual_regions(void)
> +{
> +    ssize_t sz;
> +    unsigned int i;
> +    static const struct bug_frame *const stop_frames[] = {

This (now sitting in an __init function) should then become __initconstrel.

> +        __start_bug_frames,

This element makes the array name bogus.

> +        __stop_bug_frames_0,
> +        __stop_bug_frames_1,
> +        __stop_bug_frames_2,
> +#ifdef CONFIG_X86
> +        __stop_bug_frames_3,
> +#endif
> +        NULL
> +    };
> +
> +    /* N.B. idx != i */

Stale comment?

> +    for ( i = 1; stop_frames[i]; i++ )
> +    {
> +        const struct bug_frame *s;
> +
> +        s = stop_frames[i-1];
> +        sz = stop_frames[i] - s;
> +
> +        compiled.frame[i-1].n_bugs = sz;
> +        compiled.frame[i-1].bugs = s;
> +
> +        compiled_init.frame[i-1].n_bugs = sz;
> +        compiled_init.frame[i-1].bugs = s;
> +    }

Many times "[i - 1]" please.
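
The loop quoted above derives per-type frame counts from an array of boundary
pointers. A stand-alone sketch of that arithmetic, written with the requested
"[i - 1]" spacing, follows. The types and the split_frames() helper are
hypothetical, and the sketch takes an explicit count instead of the
NULL-terminated array used in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct bug_frame_model { int line; };       /* hypothetical stand-in */

struct frame_range {
    const struct bug_frame_model *bugs;     /* first frame of this type */
    ptrdiff_t n_bugs;                       /* number of frames of this type */
};

/*
 * bounds[] holds n_types + 1 ascending pointers into one array:
 * bounds[0] is the overall start, bounds[i] is the end of type i - 1.
 */
static void split_frames(const struct bug_frame_model *const bounds[],
                         size_t n_types, struct frame_range out[])
{
    size_t i;

    for ( i = 1; i <= n_types; i++ )
    {
        assert(bounds[i] >= bounds[i - 1]);
        out[i - 1].bugs = bounds[i - 1];
        out[i - 1].n_bugs = bounds[i] - bounds[i - 1];
    }
}
```

Because the first element is the start marker rather than a stop marker, a
name such as "bounds" describes the array better than "stop_frames", which is
the naming point made in the review.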

> +    register_virtual_region(&compiled_init);
> +    register_virtual_region(&compiled);

Is there a particular reason not to do the main region first? Probably
it's benign, but if there is a reason, I think a comment would be
warranted.

> --- a/xen/include/xen/symbols.h
> +++ b/xen/include/xen/symbols.h
> @@ -5,6 +5,15 @@
>  
>  #define KSYM_NAME_LEN 127
>  
> +/*
> + * Typedef for the callback functions that symbols_lookup
> + * can call if virtual_region_list has a callback for it.
> + */
> +typedef const char *(symbols_lookup_t)(unsigned long addr,

Stray parentheses.

> --- /dev/null
> +++ b/xen/include/xen/virtual_region.h
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#ifndef __BUG_EX_SYMBOL_LIST__
> +#define __BUG_EX_SYMBOL_LIST__

No longer in line with the header name, and missing a XEN_ portion.

> +struct virtual_region
> +{
> +    struct list_head list;
> +    unsigned long start;        /* Virtual address start. */
> +    unsigned long end;          /* Virtual address start. */
> +
> +    /*
> +     * If this is NULL the default lookup mechanism is used.
> +     */

Here or elsewhere I'm sure I've made the comment before: This is
a single line comment.

> +    symbols_lookup_t *symbols_lookup;
> +
> +    struct {
> +        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
> +        ssize_t n_bugs;         /* The number of them. */
> +    } frame[BUGFRAME_NR];
> +
> +#ifdef CONFIG_X86
> +    struct exception_table_entry *ex;
> +    struct exception_table_entry *ex_end;
> +#endif

Would there be any harm omitting the #ifdef and leaving the
pointers be NULL on ARM?

> +extern struct virtual_region *search_virtual_regions(unsigned long addr);
> +extern void setup_virtual_regions(void);
> +extern void unregister_init_virtual_region(void);
> +extern int register_virtual_region(struct virtual_region *r);
> +extern void unregister_virtual_region(struct virtual_region *r);

While a matter of taste, I personally would prefer if "extern" was
omitted on function declarations.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-22 20:39                       ` Konrad Rzeszutek Wilk
@ 2016-03-23  8:56                         ` Jan Beulich
  2016-03-24  2:37                           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-23  8:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

>>> On 22.03.16 at 21:39, <konrad.wilk@oracle.com> wrote:
> @@ -381,6 +389,137 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      return -ENOSYS;
>  }
>  
> +static const char *capabilities_info(unsigned int *len)
> +{
> +    static xen_capabilities_info_t __read_mostly cached_cap;
> +    static unsigned int __read_mostly cached_cap_len;
> +    static bool_t cached;
> +
> +    if ( unlikely(!cached) )
> +    {
> +        arch_get_xen_caps(&cached_cap);
> +        cached_cap_len = strlen(cached_cap) + 1;
> +        cached = 1;
> +    }
> +
> +    *len = cached_cap_len;
> +    return cached_cap;
> +}

With the init time call to prefill the cache being quite far away, I
think you need a comment here. Even better, though, would be if
you ditched the function altogether and did the prefilling right in
that __init function below, while the consumers of the data would
access the static variables directly. In the end that might even
allow arch_get_xen_caps() to become __init.
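
A rough sketch of that arrangement, compilable outside of Xen. Everything here is a stand-in: `get_caps_sketch()` plays the role of arch_get_xen_caps(), `caps_cache_init()` the role of the __init function, and `copy_caps()` is a hypothetical consumer.

```c
#include <string.h>

typedef char caps_info_sketch_t[1024];

static caps_info_sketch_t cached_cap;
static unsigned int cached_cap_len;

/* Hypothetical stand-in for arch_get_xen_caps(). */
static void get_caps_sketch(caps_info_sketch_t *info)
{
    strcpy(*info, "xen-3.0-x86_64 ");
}

/* Would be the __init function: runs once, before any consumer. */
void caps_cache_init(void)
{
    get_caps_sketch(&cached_cap);
    cached_cap_len = (unsigned int)strlen(cached_cap) + 1;
}

/* Consumer: reads the statics directly, truncating to the caller's size. */
unsigned int copy_caps(char *dst, unsigned int sz)
{
    unsigned int bytes = sz < cached_cap_len ? sz : cached_cap_len;

    memcpy(dst, cached_cap, bytes);
    return bytes;
}
```

With the fill done at init time there is no "cached" flag left to test on the hot path, and nothing gets written once guests can issue the hypercall.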

> +    if ( !rc )
> +    {
> +        unsigned int bytes = min(sz, len);
> +
> +        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
> +            rc = -EFAULT;
> +
> +        /*
> +         * We return len (truncate) worth of data even if we fail.
> +         */

Single line comment.

> @@ -418,6 +557,21 @@ DO(ni_hypercall)(void)
>      return -ENOSYS;
>  }
>  
> +static int __init kernel_cache_init(void)
> +{
> +    unsigned int len;
> +
> +    /*
> +     * Pre-allocate the cache so we do not have to worry about
> +     * simultaneous invocations on safe_strcat by guests and the cache
> +     * data becoming garbage.
> +     */
> +    (void)capabilities_info(&len);

No need for the cast, afaics.


> +/*
> + * arg == xen_version_op_val_t. Encoded as major:minor (31..16:15..0), while
> + * 63..32 are zero.
> + */
> +#define XEN_VERSION_version             0
> +
> +/* arg == char. Contains NUL terminated utf-8 string. */

I should have noticed this before: "char" isn't really what you mean
here and below. Perhaps better "char[]"?

> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
>      }
>  }
>  
> +static int flask_version_op (uint32_t op)
> +{
> +    u32 dsid = domain_sid(current->domain);
> +
> +    switch ( op )
> +    {
> +    case XEN_VERSION_version:
> +    case XEN_VERSION_platform_parameters:
> +    case XEN_VERSION_get_features:
> +        /* These MUST always be accessible to any guest by default. */
> +        return 0;

Perhaps these would better be taken care of in xsm_version_op()?
(That consideration then also applies to the other patch of course.)

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-23  8:19       ` Jan Beulich
@ 2016-03-23 11:17         ` Julien Grall
  2016-03-23 11:21           ` Jan Beulich
  2016-03-24  2:49         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Julien Grall @ 2016-03-23 11:17 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Stefano Stabellini, sasha.levin, xen-devel

Hi Jan,

On 23/03/16 08:19, Jan Beulich wrote:
>>>> On 22.03.16 at 21:18, <konrad.wilk@oracle.com> wrote:
>> +    symbols_lookup_t *symbols_lookup;
>> +
>> +    struct {
>> +        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
>> +        ssize_t n_bugs;         /* The number of them. */
>> +    } frame[BUGFRAME_NR];
>> +
>> +#ifdef CONFIG_X86
>> +    struct exception_table_entry *ex;
>> +    struct exception_table_entry *ex_end;
>> +#endif
>
> Would there be any harm omitting the #ifdef and leaving the
> pointers be NULL on ARM?

The structure exception_table_entry is only defined for x86 
(asm-x86/uaccess.h).

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 11/34] xsplice: Design document
  2016-03-15 17:56 ` [PATCH v4 11/34] xsplice: Design document Konrad Rzeszutek Wilk
@ 2016-03-23 11:18   ` Jan Beulich
  2016-03-23 20:12     ` Konrad Rzeszutek Wilk
  2016-03-24  3:15     ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-23 11:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> +### XEN_SYSCTL_XSPLICE_LIST (2)
> +
> +Retrieve an array of abbreviated status and names of payloads that are loaded in the
> +hypervisor.
> +
> +The caller provides:
> +
> + * `version`. Initially (on first hypercall) *MUST* be zero.
> + * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> + * `nr` the max number of entries to populate.
> + * `pad` - *MUST* be zero.
> + * `status` virtual address of where to write `struct xen_xsplice_status`
> +   structures. Caller *MUST* allocate up to `nr` of them.
> + * `name` - virtual address of where to write the unique name of the payload.
> +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> +   **XEN_XSPLICE_NAME_SIZE** size.
> + * `len` - virtual address of where to write the length of each unique name
> +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> +   of sizeof(uint32_t) (4 bytes).
> +
> +If the hypercall returns an positive number, it is the number (upto `nr`
> +provided to the hypercall) of the payloads returned, along with `nr` updated
> +with the number of remaining payloads, `version` updated (it may be the same
> +across hypercalls - if it varies the data is stale and further calls could
> +fail). The `status`, `name`, and `len`' are updated at their designed index
> +value (`idx`) with the returned value of data.
> +
> +If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
> +lowered.
> +
> +If the hypercall returns an zero value there are no more payloads.
> +
> +Note that due to the asynchronous nature of hypercalls the control domain might
> +have added or removed a number of payloads making this information stale. It is
> +the responsibility of the toolstack to use the `version` field to check
> +between each invocation. if the version differs it should discard the stale
> +data and start from scratch. It is OK for the toolstack to use the new
> +`version` field.
> +
> +The `struct xen_xsplice_status` structure contains an status of payload which includes:
> +
> + * `status` - indicates the current status of the payload:
> +   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
> +   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
> +   *  No other value is possible.
> + * `rc` - -XEN_EXX type errors encountered while performing the last
> +   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
> +   respectively mean: success or operation in progress. Other values
> +   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
> +   have changed.
> +
> +The structure is as follow:
> +
> +<pre>
> +struct xen_sysctl_xsplice_list {  
> +    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
> +                                               On subsequent calls reuse value.  
> +                                               If varies between calls, we are  
> +                                             * getting stale data. */  
> +    uint32_t idx;                           /* IN/OUT: Index into array. */ 
> +    uint32_t nr;                            /* IN: How many status, names, and len  
> +                                               should fill out.  
> +                                               OUT: How many payloads left. */  

I think there's an ambiguity left in both the description above and
the comments here: With idx required to be zero upon first
invocation (which I'm not clear why that is), which parts of the
three arrays get filled when idx is non-zero: [0, idx) or [nr, nr + idx)?
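
To make the question concrete, here is one possible reading, sketched as a small model: `idx` selects the first payload to report, the caller's array is always filled from slot 0, and `nr` comes back as the number of payloads left after the ones returned. This is an assumption for illustration, not a statement of what the document intends; all names (`list_sketch`, `list_payloads_sketch`, the fixed table of five payloads) are made up.

```c
#include <errno.h>
#include <stdint.h>

#define NR_PAYLOADS_SKETCH 5u

struct list_sketch {
    uint32_t version; /* OUT in this model: current list version. */
    uint32_t idx;     /* IN: first payload to report. */
    uint32_t nr;      /* IN: capacity of out[]. OUT: payloads left. */
};

/* Pretend payload states, in list order. */
static const uint32_t payload_state[NR_PAYLOADS_SKETCH] = { 1, 1, 2, 1, 2 };
static const uint32_t list_version = 7;

/* Returns the number of slots filled in out[], or a negative errno. */
int list_payloads_sketch(struct list_sketch *l, uint32_t out[])
{
    uint32_t i, filled = 0;

    if ( l->idx > NR_PAYLOADS_SKETCH || !l->nr )
        return -EINVAL;

    /* Skip the first l->idx payloads, then fill out[0..filled). */
    for ( i = l->idx; i < NR_PAYLOADS_SKETCH && filled < l->nr; i++ )
        out[filled++] = payload_state[i];

    l->nr = NR_PAYLOADS_SKETCH - i;
    l->version = list_version;

    return (int)filled;
}
```

Under this reading a caller would advance `idx` by the return value between invocations and restart from zero whenever `version` changes.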

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-23 11:17         ` Julien Grall
@ 2016-03-23 11:21           ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-23 11:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Stefano Stabellini, sasha.levin, xen-devel

>>> On 23.03.16 at 12:17, <julien.grall@arm.com> wrote:
> Hi Jan,
> 
> On 23/03/16 08:19, Jan Beulich wrote:
>>>>> On 22.03.16 at 21:18, <konrad.wilk@oracle.com> wrote:
>>> +    symbols_lookup_t *symbols_lookup;
>>> +
>>> +    struct {
>>> +        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
>>> +        ssize_t n_bugs;         /* The number of them. */
>>> +    } frame[BUGFRAME_NR];
>>> +
>>> +#ifdef CONFIG_X86
>>> +    struct exception_table_entry *ex;
>>> +    struct exception_table_entry *ex_end;
>>> +#endif
>>
>> Would there be any harm omitting the #ifdef and leaving the
>> pointers be NULL on ARM?
> 
> The structure exception_table_entry is only defined for x86 
> (asm-x86/uaccess.h).

But the uses above are fine without the structure ever getting
defined.
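
A small standalone sketch of why this works in C: pointers to an incomplete structure type can be declared, stored, and compared without the type ever being completed, as long as nothing dereferences them or takes their size. `virtual_region_sketch` and `has_exception_table()` are illustrative names.

```c
#include <stddef.h>

/* Only a forward declaration: the type is never completed here. */
struct exception_table_entry;

struct virtual_region_sketch {
    unsigned long start, end;
    struct exception_table_entry *ex;     /* Would stay NULL on ARM. */
    struct exception_table_entry *ex_end; /* Would stay NULL on ARM. */
};

/* Comparing against NULL does not require the complete type. */
int has_exception_table(const struct virtual_region_sketch *r)
{
    return r->ex != NULL;
}
```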

Jan


^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-15 17:56 ` [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
  2016-03-16 12:12   ` Julien Grall
@ 2016-03-23 13:51   ` Jan Beulich
  2016-03-24  3:13     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-23 13:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

>>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -168,4 +168,15 @@ config SCHED_DEFAULT
>  
>  endmenu
>  
> +# Enable/Disable xsplice support
> +config XSPLICE
> +	bool "xSplice live patching support"
> +	default y

Isn't it a little early in the series to default this to on?

And then of course the EXPERT question comes up again. No
matter that IanC is no longer around to help with the
argumentation, the point he has been making about too many
flavors ending up in the wild continues to apply.

> @@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
>          ret = tmem_control(&op->u.tmem_op);
>          break;
>  
> +    case XEN_SYSCTL_xsplice_op:
> +        ret = xsplice_op(&op->u.xsplice);
> +        if ( ret != -ENOSYS )
> +            copyback = 1;
> +        break;

Why is ENOSYS special here, but not e.g. EOPNOTSUPP?

> +struct payload {
> +    uint32_t state;                      /* One of the XSPLICE_STATE_*. */
> +    int32_t rc;                          /* 0 or -XEN_EXX. */
> +    struct list_head list;               /* Linked to 'payload_list'. */
> +    char name[XEN_XSPLICE_NAME_SIZE + 1];/* Name of it. */

Could I talk you into reducing XEN_XSPLICE_NAME_SIZE to 127,
to avoid needless padding in places like this one?

> +static int verify_name(const xen_xsplice_name_t *name)
> +{
> +    if ( name->size == 0 || name->size > XEN_XSPLICE_NAME_SIZE )
> +        return -EINVAL;
> +
> +    if ( name->pad[0] || name->pad[1] || name->pad[2] )

I'd like to ask for consistency here: Either always use == 0 / != 0,
or always omit the latter and use ! in place of the former.

> +static int verify_payload(const xen_sysctl_xsplice_upload_t *upload)
> +{
> +    if ( verify_name(&upload->name) )
> +        return -EINVAL;
> +
> +    if ( upload->size == 0 )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(upload->payload, upload->size) )

Careful here - upload->size is uint64_t, yet array_access_ok() makes
assumptions on not too large a size getting passed. I.e. I think you
want to apply an upper bound to the size right here - for example, it
can't reasonably be bigger than XEN_VIRT_END - XEN_VIRT_START
if I remember correctly how you intend to place those payloads.
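
Something along these lines, as a sketch only. `SKETCH_MAX_PAYLOAD` is an arbitrary placeholder for whatever bound is finally chosen (for example the size of the region the payloads get placed in), and `check_upload_size()` is a hypothetical helper.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bound, assumed here to be 64 MiB. */
#define SKETCH_MAX_PAYLOAD (64ULL << 20)

/* Reject empty and oversized uploads before the size is used anywhere. */
int check_upload_size(uint64_t size)
{
    if ( size == 0 || size > SKETCH_MAX_PAYLOAD )
        return -EINVAL;

    return 0;
}
```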

> +static int find_payload(const xen_xsplice_name_t *name, struct payload **f)

Perhaps neater to use the xen/err.h constructs here instead
of indirection?
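
To illustrate the shape this would take, here is a self-contained sketch. The lower-case helpers are local stand-ins written in the style of the ERR_PTR()/IS_ERR()/PTR_ERR() family, not the actual header, and the payload table is made up.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(intptr_t error)
{
    return (void *)error;
}

/* Recover the errno value from such a pointer. */
static inline intptr_t ptr_err(const void *ptr)
{
    return (intptr_t)ptr;
}

/* Error pointers occupy the topmost MAX_ERRNO_SKETCH addresses. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct payload_sketch {
    const char *name;
};

static struct payload_sketch table[] = { { "fix-a" }, { "fix-b" } };

/* Returns the payload, or an error pointer instead of using an out param. */
struct payload_sketch *find_payload_sketch(const char *name)
{
    size_t i;

    if ( !name || !*name )
        return err_ptr(-EINVAL);

    for ( i = 0; i < sizeof(table) / sizeof(table[0]); i++ )
        if ( !strcmp(table[i].name, name) )
            return &table[i];

    return err_ptr(-ENOENT);
}
```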

> +{
> +    struct payload *data;
> +    XEN_GUEST_HANDLE_PARAM(char) str;
> +    char n[XEN_XSPLICE_NAME_SIZE + 1] = { 0 };

This pointlessly zeroes the entire array. Just set n[name->size]
to zero after the copy-in.
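
In plain C the pattern would look roughly like this, with memcpy() standing in for the guest copy and `NAME_SIZE_SKETCH` and `copy_name_sketch()` being illustrative names:

```c
#include <string.h>

#define NAME_SIZE_SKETCH 128

/*
 * Copy 'size' bytes and terminate afterwards. The destination does not
 * need to be zeroed up front: only the byte after the copy matters.
 */
int copy_name_sketch(char dst[NAME_SIZE_SKETCH + 1], const char *src,
                     unsigned int size)
{
    if ( size == 0 || size > NAME_SIZE_SKETCH )
        return -1;

    memcpy(dst, src, size);
    dst[size] = '\0';

    return 0;
}
```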

> +    int rc = -EINVAL;

Pointless initializer.

> +    rc = verify_name(name);
> +    if ( rc )
> +        return rc;
> +
> +    str = guest_handle_cast(name->name, char);

Why do you need a cast here?

> +    if ( copy_from_guest(n, str, name->size) )

You validated the address range already, so __copy_from_guest()
will be just fine and more efficient.

> +        return -EFAULT;
> +
> +    spin_lock_recursive(&payload_lock);

Why do you need a recursive lock here? I think something like this
should be reasoned about in the commit message.

> +/*
> + * We MUST be holding the payload_lock spinlock.
> + */

Single line comment (but kind of redundant with ...

> +static void free_payload(struct payload *data)
> +{
> +    ASSERT(spin_is_locked(&payload_lock));

... this anyway).

> +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> +{
> +    struct payload *data = NULL;

Pointless initializer.

> +    void *raw_data = NULL;
> +    int rc;
> +
> +    rc = verify_payload(upload);
> +    if ( rc )
> +        return rc;
> +
> +    rc = find_payload(&upload->name, &data);
> +    if ( rc == 0 /* Found. */ )
> +        return -EEXIST;
> +
> +    if ( rc != -ENOENT )
> +        return rc;
> +
> +    data = xzalloc(struct payload);
> +    if ( !data )
> +        return -ENOMEM;
> +
> +    rc = -EFAULT;
> +    if ( copy_from_guest(data->name, upload->name.name, upload->name.size) )

__copy_from_guest()

> +        goto out;
> +
> +    rc = -ENOMEM;
> +    raw_data = vzalloc(upload->size);

vmalloc()

> +    if ( !raw_data )
> +        goto out;
> +
> +    rc = -EFAULT;
> +    if ( copy_from_guest(raw_data, upload->payload, upload->size) )

__copy_from_guest()

> +        goto out;
> +
> +    data->state = XSPLICE_STATE_CHECKED;
> +    data->rc = 0;

This is redundant with the xzalloc() above.

> +    INIT_LIST_HEAD(&data->list);
> +
> +    spin_lock_recursive(&payload_lock);
> +    list_add_tail(&data->list, &payload_list);
> +    payload_cnt++;
> +    payload_version++;
> +    spin_unlock_recursive(&payload_lock);
> +
> + out:
> +    vfree(raw_data);

By here you allocated and filled raw_data. And now you
unconditionally free it. What is that good for?

> +    if ( rc )
> +    {
> +        xfree(data);
> +    }

The use of braces here is inconsistent with all of the rest of this
function.

> +static int xsplice_get(xen_sysctl_xsplice_get_t *get)
> +{
> +    struct payload *data;
> +    int rc;
> +
> +    rc = verify_name(&get->name);
> +    if ( rc )
> +        return rc;
> +
> +    rc = find_payload(&get->name, &data);
> +    if ( rc )
> +        return rc;
> +
> +    get->status.state = data->state;
> +    get->status.rc = data->rc;

What guarantees that data didn't get freed by the time you get here?

> +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> +{
> +    xen_xsplice_status_t status;
> +    struct payload *data;
> +    unsigned int idx = 0, i = 0;
> +    int rc = 0;
> +
> +    if ( list->nr > 1024 )
> +        return -E2BIG;
> +
> +    if ( list->pad != 0 )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> +         !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
> +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )

guest_handle_okay() already takes into account the element size,
i.e. it's only the middle one which needs to do any multiplication.
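
A toy model of the effect, assuming the check scales the count by the size of the pointed-to type (the macro and the 8192-byte limit are invented for the example):

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the accessible range, in bytes. */
#define SKETCH_SPACE 8192u

static int range_ok_sketch(size_t nr, size_t elem_size)
{
    return nr <= SKETCH_SPACE / elem_size;
}

/* The element size comes from the handle's type, not from the caller. */
#define handle_okay_sketch(ptr, nr) range_ok_sketch((nr), sizeof(*(ptr)))
```

With a `uint32_t` handle, passing `sizeof(uint32_t) * nr` therefore checks four times the range actually needed, while a `char` handle covering fixed-size names still needs the explicit multiplication.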

> +        return -EINVAL;
> +
> +    spin_lock_recursive(&payload_lock);
> +    if ( list->idx > payload_cnt || !list->nr )

The list->nr check could move up outside the locked region (e.g.
merge with the pad field check).

> +    {
> +        spin_unlock_recursive(&payload_lock);
> +        return -EINVAL;
> +    }
> +
> +    list_for_each_entry( data, &payload_list, list )

Aren't you lacking a list->version check prior to entering this loop
(which would then mean you don't need to store it below, but only
on the error path from that check)?

> +    {
> +        uint32_t len;
> +
> +        if ( list->idx > i++ )
> +            continue;
> +
> +        status.state = data->state;
> +        status.rc = data->rc;
> +        len = strlen(data->name);
> +
> +        /* N.B. 'idx' != 'i'. */
> +        if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
> +                                    data->name, len) ||
> +             __copy_to_guest_offset(list->len, idx, &len, 1) ||

You're not copying the NUL terminator here, which makes the result
more cumbersome to consume by the caller. Perhaps
XEN_XSPLICE_NAME_SIZE should remain 128 (contrary to what I
suggested above), but be specified to include the terminator?

> +             __copy_to_guest_offset(list->status, idx, &status, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +
> +        idx++;
> +
> +        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )

idx >= list->nr would seem easier to grok. Also the two should be
switched, as hypercall_preempt_check() is the more expensive of
the two checks.
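
A sketch of the reordered condition, with a counting stub in place of hypercall_preempt_check() so the short-circuit can be observed (all names here are made up):

```c
static unsigned int preempt_calls;

/* Stub for the expensive check; counts how often it is evaluated. */
static int preempt_check_stub(void)
{
    preempt_calls++;
    return 0;
}

/* Cheap comparison first: the stub only runs when there is room left. */
int should_stop_sketch(unsigned int idx, unsigned int nr)
{
    return idx >= nr || preempt_check_stub();
}

unsigned int preempt_calls_made(void)
{
    return preempt_calls;
}
```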

> +static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> +{
> +    struct payload *data;
> +    int rc;
> +
> +    rc = verify_name(&action->name);
> +    if ( rc )
> +        return rc;
> +
> +    spin_lock_recursive(&payload_lock);
> +    rc = find_payload(&action->name, &data);
> +    if ( rc )
> +        goto out;
> +
> +    switch ( action->cmd )
> +    {
> +    case XSPLICE_ACTION_CHECK:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +            rc = 0;

rc is zero already.

> +        }
> +        break;
> +
> +    case XSPLICE_ACTION_UNLOAD:
> +        if ( data->state == XSPLICE_STATE_CHECKED )
> +        {
> +            free_payload(data);
> +            /* No touching 'data' from here on! */

Poison the pointer to make sure?

> +            rc = 0;
> +        }
> +        break;
> +
> +    case XSPLICE_ACTION_REVERT:
> +        if ( data->state == XSPLICE_STATE_APPLIED )
> +        {
> +            /* No implementation yet. */
> +            data->state = XSPLICE_STATE_CHECKED;
> +            data->rc = 0;
> +            rc = 0;
> +        }
> +        break;
> +
> +    case XSPLICE_ACTION_APPLY:
> +        if ( (data->state == XSPLICE_STATE_CHECKED) )

Stray parentheses.

> +static void xsplice_printall(unsigned char key)
> +{
> +    struct payload *data;
> +
> +    spin_lock_recursive(&payload_lock);

I think this would better be a try-lock, bailing if the acquire failed.
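
The pattern, sketched with a C11 atomic_flag as a stand-in for the spinlock (none of these names exist in the patch): the handler gives up instead of spinning, presumably so that a debug key pressed while the lock is held cannot hang the CPU.

```c
#include <stdatomic.h>

static atomic_flag payload_lock_sketch = ATOMIC_FLAG_INIT;

/* Returns 1 if the lock was acquired, 0 if somebody else holds it. */
int try_lock_sketch(void)
{
    return !atomic_flag_test_and_set_explicit(&payload_lock_sketch,
                                              memory_order_acquire);
}

void unlock_sketch(void)
{
    atomic_flag_clear_explicit(&payload_lock_sketch, memory_order_release);
}

/* Key handler: returns 1 if it printed, 0 if it bailed out. */
int printall_sketch(void)
{
    if ( !try_lock_sketch() )
        return 0;

    /* ... walk the payload list and print each entry ... */

    unlock_sketch();
    return 1;
}
```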

Jan

^ permalink raw reply	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 11/34] xsplice: Design document
  2016-03-23 11:18   ` Jan Beulich
@ 2016-03-23 20:12     ` Konrad Rzeszutek Wilk
  2016-03-23 20:21       ` Konrad Rzeszutek Wilk
  2016-03-24  3:15     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-23 20:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

On Wed, Mar 23, 2016 at 05:18:39AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > +### XEN_SYSCTL_XSPLICE_LIST (2)
> > +
> > +Retrieve an array of abbreviated status and names of payloads that are loaded in the
> > +hypervisor.
> > +
> > +The caller provides:
> > +
> > + * `version`. Initially (on first hypercall) *MUST* be zero.
> > + * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> > + * `nr` the max number of entries to populate.
> > + * `pad` - *MUST* be zero.
> > + * `status` virtual address of where to write `struct xen_xsplice_status`
> > +   structures. Caller *MUST* allocate up to `nr` of them.
> > + * `name` - virtual address of where to write the unique name of the payload.
> > +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> > +   **XEN_XSPLICE_NAME_SIZE** size.
> > + * `len` - virtual address of where to write the length of each unique name
> > +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> > +   of sizeof(uint32_t) (4 bytes).
> > +
> > +If the hypercall returns an positive number, it is the number (upto `nr`
> > +provided to the hypercall) of the payloads returned, along with `nr` updated
> > +with the number of remaining payloads, `version` updated (it may be the same
> > +across hypercalls - if it varies the data is stale and further calls could
> > +fail). The `status`, `name`, and `len`' are updated at their designed index
> > +value (`idx`) with the returned value of data.
> > +
> > +If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
> > +lowered.
> > +
> > +If the hypercall returns an zero value there are no more payloads.
> > +
> > +Note that due to the asynchronous nature of hypercalls the control domain might
> > +have added or removed a number of payloads making this information stale. It is
> > +the responsibility of the toolstack to use the `version` field to check
> > +between each invocation. if the version differs it should discard the stale
> > +data and start from scratch. It is OK for the toolstack to use the new
> > +`version` field.
> > +
> > +The `struct xen_xsplice_status` structure contains an status of payload which includes:
> > +
> > + * `status` - indicates the current status of the payload:
> > +   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
> > +   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
> > +   *  No other value is possible.
> > + * `rc` - -XEN_EXX type errors encountered while performing the last
> > +   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
> > +   respectively mean: success or operation in progress. Other values
> > +   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
> > +   have changed.
> > +
> > +The structure is as follow:
> > +
> > +<pre>
> > +struct xen_sysctl_xsplice_list {  
> > +    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
> > +                                               On subsequent calls reuse value.  
> > +                                               If varies between calls, we are  
> > +                                             * getting stale data. */  
> > +    uint32_t idx;                           /* IN/OUT: Index into array. */ 
> > +    uint32_t nr;                            /* IN: How many status, names, and len  
> > +                                               should fill out.  
> > +                                               OUT: How many payloads left. */  
> 
> I think there's an ambiguity left in both the description above and
> the comments here: With idx required to be zero upon first
> invocation (which I'm not clear why that is), which parts of the

That is actually a stale design choice. Initially the "How many payloads left"
was going to be stamped in 'idx', but it is now in 'nr'.

The value can be arbitrary, albeit on first invocation it should be 0, otherwise
you won't get 'nr' telling you how many payloads there are left, unless your
'idx' falls below the number of payloads.

As in, say we have 20 payloads.
If the first hypercall passes 30 as 'idx', then the hypercall will return -EINVAL.
If the first hypercall passes 19 as 'idx', then the hypercall will populate
->name, ->len, ->status, ->version and write ->nr with 1.

> three arrays get filled when idx is non-zero: [0, idx) or [nr, nr + idx)?

I am going to assume that you are filling the two /*IN*/ entries, so ->idx
and ->nr.

[0, idx]:

If there is data and the amount of payloads is greater than idx (0), and there
are no hypercall preemptions, then:

->nr = remaining amount
->version = version value
->name[0..idx]
->len[0..idx]
->status[0..idx]


[nr, nr + idx]:

If there is data and the amount of payloads is less than nr, then -EINVAL
is returned.

If there is data and the amount of payloads is greater than 'nr', and there
are no hypercall preemptions, then:

->nr = remaining amount
->version = version value
->name[nr..nr + idx]
->len[nr..nr+idx]
->status[nr..nr+idx]

Let me update the design doc to remove the /*IN*/ part about the 'idx'.


diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
index 6aa5a27..8252e6c 100644
--- a/docs/misc/xsplice.markdown
+++ b/docs/misc/xsplice.markdown
@@ -487,7 +487,9 @@ hypervisor.
 The caller provides:
 
  * `version`. Initially (on first hypercall) *MUST* be zero.
- * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
+ * `idx` index iterator. The index into the hypervisor's payload count. It is
+    recommended that on first invocation zero be used so that `nr` (which the
+    hypervisor will update with the remaining payload count) be provided.
  * `nr` the max number of entries to populate.
  * `pad` - *MUST* be zero.
  * `status` virtual address of where to write `struct xen_xsplice_status`
@@ -538,9 +540,9 @@ struct xen_sysctl_xsplice_list {
                                                On subsequent calls reuse value.  
                                                If varies between calls, we are  
                                              * getting stale data. */  
-    uint32_t idx;                           /* IN/OUT: Index into array. */  
+    uint32_t idx;                           /* IN: Index into array. */  
     uint32_t nr;                            /* IN: How many status, names, and len  
-                                               should fill out.  
+                                               should be filled out.  
                                                OUT: How many payloads left. */  
     uint32_t pad;                           /* IN: Must be zero. */  
     XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
> 
> Jan
> 

^ permalink raw reply related	[flat|nested] 124+ messages in thread

* Re: [PATCH v4 11/34] xsplice: Design document
  2016-03-23 20:12     ` Konrad Rzeszutek Wilk
@ 2016-03-23 20:21       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-23 20:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

> diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
> index 6aa5a27..8252e6c 100644
> --- a/docs/misc/xsplice.markdown
> +++ b/docs/misc/xsplice.markdown
> @@ -487,7 +487,9 @@ hypervisor.
>  The caller provides:
>  
>   * `version`. Initially (on first hypercall) *MUST* be zero.
> - * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> + * `idx` index iterator. The index into the hypervisor's payload count. It is
> +    recommended that on first invocation zero be used so that `nr` (which the
> +    hypervisor will update with the remaining payload count) be provided.
>   * `nr` the max number of entries to populate.
>   * `pad` - *MUST* be zero.
>   * `status` virtual address of where to write `struct xen_xsplice_status`
> @@ -538,9 +540,9 @@ struct xen_sysctl_xsplice_list {
>                                                 On subsequent calls reuse value.  
>                                                 If varies between calls, we are  
>                                               * getting stale data. */  
> -    uint32_t idx;                           /* IN/OUT: Index into array. */  
> +    uint32_t idx;                           /* IN: Index into array. */  
>      uint32_t nr;                            /* IN: How many status, names, and len  
> -                                               should fill out.  
> +                                               should be filled out.  
>                                                 OUT: How many payloads left. */  
>      uint32_t pad;                           /* IN: Must be zero. */  
>      XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
> > 
> > Jan
> > 

And it occurred to me that we can do a probe call similar to XEN_VERSION.

That is, fill 'nr' with zero, in which case ->names, ->status, ->list, etc. can
be NULL. Then 'nr' will be filled back with the number of payloads.
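
A minimal sketch of that probe convention, as a user-space model rather than
the actual sysctl handler. The struct layout, the helper name list_payloads()
and the payload table are assumptions made up for illustration; only the
"nr == 0 means probe, buffers may be NULL" rule comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xen_sysctl_xsplice_list. */
struct list_req {
    uint32_t idx;   /* IN: index of the first payload to report. */
    uint32_t nr;    /* IN: entries to fill. OUT: payloads left (total on probe). */
    int *status;    /* OUT: may be NULL only for a probe (nr == 0). */
};

/* Pretend the hypervisor currently tracks these payload states. */
static const int payload_status[] = { 1, 2, 2, 1, 3 };
#define NR_PAYLOADS \
    ((uint32_t)(sizeof(payload_status) / sizeof(payload_status[0])))

/* Returns the number of entries written, or -1 on bad arguments. */
static int list_payloads(struct list_req *req)
{
    uint32_t i, n;

    assert(req != NULL);

    /* Probe: no buffer is touched, only the payload count is reported. */
    if ( req->nr == 0 )
    {
        req->nr = NR_PAYLOADS;
        return 0;
    }

    if ( req->status == NULL || req->idx >= NR_PAYLOADS )
        return -1;

    /* Fill at most 'nr' entries starting at 'idx'. */
    n = NR_PAYLOADS - req->idx;
    if ( n > req->nr )
        n = req->nr;

    for ( i = 0; i < n; i++ )
        req->status[i] = payload_status[req->idx + i];

    /* Tell the caller how many payloads are still left to fetch. */
    req->nr = NR_PAYLOADS - (req->idx + n);
    return (int)n;
}
```

A caller would probe once with nr = 0, size its buffers from the returned
count, and then iterate with increasing idx until nr comes back as zero.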

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-23  8:56                         ` Jan Beulich
@ 2016-03-24  2:37                           ` Konrad Rzeszutek Wilk
  2016-03-24  9:15                             ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24  2:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

. fixed all of those ..
> > --- a/xen/xsm/flask/hooks.c
> > +++ b/xen/xsm/flask/hooks.c
> > @@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
> >      }
> >  }
> >  
> > +static int flask_version_op (uint32_t op)
> > +{
> > +    u32 dsid = domain_sid(current->domain);
> > +
> > +    switch ( op )
> > +    {
> > +    case XEN_VERSION_version:
> > +    case XEN_VERSION_platform_parameters:
> > +    case XEN_VERSION_get_features:
> > +        /* These MUST always be accessible to any guest by default. */
> > +        return 0;
> 
> Perhaps these would better be taken care of in xsm_version_op()?

It would be the oddball one.
All of the xsm_**() in the header file (include/xsm/xsm.h) call the function
pointers.

> (That consideration then also applies to the other patch of course.)

Here is the updated patch:

From 10ecad7469a5ba12895418aa2d035def852654e7 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Tue, 22 Mar 2016 16:53:19 -0400
Subject: [PATCH] HYPERCALL_version_op. New hypercall mirroring XENVER_ but
 sane.

This hypercall mirrors the XENVER_ in that it has similar functionality.
However it is designed differently:
 - No compat layer. The data structures are the same size on 32
   as on 64-bit.
 - The hypercall accepts three arguments - the command, pointer to
   a buffer, and the length of the buffer.
 - Each sub-op can be "probed" for size by returning the size of
   buffer that will be needed - if the buffer is NULL.
 - Subops can complete even if the buffer is too small - truncated
   data will be filled and hypercall will return -ENOBUFS.
 - VERSION_commandline, VERSION_changeset are privileged.
 - There is no XENVER_compile_info equivalent.
 - The hypercall can return -EPERM and toolstack/OSes are expected
   to deal with it. However there are three subops: XEN_VERSION_version,
   XEN_VERSION_platform_parameters and XEN_VERSION_get_features
   that will always return a value as guests cannot survive without them.
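
The probe and truncation rules above can be modelled in plain user-space C.
This is only a sketch of the copy-out step: version_copy_out() is a
hypothetical helper name, memcpy() stands in for copy_to_guest(), and a NULL
pointer stands in for a NULL guest handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the tail end of the hypercall.
 *   arg == NULL  -> probe: report how many bytes the sub-op needs.
 *   len <  sz    -> copy the first len bytes, then report -ENOBUFS.
 *   otherwise    -> copy sz bytes and return sz.
 */
static long version_copy_out(void *arg, size_t len,
                             const void *src, size_t sz)
{
    size_t bytes = sz < len ? sz : len;

    assert(src != NULL && sz > 0);

    if ( arg == NULL )
        return (long)sz;

    /* Truncated data is still handed back to the caller. */
    memcpy(arg, src, bytes);

    return sz > len ? -ENOBUFS : (long)sz;
}
```

A caller that does not know the size up front would probe with a NULL buffer,
allocate that many bytes, and call again; a caller with a fixed buffer has to
treat -ENOBUFS as "data present but cut short".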

While we combine some of the common code between XENVER_ and VERSION_,
take the liberty of moving the pae_extended_cr3 check into the x86 area.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> [XSM bits]

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v1-v3: Was not part of the series.
v4: New posting.
v5: Remove memset and use {}. Tweak copy_to_guest and capabilities_info,
    add ASSERT(sz) per Andrew's review. Add cached=1 back in.
    Per Jan, s/VERSION_OP/VERSION/, squash size check with do_version_op,
    update the comments. Dropped Andrew's Review-by. Ate newlines.
    Added initcall to guard against garbage being set in cached data.
    Folded code populating cache in __init. s/char/char[]/ in public.h
---
---
 tools/flask/policy/policy/modules/xen/xen.te |   7 +-
 xen/arch/arm/traps.c                         |   1 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/x86_64/compat/entry.S           |   2 +
 xen/arch/x86/x86_64/entry.S                  |   2 +
 xen/common/compat/kernel.c                   |   2 +
 xen/common/kernel.c                          | 213 ++++++++++++++++++++++-----
 xen/include/public/arch-arm.h                |   2 +
 xen/include/public/version.h                 |  70 ++++++++-
 xen/include/public/xen.h                     |   1 +
 xen/include/xen/hypercall.h                  |   4 +
 xen/include/xsm/dummy.h                      |  21 +++
 xen/include/xsm/xsm.h                        |   6 +
 xen/xsm/dummy.c                              |   1 +
 xen/xsm/flask/hooks.c                        |  35 +++++
 xen/xsm/flask/policy/access_vectors          |  21 ++-
 16 files changed, 346 insertions(+), 43 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index e174e48..7e69ce9 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -74,11 +74,12 @@ allow dom0_t xen_t:xen2 {
     get_symbol
 };
 
-# Allow dom0 to use all XENVER_ subops that have checks.
+# Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
 # Note that dom0 is part of domain_type so this has duplicates.
 allow dom0_t xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle xen_commandline
+    extraversion capabilities changeset pagesize guest_handle commandline
 };
 
 allow dom0_t xen_t:mmu memorymap;
@@ -145,10 +146,12 @@ if (guest_writeconsole) {
 # pmu_ctrl is for)
 allow domain_type xen_t:xen2 pmu_use;
 
-# For normal guests all possible except XENVER_commandline.
+# For normal guests all possible except XENVER_commandline, VERSION_changeset,
+# and VERSION_commandline
 allow domain_type xen_t:version {
     xen_extraversion xen_compile_info xen_capabilities
     xen_changeset xen_pagesize xen_guest_handle
+    extraversion capabilities pagesize guest_handle
 };
 
 ###############################################################################
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 83744e8..31d2115 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1235,6 +1235,7 @@ static arm_hypercall_t arm_hypercall_table[] = {
     HYPERCALL(multicall, 2),
     HYPERCALL(platform_op, 1),
     HYPERCALL_ARM(vcpu_op, 3),
+    HYPERCALL(version_op, 3),
 };
 
 #ifndef NDEBUG
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 80d59ff..f16b590 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5322,6 +5322,7 @@ static const struct {
     COMPAT_CALL(platform_op),
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
+    HYPERCALL(version_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 33e2c12..fd25e84 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -394,6 +394,7 @@ ENTRY(compat_hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall           /* reserved for XenClient */
         .quad do_xenpmu_op              /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8)
         .quad compat_ni_hypercall
         .endr
@@ -445,6 +446,7 @@ ENTRY(compat_hypercall_args_table)
         .byte 1 /* do_tmem_op               */
         .byte 0 /* reserved for XenClient   */
         .byte 2 /* do_xenpmu_op             */  /* 40 */
+        .byte 3 /* do_version_op            */
         .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table)
         .byte 0 /* compat_ni_hypercall      */
         .endr
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 07ef096..b0e7257 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -730,6 +730,7 @@ ENTRY(hypercall_table)
         .quad do_tmem_op
         .quad do_ni_hypercall       /* reserved for XenClient */
         .quad do_xenpmu_op          /* 40 */
+        .quad do_version_op
         .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8)
         .quad do_ni_hypercall
         .endr
@@ -781,6 +782,7 @@ ENTRY(hypercall_args_table)
         .byte 1 /* do_tmem_op           */
         .byte 0 /* reserved for XenClient */
         .byte 2 /* do_xenpmu_op         */  /* 40 */
+        .byte 3 /* do_version_op        */
         .rept __HYPERVISOR_arch_0-(.-hypercall_args_table)
         .byte 0 /* do_ni_hypercall      */
         .endr
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
index df93fdd..7a7ca53 100644
--- a/xen/common/compat/kernel.c
+++ b/xen/common/compat/kernel.c
@@ -39,6 +39,8 @@ CHECK_TYPE(capabilities_info);
 
 CHECK_TYPE(domain_handle);
 
+CHECK_TYPE(version_op_val);
+
 #define xennmi_callback compat_nmi_callback
 #define xennmi_callback_t compat_nmi_callback_t
 
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index a4a3c36..fb25359 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -221,6 +221,47 @@ void __init do_initcalls(void)
 
 #endif
 
+static int get_features(struct domain *d, xen_feature_info_t *fi)
+{
+    switch ( fi->submap_idx )
+    {
+    case 0:
+        fi->submap = (1U << XENFEAT_memory_op_vnode_supported);
+        if ( paging_mode_translate(d) )
+            fi->submap |=
+                (1U << XENFEAT_writable_page_tables) |
+                (1U << XENFEAT_auto_translated_physmap);
+        if ( is_hardware_domain(d) )
+            fi->submap |= 1U << XENFEAT_dom0;
+#ifdef CONFIG_X86
+        if ( VM_ASSIST(d, pae_extended_cr3) )
+            fi->submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            fi->submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                          (1U << XENFEAT_highmem_assist) |
+                          (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_supervisor_mode_kernel) |
+                          (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            fi->submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                          (1U << XENFEAT_hvm_callback_vector) |
+                          (1U << XENFEAT_hvm_pirqs);
+            break;
+        }
+#endif
+        break;
+    default:
+        return -EINVAL;
+    }
+    return 0;
+}
+
 /*
  * Simple hypercalls.
  */
@@ -298,47 +339,14 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENVER_get_features:
     {
         xen_feature_info_t fi;
-        struct domain *d = current->domain;
+        int rc;
 
         if ( copy_from_guest(&fi, arg, 1) )
             return -EFAULT;
 
-        switch ( fi.submap_idx )
-        {
-        case 0:
-            fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
-            if ( VM_ASSIST(d, pae_extended_cr3) )
-                fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
-            if ( paging_mode_translate(d) )
-                fi.submap |= 
-                    (1U << XENFEAT_writable_page_tables) |
-                    (1U << XENFEAT_auto_translated_physmap);
-            if ( is_hardware_domain(d) )
-                fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
-            break;
-        default:
-            return -EINVAL;
-        }
+        rc = get_features(current->domain, &fi);
+        if ( rc )
+            return rc;
 
         if ( __copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
@@ -381,6 +389,123 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return -ENOSYS;
 }
 
+/* Computed by kernel_cache_init. */
+static xen_capabilities_info_t __read_mostly cached_cap;
+static unsigned int __read_mostly cached_cap_len;
+
+/*
+ * Similar to HYPERVISOR_xen_version but with a sane interface
+ * (has a length, one can probe for the length) and with one less sub-ops:
+ * missing XENVER_compile_info.
+ */
+DO(version_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg,
+               unsigned int len)
+{
+    union {
+        xen_version_op_val_t val;
+        xen_feature_info_t fi;
+    } u = {};
+    unsigned int sz = 0;
+    const void *ptr = NULL;
+    int rc = xsm_version_op(XSM_OTHER, cmd);
+
+    /* We can safely return -EPERM! */
+    if ( rc )
+        return rc;
+
+    /*
+     * HYPERVISOR_xen_version differs in that some sub-ops return the value
+     * and some copy it back to the argument. We follow the same rule for all
+     * sub-ops: on success return the (positive) number of bytes, on failure
+     * a negative error, and always copy the result into arg. Yay, sanity!
+     */
+    switch ( cmd )
+    {
+    case XEN_VERSION_version:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = (xen_major_version() << 16) | xen_minor_version();
+        break;
+
+    case XEN_VERSION_extraversion:
+        sz = strlen(xen_extra_version()) + 1;
+        ptr = xen_extra_version();
+        break;
+
+    case XEN_VERSION_capabilities:
+        sz = cached_cap_len;
+        ptr = cached_cap;
+        break;
+
+    case XEN_VERSION_changeset:
+        sz = strlen(xen_changeset()) + 1;
+        ptr = xen_changeset();
+        break;
+
+    case XEN_VERSION_platform_parameters:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = HYPERVISOR_VIRT_START;
+        break;
+
+    case XEN_VERSION_get_features:
+        sz = sizeof(xen_feature_info_t);
+
+        if ( guest_handle_is_null(arg) )
+            break;
+
+        if ( copy_from_guest(&u.fi, arg, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        rc = get_features(current->domain, &u.fi);
+        break;
+
+    case XEN_VERSION_pagesize:
+        sz = sizeof(xen_version_op_val_t);
+        u.val = PAGE_SIZE;
+        break;
+
+    case XEN_VERSION_guest_handle:
+        sz = ARRAY_SIZE(current->domain->handle);
+        ptr = current->domain->handle;
+        break;
+
+    case XEN_VERSION_commandline:
+        sz = strlen(saved_cmdline) + 1;
+        ptr = saved_cmdline;
+        break;
+
+    default:
+        rc = -ENOSYS;
+    }
+
+    if ( rc )
+        return rc;
+
+    /*
+     * This hypercall also allows the client to probe. If it provides
+     * a NULL arg we will return the size of the space it has to
+     * allocate for the specific sub-op.
+     */
+    ASSERT(sz);
+    if ( guest_handle_is_null(arg) )
+        return sz;
+
+    if ( !rc )
+    {
+        unsigned int bytes = min(sz, len);
+
+        if ( copy_to_guest(arg, ptr ? : &u, bytes) )
+            rc = -EFAULT;
+
+        /* We return len (truncate) worth of data even if we fail. */
+        if ( !rc && sz > len )
+            rc = -ENOBUFS;
+    }
+
+    return rc == 0 ? sz : rc;
+}
+
 DO(nmi_op)(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct xennmi_callback cb;
@@ -418,6 +543,20 @@ DO(ni_hypercall)(void)
     return -ENOSYS;
 }
 
+static int __init kernel_cache_init(void)
+{
+    /*
+     * Pre-allocate the cache so we do not have to worry about
+     * simultaneous invocations on safe_strcat by guests and the cache
+     * data becoming garbage.
+     */
+    arch_get_xen_caps(&cached_cap);
+    cached_cap_len = strlen(cached_cap) + 1;
+
+    return 0;
+}
+__initcall(kernel_cache_init);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 870bc3b..5f90718 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -128,6 +128,8 @@
  *    * VCPUOP_register_vcpu_info
  *    * VCPUOP_register_runstate_memory_area
  *
+ *  HYPERVISOR_version_op
+ *   All generic sub-operations
  *
  * Other notes on the ARM ABI:
  *
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 24a582f..d71ec5b 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -30,7 +30,14 @@
 
 #include "xen.h"
 
-/* NB. All ops return zero on success, except XENVER_{version,pagesize} */
+/*
+ * There are two hypercalls mentioned in here. The XENVER_ are for
+ * HYPERCALL_xen_version (17), while VERSION_ are for the
+ * HYPERCALL_version_op (41).
+ *
+ * The subops are very similar except that the latter hypercall has a
+ * sane interface.
+ */
 
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
@@ -87,6 +94,67 @@ typedef struct xen_feature_info xen_feature_info_t;
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
+/*
+ * The HYPERCALL_version_op has a set of sub-ops which mirror the
+ * sub-ops of HYPERCALL_xen_version. However this hypercall differs
+ * radically from the former:
+ *  - It returns the amount of bytes returned.
+ *  - It will return -XEN_EPERM if the guest is not permitted
+ *    (Albeit XEN_VERSION_version, XEN_VERSION_platform_parameters, and
+ *    XEN_VERSION_get_features will always return a value as guests cannot
+ *    survive without this information).
+ *  - It will return the requested data in arg.
+ *  - It requires a third argument (len) for the length of the
+ *    arg. Naturally the arg has to fit the requested data otherwise
+ *    -XEN_ENOBUFS is returned.
+ *
+ * It also offers a mechanism to probe for the amount of bytes a
+ * sub-op will require. Having the arg have a NULL handle will
+ * return the number of bytes required for the operation. Or a
+ * negative value if an error is encountered.
+ */
+
+typedef uint64_t xen_version_op_val_t;
+DEFINE_XEN_GUEST_HANDLE(xen_version_op_val_t);
+
+/*
+ * arg == xen_version_op_val_t. Encoded as major:minor (31..16:15..0), while
+ * 63..32 are zero.
+ */
+#define XEN_VERSION_version             0
+
+/* arg == char[]. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_extraversion        1
+
+/* arg == char[]. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_capabilities        3
+
+/* arg == char[]. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_changeset           4
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_platform_parameters 5
+
+/*
+ * arg = xen_feature_info_t - shares the same structure
+ * as the XENVER_get_features.
+ */
+#define XEN_VERSION_get_features        6
+
+/* arg == xen_version_op_val_t. */
+#define XEN_VERSION_pagesize            7
+
+/*
+ * arg == void.
+ *
+ * The toolstack fills it out for guest consumption. It is intended to hold
+ * the UUID of the guest.
+ */
+#define XEN_VERSION_guest_handle        8
+
+/* arg = char[]. Contains NUL terminated utf-8 string. */
+#define XEN_VERSION_commandline         9
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..1a99929 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -115,6 +115,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_tmem_op              38
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
+#define __HYPERVISOR_version_op           41 /* supersedes xen_version (17) */
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 0c8ae0e..e8d2b81 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -147,6 +147,10 @@ do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);
 extern long
 do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg);
 
+extern long
+do_version_op(unsigned int cmd,
+    XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int len);
+
 #ifdef CONFIG_COMPAT
 
 extern int
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index abbe282..e5dad35 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -751,3 +751,24 @@ static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
         return xsm_default_action(XSM_PRIV, current->domain, NULL);
     }
 }
+
+static XSM_INLINE int xsm_version_op (XSM_DEFAULT_ARG uint32_t op)
+{
+    XSM_ASSERT_ACTION(XSM_OTHER);
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+    case XEN_VERSION_platform_parameters:
+    case XEN_VERSION_get_features:
+        /* These MUST always be accessible to any guest by default. */
+        return 0;
+    case XEN_VERSION_extraversion:
+    case XEN_VERSION_capabilities:
+    case XEN_VERSION_pagesize:
+    case XEN_VERSION_guest_handle:
+        /* These can be accessible to a guest. */
+        return xsm_default_action(XSM_HOOK, current->domain, NULL);
+    default:
+        return xsm_default_action(XSM_PRIV, current->domain, NULL);
+    }
+}
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 5ecbee0..ac80472 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -194,6 +194,7 @@ struct xsm_operations {
     int (*pmu_op) (struct domain *d, unsigned int op);
 #endif
     int (*xen_version) (uint32_t cmd);
+    int (*version_op) (uint32_t cmd);
 };
 
 #ifdef CONFIG_XSM
@@ -737,6 +738,11 @@ static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
     return xsm_ops->xen_version(op);
 }
 
+static inline int xsm_version_op (xsm_default_t def, uint32_t op)
+{
+    return xsm_ops->version_op(op);
+}
+
 #endif /* XSM_NO_WRAPPERS */
 
 #ifdef CONFIG_MULTIBOOT
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9791ad4..776dd09 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -163,4 +163,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, pmu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
+    set_to_dummy_if_null(ops, version_op);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 2069cb3..1eaec58 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
     }
 }
 
+static int flask_version_op (uint32_t op)
+{
+    u32 dsid = domain_sid(current->domain);
+
+    switch ( op )
+    {
+    case XEN_VERSION_version:
+    case XEN_VERSION_platform_parameters:
+    case XEN_VERSION_get_features:
+        /* These MUST always be accessible to any guest by default. */
+        return 0;
+    case XEN_VERSION_extraversion:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__EXTRAVERSION, NULL);
+    case XEN_VERSION_capabilities:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CAPABILITIES, NULL);
+    case XEN_VERSION_changeset:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__CHANGESET, NULL);
+    case XEN_VERSION_pagesize:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__PAGESIZE, NULL);
+    case XEN_VERSION_guest_handle:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__GUEST_HANDLE, NULL);
+    case XEN_VERSION_commandline:
+        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
+                            VERSION__COMMANDLINE, NULL);
+    default:
+        return -EPERM;
+    }
+}
+
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 
@@ -1797,6 +1831,7 @@ static struct xsm_operations flask_ops = {
     .pmu_op = flask_pmu_op,
 #endif
     .xen_version = flask_xen_version,
+    .version_op = flask_version_op,
 };
 
 static __init void flask_init(void)
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index badcf1c..56600bb 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -496,12 +496,14 @@ class security
     del_ocontext
 }
 
-# Class version is used to describe the XENVER_ hypercall.
+# Class version is used to describe the XENVER_ and VERSION hypercall.
 # Almost all sub-ops are described here - in the default case all of them should
-# be allowed except the XENVER_commandline.
+# be allowed except the XENVER_commandline, VERSION_commandline, and
+# VERSION_changeset.
 #
 # The ones that are omitted are XENVER_version, XENVER_platform_parameters,
-# and XENVER_get_features  - as they MUST always be returned to a guest.
+# XENVER_get_features, XEN_VERSION_version, XEN_VERSION_platform_parameters,
+# and XEN_VERSION_get_features - as they MUST always be returned to a guest.
 #
 class version
 {
@@ -519,4 +521,17 @@ class version
     xen_guest_handle
 # Xen command line.
     xen_commandline
+# --- VERSION hypercall ---
+# Extra information (-unstable).
+    extraversion
+# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
+    capabilities
+# Source code changeset.
+    changeset
+# Page size the hypervisor uses.
+    pagesize
+# A value that the control stack can choose.
+    guest_handle
+# Xen command line.
+    commandline
 }
-- 
2.5.0



* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-23  8:19       ` Jan Beulich
  2016-03-23 11:17         ` Julien Grall
@ 2016-03-24  2:49         ` Konrad Rzeszutek Wilk
  2016-03-24  9:20           ` Jan Beulich
  1 sibling, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24  2:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

> >> > +#ifdef CONFIG_X86
> >> > +#include <asm/uaccess.h>
> >> > +#endif
> >> 
> >> Why?
> > 
> > Otherwise the compilation will fail on ARM as they do not have exceptions
> > (and no asm/uaccess.h file)
> 
> Well, the question was for the #include, not the #ifdef.

Ah, yes. And with the 'ex' being pointers it does not matter.

> 
> > --- a/xen/common/symbols.c
> > +++ b/xen/common/symbols.c
> > @@ -17,6 +17,7 @@
> >  #include <xen/lib.h>
> >  #include <xen/string.h>
> >  #include <xen/spinlock.h>
> > +#include <xen/virtual_region.h>
> >  #include <public/platform.h>
> >  #include <xen/guest_access.h>
> >  
> > @@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
> >  
> >  bool_t is_active_kernel_text(unsigned long addr)
> >  {
> > -    return (is_kernel_text(addr) ||
> > -            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
> > +    return !!search_virtual_regions(addr);
> 
> search_virtual_regions() doesn't sound like it would be looking for
> text addresses only.

I am not sure what would be a better name - as it
(search_virtual_regions) is used by three other callers?

search_for_addr? 

.. snip..
> > +static void __unregister_virtual_region(struct virtual_region *r)
> 
> > +{
> > +    unsigned long flags;
> > +
> > +    spin_lock_irqsave(&virtual_region_lock, flags);
> > +    list_del_rcu(&r->list);
> > +    spin_unlock_irqrestore(&virtual_region_lock, flags);
> > +    /*
> > +     * We do not need to invoke call_rcu.
> > +     *
> > +     * This is due to the fact that on the deletion we have made sure
> > +     * to use spinlocks (to guard against somebody else calling
> > +     * unregister_virtual_region) and list_deletion spiced with an memory
> > +     * barrier - which will flush out the cache lines in other CPUs.
> 
> I don't think barriers do any kind of cache flushing on remote CPUs
> (not even on the local one).

I am not sure what I had been thinking. The only thing it does is a
memory barrier.

.. snip..

I believe I've addressed the review comments you had:
From 4a2690ba815db7edd5fe075c8d7e9e2ac62a0020 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Thu, 10 Mar 2016 16:35:50 -0500
Subject: [PATCH] arm/x86: Use struct virtual_region to do bug, symbol, and
 (x86) exception tables lookup.

During execution of the hypervisor we have two regions of
executable code - _stext -> _etext, and _sinittext -> _einittext.

The latter is not needed after bootup.

We also have various built-in macros and functions to search
in between those two swaths depending on the state of the system.

That is either for bug_frames, exceptions (x86) or symbol
names for the instruction.

With xSplice in the picture - we need a mechanism for new payloads
to be searched as well for all of this.

Originally we had extra 'if (xsplice)...' but that gets
a bit tiring and does not hook up nicely.

This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol names entries without having various calls in
other sub-components in the system.

Code which wishes to participate in bug_frames and exception table
entries search has to only use two public APIs:
 - register_virtual_region
 - unregister_virtual_region

to let the core code know.

If the ->lookup_symbol is not set then the default internal symbol lookup
mechanism is used.
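
As a rough illustration of the idea (not the code in this patch), the
register / unregister / search flow and the ->lookup_symbol fallback can be
modelled with a plain singly linked list. Everything here is an assumption
for the sake of the example: the struct region layout, the half-open
[start, end) range, and all function names. The real implementation uses RCU
lists and a spinlock, which this sketch leaves out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct virtual_region. */
struct region {
    struct region *next;
    unsigned long start, end;                         /* text is [start, end) */
    const char *(*lookup_symbol)(unsigned long addr); /* optional hook */
};

static struct region *region_list;

static void register_region(struct region *r)
{
    assert(r != NULL && r->start < r->end);
    r->next = region_list;
    region_list = r;
}

static void unregister_region(struct region *r)
{
    struct region **pp;

    for ( pp = &region_list; *pp; pp = &(*pp)->next )
        if ( *pp == r )
        {
            *pp = r->next;
            r->next = NULL;
            return;
        }
}

/* Find the region whose text range covers addr, or NULL. */
static struct region *find_region(unsigned long addr)
{
    struct region *r;

    for ( r = region_list; r; r = r->next )
        if ( addr >= r->start && addr < r->end )
            return r;

    return NULL;
}

/* Stand-in for the built-in symbol table lookup. */
static const char *default_lookup(unsigned long addr)
{
    (void)addr;
    return "core-symbol";
}

/* Example of a hook a payload might supply. */
static const char *payload_lookup(unsigned long addr)
{
    (void)addr;
    return "payload-symbol";
}

/* Use the region's own hook when set, else fall back to the default. */
static const char *symbol_for(unsigned long addr)
{
    struct region *r = find_region(addr);

    if ( !r )
        return NULL;

    return r->lookup_symbol ? r->lookup_symbol(addr) : default_lookup(addr);
}
```

The point of the indirection is that callers such as the bug frame or
exception handling code only ever ask "which region covers this address?" and
never need an explicit 'if (xsplice)' branch.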

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

---
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v4: New patch.
v5:
 - Rename to virtual_region.
 - Ditch the 'skip' function.
 - Remove the _stext.
 - Use RCU lists.
 - Add a search function.
 - Remove extern, add rcu_read_lock. remove __ from name.
---
---
 xen/arch/arm/setup.c             |   4 +
 xen/arch/arm/traps.c             |  39 ++++++----
 xen/arch/x86/extable.c           |  12 ++-
 xen/arch/x86/setup.c             |   6 ++
 xen/arch/x86/traps.c             |  40 ++++++----
 xen/common/Makefile              |   1 +
 xen/common/symbols.c             |  11 ++-
 xen/common/virtual_region.c      | 160 +++++++++++++++++++++++++++++++++++++++
 xen/include/xen/symbols.h        |   9 +++
 xen/include/xen/virtual_region.h |  48 ++++++++++++
 10 files changed, 293 insertions(+), 37 deletions(-)
 create mode 100644 xen/common/virtual_region.c
 create mode 100644 xen/include/xen/virtual_region.h

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 6d205a9..09ff1ea 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -34,6 +34,7 @@
 #include <xen/keyhandler.h>
 #include <xen/cpu.h>
 #include <xen/pfn.h>
+#include <xen/virtual_region.h>
 #include <xen/vmap.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/acpi.h>
@@ -860,6 +861,9 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     system_state = SYS_STATE_active;
 
+    /* Must be done past setting system_state. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(dom0);
 
     /* Switch on to the dynamically allocated stack for the idle vcpu
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 31d2115..69ccb47 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -31,6 +31,7 @@
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
 #include <xen/perfc.h>
+#include <xen/virtual_region.h>
 #include <public/sched.h>
 #include <public/xen.h>
 #include <asm/debugger.h>
@@ -101,6 +102,8 @@ integer_param("debug_stack_lines", debug_stack_lines);
 
 void init_traps(void)
 {
+    setup_virtual_regions();
+
     /* Setup Hyp vector base */
     WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
 
@@ -1077,27 +1080,33 @@ void do_unexpected_trap(const char *msg, struct cpu_user_regs *regs)
 
 int do_bug_frame(struct cpu_user_regs *regs, vaddr_t pc)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     const char *prefix = "", *filename, *predicate;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = search_for_addr(pc);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
 
-        if ( ((vaddr_t)bug_loc(bug)) == pc )
-            break;
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( ((vaddr_t)bug_loc(b)) == pc )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-
-    if ( !stop_frames[id] )
+ found:
+    if ( !bug )
         return -ENOENT;
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..14cd533 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -1,10 +1,12 @@
 
-#include <xen/config.h>
 #include <xen/init.h>
+#include <xen/list.h>
 #include <xen/perfc.h>
+#include <xen/rcupdate.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
 #include <asm/uaccess.h>
+#include <xen/virtual_region.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
 
@@ -80,8 +82,12 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    struct virtual_region *region = search_for_addr(addr);
+
+    if ( region && region->ex )
+        return search_one_table(region->ex, region->ex_end-1, addr);
+
+    return 0;
 }
 
 unsigned long
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ee65f55..c8a5adb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -26,6 +26,7 @@
 #include <xen/pfn.h>
 #include <xen/nodemask.h>
 #include <xen/tmem_xen.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <public/version.h>
 #include <compat/platform.h>
@@ -514,6 +515,9 @@ static void noinline init_done(void)
 
     system_state = SYS_STATE_active;
 
+    /* MUST be done prior to removing .init data. */
+    unregister_init_virtual_region();
+
     domain_unpause_by_systemcontroller(hardware_domain);
 
     /* Zero the .init code and data. */
@@ -616,6 +620,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     smp_prepare_boot_cpu();
     sort_exception_tables();
 
+    setup_virtual_regions();
+
     /* Full exception support from here on in. */
 
     loader = (mbi->flags & MBI_LOADERNAME)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6fbb1cf..3708309 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -48,6 +48,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/virtual_region.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1132,18 +1133,12 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 
 void do_invalid_op(struct cpu_user_regs *regs)
 {
-    const struct bug_frame *bug;
+    const struct bug_frame *bug = NULL;
     u8 bug_insn[2];
     const char *prefix = "", *filename, *predicate, *eip = (char *)regs->eip;
     unsigned long fixup;
-    int id, lineno;
-    static const struct bug_frame *const stop_frames[] = {
-        __stop_bug_frames_0,
-        __stop_bug_frames_1,
-        __stop_bug_frames_2,
-        __stop_bug_frames_3,
-        NULL
-    };
+    int id = -1, lineno;
+    struct virtual_region *region;
 
     DEBUGGER_trap_entry(TRAP_invalid_op, regs);
 
@@ -1160,16 +1155,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    region = search_for_addr(regs->eip);
+    if ( region )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        for ( id = 0; id < BUGFRAME_NR; id++ )
+        {
+            const struct bug_frame *b;
+            unsigned int i;
+
+            for ( i = 0, b = region->frame[id].bugs;
+                  i < region->frame[id].n_bugs; b++, i++ )
+            {
+                if ( bug_loc(b) == eip )
+                {
+                    bug = b;
+                    goto found;
+                }
+            }
+        }
     }
-    if ( !stop_frames[id] )
-        goto die;
 
+ found:
+    if ( !bug )
+        goto die;
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
     {
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 77de27e..e43ec49 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -51,6 +51,7 @@ obj-y += time.o
 obj-y += timer.o
 obj-y += trace.o
 obj-y += version.o
+obj-y += virtual_region.o
 obj-y += vm_event.o
 obj-y += vmap.o
 obj-y += vsprintf.o
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..23b142d 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,7 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <xen/virtual_region.h>
 #include <public/platform.h>
 #include <xen/guest_access.h>
 
@@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
 
 bool_t is_active_kernel_text(unsigned long addr)
 {
-    return (is_kernel_text(addr) ||
-            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
+    return !!search_for_addr(addr);
 }
 
 const char *symbols_lookup(unsigned long addr,
@@ -108,13 +108,18 @@ const char *symbols_lookup(unsigned long addr,
 {
     unsigned long i, low, high, mid;
     unsigned long symbol_end = 0;
+    struct virtual_region *region;
 
     namebuf[KSYM_NAME_LEN] = 0;
     namebuf[0] = 0;
 
-    if (!is_active_kernel_text(addr))
+    region = search_for_addr(addr);
+    if (!region)
         return NULL;
 
+    if (region->symbols_lookup)
+        return region->symbols_lookup(addr, symbolsize, offset, namebuf);
+
         /* do a binary search on the sorted symbols_addresses array */
     low = 0;
     high = symbols_num_syms;
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
new file mode 100644
index 0000000..2ff2d7b
--- /dev/null
+++ b/xen/common/virtual_region.c
@@ -0,0 +1,160 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/init.h>
+#include <xen/kernel.h>
+#include <xen/rcupdate.h>
+#include <xen/spinlock.h>
+#include <xen/virtual_region.h>
+
+#ifdef CONFIG_X86
+#include <asm/uaccess.h>
+#endif
+
+static struct virtual_region core = {
+    .list = LIST_HEAD_INIT(core.list),
+    .start = (unsigned long)_stext,
+    .end = (unsigned long)_etext,
+#ifdef CONFIG_X86
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/* Becomes irrelevant when __init sections are cleared. */
+static struct virtual_region core_init __initdata = {
+    .list = LIST_HEAD_INIT(core_init.list),
+    .start = (unsigned long)_sinittext,
+    .end = (unsigned long)_einittext,
+#ifdef CONFIG_X86
+    /* Even if they are __init their exception entry still gets stuck here. */
+    .ex = (struct exception_table_entry *)__start___ex_table,
+    .ex_end = (struct exception_table_entry *)__stop___ex_table,
+#endif
+};
+
+/*
+ * RCU locking. Additions are done either at startup (when there is only
+ * one CPU) or when all CPUs are running without IRQs.
+ *
+ * Deletions are a bit tricky. We do it when xSplicing (all CPUs running
+ * without IRQs) or during bootup (when clearing the init).
+ *
+ * Hence we use list_del_rcu (which sports a memory fence) and a spinlock
+ * on deletion.
+ *
+ * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
+ *
+ */
+static LIST_HEAD(virtual_region_list);
+static DEFINE_SPINLOCK(virtual_region_lock);
+static DEFINE_RCU_READ_LOCK(rcu_virtual_region_lock);
+
+struct virtual_region* search_for_addr(unsigned long addr)
+{
+    struct virtual_region *region;
+
+    rcu_read_lock(&rcu_virtual_region_lock);
+
+    list_for_each_entry_rcu( region, &virtual_region_list, list )
+    {
+        if ( addr >= region->start && addr < region->end )
+        {
+            rcu_read_unlock(&rcu_virtual_region_lock);
+            return region;
+        }
+    }
+
+    rcu_read_unlock(&rcu_virtual_region_lock);
+    return NULL;
+}
+
+int register_virtual_region(struct virtual_region *r)
+{
+    ASSERT(!local_irq_is_enabled());
+
+    list_add_tail_rcu(&r->list, &virtual_region_list);
+
+    return 0;
+}
+
+static void remove_virtual_region(struct virtual_region *r)
+{
+    unsigned long flags;
+
+    spin_lock_irqsave(&virtual_region_lock, flags);
+    list_del_rcu(&r->list);
+    spin_unlock_irqrestore(&virtual_region_lock, flags);
+    /*
+     * We do not need to invoke call_rcu.
+     *
+     * This is due to the fact that on the deletion we have made sure
+     * to use spinlocks (to guard against somebody else calling
+     * unregister_virtual_region) and list_deletion spiced with
+     * memory barrier.
+     *
+     * That protects us from corrupting the list as the readers all
+     * use list_for_each_entry_rcu which is safe against concurrent
+     * deletions.
+     */
+}
+
+void unregister_virtual_region(struct virtual_region *r)
+{
+    /* Expected to be called from xSplice - which has IRQs disabled. */
+    ASSERT(!local_irq_is_enabled());
+
+    remove_virtual_region(r);
+}
+
+void unregister_init_virtual_region(void)
+{
+    BUG_ON(system_state != SYS_STATE_active);
+
+    remove_virtual_region(&core_init);
+}
+
+void __init setup_virtual_regions(void)
+{
+    ssize_t sz;
+    unsigned int i;
+    static const struct bug_frame *const __initconstrel bug_frames[] = {
+        __start_bug_frames,
+        __stop_bug_frames_0,
+        __stop_bug_frames_1,
+        __stop_bug_frames_2,
+#ifdef CONFIG_X86
+        __stop_bug_frames_3,
+#endif
+        NULL
+    };
+
+    for ( i = 1; bug_frames[i]; i++ )
+    {
+        const struct bug_frame *s;
+
+        s = bug_frames[i - 1];
+        sz = bug_frames[i] - s;
+
+        core.frame[i - 1].n_bugs = sz;
+        core.frame[i - 1].bugs = s;
+
+        core_init.frame[i - 1].n_bugs = sz;
+        core_init.frame[i - 1].bugs = s;
+    }
+
+    register_virtual_region(&core_init);
+    register_virtual_region(&core);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h
index 1fa0537..f58e611 100644
--- a/xen/include/xen/symbols.h
+++ b/xen/include/xen/symbols.h
@@ -5,6 +5,15 @@
 
 #define KSYM_NAME_LEN 127
 
+/*
+ * Typedef for the callback functions that symbols_lookup
+ * can call if virtual_region_list has a callback for it.
+ */
+typedef const char *symbols_lookup_t(unsigned long addr,
+                                     unsigned long *symbolsize,
+                                     unsigned long *offset,
+                                     char *namebuf);
+
 /* Lookup an address. */
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
new file mode 100644
index 0000000..04640cc
--- /dev/null
+++ b/xen/include/xen/virtual_region.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __XEN_VIRTUAL_REGION_LIST__
+#define __XEN_VIRTUAL_REGION_LIST__
+
+#include <xen/list.h>
+#include <xen/symbols.h>
+
+struct virtual_region
+{
+    struct list_head list;
+    unsigned long start;        /* Virtual address start. */
+    unsigned long end;          /* Virtual address end. */
+
+    /*
+     * If this is NULL the default lookup mechanism is used.
+     */
+    symbols_lookup_t *symbols_lookup;
+
+    struct {
+        const struct bug_frame *bugs; /* The pointer to array of bug frames. */
+        ssize_t n_bugs;         /* The number of them. */
+    } frame[BUGFRAME_NR];
+
+    struct exception_table_entry *ex;
+    struct exception_table_entry *ex_end;
+};
+
+struct virtual_region *search_for_addr(unsigned long addr);
+void setup_virtual_regions(void);
+void unregister_init_virtual_region(void);
+int register_virtual_region(struct virtual_region *r);
+void unregister_virtual_region(struct virtual_region *r);
+
+#endif /* __XEN_VIRTUAL_REGION_LIST__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.5.0



* Re: [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-23 13:51   ` Jan Beulich
@ 2016-03-24  3:13     ` Konrad Rzeszutek Wilk
  2016-03-24  9:29       ` Jan Beulich
  0 siblings, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24  3:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

On Wed, Mar 23, 2016 at 07:51:29AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > --- a/xen/common/Kconfig
> > +++ b/xen/common/Kconfig
> > @@ -168,4 +168,15 @@ config SCHED_DEFAULT
> >  
> >  endmenu
> >  
> > +# Enable/Disable xsplice support
> > +config XSPLICE
> > +	bool "xSplice live patching support"
> > +	default y
> 
> Isn't it a little early in the series to default this to on?

I am ambitious!
> 
> And then of course the EXPERT question comes up again. No
> matter that IanC is no longer around to help with the
> argumentation, the point he has been making about too many
> flavors ending up in the wild continues to apply.

'too many flavors'? As in different versions of Xen with or without
these options enabled? 

.. snip..
> 
> > +static int find_payload(const xen_xsplice_name_t *name, struct payload **f)
> 
..snip..
> > +        return -EFAULT;
> > +
> > +    spin_lock_recursive(&payload_lock);
> 
> Why do you need a recursive lock here? I think something like this
> should be reasoned about in the commit message.

The earlier version used an extra parameter (locked) to differentiate
whether to take a lock or not as the caller could have taken it.

Andrew didn't like it particularly and asked it to be recursive
so that we don't by accident mess up the locking.
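
For readers who want to see why a recursive lock removes the need for
the 'locked' parameter, here is a toy single-threaded model. It is only
a sketch: it is not atomic, provides no real mutual exclusion, and the
names rlock_model, rlock_take, rlock_release and lookup_model are
hypothetical, not Xen's spinlock API.

```c
#include <assert.h>

/* Hypothetical toy recursive lock: an owner id plus a nesting depth. */
struct rlock_model {
    int owner;          /* -1 when nobody holds the lock. */
    unsigned int depth; /* How many times the owner has taken it. */
};

/* Returns 1 on success, 0 if a different owner currently holds it. */
int rlock_take(struct rlock_model *l, int cpu)
{
    if ( l->depth && l->owner != cpu )
        return 0;

    l->owner = cpu;
    l->depth++;

    return 1;
}

/* Drop one nesting level; the lock is free once depth reaches zero. */
void rlock_release(struct rlock_model *l, int cpu)
{
    assert(l->depth && l->owner == cpu);

    if ( --l->depth == 0 )
        l->owner = -1;
}

/*
 * A callee in the style of find_payload: it always takes the lock
 * itself, regardless of whether its caller already holds it.
 */
int lookup_model(struct rlock_model *l, int cpu)
{
    int ok = rlock_take(l, cpu);

    if ( ok )
        rlock_release(l, cpu);

    return ok;
}
```

With this shape the callee has a single variant, and a caller that
already holds the lock simply nests one level deeper.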

.. snip..
> > +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)

.. snip..
> > + out:
> > +    vfree(raw_data);
> 
> By here you allocated and filled raw_data. And now you
> unconditionally free it. What is that good for?

Nothing. It was added as a placeholder - as the patch
titled "xsplice: Implement payload loading" is actually doing
useful things. I've moved the operations around raw_data into that
patch.

> > +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> > +{
> > +    xen_xsplice_status_t status;
> > +    struct payload *data;
> > +    unsigned int idx = 0, i = 0;
> > +    int rc = 0;
> > +
> > +    if ( list->nr > 1024 )
> > +        return -E2BIG;
> > +
> > +    if ( list->pad != 0 )
> > +        return -EINVAL;
> > +
> > +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> > +         !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
> > +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
> 
> guest_handle_okay() already takes into account the element size,
> i.e. it's only the middle one which needs to do any multiplication.
> 
> > +        return -EINVAL;
> > +
> > +    spin_lock_recursive(&payload_lock);
> > +    if ( list->idx > payload_cnt || !list->nr )
> 
> The list->nr check could move up outside the locked region (e.g.
> merge with the pad field check).

I reworked this a bit. I made it so that if list->nr is 0 we would
populate list->nr=payload_count, list->version=payload_version.


> 
> > +    {
> > +        spin_unlock_recursive(&payload_lock);
> > +        return -EINVAL;
> > +    }
> > +
> > +    list_for_each_entry( data, &payload_list, list )
> 
> Aren't you lacking a list->version check prior to entering this loop
> (which would then mean you don't need to store it below, but only
> on the error path from that check)?

No. The toolstack has no idea of what the right version is on the
first invocation. Which is OK since it gets fresh data (it is
its first invocation).

On subsequent invocations we gleefully populate up to
min(payload_cnt, ->nr) of data even if the version the toolstack
provided is different. The toolstack will have to decide to throw away
the data and retry the hypercall; or print it out as is.
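
Roughly, the caller side of that could look like the sketch below. This
is a standalone model, not libxc or hypervisor code: model_list stands
in for the list hypercall, model_add and model_fetch_all are
hypothetical helpers, and the "restart when the version changed" policy
is just one of the two choices mentioned above.

```c
#include <assert.h>

#define MODEL_MAX 8

/* Hypothetical stand-ins for the hypervisor's payload list and version. */
static int model_payloads[MODEL_MAX];
static unsigned int model_cnt, model_version = 1;

/* Add an entry; any change to the list bumps the version. */
int model_add(int v)
{
    if ( model_cnt >= MODEL_MAX )
        return -1;

    model_payloads[model_cnt++] = v;
    model_version++;

    return 0;
}

/*
 * Models the list call: copy up to nr entries starting at idx, report
 * how many remain and the current version, return the number copied.
 * With nr == 0 nothing is copied and only the counts come back.
 */
int model_list(unsigned int idx, unsigned int nr, int *out,
               unsigned int *remaining, unsigned int *version)
{
    unsigned int done = 0;

    if ( idx > model_cnt )
        return -1;

    while ( done < nr && idx + done < model_cnt )
    {
        out[done] = model_payloads[idx + done];
        done++;
    }

    *remaining = model_cnt - (idx + done);
    *version = model_version;

    return (int)done;
}

/* Fetch everything in batches; start over if the version changes. */
int model_fetch_all(int *out, unsigned int batch)
{
    unsigned int idx, remaining, version, first_version;
    int rc;

 retry:
    idx = 0;
    first_version = 0;
    do {
        rc = model_list(idx, batch, out + idx, &remaining, &version);
        if ( rc < 0 )
            return rc;

        if ( idx == 0 )
            first_version = version; /* First call: just learn it. */
        else if ( version != first_version )
            goto retry;              /* List changed: data is stale. */

        idx += (unsigned int)rc;
    } while ( remaining && rc );

    return (int)idx;
}
```

The first call cannot know the version, so it only records it; later
calls compare against that recorded value.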

> 
> > +    {
> > +        uint32_t len;
> > +
> > +        if ( list->idx > i++ )
> > +            continue;
> > +
> > +        status.state = data->state;
> > +        status.rc = data->rc;
> > +        len = strlen(data->name);
> > +
> > +        /* N.B. 'idx' != 'i'. */
> > +        if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
> > +                                    data->name, len) ||
> > +             __copy_to_guest_offset(list->len, idx, &len, 1) ||
> 
> > You're not copying the NUL terminator here, which makes the result
> more cumbersome to consume by the caller. Perhaps
> XEN_XSPLICE_NAME_SIZE should remain to be 128 (other than
> suggested above), but be specified to include the terminator?

Yes. Fixed that. It also needed a minor change in:
"libxc: Implementation of XEN_XSPLICE_op in libxc" to account for
strlen. (+1 to its result)
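
A small sketch of what "include the terminator" means for the copy and
for the reported length. It is a plain userspace model: memcpy stands in
for the guest copy, MODEL_NAME_SIZE is an assumed slot size of 128, and
model_copy_name is a hypothetical helper, not a function in the patch.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_NAME_SIZE 128  /* Assumed slot size, terminator included. */

/*
 * Copy name plus its NUL terminator into slot idx of buf. Returns the
 * length reported to the caller (strlen + 1), or 0 if it does not fit.
 */
unsigned int model_copy_name(char *buf, unsigned int idx, const char *name)
{
    size_t len = strlen(name) + 1;

    if ( len > MODEL_NAME_SIZE )
        return 0;

    memcpy(buf + (size_t)idx * MODEL_NAME_SIZE, name, len);

    return (unsigned int)len;
}
```

Because the terminator travels with the name, the consumer can treat
each slot as an ordinary C string without consulting the length first.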


Here is the newly minted patch with your suggestions hopefully
implemented to your liking!

From 40f0e9fdb50935d4d3df608950313051a28f12b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon, 25 Jan 2016 10:51:22 -0500
Subject: [PATCH] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.
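
As a reading aid, the state transitions this placeholder patch allows
can be summarised by the sketch below. It mirrors the switch in
xsplice_action but is not that code: the enum names and model_step are
hypothetical, and MODEL_GONE is an invented marker for "payload freed".

```c
/* Hypothetical mirror of the payload states; MODEL_GONE means freed. */
enum model_state { MODEL_CHECKED = 1, MODEL_APPLIED, MODEL_GONE };

enum model_action {
    MODEL_CHECK,
    MODEL_UNLOAD,
    MODEL_REVERT,
    MODEL_APPLY,
    MODEL_REPLACE
};

/* Return the state after applying action a to a payload in state s. */
enum model_state model_step(enum model_state s, enum model_action a)
{
    switch ( a )
    {
    case MODEL_CHECK:   /* Nop for now: the state is left as it was. */
        return s;

    case MODEL_UNLOAD:  /* Only a CHECKED payload may be freed. */
        return s == MODEL_CHECKED ? MODEL_GONE : s;

    case MODEL_REVERT:  /* APPLIED goes back to CHECKED. */
        return s == MODEL_APPLIED ? MODEL_CHECKED : s;

    case MODEL_APPLY:   /* CHECKED becomes APPLIED. */
        return s == MODEL_CHECKED ? MODEL_APPLIED : s;

    case MODEL_REPLACE: /* Nop for now: the state is left as it was. */
        return s;
    }

    return s;
}
```

An action requested in the wrong state leaves the payload untouched,
which matches the silent fall-through in the switch.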

The functionality is disabled on ARM until all arch
components are implemented.

Also by default it is disabled until the implementation
is in place.

We also use recursive spinlocks so that the find_payload
function does not need to have a 'lock' and 'non-lock' variant.

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>

v2: Rebased on keyhandler: rework keyhandler infrastructure
v3: Fixed XSM.
 - Removed REVERTED state.
    Split status and error code.
    Add REPLACE action.
    Separate payload data from the payload structure.
    s/XSPLICE_ID_../XSPLICE_NAME_../
 - Add xsplice and CONFIG_XSPLICE build option.
    Fix code per Jan's review.
    Update the sysctl.h (change bits to enum like)
 - Rebase on Kconfig changes.
 - Add missing pad checks. Re-order keyhandler.h to build on ARM.
 - Rebase on build: hook the schedulers into Kconfig
 - s/id/name/; s/payload_list_lock/payload_lock/
 - Put #ifdef CONFIG_XSPLICE in header file per Doug review.
 - Andrew review:
    - use recursive spinlocks, change name to xsplice_op,
      sprinkle new-lines, add local variable block, include
      state diagram, squash two goto labels, use vzalloc instead of
      alloc_xenheap_pages.
    - change 'state' from int32 to uint32_t
    - remove the err label out of xsplice_upload
    - use void* instead of uint8_t
    - move code around to make it easier to read.
    - Add vmap.h to compiler under ARM.
 - Add missing Copyright in header file
 - Dropped LOADED state, make the payload go in CHECKED.
v4: Made it only work on x86 per Julien's (ARM) maintainer request.
v5: Dropped the load->check state example in sysctl.h
    Made the ->nr=0 call work. Remove rc=0 in lots of cases.
---
 tools/flask/policy/policy/modules/xen/xen.te |   1 +
 xen/common/Kconfig                           |  12 +
 xen/common/Makefile                          |   1 +
 xen/common/sysctl.c                          |   7 +
 xen/common/xsplice.c                         | 406 +++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 166 +++++++++++
 xen/include/xen/xsplice.h                    |  35 +++
 xen/xsm/flask/hooks.c                        |   6 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 9 files changed, 636 insertions(+)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 7e69ce9..68ef6de 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -72,6 +72,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
+    xsplice_op
 };
 
 # Allow dom0 to use all XENVER_ subops and VERSION subops that have checks.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 3522ecb..d80dddb 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -182,4 +182,16 @@ config SCHED_DEFAULT
 
 endmenu
 
+# Enable/Disable xsplice support
+config XSPLICE
+	bool "xSplice live patching support"
+	default n
+	depends on X86
+	---help---
+	  Allows a running Xen hypervisor to be dynamically patched using
+	  binary patches without rebooting. This is primarily used to binarily
+	  patch in the field an hypervisor with XSA fixes.
+
+	  If unsure, say Y.
+
 endmenu
diff --git a/xen/common/Makefile b/xen/common/Makefile
index e43ec49..1e4bc70 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -58,6 +58,7 @@ obj-y += vsprintf.o
 obj-y += wait.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-$(CONFIG_XSPLICE) += xsplice.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 253b7c8..0fac940 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,12 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_op(&op->u.xsplice);
+        if ( ret != -EOPNOTSUPP )
+            copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..0047032
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,406 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/err.h>
+#include <xen/guest_access.h>
+#include <xen/keyhandler.h>
+#include <xen/lib.h>
+#include <xen/list.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/smp.h>
+#include <xen/spinlock.h>
+#include <xen/vmap.h>
+#include <xen/xsplice.h>
+
+#include <asm/event.h>
+#include <public/sysctl.h>
+
+static DEFINE_SPINLOCK(payload_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    uint32_t state;                      /* One of the XSPLICE_STATE_*. */
+    int32_t rc;                          /* 0 or -XEN_EXX. */
+    struct list_head list;               /* Linked to 'payload_list'. */
+    char name[XEN_XSPLICE_NAME_SIZE];    /* Name of it. */
+};
+
+static int verify_name(const xen_xsplice_name_t *name)
+{
+    if ( !name->size || name->size > XEN_XSPLICE_NAME_SIZE )
+        return -EINVAL;
+
+    if ( name->pad[0] || name->pad[1] || name->pad[2] )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(name->name, name->size) )
+        return -EINVAL;
+
+    return 0;
+}
+
+static int verify_payload(const xen_sysctl_xsplice_upload_t *upload)
+{
+    if ( verify_name(&upload->name) )
+        return -EINVAL;
+
+    if ( !upload->size )
+        return -EINVAL;
+
+    if ( upload->size > MB(2) )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+/*
+ * We may be holding the payload_lock or not. Hence we need
+ * the recursive spinlock. Or we can judiciously use a
+ * lock argument to differentiate - but it is simpler with recursive locks.
+ */
+static struct payload *find_payload(const xen_xsplice_name_t *name)
+{
+    struct payload *data, *found = NULL;
+    char n[XEN_XSPLICE_NAME_SIZE];
+    int rc;
+
+    rc = verify_name(name);
+    if ( rc )
+        return ERR_PTR(rc);
+
+    if ( __copy_from_guest(n, name->name, name->size) )
+        return ERR_PTR(-EFAULT);
+
+    if ( n[name->size - 1] )
+        return ERR_PTR(-EINVAL);
+
+    spin_lock_recursive(&payload_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strcmp(data->name, n) )
+        {
+            found = data;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&payload_lock);
+
+    return found;
+}
+
+/* We MUST be holding the payload_lock spinlock. */
+static void free_payload(struct payload *data)
+{
+    ASSERT(spin_is_locked(&payload_lock));
+    list_del(&data->list);
+    payload_cnt--;
+    payload_version++;
+    xfree(data);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_payload(upload);
+    if ( rc )
+        return rc;
+
+    data = find_payload(&upload->name);
+    if ( data && !IS_ERR(data) /* Found. */ )
+        return -EEXIST;
+
+    if ( IS_ERR(data) )
+        return PTR_ERR(data);
+
+    data = xzalloc(struct payload);
+    if ( !data )
+        return -ENOMEM;
+
+    rc = -EFAULT;
+    if ( __copy_from_guest(data->name, upload->name.name, upload->name.size) )
+        goto out;
+
+    rc = -EINVAL;
+    if ( data->name[upload->name.size - 1] )
+        goto out;
+
+    rc = 0;
+    data->state = XSPLICE_STATE_CHECKED;
+    INIT_LIST_HEAD(&data->list);
+
+    spin_lock_recursive(&payload_lock);
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt++;
+    payload_version++;
+    spin_unlock_recursive(&payload_lock);
+
+ out:
+    if ( rc )
+        xfree(data);
+
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_get_t *get)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_name(&get->name);
+    if ( rc )
+        return rc;
+
+    spin_lock_recursive(&payload_lock);
+
+    data = find_payload(&get->name);
+    if ( IS_ERR_OR_NULL(data) )
+    {
+        spin_unlock_recursive(&payload_lock);
+        if ( !data )
+            return -ENOENT;
+
+        return PTR_ERR(data);
+    }
+
+    get->status.state = data->state;
+    get->status.rc = data->rc;
+
+    spin_unlock_recursive(&payload_lock);
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+
+    if ( list->nr > 1024 )
+        return -E2BIG;
+
+    if ( list->pad )
+        return -EINVAL;
+
+    if ( list->nr &&
+         (!guest_handle_okay(list->status, list->nr) ||
+          !guest_handle_okay(list->name, XEN_XSPLICE_NAME_SIZE * list->nr) ||
+          !guest_handle_okay(list->len, list->nr)) )
+        return -EINVAL;
+
+    spin_lock_recursive(&payload_lock);
+    if ( list->idx > payload_cnt )
+    {
+        spin_unlock_recursive(&payload_lock);
+        return -EINVAL;
+    }
+
+    if ( list->nr )
+    {
+        list_for_each_entry( data, &payload_list, list )
+        {
+            uint32_t len;
+
+            if ( list->idx > i++ )
+                continue;
+
+            status.state = data->state;
+            status.rc = data->rc;
+            len = strlen(data->name) + 1;
+
+            /* N.B. 'idx' != 'i'. */
+            if ( __copy_to_guest_offset(list->name, idx * XEN_XSPLICE_NAME_SIZE,
+                                        data->name, len) ||
+                __copy_to_guest_offset(list->len, idx, &len, 1) ||
+                __copy_to_guest_offset(list->status, idx, &status, 1) )
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            idx++;
+
+            if ( (idx >= list->nr) || hypercall_preempt_check() )
+                break;
+        }
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    list->version = payload_version;
+    spin_unlock_recursive(&payload_lock);
+
+    /* And how many we have processed. */
+    return rc ? : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    int rc;
+
+    rc = verify_name(&action->name);
+    if ( rc )
+        return rc;
+
+    spin_lock_recursive(&payload_lock);
+    data = find_payload(&action->name);
+    if ( IS_ERR_OR_NULL(data) )
+    {
+        spin_unlock_recursive(&payload_lock);
+        if ( !data )
+            return -ENOENT;
+
+        return PTR_ERR(data);
+    }
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_UNLOAD:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            free_payload(data);
+            /* No touching 'data' from here on! */
+            data = NULL;
+        }
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        if ( data->state == XSPLICE_STATE_APPLIED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_APPLY:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_APPLIED;
+            data->rc = 0;
+        }
+        break;
+
+    case XSPLICE_ACTION_REPLACE:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+        }
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    spin_unlock_recursive(&payload_lock);
+
+    return rc;
+}
+
+int xsplice_op(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    if ( xsplice->pad )
+        return -EINVAL;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
+}
+
+static const char *state2str(uint32_t state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+
+    if ( state >= ARRAY_SIZE(names) || !names[state] )
+        return "unknown";
+
+    return names[state];
+}
+
+static void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    if ( !spin_trylock_recursive(&payload_lock) )
+    {
+        printk("Lock held. Try again.\n");
+        return;
+    }
+
+    list_for_each_entry ( data, &payload_list, list )
+        printk(" name=%s state=%s(%d)\n", data->name,
+               state2str(data->state), data->state);
+
+    spin_unlock_recursive(&payload_lock);
+}
+
+static int __init xsplice_init(void)
+{
+    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    return 0;
+}
+__initcall(xsplice_init);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..ce674d8 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,170 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to the docs/unstable/misc/xsplice.markdown
+ * for the design details of this hypercall.
+ *
+ * There are four sub-ops:
+ *  XEN_SYSCTL_XSPLICE_UPLOAD (0)
+ *  XEN_SYSCTL_XSPLICE_GET (1)
+ *  XEN_SYSCTL_XSPLICE_LIST (2)
+ *  XEN_SYSCTL_XSPLICE_ACTION (3)
+ *
+ * The normal sequence of sub-ops is to:
+ *  1) XEN_SYSCTL_XSPLICE_UPLOAD to upload the payload. If errors STOP.
+ *  2) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN spin.
+ *     If zero go to next step.
+ *  3) XEN_SYSCTL_XSPLICE_ACTION with XSPLICE_ACTION_APPLY to apply the patch.
+ *  4) XEN_SYSCTL_XSPLICE_GET to check the `->rc`. If -XEN_EAGAIN spin.
+ *     If zero exit with success.
+ */
+
+/*
+ * Structure describing an ELF payload. Uniquely identifies the
+ * payload. Should be human readable.
+ * Recommended length is up to XEN_XSPLICE_NAME_SIZE.
+ * Includes the NUL terminator.
+ */
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_name {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
+    uint16_t size;                          /* IN: size of name. May be up to
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint16_t pad[3];                        /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_name xen_xsplice_name_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_name_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was successfully uploaded.
+ * Otherwise an -XEN_EXX return value is provided. Duplicate `name`s are not
+ * supported.
+ *
+ * The payload at this point is verified against basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+    uint64_t size;                          /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve the status of a specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and XEN_EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATE_CHECKED      1
+#define XSPLICE_STATE_APPLIED      2
+    uint32_t state;                /* OUT: XSPLICE_STATE_*. */
+    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_get {
+    xen_xsplice_name_t name;                /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
+};
+typedef struct xen_sysctl_xsplice_get xen_sysctl_xsplice_get_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_get_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns a positive number, it is the number (up to `nr`)
+ * of the payloads returned, along with `nr` updated with the number of remaining
+ * payloads, `version` updated (it may be the same across hypercalls. If it
+ * varies the data is stale and further calls could fail). The `status`,
+ * `name`, and `len` are updated at their designated index value (`idx`) with
+ * the returned value of data.
+ *
+ * If the hypercall returns E2BIG the `nr` is too big and should be
+ * lowered. The upper limit of `nr` is left to the implementation.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed a number of payloads making this information stale. It is
+ * the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. If the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* OUT: Hypervisor stamps value.
+                                               If varies between calls, we are
+                                             * getting stale data. */
+    uint32_t idx;                           /* IN: Index into array. */
+    uint32_t nr;                            /* IN: How many status, name, and len
+                                               should fill out. Can be zero to get
+                                               amount of payloads and version.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocated for nr of them. */
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.
+                                               Must have nr of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of name's.
+                                               Must have nr of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `name` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_name_t name;                /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+#define XSPLICE_ACTION_REPLACE      5
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_*. */
+    uint32_t timeout;                       /* IN: Zero if no timeout. */
+                                            /* Or upper bound of time (ms) */
+                                            /* for operation to take. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_*. */
+    uint32_t pad;                           /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_get_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +955,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_xsplice_op                    25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +981,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..5c84851
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,35 @@
+/*
+ * Copyright (c) 2016 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+
+#ifdef CONFIG_XSPLICE
+
+int xsplice_op(struct xen_sysctl_xsplice_op *);
+
+#else
+
+#include <xen/errno.h> /* For -EOPNOTSUPP */
+static inline int xsplice_op(struct xen_sysctl_xsplice_op *op)
+{
+    return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_XSPLICE */
+
+#endif /* __XEN_XSPLICE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1eaec58..3ef0441 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -808,6 +808,12 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+#ifdef CONFIG_XSPLICE
+    case XEN_SYSCTL_xsplice_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__XSPLICE_OP, NULL);
+#endif
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 56600bb..1c59b58 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.5.0
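
For reference, the "normal sequence" documented in sysctl.h above can be
sketched from the toolstack side as below. This is a self-contained mock,
not libxc: `struct mock_hv`, `mock_upload`, `mock_get`, `mock_action_apply`,
`wait_done` and `upload_and_apply` are hypothetical stand-ins for the sysctl
sub-ops, plain errno values stand in for -XEN_EXX, and the mock pretends that
APPLY needs a few polls to finish (in this patch the actions complete
immediately).

```c
#include <errno.h>
#include <stdint.h>

#define STATE_CHECKED 1 /* mirrors XSPLICE_STATE_CHECKED */
#define STATE_APPLIED 2 /* mirrors XSPLICE_STATE_APPLIED */

struct status {
    uint32_t state;
    int32_t rc;
};

/* Hypothetical hypervisor stand-in: an action finishes after a few polls. */
struct mock_hv {
    struct status st;
    uint32_t pending_state;
    int polls_left;
};

/* Step 1: upload. The payload ends up in the CHECKED state. */
static int mock_upload(struct mock_hv *hv)
{
    hv->st.state = STATE_CHECKED;
    hv->st.rc = 0;
    hv->polls_left = 0;
    return 0;
}

/* Step 3: request APPLY. Only queues the work; rc reads -EAGAIN meanwhile. */
static int mock_action_apply(struct mock_hv *hv)
{
    if ( hv->st.state != STATE_CHECKED )
        return -EINVAL;

    hv->st.rc = -EAGAIN;
    hv->pending_state = STATE_APPLIED;
    hv->polls_left = 3;
    return 0;
}

/* Steps 2 and 4: GET. Completes the pending action once polls run out. */
static int mock_get(struct mock_hv *hv, struct status *out)
{
    if ( hv->st.rc == -EAGAIN && --hv->polls_left <= 0 )
    {
        hv->st.state = hv->pending_state;
        hv->st.rc = 0;
    }
    *out = hv->st;
    return 0;
}

/* Spin on GET while rc is -EAGAIN, then check the state we ended up in. */
static int wait_done(struct mock_hv *hv, uint32_t want, int max_polls)
{
    struct status st;

    while ( max_polls-- > 0 )
    {
        int rc = mock_get(hv, &st);

        if ( rc )
            return rc;
        if ( st.rc == -EAGAIN )
            continue;
        if ( st.rc )
            return st.rc;
        return st.state == want ? 0 : -EINVAL;
    }
    return -ETIMEDOUT;
}

/* The four documented steps in order; stops at the first error. */
static int upload_and_apply(struct mock_hv *hv)
{
    int rc = mock_upload(hv);

    if ( rc )
        return rc;
    rc = wait_done(hv, STATE_CHECKED, 10);
    if ( rc )
        return rc;
    rc = mock_action_apply(hv);
    if ( rc )
        return rc;
    return wait_done(hv, STATE_APPLIED, 10);
}
```

Calling upload_and_apply() on a zeroed `struct mock_hv` should leave the mock
in the APPLIED state with rc of zero. Asking for APPLY before any upload is
rejected by the mock with -EINVAL, since nothing is in the CHECKED state.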


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH v4 11/34] xsplice: Design document
  2016-03-23 11:18   ` Jan Beulich
  2016-03-23 20:12     ` Konrad Rzeszutek Wilk
@ 2016-03-24  3:15     ` Konrad Rzeszutek Wilk
  2016-03-24  9:32       ` Jan Beulich
  1 sibling, 1 reply; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24  3:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

On Wed, Mar 23, 2016 at 05:18:39AM -0600, Jan Beulich wrote:
> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
> > +### XEN_SYSCTL_XSPLICE_LIST (2)
> > +
> > +Retrieve an array of abbreviated status and names of payloads that are 
> > loaded in the
> > +hypervisor.
> > +
> > +The caller provides:
> > +
> > + * `version`. Initially (on first hypercall) *MUST* be zero.
> > + * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
> > + * `nr` the max number of entries to populate.
> > + * `pad` - *MUST* be zero.
> > + * `status` virtual address of where to write `struct xen_xsplice_status`
> > +   structures. Caller *MUST* allocate up to `nr` of them.
> > + * `name` - virtual address of where to write the unique name of the payload.
> > +   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
> > +   **XEN_XSPLICE_NAME_SIZE** size.
> > + * `len` - virtual address of where to write the length of each unique name
> > +   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
> > +   of sizeof(uint32_t) (4 bytes).
> > +
> > +If the hypercall returns an positive number, it is the number (upto `nr`
> > +provided to the hypercall) of the payloads returned, along with `nr` updated
> > +with the number of remaining payloads, `version` updated (it may be the same
> > +across hypercalls - if it varies the data is stale and further calls could
> > +fail). The `status`, `name`, and `len`' are updated at their designed index
> > +value (`idx`) with the returned value of data.
> > +
> > +If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
> > +lowered.
> > +
> > +If the hypercall returns an zero value there are no more payloads.
> > +
> > +Note that due to the asynchronous nature of hypercalls the control domain might
> > +have added or removed a number of payloads making this information stale. It is
> > +the responsibility of the toolstack to use the `version` field to check
> > +between each invocation. if the version differs it should discard the stale
> > +data and start from scratch. It is OK for the toolstack to use the new
> > +`version` field.
> > +
> > +The `struct xen_xsplice_status` structure contains an status of payload which includes:
> > +
> > + * `status` - indicates the current status of the payload:
> > +   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
> > +   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
> > +   *  No other value is possible.
> > + * `rc` - -XEN_EXX type errors encountered while performing the last
> > +   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
> > +   respectively mean: success or operation in progress. Other values
> > +   imply an error occurred. If there is an error in `rc`, `status` will **NOT**
> > +   have changed.
> > +
> > +The structure is as follow:
> > +
> > +<pre>
> > +struct xen_sysctl_xsplice_list {  
> > +    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
> > +                                               On subsequent calls reuse value.  
> > +                                               If varies between calls, we are  
> > +                                             * getting stale data. */  
> > +    uint32_t idx;                           /* IN/OUT: Index into array. */ 
> > +    uint32_t nr;                            /* IN: How many status, names, and len  
> > +                                               should fill out.  
> > +                                               OUT: How many payloads left. */  
> 
> I think there's an ambiguity left in both the description above and
> the comments here: With idx required to be zero upon first
> invocation (which I'm not clear why that is), which parts of the
> three arrays get filled when idx is non-zero: [0, idx) or [nr, nr + idx)?

Here is the new updated design. Hopefully it is more clear?
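
To make the indexing concrete, here is a minimal userspace sketch of how I
read the loop in xsplice_list(): `idx` only says how many payloads to skip,
and each call fills the caller's arrays starting at slot 0. It is a mock, not
the hypervisor code; `mock_list`, `fetch_all`, `struct list_req`, BATCH and
the payload names are made up for illustration, and the `version` check is
left out.

```c
#include <stdint.h>
#include <string.h>

#define NAME_SIZE 128   /* mirrors XEN_XSPLICE_NAME_SIZE */
#define BATCH     2     /* hypothetical 'nr' used by the caller */

/* Hypothetical payload list standing in for payload_list. */
static const char *const payloads[] = {
    "xsa-1", "xsa-2", "xsa-3", "xsa-4", "xsa-5"
};
#define PAYLOAD_CNT ((uint32_t)(sizeof(payloads) / sizeof(payloads[0])))

struct list_req {
    uint32_t idx;   /* IN: how many payloads to skip. */
    uint32_t nr;    /* IN: slots in the arrays. OUT: payloads left. */
    char *name;     /* OUT: nr * NAME_SIZE bytes. */
    uint32_t *len;  /* OUT: nr entries. */
};

/* Mock of the xsplice_list() loop: skip 'idx' payloads, fill from slot 0. */
static int mock_list(struct list_req *req)
{
    uint32_t i = 0, slot = 0, p;

    if ( req->idx > PAYLOAD_CNT )
        return -1;

    for ( p = 0; req->nr && p < PAYLOAD_CNT; p++ )
    {
        uint32_t len;

        if ( req->idx > i++ )
            continue;

        len = (uint32_t)strlen(payloads[p]) + 1;
        memcpy(req->name + slot * NAME_SIZE, payloads[p], len);
        req->len[slot] = len;

        if ( ++slot >= req->nr )
            break;
    }

    req->nr = PAYLOAD_CNT - i;  /* Remaining amount. */
    return (int)slot;           /* How many slots were filled. */
}

/*
 * Caller side: page through the list, advancing 'idx' by what was returned.
 * 'out' must have room for PAYLOAD_CNT names.
 */
static uint32_t fetch_all(char out[][NAME_SIZE])
{
    char names[BATCH * NAME_SIZE];
    uint32_t lens[BATCH];
    uint32_t idx = 0, left = 0;

    do {
        struct list_req req = { idx, BATCH, names, lens };
        int done = mock_list(&req);
        int k;

        if ( done <= 0 )
            break;

        /* Results always sit in slots 0..done-1 of the per-call arrays. */
        for ( k = 0; k < done; k++ )
            memcpy(out[idx + k], names + k * NAME_SIZE, lens[k]);

        idx += (uint32_t)done;
        left = req.nr;
    } while ( left );

    return idx;
}
```

With the five mock payloads and BATCH of 2, fetch_all() makes three calls
(2, 2 and 1 entries). A call with `nr` of zero copies nothing and just reports
the payload count back in `nr`.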

From ccd6f3521241ec56158e58bf9e26388b573469b3 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Mon, 14 Sep 2015 09:05:11 -0400
Subject: [PATCH] xsplice: Design document

A mechanism is required to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

v1-2: review
v3: Split document in v1 and v2 (todo) to simplify implementation goals.
 - Add const on some structures. Truncate size to uint16_t where it makes sense.
 - Convert 'id' to 'name', Add Ross's comments about what is implemented.
 - Wei's and Ross's reviews.
 - Jan's review comments.
 - Jan's review comments.
    s/int32_t state/uint32_t state/ now that return code is in separate
    field (rc). Add various other types, such as R_X86_64_PC64 in the list.
    Mention the need for compiler check.
v4:
 - Drop the LOADED->CHECKED state and go directly to CHECKED state. Drop
    LOADED.
v5: Julien mentioned ARM 32-bit would not use ELF64, so make the .xsplice.func
    use uintXX_t types instead of ELF ones. Remove the OUT on idx subfield.
    Mention that 'nr' being zero can be used for probing the number of payloads.
---
 docs/misc/xsplice.markdown | 1038 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1038 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..cb11867
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,1038 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binary patch the running hypervisor with new
+opcodes that have come about primarily due to security updates.
+
+This document describes the design of the API that would allow us to
+upload binary patches to the hypervisor.
+
+The document is split in four sections:
+
+ * Detailed descriptions of the problem statement.
+ * Design of the data structures.
+ * Design of the hypercalls.
+ * Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## History
+
+The document has undergone various reviews and only covers the v1 design.
+
+The end of the document has a section titled `Not Yet Done` which
+outlines ideas and design for the future version of this work.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible to patch the hypervisor in multiple ways
+and be as simple as possible. The compiled code is contiguous in memory with
+no gaps - so we have no luxury of 'moving' existing code and must either
+insert a trampoline to the new code to be executed - or only modify in-place
+the code if there is sufficient space. The placement of new code has to be done
+by the hypervisor and the virtual address for the new code is allocated dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in the hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampolines, etc.
+have to be supported. The API should allow reading and writing any memory
+within the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches when they are available.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+
+## Patching code
+
+The first mechanism to patch that comes to mind is in-place replacement.
+That is, replace the affected code with new code. Unfortunately the x86
+ISA is variable size which places limits on how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is to replace the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function. N.B. The Xen hypervisor implements the third
+mechanism. See `Trampoline (e9 opcode)` section for more details.
+
+### Example of trampoline and in-place splicing
+
+As an example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks as so:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside `arch_do_domctl`. The change adds 24 extra bytes of code
+(three 8-byte `movq` instructions) which alters all the offsets inside the
+function. We might not have enough space in .text to alter these offsets
+and squeeze in the extra 24 bytes of code.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+do_domctl:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`
+- which is not called directly by hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+Patching the offset in `hypercall_table` for `do_domctl`
+(ffff82d080103079 <do_domctl>)
+
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+
+with the new address where the new `do_domctl` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `arch_do_domctl`.
+
+In summary this example patched the callers of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+However it has drawbacks - the safety checks, which have to make sure
+the function is not on the stack, must also check every caller. For some
+patches this could mean - if there were a sufficiently large number of
+callers - that we would never be able to apply the update.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+lessens the locations to be patched to one but it puts pressure on the
+CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
+in the `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). Patching the offset in `hypercall_table` for the old
+`do_xen_version` (ffff82d080112f9e <do_xen_version>)
+
+<pre>
+ ffff82d08024b270 <hypercall_table>:   
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+
+with the new address where the new `do_xen_version` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to insert a trampoline in the
+old `do_xen_version` function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 ; do_xen_version+0x534  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e do_xen_version:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to a function
+   providing a BUG_ON to catch errant code.
+
+The disadvantage of this is that the unconditional jump incurs a small
+I-cache penalty. However the simplicity of the patching and higher chance
+of passing safety checks make this a worthwhile option.
+
+This patching has a similar drawback as inline patching - the safety
+checks have to make sure the function is not on the stack. However
+since we are replacing at a higher level (a full function as opposed
+to various offsets within functions) the checks are simpler.
+
+Having the patching done at predetermined instances where the stacks
+are not deep mostly solves this problem as well.
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and integrity
+was intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking the
+signature checking will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code that will be patched in.
+
+This binary format can be constructed using a custom binary format but
+there are severe disadvantages to it:
+
+ * The format might need to be changed and we need a mechanism to accommodate
+   that.
+ * It has to be platform agnostic.
+ * It has to be easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+Earlier design attempted to ineptly explain the relations of the ELF sections
+to each other without using proper ELF mechanism (sh_info, sh_link, data
+structures using Elf types, etc). This design will explain the structures
+and how they are used together and not dig in the ELF format - except mention
+that the section names should match the structure names.
+
+The xSplice payload is a relocatable ELF binary. A typical binary would have:
+
+ * One or more .text sections.
+ * Zero or more read-only data sections.
+ * Zero or more data sections.
+ * Relocations for each of these sections.
+
+It may also have some architecture-specific sections. For example:
+
+ * Alternatives instructions.
+ * Bug frames.
+ * Exception tables.
+ * Relocations for each of these sections.
+
+The xSplice core code loads the payload as a standard ELF binary, relocates it
+and handles the architecture-specific sections as needed. This process is much
+like what the Linux kernel module loader does.
+
+The payload contains a section (xsplice_patch_func) with an array of structures
+describing the functions to be patched:
+
+<pre>
+struct xsplice_patch_func {  
+    const char *name;  
+    uint64_t new_addr;  
+    uint64_t old_addr;  
+    uint32_t new_size;  
+    uint32_t old_size;  
+    uint8_t pad[32];  
+};  
+</pre>
+
+The size of the structure is 64 bytes.
+
+* `name` is the symbol name of the old function. It is only used if `old_addr`
+   is zero, in which case the hypervisor uses it to resolve the address during
+   dynamic linking (when the hypervisor loads the payload).
+
+* `old_addr` is the address of the function to be patched and is filled in at
+  payload generation time if the hypervisor function address is known. If unknown,
+  the value **MUST** be zero and the hypervisor will attempt to resolve the address.
+
+* `new_addr` is the address of the function that is replacing the old
+  function. The address is filled in during relocation. The value **MUST** be
+  the address of the new function in the file.
+
+* `old_size` and `new_size` contain the sizes of the respective functions in bytes.
+   The value of `old_size` **MUST NOT** be zero.
+
+* `pad` **MUST** be zero.
+
+The size of the `xsplice_patch_func` array is determined from the ELF section
+size.
+
+When applying the patch the hypervisor iterates over each `xsplice_patch_func`
+structure and the core code inserts a trampoline at `old_addr` that jumps to
+`new_addr`.
+
+When reverting a patch, the hypervisor iterates over each `xsplice_patch_func`
+and the core code copies the data from the undo buffer (private internal copy)
+to `old_addr`.
+
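The rules above can be sketched as a small validation pass. This is only an illustration under stated assumptions: `check_patch_func` and `count_patch_funcs` are hypothetical helper names, not functions from the patch series, and the requirement that `name` be non-empty when `old_addr` is zero is inferred from the field descriptions rather than stated as a rule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout given in the text; 64 bytes when pointers are 64-bit. */
struct xsplice_patch_func {
    const char *name;
    uint64_t new_addr;
    uint64_t old_addr;
    uint32_t new_size;
    uint32_t old_size;
    uint8_t pad[32];
};

/* Hypothetical helper: 0 if the entry obeys the MUST rules, -1 otherwise. */
static int check_patch_func(const struct xsplice_patch_func *f)
{
    size_t i;

    if (f->old_size == 0)               /* old_size MUST NOT be zero */
        return -1;

    for (i = 0; i < sizeof(f->pad); i++)
        if (f->pad[i] != 0)             /* pad MUST be zero */
            return -1;

    /* Assumption: with old_addr == 0 only the symbol name can identify
     * the function, so an empty name leaves nothing to resolve. */
    if (f->old_addr == 0 && (f->name == NULL || f->name[0] == '\0'))
        return -1;

    return 0;
}

/* Hypothetical helper: number of array entries implied by the ELF section
 * size, or -1 if the size is not a whole multiple of the structure size. */
static long count_patch_funcs(size_t section_size)
{
    if (section_size % sizeof(struct xsplice_patch_func) != 0)
        return -1;
    return (long)(section_size / sizeof(struct xsplice_patch_func));
}
```

Rejecting a section whose size is not a multiple of the structure size is a choice made for this sketch; the text only says the array size is determined from the ELF section size.
+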
+## Hypercalls
+
+We will employ the sub-operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * Upload a payload.
+ * List the summaries of the uploaded payloads and their state.
+ * Get the summary of a particular payload and its state.
+ * Issue a command to apply, delete, or revert a payload.
+
+Most of the actions are asynchronous, therefore the caller is responsible
+for verifying that an action has been applied properly by retrieving the
+payload summary and verifying that there are no error codes associated with
+the payload.
+
+We **MUST** make some of them asynchronous due to the nature of patching:
+it requires every physical CPU to be in lock-step with each other.
+The patching mechanism, while an implementation detail, is not a short
+operation and as such the design **MUST** assume it will be a long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such a unique name per payload has to be visible to allow proper manipulation.
+
+The hypercall is part of `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operation and one padding field which
+**MUST** always be zero.
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;                   /* IN: XEN_SYSCTL_XSPLICE_*. */  
+    uint32_t pad;                   /* IN: Always zero. */  
+    union {  
+        ... see below ...  
+    } u;  
+};  
+
+</pre>
+The rest of the hypercall-specific structures are part of this structure.
+
+### Basic type: struct xen_xsplice_name
+
+Most of the hypercalls employ a shared structure called `struct xen_xsplice_name`
+which contains:
+
+ * `name` - pointer where the string for the name is located.
+ * `size` - the size of the string
+ * `pad` - padding - to be zero.
+
+The structure is as follows:
+
+<pre>
+/*  
+ *  Uniquely identifies the payload.  Should be human readable.  
+ * Includes the NUL terminator  
+ */  
+#define XEN_XSPLICE_NAME_SIZE 128  
+struct xen_xsplice_name {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint16_t size;                          /* IN, size of name. May be up to  
+                                               XEN_XSPLICE_NAME_SIZE. */  
+    uint16_t pad[3];                        /* IN: MUST be zero. */ 
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `name` could be a UUID that stays fixed forever for a given
+payload. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was successfully uploaded.
+Otherwise an -XEN_EXX return value is provided. Duplicate `name` values are not supported.
+
+The `payload` is the ELF payload as mentioned in the `Payload format` section.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_name_t name;            /* IN, name of the patch. */  
+    uint64_t size;                      /* IN, size of the ELF file. */  
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve the status of a specific payload. The caller provides:
+
+ * A `struct xen_xsplice_name` called `name` which has the unique name.
+ * A `struct xen_xsplice_status` structure. The member values will
+   be over-written upon completion.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `state` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `state` will **NOT**
+   have changed.
+
+The return value of the hypercall is zero on success and -XEN_EXX on failure.
+(Note that the `rc` value can be different from the return value, as in
+rc=-XEN_EAGAIN and return value can be 0).
+
+For example, supposing there is a payload:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: 0
+</pre>
+
+We apply an action - XSPLICE_ACTION_REVERT - to revert it (which won't work
+as we have not even applied it). Afterwards we will have:
+
+<pre>
+ status: XSPLICE_STATUS_CHECKED
+ rc: -XEN_EINVAL
+</pre>
+
+It has failed but it remains loaded.
+
+This operation is synchronous and does not require preemption.
+
+The structure is as follows:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_CHECKED      1  
+#define XSPLICE_STATUS_APPLIED      2  
+    uint32_t state;                 /* OUT: XSPLICE_STATUS_*. */  
+    int32_t rc;                     /* OUT: 0 if no error, otherwise -XEN_EXX. */  
+};  
+
+struct xen_sysctl_xsplice_get {  
+    xen_xsplice_name_t name;        /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Version of the payload. Caller should re-use the field provided by
+    the hypervisor. If the value differs the data is stale.
+ * `idx` index iterator. The index into the hypervisor's payload count. It is
+    recommended that on first invocation zero be used so that `nr` (which the
+    hypervisor will update with the remaining payload count) be provided.
+    Also the hypervisor will provide `version` with the most current value.
+ * `nr` the max number of entries to populate. Can be zero which will result
+    in the hypercall being a probing one and return the number of payloads
+    (and update the `version`).
+ * `pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. Caller *MUST* allocate up to `nr` of them.
+ * `name` - virtual address of where to write the unique name of the payload.
+   Caller *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size. Note that **XEN_XSPLICE_NAME_SIZE** includes
+   the NUL terminator.
+ * `len` - virtual address of where to write the length of each unique name
+   of the payload. Caller *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns a positive number, it is the number (up to the `nr`
+provided to the hypercall) of payloads returned. In addition `nr` is updated
+with the number of remaining payloads and `version` is updated (it may be the
+same across hypercalls - if it varies the data is stale and further calls could
+fail). The `status`, `name`, and `len` arrays are updated at their designated
+index value (`idx`) with the returned data.
+
+If the hypercall returns -XEN_E2BIG the `nr` is too big and should be
+lowered.
+
+If the hypercall returns zero there are no more payloads.
+
+Note that due to the asynchronous nature of hypercalls the control domain might
+have added or removed a number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. If the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
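The iteration described above (walk with `idx`, watch `version`, start over when it changes) can be sketched as follows. This is a minimal illustration, not the toolstack's code: `list_fn`, `list_all`, `fake_hv` and `fake_list` are hypothetical names, the callback stands in for the real hypercall and omits the `status`, `name` and `len` arrays, and the host `EAGAIN` is used in place of a Xen error value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for XEN_SYSCTL_XSPLICE_LIST. It returns the number
 * of entries filled starting at idx (0 when none are left, negative on
 * error), stores the current version, and rewrites *nr to the number of
 * payloads remaining. */
typedef int (*list_fn)(void *ctx, uint32_t *version, uint32_t idx,
                       uint32_t *nr);

/* Walk the whole list in batches. Returns the number of payloads seen, a
 * negative error from the callback, or -EAGAIN if the list kept changing. */
static long list_all(list_fn list, void *ctx, uint32_t batch,
                     unsigned max_restarts)
{
    unsigned attempt;

    for (attempt = 0; attempt <= max_restarts; attempt++) {
        uint32_t version = 0, idx = 0;

        for (;;) {
            uint32_t prev = version, nr = batch;
            int ret = list(ctx, &version, idx, &nr);

            if (ret < 0)
                return ret;         /* e.g. batch too large: lower it */
            if (idx != 0 && version != prev)
                break;              /* stale data: discard and start over */
            if (ret == 0)
                return (long)idx;   /* no more payloads */
            idx += (uint32_t)ret;
        }
    }
    return -EAGAIN;
}

/* Fake hypervisor for demonstration: optionally adds one payload (and bumps
 * the version) on a chosen call to imitate a concurrent upload. */
struct fake_hv {
    uint32_t version;
    uint32_t total;
    unsigned bump_at_call;          /* 0 means never */
    unsigned calls;
};

static int fake_list(void *ctx, uint32_t *version, uint32_t idx, uint32_t *nr)
{
    struct fake_hv *hv = ctx;
    uint32_t avail, n;

    if (++hv->calls == hv->bump_at_call) {
        hv->version++;
        hv->total++;
    }
    avail = idx < hv->total ? hv->total - idx : 0;
    n = *nr < avail ? *nr : avail;
    *version = hv->version;
    *nr = avail - n;
    return (int)n;
}
```

Bounding the number of restarts is a choice made for this sketch so that a busy control domain cannot keep the loop running forever.
+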
+The `struct xen_xsplice_status` structure contains the status of a payload which includes:
+
+ * `state` - indicates the current status of the payload:
+   * *XSPLICE_STATUS_CHECKED*  (1) loaded and the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (2) loaded, checked, and applied.
+   *  No other value is possible.
+ * `rc` - -XEN_EXX type errors encountered while performing the last
+   XSPLICE_ACTION_* operation. The normal values can be zero or -XEN_EAGAIN which
+   respectively mean: success or operation in progress. Other values
+   imply an error occurred. If there is an error in `rc`, `state` will **NOT**
+   have changed.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* OUT: Hypervisor stamps value.
+                                               If varies between calls, we are  
+                                               getting stale data. */  
+    uint32_t idx;                           /* IN: Index into array. */  
+    uint32_t nr;                            /* IN: How many status, names, and len  
+                                               should be filled out. Can be zero to get  
+                                               amount of payloads and version.  
+                                               OUT: How many payloads left. */  
+    uint32_t pad;                           /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocated for nr of them. */  
+    XEN_GUEST_HANDLE_64(char) name;         /* OUT: Array of names. Each member  
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.  
+                                               Must have nr of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of names.  
+                                               Must have nr of them. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `name` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+
+The caller provides:
+
+ * A `struct xen_xsplice_name` `name` containing the unique name.
+ * `cmd` the command requested:
+  * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+    This also verifies the payload - which may require SecureBoot firmware
+    calls. This is the initial state a payload is in.
+  * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+    Any further hypercalls against the `name` will result in failure unless
+    the **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is performed with the same `name`.
+  * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+    more time than the upper bound of time the `rc` in `xen_xsplice_status`
+    retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+    more time than the upper bound of time the `rc` in `xen_xsplice_status`
+    retrieved via **XEN_SYSCTL_XSPLICE_GET** will be -XEN_EBUSY.
+  * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+    payload. If the operation takes more time than the upper bound of time
+    the `rc` in `xen_xsplice_status` retrieved via **XEN_SYSCTL_XSPLICE_GET**
+    will be -XEN_EBUSY.
+ * `time` the upper bound of time (ms) the cmd should take. Zero means infinite.
+   If the operation does not succeed within that time, it goes into an
+   error state.
+ * `pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follows:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1  
+#define XSPLICE_ACTION_UNLOAD  2  
+#define XSPLICE_ACTION_REVERT  3  
+#define XSPLICE_ACTION_APPLY   4  
+#define XSPLICE_ACTION_REPLACE 5  
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_name_t name;                /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t time;                          /* IN: Zero if no timeout. */   
+                                            /* Or upper bound of time (ms) */   
+                                            /* for operation to take. */  
+};  
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION commands.
+
+There is a strict ordering of the commands.
+The XSPLICE_ACTION prefix has been dropped to ease reading and
+the diagram does not include the XSPLICE_STATUS states:
+
+<pre>
+              /->\  
+              \  /  
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\  
+                \                                  |  
+                 \-------------------<-------------/  
+
+</pre>
+## State transition table of XSPLICE_ACTION commands and XSPLICE_STATUS.
+
+Note that:
+
+ - The CHECKED state is the starting one achieved with *XEN_SYSCTL_XSPLICE_UPLOAD* hypercall.
+ - The REVERT operation on success will automatically move to the CHECKED state.
+ - There are two STATES: CHECKED and APPLIED.
+ - There are five actions (aka commands): CHECK, APPLY, REPLACE, REVERT, and UNLOAD.
+
+The state transition table of valid states and action states:
+
+<pre>
+
++---------+---------+--------------------------------+-------+--------+
+| ACTION  | Current | Result                         | Next STATE:    |
+|         | STATE   |                                |CHECKED|APPLIED |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, no   |   x   |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| CHECK   | CHECKED | Check payload (once more, with |       |        |
+|         |         | errors)                        |       |        |
++---------+---------+--------------------------------+-------+--------+
+| UNLOAD  | CHECKED | Unload payload. Always works.  |       |        |
+|         |         | No next states.                |       |        |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (success).       |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+| APPLY   | CHECKED | Apply payload (error|timeout)  |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |       |   x    |
+|         |         | payload with success.          |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REPLACE | CHECKED | Revert payloads and apply new  |   x   |        |
+|         |         | payload with error.            |       |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (success).      |   x   |        |
++---------+---------+--------------------------------+-------+--------+
+| REVERT  | APPLIED | Revert payload (error|timeout) |       |   x    |
++---------+---------+--------------------------------+-------+--------+
+</pre>
+
+All the other state transitions are invalid.
+
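The table above can be read as a small lookup function. The sketch below is only an illustration: `next_state` and the `ST_*`/`ACT_*` names are hypothetical, although the numeric values follow the XSPLICE_STATUS_* and XSPLICE_ACTION_* definitions in this document. Rows for which the table marks no next state (UNLOAD, and CHECK with errors) are reported as `ST_NONE`; that is a reading of the blank cells, not something the table states.

```c
/* Numeric values follow XSPLICE_STATUS_* in the text; ST_NONE and
 * ST_INVALID are additions made for this sketch. */
enum xs_state {
    ST_INVALID = -1,    /* the (action, state) pair is not in the table */
    ST_NONE    = 0,     /* the table marks no next state */
    ST_CHECKED = 1,
    ST_APPLIED = 2
};

/* Numeric values follow XSPLICE_ACTION_* in the text. */
enum xs_action {
    ACT_CHECK   = 1,
    ACT_UNLOAD  = 2,
    ACT_REVERT  = 3,
    ACT_APPLY   = 4,
    ACT_REPLACE = 5
};

/* Hypothetical helper: next state for an action issued in state cur, where
 * ok is non-zero if the operation succeeded. */
static int next_state(int action, int cur, int ok)
{
    if (cur == ST_CHECKED) {
        switch (action) {
        case ACT_CHECK:
            return ok ? ST_CHECKED : ST_NONE;
        case ACT_UNLOAD:
            return ST_NONE;
        case ACT_APPLY:
        case ACT_REPLACE:
            return ok ? ST_APPLIED : ST_CHECKED;
        default:
            return ST_INVALID;
        }
    }

    if (cur == ST_APPLIED && action == ACT_REVERT)
        return ok ? ST_CHECKED : ST_APPLIED;

    return ST_INVALID;
}
```

For instance, a REVERT issued while the payload is only CHECKED yields `ST_INVALID`, which matches the earlier example where such a request leaves the payload in the CHECKED state with an error in `rc`.
+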
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *-XEN_EAGAIN* spin. If zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If in *-XEN_EAGAIN* spin. If zero exit with success.
+
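Steps 2 and 4 are the same polling pattern. A minimal sketch follows, assuming a caller-supplied callback in place of the real XEN_SYSCTL_XSPLICE_GET hypercall: `get_status_fn`, `wait_for_rc`, `fake_backend` and `fake_get` are hypothetical names, the host `EAGAIN` stands in for XEN_EAGAIN, and the poll limit is an addition of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Same two fields as struct xen_xsplice_status in the text. */
struct xsplice_status {
    uint32_t state;
    int32_t rc;
};

/* Hypothetical callback standing in for XEN_SYSCTL_XSPLICE_GET: returns the
 * hypercall's own return value and fills in *st. */
typedef int (*get_status_fn)(void *ctx, struct xsplice_status *st);

/* Poll until rc is no longer "in progress". Returns 0 on success, the
 * payload's error otherwise, or -ETIMEDOUT after max_polls attempts. */
static int wait_for_rc(get_status_fn get, void *ctx, unsigned max_polls,
                       struct xsplice_status *st)
{
    unsigned i;

    for (i = 0; i < max_polls; i++) {
        int ret = get(ctx, st);

        if (ret != 0)
            return ret;             /* the hypercall itself failed */
        if (st->rc != -EAGAIN)
            return st->rc;          /* zero, or the error that occurred */
        /* Still in progress: a real caller would sleep or yield here. */
    }
    return -ETIMEDOUT;
}

/* Fake backend for demonstration: reports "in progress" for a number of
 * polls and then the final result. */
struct fake_backend {
    unsigned polls_left;
    int32_t final_rc;
};

static int fake_get(void *ctx, struct xsplice_status *st)
{
    struct fake_backend *b = ctx;

    st->state = 1;                  /* CHECKED */
    if (b->polls_left > 0) {
        b->polls_left--;
        st->rc = -EAGAIN;
    } else {
        st->rc = b->final_rc;
    }
    return 0;
}
```

Keeping the hypercall's return value separate from `rc` mirrors the note in the GET section that the two can differ.
+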
+
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+expand the code to place the most generic code in place and emit a special
+ELF .section header to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with better suited opcodes.
+
+Note that patching functions that copy to or from guest memory requires
+support for alternatives. For example this can be due to SMAP
+(specifically the *stac* and *clac* instructions) which is enabled on Broadwell
+and later architectures. It may be related to other alternative instructions.
+
+### When to patch
+
+During the discussion on the design two candidates bubbled up where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing, such as the check against patching code which
+is on the stack - which can lead to corruption.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using stop_machine to patch can stall the time rendezvous
+code and result in an NMI. As such having the patching be done at the tail
+of the rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To rendezvous all the CPUs, a barrier with a maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to execute lockstep instructions
+on all CPUs.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.
+
+This is implemented in the Xen Project hypervisor.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+### Generation of xSplice ELF payloads
+
+The generation of payloads is not discussed in this design.
+
+This is implemented in a separate tool which lives in a separate
+GIT repo.
+
+Currently it resides at https://github.com/rosslagerwall/xsplice-build
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+As it turned out, almost every patch (XSA) to a non-trivial function requires
+additional entries in the exception table and/or the bug frames.
+
+This is implemented in the Xen Project hypervisor.
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If we are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+The trampoline patching is implemented in the Xen Project hypervisor.
+
+### .bss and .data sections.
+
+In place patching writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+However, functions which are being patched can bring in changes to strings
+(.data or .rodata section changes), or even to .bss sections.
+
+As such the ELF payload can introduce new .rodata, .bss, and .data sections.
+Patching in the new function will end up also patching in the new .rodata
+section and the new function will reference the new string in the new
+.rodata section.
+
+This is implemented in the Xen Project hypervisor.
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# Not Yet Done
+
+This is for further development of xSplice.
+
+## Goals
+
+The implementation must also have a mechanism for:
+
+ *  A dependency mechanism for the payloads. To use that information to load:
+    - The appropriate payload. To verify that the payload is built against the
+      hypervisor. This can be done via the `build-id`
+      or via providing a copy of the old code - so that the hypervisor can
+      verify it against the code in memory.
+    - To construct an appropriate order of payloads to load in case they
+      depend on each other.
+ * Be able to lookup in the Xen hypervisor the symbol names of functions from the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks (blacklist of which functions cannot be patched, check
+   the stack, make sure the payload is built with same compiler as hypervisor).
+ * NOP out the code sequence if `new_size` is zero.
+ * Deal with other relocation types:  R_X86_64_[8,16,32,32S], R_X86_64_PC[8,16,64] in payload file.
+
+### xSplice interdependencies
+
+xSplice patch interdependencies are tricky.
+
+There are three ways this can be addressed:
+
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or
+   if the old code also contains previously patched code - even if they
+   overlap.
+
+The disadvantage of the first large patch is that it can grow over
+time and not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Handle inlined __LINE__
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonably be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch-by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
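Option 1 relies on the standard `#line` directive. The fragment below is a small illustration with made-up functions (`f1`, `f2_line`) and a made-up line number (100): the hotpatch adds two lines to `f1`, and the directive keeps the line number seen by the following function where it was before the change.

```c
static int f1(int x)
{
    /* Two lines added by the hotpatch. */
    x += 1;
    x *= 2;
    return x;
}

/* Assumption for this example: in the original file the next line of code
 * was source line 100. The directive restores that numbering. */
#line 100
static int f2_line(void) { return __LINE__; }
```

With the directive in place `f2_line()` reports line 100 however many lines were added to `f1`, so code embedding `__LINE__` after the changed function compiles to the same bytes as before.
+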
+For BUG(), WARN(), etc., the line number is embedded into the bug frame, not
+the function itself.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be same for signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on all of the payload laid out contiguously
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+a signature header then followed by the signature, key identifier, and signer's
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from the Linux module signature code.)
+
+
+### .bss and .data sections.
+
+In place patching writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+That said we should provide hook functions so that the existing data
+can be changed during payload application.
+
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+we are limited to up to 2GB of virtual address to place the new code
+from the old code. That should not be a problem since Xen hypervisor has
+a very small footprint.
+
+However if we need - we can always add two trampolines. One at the 2GB
+limit that calls the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accommodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are required.
+
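The two constraints above (a displacement that fits in 32 bits and a target of at least 5 bytes) can be sketched as follows. `make_trampoline` is a hypothetical helper, not the hypervisor's code, and it checks `old_size` only, ignoring any trailing padding that might also be usable.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5     /* 0xE9 opcode plus a 4-byte displacement */

/* Hypothetical helper: encode "jmp rel32" from old_addr to new_addr into
 * buf. Returns 0 on success, -1 if the old function is too small or the
 * target is outside the +-2GB range. */
static int make_trampoline(uint64_t old_addr, uint32_t old_size,
                           uint64_t new_addr, uint8_t buf[JMP_REL32_LEN])
{
    int64_t disp;
    uint32_t d;

    if (old_size < JMP_REL32_LEN)
        return -1;

    /* The displacement is relative to the end of the jmp instruction. */
    disp = (int64_t)(new_addr - (old_addr + JMP_REL32_LEN));
    if (disp > INT32_MAX || disp < INT32_MIN)
        return -1;

    d = (uint32_t)disp;
    buf[0] = 0xE9;
    buf[1] = (uint8_t)(d & 0xff);           /* little-endian displacement */
    buf[2] = (uint8_t)((d >> 8) & 0xff);
    buf[3] = (uint8_t)((d >> 16) & 0xff);
    buf[4] = (uint8_t)((d >> 24) & 0xff);
    return 0;
}
```

For example, a jump from 0x1000 to 0x2000 has a displacement of 0x2000 - 0x1005 = 0xffb and encodes as `e9 fb 0f 00 00`.
+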
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions or a
+runtime check that verifies symbol size (+ padding to next symbols) for
+that in the hypervisor is advised.
+
+The tool for generating payloads currently does perform a compile-time
+check to ensure that the function to be replaced is large enough.
+
-- 
2.5.0

> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-24  2:37                           ` Konrad Rzeszutek Wilk
@ 2016-03-24  9:15                             ` Jan Beulich
  2016-03-24 11:39                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 124+ messages in thread
From: Jan Beulich @ 2016-03-24  9:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

>>> On 24.03.16 at 03:37, <konrad.wilk@oracle.com> wrote:
>> > --- a/xen/xsm/flask/hooks.c
>> > +++ b/xen/xsm/flask/hooks.c
>> > @@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
>> >      }
>> >  }
>> >  
>> > +static int flask_version_op (uint32_t op)
>> > +{
>> > +    u32 dsid = domain_sid(current->domain);
>> > +
>> > +    switch ( op )
>> > +    {
>> > +    case XEN_VERSION_version:
>> > +    case XEN_VERSION_platform_parameters:
>> > +    case XEN_VERSION_get_features:
>> > +        /* These MUST always be accessible to any guest by default. */
>> > +        return 0;
>> 
>> Perhaps these would better be taken care of in xsm_version_op()?
> 
> It would be the oddball one.
> All of the xsm_**() in the header file (include/xsm/xsm.h) call the function
> pointers.

True, but if there appeared any second implementation besides
FLASK, it would need to repeat code to meet this backend
independent policy. Anyway - I'll leave it to Daniel to judge.

> @@ -381,6 +389,123 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      return -ENOSYS;
>  }
>  
> +/* Computed be kernel_cache_init. */

... by ...

And I also think kernel_cache_init is a bad name - you initialize the
capabilities cache, not some kernel cache.

> @@ -418,6 +543,20 @@ DO(ni_hypercall)(void)
>      return -ENOSYS;
>  }
>  
> +static int __init kernel_cache_init(void)
> +{
> +    /*
> +     * Pre-allocate the cache so we do not have to worry about
> +     * simultaneous invocations on safe_strcat by guests and the cache
> +     * data becoming garbage.
> +     */
> +    arch_get_xen_caps(&cached_cap);
> +    cached_cap_len = strlen(cached_cap) + 1;
> +
> +    return 0;
> +}

With this I'm now missing the conversion of arch_get_xen_caps()
to __init. Or was this meant to become a follow-up patch (since it
might get a little larger if at once also taking care of moving the
string literals into .init.*)?

Jan
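
To make the xsm_version_op() suggestion above concrete, here is a minimal
sketch of the alternative being discussed: the backend-independent "always
allowed" sub-ops are handled in the common wrapper, so that no backend has to
repeat them. All names below (the OP_* values, struct xsm_ops_sketch,
xsm_version_op_sketch) are hypothetical stand-ins, not the identifiers from
the patch, and this is not what the posted patch does.

```c
#include <stdint.h>
#include <errno.h>

/* Hypothetical sub-op numbers, for illustration only. */
enum {
    OP_VERSION,
    OP_PLATFORM_PARAMETERS,
    OP_GET_FEATURES,
    OP_COMMANDLINE,
};

/* Stand-in for the table of backend function pointers. */
struct xsm_ops_sketch {
    int (*version_op)(uint32_t op);
};

/* A backend that denies everything, to show the wrapper's effect. */
static int deny_all_version_op(uint32_t op)
{
    (void)op;
    return -EPERM;
}

static const struct xsm_ops_sketch deny_all_ops = {
    .version_op = deny_all_version_op,
};
static const struct xsm_ops_sketch *ops = &deny_all_ops;

/*
 * Wrapper in the style of the xsm_*() helpers: the sub-ops that must
 * always be accessible are answered here, everything else is passed
 * on to whichever backend is installed.
 */
static int xsm_version_op_sketch(uint32_t op)
{
    switch ( op )
    {
    case OP_VERSION:
    case OP_PLATFORM_PARAMETERS:
    case OP_GET_FEATURES:
        return 0;

    default:
        return ops->version_op(op);
    }
}
```

The trade-off raised in the reply still applies: this makes the wrapper the
odd one out among helpers that otherwise only call the function pointer.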



* Re: [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables
  2016-03-24  2:49         ` Konrad Rzeszutek Wilk
@ 2016-03-24  9:20           ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-24  9:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, mpohlack,
	Julien Grall, Stefano Stabellini, sasha.levin, xen-devel

>>> On 24.03.16 at 03:49, <konrad.wilk@oracle.com> wrote:
>> > --- a/xen/common/symbols.c
>> > +++ b/xen/common/symbols.c
>> > @@ -17,6 +17,7 @@
>> >  #include <xen/lib.h>
>> >  #include <xen/string.h>
>> >  #include <xen/spinlock.h>
>> > +#include <xen/virtual_region.h>
>> >  #include <public/platform.h>
>> >  #include <xen/guest_access.h>
>> >  
>> > @@ -97,8 +98,7 @@ static unsigned int get_symbol_offset(unsigned long pos)
>> >  
>> >  bool_t is_active_kernel_text(unsigned long addr)
>> >  {
>> > -    return (is_kernel_text(addr) ||
>> > -            (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
>> > +    return !!search_virtual_regions(addr);
>> 
>> search_virtual_regions() doesn't sound like it would be looking for
>> text addresses only.
> 
> I am not sure what would be a better name - as it
> (search_virtual_regions) is used by three other callers?
> 
> search_for_addr? 

How would that make clear that you're after .text addresses only?
Part of the problem of course is that only .text regions get registered,
yet the function names all don't reflect this. Perhaps generally
s/virtual/text/ ?

Jan
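
As a sketch of the naming point above: if only .text ranges are ever
registered, the lookup is in effect a text-region search. The code below
assumes a simplified region type and a linear scan; struct text_region and
find_text_region() are hypothetical names, not the patch's
struct virtual_region or search_virtual_regions().

```c
#include <stddef.h>

/* Hypothetical, simplified region: a half-open range [start, end). */
struct text_region {
    unsigned long start;
    unsigned long end;
};

/*
 * Return the registered text region containing addr, or NULL if addr
 * does not fall inside any registered .text range.
 */
static const struct text_region *
find_text_region(const struct text_region *regions, size_t nr,
                 unsigned long addr)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( addr >= regions[i].start && addr < regions[i].end )
            return &regions[i];

    return NULL;
}
```

With a helper named like this, a caller such as is_active_kernel_text()
reads as a text lookup without needing to know what was registered.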



* Re: [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2016-03-24  3:13     ` Konrad Rzeszutek Wilk
@ 2016-03-24  9:29       ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-24  9:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, Stefano Stabellini, andrew.cooper3, Ian Jackson,
	mpohlack, ross.lagerwall, xen-devel, Daniel De Graaf,
	sasha.levin

>>> On 24.03.16 at 04:13, <konrad.wilk@oracle.com> wrote:
> On Wed, Mar 23, 2016 at 07:51:29AM -0600, Jan Beulich wrote:
>> >>> On 15.03.16 at 18:56, <konrad.wilk@oracle.com> wrote:
>> And then of course the EXPERT question comes up again. No
>> matter that IanC is no longer around to help with the
>> argumentation, the point he has been making about too many
>> flavors ending up in the wild continues to apply.
> 
> 'too many flavors'? As in different versions of Xen with or without
> these options enabled? 

Yes.

>> > +    {
>> > +        spin_unlock_recursive(&payload_lock);
>> > +        return -EINVAL;
>> > +    }
>> > +
>> > +    list_for_each_entry( data, &payload_list, list )
>> 
>> Aren't you lacking a list->version check prior to entering this loop
>> (which would then mean you don't need to store it below, but only
>> on the error path from that check)?
> 
> No. The toolstack has no idea of what the right version is on the
> first invocation. Which is OK since it gets fresh data (it is
> its first invocation).
> 
> On subsequent invocations we gleefully populate up to
> min(payload_cnt, ->nr) of data even if the version the toolstack
> provided is different. The toolstack will have to decide to throw away
> the data and retry the hypercall; or print it out as is.

Makes sense, but doesn't really fit with this

+The caller provides:
+
+ * `version`. Version of the payload. Caller should re-use the field provided by
+    the hypervisor. If the value differs the data is stale.

in the most recent patch 11.

> Here is the newly minted patch with your suggestions hopefully
> implemented to your liking!

I think this immediate providing of a partly next-version patch is
getting unwieldy: I just can't re-review several of these large
patches again every day. I'll look at the entire next version once
you've sent that out.

Jan
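
The behaviour described in the quoted reply above (the hypervisor always
stamps `version` and fills in data regardless of what the caller passed, and
the caller decides whether to discard and retry) can be sketched as follows.
This is a minimal sketch under stated assumptions: struct list_req is a
trimmed-down, hypothetical stand-in for struct xen_sysctl_xsplice_list,
mock_list() is a mock and not the hypercall, and the retry limit of 3 is
arbitrary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, trimmed-down request structure. */
struct list_req {
    uint32_t version; /* OUT: stamped by the (mock) hypervisor. */
    uint32_t idx;     /* IN: index of the first payload to return. */
    uint32_t nr;      /* IN: capacity of ids[]; OUT: payloads remaining. */
    uint32_t *ids;    /* OUT: payload identifiers. */
};

/* Mock hypervisor state; not Xen code. */
static uint32_t payload_version = 1;
static uint32_t payload_ids[8] = { 10, 11, 12 };
static uint32_t payload_cnt = 3;

/* Mock LIST operation: copies up to nr entries, returns how many. */
static int mock_list(struct list_req *req)
{
    uint32_t i, n;

    req->version = payload_version; /* Always stamped, never checked. */
    if ( req->idx > payload_cnt )
        return -1;

    n = payload_cnt - req->idx;
    if ( n > req->nr )
        n = req->nr;
    for ( i = 0; i < n; i++ )
        req->ids[i] = payload_ids[req->idx + i];
    req->nr = payload_cnt - req->idx - n;

    return (int)n;
}

/*
 * Caller side: fetch all payloads in chunks. If the version changes
 * between calls the data gathered so far is stale, so start over.
 * Returns the number of entries stored in out[], or -1 on failure.
 */
static int fetch_all(uint32_t *out, uint32_t max, uint32_t chunk)
{
    uint32_t idx = 0, version = 0;
    int retries = 3;

    if ( chunk == 0 )
        return -1;

    while ( retries > 0 )
    {
        struct list_req req = { .idx = idx, .nr = chunk, .ids = out + idx };
        int rc;

        if ( max - idx < chunk )
            req.nr = max - idx;

        rc = mock_list(&req);

        if ( idx != 0 && req.version != version )
        {
            idx = 0;      /* Stale: throw the data away and retry. */
            retries--;
            continue;
        }
        if ( rc < 0 )
            return -1;

        version = req.version;
        idx += (uint32_t)rc;
        if ( req.nr == 0 || idx == max )
            return (int)idx;
    }

    return -1;
}
```

On the first call (idx == 0) the caller has no version to compare against,
which matches the point that the toolstack cannot know the right value up
front.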



* Re: [PATCH v4 11/34] xsplice: Design document
  2016-03-24  3:15     ` Konrad Rzeszutek Wilk
@ 2016-03-24  9:32       ` Jan Beulich
  0 siblings, 0 replies; 124+ messages in thread
From: Jan Beulich @ 2016-03-24  9:32 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, ross.lagerwall, andrew.cooper3, Ian Jackson,
	Tim Deegan, mpohlack, sasha.levin, xen-devel

>>> On 24.03.16 at 04:15, <konrad.wilk@oracle.com> wrote:
> +### XEN_SYSCTL_XSPLICE_LIST (2)
> +
> +Retrieve an array of abbreviated status and names of payloads that are loaded in the
> +hypervisor.
> +
> +The caller provides:
> +
> + * `version`. Version of the payload. Caller should re-use the field provided by
> +    the hypervisor. If the value differs the data is stale.
> + * `idx` index iterator. The index into the hypervisor's payload count. It is
> +    recommended that on first invocation zero be used so that `nr` (which the
> +    hypervisor will update with the remaining payload count) be provided.
> +    Also the hypervisor will provide `version` with the most current value.

This reads okay now.

> +struct xen_sysctl_xsplice_list {  
> +    uint32_t version;                       /* OUT: Hypervisor stamps value.
> +                                               If varies between calls, we are  
> +                                               getting stale data. */  
> +    uint32_t idx;                           /* IN: Index into array. */  

But the comment here can really only be taken to refer to one or
more of the arrays the following handles point at.

Jan



* Re: [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane.
  2016-03-24  9:15                             ` Jan Beulich
@ 2016-03-24 11:39                               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24 11:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Andrew Cooper, Ian Jackson,
	mpohlack, ross.lagerwall, Julien Grall, Stefano Stabellini,
	xen-devel, Daniel De Graaf, Keir Fraser, sasha.levin

On Thu, Mar 24, 2016 at 03:15:25AM -0600, Jan Beulich wrote:
> >>> On 24.03.16 at 03:37, <konrad.wilk@oracle.com> wrote:
> >> > --- a/xen/xsm/flask/hooks.c
> >> > +++ b/xen/xsm/flask/hooks.c
> >> > @@ -1658,6 +1658,40 @@ static int flask_xen_version (uint32_t op)
> >> >      }
> >> >  }
> >> >  
> >> > +static int flask_version_op (uint32_t op)
> >> > +{
> >> > +    u32 dsid = domain_sid(current->domain);
> >> > +
> >> > +    switch ( op )
> >> > +    {
> >> > +    case XEN_VERSION_version:
> >> > +    case XEN_VERSION_platform_parameters:
> >> > +    case XEN_VERSION_get_features:
> >> > +        /* These MUST always be accessible to any guest by default. */
> >> > +        return 0;
> >> 
> >> Perhaps these would better be taken care of in xsm_version_op()?
> > 
> > It would be the oddball one.
> > All of the xsm_**() in the header file (include/xsm/xsm.h) call the function
> > pointers.
> 
> True, but if there appeared any second implementation besides
> FLASK, it would need to repeat code to meet this backend
> independent policy. Anyway - I'll leave it to Daniel to judge.

/me nods.
> 
> > @@ -381,6 +389,123 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> >      return -ENOSYS;
> >  }
> >  
> > +/* Computed be kernel_cache_init. */
> 
> ... by ...
> 
> And I also think kernel_cache_init is a bad name - you initialize the
> capabilities cache, not some kernel cache.

/me nods. 
> 
> > @@ -418,6 +543,20 @@ DO(ni_hypercall)(void)
> >      return -ENOSYS;
> >  }
> >  
> > +static int __init kernel_cache_init(void)
> > +{
> > +    /*
> > +     * Pre-allocate the cache so we do not have to worry about
> > +     * simultaneous invocations on safe_strcat by guests and the cache
> > +     * data becoming garbage.
> > +     */
> > +    arch_get_xen_caps(&cached_cap);
> > +    cached_cap_len = strlen(cached_cap) + 1;
> > +
> > +    return 0;
> > +}
> 
> With this I'm now missing the conversion of arch_get_xen_caps()
> to __init. Or was this meant to become a follow-up patch (since it
> might get a little larger if at once also taking care of moving the
> string literals into .init.*)?

A follow up.
> 
> Jan
> 


* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
  2016-03-18 11:55   ` Jan Beulich
  2016-03-22 17:49   ` Daniel De Graaf
@ 2016-03-24 15:34   ` anshul makkar
  2016-03-24 19:19     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 124+ messages in thread
From: anshul makkar @ 2016-03-24 15:34 UTC (permalink / raw)
  To: xen-devel, konrad.wilk, dgdegra

On 15/03/16 17:56, Konrad Rzeszutek Wilk wrote:
> All of the XENVER_* sub-ops now have an XSM check.
>
> The subop for XENVER_commandline is now a privileged operation.
> To not break guests we still return a string - but it is
> just '<denied>\0'.
>
> The rest: XENVER_[version|extraversion|capabilities|
> parameters|get_features|page_size|guest_handle|changeset|
> compile_info] behave as before - allowed by default for all
> guests if using the XSM default policy or with the dummy one.
>
> The admin can choose to change the sub-ops to be denied
> as they see fit.
>
> Also we add a local variable block.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> ---
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
>
> v2: Do XSM check for all the XENVER_ ops.
> v3: Add empty data conditions.
> v4: Return <denied> for priv subops.
> v5: Move extraversion from priv to normal. Drop the XSM check
>      for the non-priv subops.
> v6: Add +1 for strlen(xen_deny()) to include NULL. Move changeset,
>      compile_info to non-priv subops.
> v7: Remove the \0 on xen_deny()
> v8: Add new XSM domain for xenver hypercall. Add all subops to it.
> v9: Remove the extra line, Add Ack from Daniel
> v10: Rename the XSM from xen_version_op to xsm_xen_version.
>      Prefix the types with 'xen' to distinguish it from another
>      hypercall performing similar operation. Removed Ack from Daniel
>      as it was so large. Add local variable block.
> ---
>   tools/flask/policy/policy/modules/xen/xen.te | 15 ++++++++
>   xen/common/kernel.c                          | 53 +++++++++++++++++++++-------
>   xen/common/version.c                         | 15 ++++++++
>   xen/include/xen/version.h                    |  2 +-
>   xen/include/xsm/dummy.h                      | 22 ++++++++++++
>   xen/include/xsm/xsm.h                        |  5 +++
>   xen/xsm/dummy.c                              |  1 +
>   xen/xsm/flask/hooks.c                        | 43 ++++++++++++++++++++++
>   xen/xsm/flask/policy/access_vectors          | 28 +++++++++++++++
>   xen/xsm/flask/policy/security_classes        |  1 +
>   10 files changed, 172 insertions(+), 13 deletions(-)
>
> diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
> index d35ae22..7e7400d 100644
> --- a/tools/flask/policy/policy/modules/xen/xen.te
> +++ b/tools/flask/policy/policy/modules/xen/xen.te
> @@ -73,6 +73,15 @@ allow dom0_t xen_t:xen2 {
>       pmu_ctrl
>       get_symbol
>   };
> +
> +# Allow dom0 to use all XENVER_ subops
> +# Note that dom0 is part of domain_type so this has duplicates.
> +allow dom0_t xen_t:version {
> +    xen_version xen_extraversion xen_compile_info xen_capabilities
> +    xen_changeset xen_platform_parameters xen_get_features xen_pagesize
> +    xen_guest_handle xen_commandline
> +};
> +
>   allow dom0_t xen_t:mmu memorymap;
>
>   # Allow dom0 to use these domctls on itself. For domctls acting on other
> @@ -137,6 +146,12 @@ if (guest_writeconsole) {
>   # pmu_ctrl is for)
>   allow domain_type xen_t:xen2 pmu_use;
>
> +# For normal guests all except XENVER_commandline
> +allow domain_type xen_t:version {
> +    xen_version xen_extraversion xen_compile_info xen_capabilities
> +    xen_changeset xen_platform_parameters xen_get_features xen_pagesize
> +    xen_guest_handle
> +};
>   ###############################################################################
>   #
>   # Domain creation
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index 0618da2..2699ac0 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -13,6 +13,7 @@
>   #include <xen/nmi.h>
>   #include <xen/guest_access.h>
>   #include <xen/hypercall.h>
> +#include <xsm/xsm.h>
>   #include <asm/current.h>
>   #include <public/nmi.h>
>   #include <public/version.h>
> @@ -223,12 +224,15 @@ void __init do_initcalls(void)
>   /*
>    * Simple hypercalls.
>    */
> -
>   DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>   {
> +    bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
> +
>       switch ( cmd )
>       {
>       case XENVER_version:
> +        if ( deny )
> +            return 0;
>           return (xen_major_version() << 16) | xen_minor_version();
>
>       case XENVER_extraversion:
> @@ -236,7 +240,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           xen_extraversion_t extraversion;
>
>           memset(extraversion, 0, sizeof(extraversion));
> -        safe_strcpy(extraversion, xen_extra_version());
> +        safe_strcpy(extraversion, deny ? xen_deny() : xen_extra_version());
>           if ( copy_to_guest(arg, extraversion, ARRAY_SIZE(extraversion)) )
>               return -EFAULT;
>           return 0;
> @@ -247,10 +251,10 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           xen_compile_info_t info;
>
>           memset(&info, 0, sizeof(info));
> -        safe_strcpy(info.compiler,       xen_compiler());
> -        safe_strcpy(info.compile_by,     xen_compile_by());
> -        safe_strcpy(info.compile_domain, xen_compile_domain());
> -        safe_strcpy(info.compile_date,   xen_compile_date());
> +        safe_strcpy(info.compiler,       deny ? xen_deny() : xen_compiler());
> +        safe_strcpy(info.compile_by,     deny ? xen_deny() : xen_compile_by());
> +        safe_strcpy(info.compile_domain, deny ? xen_deny() : xen_compile_domain());
> +        safe_strcpy(info.compile_date,   deny ? xen_deny() : xen_compile_date());
>           if ( copy_to_guest(arg, &info, 1) )
>               return -EFAULT;
>           return 0;
> @@ -261,7 +265,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           xen_capabilities_info_t info;
>
>           memset(info, 0, sizeof(info));
> -        arch_get_xen_caps(&info);
> +        if ( !deny )
> +            arch_get_xen_caps(&info);
>
>           if ( copy_to_guest(arg, info, ARRAY_SIZE(info)) )
>               return -EFAULT;
> @@ -274,6 +279,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>               .virt_start = HYPERVISOR_VIRT_START
>           };
>
> +        if ( deny )
> +            params.virt_start = 0;
> +
>           if ( copy_to_guest(arg, &params, 1) )
>               return -EFAULT;
>           return 0;
> @@ -285,7 +293,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           xen_changeset_info_t chgset;
>
>           memset(chgset, 0, sizeof(chgset));
> -        safe_strcpy(chgset, xen_changeset());
> +        safe_strcpy(chgset, deny ? xen_deny() : xen_changeset());
>           if ( copy_to_guest(arg, chgset, ARRAY_SIZE(chgset)) )
>               return -EFAULT;
>           return 0;
> @@ -302,6 +310,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>           switch ( fi.submap_idx )
>           {
>           case 0:
> +            if ( deny )
> +                break;
>               fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
>               if ( VM_ASSIST(d, pae_extended_cr3) )
>                   fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
> @@ -342,19 +352,38 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>       }
>
>       case XENVER_pagesize:
> +        if ( deny )
> +            return 0;
>           return (!guest_handle_is_null(arg) ? -EINVAL : PAGE_SIZE);
>
>       case XENVER_guest_handle:
> -        if ( copy_to_guest(arg, current->domain->handle,
> -                           ARRAY_SIZE(current->domain->handle)) )
> +    {
> +        xen_domain_handle_t hdl;
> +        ssize_t len;
> +
> +        if ( deny )
> +        {
> +            len = sizeof(hdl);
> +            memset(&hdl, 0, len);
> +        } else
> +            len = ARRAY_SIZE(current->domain->handle);
> +
> +        if ( copy_to_guest(arg, deny ? hdl : current->domain->handle, len ) )
>               return -EFAULT;
>           return 0;
> -
> +    }
>       case XENVER_commandline:
> -        if ( copy_to_guest(arg, saved_cmdline, ARRAY_SIZE(saved_cmdline)) )
> +    {
> +        size_t len = ARRAY_SIZE(saved_cmdline);
> +
> +        if ( deny )
> +            len = strlen(xen_deny()) + 1;
> +
> +        if ( copy_to_guest(arg, deny ? xen_deny() : saved_cmdline, len) )
>               return -EFAULT;
>           return 0;
>       }
> +    }
>
>       return -ENOSYS;
>   }
> diff --git a/xen/common/version.c b/xen/common/version.c
> index b152e27..fc9bf42 100644
> --- a/xen/common/version.c
> +++ b/xen/common/version.c
> @@ -55,3 +55,18 @@ const char *xen_banner(void)
>   {
>       return XEN_BANNER;
>   }
> +
> +const char *xen_deny(void)
> +{
> +    return "<denied>";
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
> index 81a3c7d..016a56c 100644
> --- a/xen/include/xen/version.h
> +++ b/xen/include/xen/version.h
> @@ -12,5 +12,5 @@ unsigned int xen_minor_version(void);
>   const char *xen_extra_version(void);
>   const char *xen_changeset(void);
>   const char *xen_banner(void);
> -
> +const char *xen_deny(void);
>   #endif /* __XEN_VERSION_H__ */
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 1d13826..94b8855 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -727,3 +727,25 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>   }
>
>   #endif /* CONFIG_X86 */
> +
> +#include <public/version.h>
> +static XSM_INLINE int xsm_xen_version (XSM_DEFAULT_ARG uint32_t op)
> +{
> +    XSM_ASSERT_ACTION(XSM_OTHER);
> +    switch ( op )
> +    {
> +    case XENVER_version:
> +    case XENVER_extraversion:
> +    case XENVER_compile_info:
> +    case XENVER_capabilities:
> +    case XENVER_changeset:
> +    case XENVER_platform_parameters:
> +    case XENVER_get_features:
> +    case XENVER_pagesize:
> +    case XENVER_guest_handle:
> +        /* These MUST always be accessible to any guest by default. */
> +        return xsm_default_action(XSM_HOOK, current->domain, NULL);
> +    default:
> +        return xsm_default_action(XSM_PRIV, current->domain, NULL);
> +    }
> +}
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index 3afed70..db440f6 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -193,6 +193,7 @@ struct xsm_operations {
>       int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>       int (*pmu_op) (struct domain *d, unsigned int op);
>   #endif
> +    int (*xen_version) (uint32_t cmd);
>   };
>
>   #ifdef CONFIG_XSM
> @@ -731,6 +732,10 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
>
>   #endif /* CONFIG_X86 */
>
> +static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
> +{
> +    return xsm_ops->xen_version(op);
> +}
>   #endif /* XSM_NO_WRAPPERS */
>
>   #ifdef CONFIG_MULTIBOOT
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index 0f32636..9791ad4 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -162,4 +162,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
>       set_to_dummy_if_null(ops, ioport_mapping);
>       set_to_dummy_if_null(ops, pmu_op);
>   #endif
> +    set_to_dummy_if_null(ops, xen_version);
>   }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 4813623..d1bef43 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -26,6 +26,7 @@
>   #include <public/xen.h>
>   #include <public/physdev.h>
>   #include <public/platform.h>
> +#include <public/version.h>
>
>   #include <public/xsm/flask_op.h>
>
> @@ -1620,6 +1621,47 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
>   }
>   #endif /* CONFIG_X86 */
>
> +static int flask_xen_version (uint32_t op)
> +{
> +    u32 dsid = domain_sid(current->domain);
> +
> +    switch ( op )
> +    {
> +    case XENVER_version:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_VERSION, NULL);
> +    case XENVER_extraversion:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_EXTRAVERSION, NULL);
> +    case XENVER_compile_info:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_COMPILE_INFO, NULL);
> +    case XENVER_capabilities:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_CAPABILITIES, NULL);
> +    case XENVER_changeset:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_CHANGESET, NULL);
> +    case XENVER_platform_parameters:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_PLATFORM_PARAMETERS, NULL);
> +    case XENVER_get_features:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_GET_FEATURES, NULL);
> +    case XENVER_pagesize:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_PAGESIZE, NULL);
> +    case XENVER_guest_handle:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_GUEST_HANDLE, NULL);
> +    case XENVER_commandline:
> +        return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
> +                            VERSION__XEN_COMMANDLINE, NULL);
> +    default:
> +        return -EPERM;
> +    }
> +}
> +
>   long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>   int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>
> @@ -1758,6 +1800,7 @@ static struct xsm_operations flask_ops = {
>       .ioport_mapping = flask_ioport_mapping,
>       .pmu_op = flask_pmu_op,
>   #endif
> +    .xen_version = flask_xen_version,
>   };
>
>   static __init void flask_init(void)
> diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
> index effb59f..628dd5c 100644
> --- a/xen/xsm/flask/policy/access_vectors
> +++ b/xen/xsm/flask/policy/access_vectors
> @@ -495,3 +495,31 @@ class security
>   # remove ocontext label definitions for resources
>       del_ocontext
>   }
> +
> +# Class version is used to describe the XENVER_ hypercall.
> +# Each sub-op is described here - in the default case all of them should
> +# be allowed except the XENVER_commandline.
> +#
> +class version
> +{
> +# Often called by PV kernels to force a callback.
> +    xen_version
> +# Extra information (-unstable).
> +    xen_extraversion
> +# Compile information of the hypervisor.
> +    xen_compile_info
> +# Such as "xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64".
> +    xen_capabilities
> +# Such as the virtual address of where the hypervisor resides.
> +    xen_platform_parameters
> +# Source code changeset.
> +    xen_changeset
> +# The features the hypervisor supports.
> +    xen_get_features
> +# Page size the hypervisor uses.
> +    xen_pagesize
> +# A value that the control stack can choose.
> +    xen_guest_handle
> +# Xen command line.
> +    xen_commandline
> +}
> diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
> index ca191db..cde4e1a 100644
> --- a/xen/xsm/flask/policy/security_classes
> +++ b/xen/xsm/flask/policy/security_classes
> @@ -18,5 +18,6 @@ class shadow
>   class event
>   class grant
>   class security
> +class version
>
>   # FLASK
>
Can we have a more meaningful name for the XSM class? "version" doesn't seem
to be informative enough to convey why we need it to
be secure. (Is it a resource, or domain-specific, or an event, or...)

My suggestion would be xenmetainfo or something more meaningful.

Anshul



* Re: [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall
  2016-03-24 15:34   ` anshul makkar
@ 2016-03-24 19:19     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 124+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-24 19:19 UTC (permalink / raw)
  To: anshul makkar; +Cc: dgdegra, xen-devel

> >diff --git a/xen/xsm/flask/policy/security_classes b/xen/xsm/flask/policy/security_classes
> >index ca191db..cde4e1a 100644
> >--- a/xen/xsm/flask/policy/security_classes
> >+++ b/xen/xsm/flask/policy/security_classes
> >@@ -18,5 +18,6 @@ class shadow
> >  class event
> >  class grant
> >  class security
> >+class version
> >
> >  # FLASK
> >
> Can we have more meaningful name for XSM class. "version" doesn't seem to be
> informative enough to convey the message as to why we need it to be secure.
> (Is it a resource, or domain specific or event or...)

Heya!

1). Please trim your replies.
2). The patch is already in the tree.
3). If you prefer a different name - by all means please submit a patch for that.
    I am all up for it.
> 
> My suggestion would be xenmetainfo or something more meaningful.

/me blinks

Metainfo? I am not sure how one is supposed to associate that with
features that the hypervisor is exposing to guests.

> 
> Anshul
> 


end of thread, other threads:[~2016-03-24 19:19 UTC | newest]

Thread overview: 124+ messages
2016-03-15 17:56 [PATCH v4] xSplice v1 design and implementation Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 01/34] compat/x86: Remove unncessary #define Konrad Rzeszutek Wilk
2016-03-15 18:57   ` Andrew Cooper
2016-03-16 11:08   ` Jan Beulich
2016-03-17  0:44     ` Konrad Rzeszutek Wilk
2016-03-17  7:45       ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 02/34] libxc: Remove dead code (XENVER_capabilities) Konrad Rzeszutek Wilk
2016-03-15 18:04   ` Andrew Cooper
2016-03-15 18:08     ` Konrad Rzeszutek Wilk
2016-03-16 18:11   ` Wei Liu
2016-03-15 17:56 ` [PATCH v4 03/34] xsm/xen_version: Add XSM for the xen_version hypercall Konrad Rzeszutek Wilk
2016-03-18 11:55   ` Jan Beulich
2016-03-18 17:26     ` Konrad Rzeszutek Wilk
2016-03-21 11:22       ` Jan Beulich
2016-03-22 16:10         ` Konrad Rzeszutek Wilk
2016-03-22 17:54           ` Daniel De Graaf
2016-03-22 17:49   ` Daniel De Graaf
2016-03-24 15:34   ` anshul makkar
2016-03-24 19:19     ` Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 04/34] HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane Konrad Rzeszutek Wilk
2016-03-15 18:29   ` Andrew Cooper
2016-03-15 20:19     ` Konrad Rzeszutek Wilk
2016-03-17  1:38       ` Konrad Rzeszutek Wilk
2016-03-17 14:28         ` Andrew Cooper
2016-03-18 12:36         ` Jan Beulich
2016-03-18 19:22           ` Konrad Rzeszutek Wilk
2016-03-21 12:45             ` Jan Beulich
2016-03-22 15:52               ` Konrad Rzeszutek Wilk
2016-03-22 16:06                 ` Jan Beulich
2016-03-22 18:57                   ` Konrad Rzeszutek Wilk
2016-03-22 19:28                     ` Andrew Cooper
2016-03-22 20:39                       ` Konrad Rzeszutek Wilk
2016-03-23  8:56                         ` Jan Beulich
2016-03-24  2:37                           ` Konrad Rzeszutek Wilk
2016-03-24  9:15                             ` Jan Beulich
2016-03-24 11:39                               ` Konrad Rzeszutek Wilk
2016-03-22 17:51   ` Daniel De Graaf
2016-03-15 17:56 ` [PATCH v4 05/34] libxc/libxl/python/xenstat: Use new XEN_VERSION_OP hypercall Konrad Rzeszutek Wilk
2016-03-15 18:45   ` Andrew Cooper
2016-03-16 12:31   ` George Dunlap
2016-03-16 18:11   ` Wei Liu
2016-03-17  1:08     ` Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 06/34] x86/arm: Add BUGFRAME_NR define and BUILD checks Konrad Rzeszutek Wilk
2016-03-15 18:54   ` Andrew Cooper
2016-03-16 11:49   ` Julien Grall
2016-03-18 12:40   ` Jan Beulich
2016-03-18 19:59     ` Konrad Rzeszutek Wilk
2016-03-21 12:49       ` Jan Beulich
2016-03-22 15:39         ` Konrad Rzeszutek Wilk
2016-03-22 15:58           ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 07/34] arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables Konrad Rzeszutek Wilk
2016-03-15 19:24   ` Andrew Cooper
2016-03-15 19:34     ` Konrad Rzeszutek Wilk
2016-03-15 19:51       ` Andrew Cooper
2016-03-15 20:02         ` Andrew Cooper
2016-03-16 10:33           ` Jan Beulich
2016-03-18 13:07   ` Jan Beulich
2016-03-22 20:18     ` Konrad Rzeszutek Wilk
2016-03-23  8:19       ` Jan Beulich
2016-03-23 11:17         ` Julien Grall
2016-03-23 11:21           ` Jan Beulich
2016-03-24  2:49         ` Konrad Rzeszutek Wilk
2016-03-24  9:20           ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 08/34] vmap: Make the while loop less fishy Konrad Rzeszutek Wilk
2016-03-15 19:33   ` Andrew Cooper
2016-03-17 11:49     ` Jan Beulich
2016-03-17 14:37       ` Andrew Cooper
2016-03-17 15:30         ` Jan Beulich
2016-03-17 16:06           ` Ian Jackson
2016-03-17 11:48   ` Jan Beulich
2016-03-17 16:08   ` Ian Jackson
2016-03-21 12:04     ` George Dunlap
2016-03-21 13:26       ` Jan Beulich
2016-03-21 14:22         ` George Dunlap
2016-03-21 15:05           ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 09/34] vmap: ASSERT on NULL Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 10/34] vmap: Add vmalloc_cb and vfree_cb Konrad Rzeszutek Wilk
2016-03-18 13:20   ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 11/34] xsplice: Design document Konrad Rzeszutek Wilk
2016-03-23 11:18   ` Jan Beulich
2016-03-23 20:12     ` Konrad Rzeszutek Wilk
2016-03-23 20:21       ` Konrad Rzeszutek Wilk
2016-03-24  3:15     ` Konrad Rzeszutek Wilk
2016-03-24  9:32       ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 12/34] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
2016-03-16 12:12   ` Julien Grall
2016-03-16 19:58     ` Konrad Rzeszutek Wilk
2016-03-23 13:51   ` Jan Beulich
2016-03-24  3:13     ` Konrad Rzeszutek Wilk
2016-03-24  9:29       ` Jan Beulich
2016-03-15 17:56 ` [PATCH v4 13/34] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
2016-03-16 18:12   ` Wei Liu
2016-03-16 20:36     ` Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 14/34] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
2016-03-16 18:12   ` Wei Liu
2016-03-15 17:56 ` [PATCH v4 15/34] xsplice: Add helper elf routines Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 16/34] xsplice: Implement payload loading Konrad Rzeszutek Wilk
2016-03-22 17:25   ` Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 17/34] xsplice: Implement support for applying/reverting/replacing patches Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 18/34] x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version' Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 19/34] xsplice, symbols: Implement symbol name resolution on address Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 20/34] x86, xsplice: Print payload's symbol name and payload name in backtraces Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 21/34] xsplice: Add .xsplice.hooks functions and test-case Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 22/34] xsplice: Add support for bug frames Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 23/34] xsplice: Add support for exception tables Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 24/34] xsplice: Add support for alternatives Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 25/34] build_id: Provide ld-embedded build-ids Konrad Rzeszutek Wilk
2016-03-16 18:34   ` Julien Grall
2016-03-16 21:02     ` Konrad Rzeszutek Wilk
2016-03-17  1:12       ` Konrad Rzeszutek Wilk
2016-03-17 11:08         ` Julien Grall
2016-03-17 13:39           ` Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 26/34] HYPERCALL_version_op: Add VERSION_OP_build_id to retrieve build-id Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 27/34] libxl: info: Display build_id of the hypervisor using XEN_VERSION_OP_build_id Konrad Rzeszutek Wilk
2016-03-16 18:12   ` Wei Liu
2016-03-15 17:56 ` [PATCH v4 28/34] xsplice: Print build_id in keyhandler and on bootup Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 29/34] xsplice: Stacking build-id dependency checking Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 30/34] xsplice/xen_replace_world: Test-case for XSPLICE_ACTION_REPLACE Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 31/34] xsplice: Print dependency and payloads build_id in the keyhandler Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 32/34] xsplice: Prevent duplicate payloads from being loaded Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 33/34] xsplice: Add support for shadow variables Konrad Rzeszutek Wilk
2016-03-15 17:56 ` [PATCH v4 34/34] MAINTAINERS/xsplice: Add myself and Ross as the maintainers Konrad Rzeszutek Wilk
2016-03-16 11:10   ` Jan Beulich
2016-03-17  0:44     ` Konrad Rzeszutek Wilk
