All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1] xSplice initial foundation patches.
@ 2015-09-16 21:01 Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
                   ` (5 more replies)
  0 siblings, 6 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell

Changelog (since the RFC and the Seattle Xen presentation)
 - Finished off some of the work around the build-id.
 - Settled on the preemption mechanism.
 - Cleaned the patches a lot up, broke them up to easy
   review for maintainers.

*What is xSplice?*

A mechanism to  binarily patch the running hypervisor with new
opcodes that have come about due to primarily security updates.

*What will this patchset do once I've it*

Nothing. No patching at all. But you will have the API and
hypercalls that will server as foundation for the rest of the
code to come.

*Why are you emailing me?*

Please please review the patches. They are the foundation of the
design and any further work to be done (further work is outlined
in http://wiki.xen.org/wiki?title=XSplice)

*OK, what do you have?*

They are located at a git tree:
  git://xenbits.xen.org/people/konradwilk/xen.git xsplice.v1

And here is a brief description of the patches:

 [PATCH v1 1/5] xsplice: Design document.
   If you like it as paper and have a red pen ready,
   please print out http://darnok.org/xsplice.html
 
Implementation of the design: 
 [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of
 [PATCH v1 3/5] libxc: Implementation of XEN_XSPLICE_op in libxc.

And a tool to use the toolstack libxc commands:
 [PATCH v1 4/5] xen-xsplice: Tool to manipulate xsplice payloads.

And the implementation to provide the build-id:
 [PATCH v1 5/5] xsplice: Use ld-embedded build-ids

Thank you!

 docs/misc/xsplice.markdown          | 1370 +++++++++++++++++++++++++++++++++++
 tools/libxc/include/xenctrl.h       |   18 +
 tools/libxc/xc_misc.c               |  300 ++++++++
 tools/misc/Makefile                 |    4 +
 tools/misc/xen-xsplice.c            |  456 ++++++++++++
 xen/arch/x86/Makefile               |    4 +-
 xen/arch/x86/xen.lds.S              |    5 +
 xen/common/Makefile                 |    1 +
 xen/common/keyhandler.c             |    8 +-
 xen/common/sysctl.c                 |    6 +
 xen/common/xsplice.c                |  528 ++++++++++++++
 xen/include/public/sysctl.h         |  174 +++++
 xen/include/xen/version.h           |    1 +
 xen/include/xen/xsplice.h           |    9 +
 xen/xsm/flask/hooks.c               |    3 +
 xen/xsm/flask/policy/access_vectors |    2 +
 16 files changed, 2886 insertions(+), 3 deletions(-)

Konrad Rzeszutek Wilk (4):
      xsplice: Design document.
      xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
      libxc: Implementation of XEN_XSPLICE_op in libxc.
      xen-xsplice: Tool to manipulate xsplice payloads.

Martin Pohlack (1):
      xsplice: Use ld-embedded build-ids

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
@ 2015-09-16 21:01 ` Konrad Rzeszutek Wilk
  2015-10-05 10:02   ` Jan Beulich
                     ` (4 more replies)
  2015-09-16 21:01 ` [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
                   ` (4 subsequent siblings)
  5 siblings, 5 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell
  Cc: Konrad Rzeszutek Wilk

A mechanism is required to binarily patch the running hypervisor with new
opcodes that have come about due to primarily security updates.

This document describes the design of the API that would allow us to
upload to the hypervisor binary patches.

This document has been shapped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 docs/misc/xsplice.markdown | 1370 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1370 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..18a264e
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,1370 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binarily patch the running hypervisor with new
+opcodes that have come about due to primarily security updates.
+
+This document describes the design of the API that would allow us to
+upload to the hypervisor binary patches.
+
+The document is split in four sections:
+ - Detailed descriptions of the problem statement.
+ - Design of the data structures.
+ - Design of the hypercalls.
+ - Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible to patch the hypervisor in multiple ways
+and be as simple as possible. The compiled code is contiguous in memory with
+no gaps - so we have no luxury of 'moving' existing code and must either
+insert a trampoline to the new code to be executed - or only modify in-place
+the code if there is sufficient space. The placement of new code has to be done
+by hypervisor and the virtual address for the new code is allocated dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampoline, etc
+has to be supported. The API should allow read/write any memory within
+the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+We must also have a mechanism to: (optional) provide an copy of the old code - so
+that the hypervisor can verify it against the code in memory; the new code;
+the symbol name of the function to be patched; or offset from the symbol;
+or virtual address.
+
+The complications that this design will encounter are explained later
+in this document.
+
+## Workflow
+
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches around system start.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+ * If we do no restrict module activation order and want to report tboot
+   state on sequences, we might have a complexity explosion problem, in
+   what system hashes should be considered acceptable.
+
+## Patching code
+
+The first mechanism to patch that comes in mind is in-place replacement.
+That is replace the affected code with new code. Unfortunately the x86
+ISA is variable size which places limits on how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill it with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is by replacing the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function.
+
+### Example of trampoline and in-place splicing
+
+As example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks as so:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside the arch_do_domctl. This new change adds 21 extra
+bytes of code which alters all the offsets inside the function. To alter
+these offsets and add the extra 21 bytes of code we might not have enough
+space in .text to squeeze this in.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+<do_domctl>:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`
+- which is not called directly by hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+Patching the offset in `hypercall_table` for `do_domctl:
+(ffff82d080103079 <do_domctl>:)
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+with the new address where the new `do_domctl` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `arch_do_domctl`.
+
+In summary this example patched the callee of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+
+However it has severe drawbacks - the safety checks which have to make sure
+the function is not on the stack - must also check every caller. For some
+patches this could mean - if there were an sufficient large amount of
+callers - that we would never be able to apply the update.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+lessens the locations to be patched to one but it puts pressure on the
+CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets an structure
+in `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). Patching the offset in `hypercall_table` for the old
+`do_xen_version` (ffff82d080112f9e <do_xen_version>)
+
+</pre>
+ ffff82d08024b270 <hypercall_table>  
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+with the new address where the new `do_xen_version` is possible. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to patch insert a trampoline in the
+old `do_xen_version' function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e <do_xen_version>:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 <do_xen_version+0x534>  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e <do_xen_version>:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to an function
+   providing an BUG_ON to catch errant code.
+
+The disadvantage of this are that the unconditional jump will consume a small
+I-cache penalty. However the simplicity of the patching and higher chance
+of passing safety checks make this a worthwhile option.
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and integrity
+was intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking the
+signature checking will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * (optional) What the old code is expected to be. We **MUST** be able verify it
+   against the runtime code if old code is included in the payload.
+ * Verify the build-id of hypervisor against the payload build-id.
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code (or data) that will be patched in.
+ * Signature to verify the payload.
+
+This binary format can be constructed using an custom binary format but
+there are severe disadvantages of it:
+
+ * The format might need to be changed and we need an mechanism to accommodate
+   that.
+ * It has to be platform agnostic.
+ * Easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions. The prefix for the ELF section name
+would always be: *.xsplice* to match up to the names of the structures.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+Earlier design attempted to ineptly explain the relations of the ELF sections
+to each other without using proper ELF mechanism (sh_info, sh_link, data
+structures using Elf_* types, etc). This design will explain in detail
+the structures and how they are used together and not dig in the ELF
+format - except mention that the section names should match the
+structure names.
+
+### ASCII art of structures.
+
+The diagram below is omitting some entries to easy the relationship explanation.
+
+<pre>
+                                                                          /---------------------\  
+                                                                       +->| xsplice_reloc_howto |  
+                                                                      /   \---------------------/  
+                                                /---------------\ 1:1/  
+                                             +->| xsplice_reloc |   /  
+                                            /   | - howto       +--/  1:1 /----------------\  
+                                           /    | - symbol      +-------->| xsplice_symbol |  
+                                     1:N  /     \---------------/       / \----------------/  
+/----------\        /--------------\     /                             /  
+| xsplice  |  1:1   | xsplice_code |    /                          1:1/  
+| - new    +------->|  - relocs    +---/  1:N   /-----------------\  /  
+| - old    +------->|  - sections  +----------->| xsplice_section | /  
+\----------/        |  - patches   +--\         | - symbol        +/ 1:1   /----------------\  
+                    \--------------/   \        | - addr          +------->| .text or .data |  
+                                        \       \----------------/         \----------------/  
+                                         \  
+                                      1:N \  
+                                           \    /----------------\  
+                                            +-->| xsplice_patch  |  1:1  /----------------\  
+                                                | - content      +------>| binary code or |  
+                                                \----------------/       | data           |  
+                                                                         \----------------/  
+
+</pre>
+
+### xsplice structures
+
+From the top (or left in the above diagram) the structures are:
+
+ *  `xsplice`. The top most structure - contains the the name of the update,
+    the id to match against the hypervisor, the pointer to the metadata for
+    the new code and optionally the metadata for the old code.
+
+ * `xsplice_code`. The structure that ties all of this together and defines
+   the payload. Contains arrays of `xsplice_reloc`, `xsplice_section`, and
+   `xsplice_patch`.
+
+ * `xsplice_reloc` contains telemetry used for patching - which describes the
+   targets to be patched and how to do it.
+
+ * `xsplice_section` - the safety data for the code. Contains pointer to the
+   symbol (`xsplice_symbols`) and pointer to the code (`.text`) or data (`.data`),
+   which are to be used during safety and dependency checking.
+
+ * `xsplice_patch`: the description of the new function to be patched in
+   along with the binary code or data.
+
+ * ` xsplice_reloc_howto`: the howto properly construct trampolines for an patch.
+   We may have multiple locations for which we need to insert a trampoline for a
+   payload and each location might require a different way of handling it.
+
+ * `xsplice_symbols `.  The symbol that will be patched.
+
+In short the *.xsplice* sections (with `xsplice` being the top) represent
+various structures to define the new code and safety checks for the old
+code (optional). The ELF provides the mechanism to glue it all together when
+loaded in memory.
+
+
+Note that a lot of these ideas are borrowed from kSplice which is
+available at: https://github.com/jirislaby/ksplice
+
+### struct xsplice
+
+The top most structure is quite simple. It defines the name, the id
+of the hypervisor, pointer to the new code & data and an pointer to
+the old code & data (optional).
+
+The `new` code uses all of the `xsplice_*` structures while the
+`old`  does not use the `xsplice_reloc` structures.
+
+The sections defining the structures will explicitly state
+when they are not used.
+
+<pre>
+struct xsplice {
+    const char *name; /* A sensible name for the patch. Up to 40 characters. */  
+    const char *build_id; /* ID of the hypervisor this binary was built against. */  
+    uint32_t version; /* Version of payload. */  
+    uint32_t id_size; /* Size of the ID. */  
+    struct xsplice_code *new; /* Pointer to the new code & data to be patched. */  
+    struct xsplice_code *old; /* Pointer to the old code & data to be checked against. */  
+    uint8_t pad[24];  /* Must be zero. */  
+};
+</pre>
+
+The size of this structure should be 64 bytes.
+
+### xsplice_code
+
+The structure embedded within this section ties the other
+structures together. It has the pointers with an start and end
+address for each set of structures. This means that an update
+can be split in multiple changes - for example to accomodate
+an update that contains both code and data and will need patching
+in both .text and .data sections.
+
+<pre>
+struct xsplice_code {  
+    struct xsplice_reloc *relocs; /* How to patch it. */  
+    struct xsplice_section *sections; /* Safety data. */  
+    struct xsplice_patch *patches; /* Patch code and data */  
+    uint32_t n_relocs;  
+    uint32_t n_sections;  
+    uint32_t n_patches;  
+    uint8_t pad[28]; /* Must be zero. */  
+};
+</pre>
+
+The size of this structure is 64 bytes.
+
+There can be at most two of those structures in the payload.
+One for the `new` and another for the `old` (optional).
+
+If it is for the old code the relocs, and relocs_end values will be ignored.
+
+
+### xsplice_reloc
+
+The `xsplice_code` defines an array of these structures. As such
+an singular structure defines an singular point where to patch the
+hypervisor.
+
+The structure contains the address of the hypervisor (if known),
+the symbol associated with this address, how the patching is to
+be done, and platform specific details.
+
+The `isns_added` is an value to be used to compute the new offset
+due to the quirks of the operands of the instruction. For example
+to patch in an jump operation to the new code - the offset is relative
+to the program counter of the next instruction - hence the offset
+value has to be subtracted by four bytes - hence this would contain -4 .
+
+The `isns_target` is the offset against the symbol.
+
+The relation of this structure with `xsplice_patch` is 1:1, even
+for inline patches. See the section detailing the structure
+`xsplice_reloc_howto`.
+
+The relation of this structure with `xsplice_section` is 1:1.
+
+This structure is as follow:
+
+<pre>
+struct xsplice_reloc {  
+    uint64_t addr; /* The address of the relocation (if known). */  
+    struct xsplice_symbol *symbol; /* Symbol for this relocation. */  
+    int64_t isns_target; /* rest of the ELF addend.  This is equal to the offset against the symbol that the relocation refers to. */  
+    struct xsplice_reloc_howto  *howto; /* Pointer to the above structure. */  
+    int64_t isns_added; /* ELF addend resulting from quirks of instruction one of whose operands is the relocation. For example, this is -4 on x86 pc-relative jumps. */  
+    uint8_t pad[24];  /* Must be zero. */  
+};  
+
+</pre>
+
+The size of this structure is 64 bytes.
+
+### xsplice_section
+
+The structure defined in this section is used during pre-patching and
+during patching. Pre-patching it is used to verify that it is safe
+to update with the new changes - and contains safety data on the old code
+and what kind of matching we are to expect.
+
+That is whether the address (either provided or resolved when payload is
+loaded by referencing the symbols) is:
+
+ * in memory,
+ * correct size,
+ * in it's proper ELF section,
+ * has been already patched (or not),
+ * is expected not to be on any CPU stack - (or if it is OK for it be on the CPU stack).
+
+with what we expect it to be.
+
+Some of the checks can be relaxed, as such the `flag` values
+can be or-ed together.
+
+Depending on the time when patching is done, stack checking might not
+be required.
+<pre>
+
+#define XSPLICE_SECTION_TEXT   0x00000001 /* Section is in .text */  
+#define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */  
+#define XSPLICE_SECTION_DATA   0x00000004 /* Section is in .data */  
+#define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */  
+
+#define XSPLICE_SECTION_TEXT_INLINE 0x00000200 /* Change is to be inline. */   
+#define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */  
+#define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. */  
+
+
+struct xsplice_section {  
+    struct xsplice_symbol *symbol; /* The symbol associated with this change. */  
+    uint64_t address; /* The address of the section (if known). */  
+    uint32_t size; /* The size of the section. */  
+    uint32_t flags; /* Various XSPLICE_SECTION_* flags. */
+    uint8_t pad[2]; /* To be zero. */  
+};
+
+</pre>
+
+The size of this structure is 32 bytes.
+
+### xsplice_patch
+
+This structure has the binary code (or data) to be patched. Depending on the
+type it can either an inline patch (data or text) or require an relocation
+change (which requires a trampoline). Naturally it also points to a blob
+of the binary data to patch in, and the size of the patch.
+
+The `addr` is used when the patch is for inline change. It can be
+an virtual address or an offset from the symbol start.
+
+If it is an relocation (requiring a trampoline), the `addr` should be zero.
+
+There must be an corresponding ` struct xsplice_reloc` and
+`struct xsplice_section` describing this patch.
+
+<pre>
+#define XSPLICE_PATCH_INLINE_TEXT   0x1
+#define XSPLICE_PATCH_INLINE_DATA   0x2
+#define XSPLICE_PATCH_RELOC_TEXT    0x3
+
+struct xsplice_patch {  
+    uint32_t type; /* XSPLICE_PATCH_* .*/  
+    uint32_t size; /* Size of patch. */  
+    uint64_t addr; /* The address (or offset from symbol) of the inline new code (or data). */  
+    void *content; /* The bytes to be installed. */  
+    uint8_t pad[40]; /* Must be zero. */  
+};
+
+</pre>
+
+The size of this structure is 64 bytes.
+
+### xsplice_symbols
+
+The structure contains an pointer to the name of the ELF symbol
+to be patched and as well an unique name for the symbol.
+
+The `label` is used for diagnostic purposes - such as including the
+name and the offset.
+
+The structure is as follow:
+
+<pre>
+struct xsplice_symbol {  
+    const char *name; /* The ELF name of the symbol. */  
+    const char *label; /* A unique xSplice name for the symbol. */  
+    uint8_t pad[16]; /* Must be zero. */  
+};  
+</pre>
+
+The size of this structure is 32 bytes.
+
+
+### xsplice_reloc_howto
+
+The howto defines in the detail the change. It contains the type,
+whether the relocation is relative, the size of the relocation,
+bitmask for which parts of the instruction or data are to be replaced,
+amount the final relocation is shifted by (to drop unwanted data), and
+whether the replacement should be interpreted as signed value.
+
+The structure is as follow:
+
+<pre>
+#define XSPLICE_HOWTO_INLINE        0x1 /* It is an inline replacement. */  
+#define XSPLICE_HOWTO_RELOC_PATCH   0x2 /* Add a trampoline. */  
+
+#define XSPLICE_HOWTO_FLAG_PC_REL    0x1 /* Is PC relative. */  
+#define XSPLICE_HOWOT_FLAG_SIGN      0x2 /* Should the new value be treated as signed value. */  
+
+struct xsplice_reloc_howto {  
+    uint32_t    howto; /* XSPLICE_HOWTO_* */  
+    uint32_t    flag; /* XSPLICE_HOWTO_FLAG_* */  
+    uint32_t    size; /* Size, in bytes, of the item to be relocated. */  
+    uint32_t    r_shift; /* The value the final relocation is shifted right by; used to drop unwanted data from the relocation. */  
+    uint64_t    mask; /* Bitmask for which parts of the instruction or data are replaced with the relocated value. */  
+    uint8_t     pad[8]; /* Must be zero. */  
+};  
+
+</pre>
+
+The size of this structure is 32 bytes.
+
+### Example
+
+There is a wealth of information that the payload must have to define a simple
+patch.  For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which mem-sets an structure
+in `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). There are two ways to patch this:
+inline patch `hvm_hypercall64_table` and `hvm_hypercall` with a new
+address for the new `do_xen_version` , or insert
+trampoline in `do_xen_version` code. The example will focus on the later.
+
+The `do_xen_version` code is located at virtual address ffff82d080112f9e.
+
+<pre>
+struct xsplice_code xsplice_xsa125;  
+struct xsplice_reloc relocs[1];  
+struct xsplice_section sections[1];  
+struct xsplice_patch patches[1];  
+struct xsplice_symbol do_xen_version_symbol;  
+struct xsplice_reloc_howto do_xen_version_howto;  
+char do_xen_version_new_code[1728];  
+
+#ifndef  BUILD_ID 
+#define BUILD_ID "92dd05a61556c554155b1508c9cf67d993336d28"
+#endif  
+
+struct xsplice xsa125 = {  
+    .name = "xsa125",  
+    .build_id = BUILD_ID,  
+    .old = NULL,  
+    .new = &xsplice_xsa125,  
+};  
+
+struct xsplice_code xsplice_xsa125 = {  
+    .relocs = &relocs[0],  
+    .n_relocs = 1,  
+    .sections = &sections[0],  
+    .n_sections = 1,  
+    .patches = &patches[0],  
+    .n_patches = 1,   
+};
+
+struct xsplice_reloc relocs[1] = {  
+    {  
+        .addr = 0xffff82d080112f9e,  
+        .symbol = &do_xen_version_symbol,  
+        .isns_target = 0,  
+        .howto = &do_xen_version_howto,  
+        .isns_added = -4,  
+    },  
+};  
+
+struct xsplice_symbol do_xen_version_symbol = {  
+    .name = "do_xen_version",  
+    .label = "do_xen_version+<0x0>",  
+};  
+
+struct xsplice_reloc_howto do_xen_version_howto = {  
+    .type = XSPLICE_HOWTO_RELOC_PATCH,  
+    .flag = XSPLICE_HOWTO_FLAG_PC_REL,  
+    .r_shift = 0,  
+    .mask = (-1ULL),  
+};  
+
+
+struct xsplice_section sections[1] = {  
+    {  
+        .symbol = &do_xen_version_symbol,  
+        .address = 0xffff82d080112f9e,  
+        .size = 1728,  
+        .flags = XSPLICE_SECTION_TEXT,  
+    },  
+};  
+
+struct xsplice_patch patches[1] = {  
+    {  
+        .type = XSPLICE_PATCH_RELOC_TEXT,  
+        .size = 1728,  
+        .addr = 0,  
+        .content = &do_xen_version_new_code,  
+    },  
+};  
+
+char do_xen_version_new_code[1728] = { 0x83, 0xff, 0x09, /* And more code. */};  
+</pre>
+
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be same for signature to be verified. This means that the payload
+data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on the all of the payloads continuously laid out
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+an signature header then followed by the signature, key identifier, and signers
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from Linux module signature code.).
+
+
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload the payloads.
+ * listing of payloads summary uploaded and their state.
+ * getting an particular payload summary and its state.
+ * command to apply, delete, or revert the payload.
+ * querying of the hypervisor build ID.
+
+Most of the actions are asynchronous therefore the caller is responsible
+to verify that it has been applied properly by retrieving the summary of it
+and verifying that there are no error codes associated with the payload.
+
+We **MUST** make some of them asynchronous due to the nature of patching
+it requires every physical CPU to be lock-step with each other.
+The patching mechanism while an implementation detail, is not an short
+operation and as such the design **MUST** assume it will be an long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such an unique id per payload has to be visible to allow proper manipulation.
+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations:
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;  
+	union {  
+          ... see below ...  
+        } u;  
+};  
+
+</pre>
+while the rest of hypercall specific structures are part of the this structure.
+
+### Basic type: struct xen_xsplice_id
+
+Most of the hypercalls employ an shared structure called `struct xen_xsplice_id`
+which contains:
+
+ * `name` - pointer where the string for the id is located.
+ * `size` - the size of the string
+ * `_pad` - padding - to be zero.
+
+The structure is as follow:
+
+<pre>
+#define XEN_XSPLICE_ID_SIZE 128
+struct xen_xsplice_id {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint32_t    size;                       /* IN, size of name. May be upto   
+                                               XEN_XSPLICE_ID_SIZE. */  
+    uint32_t    _pad;  
+};  
+</pre>
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `id` could be an UUID in mind that stays fixed forever for a given
+hotpatch. It can be embedded into the Elf payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was succesfully uploaded.
+Otherwise an EXX return value is provided. Duplicate `id` are not supported.
+The payload at this point is verified against the basic checks.
+
+The `payload` is the ELF payload as mentioned in the `Payload format` section.
+
+The structure is as follow:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_id_t id;        /* IN, name of the patch. */  
+    uint64_t size;              /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+}; 
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve an status of an specific payload. This caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * A `struct xen_xsplice_status` structure which has all members
+   set to zero: That is:
+   * `status` *MUST* be set to zero.
+   * `_pad` *MUST* be set to zero.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `_pad` - reserved for further usage.
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (0x1) has been loaded.
+   * *XSPLICE_STATUS_PROGRESS* (0x2) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command.
+   * *XSPLICE_STATUS_CHECKED*  (0x4) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (0x8) loaded, checked, and applied.
+   * *XSPLICE_STATUS_REVERTED* (0x10) loaded, checked, applied and then also reverted.
+   *  Negative values is an error. The error would be of EXX format.
+
+The return value is zero on success and EXX on failure. This operation
+is synchronous and does not require preemption.
+
+If the status has a negative value more details on the payload failure
+can be retrieved via **XEN_SYSCTL_XSPLICE_INFO** hypercall.
+
+The structure is as follow:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_LOADED       0x01  
+#define XSPLICE_STATUS_PROGRESS     0x02  
+#define XSPLICE_STATUS_CHECKED      0x04  
+#define XSPLICE_STATUS_APPLIED      0x08  
+#define XSPLICE_STATUS_REVERTED     0x10  
+ /* Any negative value is an error. The error would be in -EXX format. */  
+	int32_t status;                 /* OUT, On IN has to be zero. */  
+    uint32_t _pad;                  /* IN: Must be zero. */  
+};  
+
+struct xen_sysctl_xsplice_summary {  
+    xen_xsplice_id_t    id;         /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Initially (on first hypercall) *MUST* be zero.
+ * `idx` index iterator. On first call *MUST* be zero, subsequent calls varies.
+ * `nr` the max number of entries to populate.
+ * `_pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. *MUST* allocate up to `nr` of them.
+ * `id` - virtual address of where to write the unique id of the payload.
+   *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_ID_SIZE** size.
+ * `len` - virtual address of where to write the length of each unique id
+   of the payload. *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns an positive number, it is the number (up to `nr`)
+of the payloads returned, along with `nr` updated with the number of remaining
+payloads, `version` updated (it may be the same across hypercalls. If it
+varies the data is stale and further calls could fail). The `status`,
+`id`, and `len`' are updated at their designed index value (`idx`) with
+the returned value of data.
+
+If the hypercall returns E2BIG the `count` is too big and should be
+lowered.
+
+This operation can be preempted by the hypercall returning EAGAIN.
+Retry.
+
+Note that due to the asynchronous nature of hypercalls the domain might have
+added or removed the number of payloads making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. if the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
+The `struct xen_xsplice_status` structure contains an status of payload which includes:
+
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (0x1) has been loaded.
+   * *XSPLICE_STATUS_PROGRESS* (0x2) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command.
+   * *XSPLICE_STATUS_CHECKED*  (0x4) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (0x8) loaded, checked, and applied.
+   * *XSPLICE_STATUS_REVERTED* (0x10) loaded, checked, applied and then also reverted.
+   * Any negative values means there has been error. The value is in EXX format.
+
+If the status has a negative value more details on the payload failure
+can be retrieved via **XEN_SYSCTL_XSPLICE_INFO** hypercall.
+
+The structure is as follow:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
+                                               On subsequent calls reuse value.  
+                                               If varies between calls, we are  
+                                             * getting stale data. */  
+    uint32_t idx;                           /* IN/OUT: Index into array. */  
+    uint32_t nr;                            /* IN: How many status, id, and len  
+                                               should populate.  
+                                               OUT: How many payloads left. */  
+    uint32_t _pad;                          /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocate for n of them. */  
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each member  
+                                               MUST XEN_XSPLICE_ID_SIZE in size.  
+                                               Must have n of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids.  
+                                               Must have n of them. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `id` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+If the operation fails more details on the operation can be retrieved via
+**XEN_SYSCTL_XSPLICE_INFO** hypercall.
+
+The caller provides:
+
+ * A 'struct xen_xsplice_id` `id` containing the unique id.
+ * `cmd` the command requested:
+  * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+    This also verfies the payload - which may require SecureBoot firmware
+    calls.
+  * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+   Any further hypercalls against the `id` will result in failure unless
+   **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is perfomed with same `id`.
+  * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+  more time than the upper bound of time the `status` will EBUSY.
+  * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+  more time than the upper bound of time the `status` will be EBUSY.
+  * *XSPLICE_ACTION_LOADED* is an initial state and cannot be requested.
+ * `time` the upper bound of time the cmd should take. Zero means infinite.
+   If within the time the operation does not succeed the operation would go in
+   error state.
+ * `_pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follow:
+
+<pre>
+#define XSPLICE_ACTION_CHECK  1  
+#define XSPLICE_ACTION_UNLOAD 2  
+#define XSPLICE_ACTION_REVERT 3  
+#define XSPLICE_ACTION_APPLY  4  
+
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t _pad;                          /* IN: MUST be zero. */  
+    uint64_t time;  /* IN, upper bound of time (ms) for the operation to take. */  
+};  
+
+</pre>
+
+### XEN_SYSCTL_XSPLICE_INFO (4)
+
+Retrieve information useful for the patching tools.
+
+The calleer provides:
+
+ * `cmd` The command of the sub-command requested, which can be:
+   * **XEN_SYSCTL_XSPLICE_INFO_BUILD_ID** (0) - which request the build-id
+   of the hypervisor.
+   * **XEN_SYSCTL_XSPLICE_INFO_TRACE_CLEAR** (1) - clear the hypervisor
+    patching trace.
+   * **XEN_SYSCTL_XSPLICE_INFO_TRACE_GET** (2) - retrieve the trace. The
+    return value will be the number of bytes retrieved. Zero means end of trace.
+ * `size` - The size of the `info` char array. *MUST* not be zero.
+ * `info` - virtual address where to write requested information.
+    *MUST* not be zero.
+
+On completion if the return value contains an positive value it signifies
+that this many bytes were written into `info`.
+
+The return value can also be EXX format. EINVAL if incorrect values have been provided,
+ENOENT the requested information cannot be supplied, or any other error occurred.
+
+The structure is as a follow:
+<pre>
+#define XEN_SYSCTL_XSPLICE_INFO_BUILD_ID 0  
+struct xen_sysctl_xsplice_info {  
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_INFO_* */  
+    uint32_t size;                          /* IN: Size of info: OUT: Amount of  
+                                               bytes filed out in info. */  
+    union {  
+        XEN_GUEST_HANDLE_64(char) info;     /* OUT: Requested information. */  
+    } u;  
+};  
+</pre>
+
+## State diagrams of XSPLICE_ACTION values.
+
+There is a strict ordering state of what the commands can be.
+The XSPLICE_ACTION prefix has been dropped to easy reading:
+
+<pre>
+                        /->\  
+                        \  /  
+             /-------< CHECK  
+             |          |  
+             |          +  
+             |       UNLOAD<--\  
+             |                 \  
+             |                   \  
+      /-> APPLY -----------> REVERT --\  
+      |                               |  
+      \-------------------------------/  
+
+</pre>
+Or an state transition table of valid states:
+<pre>
++-------+-------+--------+--------+---------+-------+------------------+  
+| CHECK | APPLY | REVERT | UNLOAD | Current | Next  | Result           |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|   x   |       |        |        | LOADED  | CHECK | Check payload.   |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |       |        |   x    | LOADED  | UNLOAD| unload payload.  |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|   x   |       |        |        | CHECK   | CHECK | Check payload.   |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |   x   |        |        | CHECK   | APPLY | Apply payload.   |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |       |        |   x    | CHECK   | UNLOAD| Unload payload.  |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |       |   x    |        | APPLY   | REVERT| Revert payload.  |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |   x   |        |        | REVERT  | APPLY | Apply payload.   |  
++-------+-------+--------+--------+---------+-------+------------------+  
+|       |       |        |   x    | REVERT  | UNLOAD| Unload payload.  |  
++-------+-------+--------+--------+---------+-------+------------------+  
+</pre>
+All the other state transitions are invalid.
+
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_LOADED* go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to verify that the payload can be succesfully applied.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_CHECKED* go to next step.
+ 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_APPLIED* exit with success.
+
+ 
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+expand the code to place a most generic code in place - emit a special
+ELF .section header to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with an better suited opcodes.
+
+However these sections are part of .init. and as such can't reasonably be
+subject to patching.
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+### .bss and .data sections.
+
+Patching writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+
+### Patching code which is in the stack.
+
+We should not patch the code which is on the stack. That can lead
+to corruption.
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+we are limited to up to 2GB of virtual address to place the new code
+from the old code. That should not be a problem since Xen hypervisor has
+a very small footprint.
+
+However if we need - we can always add two trampolines. One at the 2GB
+limit that calls the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accomodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are  required.
+
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions or a
+runtime check that verifies symbol size (+ padding to next symbols) for
+that in the hypervisor is advised.
+
+### When to patch
+
+During the discussion on the design two candidates bubbled where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using the stop_machine to patch can stall the time rendezvous
+code and result in NMI. As such having the patching be done at the tail
+of rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To randezvous all the CPUs an barrier with an maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to have all the CPUs be lockstep.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+
+### Generation of xSplice ELF payloads
+
+The design of that is not discussed in this design.
+
+The author of this design envisions objdump and objcopy along
+with special GCC parameters (see above) to create .o.xsplice files
+which can be used to splice an ELF with the new payload.
+
+The ksplice code can provide inspiration.
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+
+### xSplice interdependencies
+
+xSplice patches interdependencies are tricky.
+
+There are the ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where an mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need an build-id
+   of the hypevisor to make sure the hot-patches are build against the
+   correct build.
+ * Payload containing the old code to check against that. That allows
+   the hotpatches to be loaded indepedently (if they don't overlap) - or
+   if the old code also containst previously patched code - even if they
+   overlap.
+
+The disadvantage of the first large patch is that it can grow over
+time and not provide an bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts stricts requirements on the order of the patches
+being loaded and requires an hypervisor build-id to match against.
+
+The old code allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Hypervisor ID (buid-id)
+
+The build-id can help with:
+
+  * Prevent loading of wrong hotpatches (intended for other builds)
+
+  * Allow to identify suitable hotpatches on disk and help with runtime
+    tooling (if laid out using build ID)
+
+The build-id (aka hypervisor id) can be easily obtained by utilizing
+the ld --build-id operatin which (copied from ld):
+
+<pre>
+--build-id  
+    --build-id=style  
+        Request creation of ".note.gnu.build-id" ELF note section.  The contents of the note are unique bits identifying this  
+        linked file.  style can be "uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash on the normative parts of the  
+        output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the output contents, or "0xhexstring" to use a  
+        chosen bit string specified as an even number of hexadecimal digits ("-" and ":" characters between digit pairs are  
+        ignored).  If style is omitted, "sha1" is used.  
+
+        The "md5" and "sha1" styles produces an identifier that is always the same in an identical output file, but will be  
+        unique among all nonidentical output files.  It is not intended to be compared as a checksum for the file's contents.  A  
+        linked file may be changed later by other tools, but the build ID bit string identifying the original linked file does  
+        not change.  
+
+        Passing "none" for style disables the setting from any "--build-id" options earlier on the command line.  
+
+</pre>
+
+### Symbol names
+
+
+Xen as it is now, has a couple of non-unique symbol names which will
+make runtime symbol identification hard.  Sometimes, static symbols
+simply have the same name in C files, sometimes such symbols get
+included via header files, and some C files are also compiled
+multiple times and linked under different names (guest_walk.c).
+
+As such we need to modify the linker to make sure that the symbol
+table qualifies also symbols by their source file name.
+
+For the awkward situations in which C-files are compiled multiple
+times patches we would need to some modification in the Xen code.
+
+
+The convention for file-type symbols (that would allow to map many
+symbols to their compilation unit) says that only the basename (i.e.,
+without directories) is embedded.  This creates another layer of
+confusion for duplicate file names in the build tree.
+
+That would have to be resolved.
+
+<pre>
+> find . -name \*.c -print0 | xargs -0 -n1 basename | sort | uniq -c | sort -n | tail -n10
+      3 shutdown.c
+      3 sysctl.c
+      3 time.c
+      3 xenoprof.c
+      4 gdbstub.c
+      4 irq.c
+      5 domain.c
+      5 mm.c
+      5 pci.c
+      5 traps.c
+</pre>
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+### Handle inlined __LINE__
+
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonable be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch--by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
@ 2015-09-16 21:01 ` Konrad Rzeszutek Wilk
  2015-10-02 15:06   ` Jan Beulich
  2015-09-16 21:01 ` [PATCH v1 3/5] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell
  Cc: Konrad Rzeszutek Wilk

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, and unload payloads.

*1: Which of course in this patch are nops.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/common/Makefile                 |   1 +
 xen/common/keyhandler.c             |   8 +-
 xen/common/sysctl.c                 |   6 +
 xen/common/xsplice.c                | 442 ++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h         | 156 +++++++++++++
 xen/include/xen/xsplice.h           |   9 +
 xen/xsm/flask/hooks.c               |   3 +
 xen/xsm/flask/policy/access_vectors |   2 +
 8 files changed, 626 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1726fac..7d53234 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -55,6 +55,7 @@ obj-y += vmap.o
 obj-y += vsprintf.o
 obj-y += wait.o
 obj-y += xmalloc_tlsf.o
+obj-y += xsplice.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index 5d21e48..1d4574a 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -18,6 +18,7 @@
 #include <xen/mm.h>
 #include <xen/watchdog.h>
 #include <xen/init.h>
+#include <xen/xsplice.h>
 #include <asm/debugger.h>
 #include <asm/div64.h>
 
@@ -455,6 +456,11 @@ static struct keyhandler spinlock_reset_keyhandler = {
     .desc = "reset lock profile info"
 };
 #endif
+static struct keyhandler xsplice_printall_keyhandler = {
+    .diagnostic = 1,
+    .u.fn = xsplice_printall,
+    .desc = "print splicing information"
+};
 
 static void run_all_nonirq_keyhandlers(unsigned long unused)
 {
@@ -567,7 +573,7 @@ void __init initialize_keytable(void)
     register_keyhandler('l', &spinlock_printall_keyhandler);
     register_keyhandler('L', &spinlock_reset_keyhandler);
 #endif
-
+    register_keyhandler('x', &xsplice_printall_keyhandler);
 }
 
 /*
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 85e853f..517d684 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_control(&op->u.xsplice);
+        copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..d330efe
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,442 @@
+/*
+ * xSplice - Copyright Oracle Corp. Inc 2015.
+ *
+ * Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
+ */
+
+#include <xen/smp.h>
+#include <xen/keyhandler.h>
+#include <xen/spinlock.h>
+#include <xen/mm.h>
+#include <xen/list.h>
+#include <xen/guest_access.h>
+#include <xen/stdbool.h>
+#include <xen/sched.h>
+#include <xen/lib.h>
+#include <xen/xsplice.h>
+#include <public/sysctl.h>
+
+#include <asm/event.h>
+
+static DEFINE_SPINLOCK(payload_list_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    char  *id;          /* Name of it. Past this structure. */
+    ssize_t id_len;     /* Length of the name. */
+
+    uint8_t *raw;       /* Pointer to Elf file. Past 'id'*/
+    ssize_t raw_len;    /* Size of 'raw'. */
+
+    int32_t status;     /* XSPLICE_STATUS_* or Exx type value. */
+    int32_t old_status; /* XSPLICE_STATUS_* or Exx type value. */
+
+    struct spinlock cmd_lock; /* Lock against the action. */
+    uint32_t cmd;       /* Action request. XSPLICE_ACTION_* */
+
+    /* Boring things below: */
+    struct list_head   list;   /* Linked to 'payload_list'. */
+    ssize_t len;        /* This structure + raw_len + id_len + 1. */
+
+    struct tasklet tasklet;
+};
+
+static const char *status2str(int64_t status)
+{
+#define STATUS(x) [XSPLICE_STATUS_##x] = #x
+    static const char *const names[] = {
+            STATUS(LOADED),
+            STATUS(PROGRESS),
+            STATUS(CHECKED),
+            STATUS(APPLIED),
+            STATUS(REVERTED),
+    };
+#undef STATUS
+
+    if (status >= ARRAY_SIZE(names))
+        return "unknown";
+
+    if (status < 0)
+        return "-EXX";
+
+    if (!names[status])
+        return "unknown";
+
+    return names[status];
+}
+
+void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    spin_lock(&payload_list_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        printk(" id=%s status=%s(%d,old=%d): \n", data->id,
+               status2str(data->status), data->status, data->old_status);
+    }
+    spin_unlock(&payload_list_lock);
+}
+
+static int verify_id(xen_xsplice_id_t *id)
+{
+    if ( id->size == 0 || id->size > XEN_XSPLICE_ID_SIZE )
+        return -EINVAL;
+
+    if ( id->_pad != 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(id->name, id->size) )
+        return -EINVAL;
+
+    return 0;
+}
+
+int find_payload(xen_xsplice_id_t *id, bool_t need_lock, struct payload **f)
+{
+    struct payload *data;
+    XEN_GUEST_HANDLE_PARAM(char) str;
+    char name[XEN_XSPLICE_ID_SIZE]; /* 128 bytes on stack. Perhaps kzalloc? */
+    int rc = -EINVAL;
+
+    rc = verify_id(id);
+    if ( rc )
+        return rc;
+
+    str = guest_handle_cast(id->name, char);
+    if ( copy_from_guest(name, str, id->size) )
+        return -EFAULT;
+
+    if ( need_lock )
+        spin_lock(&payload_list_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strncmp(data->id, name, data->id_len) )
+        {
+            *f = data;
+            rc = 0;
+            break;
+        }
+    }
+
+    if ( need_lock )
+        spin_unlock(&payload_list_lock);
+
+    return rc;
+}
+
+
+static int verify_payload(xen_sysctl_xsplice_upload_t *upload)
+{
+    if ( verify_id(&upload->id) )
+        return -EINVAL;
+
+    if ( upload->size == 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+/*
+ * We MUST be holding the spinlock.
+ */
+static void __free_payload(struct payload *data)
+{
+    list_del(&data->list);
+    payload_cnt --;
+    payload_version ++;
+    tasklet_kill(&data->tasklet);
+    free_xenheap_pages(data, get_order_from_bytes(data->len));
+}
+
+static void xsplice_tasklet(unsigned long _data)
+{
+    struct payload *data = (struct payload *)_data;
+
+    spin_lock(&data->cmd_lock);
+    switch ( data->cmd ) {
+    case XSPLICE_ACTION_CHECK:
+            /* TODO: Do the operation here. */
+            data->status = XSPLICE_STATUS_CHECKED;
+            break;
+    case XSPLICE_ACTION_APPLY:
+            /* TODO: Well, do the work :-) */
+            data->status = XSPLICE_STATUS_APPLIED;
+            break;
+    case XSPLICE_ACTION_REVERT:
+            /* TODO: Well, do the work :-) */
+            data->status = XSPLICE_STATUS_REVERTED;
+            break;
+    default:
+            data->status = -EINVAL;
+    }
+    spin_unlock(&data->cmd_lock);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data = NULL;
+    int rc;
+    ssize_t len;
+
+    rc = verify_payload(upload);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&upload->id, true, &data);
+    if ( rc == 0 /* Found. */ )
+        return -EEXIST;
+
+    if ( rc != -ENOENT )
+        return rc;
+
+    /*
+     * Compute the size of the structures which need to be verified.
+     * The 1 is for the extra \0 in case name does not have it.
+     */
+    len = sizeof(*data) + upload->id.size + 1 + upload->size;
+    data = alloc_xenheap_pages(get_order_from_bytes(len), 0);
+    if ( !data )
+        return -ENOMEM;
+
+    memset(data, 0, len);
+    data->len = len;
+
+    /* At the end of structure we put the name. */
+    data->id = (char *)data + sizeof(*data);
+    data->id_len = upload->id.size;
+    /* And after the name + \0 we stick the raw ELF data. */
+    data->raw = (uint8_t *)data + sizeof(*data) + data->id_len + 1;
+    data->raw_len = upload->size;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(data->raw, upload->payload, upload->size) )
+        goto err_out;
+
+    if ( copy_from_guest(data->id, upload->id.name, upload->id.size) )
+        goto err_out;
+
+    data->status = XSPLICE_STATUS_LOADED;
+    INIT_LIST_HEAD(&data->list);
+    spin_lock_init(&data->cmd_lock);
+    data->cmd = 0;
+    tasklet_init(&data->tasklet, xsplice_tasklet, (unsigned long)data);
+
+    spin_lock(&payload_list_lock);
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt ++;
+    payload_version ++;
+    spin_unlock(&payload_list_lock);
+
+    return 0;
+
+ err_out:
+    free_xenheap_pages(data, get_order_from_bytes(len));
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_summary_t *summary)
+{
+    struct payload *data;
+    int rc;
+
+    if ( summary->status.status )
+        return -EINVAL;
+
+    if ( summary->status._pad != 0 )
+        return -EINVAL;
+
+    rc = verify_id(&summary->id );
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&summary->id, true, &data);
+    if ( rc )
+        return rc;
+
+    summary->status.status = data->status;
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+    unsigned int ver = payload_version;
+
+    // TODO: Increase to a 64 or other value. Leave 4 for debug.
+    if ( list->nr > 4 )
+        return -E2BIG;
+
+    if ( list->_pad != 0 )
+        return -EINVAL;
+
+    if ( guest_handle_is_null(list->status) ||
+         guest_handle_is_null(list->id) ||
+         guest_handle_is_null(list->len) )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
+         !guest_handle_okay(list->id, XEN_XSPLICE_ID_SIZE * list->nr) ||
+         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
+        return -EINVAL;
+
+    spin_lock(&payload_list_lock);
+    if ( list->idx > payload_cnt )
+    {
+        spin_unlock(&payload_list_lock);
+        return -EINVAL;
+    }
+
+    status._pad = 0; /* No stack leaking. */
+    list_for_each_entry( data, &payload_list, list )
+    {
+        uint32_t len;
+
+        if ( list->idx > i++ )
+            continue;
+
+        status.status = data->status;
+        len = data->id_len;
+
+        /* N.B. 'idx' != 'i'. */
+        if ( copy_to_guest_offset(list->id, idx * XEN_XSPLICE_ID_SIZE,
+                                  data->id, len) ||
+             copy_to_guest_offset(list->len, idx, &len, 1) ||
+             copy_to_guest_offset(list->status, idx, &status, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        idx ++;
+        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
+        {
+            break;
+        }
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    spin_unlock(&payload_list_lock);
+    list->version = ver;
+
+    /* And how many we have processed. */
+    return rc ? rc : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    int rc;
+
+    if ( action->_pad != 0 )
+        return -EINVAL;
+
+    rc = verify_id(&action->id);
+    if ( rc )
+        return rc;
+
+    spin_lock(&payload_list_lock);
+    rc = find_payload(&action->id, false /* we are holding the lock. */, &data);
+    if ( rc )
+        goto out;
+
+    if ( action->cmd != XSPLICE_ACTION_UNLOAD )
+        spin_lock(&data->cmd_lock);
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( ( data->status == XSPLICE_STATUS_LOADED ) )
+        {
+            data->old_status = data->status;
+            data->status = XSPLICE_STATUS_PROGRESS;
+            data->cmd = action->cmd;
+            tasklet_schedule(&data->tasklet);
+            rc = 0;
+        } else if ( data->status == XSPLICE_STATUS_CHECKED )
+        {
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_UNLOAD:
+        if ( ( data->status == XSPLICE_STATUS_REVERTED ) ||
+             ( data->status == XSPLICE_STATUS_LOADED ) ||
+             ( data->status == XSPLICE_STATUS_CHECKED ) )
+        {
+            __free_payload(data);
+            /* No touching 'data' from here on! */
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_REVERT:
+        if ( data->status == XSPLICE_STATUS_APPLIED )
+        {
+            data->old_status = data->status;
+            data->status = XSPLICE_STATUS_PROGRESS;
+            data->cmd = action->cmd;
+            rc = 0;
+            /* TODO: Tasklet is not good for this. We need a different vehicle. */
+            tasklet_schedule(&data->tasklet);
+        }
+        break;
+    case XSPLICE_ACTION_APPLY:
+        if ( ( data->status == XSPLICE_STATUS_CHECKED ) ||
+             ( data->status == XSPLICE_STATUS_REVERTED ))
+        {
+            data->old_status = data->status;
+            data->status = XSPLICE_STATUS_PROGRESS;
+            data->cmd = action->cmd;
+            rc = 0;
+            /* TODO: Tasklet is not good for this. We need a different vehicle. */
+            tasklet_schedule(&data->tasklet);
+        }
+        break;
+    default:
+        rc = -ENOSYS;
+        break;
+    }
+
+    if ( action->cmd != XSPLICE_ACTION_UNLOAD )
+        spin_unlock(&data->cmd_lock);
+ out:
+    spin_unlock(&payload_list_lock);
+
+    return rc;
+}
+
+int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+    default:
+        rc = -ENOSYS;
+        break;
+   }
+
+    return rc;
+}
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 0cacacc..08952de 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -764,6 +764,160 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to the docs/misc/xsplice.markdown for the design details
+ * of this hypercall.
+ */
+
+/*
+ * Structure describing an ELF payload. Uniquely identifies the
+ * payload. Should be human readable.
+ * Recommended length is XEN_XSPLICE_ID_SIZE.
+ */
+#define XEN_XSPLICE_ID_SIZE 128
+struct xen_xsplice_id {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */
+    uint32_t    size;                       /* IN, size of name. May be upto
+                                               XEN_XSPLICE_ID_SIZE. */
+    uint32_t    _pad;
+};
+typedef struct xen_xsplice_id xen_xsplice_id_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_id_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was succesfully uploaded.
+ * Otherwise an EXX return value is provided. Duplicate `id` are not supported.
+ * The payload at this point is verified against the basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */
+    uint64_t    size;                       /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve an status of an specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATUS_LOADED       0x01
+#define XSPLICE_STATUS_PROGRESS     0x02
+#define XSPLICE_STATUS_CHECKED      0x04
+#define XSPLICE_STATUS_APPLIED      0x08
+#define XSPLICE_STATUS_REVERTED     0x10
+ /* Any negative value is an error. The error would be in -EXX format. */
+	int32_t status;                 /* OUT, On IN has to be zero. */
+    uint32_t _pad;                  /* IN, Must be zero. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_summary {
+    xen_xsplice_id_t id;                    /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, status of it. */
+};
+typedef struct xen_sysctl_xsplice_summary xen_sysctl_xsplice_summary_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_summary_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns an positive number, it is the number (up to `nr`)
+ * of the payloads returned, along with `nr` updated with the number of remaining
+ * payloads, `version` updated (it may be the same across hypercalls. If it
+ * varies the data is stale and further calls could fail). The `status`,
+ * `id`, and `len`' are updated at their designed index value (`idx`) with
+ * the returned value of data.
+ *
+ * If the hypercall returns E2BIG the `count` is too big and should be
+ * lowered.
+ *
+ * This operation can be preempted by the hypercall returning EAGAIN.
+ * Retry.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed the number of payloads making this information stale. It is
+ * the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. if the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.
+                                               On subsequent calls reuse value.
+                                               If varies between calls, we are
+                                             * getting stale data. */
+    uint32_t idx;                           /* IN/OUT: Index into array. */
+    uint32_t nr;                            /* IN: How many status, id, and len
+                                               should populate.
+                                               OUT: How many payloads left. */
+    uint32_t _pad;                          /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocate for n of them. */
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each member
+                                               MUST XEN_XSPLICE_ID_SIZE in size.
+                                               Must have n of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids.
+                                               Must have n of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `id` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ * If the operation fails more details on the operation can be retrieved via
+ * XEN_SYSCTL_XSPLICE_INFO hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+    uint32_t    cmd;                        /* IN: XSPLICE_ACTION_*. */
+    uint32_t    _pad;                       /* IN: Always zero. */
+    uint64_aligned_t time;                  /* IN: Zero if no timeout. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_* */
+    uint32_t _pad;                          /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_summary_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -789,6 +943,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_xsplice_op                    25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -814,6 +969,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..41e28da
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,9 @@
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+int xsplice_control(struct xen_sysctl_xsplice_op *);
+
+extern void xsplice_printall(unsigned char key);
+
+#endif /* __XEN_XSPLICE_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 4180f3b..053f083 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -807,6 +807,9 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+    case XEN_SYSCTL_xsplice_op:
+        return domain_has_xen(current->domain, XEN2__XSPLICE_OP);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..5f08d05 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v1 3/5] libxc: Implementation of XEN_XSPLICE_op in libxc.
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2015-09-16 21:01 ` Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 4/5] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell
  Cc: Konrad Rzeszutek Wilk

The underlaying toolstack code to do the basic
operations when using the XEN_XSPLICE_op syscalls:
 - upload the payload,
 - get status of an payload,
 - list all the payloads,
 - apply, check, and revert the payload.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/include/xenctrl.h |  17 +++
 tools/libxc/xc_misc.c         | 274 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 291 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 3482544..2cd982d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2844,6 +2844,23 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            uint32_t *cos_max, uint32_t *cbm_len);
 #endif
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *id, char *payload, uint32_t size);
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *id,
+                   xen_xsplice_status_t *status);
+
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info, char *id,
+                    uint32_t *len, unsigned int *done,
+                    unsigned int *left);
+
+int xc_xsplice_apply(xc_interface *xch, char *id);
+int xc_xsplice_revert(xc_interface *xch, char *id);
+int xc_xsplice_unload(xc_interface *xch, char *id);
+int xc_xsplice_check(xc_interface *xch, char *id);
+
 #endif /* XENCTRL_H */
 
 /*
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index c613545..e59f0a1 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -718,6 +718,280 @@ int xc_hvm_inject_trap(
     return rc;
 }
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *id,
+                      char *payload,
+                      uint32_t size)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(payload, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    if ( !id || !payload )
+        return -1;
+
+    len = strlen(id);
+    if ( len > XEN_XSPLICE_ID_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    if ( xc_hypercall_bounce_pre(xch, payload) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_UPLOAD;
+    sysctl.u.xsplice.u.upload.size = size;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.payload, payload);
+
+    sysctl.u.xsplice.u.upload.id.size = len;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, payload);
+    xc_hypercall_bounce_post(xch, id);
+
+    return rc;
+}
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *id,
+                   xen_xsplice_status_t *status)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /*adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    if ( !id )
+        return -1;
+
+    len = strlen(id);
+    if ( len > XEN_XSPLICE_ID_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_GET;
+
+    sysctl.u.xsplice.u.get.status.status = 0;
+
+    sysctl.u.xsplice.u.get.id.size = len;
+    set_xen_guest_handle(sysctl.u.xsplice.u.get.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, id);
+
+    memcpy(status, &sysctl.u.xsplice.u.get.status, sizeof(*status));
+
+    return rc;
+}
+
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info,
+                    char *id, uint32_t *len,
+                    unsigned int *done,
+                    unsigned int *left)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(info, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(len, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    uint32_t max_batch_sz, nr;
+    uint32_t version = 0, retries = 0;
+    uint32_t adjust = 0;
+
+    if ( !max || !info || !id || !len )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
+    sysctl.u.xsplice.u.list.version = 0;
+    sysctl.u.xsplice.u.list.idx = start;
+    sysctl.u.xsplice.u.list._pad = 0;
+
+    max_batch_sz = max;
+
+    *done = 0;
+    *left = 0;
+    do {
+        if ( adjust )
+            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
+
+        nr = min(max - *done, max_batch_sz);
+
+        sysctl.u.xsplice.u.list.nr = nr;
+        /* Fix the size (may vary between hypercalls). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
+        HYPERCALL_BOUNCE_SET_SIZE(id, nr * sizeof(*id) * XEN_XSPLICE_ID_SIZE);
+        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
+        /* Move the pointer to proper offset into 'info'. */
+        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
+        (HYPERCALL_BUFFER(id))->ubuf = id + (sizeof(*id) * XEN_XSPLICE_ID_SIZE * *done);
+        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
+        /* Allocate memory. */
+        rc = xc_hypercall_bounce_pre(xch, info);
+        if ( rc )
+            return rc;
+
+        rc = xc_hypercall_bounce_pre(xch, id);
+        if ( rc )
+        {
+            xc_hypercall_bounce_post(xch, info);
+            return rc;
+        }
+        rc = xc_hypercall_bounce_pre(xch, len);
+        if ( rc )
+        {
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            return rc;
+        }
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.id, id);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
+
+        rc = do_sysctl(xch, &sysctl);
+        /*
+         * From here on we MUST call xc_hypercall_bounce. If rc < 0 we
+         * end up doing it (outside the loop), so using a break is OK.
+         */
+        if ( rc < 0 && errno == E2BIG )
+        {
+            if ( max_batch_sz <= 1 )
+                break;
+            max_batch_sz >>= 1;
+            adjust = 1; /* For the loop conditional to let us loop again. */
+            /* No memory leaks! */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+        else if ( rc < 0 ) /* For all other errors we bail out. */
+            break;
+
+        if ( !version )
+            version = sysctl.u.xsplice.u.list.version;
+
+        if ( sysctl.u.xsplice.u.list.version != version )
+        {
+            /* TODO: retries should be configurable? */
+            if ( retries++ > 3 )
+            {
+                rc = -1;
+                errno = EBUSY;
+                break;
+            }
+            *done = 0; /* Retry from scratch. */
+            version = sysctl.u.xsplice.u.list.version;
+            adjust = 1; /* And make sure we continue in the loop. */
+            /* No memory leaks. */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+
+        /* We should never hit this, but just in case. */
+        if ( rc > nr )
+        {
+            errno = EINVAL; /* Overflow! */
+            rc = -1;
+            break;
+        }
+        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
+        /* Copy only up 'rc' of data' - we could add 'min(rc,nr) if desired. */
+        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
+        HYPERCALL_BOUNCE_SET_SIZE(id, (rc * sizeof(*id) * XEN_XSPLICE_ID_SIZE));
+        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
+        /* Bounce the data and free the bounce buffer. */
+        xc_hypercall_bounce_post(xch, info);
+        xc_hypercall_bounce_post(xch, id);
+        xc_hypercall_bounce_post(xch, len);
+        /* And update how many elements of info we have copied into. */
+        *done += rc;
+        /* Update idx. */
+        sysctl.u.xsplice.u.list.idx = *done;
+    } while ( adjust || (*done < max && *left != 0) );
+
+    if ( rc < 0 )
+    {
+        xc_hypercall_bounce_post(xch, len);
+        xc_hypercall_bounce_post(xch, id);
+        xc_hypercall_bounce_post(xch, info);
+    }
+
+    return rc > 0 ? 0 : rc;
+}
+
+static int _xc_xsplice_action(xc_interface *xch,
+                              char *id,
+                              unsigned int action)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    len = strlen(id);
+
+    if ( len > XEN_XSPLICE_ID_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_ACTION;
+    sysctl.u.xsplice.u.action.cmd = action;
+    sysctl.u.xsplice.u.action._pad = 0;
+    sysctl.u.xsplice.u.action.time = 0; /* TODO */
+
+    sysctl.u.xsplice.u.action.id.size = len;
+    set_xen_guest_handle(sysctl.u.xsplice.u.action.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, id);
+
+    return rc;
+}
+
+int xc_xsplice_apply(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_APPLY);
+}
+
+int xc_xsplice_revert(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_REVERT);
+}
+
+int xc_xsplice_unload(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_UNLOAD);
+}
+
+int xc_xsplice_check(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_CHECK);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v1 4/5] xen-xsplice: Tool to manipulate xsplice payloads.
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2015-09-16 21:01 ` [PATCH v1 3/5] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
@ 2015-09-16 21:01 ` Konrad Rzeszutek Wilk
  2015-09-16 21:01 ` [PATCH v1 5/5] xsplice: Use ld-embedded build-ids Konrad Rzeszutek Wilk
  2015-10-02 14:48 ` [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
  5 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell
  Cc: Konrad Rzeszutek Wilk

A simple tool that allows an system admin to perform
basic xsplice operations:

 - Upload a xsplice file (with an unique id)
 - List all the xsplice payloads loaded.
 - Apply, revert, unload, or check the payload using the
   unique id.
 - Do all three - upload, check, and apply the
   payload in one go (--all).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/misc/Makefile      |   4 +
 tools/misc/xen-xsplice.c | 417 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 421 insertions(+)
 create mode 100644 tools/misc/xen-xsplice.c

diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index c4490f3..c46873e 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -30,6 +30,7 @@ INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
 INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
+INSTALL_SBIN                   += xen-xsplice
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -98,6 +99,9 @@ xen-mfndump: xen-mfndump.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-xsplice: xen-xsplice.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
new file mode 100644
index 0000000..d8bd222
--- /dev/null
+++ b/tools/misc/xen-xsplice.c
@@ -0,0 +1,417 @@
+#include <xenctrl.h>
+#include <xenstore.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+static xc_interface *xch;
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-xsplice: Xsplice test tool\n"
+            "Usage: xen-xsplice <command> [args]\n"
+            " <id> An unique name of payload. Up to %d characters.\n"
+            "Commands:\n"
+            "  help                 display this help\n"
+            "  upload <id> <file>   upload file <cpuid> with <id> name\n"
+            "  list                 list payloads uploaded.\n"
+            "  apply <id>           apply <id> patch.\n"
+            "  revert <id>          revert id <id> patch.\n"
+            "  unload <id>          unload id <id> patch.\n"
+            "  check <id>           check id <id> patch.\n",
+            XEN_XSPLICE_ID_SIZE);
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+static const char *status2str(long status)
+{
+#define STATUS(x) [XSPLICE_STATUS_##x] = #x
+    static const char *const names[] = {
+            STATUS(LOADED),
+            STATUS(PROGRESS),
+            STATUS(CHECKED),
+            STATUS(APPLIED),
+            STATUS(REVERTED),
+    };
+
+    if (status >= ARRAY_SIZE(names))
+        return "unknown";
+
+    if (status < 0)
+        return "-EXX";
+
+    if (!names[status])
+        return "unknown";
+
+    return names[status];
+}
+
+#define MAX_LEN 11
+static int list_func(int argc, char *argv[])
+{
+    unsigned int idx, done, left, i;
+    xen_xsplice_status_t *info = NULL;
+    char *id = NULL;
+    uint32_t *len = NULL;
+    int rc = ENOMEM;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+    idx = left = 0;
+    info = malloc(sizeof(*info) * MAX_LEN);
+    if ( !info )
+        goto out;
+    id = malloc(sizeof(*id) * XEN_XSPLICE_ID_SIZE * MAX_LEN);
+    if ( !id )
+        goto out;
+    len = malloc(sizeof(*len) * MAX_LEN);
+    if ( !len )
+        goto out;
+
+    fprintf(stdout," ID                                     | status\n"
+                   "----------------------------------------+------------\n");
+    do {
+        done = 0;
+        memset(info, 'A', sizeof(*info) * MAX_LEN); /* Optional. */
+        memset(id, 'i', sizeof(*id) * MAX_LEN * XEN_XSPLICE_ID_SIZE); /* Optional. */
+        memset(len, 'l', sizeof(*len) * MAX_LEN); /* Optional. */
+        rc = xc_xsplice_list(xch, MAX_LEN, idx, info, id, len, &done, &left);
+        if ( rc )
+        {
+            fprintf(stderr, "Failed to list %d/%d: %d(%s)!\n", idx, left, errno, strerror(errno));
+            break;
+        }
+        for ( i = 0; i < done; i++ )
+        {
+            unsigned int j;
+            uint32_t sz;
+            char *str;
+
+            sz = len[i];
+            str = id + (i * XEN_XSPLICE_ID_SIZE);
+            for ( j = sz; j < XEN_XSPLICE_ID_SIZE; j++ )
+                str[j] = '\0';
+
+            fprintf(stdout, "%-40s| ", str);
+            if ( info[i].status < 0 )
+                fprintf(stdout, "%s\n", strerror(info[i].status));
+            else
+                fprintf(stdout, "%s\n", status2str(info[i].status));
+        }
+        idx += done;
+    } while ( left );
+
+out:
+    free(id);
+    free(info);
+    free(len);
+    return rc;
+}
+#undef MAX_LEN
+
+static int get_id(int argc, char *argv[], char *id)
+{
+    ssize_t len = strlen(argv[0]);
+    if ( len > XEN_XSPLICE_ID_SIZE )
+    {
+        fprintf(stderr, "ID MUST be %d characters!\n", XEN_XSPLICE_ID_SIZE);
+        errno = EINVAL;
+        return errno;
+    }
+    /* Don't want any funny strings from the stack. */
+    memset(id, 0, XEN_XSPLICE_ID_SIZE);
+    strncpy(id, argv[0], len);
+    return 0;
+}
+
+static int upload_func(int argc, char *argv[])
+{
+    char *filename;
+    char id[XEN_XSPLICE_ID_SIZE];
+    int fd = 0, rc;
+    struct stat buf;
+    unsigned char *fbuf;
+    ssize_t len;
+    DECLARE_HYPERCALL_BUFFER(char, payload);
+
+    if ( argc != 2 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( get_id(argc, argv, id) )
+        return EINVAL;
+
+    filename = argv[1];
+    fd = open(filename, O_RDONLY);
+    if ( fd < 0 )
+    {
+        fprintf(stderr, "Could not open %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        return errno;
+    }
+    if ( stat(filename, &buf) != 0 )
+    {
+        fprintf(stderr, "Could not get right size %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close(fd);
+        return errno;
+    }
+
+    len = buf.st_size;
+    fbuf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
+    if ( fbuf == MAP_FAILED )
+    {
+        fprintf(stderr,"Could not map: %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close (fd);
+        return errno;
+    }
+    printf("Uploading %s (%zu bytes)\n", filename, len);
+    payload = xc_hypercall_buffer_alloc(xch, payload, len);
+    memcpy(payload, fbuf, len);
+
+    rc = xc_xsplice_upload(xch, id, payload, len);
+    if ( rc )
+    {
+        fprintf(stderr, "Upload failed: %s, error: %d(%s)!\n",
+                filename, errno, strerror(errno));
+        goto out;
+    }
+    xc_hypercall_buffer_free(xch, payload);
+
+out:
+    if ( munmap( fbuf, len) )
+    {
+        fprintf(stderr, "Could not unmap!? error: %d(%s)!\n",
+                errno, strerror(errno));
+        rc = errno;
+    }
+    close(fd);
+
+    return rc;
+}
+
+enum {
+    ACTION_APPLY = 0,
+    ACTION_REVERT = 1,
+    ACTION_UNLOAD = 2,
+    ACTION_CHECK = 3
+};
+
+struct {
+    int allow; /* State it must be in to call function. */
+    int expected; /* The state to be in after the function. */
+    const char *name;
+    int (*function)(xc_interface *xch, char *id);
+    unsigned int executed; /* Has the function been called?. */
+} action_options[] = {
+    {   .allow = XSPLICE_STATUS_CHECKED | XSPLICE_STATUS_REVERTED,
+        .expected = XSPLICE_STATUS_APPLIED,
+        .name = "apply",
+        .function = xc_xsplice_apply,
+    },
+    {   .allow = XSPLICE_STATUS_APPLIED,
+        .expected = XSPLICE_STATUS_REVERTED,
+        .name = "revert",
+        .function = xc_xsplice_revert,
+    },
+    {   .allow = XSPLICE_STATUS_CHECKED | XSPLICE_STATUS_REVERTED | XSPLICE_STATUS_LOADED,
+        .expected = ENOENT,
+        .name = "unload",
+        .function = xc_xsplice_unload,
+    },
+    {   .allow = XSPLICE_STATUS_CHECKED | XSPLICE_STATUS_LOADED,
+        .expected = XSPLICE_STATUS_CHECKED,
+        .name = "check",
+        .function = xc_xsplice_check
+    },
+};
+
+int action_func(int argc, char *argv[], unsigned int idx)
+{
+    char id[XEN_XSPLICE_ID_SIZE];
+    int rc;
+    xen_xsplice_status_t status;
+    unsigned int retry = 0;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( idx >= ARRAY_SIZE(action_options) )
+        return -1;
+
+    if ( get_id(argc, argv, id) )
+        return EINVAL;
+
+    do {
+        rc = xc_xsplice_get(xch, id, &status);
+        /* N.B. Successfull unload will return ENOENT. */
+        if ( rc )
+        {
+            rc = errno; /* rc is just -1 and we want proper EXX. */
+            break;
+        }
+
+        if ( status.status < 0 )
+        { /* We report it outside the loop. */
+            rc = status.status;
+            break;
+        }
+        if ( status.status == XSPLICE_STATUS_PROGRESS )
+        {
+            if ( !action_options[idx].executed )
+            {
+                printf("%s is in progress and we didn't initiate it!\n", id);
+                errno = EBUSY;
+                rc = -1;
+                break;
+            }
+            if ( retry++ < 30 )
+            {
+                printf(".");
+                sleep(1);
+                continue;
+            }
+            printf("%s: Waited more than 30 seconds! Bailing out.\n", id);
+            errno = EBUSY;
+            rc = -1;
+            break;
+        }
+        /* We use rc outside loop to deal with EXX type expected values. */
+        rc = status.status;
+        if ( action_options[idx].expected == rc ) /* Yeey! */
+            break;
+
+        if ( action_options[idx].allow & rc )
+        {
+            if ( action_options[idx].executed )
+            {
+                printf(" (0x%x vs 0x%x) state not reached!?\n",
+                       action_options[idx].expected, rc);
+                errno = EINVAL;
+                break;
+            }
+            printf("%s: State is 0x%x, ok are 0x%x. Commencing %s:",
+                   id, rc, action_options[idx].allow,
+                   action_options[idx].name);
+
+            rc = action_options[idx].function(xch, id);
+            if ( rc ) /* We report it outside the loop. */
+                break;
+
+            action_options[idx].executed = 1;
+            rc = 1; /* Loop again so we can display the dots. */
+        } else {
+            printf("%s: in wrong state (0x%x), expected 0x%x\n",
+                   id, rc, action_options[idx].expected);
+            errno = EINVAL;
+            rc = -1;
+            break;
+        }
+    } while ( rc > 0 );
+
+    if ( action_options[idx].expected == rc )
+    {
+            printf("completed!\n");
+            rc = 0;
+    } else
+        printf("%s failed with %d(%s)\n", id, errno, strerror(errno));
+
+    return rc;
+}
+static int all_func(int argc, char *argv[])
+{
+    int rc;
+
+    rc = upload_func(argc, argv);
+    if ( rc )
+        return rc;
+
+    rc = action_func(1 /* only id */, argv, ACTION_CHECK);
+    if ( rc )
+        goto unload;
+
+    rc = action_func(1 /* only id */, argv, ACTION_APPLY);
+    if ( rc )
+        goto unload;
+
+    return 0;
+unload:
+    action_func(argc, argv, ACTION_UNLOAD);
+    return rc;
+}
+
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "list", list_func },
+    { "upload", upload_func },
+    { "all", all_func },
+};
+
+int main(int argc, char *argv[])
+{
+    int i, j, ret;
+
+    if ( argc  <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
+            if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+                break;
+
+        if ( j == ARRAY_SIZE(action_options) )
+        {
+            fprintf(stderr, "Unrecognised command '%s' -- try "
+                   "'xen-xsplice help'\n", argv[1]);
+            return 1;
+        }
+    } else
+        j = ARRAY_SIZE(action_options);
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get the handler\n");
+        return 0;
+    }
+
+    if ( i == ARRAY_SIZE(main_options) )
+        ret = action_func(argc -2, argv + 2, j);
+    else
+        ret = main_options[i].function(argc -2, argv + 2);
+
+    xc_interface_close(xch);
+
+    return !!ret;
+}
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2015-09-16 21:01 ` [PATCH v1 4/5] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
@ 2015-09-16 21:01 ` Konrad Rzeszutek Wilk
  2015-09-16 21:41   ` Andrew Cooper
  2015-10-02 15:13   ` Jan Beulich
  2015-10-02 14:48 ` [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
  5 siblings, 2 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:01 UTC (permalink / raw)
  To: xen-devel, msw, aliguori, amesserl, rick.harris, paul.voccio,
	steven.wilson, major.hayden, josh.kearney, jinsong.liu,
	xiantao.zxt, boris.ostrovsky, daniel.kiper, elena.ufimtseva,
	bob.liu, lars.kurth, hanweidong, peter.huangpeng, fanhenglong,
	liuyingdong, john.liuqiming, jbeulich, andrew.cooper3, mpohlack,
	ian.campbell
  Cc: Martin Pohlack, Konrad Rzeszutek Wilk

From: Martin Pohlack <mpohlack@amazon.de>

The mechanism to get this is via the XSPLICE_OP and
we add a new subsequent hypercall to retrieve the
binary build-id. The hypercall allows an arbirarty
size (the buffer is provided to the hypervisor) - however
by default the toolstack will allocate it up to 128
bytes.

We also add two places for the build-id to be printed:
 - xsplice keyhandler. We cannot use 'hh' in the hypervisor
   snprintf handler (as it is not implemented) so instead
   we use an simpler way to print it.
 - In the 'xen-xsplice' tool add an extra parameter - build-id
   to print this as an human readable value.

Note that one can also retrieve the value by 'readelf -h xen-syms'.

Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 tools/libxc/include/xenctrl.h |  1 +
 tools/libxc/xc_misc.c         | 26 +++++++++++++
 tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
 xen/arch/x86/Makefile         |  4 +-
 xen/arch/x86/xen.lds.S        |  5 +++
 xen/common/xsplice.c          | 86 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/sysctl.h   | 18 +++++++++
 xen/include/xen/version.h     |  1 +
 8 files changed, 178 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2cd982d..946ddc0 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char *id);
 int xc_xsplice_revert(xc_interface *xch, char *id);
 int xc_xsplice_unload(xc_interface *xch, char *id);
 int xc_xsplice_check(xc_interface *xch, char *id);
+int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned int max);
 
 #endif /* XENCTRL_H */
 
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index e59f0a1..f67ff16 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -992,6 +992,32 @@ int xc_xsplice_check(xc_interface *xch, char *id)
     return _xc_xsplice_action(xch, id, XSPLICE_ACTION_CHECK);
 }
 
+int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned int max)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(build_id, max, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( !build_id || !max )
+        return -1;
+
+    if ( xc_hypercall_bounce_pre(xch, build_id) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_INFO;
+
+    sysctl.u.xsplice.u.info.cmd = XEN_SYSCTL_XSPLICE_INFO_BUILD_ID;
+    sysctl.u.xsplice.u.info.size = max;
+
+    set_xen_guest_handle(sysctl.u.xsplice.u.info.u.info, build_id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, build_id);
+
+    return rc;
+}
 /*
  * Local variables:
  * mode: C
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
index d8bd222..609d799 100644
--- a/tools/misc/xen-xsplice.c
+++ b/tools/misc/xen-xsplice.c
@@ -17,6 +17,7 @@ void show_help(void)
             " <id> An unique name of payload. Up to %d characters.\n"
             "Commands:\n"
             "  help                 display this help\n"
+            "  build-id             display build-id of hypervisor.\n"
             "  upload <id> <file>   upload file <cpuid> with <id> name\n"
             "  list                 list payloads uploaded.\n"
             "  apply <id>           apply <id> patch.\n"
@@ -361,12 +362,50 @@ unload:
     return rc;
 }
 
+
+#define MAX_LEN 1024
+static int build_id_func(int argc, char *argv[])
+{
+    char binary_id[MAX_LEN];
+    char ascii_id[MAX_LEN];
+    int rc;
+    unsigned int i;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+
+    memset(binary_id, 0, sizeof(binary_id));
+
+    rc = xc_xsplice_build_id(xch, binary_id, MAX_LEN);
+    if ( rc < 0 )
+    {
+        printf("Failed to get build_id: %d(%s)\n", errno, strerror(errno));
+        return -1;
+    }
+    /* Convert to printable format. */
+    if ( rc > MAX_LEN )
+        rc = MAX_LEN;
+
+    for ( i = 0; i < rc && (i + 1) * 2 < sizeof(binary_id); i++ )
+        snprintf(&ascii_id[i * 2], 3, "%02hhx", binary_id[i]);
+
+    ascii_id[i*2]='\0';
+    printf("%s", ascii_id);
+
+    return 0;
+}
+#undef MAX
+
 struct {
     const char *name;
     int (*function)(int argc, char *argv[]);
 } main_options[] = {
     { "help", help_func },
     { "list", list_func },
+    { "build-id", build_id_func },
     { "upload", upload_func },
     { "all", all_func },
 };
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 39a8059..de11910 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -108,11 +108,11 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
 	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o --build-id=sha1 \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
 	$(NM) -n $(@D)/.$(@F).1 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
-	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
+	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o --build-id=sha1 \
 	    $(@D)/.$(@F).1.o -o $@
 	rm -f $(@D)/.$(@F).[0-9]*
 
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 6553cff..2176782 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -67,6 +67,11 @@ SECTIONS
        *(.rodata.*)
   } :text
 
+  .note.gnu.build-id : {
+      __note_gnu_build_id_start = .;
+      *(.note.gnu.build-id)
+  } :text
+
   . = ALIGN(SMP_CACHE_BYTES);
   .data.read_mostly : {
        /* Exception table */
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index d330efe..5728c4b 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -14,6 +14,8 @@
 #include <xen/sched.h>
 #include <xen/lib.h>
 #include <xen/xsplice.h>
+#include <xen/elf.h>
+#include <xen/types.h>
 #include <public/sysctl.h>
 
 #include <asm/event.h>
@@ -44,6 +46,36 @@ struct payload {
     struct tasklet tasklet;
 };
 
+#ifdef CONFIG_ARM
+static int build_id(char **p, unsigned int *len)
+{
+    return 0;
+}
+#else
+#define NT_GNU_BUILD_ID 3
+
+extern char * __note_gnu_build_id_start;  /* defined in linker script */
+static int build_id(char **p, unsigned int *len)
+{
+    Elf_Note *n;
+
+    n = (Elf_Note *)&__note_gnu_build_id_start;
+
+    /* Check if we really have a build-id. */
+    if ( NT_GNU_BUILD_ID != n->type )
+        return -ENODATA;
+
+    /* Sanity check, name should be "GNU" for ld-generated build-id. */
+    if ( strncmp(ELFNOTE_NAME(n), "GNU", n->namesz) != 0 )
+        return -ENODATA;
+
+    *len = n->descsz;
+    *p = ELFNOTE_DESC(n);
+
+    return 0;
+}
+#endif
+
 static const char *status2str(int64_t status)
 {
 #define STATUS(x) [XSPLICE_STATUS_##x] = #x
@@ -68,9 +100,31 @@ static const char *status2str(int64_t status)
     return names[status];
 }
 
+#define LEN 128
 void xsplice_printall(unsigned char key)
 {
     struct payload *data;
+    char *binary_id = NULL;
+    unsigned int len = 0;
+    int rc;
+
+    rc = build_id(&binary_id, &len);
+    printk("build-id: ");
+    if ( !rc )
+    {
+        unsigned int i;
+
+        if ( len > LEN )
+            len = LEN;
+
+        for ( i = 0; i < len; i++ )
+        {
+		    uint8_t c = binary_id[i];
+		    printk("%02x", c);
+        }
+	    printk("\n");
+    } else if ( rc < 0 )
+        printk("rc = %d\n", rc);
 
     spin_lock(&payload_list_lock);
 
@@ -81,6 +135,7 @@ void xsplice_printall(unsigned char key)
     }
     spin_unlock(&payload_list_lock);
 }
+#undef LEN
 
 static int verify_id(xen_xsplice_id_t *id)
 {
@@ -415,6 +470,34 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     return rc;
 }
 
+static int xsplice_info(xen_sysctl_xsplice_info_t *info)
+{
+    int rc;
+    unsigned int len = 0;
+    char *p = NULL;
+
+    if ( info->cmd != XEN_SYSCTL_XSPLICE_INFO_BUILD_ID )
+        return -EINVAL;
+
+    if ( info->size == 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(info->u.info, info->size) )
+        return -EFAULT;
+
+    rc = build_id(&p, &len);
+    if ( rc )
+        return rc;
+
+    if ( len > info->size )
+        return -ENOMEM;
+
+    if ( copy_to_guest(info->u.info, p, len) )
+        return -EFAULT;
+
+    return len;
+}
+
 int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
 {
     int rc;
@@ -433,6 +516,9 @@ int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
     case XEN_SYSCTL_XSPLICE_ACTION:
         rc = xsplice_action(&xsplice->u.action);
         break;
+    case XEN_SYSCTL_XSPLICE_INFO:
+        rc = xsplice_info(&xsplice->u.info);
+        break;
     default:
         rc = -ENOSYS;
         break;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 08952de..583dce8 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -905,6 +905,23 @@ struct xen_sysctl_xsplice_action {
 typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
 
+/*
+ * Retrieve information useful for patching tools.
+ */
+#define XEN_SYSCTL_XSPLICE_INFO 4
+/* The build-id of the hypervisor. */
+#define XEN_SYSCTL_XSPLICE_INFO_BUILD_ID 0
+struct xen_sysctl_xsplice_info {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_INFO_* */
+    uint32_t size;                          /* IN: Size of info: OUT: Amount of
+                                               bytes filed out in info. */
+    union {
+        XEN_GUEST_HANDLE_64(char) info;     /* OUT: Requested information. */
+    } u;
+};
+typedef struct xen_sysctl_xsplice_info xen_sysctl_xsplice_info_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_info_t);
+
 struct xen_sysctl_xsplice_op {
     uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_* */
     uint32_t _pad;                          /* IN: Always zero. */
@@ -913,6 +930,7 @@ struct xen_sysctl_xsplice_op {
         xen_sysctl_xsplice_list_t list;
         xen_sysctl_xsplice_summary_t get;
         xen_sysctl_xsplice_action_t action;
+        xen_sysctl_xsplice_info_t info;
     } u;
 };
 typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
diff --git a/xen/include/xen/version.h b/xen/include/xen/version.h
index 81a3c7d..02f9585 100644
--- a/xen/include/xen/version.h
+++ b/xen/include/xen/version.h
@@ -12,5 +12,6 @@ unsigned int xen_minor_version(void);
 const char *xen_extra_version(void);
 const char *xen_changeset(void);
 const char *xen_banner(void);
+const char *xen_build_id(void);
 
 #endif /* __XEN_VERSION_H__ */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 21:01 ` [PATCH v1 5/5] xsplice: Use ld-embedded build-ids Konrad Rzeszutek Wilk
@ 2015-09-16 21:41   ` Andrew Cooper
  2015-09-16 21:59     ` Konrad Rzeszutek Wilk
  2015-10-02 15:13   ` Jan Beulich
  1 sibling, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-09-16 21:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, mpohlack, ian.campbell
  Cc: Martin Pohlack

On 16/09/2015 22:01, Konrad Rzeszutek Wilk wrote:
> From: Martin Pohlack <mpohlack@amazon.de>
>
> The mechanism to get this is via the XSPLICE_OP and
> we add a new subsequent hypercall to retrieve the
> binary build-id. The hypercall allows an arbirarty
> size (the buffer is provided to the hypervisor) - however
> by default the toolstack will allocate it up to 128
> bytes.
>
> We also add two places for the build-id to be printed:
>  - xsplice keyhandler. We cannot use 'hh' in the hypervisor
>    snprintf handler (as it is not implemented) so instead
>    we use an simpler way to print it.
>  - In the 'xen-xsplice' tool add an extra parameter - build-id
>    to print this as an human readable value.
>
> Note that one can also retrieve the value by 'readelf -h xen-syms'.
>
> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  tools/libxc/include/xenctrl.h |  1 +
>  tools/libxc/xc_misc.c         | 26 +++++++++++++
>  tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
>  xen/arch/x86/Makefile         |  4 +-
>  xen/arch/x86/xen.lds.S        |  5 +++
>  xen/common/xsplice.c          | 86 +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/sysctl.h   | 18 +++++++++
>  xen/include/xen/version.h     |  1 +
>  8 files changed, 178 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 2cd982d..946ddc0 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char *id);
>  int xc_xsplice_revert(xc_interface *xch, char *id);
>  int xc_xsplice_unload(xc_interface *xch, char *id);
>  int xc_xsplice_check(xc_interface *xch, char *id);
> +int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned int max);

The build id of the current running hypervisor should belong in the
xeninfo hypercall.  It is not specific to xsplice.

> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index d330efe..5728c4b 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -14,6 +14,8 @@
>  #include <xen/sched.h>
>  #include <xen/lib.h>
>  #include <xen/xsplice.h>
> +#include <xen/elf.h>
> +#include <xen/types.h>
>  #include <public/sysctl.h>
>  
>  #include <asm/event.h>
> @@ -44,6 +46,36 @@ struct payload {
>      struct tasklet tasklet;
>  };
>  
> +#ifdef CONFIG_ARM
> +static int build_id(char **p, unsigned int *len)
> +{
> +    return 0;
> +}
> +#else
> +#define NT_GNU_BUILD_ID 3
> +
> +extern char * __note_gnu_build_id_start;  /* defined in linker script */

This is liable to cause you to deference the first 8 bytes of the note,
as the symbol is not a char*.

Use extern Elf_Note __note_gnu_build_id_start[]; instead.

Also, it is probably worth having an _end symbol as well and check for
end > start to confirm that the linker has actually put something in there.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 21:41   ` Andrew Cooper
@ 2015-09-16 21:59     ` Konrad Rzeszutek Wilk
  2015-09-16 22:31       ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-16 21:59 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, msw, aliguori, amesserl, rick.harris,
	paul.voccio, steven.wilson, major.hayden, josh.kearney,
	jinsong.liu, xiantao.zxt, boris.ostrovsky, daniel.kiper,
	elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, mpohlack, ian.campbell
  Cc: Martin Pohlack

On September 16, 2015 5:41:26 PM EDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 16/09/2015 22:01, Konrad Rzeszutek Wilk wrote:
>> From: Martin Pohlack <mpohlack@amazon.de>
>>
>> The mechanism to get this is via the XSPLICE_OP and
>> we add a new subsequent hypercall to retrieve the
>> binary build-id. The hypercall allows an arbirarty
>> size (the buffer is provided to the hypervisor) - however
>> by default the toolstack will allocate it up to 128
>> bytes.
>>
>> We also add two places for the build-id to be printed:
>>  - xsplice keyhandler. We cannot use 'hh' in the hypervisor
>>    snprintf handler (as it is not implemented) so instead
>>    we use an simpler way to print it.
>>  - In the 'xen-xsplice' tool add an extra parameter - build-id
>>    to print this as an human readable value.
>>
>> Note that one can also retrieve the value by 'readelf -h xen-syms'.
>>
>> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  tools/libxc/include/xenctrl.h |  1 +
>>  tools/libxc/xc_misc.c         | 26 +++++++++++++
>>  tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
>>  xen/arch/x86/Makefile         |  4 +-
>>  xen/arch/x86/xen.lds.S        |  5 +++
>>  xen/common/xsplice.c          | 86
>+++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/public/sysctl.h   | 18 +++++++++
>>  xen/include/xen/version.h     |  1 +
>>  8 files changed, 178 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h
>b/tools/libxc/include/xenctrl.h
>> index 2cd982d..946ddc0 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char
>*id);
>>  int xc_xsplice_revert(xc_interface *xch, char *id);
>>  int xc_xsplice_unload(xc_interface *xch, char *id);
>>  int xc_xsplice_check(xc_interface *xch, char *id);
>> +int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned
>int max);
>
>The build id of the current running hypervisor should belong in the
>xeninfo hypercall.  It is not specific to xsplice.


However in the previous reviews it was pointed out that it should only be accessible to dom0.

Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.

That is a bit of 'if' extra complexity which I am not sure is worth it?

>
>> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
>> index d330efe..5728c4b 100644
>> --- a/xen/common/xsplice.c
>> +++ b/xen/common/xsplice.c
>> @@ -14,6 +14,8 @@
>>  #include <xen/sched.h>
>>  #include <xen/lib.h>
>>  #include <xen/xsplice.h>
>> +#include <xen/elf.h>
>> +#include <xen/types.h>
>>  #include <public/sysctl.h>
>>  
>>  #include <asm/event.h>
>> @@ -44,6 +46,36 @@ struct payload {
>>      struct tasklet tasklet;
>>  };
>>  
>> +#ifdef CONFIG_ARM
>> +static int build_id(char **p, unsigned int *len)
>> +{
>> +    return 0;
>> +}
>> +#else
>> +#define NT_GNU_BUILD_ID 3
>> +
>> +extern char * __note_gnu_build_id_start;  /* defined in linker
>script */
>
>This is liable to cause you to deference the first 8 bytes of the note,
>as the symbol is not a char*.
>
>Use extern Elf_Note __note_gnu_build_id_start[]; instead.
>
>Also, it is probably worth having an _end symbol as well and check for
>end > start to confirm that the linker has actually put something in
>there.
>
>~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 21:59     ` Konrad Rzeszutek Wilk
@ 2015-09-16 22:31       ` Andrew Cooper
  2015-09-17  6:41         ` Martin Pohlack
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-09-16 22:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, mpohlack, ian.campbell
  Cc: Martin Pohlack

On 16/09/2015 22:59, Konrad Rzeszutek Wilk wrote:
> On September 16, 2015 5:41:26 PM EDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 16/09/2015 22:01, Konrad Rzeszutek Wilk wrote:
>>> From: Martin Pohlack <mpohlack@amazon.de>
>>>
>>> The mechanism to get this is via the XSPLICE_OP and
>>> we add a new subsequent hypercall to retrieve the
>>> binary build-id. The hypercall allows an arbirarty
>>> size (the buffer is provided to the hypervisor) - however
>>> by default the toolstack will allocate it up to 128
>>> bytes.
>>>
>>> We also add two places for the build-id to be printed:
>>>  - xsplice keyhandler. We cannot use 'hh' in the hypervisor
>>>    snprintf handler (as it is not implemented) so instead
>>>    we use an simpler way to print it.
>>>  - In the 'xen-xsplice' tool add an extra parameter - build-id
>>>    to print this as an human readable value.
>>>
>>> Note that one can also retrieve the value by 'readelf -h xen-syms'.
>>>
>>> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> ---
>>>  tools/libxc/include/xenctrl.h |  1 +
>>>  tools/libxc/xc_misc.c         | 26 +++++++++++++
>>>  tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
>>>  xen/arch/x86/Makefile         |  4 +-
>>>  xen/arch/x86/xen.lds.S        |  5 +++
>>>  xen/common/xsplice.c          | 86
>> +++++++++++++++++++++++++++++++++++++++++++
>>>  xen/include/public/sysctl.h   | 18 +++++++++
>>>  xen/include/xen/version.h     |  1 +
>>>  8 files changed, 178 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/libxc/include/xenctrl.h
>> b/tools/libxc/include/xenctrl.h
>>> index 2cd982d..946ddc0 100644
>>> --- a/tools/libxc/include/xenctrl.h
>>> +++ b/tools/libxc/include/xenctrl.h
>>> @@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char
>> *id);
>>>  int xc_xsplice_revert(xc_interface *xch, char *id);
>>>  int xc_xsplice_unload(xc_interface *xch, char *id);
>>>  int xc_xsplice_check(xc_interface *xch, char *id);
>>> +int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned
>> int max);
>>
>> The build id of the current running hypervisor should belong in the
>> xeninfo hypercall.  It is not specific to xsplice.
>
> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>
> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>
> That is a bit of 'if' extra complexity which I am not sure is worth it?

DomU can already read the build information such as changeset, compile
time, etc.  Build-id is no more special or revealing.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 22:31       ` Andrew Cooper
@ 2015-09-17  6:41         ` Martin Pohlack
  2015-09-17  9:35           ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Martin Pohlack @ 2015-09-17  6:41 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel, msw, aliguori,
	amesserl, rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, ian.campbell
  Cc: Martin Pohlack

On 17.09.2015 00:31, Andrew Cooper wrote:
> On 16/09/2015 22:59, Konrad Rzeszutek Wilk wrote:
>> On September 16, 2015 5:41:26 PM EDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 16/09/2015 22:01, Konrad Rzeszutek Wilk wrote:
>>>> From: Martin Pohlack <mpohlack@amazon.de>
>>>>
>>>> The mechanism to get this is via the XSPLICE_OP and
>>>> we add a new subsequent hypercall to retrieve the
>>>> binary build-id. The hypercall allows an arbirarty
>>>> size (the buffer is provided to the hypervisor) - however
>>>> by default the toolstack will allocate it up to 128
>>>> bytes.
>>>>
>>>> We also add two places for the build-id to be printed:
>>>>  - xsplice keyhandler. We cannot use 'hh' in the hypervisor
>>>>    snprintf handler (as it is not implemented) so instead
>>>>    we use an simpler way to print it.
>>>>  - In the 'xen-xsplice' tool add an extra parameter - build-id
>>>>    to print this as an human readable value.
>>>>
>>>> Note that one can also retrieve the value by 'readelf -h xen-syms'.
>>>>
>>>> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>> ---
>>>>  tools/libxc/include/xenctrl.h |  1 +
>>>>  tools/libxc/xc_misc.c         | 26 +++++++++++++
>>>>  tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
>>>>  xen/arch/x86/Makefile         |  4 +-
>>>>  xen/arch/x86/xen.lds.S        |  5 +++
>>>>  xen/common/xsplice.c          | 86
>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>  xen/include/public/sysctl.h   | 18 +++++++++
>>>>  xen/include/xen/version.h     |  1 +
>>>>  8 files changed, 178 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/libxc/include/xenctrl.h
>>> b/tools/libxc/include/xenctrl.h
>>>> index 2cd982d..946ddc0 100644
>>>> --- a/tools/libxc/include/xenctrl.h
>>>> +++ b/tools/libxc/include/xenctrl.h
>>>> @@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char
>>> *id);
>>>>  int xc_xsplice_revert(xc_interface *xch, char *id);
>>>>  int xc_xsplice_unload(xc_interface *xch, char *id);
>>>>  int xc_xsplice_check(xc_interface *xch, char *id);
>>>> +int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned
>>> int max);
>>>
>>> The build id of the current running hypervisor should belong in the
>>> xeninfo hypercall.  It is not specific to xsplice.
>>
>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>
>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>
>> That is a bit of 'if' extra complexity which I am not sure is worth it?
> 
> DomU can already read the build information such as changeset, compile
> time, etc.  Build-id is no more special or revealing.

I would take this as an argument *against* giving DomU access to those
pieces of information in details and not as an argument for
*additionally* giving it access to build-id.

With build-id we have the chance to shape a not-yet-established API and
I would like to follow the Principle of least privilege wherever it
makes sense.

To reach a similar security level with the existing API, it might make
sense to limit DomU access to compile date, compile time, compiled by,
compiled domain, compiler version and command line details, xen extra
version, and xen changeset.  Basically anything that might help DomUs to
uniquely identify a Xen build.

The approach can not directly drop access to those fields, as that would
break an existing API, but it could restrict the detail level handed out
to DomU.

Martin
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-17  6:41         ` Martin Pohlack
@ 2015-09-17  9:35           ` Andrew Cooper
  2015-09-17 18:45             ` Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-09-17  9:35 UTC (permalink / raw)
  To: Martin Pohlack, Konrad Rzeszutek Wilk, xen-devel, msw, aliguori,
	amesserl, rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, ian.campbell
  Cc: Martin Pohlack

On 17/09/15 07:41, Martin Pohlack wrote:
> On 17.09.2015 00:31, Andrew Cooper wrote:
>> On 16/09/2015 22:59, Konrad Rzeszutek Wilk wrote:
>>> On September 16, 2015 5:41:26 PM EDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 16/09/2015 22:01, Konrad Rzeszutek Wilk wrote:
>>>>> From: Martin Pohlack <mpohlack@amazon.de>
>>>>>
>>>>> The mechanism to get this is via the XSPLICE_OP and
>>>>> we add a new subsequent hypercall to retrieve the
>>>>> binary build-id. The hypercall allows an arbirarty
>>>>> size (the buffer is provided to the hypervisor) - however
>>>>> by default the toolstack will allocate it up to 128
>>>>> bytes.
>>>>>
>>>>> We also add two places for the build-id to be printed:
>>>>>  - xsplice keyhandler. We cannot use 'hh' in the hypervisor
>>>>>    snprintf handler (as it is not implemented) so instead
>>>>>    we use an simpler way to print it.
>>>>>  - In the 'xen-xsplice' tool add an extra parameter - build-id
>>>>>    to print this as an human readable value.
>>>>>
>>>>> Note that one can also retrieve the value by 'readelf -h xen-syms'.
>>>>>
>>>>> Signed-off-by: Martin Pohlack <mpohlack@amazon.de>
>>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>> ---
>>>>>  tools/libxc/include/xenctrl.h |  1 +
>>>>>  tools/libxc/xc_misc.c         | 26 +++++++++++++
>>>>>  tools/misc/xen-xsplice.c      | 39 ++++++++++++++++++++
>>>>>  xen/arch/x86/Makefile         |  4 +-
>>>>>  xen/arch/x86/xen.lds.S        |  5 +++
>>>>>  xen/common/xsplice.c          | 86
>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>>>  xen/include/public/sysctl.h   | 18 +++++++++
>>>>>  xen/include/xen/version.h     |  1 +
>>>>>  8 files changed, 178 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/tools/libxc/include/xenctrl.h
>>>> b/tools/libxc/include/xenctrl.h
>>>>> index 2cd982d..946ddc0 100644
>>>>> --- a/tools/libxc/include/xenctrl.h
>>>>> +++ b/tools/libxc/include/xenctrl.h
>>>>> @@ -2860,6 +2860,7 @@ int xc_xsplice_apply(xc_interface *xch, char
>>>> *id);
>>>>>  int xc_xsplice_revert(xc_interface *xch, char *id);
>>>>>  int xc_xsplice_unload(xc_interface *xch, char *id);
>>>>>  int xc_xsplice_check(xc_interface *xch, char *id);
>>>>> +int xc_xsplice_build_id(xc_interface *xch, char *build_id, unsigned
>>>> int max);
>>>>
>>>> The build id of the current running hypervisor should belong in the
>>>> xeninfo hypercall.  It is not specific to xsplice.
>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>>
>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>>
>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
>> DomU can already read the build information such as changeset, compile
>> time, etc.  Build-id is no more special or revealing.
> I would take this as an argument *against* giving DomU access to those
> pieces of information in details and not as an argument for
> *additionally* giving it access to build-id.
>
> With build-id we have the chance to shape a not-yet-established API and
> I would like to follow the Principle of least privilege wherever it
> makes sense.
>
> To reach a similar security level with the existing API, it might make
> sense to limit DomU access to compile date, compile time, compiled by,
> compiled domain, compiler version and command line details, xen extra
> version, and xen changeset.  Basically anything that might help DomUs to
> uniquely identify a Xen build.
>
> The approach can not directly drop access to those fields, as that would
> break an existing API, but it could restrict the detail level handed out
> to DomU.

These are all valid arguments to be made, but please lets fix the issue
properly rather than hacking build-id on the side of an unrelated component.

>From my point of view, the correct course of action is this:

* Split the current XSM model to contain separate attributes for general
and privileged information.
** For current compatibility, all existing XENVER_* subops fall into general
* Apply an XSM check at the very start of the hypercall.
* Extend do_xen_version() to take 3 parameters.  (It is curious that it
didn't take a length parameter before)
** This is still ABI compatible, as existing subops simply ignore the
parameter.
* Introduce new XENVER_build_id subop which is documented to require the
3-parameter version of the hypercall.
** This subop falls into straight into privileged information.

This will introduce build-id in its correct location, with appropriate
restrictions.

Moving forwards, we really should have an audit of the existing XENVER_*
subops.  Some are clearly safe/required for domU to read, but some such
as XENVER_commandline have no business being accessible.  A separate
argument, from the repeatable build point of view, says that compilation
information isn't useful at all.

Depending on how we wish to fix the issue, we could either do a blanket
move of the subops into the privileged XSM catagory, or introduce a 3rd
"legacy privileged" category to allow the admin to control access on a
per-vm basis.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-17  9:35           ` Andrew Cooper
@ 2015-09-17 18:45             ` Konrad Rzeszutek Wilk
  2015-09-18 11:40               ` Andrew Cooper
  2015-09-22 16:28               ` Daniel De Graaf
  0 siblings, 2 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-17 18:45 UTC (permalink / raw)
  To: Andrew Cooper, dgdegra, ian.jackson, wei.liu2
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, jinsong.liu, amesserl,
	Martin Pohlack, fanhenglong

. snip..
> >>>> The build id of the current running hypervisor should belong in the
> >>>> xeninfo hypercall.  It is not specific to xsplice.
> >>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
> >>>
> >>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
> >>>
> >>> That is a bit of 'if' extra complexity which I am not sure is worth it?
> >> DomU can already read the build information such as changeset, compile
> >> time, etc.  Build-id is no more special or revealing.
> > I would take this as an argument *against* giving DomU access to those
> > pieces of information in details and not as an argument for
> > *additionally* giving it access to build-id.
> >
> > With build-id we have the chance to shape a not-yet-established API and
> > I would like to follow the Principle of least privilege wherever it
> > makes sense.
> >
> > To reach a similar security level with the existing API, it might make
> > sense to limit DomU access to compile date, compile time, compiled by,
> > compiled domain, compiler version and command line details, xen extra
> > version, and xen changeset.  Basically anything that might help DomUs to
> > uniquely identify a Xen build.
> >
> > The approach can not directly drop access to those fields, as that would
> > break an existing API, but it could restrict the detail level handed out
> > to DomU.
> 
> These are all valid arguments to be made, but please lets fix the issue
> properly rather than hacking build-id on the side of an unrelated component.
> 
> From my point of view, the correct course of action is this:
> 
> * Split the current XSM model to contain separate attributes for general
> and privileged information.
> ** For current compatibility, all existing XENVER_* subops fall into general
> * Apply an XSM check at the very start of the hypercall.
> * Extend do_xen_version() to take 3 parameters.  (It is curious that it
> didn't take a length parameter before)
> ** This is still ABI compatible, as existing subops simply ignore the
> parameter.

Or we can just use 1024 bytes space the XENVER_* use.

> * Introduce new XENVER_build_id subop which is documented to require the
> 3-parameter version of the hypercall.
> ** This subop falls into straight into privileged information.
> 
> This will introduce build-id in its correct location, with appropriate
> restrictions.
> 
> Moving forwards, we really should have an audit of the existing XENVER_*
> subops.  Some are clearly safe/required for domU to read, but some such
> as XENVER_commandline have no business being accessible.  A separate
> argument, from the repeatable build point of view, says that compilation
> information isn't useful at all.
> 
> Depending on how we wish to fix the issue, we could either do a blanket
> move of the subops into the privileged XSM catagory, or introduce a 3rd
> "legacy privileged" category to allow the admin to control access on a
> per-vm basis.

CC-ing Daniel. Changing title.
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-17 18:45             ` Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: " Konrad Rzeszutek Wilk
@ 2015-09-18 11:40               ` Andrew Cooper
  2015-09-22 13:22                 ` Konrad Rzeszutek Wilk
  2015-09-22 16:28               ` Daniel De Graaf
  1 sibling, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-09-18 11:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, dgdegra, ian.jackson, wei.liu2
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, jinsong.liu, amesserl,
	Martin Pohlack, fanhenglong

On 17/09/15 19:45, Konrad Rzeszutek Wilk wrote:
> . snip..
>>>>>> The build id of the current running hypervisor should belong in the
>>>>>> xeninfo hypercall.  It is not specific to xsplice.
>>>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>>>>
>>>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>>>>
>>>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
>>>> DomU can already read the build information such as changeset, compile
>>>> time, etc.  Build-id is no more special or revealing.
>>> I would take this as an argument *against* giving DomU access to those
>>> pieces of information in details and not as an argument for
>>> *additionally* giving it access to build-id.
>>>
>>> With build-id we have the chance to shape a not-yet-established API and
>>> I would like to follow the Principle of least privilege wherever it
>>> makes sense.
>>>
>>> To reach a similar security level with the existing API, it might make
>>> sense to limit DomU access to compile date, compile time, compiled by,
>>> compiled domain, compiler version and command line details, xen extra
>>> version, and xen changeset.  Basically anything that might help DomUs to
>>> uniquely identify a Xen build.
>>>
>>> The approach can not directly drop access to those fields, as that would
>>> break an existing API, but it could restrict the detail level handed out
>>> to DomU.
>> These are all valid arguments to be made, but please lets fix the issue
>> properly rather than hacking build-id on the side of an unrelated component.
>>
>> From my point of view, the correct course of action is this:
>>
>> * Split the current XSM model to contain separate attributes for general
>> and privileged information.
>> ** For current compatibility, all existing XENVER_* subops fall into general
>> * Apply an XSM check at the very start of the hypercall.
>> * Extend do_xen_version() to take 3 parameters.  (It is curious that it
>> didn't take a length parameter before)
>> ** This is still ABI compatible, as existing subops simply ignore the
>> parameter.
> Or we can just use 1024 bytes space the XENVER_* use.

What 1024 bytes?

Each subop currently assumes the guest handle is a pointer to an
appropriately typed structure, which puts arbitrary and unnecessary
length restrictions on items.

~Andrew

>
>> * Introduce new XENVER_build_id subop which is documented to require the
>> 3-parameter version of the hypercall.
>> ** This subop falls into straight into privileged information.
>>
>> This will introduce build-id in its correct location, with appropriate
>> restrictions.
>>
>> Moving forwards, we really should have an audit of the existing XENVER_*
>> subops.  Some are clearly safe/required for domU to read, but some such
>> as XENVER_commandline have no business being accessible.  A separate
>> argument, from the repeatable build point of view, says that compilation
>> information isn't useful at all.
>>
>> Depending on how we wish to fix the issue, we could either do a blanket
>> move of the subops into the privileged XSM catagory, or introduce a 3rd
>> "legacy privileged" category to allow the admin to control access on a
>> per-vm basis.
> CC-ing Daniel. Changing title.
>> ~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-18 11:40               ` Andrew Cooper
@ 2015-09-22 13:22                 ` Konrad Rzeszutek Wilk
  2015-09-22 13:33                   ` Andrew Cooper
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-22 13:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, wei.liu2, jinsong.liu, amesserl,
	ian.jackson, Martin Pohlack, fanhenglong, dgdegra

On Fri, Sep 18, 2015 at 12:40:46PM +0100, Andrew Cooper wrote:
> On 17/09/15 19:45, Konrad Rzeszutek Wilk wrote:
> > . snip..
> >>>>>> The build id of the current running hypervisor should belong in the
> >>>>>> xeninfo hypercall.  It is not specific to xsplice.
> >>>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
> >>>>>
> >>>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
> >>>>>
> >>>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
> >>>> DomU can already read the build information such as changeset, compile
> >>>> time, etc.  Build-id is no more special or revealing.
> >>> I would take this as an argument *against* giving DomU access to those
> >>> pieces of information in details and not as an argument for
> >>> *additionally* giving it access to build-id.
> >>>
> >>> With build-id we have the chance to shape a not-yet-established API and
> >>> I would like to follow the Principle of least privilege wherever it
> >>> makes sense.
> >>>
> >>> To reach a similar security level with the existing API, it might make
> >>> sense to limit DomU access to compile date, compile time, compiled by,
> >>> compiled domain, compiler version and command line details, xen extra
> >>> version, and xen changeset.  Basically anything that might help DomUs to
> >>> uniquely identify a Xen build.
> >>>
> >>> The approach can not directly drop access to those fields, as that would
> >>> break an existing API, but it could restrict the detail level handed out
> >>> to DomU.
> >> These are all valid arguments to be made, but please lets fix the issue
> >> properly rather than hacking build-id on the side of an unrelated component.
> >>
> >> From my point of view, the correct course of action is this:
> >>
> >> * Split the current XSM model to contain separate attributes for general
> >> and privileged information.
> >> ** For current compatibility, all existing XENVER_* subops fall into general
> >> * Apply an XSM check at the very start of the hypercall.

That would introduce a performance regression I fear. Linux pvops does this:

                                                                              
/*                                                                              
 * Force a proper event-channel callback from Xen after clearing the            
 * callback mask. We do this in a very simple manner, by making a call          
 * down into Xen. The pending flag will be checked by Xen on return.            
 */                                                                             
void xen_force_evtchn_callback(void)                                            
{                                                                               
        (void)HYPERVISOR_xen_version(0, NULL);                                  
}                                          

quite often, which now will have to do the XSM check which is extra code.


I would say that the XENVER_compile_info (/sys/hypervisor/compilation),
XENVER_changeset (/sys/hypervisor/compilation) should go over
the XSM check.

While:XENVER_version, XENVER_extraversion,XENVER_capabilities,
XENVER_platform_parameters, XENVER_get_features,XENVER_pagesize

should have no XSM check.

> >> * Extend do_xen_version() to take 3 parameters.  (It is curious that it
> >> didn't take a length parameter before)
> >> ** This is still ABI compatible, as existing subops simply ignore the
> >> parameter.
> > Or we can just use 1024 bytes space the XENVER_* use.
> 
> What 1024 bytes?
> 
> Each subop currently assumes the guest handle is a pointer to an
> appropriately typed structure, which puts arbitrary and unnecessary
> length restrictions on items.
> 
> ~Andrew
> 
> >
> >> * Introduce new XENVER_build_id subop which is documented to require the
> >> 3-parameter version of the hypercall.
> >> ** This subop falls into straight into privileged information.
> >>
> >> This will introduce build-id in its correct location, with appropriate
> >> restrictions.
> >>
> >> Moving forwards, we really should have an audit of the existing XENVER_*
> >> subops.  Some are clearly safe/required for domU to read, but some such
> >> as XENVER_commandline have no business being accessible.  A separate
> >> argument, from the repeatable build point of view, says that compilation
> >> information isn't useful at all.
> >>
> >> Depending on how we wish to fix the issue, we could either do a blanket
> >> move of the subops into the privileged XSM catagory, or introduce a 3rd
> >> "legacy privileged" category to allow the admin to control access on a
> >> per-vm basis.
> > CC-ing Daniel. Changing title.
> >> ~Andrew
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-22 13:22                 ` Konrad Rzeszutek Wilk
@ 2015-09-22 13:33                   ` Andrew Cooper
  2015-09-22 13:45                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Cooper @ 2015-09-22 13:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, wei.liu2, jinsong.liu, amesserl,
	ian.jackson, Martin Pohlack, fanhenglong, dgdegra

On 22/09/15 14:22, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 18, 2015 at 12:40:46PM +0100, Andrew Cooper wrote:
>> On 17/09/15 19:45, Konrad Rzeszutek Wilk wrote:
>>> . snip..
>>>>>>>> The build id of the current running hypervisor should belong in the
>>>>>>>> xeninfo hypercall.  It is not specific to xsplice.
>>>>>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>>>>>>
>>>>>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>>>>>>
>>>>>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
>>>>>> DomU can already read the build information such as changeset, compile
>>>>>> time, etc.  Build-id is no more special or revealing.
>>>>> I would take this as an argument *against* giving DomU access to those
>>>>> pieces of information in details and not as an argument for
>>>>> *additionally* giving it access to build-id.
>>>>>
>>>>> With build-id we have the chance to shape a not-yet-established API and
>>>>> I would like to follow the Principle of least privilege wherever it
>>>>> makes sense.
>>>>>
>>>>> To reach a similar security level with the existing API, it might make
>>>>> sense to limit DomU access to compile date, compile time, compiled by,
>>>>> compiled domain, compiler version and command line details, xen extra
>>>>> version, and xen changeset.  Basically anything that might help DomUs to
>>>>> uniquely identify a Xen build.
>>>>>
>>>>> The approach can not directly drop access to those fields, as that would
>>>>> break an existing API, but it could restrict the detail level handed out
>>>>> to DomU.
>>>> These are all valid arguments to be made, but please lets fix the issue
>>>> properly rather than hacking build-id on the side of an unrelated component.
>>>>
>>>>  From my point of view, the correct course of action is this:
>>>>
>>>> * Split the current XSM model to contain separate attributes for general
>>>> and privileged information.
>>>> ** For current compatibility, all existing XENVER_* subops fall into general
>>>> * Apply an XSM check at the very start of the hypercall.
> That would introduce a performance regression I fear. Linux pvops does this:
>
>                                                                                
> /*
>   * Force a proper event-channel callback from Xen after clearing the
>   * callback mask. We do this in a very simple manner, by making a call
>   * down into Xen. The pending flag will be checked by Xen on return.
>   */
> void xen_force_evtchn_callback(void)
> {
>          (void)HYPERVISOR_xen_version(0, NULL);
> }
>
> quite often, which now will have to do the XSM check which is extra code.
>
>
> I would say that the XENVER_compile_info (/sys/hypervisor/compilation),
> XENVER_changeset (/sys/hypervisor/compilation) should go over
> the XSM check.
>
> While:XENVER_version, XENVER_extraversion,XENVER_capabilities,
> XENVER_platform_parameters, XENVER_get_features,XENVER_pagesize
>
> should have no XSM check.

The XSM check will fall into the noise, performance wise, compared to 
the context switch to make the hypercall in the first place.  It is just 
another switch statement.  Also, selectively applying XSM checks will 
incur even more overhead than doing a blanket XSM check.

Also, I really don't care if you can measure a performance hit (not that 
I reckon you could).  How Linux chooses to behave itself has absolutely 
no bearing on how we go about securing the hypercall.

~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-22 13:33                   ` Andrew Cooper
@ 2015-09-22 13:45                     ` Konrad Rzeszutek Wilk
  2015-09-22 16:28                       ` Daniel De Graaf
  0 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-22 13:45 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, wei.liu2, jinsong.liu, amesserl,
	ian.jackson, Martin Pohlack, fanhenglong, dgdegra

On Tue, Sep 22, 2015 at 02:33:23PM +0100, Andrew Cooper wrote:
> On 22/09/15 14:22, Konrad Rzeszutek Wilk wrote:
> >On Fri, Sep 18, 2015 at 12:40:46PM +0100, Andrew Cooper wrote:
> >>On 17/09/15 19:45, Konrad Rzeszutek Wilk wrote:
> >>>. snip..
> >>>>>>>>The build id of the current running hypervisor should belong in the
> >>>>>>>>xeninfo hypercall.  It is not specific to xsplice.
> >>>>>>>However in the previous reviews it was pointed out that it should only be accessible to dom0.
> >>>>>>>
> >>>>>>>Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
> >>>>>>>
> >>>>>>>That is a bit of 'if' extra complexity which I am not sure is worth it?
> >>>>>>DomU can already read the build information such as changeset, compile
> >>>>>>time, etc.  Build-id is no more special or revealing.
> >>>>>I would take this as an argument *against* giving DomU access to those
> >>>>>pieces of information in details and not as an argument for
> >>>>>*additionally* giving it access to build-id.
> >>>>>
> >>>>>With build-id we have the chance to shape a not-yet-established API and
> >>>>>I would like to follow the Principle of least privilege wherever it
> >>>>>makes sense.
> >>>>>
> >>>>>To reach a similar security level with the existing API, it might make
> >>>>>sense to limit DomU access to compile date, compile time, compiled by,
> >>>>>compiled domain, compiler version and command line details, xen extra
> >>>>>version, and xen changeset.  Basically anything that might help DomUs to
> >>>>>uniquely identify a Xen build.
> >>>>>
> >>>>>The approach can not directly drop access to those fields, as that would
> >>>>>break an existing API, but it could restrict the detail level handed out
> >>>>>to DomU.
> >>>>These are all valid arguments to be made, but please lets fix the issue
> >>>>properly rather than hacking build-id on the side of an unrelated component.
> >>>>
> >>>> From my point of view, the correct course of action is this:
> >>>>
> >>>>* Split the current XSM model to contain separate attributes for general
> >>>>and privileged information.
> >>>>** For current compatibility, all existing XENVER_* subops fall into general
> >>>>* Apply an XSM check at the very start of the hypercall.
> >That would introduce a performance regression I fear. Linux pvops does this:
> >
> >/*
> >  * Force a proper event-channel callback from Xen after clearing the
> >  * callback mask. We do this in a very simple manner, by making a call
> >  * down into Xen. The pending flag will be checked by Xen on return.
> >  */
> >void xen_force_evtchn_callback(void)
> >{
> >         (void)HYPERVISOR_xen_version(0, NULL);
> >}
> >
> >quite often, which now will have to do the XSM check which is extra code.
> >
> >
> >I would say that the XENVER_compile_info (/sys/hypervisor/compilation),
> >XENVER_changeset (/sys/hypervisor/compilation) should go over
> >the XSM check.
> >
> >While:XENVER_version, XENVER_extraversion,XENVER_capabilities,
> >XENVER_platform_parameters, XENVER_get_features,XENVER_pagesize
> >
> >should have no XSM check.
> 
> The XSM check will fall into the noise, performance wise, compared to the
> context switch to make the hypercall in the first place.  It is just another
> switch statement.  Also, selectively applying XSM checks will incur even
> more overhead than doing a blanket XSM check.

I am worried about some spinlock in the depths of XSM code.

But then I haven't looked in detail so perhaps this is not an issue after all.

> 
> Also, I really don't care if you can measure a performance hit (not that I
> reckon you could).  How Linux chooses to behave itself has absolutely no
> bearing on how we go about securing the hypercall.

But making something slower is surely not something we strive for.

> 
> ~Andrew

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-22 13:45                     ` Konrad Rzeszutek Wilk
@ 2015-09-22 16:28                       ` Daniel De Graaf
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel De Graaf @ 2015-09-22 16:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, wei.liu2, jinsong.liu, amesserl,
	ian.jackson, Martin Pohlack, fanhenglong

On 22/09/15 09:45, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 22, 2015 at 02:33:23PM +0100, Andrew Cooper wrote:
>> On 22/09/15 14:22, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Sep 18, 2015 at 12:40:46PM +0100, Andrew Cooper wrote:
>>>> On 17/09/15 19:45, Konrad Rzeszutek Wilk wrote:
>>>>> . snip..
>>>>>>>>>> The build id of the current running hypervisor should belong in the
>>>>>>>>>> xeninfo hypercall.  It is not specific to xsplice.
>>>>>>>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>>>>>>>>
>>>>>>>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>>>>>>>>
>>>>>>>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
>>>>>>>> DomU can already read the build information such as changeset, compile
>>>>>>>> time, etc.  Build-id is no more special or revealing.
>>>>>>> I would take this as an argument *against* giving DomU access to those
>>>>>>> pieces of information in details and not as an argument for
>>>>>>> *additionally* giving it access to build-id.
>>>>>>>
>>>>>>> With build-id we have the chance to shape a not-yet-established API and
>>>>>>> I would like to follow the Principle of least privilege wherever it
>>>>>>> makes sense.
>>>>>>>
>>>>>>> To reach a similar security level with the existing API, it might make
>>>>>>> sense to limit DomU access to compile date, compile time, compiled by,
>>>>>>> compiled domain, compiler version and command line details, xen extra
>>>>>>> version, and xen changeset.  Basically anything that might help DomUs to
>>>>>>> uniquely identify a Xen build.
>>>>>>>
>>>>>>> The approach can not directly drop access to those fields, as that would
>>>>>>> break an existing API, but it could restrict the detail level handed out
>>>>>>> to DomU.
>>>>>> These are all valid arguments to be made, but please lets fix the issue
>>>>>> properly rather than hacking build-id on the side of an unrelated component.
>>>>>>
>>>>>>  From my point of view, the correct course of action is this:
>>>>>>
>>>>>> * Split the current XSM model to contain separate attributes for general
>>>>>> and privileged information.
>>>>>> ** For current compatibility, all existing XENVER_* subops fall into general
>>>>>> * Apply an XSM check at the very start of the hypercall.
>>> That would introduce a performance regression I fear. Linux pvops does this:
>>>
>>> /*
>>>   * Force a proper event-channel callback from Xen after clearing the
>>>   * callback mask. We do this in a very simple manner, by making a call
>>>   * down into Xen. The pending flag will be checked by Xen on return.
>>>   */
>>> void xen_force_evtchn_callback(void)
>>> {
>>>          (void)HYPERVISOR_xen_version(0, NULL);
>>> }
>>>
>>> quite often, which now will have to do the XSM check which is extra code.
>>>
>>>
>>> I would say that the XENVER_compile_info (/sys/hypervisor/compilation),
>>> XENVER_changeset (/sys/hypervisor/compilation) should go over
>>> the XSM check.
>>>
>>> While:XENVER_version, XENVER_extraversion,XENVER_capabilities,
>>> XENVER_platform_parameters, XENVER_get_features,XENVER_pagesize
>>>
>>> should have no XSM check.
>>
>> The XSM check will fall into the noise, performance wise, compared to the
>> context switch to make the hypercall in the first place.  It is just another
>> switch statement.  Also, selectively applying XSM checks will incur even
>> more overhead than doing a blanket XSM check.

I would be surprised if placing the XSM check inside the individual case
statement is more expensive than a single XSM check outside the switch:
in either case, it's either a single call or a single if statement, and
since there's only 10 cases (6 of which were identified as not needing a
check) it's not much code added either.

You could also define a bitmask of operations that require an XSM check
and do "if ((1 << cmd) & NEEDS_XSM_CHECK_BITMASK) { do_xsm_check() }".

> I am worried about some spinlock in the depths of XSM code.
>
> But then I haven't looked in detail so perhaps this is not an issue after all.

The XSM code tries to use read locks where possible to avoid this type of
performance hit, and if the call is being made as often as it sounds like,
the check will almost always hit the AVC cache (avoiding the more expensive
parts of the FLASK security server).  However, it's always faster to return
success right away (and no less secure if you were going to do so anyway).

>> Also, I really don't care if you can measure a performance hit (not that I
>> reckon you could).  How Linux chooses to behave itself has absolutely no
>> bearing on how we go about securing the hypercall.
>
> But making something slower is surely not something we strive for.
>
>>
>> ~Andrew
>


-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-17 18:45             ` Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: " Konrad Rzeszutek Wilk
  2015-09-18 11:40               ` Andrew Cooper
@ 2015-09-22 16:28               ` Daniel De Graaf
  2015-09-25 20:18                 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel De Graaf @ 2015-09-22 16:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper, ian.jackson, wei.liu2
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, jinsong.liu, amesserl,
	Martin Pohlack, fanhenglong

On 17/09/15 14:45, Konrad Rzeszutek Wilk wrote:
> . snip..
>>>>>> The build id of the current running hypervisor should belong in the
>>>>>> xeninfo hypercall.  It is not specific to xsplice.
>>>>> However in the previous reviews it was pointed out that it should only be accessible to dom0.
>>>>>
>>>>> Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
>>>>>
>>>>> That is a bit of 'if' extra complexity which I am not sure is worth it?
>>>> DomU can already read the build information such as changeset, compile
>>>> time, etc.  Build-id is no more special or revealing.
>>> I would take this as an argument *against* giving DomU access to those
>>> pieces of information in details and not as an argument for
>>> *additionally* giving it access to build-id.
>>>
>>> With build-id we have the chance to shape a not-yet-established API and
>>> I would like to follow the Principle of least privilege wherever it
>>> makes sense.
>>>
>>> To reach a similar security level with the existing API, it might make
>>> sense to limit DomU access to compile date, compile time, compiled by,
>>> compiled domain, compiler version and command line details, xen extra
>>> version, and xen changeset.  Basically anything that might help DomUs to
>>> uniquely identify a Xen build.
>>>
>>> The approach can not directly drop access to those fields, as that would
>>> break an existing API, but it could restrict the detail level handed out
>>> to DomU.
>>
>> These are all valid arguments to be made, but please lets fix the issue
>> properly rather than hacking build-id on the side of an unrelated component.
>>
>>  From my point of view, the correct course of action is this:
>>
>> * Split the current XSM model to contain separate attributes for general
>> and privileged information.
>> ** For current compatibility, all existing XENVER_* subops fall into general
>> * Apply an XSM check at the very start of the hypercall.
>> * Extend do_xen_version() to take 3 parameters.  (It is curious that it
>> didn't take a length parameter before)
>> ** This is still ABI compatible, as existing subops simply ignore the
>> parameter.
>
> Or we can just use 1024 bytes space the XENVER_* use.
>
>> * Introduce new XENVER_build_id subop which is documented to require the
>> 3-parameter version of the hypercall.
>> ** This subop falls into straight into privileged information.
>>
>> This will introduce build-id in its correct location, with appropriate
>> restrictions.
>>
>> Moving forwards, we really should have an audit of the existing XENVER_*
>> subops.  Some are clearly safe/required for domU to read, but some such
>> as XENVER_commandline have no business being accessible.  A separate
>> argument, from the repeatable build point of view, says that compilation
>> information isn't useful at all.
>>
>> Depending on how we wish to fix the issue, we could either do a blanket
>> move of the subops into the privileged XSM catagory, or introduce a 3rd
>> "legacy privileged" category to allow the admin to control access on a
>> per-vm basis.
>
> CC-ing Daniel. Changing title.

With XSM enabled, I think the correct thing to do is to have a distinct
permission so that an admin can do per-VM controls.  Without XSM enabled,
how common is the case where some VMs need to get this information and
some need it hidden?  A global (command line controlled?) enable of the
feature for domUs seems like a reasonable solution if this is uncommon.

As far as the xsm_default_t value, this is really what XSM_OTHER is for,
but if there are going to be many instances of this type of data, a new
value like XSM_PRIV_INFOLEAK could be introduced.

-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-22 16:28               ` Daniel De Graaf
@ 2015-09-25 20:18                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-09-25 20:18 UTC (permalink / raw)
  To: Daniel De Graaf
  Cc: elena.ufimtseva, hanweidong, Martin Pohlack, jbeulich,
	john.liuqiming, paul.voccio, daniel.kiper, major.hayden,
	liuyingdong, aliguori, xen-devel, lars.kurth, steven.wilson,
	ian.campbell, peter.huangpeng, msw, xiantao.zxt, rick.harris,
	boris.ostrovsky, josh.kearney, wei.liu2, jinsong.liu, amesserl,
	ian.jackson, Martin Pohlack, fanhenglong, Andrew Cooper

On Tue, Sep 22, 2015 at 12:28:50PM -0400, Daniel De Graaf wrote:
> On 17/09/15 14:45, Konrad Rzeszutek Wilk wrote:
> >. snip..
> >>>>>>The build id of the current running hypervisor should belong in the
> >>>>>>xeninfo hypercall.  It is not specific to xsplice.
> >>>>>However in the previous reviews it was pointed out that it should only be accessible to dom0.
> >>>>>
> >>>>>Or to any domains as long as the XSM allows (and is turned on) - so not the default dummy one.
> >>>>>
> >>>>>That is a bit of 'if' extra complexity which I am not sure is worth it?
> >>>>DomU can already read the build information such as changeset, compile
> >>>>time, etc.  Build-id is no more special or revealing.
> >>>I would take this as an argument *against* giving DomU access to those
> >>>pieces of information in details and not as an argument for
> >>>*additionally* giving it access to build-id.
> >>>
> >>>With build-id we have the chance to shape a not-yet-established API and
> >>>I would like to follow the Principle of least privilege wherever it
> >>>makes sense.
> >>>
> >>>To reach a similar security level with the existing API, it might make
> >>>sense to limit DomU access to compile date, compile time, compiled by,
> >>>compiled domain, compiler version and command line details, xen extra
> >>>version, and xen changeset.  Basically anything that might help DomUs to
> >>>uniquely identify a Xen build.
> >>>
> >>>The approach can not directly drop access to those fields, as that would
> >>>break an existing API, but it could restrict the detail level handed out
> >>>to DomU.
> >>
> >>These are all valid arguments to be made, but please lets fix the issue
> >>properly rather than hacking build-id on the side of an unrelated component.
> >>
> >> From my point of view, the correct course of action is this:
> >>
> >>* Split the current XSM model to contain separate attributes for general
> >>and privileged information.
> >>** For current compatibility, all existing XENVER_* subops fall into general
> >>* Apply an XSM check at the very start of the hypercall.
> >>* Extend do_xen_version() to take 3 parameters.  (It is curious that it
> >>didn't take a length parameter before)
> >>** This is still ABI compatible, as existing subops simply ignore the
> >>parameter.
> >
> >Or we can just use 1024 bytes space the XENVER_* use.
> >
> >>* Introduce new XENVER_build_id subop which is documented to require the
> >>3-parameter version of the hypercall.
> >>** This subop falls into straight into privileged information.
> >>
> >>This will introduce build-id in its correct location, with appropriate
> >>restrictions.
> >>
> >>Moving forwards, we really should have an audit of the existing XENVER_*
> >>subops.  Some are clearly safe/required for domU to read, but some such
> >>as XENVER_commandline have no business being accessible.  A separate
> >>argument, from the repeatable build point of view, says that compilation
> >>information isn't useful at all.
> >>
> >>Depending on how we wish to fix the issue, we could either do a blanket
> >>move of the subops into the privileged XSM catagory, or introduce a 3rd
> >>"legacy privileged" category to allow the admin to control access on a
> >>per-vm basis.
> >
> >CC-ing Daniel. Changing title.
> 
> With XSM enabled, I think the correct thing to do is to have a distinct
> permission so that an admin can do per-VM controls.  Without XSM enabled,
> how common is the case where some VMs need to get this information and
> some need it hidden?  A global (command line controlled?) enable of the
> feature for domUs seems like a reasonable solution if this is uncommon.

The one (not in the tree) case we want to add is to expose the build-id
 which is an ldd linker type data. The right place to do it is in XENVER_
but having it exposed to the initial domain only makes sense.

However if you have XSM you could use that to expose it to other guests
as well?

But if we are going to add XSM we may as well add XSM for the other ones.
My preference would be to (which I believe is what you think is correct to):

 1). Allow some of the XENVER_ subops to be presented to guests without
     any XSM calls (aka, as it is now).

 2). The other ones (the existing ones), would be bundled under XSM_OTHER.

 3). The new one (build-id) would be XSM_PRIV  and if XSM is
     disabled would be only available for initial domain.

The XSM_PRIV is used in sysctl and we would expand the 'struct xsm_operations'
to have the 'build-id' member.

Hows that?

> 
> As far as the xsm_default_t value, this is really what XSM_OTHER is for,
> but if there are going to be many instances of this type of data, a new
> value like XSM_PRIV_INFOLEAK could be introduced.
> 
> -- 
> Daniel De Graaf
> National Security Agency

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1] xSplice initial foundation patches.
  2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
                   ` (4 preceding siblings ...)
  2015-09-16 21:01 ` [PATCH v1 5/5] xsplice: Use ld-embedded build-ids Konrad Rzeszutek Wilk
@ 2015-10-02 14:48 ` Konrad Rzeszutek Wilk
  2015-10-09 12:46   ` Konrad Rzeszutek Wilk
  5 siblings, 1 reply; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-10-02 14:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, mpohlack, ross.lagerwall,
	fanhenglong, andrew.cooper3

On Wed, Sep 16, 2015 at 05:01:11PM -0400, Konrad Rzeszutek Wilk wrote:
> 
> *OK, what do you have?*
> 
> They are located at a git tree:
>   git://xenbits.xen.org/people/konradwilk/xen.git xsplice.v1

I've created another branch 'xsplice.v1.1' which has some bug-fixes
and improvements. Will be posting it soonish when I am done
with testing.

But more importantly I would like to setup an IRC meeting
so that we may coordinate.

I've used doodle, and the link is:

 http://doodle.com/poll/adtuwr4hs5m4i9y6

The times I've put in there are geared towards earlier hours
but please feel free to propose other times.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-09-16 21:01 ` [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
@ 2015-10-02 15:06   ` Jan Beulich
  0 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-10-02 15:06 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, john.liuqiming, paul.voccio,
	daniel.kiper, major.hayden, liuyingdong, aliguori, xiantao.zxt,
	lars.kurth, steven.wilson, ian.campbell, peter.huangpeng, msw,
	xen-devel, rick.harris, boris.ostrovsky, josh.kearney,
	jinsong.liu, amesserl, mpohlack, fanhenglong, andrew.cooper3

>>> On 16.09.15 at 23:01, <konrad.wilk@oracle.com> wrote:
> --- /dev/null
> +++ b/xen/common/xsplice.c
> @@ -0,0 +1,442 @@
> +/*
> + * xSplice - Copyright Oracle Corp. Inc 2015.
> + *
> + * Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> + */
> +
> +#include <xen/smp.h>
> +#include <xen/keyhandler.h>
> +#include <xen/spinlock.h>
> +#include <xen/mm.h>
> +#include <xen/list.h>
> +#include <xen/guest_access.h>
> +#include <xen/stdbool.h>

No. This header should only be used in a very limited set up cases
(namely source code shared with the tools). Looks like we accumulated
a few other violations.

> +struct payload {
> +    char  *id;          /* Name of it. Past this structure. */
> +    ssize_t id_len;     /* Length of the name. */
> +
> +    uint8_t *raw;       /* Pointer to Elf file. Past 'id'*/
> +    ssize_t raw_len;    /* Size of 'raw'. */
> +
> +    int32_t status;     /* XSPLICE_STATUS_* or Exx type value. */
> +    int32_t old_status; /* XSPLICE_STATUS_* or Exx type value. */
> +
> +    struct spinlock cmd_lock; /* Lock against the action. */
> +    uint32_t cmd;       /* Action request. XSPLICE_ACTION_* */
> +
> +    /* Boring things below: */
> +    struct list_head   list;   /* Linked to 'payload_list'. */
> +    ssize_t len;        /* This structure + raw_len + id_len + 1. */
> +
> +    struct tasklet tasklet;

char name[]; here would avoid some casting further down.

> +static const char *status2str(int64_t status)
> +{
> +#define STATUS(x) [XSPLICE_STATUS_##x] = #x
> +    static const char *const names[] = {
> +            STATUS(LOADED),
> +            STATUS(PROGRESS),
> +            STATUS(CHECKED),
> +            STATUS(APPLIED),
> +            STATUS(REVERTED),
> +    };

This array will grow unreasonably once we gain a few more
XSPLICE_STATUS_* values (which are bits masks, not enum like
values).

> +int find_payload(xen_xsplice_id_t *id, bool_t need_lock, struct payload **f)

static?

> +{
> +    struct payload *data;
> +    XEN_GUEST_HANDLE_PARAM(char) str;
> +    char name[XEN_XSPLICE_ID_SIZE]; /* 128 bytes on stack. Perhaps kzalloc? */

I don't think 128 bytes are a significant problem, unless this could
be run in the context of an interrupt.

> +static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
> +{
> +    struct payload *data = NULL;
> +    int rc;
> +    ssize_t len;
> +
> +    rc = verify_payload(upload);
> +    if ( rc )
> +        return rc;
> +
> +    rc = find_payload(&upload->id, true, &data);
> +    if ( rc == 0 /* Found. */ )
> +        return -EEXIST;
> +
> +    if ( rc != -ENOENT )
> +        return rc;
> +
> +    /*
> +     * Compute the size of the structures which need to be verified.
> +     * The 1 is for the extra \0 in case name does not have it.
> +     */
> +    len = sizeof(*data) + upload->id.size + 1 + upload->size;
> +    data = alloc_xenheap_pages(get_order_from_bytes(len), 0);
> +    if ( !data )
> +        return -ENOMEM;
> +
> +    memset(data, 0, len);
> +    data->len = len;
> +
> +    /* At the end of structure we put the name. */
> +    data->id = (char *)data + sizeof(*data);
> +    data->id_len = upload->id.size;
> +    /* And after the name + \0 we stick the raw ELF data. */
> +    data->raw = (uint8_t *)data + sizeof(*data) + data->id_len + 1;

Please use data->id here to shorten the expression.

> +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> +{
> +    xen_xsplice_status_t status;
> +    struct payload *data;
> +    unsigned int idx = 0, i = 0;
> +    int rc = 0;
> +    unsigned int ver = payload_version;

Is it correct to latch this before taking the lock?

> +    // TODO: Increase to a 64 or other value. Leave 4 for debug.
> +    if ( list->nr > 4 )
> +        return -E2BIG;

Command line driven?

> +    if ( list->_pad != 0 )
> +        return -EINVAL;
> +
> +    if ( guest_handle_is_null(list->status) ||
> +         guest_handle_is_null(list->id) ||
> +         guest_handle_is_null(list->len) )
> +        return -EINVAL;

Are these really useful?

> +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> +         !guest_handle_okay(list->id, XEN_XSPLICE_ID_SIZE * list->nr) ||
> +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
> +        return -EINVAL;
> +
> +    spin_lock(&payload_list_lock);
> +    if ( list->idx > payload_cnt )
> +    {
> +        spin_unlock(&payload_list_lock);
> +        return -EINVAL;
> +    }
> +
> +    status._pad = 0; /* No stack leaking. */
> +    list_for_each_entry( data, &payload_list, list )
> +    {
> +        uint32_t len;
> +
> +        if ( list->idx > i++ )
> +            continue;
> +
> +        status.status = data->status;
> +        len = data->id_len;
> +
> +        /* N.B. 'idx' != 'i'. */
> +        if ( copy_to_guest_offset(list->id, idx * XEN_XSPLICE_ID_SIZE,
> +                                  data->id, len) ||
> +             copy_to_guest_offset(list->len, idx, &len, 1) ||
> +             copy_to_guest_offset(list->status, idx, &status, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        idx ++;
> +        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
> +        {
> +            break;
> +        }

Pointless braces. (I won't make specific note of the many other
coding style issues.)

> +static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> +{
> +    struct payload *data;
> +    int rc;
> +
> +    if ( action->_pad != 0 )
> +        return -EINVAL;
> +
> +    rc = verify_id(&action->id);
> +    if ( rc )
> +        return rc;
> +
> +    spin_lock(&payload_list_lock);
> +    rc = find_payload(&action->id, false /* we are holding the lock. */, &data);
> +    if ( rc )
> +        goto out;
> +
> +    if ( action->cmd != XSPLICE_ACTION_UNLOAD )
> +        spin_lock(&data->cmd_lock);

Wouldn't it be better to drop the lock in the specific case below,
after having done the ->status checks?

> +    switch ( action->cmd )
> +    {
> +    case XSPLICE_ACTION_CHECK:
> +        if ( ( data->status == XSPLICE_STATUS_LOADED ) )
> +        {
> +            data->old_status = data->status;
> +            data->status = XSPLICE_STATUS_PROGRESS;
> +            data->cmd = action->cmd;
> +            tasklet_schedule(&data->tasklet);
> +            rc = 0;
> +        } else if ( data->status == XSPLICE_STATUS_CHECKED )
> +        {
> +            rc = 0;
> +        }
> +        break;
> +    case XSPLICE_ACTION_UNLOAD:
> +        if ( ( data->status == XSPLICE_STATUS_REVERTED ) ||
> +             ( data->status == XSPLICE_STATUS_LOADED ) ||
> +             ( data->status == XSPLICE_STATUS_CHECKED ) )
> +        {
> +            __free_payload(data);
> +            /* No touching 'data' from here on! */
> +            rc = 0;
> +        }
> +        break;
> +    case XSPLICE_ACTION_REVERT:
> +        if ( data->status == XSPLICE_STATUS_APPLIED )
> +        {
> +            data->old_status = data->status;
> +            data->status = XSPLICE_STATUS_PROGRESS;
> +            data->cmd = action->cmd;
> +            rc = 0;
> +            /* TODO: Tasklet is not good for this. We need a different vehicle. */
> +            tasklet_schedule(&data->tasklet);
> +        }
> +        break;
> +    case XSPLICE_ACTION_APPLY:
> +        if ( ( data->status == XSPLICE_STATUS_CHECKED ) ||
> +             ( data->status == XSPLICE_STATUS_REVERTED ))
> +        {
> +            data->old_status = data->status;
> +            data->status = XSPLICE_STATUS_PROGRESS;
> +            data->cmd = action->cmd;
> +            rc = 0;
> +            /* TODO: Tasklet is not good for this. We need a different vehicle. */
> +            tasklet_schedule(&data->tasklet);
> +        }
> +        break;

Looking at all of these ->status checks - why do XSPLICE_STATUS_*
need to be bit masks rather then enum like values?

> +    default:
> +        rc = -ENOSYS;

EOPNOTSUPP or EINVAL or ..., but not ENOSYS.

> +int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
> +{
> +    int rc;
> +
> +    switch ( xsplice->cmd )
> +    {
> +    case XEN_SYSCTL_XSPLICE_UPLOAD:
> +        rc = xsplice_upload(&xsplice->u.upload);
> +        break;
> +    case XEN_SYSCTL_XSPLICE_GET:
> +        rc = xsplice_get(&xsplice->u.get);
> +        break;
> +    case XEN_SYSCTL_XSPLICE_LIST:
> +        rc = xsplice_list(&xsplice->u.list);
> +        break;
> +    case XEN_SYSCTL_XSPLICE_ACTION:
> +        rc = xsplice_action(&xsplice->u.action);
> +        break;
> +    default:
> +        rc = -ENOSYS;

Same here.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 5/5] xsplice: Use ld-embedded build-ids
  2015-09-16 21:01 ` [PATCH v1 5/5] xsplice: Use ld-embedded build-ids Konrad Rzeszutek Wilk
  2015-09-16 21:41   ` Andrew Cooper
@ 2015-10-02 15:13   ` Jan Beulich
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-10-02 15:13 UTC (permalink / raw)
  To: Martin Pohlack, Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, john.liuqiming, paul.voccio,
	daniel.kiper, major.hayden, liuyingdong, aliguori, xiantao.zxt,
	lars.kurth, steven.wilson, ian.campbell, peter.huangpeng, msw,
	xen-devel, rick.harris, boris.ostrovsky, josh.kearney,
	jinsong.liu, amesserl, mpohlack, fanhenglong, andrew.cooper3

>>> On 16.09.15 at 23:01, <konrad.wilk@oracle.com> wrote:
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -108,11 +108,11 @@ $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>  	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
>  	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
>  	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
> -	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
> +	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o --build-id=sha1 \
>  	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
>  	$(NM) -n $(@D)/.$(@F).1 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1.S
>  	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
> -	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
> +	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o --build-id=sha1 \
>  	    $(@D)/.$(@F).1.o -o $@
>  	rm -f $(@D)/.$(@F).[0-9]*

Which raises the question: What about xen.efi?

> @@ -44,6 +46,36 @@ struct payload {
>      struct tasklet tasklet;
>  };
>  
> +#ifdef CONFIG_ARM
> +static int build_id(char **p, unsigned int *len)
> +{
> +    return 0;
> +}
> +#else

Any reason not to make the build logic common, in order to avoid such
#ifdef-ery?


> +extern char * __note_gnu_build_id_start;  /* defined in linker script */

const, and I think you mean the type to be Elf_Note[].

> @@ -68,9 +100,31 @@ static const char *status2str(int64_t status)
>      return names[status];
>  }
>  
> +#define LEN 128

Because of?

>  void xsplice_printall(unsigned char key)
>  {
>      struct payload *data;
> +    char *binary_id = NULL;
> +    unsigned int len = 0;
> +    int rc;
> +
> +    rc = build_id(&binary_id, &len);
> +    printk("build-id: ");
> +    if ( !rc )
> +    {
> +        unsigned int i;
> +
> +        if ( len > LEN )
> +            len = LEN;
> +
> +        for ( i = 0; i < len; i++ )
> +        {
> +		    uint8_t c = binary_id[i];
> +		    printk("%02x", c);

Hard tabs.

> +        }
> +	    printk("\n");
> +    } else if ( rc < 0 )

Coding style.

> @@ -415,6 +470,34 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
>      return rc;
>  }
>  
> +static int xsplice_info(xen_sysctl_xsplice_info_t *info)
> +{
> +    int rc;
> +    unsigned int len = 0;
> +    char *p = NULL;
> +
> +    if ( info->cmd != XEN_SYSCTL_XSPLICE_INFO_BUILD_ID )
> +        return -EINVAL;
> +
> +    if ( info->size == 0 )
> +        return -EINVAL;
> +
> +    if ( !guest_handle_okay(info->u.info, info->size) )
> +        return -EFAULT;
> +
> +    rc = build_id(&p, &len);
> +    if ( rc )
> +        return rc;
> +
> +    if ( len > info->size )
> +        return -ENOMEM;
> +
> +    if ( copy_to_guest(info->u.info, p, len) )
> +        return -EFAULT;

So how does the caller know how much data got copied?

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
@ 2015-10-05 10:02   ` Jan Beulich
  2015-10-05 10:28   ` Ross Lagerwall
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-10-05 10:02 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, john.liuqiming, paul.voccio,
	daniel.kiper, major.hayden, liuyingdong, aliguori, xiantao.zxt,
	lars.kurth, steven.wilson, ian.campbell, peter.huangpeng, msw,
	xen-devel, rick.harris, boris.ostrovsky, josh.kearney,
	jinsong.liu, amesserl, mpohlack, fanhenglong, andrew.cooper3

>>> On 16.09.15 at 23:01, <konrad.wilk@oracle.com> wrote:
> +### Symbol names
> +
> +
> +Xen as it is now, has a couple of non-unique symbol names which will
> +make runtime symbol identification hard.  Sometimes, static symbols
> +simply have the same name in C files, sometimes such symbols get
> +included via header files, and some C files are also compiled
> +multiple times and linked under different names (guest_walk.c).
> +
> +As such we need to modify the linker to make sure that the symbol
> +table qualifies also symbols by their source file name.
> +
> +For the awkward situations in which C-files are compiled multiple
> +times patches we would need to some modification in the Xen code.
> +
> +
> +The convention for file-type symbols (that would allow to map many
> +symbols to their compilation unit) says that only the basename (i.e.,
> +without directories) is embedded.  This creates another layer of
> +confusion for duplicate file names in the build tree.
> +
> +That would have to be resolved.

I'm working on this, btw. From what I can tell after some
investigation over the weekend we probably don't even need to
fully overhaul the current symbol handling logic, we just need to
pass different options to nm and alter processing its output
accordingly. The cumbersome case (but that would have been so
also with the originally considered full symbol table approach) will
be xen.efi, but I guess I'll first try to deal with this on the binutils
side.

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
  2015-10-05 10:02   ` Jan Beulich
@ 2015-10-05 10:28   ` Ross Lagerwall
  2015-10-12 11:44     ` xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.) Ross Lagerwall
  2015-10-06 12:57   ` [PATCH v1 1/5] xsplice: Design document Ross Lagerwall
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Ross Lagerwall @ 2015-10-05 10:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, mpohlack, ian.campbell

On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
> +### Generation of xSplice ELF payloads
> +
> +The design of that is not discussed in this design.
> +
> +The author of this design envisions objdump and objcopy along
> +with special GCC parameters (see above) to create .o.xsplice files
> +which can be used to splice an ELF with the new payload.
> +
> +The ksplice code can provide inspiration.
> +

As discussed off-list with Konrad, I'm going to get started with 
generation of the payload given a patch.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
  2015-10-05 10:02   ` Jan Beulich
  2015-10-05 10:28   ` Ross Lagerwall
@ 2015-10-06 12:57   ` Ross Lagerwall
  2015-10-27  8:08     ` Martin Pohlack
  2015-10-06 15:26   ` Jan Beulich
  2015-10-26 12:01   ` Martin Pohlack
  4 siblings, 1 reply; 36+ messages in thread
From: Ross Lagerwall @ 2015-10-06 12:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, mpohlack, ian.campbell

On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
> +### xSplice interdependencies
> +
> +xSplice patches interdependencies are tricky.
> +
> +There are the ways this can be addressed:
> + * A single large patch that subsumes and replaces all previous ones.
> +   Over the life-time of patching the hypervisor this large patch
> +   grows to accumulate all the code changes.
> + * Hotpatch stack - where an mechanism exists that loads the hotpatches
> +   in the same order they were built in. We would need an build-id
> +   of the hypevisor to make sure the hot-patches are build against the
> +   correct build.
> + * Payload containing the old code to check against that. That allows
> +   the hotpatches to be loaded indepedently (if they don't overlap) - or
> +   if the old code also containst previously patched code - even if they
> +   overlap.
> +
> +The disadvantage of the first large patch is that it can grow over
> +time and not provide an bisection mechanism to identify faulty patches.
> +
> +The hot-patch stack puts stricts requirements on the order of the patches
> +being loaded and requires an hypervisor build-id to match against.
> +
> +The old code allows much more flexibility and an additional guard,
> +but is more complex to implement.
> +

If the single large patch mechanism is used, a new REPLACE action is 
needed to atomically replace one patch with another to prevent a window 
where the hypervisor is unpatched. kpatch has a "replace" command for 
this purpose. This may be useful even for the other mechanisms listed above.

 From what I can tell:
* kSplice uses old code checking (method [3] above), although in 
practice the userspace tools implement dependency logic to enforce a 
linear stack of patches.
* kPatch and kGraft recommend using the single large patch mechanism 
although there's nothing preventing two independent patches from being 
loaded.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
                     ` (2 preceding siblings ...)
  2015-10-06 12:57   ` [PATCH v1 1/5] xsplice: Design document Ross Lagerwall
@ 2015-10-06 15:26   ` Jan Beulich
  2015-10-26 12:01   ` Martin Pohlack
  4 siblings, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-10-06 15:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, john.liuqiming, paul.voccio,
	daniel.kiper, major.hayden, liuyingdong, aliguori, xiantao.zxt,
	lars.kurth, steven.wilson, ian.campbell, peter.huangpeng, msw,
	xen-devel, rick.harris, boris.ostrovsky, josh.kearney,
	jinsong.liu, amesserl, mpohlack, fanhenglong, andrew.cooper3

>>> On 16.09.15 at 23:01, <konrad.wilk@oracle.com> wrote:
> +### Symbol names
> +
> +
> +Xen as it is now, has a couple of non-unique symbol names which will
> +make runtime symbol identification hard.  Sometimes, static symbols
> +simply have the same name in C files, sometimes such symbols get
> +included via header files, and some C files are also compiled
> +multiple times and linked under different names (guest_walk.c).
> +
> +As such we need to modify the linker to make sure that the symbol
> +table qualifies also symbols by their source file name.
> +
> +For the awkward situations in which C-files are compiled multiple
> +times patches we would need to some modification in the Xen code.
> +
> +
> +The convention for file-type symbols (that would allow to map many
> +symbols to their compilation unit) says that only the basename (i.e.,
> +without directories) is embedded.  This creates another layer of
> +confusion for duplicate file names in the build tree.
> +
> +That would have to be resolved.

So here's a draft (some debugging code left and not yet tested
on a really old tool chain) patch doing the prefixing. I also have
9 more patches on top of this, most dealing with individual
symbols that are still left as duplicates (some also do other
cleanup I no longer directly need with the approach now taken).
Sadly the set of duplicates depends on the compiler version in
some cases, hence at the end of the full current series there
are still some duplicate symbol warnings left. I think we can live
with those though for the moment.

Jan

TODO: remove //temp-s

Note: Not warning about duplicate symbols in the EFI case for now, as
a binutils bug causes misnamed file name entries to appear in EFI
binaries' symbol tables when the file name is longer than 18 chars.
(Not doing so also avoids other duplicates getting printed twice.)

--- unstable.orig/xen/arch/x86/Makefile
+++ unstable/xen/arch/x86/Makefile
@@ -106,11 +106,13 @@ $(BASEDIR)/common/symbols-dummy.o:
 $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
-	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
+	$(NM) -pa --format=sysv $(@D)/.$(@F).0 \
+		| $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1
-	$(NM) -n $(@D)/.$(@F).1 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1.S
+	$(NM) -pa --format=sysv $(@D)/.$(@F).1 \
+		| $(BASEDIR)/tools/symbols --sysv --sort --warn-dup >$(@D)/.$(@F).1.S
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
 	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
 	    $(@D)/.$(@F).1.o -o $@
@@ -133,13 +135,15 @@ $(TARGET).efi: prelink-efi.o efi.lds efi
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \
 	                $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).$(base).0 &&) :
 	$(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).0) >$(@D)/.$(@F).0r.S
-	$(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).0 | $(guard) $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0s.S
+	$(guard) $(NM) -pa --format=sysv $(@D)/.$(@F).$(VIRT_BASE).0 \
+		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).0s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o
 	$(foreach base, $(VIRT_BASE) $(ALT_BASE), \
 	          $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< \
 	                $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o -o $(@D)/.$(@F).$(base).1 &&) :
 	$(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).1) >$(@D)/.$(@F).1r.S
-	$(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).1 | $(guard) $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1s.S
+	$(guard) $(NM) -pa --format=sysv $(@D)/.$(@F).$(VIRT_BASE).1 \
+		| $(guard) $(BASEDIR)/tools/symbols --sysv --sort >$(@D)/.$(@F).1s.S
 	$(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o
 	$(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \
 	                $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@
--- unstable.orig/xen/arch/x86/time.c
+++ unstable/xen/arch/x86/time.c
@@ -2059,6 +2059,7 @@ static struct keyhandler dump_softtsc_ke
 static int __init setup_dump_softtsc(void)
 {
     register_keyhandler('s', &dump_softtsc_keyhandler);
+dump_execution_state();//temp
     return 0;
 }
 __initcall(setup_dump_softtsc);
--- unstable.orig/xen/tools/symbols.c
+++ unstable/xen/tools/symbols.c
@@ -30,6 +30,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <ctype.h>
 
 #define KSYM_NAME_LEN		127
@@ -40,13 +41,14 @@ struct sym_entry {
 	unsigned int len;
 	unsigned char *sym;
 };
-
+#define SYMBOL_NAME(s) ((char *)(s)->sym + 1)
 
 static struct sym_entry *table;
 static unsigned int table_size, table_cnt;
 static unsigned long long _stext, _etext, _sinittext, _einittext, _sextratext, _eextratext;
 static int all_symbols = 0;
 static char symbol_prefix_char = '\0';
+static enum { fmt_bsd, fmt_sysv } input_format;
 
 int token_profit[0x10000];
 
@@ -73,11 +75,25 @@ static inline int is_arm_mapping_symbol(
 
 static int read_symbol(FILE *in, struct sym_entry *s)
 {
-	char str[500];
+	char str[500], type[20] = "";
 	char *sym, stype;
-	int rc;
-
-	rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, str);
+	static enum { symbol, source_file, object_file } last;
+	static char *filename;
+	int rc = -1;
+
+	switch (input_format) {
+	case fmt_bsd:
+		rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, str);
+		break;
+	case fmt_sysv:
+		while (fscanf(in, "\n") == 1)
+			/* nothing */;
+		rc = fscanf(in, "%499[^ |] |%llx | %c |",
+			    str, &s->addr, &stype);
+		if (rc == 3 && fscanf(in, " %19[^ |] |", type) != 1)
+			*type = '\0';
+		break;
+	}
 	if (rc != 3) {
 		if (rc != EOF) {
 			/* skip line */
@@ -87,6 +103,28 @@ static int read_symbol(FILE *in, struct 
 		return -1;
 	}
 
+	sym = strrchr(str, '.');
+	if (strcasecmp(type, "FILE") == 0 ||
+	    (/* GNU nm prior to XXX doesn't produce a type for EFI binaries. */
+	     input_format == fmt_sysv && !*type && stype == '?' && sym &&
+	     sym[1] && strchr("cSsoh", sym[1]) && !sym[2])) {
+		/*
+		 * gas prior to XXX outputs symbol table entries resulting
+		 * from .file in reverse order. If we get two consecutive file
+		 * symbols, prefer the first one if that names an object file
+		 * (to cover multiply compiled files).
+		 */
+		if ((sym && sym[1] == 'o') || last != object_file) {
+			free(filename);
+			filename = *str ? strdup(str) : NULL;
+		}
+		last = !sym || sym[1] != 'o' ? source_file : object_file;
+		goto skip_tail;
+	}
+
+	last = symbol;
+	rc = -1;
+
 	sym = str;
 	/* skip prefix char */
 	if (symbol_prefix_char && str[0] == symbol_prefix_char)
@@ -109,24 +147,38 @@ static int read_symbol(FILE *in, struct 
 	{
 		/* Keep these useful absolute symbols */
 		if (strcmp(sym, "__gp"))
-			return -1;
-
+			goto skip_tail;
 	}
 	else if (toupper((uint8_t)stype) == 'U' ||
+		 toupper((uint8_t)stype) == 'N' ||
 		 is_arm_mapping_symbol(sym))
-		return -1;
+		goto skip_tail;
 	/* exclude also MIPS ELF local symbols ($L123 instead of .L123) */
 	else if (str[0] == '$')
-		return -1;
+		goto skip_tail;
 
 	/* include the type field in the symbol name, so that it gets
 	 * compressed together */
 	s->len = strlen(str) + 1;
+	if (islower(stype) && filename)
+		s->len += strlen(filename) + 1;
 	s->sym = malloc(s->len + 1);
-	strcpy((char *)s->sym + 1, str);
+	sym = SYMBOL_NAME(s);
+	if (islower(stype) && filename) {
+		sym = stpcpy(sym, filename);
+		*sym++ = '#';
+	}
+	strcpy(sym, str);
 	s->sym[0] = stype;
+//if(input_format == fmt_sysv) fprintf(stderr, "%c: %s %llx\n", s->sym[0], s->sym + 1, s->addr);//temp
 
-	return 0;
+	rc = 0;
+
+ skip_tail:
+	if (input_format == fmt_sysv)
+		fgets(str, 500, in); /* discard rest of line */
+
+	return rc;
 }
 
 static int symbol_valid(struct sym_entry *s)
@@ -460,11 +512,37 @@ static void optimize_token_table(void)
 	optimize_result();
 }
 
+static int compare_value(const void *p1, const void *p2)
+{
+	const struct sym_entry *sym1 = p1;
+	const struct sym_entry *sym2 = p2;
+
+	if (sym1->addr < sym2->addr)
+		return -1;
+	if (sym1->addr > sym2->addr)
+		return +1;
+	/* Prefer global symbols. */
+	if (isupper(*sym1->sym))
+		return -1;
+	if (isupper(*sym2->sym))
+		return +1;
+	return 0;
+}
+
+static int compare_name(const void *p1, const void *p2)
+{
+	const struct sym_entry *sym1 = p1;
+	const struct sym_entry *sym2 = p2;
+
+	return strcmp(SYMBOL_NAME(sym1), SYMBOL_NAME(sym2));
+}
 
 int main(int argc, char **argv)
 {
+	unsigned int i;
+	bool unsorted = false, warn_dup = false;
+
 	if (argc >= 2) {
-		int i;
 		for (i = 1; i < argc; i++) {
 			if(strcmp(argv[i], "--all-symbols") == 0)
 				all_symbols = 1;
@@ -474,13 +552,36 @@ int main(int argc, char **argv)
 				if ((*p == '"' && *(p+2) == '"') || (*p == '\'' && *(p+2) == '\''))
 					p++;
 				symbol_prefix_char = *p;
-			} else
+			} else if (strcmp(argv[i], "--sysv") == 0)
+				input_format = fmt_sysv;
+			else if (strcmp(argv[i], "--sort") == 0)
+				unsorted = true;
+			else if (strcmp(argv[i], "--warn-dup") == 0)
+				warn_dup = true;
+			else
 				usage();
 		}
 	} else if (argc != 1)
 		usage();
 
 	read_map(stdin);
+
+	if (warn_dup) {
+		qsort(table, table_cnt, sizeof(*table), compare_name);
+		for (i = 1; i < table_cnt; ++i)
+			if (strcmp(SYMBOL_NAME(table + i - 1),
+				   SYMBOL_NAME(table + i)) == 0 &&
+			    table[i - 1].addr != table[i].addr)
+				fprintf(stderr,
+					"Duplicate symbol '%s' (%llx != %llx)\n",
+					SYMBOL_NAME(table + i),
+					table[i].addr, table[i - 1].addr);
+		unsorted = true;
+	}
+
+	if (unsorted)
+		qsort(table, table_cnt, sizeof(*table), compare_value);
+
 	optimize_token_table();
 	write_src();

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1] xSplice initial foundation patches.
  2015-10-02 14:48 ` [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
@ 2015-10-09 12:46   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-10-09 12:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, mpohlack, ross.lagerwall,
	fanhenglong, andrew.cooper3

On Fri, Oct 02, 2015 at 10:48:54AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 16, 2015 at 05:01:11PM -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > *OK, what do you have?*
> > 
> > They are located at a git tree:
> >   git://xenbits.xen.org/people/konradwilk/xen.git xsplice.v1
> 
> I've created another branch 'xsplice.v1.1' which has some bug-fixes
> and improvements. Will be posting it soonish when I am done
> with testing.
> 
> But more importantly I would like to setup an IRC meeting
> so that we may coordinate.
> 
> I've used doodle, and the link is:
> 
>  http://doodle.com/poll/adtuwr4hs5m4i9y6
> 
> The times I've put in there are geared towards earlier hours
> but please feel free to propose other times.

The one time that works for the interested parties is 10AM EST
on the fourth Monday of the month. As such we will have an
meeting on #xsplice on irc.freenode.net on that day.

Thank you

^ permalink raw reply	[flat|nested] 36+ messages in thread

* xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.)
  2015-10-05 10:28   ` Ross Lagerwall
@ 2015-10-12 11:44     ` Ross Lagerwall
  2015-10-12 13:06       ` Konrad Rzeszutek Wilk
  2015-10-12 14:20       ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 36+ messages in thread
From: Ross Lagerwall @ 2015-10-12 11:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, mpohlack, ian.campbell

On 10/05/2015 11:28 AM, Ross Lagerwall wrote:
> On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
>> +### Generation of xSplice ELF payloads
>> +
>> +The design of that is not discussed in this design.
>> +
>> +The author of this design envisions objdump and objcopy along
>> +with special GCC parameters (see above) to create .o.xsplice files
>> +which can be used to splice an ELF with the new payload.
>> +
>> +The ksplice code can provide inspiration.
>> +
>
> As discussed off-list with Konrad, I'm going to get started with
> generation of the payload given a patch.
>

I've created a _prototype_ tool for this based on kpatch's tooling. It's 
currently living at https://github.com/rosslagerwall/xsplice-build

With no source patch modifications, live patches can be built for every 
XSA that applies to x86 back to XSA-90 except for XSA-97, XSA-111, 
XSA-112, and XSA-114 (83% success rate). It gives plausible output for 
each generated patch although I obviously can't verify the live patches yet.

It doesn't really follow the design in the above document; IMO the 
payload design is unnecessarily complicated. At this point I'd rather 
just get a complete working prototype, and we can figure out the 
specifics and finalize the design later.

If no one else is working on it, I'm going to start the next steps which is:
* Load the ELF binary into Xen memory.
* Resolve symbols.
* Perform ELF relocations

I'll use Konrad's xsplice.v1.1 branch as a starting point to provide the 
hypervisor interface for this work.

Thanks
-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.)
  2015-10-12 11:44     ` xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.) Ross Lagerwall
@ 2015-10-12 13:06       ` Konrad Rzeszutek Wilk
  2015-10-12 14:20       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-10-12 13:06 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, mpohlack, fanhenglong,
	andrew.cooper3

On Mon, Oct 12, 2015 at 12:44:12PM +0100, Ross Lagerwall wrote:
> On 10/05/2015 11:28 AM, Ross Lagerwall wrote:
> >On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
> >>+### Generation of xSplice ELF payloads
> >>+
> >>+The design of that is not discussed in this design.
> >>+
> >>+The author of this design envisions objdump and objcopy along
> >>+with special GCC parameters (see above) to create .o.xsplice files
> >>+which can be used to splice an ELF with the new payload.
> >>+
> >>+The ksplice code can provide inspiration.
> >>+
> >
> >As discussed off-list with Konrad, I'm going to get started with
> >generation of the payload given a patch.
> >
> 
> I've created a _prototype_ tool for this based on kpatch's tooling. It's
> currently living at https://github.com/rosslagerwall/xsplice-build
> 
> With no source patch modifications, live patches can be built for every XSA
> that applies to x86 back to XSA-90 except for XSA-97, XSA-111, XSA-112, and
> XSA-114 (83% success rate). It gives plausible output for each generated
> patch although I obviously can't verify the live patches yet.
> 
> It doesn't really follow the design in the above document; IMO the payload
> design is unnecessarily complicated. At this point I'd rather just get a
> complete working prototype, and we can figure out the specifics and finalize
> the design later.
> 
> If no one else is working on it, I'm going to start the next steps which is:
> * Load the ELF binary into Xen memory.
> * Resolve symbols.
> * Perform ELF relocations

Go for it.
> 
> I'll use Konrad's xsplice.v1.1 branch as a starting point to provide the
> hypervisor interface for this work.

Fantastic! Thank you!
> 
> Thanks
> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.)
  2015-10-12 11:44     ` xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.) Ross Lagerwall
  2015-10-12 13:06       ` Konrad Rzeszutek Wilk
@ 2015-10-12 14:20       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-10-12 14:20 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, mpohlack, fanhenglong,
	andrew.cooper3

> If no one else is working on it, I'm going to start the next steps which is:
> * Load the ELF binary into Xen memory.
> * Resolve symbols.
> * Perform ELF relocations

I updated the Wiki xSplice with this and the URL to the tool.
> 
> I'll use Konrad's xsplice.v1.1 branch as a starting point to provide the
> hypervisor interface for this work.

BTW, I also pushed xsplice.v1.2 which has the XENVER patches I posted a couple
of days ago (along with the xSplice using it). It also has some fixes to
xSplice hypervisor code (wrong XSM check).

Thank you!
> 
> Thanks
> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
                     ` (3 preceding siblings ...)
  2015-10-06 15:26   ` Jan Beulich
@ 2015-10-26 12:01   ` Martin Pohlack
  2015-10-26 12:10     ` Jan Beulich
  2015-10-26 13:21     ` Ross Lagerwall
  4 siblings, 2 replies; 36+ messages in thread
From: Martin Pohlack @ 2015-10-26 12:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, xen-devel, msw, aliguori, amesserl,
	rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, ian.campbell

On 16.09.2015 23:01, Konrad Rzeszutek Wilk wrote:

[...]

> +### xsplice_patch
> +
> +This structure has the binary code (or data) to be patched. Depending on the
> +type it can either an inline patch (data or text) or require an relocation
> +change (which requires a trampoline). Naturally it also points to a blob
> +of the binary data to patch in, and the size of the patch.

On the Xen developer summit we agreed to start with a minimal approach
first.  Based on looking at the last ~50 XSA patches, I do think we can
do *without* the in-place patching and would propose to tackle this
approach later or not at all.

[...]

> +### Symbol names
> +
> +
> +Xen as it is now, has a couple of non-unique symbol names which will
> +make runtime symbol identification hard.  Sometimes, static symbols
> +simply have the same name in C files, sometimes such symbols get
> +included via header files, and some C files are also compiled
> +multiple times and linked under different names (guest_walk.c).
> +
> +As such we need to modify the linker to make sure that the symbol
> +table qualifies also symbols by their source file name.
> +
> +For the awkward situations in which C-files are compiled multiple
> +times patches we would need to some modification in the Xen code.
> +
> +
> +The convention for file-type symbols (that would allow to map many
> +symbols to their compilation unit) says that only the basename (i.e.,
> +without directories) is embedded.  This creates another layer of
> +confusion for duplicate file names in the build tree.
> +
> +That would have to be resolved.

Another approach here would be to qualify symbol names by the
object-file name which contains them + a unique path component, for example:

drivers/pci/pci.o
arch/x86/pci.o
arch/x86/x86_64/pci.o

> +
> +<pre>
> +> find . -name \*.c -print0 | xargs -0 -n1 basename | sort | uniq -c | sort -n | tail -n10
> +      3 shutdown.c
> +      3 sysctl.c
> +      3 time.c
> +      3 xenoprof.c
> +      4 gdbstub.c
> +      4 irq.c
> +      5 domain.c
> +      5 mm.c
> +      5 pci.c
> +      5 traps.c
> +</pre>

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-10-26 12:01   ` Martin Pohlack
@ 2015-10-26 12:10     ` Jan Beulich
  2015-10-26 13:21     ` Ross Lagerwall
  1 sibling, 0 replies; 36+ messages in thread
From: Jan Beulich @ 2015-10-26 12:10 UTC (permalink / raw)
  To: Martin Pohlack
  Cc: elena.ufimtseva, hanweidong, john.liuqiming, paul.voccio,
	daniel.kiper, major.hayden, liuyingdong, aliguori, xiantao.zxt,
	lars.kurth, steven.wilson, ian.campbell, peter.huangpeng, msw,
	xen-devel, rick.harris, boris.ostrovsky, josh.kearney,
	jinsong.liu, amesserl, fanhenglong, andrew.cooper3

>>> On 26.10.15 at 13:01, <mpohlack@amazon.com> wrote:
> On 16.09.2015 23:01, Konrad Rzeszutek Wilk wrote:
>> +The convention for file-type symbols (that would allow to map many
>> +symbols to their compilation unit) says that only the basename (i.e.,
>> +without directories) is embedded.  This creates another layer of
>> +confusion for duplicate file names in the build tree.
>> +
>> +That would have to be resolved.
> 
> Another approach here would be to qualify symbol names by the
> object-file name which contains them + a unique path component, for example:
> 
> drivers/pci/pci.o
> arch/x86/pci.o
> arch/x86/x86_64/pci.o

I wouldn't want to do this uniformly. A patch series doing this where
needed has been proposed already:
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02702.html

Jan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-10-26 12:01   ` Martin Pohlack
  2015-10-26 12:10     ` Jan Beulich
@ 2015-10-26 13:21     ` Ross Lagerwall
  2015-10-26 13:55       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 36+ messages in thread
From: Ross Lagerwall @ 2015-10-26 13:21 UTC (permalink / raw)
  To: Martin Pohlack
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, fanhenglong, andrew.cooper3

On 10/26/2015 12:01 PM, Martin Pohlack wrote:
> On 16.09.2015 23:01, Konrad Rzeszutek Wilk wrote:
>
> [...]
>
>> +### xsplice_patch
>> +
>> +This structure has the binary code (or data) to be patched. Depending on the
>> +type it can either an inline patch (data or text) or require an relocation
>> +change (which requires a trampoline). Naturally it also points to a blob
>> +of the binary data to patch in, and the size of the patch.
>
> On the Xen developer summit we agreed to start with a minimal approach
> first.  Based on looking at the last ~50 XSA patches, I do think we can
> do *without* the in-place patching and would propose to tackle this
> approach later or not at all.
>

Indeed. The prototype V1 that I posted last Friday works at a function 
level. I guess the only reason to make anything more complicated would 
be to patch the asm code in entry.S.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-10-26 13:21     ` Ross Lagerwall
@ 2015-10-26 13:55       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 36+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-10-26 13:55 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: elena.ufimtseva, hanweidong, jbeulich, john.liuqiming,
	paul.voccio, daniel.kiper, major.hayden, liuyingdong, aliguori,
	xen-devel, lars.kurth, steven.wilson, ian.campbell,
	peter.huangpeng, msw, xiantao.zxt, rick.harris, boris.ostrovsky,
	josh.kearney, jinsong.liu, amesserl, Martin Pohlack, fanhenglong,
	andrew.cooper3

On Mon, Oct 26, 2015 at 01:21:46PM +0000, Ross Lagerwall wrote:
> On 10/26/2015 12:01 PM, Martin Pohlack wrote:
> >On 16.09.2015 23:01, Konrad Rzeszutek Wilk wrote:
> >
> >[...]
> >
> >>+### xsplice_patch
> >>+
> >>+This structure has the binary code (or data) to be patched. Depending on the
> >>+type it can either an inline patch (data or text) or require an relocation
> >>+change (which requires a trampoline). Naturally it also points to a blob
> >>+of the binary data to patch in, and the size of the patch.
> >
> >On the Xen developer summit we agreed to start with a minimal approach
> >first.  Based on looking at the last ~50 XSA patches, I do think we can
> >do *without* the in-place patching and would propose to tackle this
> >approach later or not at all.

<nods vigorously>
> >
> 
> Indeed. The prototype V1 that I posted last Friday works at a function
> level. I guess the only reason to make anything more complicated would be to
> patch the asm code in entry.S.

Right, let me move this to the 'v2 improvements thoughts' of the design
document (so we don't lose some of this information, ideas, etc)?

> 
> -- 
> Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-10-06 12:57   ` [PATCH v1 1/5] xsplice: Design document Ross Lagerwall
@ 2015-10-27  8:08     ` Martin Pohlack
  2015-10-27  8:45       ` Ross Lagerwall
  0 siblings, 1 reply; 36+ messages in thread
From: Martin Pohlack @ 2015-10-27  8:08 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk, xen-devel, msw, aliguori,
	amesserl, rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, ian.campbell

On 06.10.2015 14:57, Ross Lagerwall wrote:
> On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
>> +### xSplice interdependencies
>> +
>> +xSplice patches interdependencies are tricky.
>> +
>> +There are the ways this can be addressed:
>> + * A single large patch that subsumes and replaces all previous ones.
>> +   Over the life-time of patching the hypervisor this large patch
>> +   grows to accumulate all the code changes.
>> + * Hotpatch stack - where an mechanism exists that loads the hotpatches
>> +   in the same order they were built in. We would need an build-id
>> +   of the hypevisor to make sure the hot-patches are build against the
>> +   correct build.
>> + * Payload containing the old code to check against that. That allows
>> +   the hotpatches to be loaded indepedently (if they don't overlap) - or
>> +   if the old code also containst previously patched code - even if they
>> +   overlap.
>> +
>> +The disadvantage of the first large patch is that it can grow over
>> +time and not provide an bisection mechanism to identify faulty patches.
>> +
>> +The hot-patch stack puts stricts requirements on the order of the patches
>> +being loaded and requires an hypervisor build-id to match against.
>> +
>> +The old code allows much more flexibility and an additional guard,
>> +but is more complex to implement.
>> +
> 
> If the single large patch mechanism is used, a new REPLACE action is 
> needed to atomically replace one patch with another to prevent a window 
> where the hypervisor is unpatched. kpatch has a "replace" command for 
> this purpose. This may be useful even for the other mechanisms listed above.
> 
>  From what I can tell:
> * kSplice uses old code checking (method [3] above), although in 
> practice the userspace tools implement dependency logic to enforce a 
> linear stack of patches.
> * kPatch and kGraft recommend using the single large patch mechanism 
> although there's nothing preventing two independent patches from being 
> loaded.

To make sure this argument is not lost: I wrote in an earlier email

> * There is a general (and mostly obscure) limitation on unloading
>   hotpatches:
> 
>   In contrast to normal kernel modules where the module code adheres
>   to specific conventions around resource allocation and locking,
>   hotpatches typically contain code from any context.  That code is
>   usually not aware that it can be unloaded.
> 
>   That code could leave behind in Xen references to itself, e.g., by
>   installing a function pointer in a global data structure, without
>   incrementing something like a usage count.  While most hotpatch code
>   will probably be very simple and small, a similar effect could even
>   be achieved by code called from the hotpatch in Xen, e.g., some code
>   patch could dynamically generate a backtrace and later decide to
>   inspect individual elements from the collected trace, later being a
>   time, where the hotpatch has been unloaded again.
> 
>   One could approach that proplem from multiple angles: code
>   inspection of generated hotpatches, testing, and by making unloading
>   a very special and exceptional operation.

If we follow that argument, we should consider patch unloading to be a
potentially unsafe operation which should not be part of standard
workflows.  Consequently, a similar argument holds for REPLACE, because
removing / unloading older code is part of replacement.

One could now either aim to have special inspection and testing
regarding unloading to lower the risk there, or:

Structure hotpatches as an ever-increasing stack of modules that are
loaded on top of each other.  This approach works around the unloading
problems as well as the temporary-unsafe problem outlined above.  Memory
usage would be similar to an ever-increasing monolithic hotpatch (+ some
fragmentation, - the temporary additional memory requirement for
replacing the monolithic hotpatch).

Martin

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v1 1/5] xsplice: Design document.
  2015-10-27  8:08     ` Martin Pohlack
@ 2015-10-27  8:45       ` Ross Lagerwall
  0 siblings, 0 replies; 36+ messages in thread
From: Ross Lagerwall @ 2015-10-27  8:45 UTC (permalink / raw)
  To: Martin Pohlack, Konrad Rzeszutek Wilk, xen-devel, msw, aliguori,
	amesserl, rick.harris, paul.voccio, steven.wilson, major.hayden,
	josh.kearney, jinsong.liu, xiantao.zxt, boris.ostrovsky,
	daniel.kiper, elena.ufimtseva, bob.liu, lars.kurth, hanweidong,
	peter.huangpeng, fanhenglong, liuyingdong, john.liuqiming,
	jbeulich, andrew.cooper3, ian.campbell

On 10/27/2015 08:08 AM, Martin Pohlack wrote:
> On 06.10.2015 14:57, Ross Lagerwall wrote:
>> On 09/16/2015 10:01 PM, Konrad Rzeszutek Wilk wrote:
>>> +### xSplice interdependencies
>>> +
>>> +xSplice patches interdependencies are tricky.
>>> +
>>> +There are the ways this can be addressed:
>>> + * A single large patch that subsumes and replaces all previous ones.
>>> +   Over the life-time of patching the hypervisor this large patch
>>> +   grows to accumulate all the code changes.
>>> + * Hotpatch stack - where an mechanism exists that loads the hotpatches
>>> +   in the same order they were built in. We would need an build-id
>>> +   of the hypevisor to make sure the hot-patches are build against the
>>> +   correct build.
>>> + * Payload containing the old code to check against that. That allows
>>> +   the hotpatches to be loaded indepedently (if they don't overlap) - or
>>> +   if the old code also containst previously patched code - even if they
>>> +   overlap.
>>> +
>>> +The disadvantage of the first large patch is that it can grow over
>>> +time and not provide an bisection mechanism to identify faulty patches.
>>> +
>>> +The hot-patch stack puts stricts requirements on the order of the patches
>>> +being loaded and requires an hypervisor build-id to match against.
>>> +
>>> +The old code allows much more flexibility and an additional guard,
>>> +but is more complex to implement.
>>> +
>>
>> If the single large patch mechanism is used, a new REPLACE action is
>> needed to atomically replace one patch with another to prevent a window
>> where the hypervisor is unpatched. kpatch has a "replace" command for
>> this purpose. This may be useful even for the other mechanisms listed above.
>>
>>   From what I can tell:
>> * kSplice uses old code checking (method [3] above), although in
>> practice the userspace tools implement dependency logic to enforce a
>> linear stack of patches.
>> * kPatch and kGraft recommend using the single large patch mechanism
>> although there's nothing preventing two independent patches from being
>> loaded.
>
> To make sure this argument is not lost: I wrote in an earlier email
>
>> * There is a general (and mostly obscure) limitation on unloading
>>    hotpatches:
>>
>>    In contrast to normal kernel modules where the module code adheres
>>    to specific conventions around resource allocation and locking,
>>    hotpatches typically contain code from any context.  That code is
>>    usually not aware that it can be unloaded.
>>
>>    That code could leave behind in Xen references to itself, e.g., by
>>    installing a function pointer in a global data structure, without
>>    incrementing something like a usage count.  While most hotpatch code
>>    will probably be very simple and small, a similar effect could even
>>    be achieved by code called from the hotpatch in Xen, e.g., some code
>>    patch could dynamically generate a backtrace and later decide to
>>    inspect individual elements from the collected trace, later being a
>>    time, where the hotpatch has been unloaded again.
>>
>>    One could approach that proplem from multiple angles: code
>>    inspection of generated hotpatches, testing, and by making unloading
>>    a very special and exceptional operation.
>
> If we follow that argument, we should consider patch unloading to be a
> potentially unsafe operation which should not be part of standard
> workflows.  Consequently, a similar argument holds for REPLACE, because
> removing / unloading older code is part of replacement.
>
> One could now either aim to have special inspection and testing
> regarding unloading to lower the risk there, or:
>
> Structure hotpatches as an ever-increasing stack of modules that are
> loaded on top of each other.  This approach works around the unloading
> problems as well as the temporary-unsafe problem outlined above.  Memory
> usage would be similar to an ever-increasing monolithic hotpatch (+ some
> fragmentation, - the temporary additional memory requirement for
> replacing the monolithic hotpatch).
>

I agree that this could be an issue with the replace operation; in 
practice I don't believe it would be an issue (and it is the choice that 
both kGraft and kpatch use). Since it is trivial to implement the 
replace operation (it's already done in my branch), I think we should 
have both and let vendors choose.

I think we should also aim for a slightly more general concept than a 
stack, and let vendors implement the stack approach if they want. Here 
is what I wrote in an earlier mail:
* Each hypervisor has an embedded build id.
* Each binary patch has an embedded build id.
* The hypervisor should expose its build id and the build id of every 
loaded binary patch.
* Each binary patch specifies one or more build ids on which it depends. 
These build ids can be either a hypervisor build id or another patch 
build id. The dependencies could either identified automatically or by a 
developer.
* The userspace tool enforces dependency management (user can optionally 
force patch apply). I don't see any reason to involve the
hypervisor for dependency management.

The vendor could very easily implement a stack of patches with this 
approach (afaik this is sort of what ksplice does).

The one caveat with having multiple patches is that the patches need to 
be dynamically linked (e.g. consider a patch on top of a patch which 
introduces a new function). I've got a basically working version of 
dynamic linking of modules, but it is certainly more complex than static 
linking and we probably shouldn't aim for it in V1.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-10-27  8:45 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-16 21:01 [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
2015-09-16 21:01 ` [PATCH v1 1/5] xsplice: Design document Konrad Rzeszutek Wilk
2015-10-05 10:02   ` Jan Beulich
2015-10-05 10:28   ` Ross Lagerwall
2015-10-12 11:44     ` xsplice-build prototype (was [PATCH v1 1/5] xsplice: Design document.) Ross Lagerwall
2015-10-12 13:06       ` Konrad Rzeszutek Wilk
2015-10-12 14:20       ` Konrad Rzeszutek Wilk
2015-10-06 12:57   ` [PATCH v1 1/5] xsplice: Design document Ross Lagerwall
2015-10-27  8:08     ` Martin Pohlack
2015-10-27  8:45       ` Ross Lagerwall
2015-10-06 15:26   ` Jan Beulich
2015-10-26 12:01   ` Martin Pohlack
2015-10-26 12:10     ` Jan Beulich
2015-10-26 13:21     ` Ross Lagerwall
2015-10-26 13:55       ` Konrad Rzeszutek Wilk
2015-09-16 21:01 ` [PATCH v1 2/5] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Konrad Rzeszutek Wilk
2015-10-02 15:06   ` Jan Beulich
2015-09-16 21:01 ` [PATCH v1 3/5] libxc: Implementation of XEN_XSPLICE_op in libxc Konrad Rzeszutek Wilk
2015-09-16 21:01 ` [PATCH v1 4/5] xen-xsplice: Tool to manipulate xsplice payloads Konrad Rzeszutek Wilk
2015-09-16 21:01 ` [PATCH v1 5/5] xsplice: Use ld-embedded build-ids Konrad Rzeszutek Wilk
2015-09-16 21:41   ` Andrew Cooper
2015-09-16 21:59     ` Konrad Rzeszutek Wilk
2015-09-16 22:31       ` Andrew Cooper
2015-09-17  6:41         ` Martin Pohlack
2015-09-17  9:35           ` Andrew Cooper
2015-09-17 18:45             ` Is: Make XENVER_* use XSM, seperate the different ops in smaller security domains. Was:Re: " Konrad Rzeszutek Wilk
2015-09-18 11:40               ` Andrew Cooper
2015-09-22 13:22                 ` Konrad Rzeszutek Wilk
2015-09-22 13:33                   ` Andrew Cooper
2015-09-22 13:45                     ` Konrad Rzeszutek Wilk
2015-09-22 16:28                       ` Daniel De Graaf
2015-09-22 16:28               ` Daniel De Graaf
2015-09-25 20:18                 ` Konrad Rzeszutek Wilk
2015-10-02 15:13   ` Jan Beulich
2015-10-02 14:48 ` [PATCH v1] xSplice initial foundation patches Konrad Rzeszutek Wilk
2015-10-09 12:46   ` Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.