* [PATCH v1 01/11] xsplice: Design document (v2).
@ 2015-11-03 18:15 Ross Lagerwall
From: Ross Lagerwall @ 2015-11-03 18:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Ian Campbell, Ian Jackson, Tim Deegan, Ross Lagerwall, Jan Beulich

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

A mechanism is required to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 docs/misc/xsplice.markdown | 999 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 999 insertions(+)
 create mode 100644 docs/misc/xsplice.markdown

diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown
new file mode 100644
index 0000000..501056e
--- /dev/null
+++ b/docs/misc/xsplice.markdown
@@ -0,0 +1,999 @@
+# xSplice Design v1
+
+## Rationale
+
+A mechanism is required to binary patch the running hypervisor with new
+opcodes that have come about primarily due to security updates.
+
+This document describes the design of the API that would allow us to
+upload binary patches to the hypervisor.
+
+The document is split into four sections:
+
+ * Detailed descriptions of the problem statement.
+ * Design of the data structures.
+ * Design of the hypercalls.
+ * Implementation notes that should be taken into consideration.
+
+
+## Glossary
+
+ * splice - patch in the binary code with new opcodes
+ * trampoline - a jump to a new instruction.
+ * payload - telemetries of the old code along with binary blob of the new
+   function (if needed).
+ * reloc - telemetries contained in the payload to construct proper trampoline.
+
+## History
+
+The document has undergone various reviews and only covers the v1 design.
+
+The end of the document has a section titled `Not Yet Done` which
+outlines ideas and design for the v2 version of this work.
+
+## Multiple ways to patch
+
+The mechanism needs to be flexible to patch the hypervisor in multiple ways
+and be as simple as possible. The compiled code is contiguous in memory with
+no gaps - so we have no luxury of 'moving' existing code and must either
+insert a trampoline to the new code to be executed - or only modify in-place
+the code if there is sufficient space. The placement of new code has to be done
+by the hypervisor and the virtual address for the new code is allocated dynamically.
+
+This implies that the hypervisor must compute the new offsets when splicing
+in the new trampoline code. Where the trampoline is added (inside
+the function we are patching or just the callers?) is also important.
+
+To lessen the amount of code in the hypervisor, the consumer of the API
+is responsible for identifying which mechanism to employ and how many locations
+to patch. Combinations of modifying in-place code, adding trampolines, etc.
+have to be supported. The API should allow reading and writing any memory within
+the hypervisor virtual address space.
+
+We must also have a mechanism to query what has been applied and a mechanism
+to revert it if needed.
+
+## Workflow
+
+The expected workflows of higher-level tools that manage multiple patches
+on production machines would be:
+
+ * The first obvious task is loading all available / suggested
+   hotpatches around system start.
+ * Whenever new hotpatches are installed, they should be loaded too.
+ * One wants to query which modules have been loaded at runtime.
+ * If unloading is deemed safe (see unloading below), one may want to
+   support a workflow where a specific hotpatch is marked as bad and
+   unloaded.
+ * If we do not restrict module activation order and want to report tboot
+   state on sequences, we might have a complexity explosion problem, in
+   terms of which system hashes should be considered acceptable.
+
+## Patching code
+
+The first mechanism to patch that comes to mind is in-place replacement.
+That is, replace the affected code with new code. Unfortunately x86
+instructions are variable length, which limits how much space we have available
+to replace the instructions. That is not a problem if the change is smaller
+than the original opcode and we can fill the remainder with nops. Problems will
+appear if the replacement code is longer.
+
+The second mechanism is by replacing the call or jump to the
+old function with the address of the new function.
+
+A third mechanism is to add a jump to the new function at the
+start of the old function.
+
+### Example of trampoline and in-place splicing
+
+As an example we will assume the hypervisor does not have XSA-132 (see
+*domctl/sysctl: don't leak hypervisor stack to toolstacks*
+4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
+the hypervisor with it. The original code looks like this:
+
+<pre>
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+while the new patched hypervisor would be:
+
+<pre>
+   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)  
+   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)  
+   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)  
+   48 89 e0                  mov    %rsp,%rax  
+   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax  
+</pre>
+
+This is inside `arch_do_domctl`. The new change adds 24 extra bytes of code
+(three 8-byte `movq` instructions), which alters all the offsets inside the
+function. We might not have enough space in .text to alter these offsets and
+squeeze in the extra 24 bytes of code.
+
+As such we could simplify this problem by only patching the site
+which calls arch_do_domctl:
+
+<pre>
+<do_domctl>:  
+ e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>  
+</pre>
+
+with a new address for where the new `arch_do_domctl` would be (this
+area would be allocated dynamically).
+
+Astute readers will wonder what we need to do if we were to patch `do_domctl`
+- which is not called directly by the hypervisor but on behalf of the guests via
+the `compat_hypercall_table` and `hypercall_table`.
+It is possible to patch the entry in `hypercall_table` for `do_domctl`
+(ffff82d080103079 <do_domctl>):
+<pre>
+
+ ffff82d08024d490:   79 30  
+ ffff82d08024d492:   10 80 d0 82 ff ff   
+
+</pre>
+with the address of the new `do_domctl`. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_domctl`.
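+
+The table entry shown above is simply the 64-bit address of `do_domctl`
+stored in little-endian byte order. A minimal sketch of decoding and
+re-encoding such an entry follows; the helper names `read_le64` and
+`write_le64` are hypothetical and not part of this design, and the sketch
+ignores the page-permission changes needed to write to .rodata:
+
+```c
+#include <stdint.h>
+
+/* Decode a little-endian 64-bit value, as stored in an x86-64 pointer table. */
+static uint64_t read_le64(const uint8_t *p)
+{
+    uint64_t v = 0;
+
+    for ( int i = 7; i >= 0; i-- )
+        v = (v << 8) | p[i];
+
+    return v;
+}
+
+/* Encode a 64-bit value into eight bytes, least significant byte first. */
+static void write_le64(uint8_t *p, uint64_t v)
+{
+    for ( int i = 0; i < 8; i++ )
+    {
+        p[i] = (uint8_t)(v & 0xff);
+        v >>= 8;
+    }
+}
+```
+
+Decoding the bytes `79 30 10 80 d0 82 ff ff` this way yields
+ffff82d080103079, the address of `do_domctl` quoted above.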
+
+In summary this example patched the callers of the affected function by
+ * allocating memory for the new code to live in,
+ * changing the virtual address in all the functions which called the old
+   code (computing the new offset, patching the callq with a new callq).
+ * changing the function pointer tables with the new virtual address of
+   the function (splicing in the new virtual address). Since this table
+   resides in the .rodata section we would need to temporarily change the
+   page table permissions during this part.
+
+
+However it has severe drawbacks - the safety checks, which have to make sure
+the function is not on the stack, must also check every caller. For some
+patches this could mean - if there were a sufficiently large number of
+callers - that we would never be able to apply the update.
+
+### Example of different trampoline patching.
+
+An alternative mechanism exists where we can insert a trampoline in the
+existing function to be patched to jump directly to the new code. This
+lessens the locations to be patched to one but it puts pressure on the
+CPU branching logic (I-cache, but it is just one unconditional jump).
+
+For this example we will assume that the hypervisor has not been compiled
+with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
+for certain HYPERVISOR_xen_version sub-ops*) which memsets a structure
+in the `xen_version` hypercall. This function is not called **anywhere** in
+the hypervisor (it is called by the guest) but referenced in the
+`compat_hypercall_table` and `hypercall_table` (and indirectly called
+from that). It is possible to patch the entry in `hypercall_table` for the old
+`do_xen_version` (ffff82d080112f9e <do_xen_version>):
+
+<pre>
+ ffff82d08024b270 <hypercall_table>  
+ ...  
+ ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff  
+
+</pre>
+with the address of the new `do_xen_version`. The other
+place where it is used is in `hvm_hypercall64_table` which would need
+to be patched in a similar way. This would require an in-place splicing
+of the new virtual address of `do_xen_version`.
+
+An alternative solution would be to insert a trampoline in the
+old `do_xen_version` function to directly jump to the new `do_xen_version`.
+
+<pre>
+ ffff82d080112f9e <do_xen_version>:  
+ ffff82d080112f9e:       48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax  
+ ffff82d080112fa5:       83 ff 09                cmp    $0x9,%edi  
+ ffff82d080112fa8:       0f 87 24 05 00 00       ja     ffff82d0801134d2 <do_xen_version+0x534>  
+</pre>
+
+with:
+
+<pre>
+ ffff82d080112f9e <do_xen_version>:  
+ ffff82d080112f9e:       e9 XX YY ZZ QQ          jmpq   [new do_xen_version]  
+</pre>
+
+which would lessen the amount of patching to just one location.
+
+In summary this example patched the affected function to jump to the
+new replacement function which required:
+ * allocating memory for the new code to live in,
+ * inserting trampoline with new offset in the old function to point to the
+   new function.
+ * Optionally we can insert in the old function a trampoline jump to a function
+   providing a BUG_ON to catch errant code.
+
+The disadvantage of this is that the unconditional jump incurs a small
+I-cache penalty. However the simplicity of the patching and the higher chance
+of passing safety checks make this a worthwhile option.
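+
+The `e9 XX YY ZZ QQ` bytes of the trampoline are the `jmpq` opcode followed
+by a 32-bit displacement relative to the end of the 5-byte instruction. The
+following is a minimal sketch of computing that encoding; the helper name
+`make_trampoline` and the `TRAMPOLINE_SIZE` constant are hypothetical and
+not part of this design:
+
+```c
+#include <stdint.h>
+
+#define TRAMPOLINE_SIZE 5 /* 0xe9 opcode + 32-bit displacement */
+
+/*
+ * Encode "jmpq rel32" placed at old_addr and targeting new_addr.
+ * Returns 0 on success, -1 if the target is out of rel32 range.
+ */
+static int make_trampoline(uint64_t old_addr, uint64_t new_addr,
+                           uint8_t insn[TRAMPOLINE_SIZE])
+{
+    /* The displacement is relative to the end of the jump instruction. */
+    int64_t disp = (int64_t)(new_addr - (old_addr + TRAMPOLINE_SIZE));
+    uint32_t rel32;
+
+    if ( disp < INT32_MIN || disp > INT32_MAX )
+        return -1;
+
+    rel32 = (uint32_t)(int32_t)disp;
+
+    insn[0] = 0xe9;
+    /* x86 stores the displacement least significant byte first. */
+    for ( int i = 0; i < 4; i++ )
+        insn[1 + i] = (uint8_t)(rel32 >> (8 * i));
+
+    return 0;
+}
+```
+
+The range check matters because the new code is allocated dynamically: if it
+lands more than 2GB away from the old function, a single `jmpq rel32` cannot
+reach it.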
+
+### Security
+
+With this method we can re-write the hypervisor - and as such we **MUST** be
+diligent in only allowing certain guests to perform this operation.
+
+Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
+of the payload to be certain it came from a trusted source and that its
+integrity is intact.
+
+As such the hypercall **MUST** support an XSM policy to limit what the guest
+is allowed to invoke. If the system is booted with signature checking enabled,
+the signature check will be enforced.
+
+## Design of payload format
+
+The payload **MUST** contain enough data to allow us to apply the update
+and also safely reverse it. As such we **MUST** know:
+
+ * The locations in memory to be patched. This can be determined dynamically
+   via symbols or via virtual addresses.
+ * The new code that will be patched in.
+ * Signature to verify the payload.
+
+The payload could be built in a custom binary format, but that has severe
+disadvantages, because the format:
+
+ * Might need to be changed, so we would need a mechanism to accommodate
+   that.
+ * Has to be platform agnostic.
+ * Has to be easily constructed using existing tools.
+
+As such having the payload in an ELF file is the sensible way. We would be
+carrying the various sets of structures (and data) in the ELF sections under
+different names and with definitions. The prefix for the ELF section name
+would always be: *.xsplice* to match up to the names of the structures.
+
+Note that every structure has padding. This is added so that the hypervisor
+can re-use those fields as it sees fit.
+
+An earlier design attempted, ineptly, to explain the relations of the ELF sections
+to each other without using proper ELF mechanisms (sh_info, sh_link, data
+structures using Elf types, etc). This design will explain in detail
+the structures and how they are used together and not dig into the ELF
+format - except to mention that the section names should match the
+structure names.
+
+The xSplice payload is a relocatable ELF binary. A typical binary would have:
+
+ * One or more .text sections
+ * Zero or more read-only data sections
+ * Zero or more data sections
+ * Relocations for each of these sections
+
+It may also have some architecture-specific sections. For example:
+
+ * Alternative instructions
+ * Bug frames
+ * Exception tables
+ * Relocations for each of these sections
+
+The xSplice core code loads the payload as a standard ELF binary, relocates it
+and handles the architecture-specific sections as needed. This process is much
+like what the Linux kernel module loader does. It contains no xSplice-specific
+details and thus will not be discussed further.
+
+Importantly, the payload also contains a section with an array of structures
+describing the functions to be patched:
+<pre>
+struct xsplice_patch_func {
+    unsigned long new_addr;
+    unsigned long new_size;
+    unsigned long old_addr;
+    unsigned long old_size;
+    char *name;
+    uint8_t pad[64];
+};
+</pre>
+
+* `old_addr` is the address of the function to be patched and is filled in at
+  compile time if the payload is statically linked and at run time if the
+  payload is dynamically linked.
+* `new_addr` is the address of the function that is replacing the old
+  function. The address is filled in during relocation.
+* `old_size` and `new_size` contain the sizes of the respective functions.
+* `name` is used for looking up the old function address during dynamic
+  linking.
+
+The size of the `xsplice_patch_func` array is determined from the ELF section
+size.
+
+During patch apply, for each `xsplice_patch_func`, the core code inserts a
+trampoline at `old_addr` to `new_addr`. During patch revert, for each
+`xsplice_patch_func`, the core code copies the data from the undo buffer to
+`old_addr`.
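+
+The apply and revert steps described above can be sketched as follows. This
+is an illustration only: `struct patch_func_sketch`, `patch_apply` and
+`patch_revert` are hypothetical names, the undo buffer is assumed to be five
+bytes (one `jmpq rel32`), and the sketch ignores CPU rendezvous, safety
+checks and page permissions:
+
+```c
+#include <stdint.h>
+#include <string.h>
+
+#define PATCH_INSN_SIZE 5 /* 0xe9 opcode + 32-bit displacement */
+
+/* Simplified stand-in for struct xsplice_patch_func plus an undo buffer. */
+struct patch_func_sketch {
+    uint8_t *old_addr;             /* start of the function being patched */
+    const uint8_t *new_addr;       /* start of the replacement function */
+    uint8_t undo[PATCH_INSN_SIZE]; /* original bytes saved at apply time */
+};
+
+/* Save the original bytes, then overwrite them with a jump to new_addr. */
+static int patch_apply(struct patch_func_sketch *f)
+{
+    int64_t disp = (int64_t)(intptr_t)((uintptr_t)f->new_addr -
+                   ((uintptr_t)f->old_addr + PATCH_INSN_SIZE));
+    uint32_t rel32;
+
+    if ( disp < INT32_MIN || disp > INT32_MAX )
+        return -1;
+
+    memcpy(f->undo, f->old_addr, PATCH_INSN_SIZE);
+
+    rel32 = (uint32_t)(int32_t)disp;
+    f->old_addr[0] = 0xe9;
+    for ( int i = 0; i < 4; i++ )
+        f->old_addr[1 + i] = (uint8_t)(rel32 >> (8 * i));
+
+    return 0;
+}
+
+/* Restore the bytes that patch_apply() overwrote. */
+static void patch_revert(struct patch_func_sketch *f)
+{
+    memcpy(f->old_addr, f->undo, PATCH_INSN_SIZE);
+}
+```
+
+Only the first five bytes of the old function are touched, which is why the
+undo buffer can be small and fixed in size in this sketch.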
+
+## Hypercalls
+
+We will employ the sub operations of the system management hypercall (sysctl).
+There are to be four sub-operations:
+
+ * upload the payloads.
+ * list the summaries of the uploaded payloads and their state.
+ * get a particular payload's summary and its state.
+ * command to apply, delete, or revert the payload.
+
+Most of the actions are asynchronous, therefore the caller is responsible
+for verifying that an action has completed properly by retrieving the payload's
+summary and verifying that there are no error codes associated with the payload.
+
+We **MUST** make some of them asynchronous because patching
+requires every physical CPU to be in lock-step with the others.
+The patching mechanism, while an implementation detail, is not a short
+operation and as such the design **MUST** assume it will be a long-running
+operation.
+
+The sub-operations will spell out how preemption is to be handled (if at all).
+
+Furthermore it is possible to have multiple different payloads for the same
+function. As such a unique id per payload has to be visible to allow proper manipulation.
+
+The hypercall is part of the `xen_sysctl`. The top level structure contains
+one uint32_t to determine the sub-operations:
+
+<pre>
+struct xen_sysctl_xsplice_op {  
+    uint32_t cmd;  
+    union {  
+        ... see below ...  
+    } u;  
+};  
+
+</pre>
+while the rest of the hypercall-specific structures are part of this structure.
+
+### Basic type: struct xen_xsplice_id
+
+Most of the hypercalls employ a shared structure called `struct xen_xsplice_id`
+which contains:
+
+ * `name` - pointer where the string for the id is located.
+ * `size` - the size of the string
+ * `_pad` - padding - to be zero.
+
+The structure is as follows:
+
+<pre>
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_id {  
+    XEN_GUEST_HANDLE_64(char) name;         /* IN, pointer to name. */  
+    uint32_t    size;                       /* IN, size of name. May be up to  
+                                               XEN_XSPLICE_NAME_SIZE. */  
+    uint32_t    _pad;  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_UPLOAD (0)
+
+Upload a payload to the hypervisor. The payload is verified
+against basic checks and if there are any issues the proper return code
+will be returned. The payload is not applied at this time - that is
+controlled by *XEN_SYSCTL_XSPLICE_ACTION*.
+
+The caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * `size` the size of the ELF payload (in bytes).
+ * `payload` the virtual address of where the ELF payload is.
+
+The `id` could be a UUID that stays fixed forever for a given
+hotpatch. It can be embedded into the ELF payload at creation time
+and extracted by tools.
+
+The return value is zero if the payload was successfully uploaded.
+Otherwise an XEN_EXX return value is provided. Duplicate `id`s are not supported.
+The payload at this point is verified against the basic checks.
+
+The `payload` is the ELF payload as described in the `Design of payload format` section.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_upload {  
+    xen_xsplice_id_t id;        /* IN, name of the patch. */  
+    uint64_t size;              /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload; /* IN: ELF file. */  
+}; 
+</pre>
+
+### XEN_SYSCTL_XSPLICE_GET (1)
+
+Retrieve the status of a specific payload. The caller provides:
+
+ * A `struct xen_xsplice_id` called `id` which has the unique id.
+ * A `struct xen_xsplice_status` structure which has all members
+   set to zero. That is:
+   * `status` *MUST* be set to zero.
+   * `rc` *MUST* be set to zero.
+
+Upon completion the `struct xen_xsplice_status` is updated.
+
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (0x1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (0x2) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (0x4) loaded, checked, and applied.
+   * A negative value is an error. The error would be of XEN_EXX format.
+ * `rc` - XEN_EXX type errors encountered while performing the `status`
+   operation. The expected values are zero and XEN_EAGAIN which
+   respectively mean: success and operation in progress.
+
+The return value is zero on success and XEN_EXX on failure. This operation
+is synchronous and does not require preemption.
+
+The structure is as follows:
+
+<pre>
+struct xen_xsplice_status {  
+#define XSPLICE_STATUS_LOADED       0x01  
+#define XSPLICE_STATUS_CHECKED      0x02  
+#define XSPLICE_STATUS_APPLIED      0x04  
+    int32_t status;                 /* OUT: XSPLICE_STATUS_*. IN: MUST be zero. */  
+    uint32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */  
+                                    /* IN: MUST be zero. */
+};  
+
+struct xen_sysctl_xsplice_summary {  
+    xen_xsplice_id_t    id;         /* IN, the name of the payload. */  
+    xen_xsplice_status_t status;    /* IN/OUT: status of the payload. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_LIST (2)
+
+Retrieve an array of abbreviated status and names of payloads that are loaded in the
+hypervisor.
+
+The caller provides:
+
+ * `version`. Initially (on first hypercall) *MUST* be zero.
+ * `idx` index iterator. On the first call it *MUST* be zero; on subsequent calls it varies.
+ * `nr` the max number of entries to populate.
+ * `_pad` - *MUST* be zero.
+ * `status` virtual address of where to write `struct xen_xsplice_status`
+   structures. *MUST* allocate up to `nr` of them.
+ * `id` - virtual address of where to write the unique id of the payload.
+   *MUST* allocate up to `nr` of them. Each *MUST* be of
+   **XEN_XSPLICE_NAME_SIZE** size.
+ * `len` - virtual address of where to write the length of each unique id
+   of the payload. *MUST* allocate up to `nr` of them. Each *MUST* be
+   of sizeof(uint32_t) (4 bytes).
+
+If the hypercall returns a positive number, it is the number (up to `nr`)
+of payloads returned. `nr` is updated with the number of remaining
+payloads and `version` is updated (it may be the same across hypercalls; if it
+varies the data is stale and further calls could fail). The `status`,
+`id`, and `len` arrays are updated at their designated index value (`idx`) with
+the returned data.
+
+If the hypercall returns E2BIG then `nr` is too big and should be
+lowered.
+
+This operation can be preempted by the hypercall returning EAGAIN, in which
+case the caller should retry.
+
+Note that due to the asynchronous nature of hypercalls, payloads might have
+been added or removed in the meantime, making this information stale. It is
+the responsibility of the toolstack to use the `version` field to check
+between each invocation. If the version differs it should discard the stale
+data and start from scratch. It is OK for the toolstack to use the new
+`version` field.
+
+The `struct xen_xsplice_status` structure contains the status of a payload, which includes:
+
+ * `status` - whether it has been:
+   * *XSPLICE_STATUS_LOADED* (0x1) has been loaded.
+   * *XSPLICE_STATUS_CHECKED*  (0x2) the ELF payload safety checks passed.
+   * *XSPLICE_STATUS_APPLIED* (0x4) loaded, checked, and applied.
+ * `rc` - XEN_EXX type errors encountered while performing the `status`
+   operation. The expected values are zero and XEN_EAGAIN which
+   respectively mean: success and operation in progress.
+
+The structure is as follows:
+
+<pre>
+struct xen_sysctl_xsplice_list {  
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.  
+                                               On subsequent calls reuse value.  
+                                               If varies between calls, we are  
+                                               getting stale data. */  
+    uint32_t idx;                           /* IN/OUT: Index into array. */  
+    uint32_t nr;                            /* IN: How many status, id, and len  
+                                               entries to populate.  
+                                               OUT: How many payloads left. */  
+    uint32_t _pad;                          /* IN: Must be zero. */  
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough  
+                                               space allocated for nr of them. */  
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each member  
+                                               MUST be XEN_XSPLICE_NAME_SIZE in  
+                                               size. Must have nr of them. */  
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids.  
+                                               Must have nr of them. */  
+};  
+</pre>
+
+### XEN_SYSCTL_XSPLICE_ACTION (3)
+
+Perform an operation on the payload structure referenced by the `id` field.
+The operation request is asynchronous and the status should be retrieved
+by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall.
+If the operation fails more details on the operation can be retrieved via
+**XEN_SYSCTL_XSPLICE_INFO** hypercall.
+
+The caller provides:
+
+ * A `struct xen_xsplice_id` `id` containing the unique id.
+ * `cmd` the command requested:
+   * *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly.
+     This also verifies the payload - which may require SecureBoot firmware
+     calls.
+   * *XSPLICE_ACTION_UNLOAD* (2) unload the payload.
+     Any further hypercalls against the `id` will result in failure unless
+     the **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is performed with the same `id`.
+   * *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes
+     more time than the upper bound of time the `status` will be XEN_EBUSY.
+   * *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes
+     more time than the upper bound of time the `status` will be XEN_EBUSY.
+   * *XSPLICE_ACTION_REPLACE* (5) revert all applied payloads and apply this
+     payload.
+   * *XSPLICE_ACTION_LOADED* is an initial state and cannot be requested.
+ * `time` the upper bound of time the cmd should take. Zero means infinite.
+   If within the time the operation does not succeed the operation would go in
+   error state.
+ * `_pad` - *MUST* be zero.
+
+The return value will be zero unless the provided fields are incorrect.
+
+The structure is as follows:
+
+<pre>
+#define XSPLICE_ACTION_CHECK   1  
+#define XSPLICE_ACTION_UNLOAD  2  
+#define XSPLICE_ACTION_REVERT  3  
+#define XSPLICE_ACTION_APPLY   4  
+#define XSPLICE_ACTION_REPLACE 5  
+struct xen_sysctl_xsplice_action {  
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */  
+    uint32_t cmd;                           /* IN: XSPLICE_ACTION_* */  
+    uint32_t _pad;                          /* IN: MUST be zero. */  
+    uint64_t time;                          /* IN: Zero if no timeout. */ 
+                                            /* Or upper bound of time (ms) */   
+                                            /* for operation to take. */  
+};  
+
+</pre>
+
+## State diagrams of XSPLICE_ACTION values.
+
+There is a strict ordering of the states that the commands can move a payload through.
+The XSPLICE_ACTION prefix has been dropped to ease reading:
+
+<pre>
+              /->\
+              \  /
+ UNLOAD <--- CHECK ---> REPLACE|APPLY --> REVERT --\
+                \                                  |
+                 \-------------------<-------------/
+
+</pre>
+Or a state transition table of valid states:
+
+<pre>
+
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+| CHECK | APPLY | REPLACE| REVERT | UNLOAD | Current | Next  | Result           |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|   x   |       |        |        |        | CHECK   | CHECK | Check payload.   |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|   x   |       |        |        |        | LOADED  | CHECK | Check payload.   |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|       |       |        |        |   x    | CHECK   | UNLOAD| Unload payload.  |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|       |   x   |        |        |        | CHECK   | APPLY | Apply payload.   |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|       |       |   x    |        |        | CHECK   | APPLY | Apply payload.   |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+|       |       |        |    x   |        | REVERT  | CHECK | Revert payload.  |  
++-------+-------+--------+--------+--------+---------+-------+------------------+  
+</pre>
+All the other state transitions are invalid.
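+
+The table above can be expressed as a small lookup function. This is a
+sketch only: `enum payload_state` and `next_state` are hypothetical names,
+the state names simply mirror the `Current` and `Next` columns, and each row
+is transcribed literally (including the REVERT row, which lists REVERT as
+the current state):
+
+```c
+#include <stdint.h>
+
+#define XSPLICE_ACTION_CHECK   1
+#define XSPLICE_ACTION_UNLOAD  2
+#define XSPLICE_ACTION_REVERT  3
+#define XSPLICE_ACTION_APPLY   4
+#define XSPLICE_ACTION_REPLACE 5
+
+/* Hypothetical state names taken from the Current/Next columns. */
+enum payload_state {
+    STATE_INVALID = -1,
+    STATE_LOADED,
+    STATE_CHECK,
+    STATE_APPLY,
+    STATE_REVERT,
+    STATE_UNLOAD,
+};
+
+/* Return the next state for cmd, or STATE_INVALID if the table has no row. */
+static enum payload_state next_state(enum payload_state cur, uint32_t cmd)
+{
+    switch ( cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( cur == STATE_LOADED || cur == STATE_CHECK )
+            return STATE_CHECK;
+        break;
+
+    case XSPLICE_ACTION_UNLOAD:
+        if ( cur == STATE_CHECK )
+            return STATE_UNLOAD;
+        break;
+
+    case XSPLICE_ACTION_APPLY:
+    case XSPLICE_ACTION_REPLACE:
+        if ( cur == STATE_CHECK )
+            return STATE_APPLY;
+        break;
+
+    case XSPLICE_ACTION_REVERT:
+        if ( cur == STATE_REVERT )
+            return STATE_CHECK;
+        break;
+    }
+
+    return STATE_INVALID;
+}
+```
+
+For example, requesting APPLY on a payload that is still in the LOADED state
+has no row in the table and therefore yields `STATE_INVALID`.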
+
+## Sequence of events.
+
+The normal sequence of events is to:
+
+ 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are errors *STOP* here.
+ 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *XEN_EAGAIN* spin. If zero go to next step.
+ 3. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_CHECK* command to verify that the payload can be successfully applied.
+ 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *XEN_EAGAIN* spin. If zero go to next step.
+ 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the patch.
+ 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->rc`. If *XEN_EAGAIN* spin. If zero exit with success.
+
+ 
+## Addendum
+
+Implementation quirks should not be discussed in a design document.
+
+However these observations can provide aid when developing against this
+document.
+
+
+### Alternative assembler
+
+Alternative assembler is a mechanism to use different instructions depending
+on what the CPU supports. This is done by providing multiple streams of code
+that can be patched in - or if the CPU does not support it - padded with
+`nop` operations. The alternative assembler macros cause the compiler to
+emit the most generic code in place and a special
+ELF section to tag this location. During run-time the hypervisor
+can leave the areas alone or patch them with better suited opcodes.
+
+
+### When to patch
+
+During the discussion on the design two candidates emerged where
+the call stack for each CPU would be deterministic. This would
+minimize the chance of the patch not being applied due to safety
+checks failing.
+
+#### Rendezvous code instead of stop_machine for patching
+
+The hypervisor's time rendezvous code runs synchronously across all CPUs
+every second. Using stop_machine to patch can stall the time rendezvous
+code and result in an NMI. As such having the patching be done at the tail
+of the rendezvous code should avoid this problem.
+
+However the entrance point for that code is
+do_softirq->timer_softirq_action->time_calibration
+which ends up calling on_selected_cpus on remote CPUs.
+
+The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
+desired function.
+
+#### Before entering the guest code.
+
+Before we call VMXResume we check whether any soft IRQs need to be executed.
+This is a good spot because all Xen stacks are effectively empty at
+that point.
+
+To rendezvous all the CPUs, a barrier with a maximum timeout (which
+could be adjusted), combined with forcing all other CPUs through the
+hypervisor with IPIs, can be utilized to bring all the CPUs into lockstep.
+
+The approach is similar in concept to stop_machine and the time rendezvous
+but is time-bound. However the local CPU stack is much shorter and
+a lot more deterministic.
+
+### Compiling the hypervisor code
+
+Hotpatch generation often requires support for compiling the target
+with -ffunction-sections / -fdata-sections.  Changes would have to
+be done to the linker scripts to support this.
+
+
+### Generation of xSplice ELF payloads
+
+The design of payload generation is not discussed in this document.
+
+The author of this design envisions objdump and objcopy along
+with special GCC parameters (see above) to create .o.xsplice files
+which can be used to splice an ELF with the new payload.
+
+The ksplice or kpatching code can provide inspiration.
+
+### Exception tables and symbol tables growth
+
+We may need support for adapting or augmenting exception tables if
+patching such code.  Hotpatches may need to bring their own small
+exception tables (similar to how Linux modules support this).
+
+If supporting hotpatches that introduce additional exception-locations
+is not important, one could also change the exception table in-place
+and reorder it afterwards.
+
+### Security
+
+Only the privileged domain should be allowed to do this operation.
+
+
+# v2: Not Yet Done
+
+
+## Goals
+
+The v2 design must also have a mechanism for:
+
+ * A dependency mechanism for the payloads. That information is used to:
+    - Load the appropriate payload, verifying that the payload is built
+      against the hypervisor. This can be done via the `build-id` (see later
+      sections), or via providing a copy of the old code - so that the
+      hypervisor can verify it against the code in memory.
+    - Construct an appropriate order of payloads to load in case they
+      depend on each other.
+ * Be able to cope with symbol names in the ELF payload.
+ * Be able to patch .rodata, .bss, and .data sections.
+ * Further safety checks.
+
+### Hypervisor ID (build-id)
+
+The build-id can help with:
+
+  * Prevent loading of wrong hotpatches (intended for other builds)
+
+  * Allow to identify suitable hotpatches on disk and help with runtime
+    tooling (if laid out using build ID)
+
+The build-id (aka hypervisor id) can be easily obtained by utilizing
+the ld --build-id option, which is described as follows (copied from the ld manual):
+
+<pre>
+--build-id  
+    --build-id=style  
+        Request creation of ".note.gnu.build-id" ELF note section.  The contents of the note are unique bits identifying this  
+        linked file.  style can be "uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash on the normative parts of the  
+        output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the output contents, or "0xhexstring" to use a  
+        chosen bit string specified as an even number of hexadecimal digits ("-" and ":" characters between digit pairs are  
+        ignored).  If style is omitted, "sha1" is used.  
+
+        The "md5" and "sha1" styles produces an identifier that is always the same in an identical output file, but will be  
+        unique among all nonidentical output files.  It is not intended to be compared as a checksum for the file's contents.  A  
+        linked file may be changed later by other tools, but the build ID bit string identifying the original linked file does  
+        not change.  
+
+        Passing "none" for style disables the setting from any "--build-id" options earlier on the command line.  
+
+</pre>
+
+
+### xSplice interdependencies
+
+xSplice patch interdependencies are tricky.
+
+There are three ways this can be addressed:
+ * A single large patch that subsumes and replaces all previous ones.
+   Over the life-time of patching the hypervisor this large patch
+   grows to accumulate all the code changes.
+ * Hotpatch stack - where a mechanism exists that loads the hotpatches
+   in the same order they were built in. We would need a build-id
+   of the hypervisor to make sure the hot-patches are built against the
+   correct build.
+ * Payload containing the old code to check against. That allows
+   the hotpatches to be loaded independently (if they don't overlap) - or
+   if the old code also contains previously patched code - even if they
+   overlap.
+
+The disadvantage of the single large patch is that it can grow over
+time and does not provide a bisection mechanism to identify faulty patches.
+
+The hot-patch stack puts strict requirements on the order of the patches
+being loaded and requires a hypervisor build-id to match against.
+
+The old-code check allows much more flexibility and an additional guard,
+but is more complex to implement.
+
+### Symbol names
+
+
+Xen, as it is now, has a couple of non-unique symbol names which will
+make runtime symbol identification hard.  Sometimes, static symbols
+simply have the same name in C files, sometimes such symbols get
+included via header files, and some C files are also compiled
+multiple times and linked under different names (guest_walk.c).
+
+As such we need to modify the linker to make sure that the symbol
+table also qualifies symbols by their source file name.
+
+For the awkward situations in which C files are compiled multiple
+times, we would need to make some modifications to the Xen code.
+
+
+The convention for file-type symbols (that would allow to map many
+symbols to their compilation unit) says that only the basename (i.e.,
+without directories) is embedded.  This creates another layer of
+confusion for duplicate file names in the build tree.
+
+That would have to be resolved.
+
+<pre>
+> find . -name \*.c -print0 | xargs -0 -n1 basename | sort | uniq -c | sort -n | tail -n10
+      3 shutdown.c
+      3 sysctl.c
+      3 time.c
+      3 xenoprof.c
+      4 gdbstub.c
+      4 irq.c
+      5 domain.c
+      5 mm.c
+      5 pci.c
+      5 traps.c
+</pre>
+
+### Handle inlined __LINE__
+
+
+This problem is related to hotpatch construction
+and potentially has influence on the design of the hotpatching
+infrastructure in Xen.
+
+For example:
+
+We have file1.c with functions f1 and f2 (in that order).  f2 contains a
+BUG() (or WARN()) macro and at that point embeds the source line number
+into the generated code for f2.
+
+Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
+lines to f1 and as a consequence shifts out f2 by two lines.  The newly
+constructed file1.o will now contain differences in both binary
+functions f1 (because we actually changed it with the applied patch) and
+f2 (because the contained BUG macro embeds the new line number).
+
+Without additional information, an algorithm comparing file1.o before
+and after hotpatch application will determine both functions to be
+changed and will have to include both into the binary hotpatch.
+
+Options:
+
+1. Transform source code patches for hotpatches to be line-neutral for
+   each chunk.  This can be done in almost all cases with either
+   reformatting of the source code or by introducing artificial
+   preprocessor "#line n" directives to adjust for the introduced
+   differences.
+
+   This approach is low-tech and simple.  Potentially generated
+   backtraces and existing debug information refers to the original
+   build and does not reflect hotpatching state except for actually
+   hotpatched functions but should be mostly correct.
+
+2. Ignoring the problem and living with artificially large hotpatches
+   that unnecessarily patch many functions.
+
+   This approach might lead to some very large hotpatches depending on
+   content of specific source file.  It may also trigger pulling in
+   functions into the hotpatch that cannot reasonably be hotpatched due
+   to limitations of a hotpatching framework (init-sections, parts of
+   the hotpatching framework itself, ...) and may thereby prevent us
+   from patching a specific problem.
+
+   The decision between 1. and 2. can be made on a patch-by-patch
+   basis.
+
+3. Introducing an indirection table for storing line numbers and
+   treating that specially for binary diffing. Linux may follow
+   this approach.
+
+   We might either use this indirection table for runtime use and patch
+   that with each hotpatch (similarly to exception tables) or we might
+   purely use it when building hotpatches to ignore functions that only
+   differ at exactly the location where a line-number is embedded.
+
+Similar considerations are true to a lesser extent for __FILE__, but it
+could be argued that file renaming should be done outside of hotpatches.
+
+## Signature checking requirements.
+
+The signature checking requires that the layout of the data in memory
+**MUST** be the same for the signature to be verified. This means that the
+payload data layout in ELF format **MUST** match what the hypervisor would be
+expecting such that it can properly do signature verification.
+
+The signature is based on all of the payloads contiguously laid out
+in memory. The signature is to be appended at the end of the ELF payload
+prefixed with the string '~Module signature appended~\n', followed by
+a signature header then followed by the signature, key identifier, and signer's
+name.
+
+Specifically the signature header would be:
+
+<pre>
+#define PKEY_ALGO_DSA       0  
+#define PKEY_ALGO_RSA       1  
+
+#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */  
+#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */  
+
+#define HASH_ALGO_MD4          0  
+#define HASH_ALGO_MD5          1  
+#define HASH_ALGO_SHA1         2  
+#define HASH_ALGO_RIPE_MD_160  3  
+#define HASH_ALGO_SHA256       4  
+#define HASH_ALGO_SHA384       5  
+#define HASH_ALGO_SHA512       6  
+#define HASH_ALGO_SHA224       7  
+#define HASH_ALGO_RIPE_MD_128  8  
+#define HASH_ALGO_RIPE_MD_256  9  
+#define HASH_ALGO_RIPE_MD_320 10  
+#define HASH_ALGO_WP_256      11  
+#define HASH_ALGO_WP_384      12  
+#define HASH_ALGO_WP_512      13  
+#define HASH_ALGO_TGR_128     14  
+#define HASH_ALGO_TGR_160     15  
+#define HASH_ALGO_TGR_192     16  
+
+
+struct elf_payload_signature {  
+	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */  
+	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */  
+	u8	id_type;	/* Key identifier type PKEY_ID*. */  
+	u8	signer_len;	/* Length of signer's name */  
+	u8	key_id_len;	/* Length of key identifier */  
+	u8	__pad[3];  
+	__be32	sig_len;	/* Length of signature data */  
+};
+
+</pre>
+(Note that this has been borrowed from the Linux module signature code.)
+
+
+### .rodata sections
+
+The patching might require strings to be updated as well. As such we must be
+also able to patch the strings as needed. This sounds simple - but the compiler
+has a habit of coalescing strings that are the same - which means if we in-place
+alter the strings - other users will be inadvertently affected as well.
+
+This is also where pointers to functions live - and we may need to patch this
+as well. And switch-style jump tables.
+
+To guard against that we must be prepared to do patching similar to
+trampoline patching or in-line depending on the flavour. If we can
+do in-line patching we would need to:
+
+ * alter `.rodata` to be writeable.
+ * inline patch.
+ * alter `.rodata` to be read-only.
+
+If we are doing trampoline patching we would need to:
+
+ * allocate a new memory location for the string.
+ * all locations which use this string will have to be updated to use the
+   offset to the string.
+ * mark the region RO when we are done.
+
+### .bss and .data sections.
+
+Patching writable data is not suitable as it is unclear what should be done
+depending on the current state of data. As such it should not be attempted.
+
+
+### Patching code which is on the stack.
+
+We should not patch the code which is on the stack. That can lead
+to corruption.
+
+### Inline patching
+
+The hypervisor should verify that the in-place patching would fit within
+the code or data.
+
+### Trampoline (e9 opcode)
+
+The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
+we are limited to up to 2GB of virtual address to place the new code
+from the old code. That should not be a problem since Xen hypervisor has
+a very small footprint.
+
+However if we need - we can always add two trampolines. One at the 2GB
+limit that calls the next trampoline.
+
+Please note there is a small limitation for trampolines in
+function entries: The target function (+ trailing padding) must be able
+to accommodate the trampoline. On x86 with +-2 GB relative jumps,
+this means 5 bytes are required.
+
+Depending on compiler settings, there are several functions in Xen that
+are smaller (without inter-function padding).
+
+<pre> 
+readelf -sW xen-syms | grep " FUNC " | \
+    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
+
+...
+3 FUNC LOCAL wbinvd_ipi
+3 FUNC LOCAL shadow_l1_index
+...
+</pre>
+A compile-time check for, e.g., a minimum alignment of functions, or a
+runtime check in the hypervisor that verifies the symbol size (+ padding
+to the next symbol), is advised.
+
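
The trampoline constraints described at the end of the design document above (a 5-byte `e9` jump with a signed 32-bit displacement, and a target function large enough to hold it) can be sketched roughly as follows. This is only an illustration under stated assumptions: `make_trampoline` and `TRAMPOLINE_SIZE` are hypothetical names that do not appear in the series, addresses are passed as plain integers, and the size argument is assumed to already include any trailing padding.

```c
#include <assert.h>
#include <stdint.h>

#define TRAMPOLINE_SIZE 5   /* 1 opcode byte (0xe9) + 4 displacement bytes */

/*
 * Hypothetical helper (not part of the posted patches): encode a
 * "jmp rel32" placed at old_addr that jumps to new_addr.
 *
 * Returns 0 on success, -1 if the old function is too small to hold the
 * trampoline or if new_addr is outside the signed 32-bit range.
 */
static int make_trampoline(uint64_t old_addr, uint64_t old_size,
                           uint64_t new_addr, uint8_t *insn)
{
    int64_t disp;
    uint32_t rel32;

    assert(insn);

    /* The function (plus padding) must accommodate all 5 bytes. */
    if ( old_size < TRAMPOLINE_SIZE )
        return -1;

    /* The displacement is relative to the end of the jump instruction. */
    disp = (int64_t)(new_addr - (old_addr + TRAMPOLINE_SIZE));
    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -1;

    /* Emit the opcode followed by the little-endian displacement. */
    rel32 = (uint32_t)(int32_t)disp;
    insn[0] = 0xe9;
    insn[1] = rel32 & 0xff;
    insn[2] = (rel32 >> 8) & 0xff;
    insn[3] = (rel32 >> 16) & 0xff;
    insn[4] = (rel32 >> 24) & 0xff;

    return 0;
}
```

A caller would pass the symbol address and size (for instance as reported by `readelf -sW`) and refuse to apply the hotpatch when the helper returns -1, which is the runtime check the document advises.
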
-- 
2.4.3


* [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
@ 2015-11-03 18:15 ` Ross Lagerwall
  2015-11-04 21:17   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2015-11-03 18:16 ` [PATCH v1 03/11] libxc: Implementation of XEN_XSPLICE_op in libxc Ross Lagerwall
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:15 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson,
	Ross Lagerwall, Daniel De Graaf

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.
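
The state transitions that these placeholder operations accept can be summarised with the sketch below. It mirrors the `switch` in `xsplice_action()` from this patch, but it is a standalone illustration: `apply_action`, `SKETCH_UNLOADED` and `SKETCH_REJECTED` are hypothetical names, and returning an error for a rejected transition is an assumption (the posted code leaves `rc` at 0 in that case).

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in this patch's xen/include/public/sysctl.h. */
#define XSPLICE_STATE_LOADED    0x01
#define XSPLICE_STATE_CHECKED   0x02
#define XSPLICE_STATE_APPLIED   0x04

#define XSPLICE_ACTION_CHECK    1
#define XSPLICE_ACTION_UNLOAD   2
#define XSPLICE_ACTION_REVERT   3
#define XSPLICE_ACTION_APPLY    4
#define XSPLICE_ACTION_REPLACE  5

/* Hypothetical markers used only by this sketch. */
#define SKETCH_UNLOADED   0     /* the payload has been freed */
#define SKETCH_REJECTED (-1)    /* action not valid in the current state */

/*
 * Apply 'cmd' to '*state'. Returns 0 and updates '*state' when the
 * transition is accepted, SKETCH_REJECTED (leaving '*state' untouched)
 * otherwise.
 */
static int apply_action(int32_t *state, uint32_t cmd)
{
    assert(state);

    switch ( cmd )
    {
    case XSPLICE_ACTION_CHECK:
        if ( *state != XSPLICE_STATE_LOADED &&
             *state != XSPLICE_STATE_CHECKED )
            return SKETCH_REJECTED;
        *state = XSPLICE_STATE_CHECKED;
        return 0;

    case XSPLICE_ACTION_UNLOAD:
        if ( *state != XSPLICE_STATE_LOADED &&
             *state != XSPLICE_STATE_CHECKED )
            return SKETCH_REJECTED;
        *state = SKETCH_UNLOADED;
        return 0;

    case XSPLICE_ACTION_REVERT:
        if ( *state != XSPLICE_STATE_APPLIED )
            return SKETCH_REJECTED;
        *state = XSPLICE_STATE_CHECKED;
        return 0;

    case XSPLICE_ACTION_APPLY:
        if ( *state != XSPLICE_STATE_CHECKED )
            return SKETCH_REJECTED;
        *state = XSPLICE_STATE_APPLIED;
        return 0;

    case XSPLICE_ACTION_REPLACE:
        if ( *state != XSPLICE_STATE_CHECKED )
            return SKETCH_REJECTED;
        /* Placeholder, as in this patch: the state stays CHECKED. */
        *state = XSPLICE_STATE_CHECKED;
        return 0;

    default:
        return SKETCH_REJECTED;
    }
}
```

In other words, a payload must be checked before it can be applied, and an applied payload must be reverted before it can be unloaded.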

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v2: Rebased on keyhandler: rework keyhandler infrastructure
v3: Fixed XSM.
v4: Removed REVERTED state.
    Split status and error code.
    Add REPLACE action.
    Separate payload data from the payload structure.
    s/XSPLICE_ID_../XSPLICE_NAME_../
---
 tools/flask/policy/policy/modules/xen/xen.te |   1 +
 xen/common/Makefile                          |   1 +
 xen/common/sysctl.c                          |   6 +
 xen/common/xsplice.c                         | 398 +++++++++++++++++++++++++++
 xen/include/public/sysctl.h                  | 157 +++++++++++
 xen/include/xen/xsplice.h                    |   9 +
 xen/xsm/flask/hooks.c                        |   4 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 8 files changed, 578 insertions(+)
 create mode 100644 xen/common/xsplice.c
 create mode 100644 xen/include/xen/xsplice.h

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index d35ae22..542c3e1 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -72,6 +72,7 @@ allow dom0_t xen_t:xen2 {
 allow dom0_t xen_t:xen2 {
     pmu_ctrl
     get_symbol
+    xsplice_op
 };
 allow dom0_t xen_t:mmu memorymap;
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index a7829db..1b17c9d 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -56,6 +56,7 @@ obj-y += vmap.o
 obj-y += vsprintf.o
 obj-y += wait.o
 obj-y += xmalloc_tlsf.o
+obj-y += xsplice.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index 85e853f..517d684 100644
--- a/xen/common/sysctl.c
+++ b/xen/common/sysctl.c
@@ -28,6 +28,7 @@
 #include <xsm/xsm.h>
 #include <xen/pmstat.h>
 #include <xen/gcov.h>
+#include <xen/xsplice.h>
 
 long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
 {
@@ -460,6 +461,11 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl)
         ret = tmem_control(&op->u.tmem_op);
         break;
 
+    case XEN_SYSCTL_xsplice_op:
+        ret = xsplice_control(&op->u.xsplice);
+        copyback = 1;
+        break;
+
     default:
         ret = arch_do_sysctl(op, u_sysctl);
         copyback = 0;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
new file mode 100644
index 0000000..d984c8a
--- /dev/null
+++ b/xen/common/xsplice.c
@@ -0,0 +1,398 @@
+/*
+ * Copyright (c) 2015 Oracle and/or its affiliates. All rights reserved.
+ *
+ */
+
+#include <xen/smp.h>
+#include <xen/keyhandler.h>
+#include <xen/spinlock.h>
+#include <xen/mm.h>
+#include <xen/list.h>
+#include <xen/guest_access.h>
+#include <xen/stdbool.h>
+#include <xen/sched.h>
+#include <xen/lib.h>
+#include <xen/xsplice.h>
+#include <public/sysctl.h>
+
+#include <asm/event.h>
+
+static DEFINE_SPINLOCK(payload_list_lock);
+static LIST_HEAD(payload_list);
+
+static unsigned int payload_cnt;
+static unsigned int payload_version = 1;
+
+struct payload {
+    int32_t state;     /* One of XSPLICE_STATE_*. */
+    int32_t rc;         /* 0 or -EXX. */
+
+    struct list_head   list;   /* Linked to 'payload_list'. */
+
+    char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
+};
+
+static const char *state2str(int32_t state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(LOADED),
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+
+    if ( state < 0 )
+        return "-EXX";
+
+    if ( state >= ARRAY_SIZE(names) )
+        return "unknown";
+
+    if ( !names[state] )
+        return "unknown";
+
+    return names[state];
+}
+
+void xsplice_printall(unsigned char key)
+{
+    struct payload *data;
+
+    spin_lock(&payload_list_lock);
+
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        printk(" id=%s state=%s(%d): \n", data->id,
+               state2str(data->state), data->state);
+    }
+    spin_unlock(&payload_list_lock);
+}
+
+static int verify_id(xen_xsplice_id_t *id)
+{
+    if ( id->size == 0 || id->size > XEN_XSPLICE_NAME_SIZE )
+        return -EINVAL;
+
+    if ( id->pad != 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(id->name, id->size) )
+        return -EINVAL;
+
+    return 0;
+}
+
+int find_payload(xen_xsplice_id_t *id, bool_t need_lock, struct payload **f)
+{
+    struct payload *data;
+    XEN_GUEST_HANDLE_PARAM(char) str;
+    char name[XEN_XSPLICE_NAME_SIZE + 1] = { 0 }; /* 128 + 1 bytes on stack. Perhaps xzalloc? */
+    int rc = -EINVAL;
+
+    rc = verify_id(id);
+    if ( rc )
+        return rc;
+
+    str = guest_handle_cast(id->name, char);
+    if ( copy_from_guest(name, str, id->size) )
+        return -EFAULT;
+
+    if ( need_lock )
+        spin_lock(&payload_list_lock);
+
+    rc = -ENOENT;
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        if ( !strcmp(data->id, name) )
+        {
+            *f = data;
+            rc = 0;
+            break;
+        }
+    }
+
+    if ( need_lock )
+        spin_unlock(&payload_list_lock);
+
+    return rc;
+}
+
+
+static int verify_payload(xen_sysctl_xsplice_upload_t *upload)
+{
+    if ( verify_id(&upload->id) )
+        return -EINVAL;
+
+    if ( upload->size == 0 )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(upload->payload, upload->size) )
+        return -EFAULT;
+
+    return 0;
+}
+
+/*
+ * We MUST be holding the spinlock.
+ */
+static void __free_payload(struct payload *data)
+{
+    list_del(&data->list);
+    payload_cnt --;
+    payload_version ++;
+    xfree(data);
+}
+
+static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
+{
+    struct payload *data = NULL;
+    uint8_t *raw_data;
+    int rc;
+
+    rc = verify_payload(upload);
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&upload->id, true, &data);
+    if ( rc == 0 /* Found. */ )
+        return -EEXIST;
+
+    if ( rc != -ENOENT )
+        return rc;
+
+    data = xzalloc(struct payload);
+    if ( !data )
+        return -ENOMEM;
+    memset(data, 0, sizeof *data);
+
+    rc = -EFAULT;
+    if ( copy_from_guest(data->id, upload->id.name, upload->id.size) )
+        goto err_data;
+
+    rc = -ENOMEM;
+    raw_data = alloc_xenheap_pages(get_order_from_bytes(upload->size), 0);
+    if ( !raw_data )
+        goto err_data;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(raw_data, upload->payload, upload->size) )
+        goto err_raw;
+
+    data->state = XSPLICE_STATE_LOADED;
+    data->rc = 0;
+    INIT_LIST_HEAD(&data->list);
+
+    spin_lock(&payload_list_lock);
+    list_add_tail(&data->list, &payload_list);
+    payload_cnt ++;
+    payload_version ++;
+    spin_unlock(&payload_list_lock);
+
+    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
+    return 0;
+
+err_raw:
+    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
+err_data:
+    xfree(data);
+    return rc;
+}
+
+static int xsplice_get(xen_sysctl_xsplice_summary_t *summary)
+{
+    struct payload *data;
+    int rc;
+
+    if ( summary->status.state )
+        return -EINVAL;
+
+    if ( summary->status.rc != 0 )
+        return -EINVAL;
+
+    rc = verify_id(&summary->id );
+    if ( rc )
+        return rc;
+
+    rc = find_payload(&summary->id, true, &data);
+    if ( rc )
+        return rc;
+
+    summary->status.state = data->state;
+    summary->status.rc = data->rc;
+
+    return 0;
+}
+
+static int xsplice_list(xen_sysctl_xsplice_list_t *list)
+{
+    xen_xsplice_status_t status;
+    struct payload *data;
+    unsigned int idx = 0, i = 0;
+    int rc = 0;
+    unsigned int ver = payload_version;
+
+    if ( list->nr > 1024 )
+        return -E2BIG;
+
+    if ( list->pad != 0 )
+        return -EINVAL;
+
+    if ( guest_handle_is_null(list->status) ||
+         guest_handle_is_null(list->id) ||
+         guest_handle_is_null(list->len) )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
+         !guest_handle_okay(list->id, XEN_XSPLICE_NAME_SIZE * list->nr) ||
+         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
+        return -EINVAL;
+
+    spin_lock(&payload_list_lock);
+    if ( list->idx > payload_cnt )
+    {
+        spin_unlock(&payload_list_lock);
+        return -EINVAL;
+    }
+
+    list_for_each_entry( data, &payload_list, list )
+    {
+        uint32_t len;
+
+        if ( list->idx > i++ )
+            continue;
+
+        status.state = data->state;
+        status.rc = data->rc;
+        len = strlen(data->id);
+
+        /* N.B. 'idx' != 'i'. */
+        if ( copy_to_guest_offset(list->id, idx * XEN_XSPLICE_NAME_SIZE,
+                                  data->id, len) ||
+             copy_to_guest_offset(list->len, idx, &len, 1) ||
+             copy_to_guest_offset(list->status, idx, &status, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+        idx ++;
+        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
+        {
+            break;
+        }
+    }
+    list->nr = payload_cnt - i; /* Remaining amount. */
+    spin_unlock(&payload_list_lock);
+    list->version = ver;
+
+    /* And how many we have processed. */
+    return rc ? rc : idx;
+}
+
+static int xsplice_action(xen_sysctl_xsplice_action_t *action)
+{
+    struct payload *data;
+    int rc;
+
+    if ( action->pad != 0 )
+        return -EINVAL;
+
+    rc = verify_id(&action->id);
+    if ( rc )
+        return rc;
+
+    spin_lock(&payload_list_lock);
+    rc = find_payload(&action->id, false /* we are holding the lock. */, &data);
+    if ( rc )
+        goto out;
+
+    switch ( action->cmd )
+    {
+    case XSPLICE_ACTION_CHECK:
+        if ( (data->state == XSPLICE_STATE_LOADED) ||
+             (data->state == XSPLICE_STATE_CHECKED) )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_UNLOAD:
+        if ( (data->state == XSPLICE_STATE_LOADED) ||
+             (data->state == XSPLICE_STATE_CHECKED) )
+        {
+            __free_payload(data);
+            /* No touching 'data' from here on! */
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_REVERT:
+        if ( data->state == XSPLICE_STATE_APPLIED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_APPLY:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_APPLIED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    case XSPLICE_ACTION_REPLACE:
+        if ( data->state == XSPLICE_STATE_CHECKED )
+        {
+            /* No implementation yet. */
+            data->state = XSPLICE_STATE_CHECKED;
+            data->rc = 0;
+            rc = 0;
+        }
+        break;
+    default:
+        rc = -ENOSYS;
+        break;
+    }
+
+ out:
+    spin_unlock(&payload_list_lock);
+
+    return rc;
+}
+
+int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
+{
+    int rc;
+
+    switch ( xsplice->cmd )
+    {
+    case XEN_SYSCTL_XSPLICE_UPLOAD:
+        rc = xsplice_upload(&xsplice->u.upload);
+        break;
+    case XEN_SYSCTL_XSPLICE_GET:
+        rc = xsplice_get(&xsplice->u.get);
+        break;
+    case XEN_SYSCTL_XSPLICE_LIST:
+        rc = xsplice_list(&xsplice->u.list);
+        break;
+    case XEN_SYSCTL_XSPLICE_ACTION:
+        rc = xsplice_action(&xsplice->u.action);
+        break;
+    default:
+        rc = -ENOSYS;
+        break;
+    }
+
+    return rc;
+}
+
+static int __init xsplice_init(void)
+{
+    register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
+    return 0;
+}
+__initcall(xsplice_init);
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 96680eb..fb9bcca 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -766,6 +766,161 @@ struct xen_sysctl_tmem_op {
 typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
 
+/*
+ * XEN_SYSCTL_XSPLICE_op
+ *
+ * Refer to the docs/misc/xsplice.markdown for the design details
+ * of this hypercall.
+ */
+
+/*
+ * Structure describing an ELF payload. Uniquely identifies the
+ * payload. Should be human readable.
+ * Recommended length is XEN_XSPLICE_NAME_SIZE.
+ */
+#define XEN_XSPLICE_NAME_SIZE 128
+struct xen_xsplice_id {
+    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
+    uint32_t    size;                       /* IN: size of name. May be up to
+                                               XEN_XSPLICE_NAME_SIZE. */
+    uint32_t    pad;                        /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_id xen_xsplice_id_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_id_t);
+
+/*
+ * Upload a payload to the hypervisor. The payload is verified
+ * against basic checks and if there are any issues the proper return code
+ * will be returned. The payload is not applied at this time - that is
+ * controlled by XEN_SYSCTL_XSPLICE_ACTION.
+ *
+ * The return value is zero if the payload was successfully uploaded.
+ * Otherwise an EXX return value is provided. Duplicate `id`s are not supported.
+ * The payload at this point is verified against the basic checks.
+ *
+ * The `payload` is the ELF payload as mentioned in the `Payload format`
+ * section in the xSplice design document.
+ */
+#define XEN_SYSCTL_XSPLICE_UPLOAD 0
+struct xen_sysctl_xsplice_upload {
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */
+    uint64_t    size;                       /* IN, size of the ELF file. */
+    XEN_GUEST_HANDLE_64(uint8) payload;     /* IN, the ELF file. */
+};
+typedef struct xen_sysctl_xsplice_upload xen_sysctl_xsplice_upload_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_upload_t);
+
+/*
+ * Retrieve the status of a specific payload.
+ *
+ * Upon completion the `struct xen_xsplice_status` is updated.
+ *
+ * The return value is zero on success and XEN_EXX on failure. This operation
+ * is synchronous and does not require preemption.
+ */
+#define XEN_SYSCTL_XSPLICE_GET 1
+
+struct xen_xsplice_status {
+#define XSPLICE_STATE_LOADED       0x01
+#define XSPLICE_STATE_CHECKED      0x02
+#define XSPLICE_STATE_APPLIED      0x04
+    int32_t state;                 /* OUT: XSPLICE_STATE_*. IN: MUST be zero. */
+    int32_t rc;                    /* OUT: 0 if no error, otherwise -XEN_EXX. */
+                                   /* IN: MUST be zero. */
+};
+typedef struct xen_xsplice_status xen_xsplice_status_t;
+DEFINE_XEN_GUEST_HANDLE(xen_xsplice_status_t);
+
+struct xen_sysctl_xsplice_summary {
+    xen_xsplice_id_t id;                  /* IN, name of the payload. */
+    xen_xsplice_status_t status;            /* IN/OUT, state of it. */
+};
+typedef struct xen_sysctl_xsplice_summary xen_sysctl_xsplice_summary_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_summary_t);
+
+/*
+ * Retrieve an array of abbreviated status and names of payloads that are
+ * loaded in the hypervisor.
+ *
+ * If the hypercall returns a positive number, it is the number (up to `nr`)
+ * of the payloads returned, along with `nr` updated with the number of remaining
+ * payloads, `version` updated (it may be the same across hypercalls. If it
+ * varies the data is stale and further calls could fail). The `status`,
+ * `id`, and `len` are updated at their designed index value (`idx`) with
+ * the returned value of data.
+ *
+ * If the hypercall returns E2BIG the `nr` is too big and should be
+ * lowered.
+ *
+ * This operation can be preempted by the hypercall returning EAGAIN.
+ * Retry.
+ *
+ * Note that due to the asynchronous nature of hypercalls the domain might have
+ * added or removed the number of payloads making this information stale. It is
+ * the responsibility of the toolstack to use the `version` field to check
+ * between each invocation. If the version differs it should discard the stale
+ * data and start from scratch. It is OK for the toolstack to use the new
+ * `version` field.
+ */
+#define XEN_SYSCTL_XSPLICE_LIST 2
+struct xen_sysctl_xsplice_list {
+    uint32_t version;                       /* IN/OUT: Initially *MUST* be zero.
+                                               On subsequent calls reuse value.
+                                               If varies between calls, we are
+                                               getting stale data. */
+    uint32_t idx;                           /* IN/OUT: Index into array. */
+    uint32_t nr;                            /* IN: How many status, id, and len
+                                               should populate.
+                                               OUT: How many payloads left. */
+    uint32_t pad;                           /* IN: Must be zero. */
+    XEN_GUEST_HANDLE_64(xen_xsplice_status_t) status;  /* OUT. Must have enough
+                                               space allocated for n of them. */
+    XEN_GUEST_HANDLE_64(char) id;           /* OUT: Array of ids. Each member
+                                               MUST be XEN_XSPLICE_NAME_SIZE in size.
+                                               Must have n of them. */
+    XEN_GUEST_HANDLE_64(uint32) len;        /* OUT: Array of lengths of ids.
+                                               Must have n of them. */
+};
+typedef struct xen_sysctl_xsplice_list xen_sysctl_xsplice_list_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_list_t);
+
+/*
+ * Perform an operation on the payload structure referenced by the `id` field.
+ * The operation request is asynchronous and the status should be retrieved
+ * by using either XEN_SYSCTL_XSPLICE_GET or XEN_SYSCTL_XSPLICE_LIST hypercall.
+ * If the operation fails more details on the operation can be retrieved via
+ * XEN_SYSCTL_XSPLICE_INFO hypercall.
+ */
+#define XEN_SYSCTL_XSPLICE_ACTION 3
+struct xen_sysctl_xsplice_action {
+    xen_xsplice_id_t id;                    /* IN, name of the patch. */
+#define XSPLICE_ACTION_CHECK        1
+#define XSPLICE_ACTION_UNLOAD       2
+#define XSPLICE_ACTION_REVERT       3
+#define XSPLICE_ACTION_APPLY        4
+#define XSPLICE_ACTION_REPLACE      5
+    uint32_t    cmd;                        /* IN: XSPLICE_ACTION_*. */
+    uint32_t    pad;                        /* IN: MUST be zero. */
+    uint64_aligned_t time;                  /* IN: Zero if no timeout. */
+                                            /* Or upper bound of time (ms) */
+                                            /* for operation to take. */
+};
+typedef struct xen_sysctl_xsplice_action xen_sysctl_xsplice_action_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_action_t);
+
+struct xen_sysctl_xsplice_op {
+    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_* */
+    uint32_t pad;                           /* IN: Always zero. */
+    union {
+        xen_sysctl_xsplice_upload_t upload;
+        xen_sysctl_xsplice_list_t list;
+        xen_sysctl_xsplice_summary_t get;
+        xen_sysctl_xsplice_action_t action;
+    } u;
+};
+typedef struct xen_sysctl_xsplice_op xen_sysctl_xsplice_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_sysctl_xsplice_op_t);
+
 struct xen_sysctl {
     uint32_t cmd;
 #define XEN_SYSCTL_readconsole                    1
@@ -791,6 +946,7 @@ struct xen_sysctl {
 #define XEN_SYSCTL_pcitopoinfo                   22
 #define XEN_SYSCTL_psr_cat_op                    23
 #define XEN_SYSCTL_tmem_op                       24
+#define XEN_SYSCTL_xsplice_op                    25
     uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */
     union {
         struct xen_sysctl_readconsole       readconsole;
@@ -816,6 +972,7 @@ struct xen_sysctl {
         struct xen_sysctl_psr_cmt_op        psr_cmt_op;
         struct xen_sysctl_psr_cat_op        psr_cat_op;
         struct xen_sysctl_tmem_op           tmem_op;
+        struct xen_sysctl_xsplice_op        xsplice;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
new file mode 100644
index 0000000..41e28da
--- /dev/null
+++ b/xen/include/xen/xsplice.h
@@ -0,0 +1,9 @@
+#ifndef __XEN_XSPLICE_H__
+#define __XEN_XSPLICE_H__
+
+struct xen_sysctl_xsplice_op;
+int xsplice_control(struct xen_sysctl_xsplice_op *);
+
+extern void xsplice_printall(unsigned char key);
+
+#endif /* __XEN_XSPLICE_H__ */
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 4180f3b..41683f4 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -807,6 +807,10 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_tmem_op:
         return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 
+    case XEN_SYSCTL_xsplice_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__XSPLICE_OP, NULL);
+
     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
         return -EPERM;
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index effb59f..5f08d05 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -93,6 +93,8 @@ class xen2
     pmu_ctrl
 # PMU use (domains, including unprivileged ones, will be using this operation)
     pmu_use
+# XEN_SYSCTL_xsplice_op
+    xsplice_op
 }
 
 # Classes domain and domain2 consist of operations that a domain performs on
-- 
2.4.3

* [PATCH v1 03/11] libxc: Implementation of XEN_XSPLICE_op in libxc.
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
  2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-03 18:16 ` [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads Ross Lagerwall
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, Ross Lagerwall

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

The underlying toolstack code to do the basic
operations when using the XEN_XSPLICE_op sysctl:
 - upload the payload,
 - get status of a payload,
 - list all the payloads,
 - apply, check, replace, and revert the payload.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v2: Actually set zero for the _pad entries.
v3: Split status into state and error code.
    Add REPLACE action.
---
 tools/libxc/include/xenctrl.h |  18 +++
 tools/libxc/xc_misc.c         | 283 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 301 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2fec1fb..b6e2fb2 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2859,6 +2859,24 @@ int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket,
                            bool *cdp_enabled);
 #endif
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *id, char *payload, uint32_t size);
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *id,
+                   xen_xsplice_status_t *status);
+
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info, char *id,
+                    uint32_t *len, unsigned int *done,
+                    unsigned int *left);
+
+int xc_xsplice_apply(xc_interface *xch, char *id);
+int xc_xsplice_revert(xc_interface *xch, char *id);
+int xc_xsplice_unload(xc_interface *xch, char *id);
+int xc_xsplice_check(xc_interface *xch, char *id);
+int xc_xsplice_replace(xc_interface *xch, char *id);
+
 #endif /* XENCTRL_H */
 
 /*
diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c
index c613545..6e6ffc0 100644
--- a/tools/libxc/xc_misc.c
+++ b/tools/libxc/xc_misc.c
@@ -718,6 +718,289 @@ int xc_hvm_inject_trap(
     return rc;
 }
 
+int xc_xsplice_upload(xc_interface *xch,
+                      char *id,
+                      char *payload,
+                      uint32_t size)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(payload, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    if ( !id || !payload )
+        return -1;
+
+    len = strlen(id);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    if ( xc_hypercall_bounce_pre(xch, payload) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_UPLOAD;
+    sysctl.u.xsplice.u.upload.size = size;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.payload, payload);
+
+    sysctl.u.xsplice.u.upload.id.size = len;
+    sysctl.u.xsplice.u.upload.id.pad = 0;
+    set_xen_guest_handle(sysctl.u.xsplice.u.upload.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, payload);
+    xc_hypercall_bounce_post(xch, id);
+
+    return rc;
+}
+
+int xc_xsplice_get(xc_interface *xch,
+                   char *id,
+                   xen_xsplice_status_t *status)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /*adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    if ( !id )
+        return -1;
+
+    len = strlen(id);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_GET;
+
+    sysctl.u.xsplice.u.get.status.state = 0;
+    sysctl.u.xsplice.u.get.status.rc = 0;
+
+    sysctl.u.xsplice.u.get.id.size = len;
+    sysctl.u.xsplice.u.get.id.pad = 0;
+    set_xen_guest_handle(sysctl.u.xsplice.u.get.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, id);
+
+    memcpy(status, &sysctl.u.xsplice.u.get.status, sizeof(*status));
+
+    return rc;
+}
+
+int xc_xsplice_list(xc_interface *xch, unsigned int max, unsigned int start,
+                    xen_xsplice_status_t *info,
+                    char *id, uint32_t *len,
+                    unsigned int *done,
+                    unsigned int *left)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(info, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    DECLARE_HYPERCALL_BOUNCE(len, 0 /* adjust later. */, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+    uint32_t max_batch_sz, nr;
+    uint32_t version = 0, retries = 0;
+    uint32_t adjust = 0;
+
+    if ( !max || !info || !id || !len )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_LIST;
+    sysctl.u.xsplice.u.list.version = 0;
+    sysctl.u.xsplice.u.list.idx = start;
+    sysctl.u.xsplice.u.list.pad = 0;
+
+    max_batch_sz = max;
+
+    *done = 0;
+    *left = 0;
+    do {
+        if ( adjust )
+            adjust = 0; /* Used when adjusting the 'max_batch_sz' or 'retries'. */
+
+        nr = min(max - *done, max_batch_sz);
+
+        sysctl.u.xsplice.u.list.nr = nr;
+        /* Fix the size (may vary between hypercalls). */
+        HYPERCALL_BOUNCE_SET_SIZE(info, nr * sizeof(*info));
+        HYPERCALL_BOUNCE_SET_SIZE(id, nr * sizeof(*id) * XEN_XSPLICE_NAME_SIZE);
+        HYPERCALL_BOUNCE_SET_SIZE(len, nr * sizeof(*len));
+        /* Move the pointer to proper offset into 'info'. */
+        (HYPERCALL_BUFFER(info))->ubuf = info + *done;
+        (HYPERCALL_BUFFER(id))->ubuf = id + (sizeof(*id) * XEN_XSPLICE_NAME_SIZE * *done);
+        (HYPERCALL_BUFFER(len))->ubuf = len + *done;
+        /* Allocate memory. */
+        rc = xc_hypercall_bounce_pre(xch, info);
+        if ( rc )
+            return rc;
+
+        rc = xc_hypercall_bounce_pre(xch, id);
+        if ( rc )
+        {
+            xc_hypercall_bounce_post(xch, info);
+            return rc;
+        }
+        rc = xc_hypercall_bounce_pre(xch, len);
+        if ( rc )
+        {
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            return rc;
+        }
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.status, info);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.id, id);
+        set_xen_guest_handle(sysctl.u.xsplice.u.list.len, len);
+
+        rc = do_sysctl(xch, &sysctl);
+        /*
+         * From here on we MUST call xc_hypercall_bounce_post. If rc < 0 we
+         * end up doing it (outside the loop), so using a break is OK.
+         */
+        if ( rc < 0 && errno == E2BIG )
+        {
+            if ( max_batch_sz <= 1 )
+                break;
+            max_batch_sz >>= 1;
+            adjust = 1; /* For the loop conditional to let us loop again. */
+            /* No memory leaks! */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+        else if ( rc < 0 ) /* For all other errors we bail out. */
+            break;
+
+        if ( !version )
+            version = sysctl.u.xsplice.u.list.version;
+
+        if ( sysctl.u.xsplice.u.list.version != version )
+        {
+            /* TODO: retries should be configurable? */
+            if ( retries++ > 3 )
+            {
+                rc = -1;
+                errno = EBUSY;
+                break;
+            }
+            *done = 0; /* Retry from scratch. */
+            version = sysctl.u.xsplice.u.list.version;
+            adjust = 1; /* And make sure we continue in the loop. */
+            /* No memory leaks. */
+            xc_hypercall_bounce_post(xch, info);
+            xc_hypercall_bounce_post(xch, id);
+            xc_hypercall_bounce_post(xch, len);
+            continue;
+        }
+
+        /* We should never hit this, but just in case. */
+        if ( rc > nr )
+        {
+            errno = EINVAL; /* Overflow! */
+            rc = -1;
+            break;
+        }
+        *left = sysctl.u.xsplice.u.list.nr; /* Total remaining count. */
+        /* Copy only up to 'rc' entries - we could add 'min(rc,nr)' if desired. */
+        HYPERCALL_BOUNCE_SET_SIZE(info, (rc * sizeof(*info)));
+        HYPERCALL_BOUNCE_SET_SIZE(id, (rc * sizeof(*id) * XEN_XSPLICE_NAME_SIZE));
+        HYPERCALL_BOUNCE_SET_SIZE(len, (rc * sizeof(*len)));
+        /* Bounce the data and free the bounce buffer. */
+        xc_hypercall_bounce_post(xch, info);
+        xc_hypercall_bounce_post(xch, id);
+        xc_hypercall_bounce_post(xch, len);
+        /* And update how many elements of info we have copied into. */
+        *done += rc;
+        /* Update idx. */
+        sysctl.u.xsplice.u.list.idx = *done;
+    } while ( adjust || (*done < max && *left != 0) );
+
+    if ( rc < 0 )
+    {
+        xc_hypercall_bounce_post(xch, len);
+        xc_hypercall_bounce_post(xch, id);
+        xc_hypercall_bounce_post(xch, info);
+    }
+
+    return rc > 0 ? 0 : rc;
+}
+
+static int _xc_xsplice_action(xc_interface *xch,
+                              char *id,
+                              unsigned int action)
+{
+    int rc;
+    DECLARE_SYSCTL;
+    DECLARE_HYPERCALL_BOUNCE(id, 0 /* adjust later */, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+    ssize_t len;
+
+    len = strlen(id);
+
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+        return -1;
+
+    HYPERCALL_BOUNCE_SET_SIZE(id, len);
+
+    if ( xc_hypercall_bounce_pre(xch, id) )
+        return -1;
+
+    sysctl.cmd = XEN_SYSCTL_xsplice_op;
+    sysctl.u.xsplice.cmd = XEN_SYSCTL_XSPLICE_ACTION;
+    sysctl.u.xsplice.u.action.cmd = action;
+    sysctl.u.xsplice.u.action.pad = 0;
+    sysctl.u.xsplice.u.action.time = 0; /* TODO */
+
+    sysctl.u.xsplice.u.action.id.size = len;
+    sysctl.u.xsplice.u.action.id.pad = 0;
+    set_xen_guest_handle(sysctl.u.xsplice.u.action.id.name, id);
+
+    rc = do_sysctl(xch, &sysctl);
+
+    xc_hypercall_bounce_post(xch, id);
+
+    return rc;
+}
+
+int xc_xsplice_apply(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_APPLY);
+}
+
+int xc_xsplice_revert(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_REVERT);
+}
+
+int xc_xsplice_unload(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_UNLOAD);
+}
+
+int xc_xsplice_check(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_CHECK);
+}
+
+int xc_xsplice_replace(xc_interface *xch, char *id)
+{
+    return _xc_xsplice_action(xch, id, XSPLICE_ACTION_REPLACE);
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.4.3

* [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads.
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
  2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
  2015-11-03 18:16 ` [PATCH v1 03/11] libxc: Implementation of XEN_XSPLICE_op in libxc Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-04 21:27   ` Konrad Rzeszutek Wilk
  2015-11-03 18:16 ` [PATCH v1 05/11] elf: Add relocation types to elfstructs.h Ross Lagerwall
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson, Ross Lagerwall

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

A simple tool that allows a system admin to perform
basic xsplice operations:

 - Upload an xsplice file (with a unique id).
 - List all the xsplice payloads loaded.
 - Apply, revert, replace, unload, or check the payload using the
   unique id.
 - Do all three - upload, check, and apply the
   payload in one go (all).
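
The tool waits for an action to finish by polling the payload status a
bounded number of times. A minimal sketch of that idea follows; it is an
illustration under assumptions, not the code in this patch. The names
poll_state(), get_state_fn, fake_payload and fake_get() are hypothetical,
and the status callback stands in for xc_xsplice_get().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical status source: returns 0 and fills *state, or a negative
 * value on failure.
 */
typedef int (*get_state_fn)(void *ctx, int *state);

enum poll_result { POLL_CHANGED, POLL_TIMEOUT, POLL_ERROR };

/*
 * Poll until the state differs from 'original' or 'retries' polls have
 * been made. 'wait' is an optional delay hook called between polls
 * (for example a wrapper around usleep()); it may be NULL.
 */
static enum poll_result poll_state(get_state_fn get, void *ctx, int original,
                                   unsigned int retries, void (*wait)(void),
                                   int *state)
{
    unsigned int i;

    assert(get && state);
    for ( i = 0; i < retries; i++ )
    {
        if ( get(ctx, state) < 0 )
            return POLL_ERROR;
        if ( *state != original )
            return POLL_CHANGED;
        if ( wait )
            wait();
    }
    return POLL_TIMEOUT;
}

/*
 * Example source for illustration only: reports state 1 until
 * 'flip_after' polls have been made, then state 2.
 */
struct fake_payload {
    unsigned int polls;
    unsigned int flip_after;
};

static int fake_get(void *ctx, int *state)
{
    struct fake_payload *p = ctx;

    *state = p->polls++ < p->flip_after ? 1 : 2;
    return 0;
}
```

Returning a separate result code keeps a timeout distinguishable from a
failed status query, which a single int return value would blur.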

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
v2:
 - Removed REVERTED state.
 - Fixed bugs handling XSPLICE_STATUS_PROGRESS.
 - Split status into state and error.
   Add REPLACE action.
---
 .gitignore               |   1 +
 tools/misc/Makefile      |   4 +
 tools/misc/xen-xsplice.c | 439 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 444 insertions(+)
 create mode 100644 tools/misc/xen-xsplice.c

diff --git a/.gitignore b/.gitignore
index 91e1430..02d5c4a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -168,6 +168,7 @@ tools/misc/xc_shadow
 tools/misc/xen_cpuperf
 tools/misc/xen-detect
 tools/misc/xen-tmem-list-parse
+tools/misc/xen-xsplice
 tools/misc/xenperf
 tools/misc/xenpm
 tools/misc/xen-hvmctx
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index c4490f3..c46873e 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -30,6 +30,7 @@ INSTALL_SBIN                   += xenlockprof
 INSTALL_SBIN                   += xenperf
 INSTALL_SBIN                   += xenpm
 INSTALL_SBIN                   += xenwatchdogd
+INSTALL_SBIN                   += xen-xsplice
 INSTALL_SBIN += $(INSTALL_SBIN-y)
 
 # Everything to be installed in a private bin/
@@ -98,6 +99,9 @@ xen-mfndump: xen-mfndump.o
 xenwatchdogd: xenwatchdogd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-xsplice: xen-xsplice.o
+	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
+
 xen-lowmemd: xen-lowmemd.o
 	$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(LDLIBS_libxenstore) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-xsplice.c b/tools/misc/xen-xsplice.c
new file mode 100644
index 0000000..a208aa4
--- /dev/null
+++ b/tools/misc/xen-xsplice.c
@@ -0,0 +1,439 @@
+/*
+ * Copyright (c) 2015 Oracle and/or its affiliates. All rights reserved.
+ */
+#include <xenctrl.h>
+#include <xenstore.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+static xc_interface *xch;
+
+void show_help(void)
+{
+    fprintf(stderr,
+            "xen-xsplice: Xsplice test tool\n"
+            "Usage: xen-xsplice <command> [args]\n"
+            " <id> A unique name of payload. Up to %d characters.\n"
+            "Commands:\n"
+            "  help                 display this help\n"
+            "  upload <id> <file>   upload file <file> with <id> name\n"
+            "  list                 list payloads uploaded.\n"
+            "  apply <id>           apply <id> patch.\n"
+            "  revert <id>          revert id <id> patch.\n"
+            "  replace <id>         apply <id> patch and revert all others.\n"
+            "  unload <id>          unload id <id> patch.\n"
+            "  check <id>           check id <id> patch.\n"
+            "  all <id> <file>      upload, check and apply <file>.\n",
+            XEN_XSPLICE_NAME_SIZE);
+}
+
+/* wrapper function */
+static int help_func(int argc, char *argv[])
+{
+    show_help();
+    return 0;
+}
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))
+
+static const char *state2str(long state)
+{
+#define STATE(x) [XSPLICE_STATE_##x] = #x
+    static const char *const names[] = {
+            STATE(LOADED),
+            STATE(CHECKED),
+            STATE(APPLIED),
+    };
+#undef STATE
+    if (state < 0) /* Test first: the unsigned compare below hides negatives. */
+        return "-EXX";
+
+    if (state >= ARRAY_SIZE(names))
+        return "unknown";
+
+    if (!names[state])
+        return "unknown";
+
+    return names[state];
+}
+
+#define MAX_LEN 11
+static int list_func(int argc, char *argv[])
+{
+    unsigned int idx, done, left, i;
+    xen_xsplice_status_t *info = NULL;
+    char *id = NULL;
+    uint32_t *len = NULL;
+    int rc = ENOMEM;
+
+    if ( argc )
+    {
+        show_help();
+        return -1;
+    }
+    idx = left = 0;
+    info = malloc(sizeof(*info) * MAX_LEN);
+    if ( !info )
+        goto out;
+    id = malloc(sizeof(*id) * XEN_XSPLICE_NAME_SIZE * MAX_LEN);
+    if ( !id )
+        goto out;
+    len = malloc(sizeof(*len) * MAX_LEN);
+    if ( !len )
+        goto out;
+
+    fprintf(stdout," ID                                     | status\n"
+                   "----------------------------------------+------------\n");
+    do {
+        done = 0;
+        memset(info, 'A', sizeof(*info) * MAX_LEN); /* Optional. */
+        memset(id, 'i', sizeof(*id) * MAX_LEN * XEN_XSPLICE_NAME_SIZE); /* Optional. */
+        memset(len, 'l', sizeof(*len) * MAX_LEN); /* Optional. */
+        rc = xc_xsplice_list(xch, MAX_LEN, idx, info, id, len, &done, &left);
+        if ( rc )
+        {
+            fprintf(stderr, "Failed to list %u/%u: %d(%s)!\n", idx, left, errno, strerror(errno));
+            break;
+        }
+        for ( i = 0; i < done; i++ )
+        {
+            unsigned int j;
+            uint32_t sz;
+            char *str;
+
+            sz = len[i];
+            str = id + (i * XEN_XSPLICE_NAME_SIZE);
+            for ( j = sz; j < XEN_XSPLICE_NAME_SIZE; j++ )
+                str[j] = '\0';
+
+            printf("%-40s| %s", str, state2str(info[i].state));
+            if ( info[i].rc )
+                printf(" (%d, %s)\n", -info[i].rc, strerror(-info[i].rc));
+            else
+                puts("");
+        }
+        idx += done;
+    } while ( left );
+
+out:
+    free(id);
+    free(info);
+    free(len);
+    return rc;
+}
+#undef MAX_LEN
+
+static int get_id(int argc, char *argv[], char *id)
+{
+    ssize_t len = strlen(argv[0]);
+    if ( len > XEN_XSPLICE_NAME_SIZE )
+    {
+        fprintf(stderr, "ID MUST be at most %d characters!\n", XEN_XSPLICE_NAME_SIZE);
+        errno = EINVAL;
+        return errno;
+    }
+    /* Don't want any funny strings from the stack. */
+    memset(id, 0, XEN_XSPLICE_NAME_SIZE);
+    strncpy(id, argv[0], len);
+    return 0;
+}
+
+static int upload_func(int argc, char *argv[])
+{
+    char *filename;
+    char id[XEN_XSPLICE_NAME_SIZE];
+    int fd = 0, rc;
+    struct stat buf;
+    unsigned char *fbuf;
+    ssize_t len;
+    DECLARE_HYPERCALL_BUFFER(char, payload);
+
+    if ( argc != 2 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( get_id(argc, argv, id) )
+        return EINVAL;
+
+    filename = argv[1];
+    fd = open(filename, O_RDONLY);
+    if ( fd < 0 )
+    {
+        fprintf(stderr, "Could not open %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        return errno;
+    }
+    if ( stat(filename, &buf) != 0 )
+    {
+        fprintf(stderr, "Could not get right size %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close(fd);
+        return errno;
+    }
+
+    len = buf.st_size;
+    fbuf = mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
+    if ( fbuf == MAP_FAILED )
+    {
+        fprintf(stderr,"Could not map: %s, error: %d(%s)\n",
+                filename, errno, strerror(errno));
+        close (fd);
+        return errno;
+    }
+    printf("Uploading %s (%zd bytes)\n", filename, len);
+    payload = xc_hypercall_buffer_alloc(xch, payload, len);
+    memcpy(payload, fbuf, len);
+
+    rc = xc_xsplice_upload(xch, id, payload, len);
+    if ( rc )
+    {
+        fprintf(stderr, "Upload failed: %s, error: %d(%s)!\n",
+                filename, errno, strerror(errno));
+        goto out;
+    }
+    xc_hypercall_buffer_free(xch, payload);
+
+out:
+    if ( munmap( fbuf, len) )
+    {
+        fprintf(stderr, "Could not unmap!? error: %d(%s)!\n",
+                errno, strerror(errno));
+        rc = errno;
+    }
+    close(fd);
+
+    return rc;
+}
+
+enum {
+    ACTION_APPLY = 0,
+    ACTION_REVERT = 1,
+    ACTION_UNLOAD = 2,
+    ACTION_CHECK = 3
+};
+
+struct {
+    int allow; /* State it must be in to call function. */
+    int expected; /* The state to be in after the function. */
+    const char *name;
+    int (*function)(xc_interface *xch, char *id);
+    unsigned int executed; /* Has the function been called?. */
+} action_options[] = {
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "apply",
+        .function = xc_xsplice_apply,
+    },
+    {   .allow = XSPLICE_STATE_APPLIED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "revert",
+        .function = xc_xsplice_revert,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
+        .expected = -ENOENT,
+        .name = "unload",
+        .function = xc_xsplice_unload,
+    },
+    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
+        .expected = XSPLICE_STATE_CHECKED,
+        .name = "check",
+        .function = xc_xsplice_check
+    },
+    {   .allow = XSPLICE_STATE_CHECKED,
+        .expected = XSPLICE_STATE_APPLIED,
+        .name = "replace",
+        .function = xc_xsplice_replace,
+    },
+};
+
+#define RETRIES 300
+#define DELAY 100000
+
+int action_func(int argc, char *argv[], unsigned int idx)
+{
+    char id[XEN_XSPLICE_NAME_SIZE];
+    int rc, original_state;
+    xen_xsplice_status_t status;
+    unsigned int retry = 0;
+
+    if ( argc != 1 )
+    {
+        show_help();
+        return -1;
+    }
+
+    if ( idx >= ARRAY_SIZE(action_options) )
+        return -1;
+
+    if ( get_id(argc, argv, id) )
+        return EINVAL;
+
+    /* Check initial status. */
+    rc = xc_xsplice_get(xch, id, &status);
+    if ( rc )
+        goto err;
+
+    if ( status.rc == -EAGAIN )
+    {
+        printf("%s failed. Operation already in progress\n", id);
+        return -1;
+    }
+
+    if ( status.state == action_options[idx].expected )
+    {
+        printf("No action needed\n");
+        return 0;
+    }
+
+    /* Perform action. */
+    if ( action_options[idx].allow & status.state )
+    {
+        printf("Performing %s:", action_options[idx].name);
+        rc = action_options[idx].function(xch, id);
+        if ( rc )
+            goto err;
+    }
+    else
+    {
+        printf("%s: in wrong state (%s), expected (%s)\n",
+               id, state2str(status.state),
+               state2str(action_options[idx].expected));
+        return -1;
+    }
+
+    original_state = status.state;
+    do {
+        rc = xc_xsplice_get(xch, id, &status);
+        if ( rc )
+        {
+            rc = -errno;
+            break;
+        }
+
+        if ( status.state != original_state )
+            break;
+        if ( status.rc && status.rc != -EAGAIN )
+        {
+            rc = status.rc;
+            break;
+        }
+
+        printf(".");
+        fflush(stdout);
+        usleep(DELAY);
+    } while ( ++retry < RETRIES );
+
+    if ( retry >= RETRIES )
+    {
+        printf("%s: Operation didn't complete after 30 seconds.\n", id);
+        return -1;
+    }
+    else
+    {
+        if ( rc == 0 )
+            rc = status.state;
+
+        if ( action_options[idx].expected == rc )
+            printf(" completed\n");
+        else if ( rc < 0 )
+        {
+            printf("%s failed with %d(%s)\n", id, -rc, strerror(-rc));
+            return -1;
+        }
+        else
+        {
+            printf("%s: in wrong state (%s), expected (%s)\n",
+               id, state2str(rc),
+               state2str(action_options[idx].expected));
+            return -1;
+        }
+    }
+
+    return 0;
+
+ err:
+    printf("%s failed with %d(%s)\n", id, -rc, strerror(-rc));
+    return rc;
+}
+static int all_func(int argc, char *argv[])
+{
+    int rc;
+
+    rc = upload_func(argc, argv);
+    if ( rc )
+        return rc;
+
+    rc = action_func(1 /* only id */, argv, ACTION_CHECK);
+    if ( rc )
+        goto unload;
+
+    rc = action_func(1 /* only id */, argv, ACTION_APPLY);
+    if ( rc )
+        goto unload;
+
+    return 0;
+unload:
+    action_func(1 /* only id */, argv, ACTION_UNLOAD);
+    return rc;
+}
+
+struct {
+    const char *name;
+    int (*function)(int argc, char *argv[]);
+} main_options[] = {
+    { "help", help_func },
+    { "list", list_func },
+    { "upload", upload_func },
+    { "all", all_func },
+};
+
+int main(int argc, char *argv[])
+{
+    int i, j, ret;
+
+    if ( argc  <= 1 )
+    {
+        show_help();
+        return 0;
+    }
+    for ( i = 0; i < ARRAY_SIZE(main_options); i++ )
+        if (!strncmp(main_options[i].name, argv[1], strlen(argv[1])))
+            break;
+
+    if ( i == ARRAY_SIZE(main_options) )
+    {
+        for ( j = 0; j < ARRAY_SIZE(action_options); j++ )
+            if (!strncmp(action_options[j].name, argv[1], strlen(argv[1])))
+                break;
+
+        if ( j == ARRAY_SIZE(action_options) )
+        {
+            fprintf(stderr, "Unrecognised command '%s' -- try "
+                   "'xen-xsplice help'\n", argv[1]);
+            return 1;
+        }
+    } else
+        j = ARRAY_SIZE(action_options);
+
+    xch = xc_interface_open(0,0,0);
+    if ( !xch )
+    {
+        fprintf(stderr, "failed to get the handler\n");
+        return 1;
+    }
+
+    if ( i == ARRAY_SIZE(main_options) )
+        ret = action_func(argc -2, argv + 2, j);
+    else
+        ret = main_options[i].function(argc -2, argv + 2);
+
+    xc_interface_close(xch);
+
+    return !!ret;
+}
-- 
2.4.3

* [PATCH v1 05/11] elf: Add relocation types to elfstructs.h
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (2 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-05 10:38   ` Jan Beulich
  2015-11-03 18:16 ` [PATCH v1 06/11] xsplice: Add helper elf routines Ross Lagerwall
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Ian Jackson, Ian Campbell, Jan Beulich, Tim Deegan

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/include/xen/elfstructs.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/xen/elfstructs.h b/xen/include/xen/elfstructs.h
index 12ffb82..5ca2956 100644
--- a/xen/include/xen/elfstructs.h
+++ b/xen/include/xen/elfstructs.h
@@ -348,6 +348,27 @@ typedef struct {
 #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
 #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
 
+/* x86-64 relocation types */
+#define R_X86_64_NONE		0	/* No reloc */
+#define R_X86_64_64		1	/* Direct 64 bit  */
+#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
+#define R_X86_64_GOT32		3	/* 32 bit GOT entry */
+#define R_X86_64_PLT32		4	/* 32 bit PLT address */
+#define R_X86_64_COPY		5	/* Copy symbol at runtime */
+#define R_X86_64_GLOB_DAT	6	/* Create GOT entry */
+#define R_X86_64_JUMP_SLOT	7	/* Create PLT entry */
+#define R_X86_64_RELATIVE	8	/* Adjust by program base */
+#define R_X86_64_GOTPCREL	9	/* 32 bit signed pc relative
+					   offset to GOT */
+#define R_X86_64_32		10	/* Direct 32 bit zero extended */
+#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
+#define R_X86_64_16		12	/* Direct 16 bit zero extended */
+#define R_X86_64_PC16		13	/* 16 bit sign extended pc relative */
+#define R_X86_64_8		14	/* Direct 8 bit sign extended  */
+#define R_X86_64_PC8		15	/* 8 bit sign extended pc relative */
+
+#define R_X86_64_NUM		16
+
 /* Program Header */
 typedef struct {
 	Elf32_Word	p_type;		/* segment type */
-- 
2.4.3

* [PATCH v1 06/11] xsplice: Add helper elf routines
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (3 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 05/11] elf: Add relocation types to elfstructs.h Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-04 21:49   ` Konrad Rzeszutek Wilk
  2015-11-03 18:16 ` [PATCH v1 07/11] xsplice: Implement payload loading Ross Lagerwall
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Ian Jackson, Ian Campbell, Jan Beulich, Tim Deegan

Add some elf routines and data structures in preparation for loading an
xsplice payload.
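
For readers unfamiliar with how section names are resolved, the following
is a small sketch, under stated assumptions, of turning a name offset into
a string from a section string table with bounds checking, and then
finding a section by name. The sec structure and the strtab_name() and
sec_by_name() helpers are hypothetical and simplified; they are not the
structures or routines added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Resolve a name offset into a string table of 'size' bytes.
 * Returns NULL if the offset is out of range or the string is not
 * NUL-terminated inside the table.
 */
static const char *strtab_name(const char *strtab, size_t size, size_t off)
{
    if ( off >= size || !memchr(strtab + off, '\0', size - off) )
        return NULL;
    return strtab + off;
}

/* Minimal, hypothetical view of a parsed section. */
struct sec {
    const char *name;   /* Resolved via strtab_name(), may be NULL. */
    const void *data;   /* Start of the section contents. */
};

/* Linear search for a section by name; returns NULL if there is none. */
static const struct sec *sec_by_name(const struct sec *secs, size_t n,
                                     const char *name)
{
    size_t i;

    assert(secs && name);
    for ( i = 0; i < n; i++ )
        if ( secs[i].name && !strcmp(secs[i].name, name) )
            return &secs[i];
    return NULL;
}
```

Validating the offset before using the name matters because the payload
is supplied by the toolstack and its headers cannot be trusted blindly.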

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/common/Makefile           |   1 +
 xen/common/xsplice_elf.c      | 122 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/xsplice_elf.h |  44 +++++++++++++++
 3 files changed, 167 insertions(+)
 create mode 100644 xen/common/xsplice_elf.c
 create mode 100644 xen/include/xen/xsplice_elf.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 1b17c9d..de7c08a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -57,6 +57,7 @@ obj-y += vsprintf.o
 obj-y += wait.o
 obj-y += xmalloc_tlsf.o
 obj-y += xsplice.o
+obj-y += xsplice_elf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
 
diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
new file mode 100644
index 0000000..13a9229
--- /dev/null
+++ b/xen/common/xsplice_elf.c
@@ -0,0 +1,122 @@
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name)
+{
+    int i;
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( !strcmp(name, elf->sec[i].name) )
+            return &elf->sec[i];
+    }
+
+    return NULL;
+}
+
+static int elf_get_sections(struct xsplice_elf *elf, uint8_t *data)
+{
+    struct xsplice_elf_sec *sec;
+    int i;
+
+    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
+    if ( !sec )
+    {
+        printk(XENLOG_ERR "Could not allocate memory for section table\n");
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+#ifdef CONFIG_ARM_32
+        sec[i].sec = (Elf32_Shdr *)(data + elf->hdr->e_shoff +
+                                    i * elf->hdr->e_shentsize);
+#else
+        sec[i].sec = (Elf64_Shdr *)(data + elf->hdr->e_shoff +
+                                    i * elf->hdr->e_shentsize);
+#endif
+        sec[i].data = data + sec[i].sec->sh_offset;
+    }
+    elf->sec = sec;
+
+    return 0;
+}
+
+static int elf_get_sym(struct xsplice_elf *elf, uint8_t *data)
+{
+    struct xsplice_elf_sec *symtab, *strtab_sec;
+    struct xsplice_elf_sym *sym;
+    const char *strtab;
+    int i;
+
+    symtab = xsplice_elf_sec_by_name(elf, ".symtab");
+    if ( !symtab )
+    {
+        printk(XENLOG_ERR "Could not find symbol table\n");
+        return -EINVAL;
+    }
+
+    strtab_sec = xsplice_elf_sec_by_name(elf, ".strtab");
+    if ( !strtab_sec )
+    {
+        printk(XENLOG_ERR "Could not find string table\n");
+        return -EINVAL;
+    }
+    strtab = (const char *)(data + strtab_sec->sec->sh_offset);
+
+    elf->nsym = symtab->sec->sh_size / symtab->sec->sh_entsize;
+
+    sym = xmalloc_array(struct xsplice_elf_sym, elf->nsym);
+    if ( !sym )
+    {
+        printk(XENLOG_ERR "Could not allocate memory for symbols\n");
+        return -ENOMEM;
+    }
+
+    for ( i = 0; i < elf->nsym; i++ )
+    {
+#ifdef CONFIG_ARM_32
+        sym[i].sym = (Elf32_Sym *)(symtab->data + i * symtab->sec->sh_entsize);
+#else
+        sym[i].sym = (Elf64_Sym *)(symtab->data + i * symtab->sec->sh_entsize);
+#endif
+        sym[i].name = strtab + sym[i].sym->st_name;
+    }
+    elf->sym = sym;
+
+    return 0;
+}
+
+int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data, ssize_t len)
+{
+    const char *shstrtab;
+    int i, rc;
+
+#ifdef CONFIG_ARM_32
+    elf->hdr = (Elf32_Ehdr *)data;
+#else
+    elf->hdr = (Elf64_Ehdr *)data;
+#endif
+
+    rc = elf_get_sections(elf, data);
+    if ( rc )
+        return rc;
+
+    shstrtab = (const char *)(data + elf->sec[elf->hdr->e_shstrndx].sec->sh_offset);
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+        elf->sec[i].name = shstrtab + elf->sec[i].sec->sh_name;
+
+    rc = elf_get_sym(elf, data);
+    if ( rc )
+        return rc;
+
+    return 0;
+}
+
+void xsplice_elf_free(struct xsplice_elf *elf)
+{
+    xfree(elf->sec);
+    xfree(elf->sym);
+}
diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
new file mode 100644
index 0000000..bac0053
--- /dev/null
+++ b/xen/include/xen/xsplice_elf.h
@@ -0,0 +1,44 @@
+#ifndef __XEN_XSPLICE_ELF_H__
+#define __XEN_XSPLICE_ELF_H__
+
+#include <xen/types.h>
+#include <xen/elfstructs.h>
+
+/* The following describes an Elf file as consumed by xsplice. */
+struct xsplice_elf_sec {
+#ifdef CONFIG_ARM_32
+    Elf32_Shdr *sec;
+#else
+    Elf64_Shdr *sec;
+#endif
+    const char *name;
+    const uint8_t *data;           /* A pointer to the data section */
+    uint8_t *load_addr;            /* A pointer to the allocated destination */
+};
+
+struct xsplice_elf_sym {
+#ifdef CONFIG_ARM_32
+    Elf32_Sym *sym;
+#else
+    Elf64_Sym *sym;
+#endif
+    const char *name;
+};
+
+struct xsplice_elf {
+#ifdef CONFIG_ARM_32
+    Elf32_Ehdr *hdr;
+#else
+    Elf64_Ehdr *hdr;
+#endif
+    struct xsplice_elf_sec *sec;   /* Array of sections */
+    struct xsplice_elf_sym *sym;   /* Array of symbols */
+    int nsym;
+};
+
+struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
+                                                const char *name);
+int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data, ssize_t len);
+void xsplice_elf_free(struct xsplice_elf *elf);
+
+#endif /* __XEN_XSPLICE_ELF_H__ */
-- 
2.4.3


* [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (4 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 06/11] xsplice: Add helper elf routines Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-04 22:21   ` Konrad Rzeszutek Wilk
  2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Stefano Stabellini, Ian Campbell, Jan Beulich,
	Andrew Cooper

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the elf file.
- Parse the elf file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region.
- Resolve section symbols. All other symbols must be absolute addresses.
- Perform relocations.
- Process xsplice specific sections.
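
The relocation step can be sketched as follows for the PC-relative case.
This is a minimal illustration, not the code in this patch:
demo_reloc_pc32() and its parameters are hypothetical names, and the
range check is an addition for the sake of the example (the PC32 case in
the patch stores the value without one).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: compute S + A - P for a 32-bit PC-relative
 * relocation and store it at dest.
 *
 *   dest       where the 4-byte field lives in the buffer being patched
 *   dest_addr  address that field will have at run time (P)
 *   sym_value  resolved address of the symbol (S)
 *   addend     r_addend from the relocation entry (A)
 *
 * Returns 0 on success, -1 if the displacement does not fit in 32 bits.
 */
static int demo_reloc_pc32(uint8_t *dest, uint64_t dest_addr,
                           uint64_t sym_value, int64_t addend)
{
    int64_t val = (int64_t)(sym_value + (uint64_t)addend - dest_addr);
    int32_t field;

    if ( val < INT32_MIN || val > INT32_MAX )
        return -1;

    field = (int32_t)val;
    memcpy(dest, &field, sizeof(field));

    return 0;
}
```

Using memcpy() for the store avoids assuming that dest is suitably
aligned for a 32-bit access.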

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/arm/Makefile             |   1 +
 xen/arch/arm/xsplice.c            |  23 ++++
 xen/arch/x86/Makefile             |   1 +
 xen/arch/x86/setup.c              |   7 +
 xen/arch/x86/xsplice.c            |  90 ++++++++++++
 xen/common/xsplice.c              | 282 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/x86_64/page.h |   2 +
 xen/include/xen/xsplice.h         |  22 +++
 8 files changed, 428 insertions(+)
 create mode 100644 xen/arch/arm/xsplice.c
 create mode 100644 xen/arch/x86/xsplice.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 1ef39f7..f785c07 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,6 +39,7 @@ obj-y += device.o
 obj-y += decode.o
 obj-y += processor.o
 obj-y += smc.o
+obj-y += xsplice.o
 
 #obj-bin-y += ....o
 
diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
new file mode 100644
index 0000000..8d85fa9
--- /dev/null
+++ b/xen/arch/arm/xsplice.c
@@ -0,0 +1,23 @@
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int xsplice_verify_elf(uint8_t *data, ssize_t len)
+{
+    return -ENOSYS;
+}
+
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
+
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela)
+{
+    return -ENOSYS;
+}
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 39a8059..6e05532 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -61,6 +61,7 @@ obj-y += x86_emulate.o
 obj-y += tboot.o
 obj-y += hpet.o
 obj-y += vm_event.o
+obj-y += xsplice.o
 obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4ed0110..a79c5e3 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -99,6 +99,9 @@ unsigned long __read_mostly xen_phys_start;
 
 unsigned long __read_mostly xen_virt_end;
 
+unsigned long __read_mostly module_virt_start;
+unsigned long __read_mostly module_virt_end;
+
 DEFINE_PER_CPU(struct tss_struct, init_tss);
 
 char __section(".bss.stack_aligned") cpu0_stack[STACK_SIZE];
@@ -1145,6 +1148,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                    ~((1UL << L2_PAGETABLE_SHIFT) - 1);
     destroy_xen_mappings(xen_virt_end, XEN_VIRT_START + BOOTSTRAP_MAP_BASE);
 
+    module_virt_start = xen_virt_end;
+    module_virt_end = XEN_VIRT_END - NR_CPUS * PAGE_SIZE;
+    BUG_ON(module_virt_end <= module_virt_start);
+
     memguard_init();
 
     nr_pages = 0;
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
new file mode 100644
index 0000000..dbff0d5
--- /dev/null
+++ b/xen/arch/x86/xsplice.c
@@ -0,0 +1,90 @@
+#include <xen/lib.h>
+#include <xen/errno.h>
+#include <xen/xsplice_elf.h>
+#include <xen/xsplice.h>
+
+int xsplice_verify_elf(uint8_t *data, ssize_t len)
+{
+
+    Elf64_Ehdr *hdr = (Elf64_Ehdr *)data;
+
+    if ( len < (sizeof *hdr) ||
+         !IS_ELF(*hdr) ||
+         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
+         hdr->e_machine != EM_X86_64 )
+    {
+        printk(XENLOG_ERR "Invalid ELF file\n");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela)
+{
+    printk(XENLOG_ERR "SHT_REL relocation unsupported\n");
+    return -ENOSYS;
+}
+
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela)
+{
+    Elf64_Rela *r;
+    int symndx, i;
+    uint64_t val;
+    uint8_t *dest;
+
+    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
+    {
+        r = (Elf64_Rela *)(rela->data + i * rela->sec->sh_entsize);
+        symndx = ELF64_R_SYM(r->r_info);
+        dest = base->load_addr + r->r_offset;
+        val = r->r_addend + elf->sym[symndx].sym->st_value;
+
+        switch ( ELF64_R_TYPE(r->r_info) )
+        {
+            case R_X86_64_NONE:
+                break;
+            case R_X86_64_64:
+                *(uint64_t *)dest = val;
+                break;
+            case R_X86_64_32:
+                *(uint32_t *)dest = val;
+                if (val != *(uint32_t *)dest)
+                    goto overflow;
+                break;
+            case R_X86_64_32S:
+                *(int32_t *)dest = val;
+                if ((int64_t)val != *(int32_t *)dest)
+                    goto overflow;
+                break;
+            case R_X86_64_PLT32:
+                /*
+                 * Xen uses -fpic which normally uses PLT relocations
+                 * except that it sets visibility to hidden which means
+                 * that they are not used.  However, when gcc cannot
+                 * inline memcpy it emits memcpy with default visibility
+                 * which then creates a PLT relocation.  It can just be
+                 * treated the same as R_X86_64_PC32.
+                 */
+                /* Fall through */
+            case R_X86_64_PC32:
+                *(uint32_t *)dest = val - (uint64_t)dest;
+                break;
+            default:
+                printk(XENLOG_ERR "Unhandled relocation %lu\n",
+                       ELF64_R_TYPE(r->r_info));
+                return -EINVAL;
+        }
+    }
+
+    return 0;
+
+ overflow:
+    printk(XENLOG_ERR "Overflow in relocation %d in %s\n", i, rela->name);
+    return -EOVERFLOW;
+}
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index d984c8a..5e88c55 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -12,6 +12,7 @@
 #include <xen/stdbool.h>
 #include <xen/sched.h>
 #include <xen/lib.h>
+#include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 #include <public/sysctl.h>
 
@@ -29,9 +30,15 @@ struct payload {
 
     struct list_head   list;   /* Linked to 'payload_list'. */
 
+    void *module_address;
+    size_t module_pages;
+
     char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
 };
 
+static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
+static void free_module(struct payload *payload);
+
 static const char *state2str(int32_t state)
 {
 #define STATE(x) [XSPLICE_STATE_##x] = #x
@@ -140,6 +147,7 @@ static void __free_payload(struct payload *data)
     list_del(&data->list);
     payload_cnt --;
     payload_version ++;
+    free_module(data);
     xfree(data);
 }
 
@@ -178,6 +186,10 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
     if ( copy_from_guest(raw_data, upload->payload, upload->size) )
         goto err_raw;
 
+    rc = load_module(data, raw_data, upload->size);
+    if ( rc )
+        goto err_raw;
+
     data->state = XSPLICE_STATE_LOADED;
     data->rc = 0;
     INIT_LIST_HEAD(&data->list);
@@ -390,6 +402,276 @@ int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
     return rc;
 }
 
+
+/*
+ * The following functions prepare an xSplice module to be executed by
+ * allocating space, loading the allocated sections, resolving symbols,
+ * performing relocations, etc.
+ */
+#ifdef CONFIG_X86
+static void *alloc_module(size_t size)
+{
+    mfn_t *mfn, *mfn_ptr;
+    size_t pages, i;
+    struct page_info *pg;
+    unsigned long hole_start, hole_end, cur;
+    struct payload *data, *data2;
+
+    ASSERT(size);
+
+    pages = PFN_UP(size);
+    mfn = xmalloc_array(mfn_t, pages);
+    if ( mfn == NULL )
+        return NULL;
+
+    for ( i = 0; i < pages; i++ )
+    {
+        pg = alloc_domheap_page(NULL, 0);
+        if ( pg == NULL )
+            goto error;
+        mfn[i] = _mfn(page_to_mfn(pg));
+    }
+
+    hole_start = (unsigned long)module_virt_start;
+    hole_end = hole_start + pages * PAGE_SIZE;
+    spin_lock(&payload_list_lock);
+    list_for_each_entry ( data, &payload_list, list )
+    {
+        list_for_each_entry ( data2, &payload_list, list )
+        {
+            unsigned long start, end;
+
+            start = (unsigned long)data2->module_address;
+            end = start + data2->module_pages * PAGE_SIZE;
+            if ( hole_end > start && hole_start < end )
+            {
+                hole_start = end;
+                hole_end = hole_start + pages * PAGE_SIZE;
+                break;
+            }
+        }
+        if ( &data2->list == &payload_list )
+            break;
+    }
+    spin_unlock(&payload_list_lock);
+
+    if ( hole_end >= module_virt_end )
+        goto error;
+
+    for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE )
+    {
+        if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) )
+        {
+            if ( cur != hole_start )
+                destroy_xen_mappings(hole_start, cur);
+            goto error;
+        }
+    }
+    xfree(mfn);
+    return (void *)hole_start;
+
+ error:
+    while ( i-- )
+        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
+    xfree(mfn);
+    return NULL;
+}
+#else
+static void *alloc_module(size_t size)
+{
+    return NULL;
+}
+#endif
+
+static void free_module(struct payload *payload)
+{
+    int i;
+    struct page_info *pg;
+    PAGE_LIST_HEAD(pg_list);
+    void *va = payload->module_address;
+    unsigned long addr = (unsigned long)va;
+
+    if ( !payload->module_address )
+        return;
+
+    payload->module_address = NULL;
+
+    for ( i = 0; i < payload->module_pages; i++ )
+        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
+
+    destroy_xen_mappings(addr, addr + payload->module_pages * PAGE_SIZE);
+
+    while ( (pg = page_list_remove_head(&pg_list)) != NULL )
+        free_domheap_page(pg);
+
+    payload->module_pages = 0;
+}
+
+static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)
+{
+    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
+    sec->sec->sh_entsize = align_size;
+    *core_size = sec->sec->sh_size + align_size;
+}
+
+static int move_module(struct payload *payload, struct xsplice_elf *elf)
+{
+    uint8_t *buf;
+    int i;
+    size_t core_size = 0;
+
+    /* Allocate text regions */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
+             (SHF_ALLOC|SHF_EXECINSTR) )
+            alloc_section(&elf->sec[i], &core_size);
+    }
+
+    /* Allocate rw data */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            alloc_section(&elf->sec[i], &core_size);
+    }
+
+    /* Allocate ro data */
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
+            alloc_section(&elf->sec[i], &core_size);
+    }
+
+    buf = alloc_module(core_size);
+    if ( !buf ) {
+        printk(XENLOG_ERR "Could not allocate memory for module\n");
+        return -ENOMEM;
+    }
+    memset(buf, 0, core_size);
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
+        {
+            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
+            memcpy(elf->sec[i].load_addr, elf->sec[i].data,
+                   elf->sec[i].sec->sh_size);
+            printk(XENLOG_DEBUG "Loaded %s at 0x%p\n",
+                   elf->sec[i].name, elf->sec[i].load_addr);
+        }
+    }
+
+    payload->module_address = buf;
+    payload->module_pages = PFN_UP(core_size);
+
+    return 0;
+}
+
+static int resolve_symbols(struct xsplice_elf *elf)
+{
+    int i;
+
+    for ( i = 1; i < elf->nsym; i++ )
+    {
+        switch ( elf->sym[i].sym->st_shndx )
+        {
+            case SHN_COMMON:
+                printk(XENLOG_ERR "Unexpected common symbol: %s\n",
+                       elf->sym[i].name);
+                return -EINVAL;
+                break;
+            case SHN_UNDEF:
+                printk(XENLOG_ERR "Unknown symbol: %s\n", elf->sym[i].name);
+                return -ENOENT;
+                break;
+            case SHN_ABS:
+                printk(XENLOG_DEBUG "Absolute symbol: %s => 0x%p\n",
+                       elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
+                break;
+            default:
+                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
+                {
+                    elf->sym[i].sym->st_value +=
+                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
+                    printk(XENLOG_DEBUG "Symbol resolved: %s => 0x%p\n",
+                           elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
+                }
+        }
+    }
+
+    return 0;
+}
+
+static int perform_relocs(struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *rela, *base;
+    int i, rc = 0;
+
+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
+    {
+        rela = &elf->sec[i];
+
+        /* Is it a valid relocation section? */
+        if ( rela->sec->sh_info >= elf->hdr->e_shnum )
+            continue;
+
+        base = &elf->sec[rela->sec->sh_info];
+
+        /* Don't relocate non-allocated sections */
+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
+            continue;
+
+        if ( elf->sec[i].sec->sh_type == SHT_RELA )
+            rc = xsplice_perform_rela(elf, base, rela);
+        else if ( elf->sec[i].sec->sh_type == SHT_REL )
+            rc = xsplice_perform_rel(elf, base, rela);
+
+        if ( rc )
+            return rc;
+    }
+
+    return 0;
+}
+
+static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
+{
+    struct xsplice_elf elf;
+    int rc = 0;
+
+    rc = xsplice_verify_elf(raw, len);
+    if ( rc )
+        return rc;
+
+    rc = xsplice_elf_load(&elf, raw, len);
+    if ( rc )
+        return rc;
+
+    rc = move_module(payload, &elf);
+    if ( rc )
+        goto err_elf;
+
+    rc = resolve_symbols(&elf);
+    if ( rc )
+        goto err_module;
+
+    rc = perform_relocs(&elf);
+    if ( rc )
+        goto err_module;
+
+    return 0;
+
+ err_module:
+    free_module(payload);
+ err_elf:
+    xsplice_elf_free(&elf);
+
+    return rc;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 19ab4d0..e6f08e9 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -38,6 +38,8 @@
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern unsigned long module_virt_start;
+extern unsigned long module_virt_end;
 
 #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
 #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 41e28da..a3946a3 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -1,9 +1,31 @@
 #ifndef __XEN_XSPLICE_H__
 #define __XEN_XSPLICE_H__
 
+struct xsplice_elf;
+struct xsplice_elf_sec;
+struct xsplice_elf_sym;
+
+struct xsplice_patch_func {
+    unsigned long new_addr;
+    unsigned long new_size;
+    unsigned long old_addr;
+    unsigned long old_size;
+    char *name;
+    unsigned char undo[8];
+};
+
 struct xen_sysctl_xsplice_op;
 int xsplice_control(struct xen_sysctl_xsplice_op *);
 
 extern void xsplice_printall(unsigned char key);
 
+/* Arch hooks */
+int xsplice_verify_elf(uint8_t *data, ssize_t len);
+int xsplice_perform_rel(struct xsplice_elf *elf,
+                        struct xsplice_elf_sec *base,
+                        struct xsplice_elf_sec *rela);
+int xsplice_perform_rela(struct xsplice_elf *elf,
+                         struct xsplice_elf_sec *base,
+                         struct xsplice_elf_sec *rela);
+
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.4.3


* [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (5 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 07/11] xsplice: Implement payload loading Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-05  3:17   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2015-11-03 18:16 ` [PATCH v1 09/11] xsplice: Add support for bug frames Ross Lagerwall
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Ian Campbell, Jun Nakajima, Andrew Cooper,
	Ross Lagerwall, Stefano Stabellini, Jan Beulich,
	Aravind Gopalakrishnan, Boris Ostrovsky, Suravee Suthikulpanit

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in all the
return-to-guest paths to check for work to do and execute it if needed.
In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to enter do_xsplice() becomes the master and
raises a reschedule softirq to make all the other CPUs enter
do_xsplice() with no stack.  Once all CPUs have rendezvoused, all CPUs
disable IRQs and NMIs are ignored.  The system is then quiescent and
the master performs the action.  After this, all CPUs enable IRQs and
NMIs are re-enabled.
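
The master election at the start of the rendezvous can be modelled as
below.  This is only a sketch under stated assumptions: it uses C11
atomics instead of Xen's atomic_t and spinlock, the demo_* names are
hypothetical, and it leaves out the IRQ disabling, NMI masking and
timeout handling described here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the rendezvous counter. */
struct demo_work {
    atomic_uint arrived;         /* CPUs that have entered so far */
    unsigned int total;          /* CPUs expected to take part */
};

/*
 * Called by each CPU on entry.  The fetch-and-add hands out a unique
 * ticket, so exactly one caller sees 0 and becomes the master.
 */
static bool demo_arrive(struct demo_work *w)
{
    return atomic_fetch_add(&w->arrived, 1) == 0;
}

/* The master polls this until every CPU has checked in. */
static bool demo_all_arrived(struct demo_work *w)
{
    return atomic_load(&w->arrived) == w->total;
}
```

A single atomic read-modify-write gives each CPU a distinct ticket
without a separate lock around a read followed by an increment.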

The action to perform is one of:
- APPLY: For each function in the module, store the first 5 bytes of the
  old function and replace it with a jump to the new function.
- REVERT: Copy the previously stored bytes into the first 5 bytes of the
  old function.
- REPLACE: Revert each applied module and then apply the new module.
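
The APPLY and REVERT steps can be sketched on a plain byte buffer as
follows.  struct demo_patch_func is a hypothetical stand-in for
struct xsplice_patch_func that separates the buffer being edited from
the address the code runs at, so the example can be exercised outside
the hypervisor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5        /* 1 opcode byte + 4-byte displacement */

/* Hypothetical stand-in for struct xsplice_patch_func. */
struct demo_patch_func {
    uint8_t *old_code;           /* Buffer holding the old function's bytes */
    uint64_t old_addr;           /* Address the old function runs at */
    uint64_t new_addr;           /* Address of the replacement function */
    uint8_t undo[8];             /* Saved bytes for REVERT */
};

static void demo_apply_jmp(struct demo_patch_func *f)
{
    /* The displacement is measured from the end of the 5-byte jump. */
    uint32_t rel = (uint32_t)(f->new_addr - f->old_addr - PATCH_INSN_SIZE);

    memcpy(f->undo, f->old_code, PATCH_INSN_SIZE);
    f->old_code[0] = 0xe9;       /* x86 JMP rel32 */
    memcpy(f->old_code + 1, &rel, sizeof(rel));
}

static void demo_revert_jmp(struct demo_patch_func *f)
{
    memcpy(f->old_code, f->undo, PATCH_INSN_SIZE);
}
```

Because the jump uses a 32-bit displacement, this scheme assumes the new
function lies within 2GiB of the old one.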

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out.  I've taken some
measurements and found the patch application to take about 100 μs on a
72 CPU system, whether idle or fully loaded.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/arm/xsplice.c      |   8 ++
 xen/arch/x86/domain.c       |   4 +
 xen/arch/x86/hvm/svm/svm.c  |   2 +
 xen/arch/x86/hvm/vmx/vmcs.c |   2 +
 xen/arch/x86/xsplice.c      |  19 ++++
 xen/common/xsplice.c        | 264 ++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/nmi.h   |  13 +++
 xen/include/xen/xsplice.h   |   7 +-
 8 files changed, 306 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
index 8d85fa9..3c34eb3 100644
--- a/xen/arch/arm/xsplice.c
+++ b/xen/arch/arm/xsplice.c
@@ -3,6 +3,14 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+void xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+}
+
+void xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+}
+
 int xsplice_verify_elf(uint8_t *data, ssize_t len)
 {
     return -ENOSYS;
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe3be30..4420cfc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -36,6 +36,7 @@
 #include <xen/cpu.h>
 #include <xen/wait.h>
 #include <xen/guest_access.h>
+#include <xen/xsplice.h>
 #include <public/sysctl.h>
 #include <asm/regs.h>
 #include <asm/mc146818rtc.h>
@@ -120,6 +121,7 @@ static void idle_loop(void)
         (*pm_idle)();
         do_tasklet();
         do_softirq();
+        do_xsplice();
     }
 }
 
@@ -136,6 +138,7 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
+    do_xsplice();
     reset_stack_and_jump(idle_loop);
 }
 
@@ -143,6 +146,7 @@ static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
     check_wakeup_from_wait();
     mark_regs_dirty(guest_cpu_user_regs());
+    do_xsplice();
     reset_stack_and_jump(ret_from_intr);
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 8de41fa..65bf7e9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -26,6 +26,7 @@
 #include <xen/hypercall.h>
 #include <xen/domain_page.h>
 #include <xen/xenoprof.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/io.h>
 #include <asm/paging.h>
@@ -1071,6 +1072,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
 
     hvm_do_resume(v);
 
+    do_xsplice();
     reset_stack_and_jump(svm_asm_do_resume);
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4ea1ad1..d996f47 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -25,6 +25,7 @@
 #include <xen/kernel.h>
 #include <xen/keyhandler.h>
 #include <xen/vm_event.h>
+#include <xen/xsplice.h>
 #include <asm/current.h>
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
@@ -1685,6 +1686,7 @@ void vmx_do_resume(struct vcpu *v)
     }
 
     hvm_do_resume(v);
+    do_xsplice();
     reset_stack_and_jump(vmx_asm_do_vmentry);
 }
 
diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
index dbff0d5..31e4124 100644
--- a/xen/arch/x86/xsplice.c
+++ b/xen/arch/x86/xsplice.c
@@ -3,6 +3,25 @@
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 
+#define PATCH_INSN_SIZE 5
+
+void xsplice_apply_jmp(struct xsplice_patch_func *func)
+{
+    uint32_t val;
+    uint8_t *old_ptr;
+
+    old_ptr = (uint8_t *)func->old_addr;
+    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
+    *old_ptr++ = 0xe9; /* Relative jump */
+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
+    memcpy(old_ptr, &val, sizeof val);
+}
+
+void xsplice_revert_jmp(struct xsplice_patch_func *func)
+{
+    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
+}
+
 int xsplice_verify_elf(uint8_t *data, ssize_t len)
 {
 
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 5e88c55..4476be5 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -11,16 +11,21 @@
 #include <xen/guest_access.h>
 #include <xen/stdbool.h>
 #include <xen/sched.h>
+#include <xen/softirq.h>
 #include <xen/lib.h>
+#include <xen/wait.h>
 #include <xen/xsplice_elf.h>
 #include <xen/xsplice.h>
 #include <public/sysctl.h>
 
 #include <asm/event.h>
+#include <asm/nmi.h>
 
 static DEFINE_SPINLOCK(payload_list_lock);
 static LIST_HEAD(payload_list);
 
+static LIST_HEAD(applied_list);
+
 static unsigned int payload_cnt;
 static unsigned int payload_version = 1;
 
@@ -29,15 +34,34 @@ struct payload {
     int32_t rc;         /* 0 or -EXX. */
 
     struct list_head   list;   /* Linked to 'payload_list'. */
+    struct list_head   applied_list;   /* Linked to 'applied_list'. */
 
+    struct xsplice_patch_func *funcs;
+    int nfuncs;
     void *module_address;
     size_t module_pages;
 
     char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
 };
 
+/* Defines an outstanding patching action. */
+struct xsplice_work
+{
+    atomic_t semaphore;          /* Used for rendezvous */
+    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled */
+    struct payload *data;        /* The payload on which to act */
+    volatile bool_t do_work;     /* Signals work to do */
+    volatile bool_t ready;       /* Signals all CPUs synchronized */
+    uint32_t cmd;                /* Action request. XSPLICE_ACTION_* */
+};
+
+static DEFINE_SPINLOCK(xsplice_work_lock);
+/* There can be only one outstanding patching action. */
+static struct xsplice_work xsplice_work;
+
 static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
 static void free_module(struct payload *payload);
+static int schedule_work(struct payload *data, uint32_t cmd);
 
 static const char *state2str(int32_t state)
 {
@@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
     case XSPLICE_ACTION_REVERT:
         if ( data->state == XSPLICE_STATE_APPLIED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd);
         }
         break;
     case XSPLICE_ACTION_APPLY:
         if ( (data->state == XSPLICE_STATE_CHECKED) )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_APPLIED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd);
         }
         break;
     case XSPLICE_ACTION_REPLACE:
         if ( data->state == XSPLICE_STATE_CHECKED )
         {
-            /* No implementation yet. */
-            data->state = XSPLICE_STATE_CHECKED;
-            data->rc = 0;
-            rc = 0;
+            data->rc = -EAGAIN;
+            rc = schedule_work(data, action->cmd);
         }
         break;
     default:
@@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf)
     return 0;
 }
 
+static int find_special_sections(struct payload *payload,
+                                 struct xsplice_elf *elf)
+{
+    struct xsplice_elf_sec *sec;
+
+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
+    if ( !sec )
+    {
+        printk(XENLOG_ERR ".xsplice.funcs is missing\n");
+        return -1;
+    }
+
+    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
+    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
+
+    return 0;
+}
+
 static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
 {
     struct xsplice_elf elf;
@@ -662,6 +698,10 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
     if ( rc )
         goto err_module;
 
+    rc = find_special_sections(payload, &elf);
+    if ( rc )
+        goto err_module;
+
     return 0;
 
  err_module:
@@ -672,6 +712,206 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
     return rc;
 }
 
+
+/*
+ * The following functions get the CPUs into an appropriate state and
+ * apply (or revert) each of the module's functions.
+ */
+
+/*
+ * This function is executed having all other CPUs with no stack and IRQs
+ * disabled.
+ */
+static int apply_payload(struct payload *data)
+{
+    int i;
+
+    printk(XENLOG_DEBUG "Applying payload: %s\n", data->id);
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        xsplice_apply_jmp(data->funcs + i);
+
+    list_add_tail(&data->applied_list, &applied_list);
+
+    return 0;
+}
+
+/*
+ * This function is executed having all other CPUs with no stack and IRQs
+ * disabled.
+ */
+static int revert_payload(struct payload *data)
+{
+    int i;
+
+    printk(XENLOG_DEBUG "Reverting payload: %s\n", data->id);
+
+    for ( i = 0; i < data->nfuncs; i++ )
+        xsplice_revert_jmp(data->funcs + i);
+
+    list_del(&data->applied_list);
+
+    return 0;
+}
+
+/* Must be holding the payload_list lock */
+static int schedule_work(struct payload *data, uint32_t cmd)
+{
+    /* Fail if an operation is already scheduled */
+    if ( xsplice_work.do_work )
+        return -EAGAIN;
+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    atomic_set(&xsplice_work.semaphore, 0);
+    atomic_set(&xsplice_work.irq_semaphore, 0);
+    xsplice_work.ready = false;
+    smp_mb();
+    xsplice_work.do_work = true;
+    smp_mb();
+
+    return 0;
+}
+
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    return 1;
+}
+
+static void reschedule_fn(void *unused)
+{
+    smp_mb(); /* Synchronize with setting do_work */
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+/*
+ * The main function which manages the work of quiescing the system and
+ * patching code.
+ */
+void do_xsplice(void)
+{
+    int id;
+    unsigned int total_cpus;
+    nmi_callback_t saved_nmi_callback;
+
+    /* Fast path: no work to do */
+    if ( likely(!xsplice_work.do_work) )
+        return;
+
+    ASSERT(local_irq_is_enabled());
+
+    spin_lock(&xsplice_work_lock);
+    id = atomic_read(&xsplice_work.semaphore);
+    atomic_inc(&xsplice_work.semaphore);
+    spin_unlock(&xsplice_work_lock);
+
+    total_cpus = num_online_cpus();
+
+    if ( id == 0 )
+    {
+        s_time_t timeout, start;
+
+        /* Trigger other CPUs to execute do_xsplice */
+        smp_call_function(reschedule_fn, NULL, 0);
+
+        /* Wait for other CPUs with a timeout */
+        start = NOW();
+        timeout = start + MILLISECS(30);
+        while ( atomic_read(&xsplice_work.semaphore) != total_cpus &&
+                NOW() < timeout )
+            cpu_relax();
+
+        if ( atomic_read(&xsplice_work.semaphore) == total_cpus )
+        {
+            struct payload *data2;
+
+            /* "Mask" NMIs */
+            saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+
+            /* All CPUs are waiting, now signal to disable IRQs */
+            xsplice_work.ready = true;
+            smp_mb();
+
+            /* Wait for irqs to be disabled */
+            while ( atomic_read(&xsplice_work.irq_semaphore) != (total_cpus - 1) )
+                cpu_relax();
+
+            local_irq_disable();
+            /* Now this function should be the only one on any stack.
+             * No need to lock the payload list or applied list. */
+            switch ( xsplice_work.cmd )
+            {
+                case XSPLICE_ACTION_APPLY:
+                        xsplice_work.data->rc = apply_payload(xsplice_work.data);
+                        if ( xsplice_work.data->rc == 0 )
+                            xsplice_work.data->state = XSPLICE_STATE_APPLIED;
+                        break;
+                case XSPLICE_ACTION_REVERT:
+                        xsplice_work.data->rc = revert_payload(xsplice_work.data);
+                        if ( xsplice_work.data->rc == 0 )
+                            xsplice_work.data->state = XSPLICE_STATE_CHECKED;
+                        break;
+                case XSPLICE_ACTION_REPLACE:
+                        list_for_each_entry ( data2, &payload_list, list )
+                        {
+                            if ( data2->state != XSPLICE_STATE_APPLIED )
+                                continue;
+
+                            data2->rc = revert_payload(data2);
+                            if ( data2->rc == 0 )
+                                data2->state = XSPLICE_STATE_CHECKED;
+                            else
+                            {
+                                xsplice_work.data->rc = -EINVAL;
+                                break;
+                            }
+                        }
+                        if ( xsplice_work.data->rc != -EINVAL )
+                        {
+                            xsplice_work.data->rc = apply_payload(xsplice_work.data);
+                            if ( xsplice_work.data->rc == 0 )
+                                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
+                        }
+                        break;
+                default:
+                        xsplice_work.data->rc = -EINVAL;
+                        break;
+            }
+
+            local_irq_enable();
+            set_nmi_callback(saved_nmi_callback);
+        }
+        else
+        {
+            xsplice_work.data->rc = -EBUSY;
+        }
+
+        xsplice_work.do_work = 0;
+        smp_mb(); /* Synchronize with waiting CPUs */
+    }
+    else
+    {
+        /* Wait for all CPUs to rendezvous */
+        while ( xsplice_work.do_work && !xsplice_work.ready )
+        {
+            cpu_relax();
+            smp_mb();
+        }
+
+        /* Disable IRQs and signal */
+        local_irq_disable();
+        atomic_inc(&xsplice_work.irq_semaphore);
+
+        /* Wait for patching to complete */
+        while ( xsplice_work.do_work )
+        {
+            cpu_relax();
+            smp_mb();
+        }
+        local_irq_enable();
+    }
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
index a60587e..82aff35 100644
--- a/xen/include/asm-arm/nmi.h
+++ b/xen/include/asm-arm/nmi.h
@@ -4,6 +4,19 @@
 #define register_guest_nmi_callback(a)  (-ENOSYS)
 #define unregister_guest_nmi_callback() (-ENOSYS)
 
+typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
+
+/**
+ * set_nmi_callback
+ *
+ * Set a handler for an NMI. Only one handler may be
+ * set. Return the old nmi callback handler.
+ */
+static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
+{
+    return NULL;
+}
+
 #endif /* ASM_NMI_H */
 /*
  * Local variables:
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index a3946a3..507829c 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -11,7 +11,8 @@ struct xsplice_patch_func {
     unsigned long old_addr;
     unsigned long old_size;
     char *name;
-    unsigned char undo[8];
+    uint8_t undo[8];
+    uint8_t pad[56];
 };
 
 struct xen_sysctl_xsplice_op;
@@ -19,6 +20,8 @@ int xsplice_control(struct xen_sysctl_xsplice_op *);
 
 extern void xsplice_printall(unsigned char key);
 
+void do_xsplice(void);
+
 /* Arch hooks */
 int xsplice_verify_elf(uint8_t *data, ssize_t len);
 int xsplice_perform_rel(struct xsplice_elf *elf,
@@ -27,5 +30,7 @@ int xsplice_perform_rel(struct xsplice_elf *elf,
 int xsplice_perform_rela(struct xsplice_elf *elf,
                          struct xsplice_elf_sec *base,
                          struct xsplice_elf_sec *rela);
+void xsplice_apply_jmp(struct xsplice_patch_func *func);
+void xsplice_revert_jmp(struct xsplice_patch_func *func);
 
 #endif /* __XEN_XSPLICE_H__ */
-- 
2.4.3



* [PATCH v1 09/11] xsplice: Add support for bug frames
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (6 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-05 19:43   ` Konrad Rzeszutek Wilk
  2015-11-03 18:16 ` [PATCH v1 10/11] xsplice: Add support for exception tables Ross Lagerwall
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Ross Lagerwall, Jan Beulich, Andrew Cooper

Add support for handling bug frames contained within xsplice modules. If a
trap occurs, search either the kernel bug table or an applied module's
bug table, depending on the instruction pointer.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/traps.c      |  30 ++++++++-----
 xen/common/symbols.c      |   7 +++
 xen/common/xsplice.c      | 107 +++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/kernel.h  |   1 +
 xen/include/xen/xsplice.h |   4 ++
 5 files changed, 129 insertions(+), 20 deletions(-)
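
The lookup order described above can be illustrated outside the hypervisor.
What follows is only a minimal sketch, not the code in this patch: every
sketch_* name is hypothetical, the structures are simplified stand-ins for
struct bug_frame and struct payload, and it assumes the core text range and
the module text ranges never overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bug frame. */
struct sketch_bug {
    uintptr_t loc;   /* address of the trapping instruction */
    int line;        /* source line recorded for the frame */
};

/* Hypothetical stand-in for "a text range plus its bug table". */
struct sketch_table {
    uintptr_t text_start, text_end;   /* executable text: [start, end) */
    const struct sketch_bug *bugs;
    size_t nbugs;
};

/* Search one table, but only if ip lies inside that table's text range. */
static const struct sketch_bug *
sketch_find_in(const struct sketch_table *t, uintptr_t ip)
{
    size_t i;

    if ( ip < t->text_start || ip >= t->text_end )
        return NULL;

    for ( i = 0; i < t->nbugs; i++ )
        if ( t->bugs[i].loc == ip )
            return &t->bugs[i];

    return NULL;
}

/* Try the core table first, then each applied module in turn. */
static const struct sketch_bug *
sketch_find_bug(const struct sketch_table *core,
                const struct sketch_table *mods, size_t nmods, uintptr_t ip)
{
    const struct sketch_bug *bug;
    size_t i;

    assert(core != NULL);

    bug = sketch_find_in(core, ip);
    if ( bug )
        return bug;

    for ( i = 0; i < nmods; i++ )
    {
        bug = sketch_find_in(&mods[i], ip);
        if ( bug )
            return bug;
    }

    return NULL;   /* the caller would treat this as a fatal trap */
}
```

Checking the text range before walking a table keeps the common case (a trap
in core text) from ever touching the module list.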

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b32f696..cd51cfd 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -48,6 +48,7 @@
 #include <xen/kexec.h>
 #include <xen/trace.h>
 #include <xen/paging.h>
+#include <xen/xsplice.h>
 #include <xen/watchdog.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -1076,20 +1077,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
         return;
     }
 
-    if ( !is_active_kernel_text(regs->eip) ||
+    if ( !is_active_text(regs->eip) ||
          __copy_from_user(bug_insn, eip, sizeof(bug_insn)) ||
          memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
         goto die;
 
-    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+    if ( likely(is_active_kernel_text(regs->eip)) )
     {
-        while ( unlikely(bug == stop_frames[id]) )
-            ++id;
-        if ( bug_loc(bug) == eip )
-            break;
+        for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
+        {
+            while ( unlikely(bug == stop_frames[id]) )
+                ++id;
+            if ( bug_loc(bug) == eip )
+                break;
+        }
+        if ( !stop_frames[id] )
+            goto die;
+    }
+    else
+    {
+        bug = xsplice_find_bug(eip, &id);
+        if ( !bug )
+            goto die;
     }
-    if ( !stop_frames[id] )
-        goto die;
 
     eip += sizeof(bug_insn);
     if ( id == BUGFRAME_run_fn )
@@ -1103,7 +1113,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
 
     /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
     filename = bug_ptr(bug);
-    if ( !is_kernel(filename) )
+    if ( !is_kernel(filename) && !is_module(filename) )
         goto die;
     fixup = strlen(filename);
     if ( fixup > 50 )
@@ -1130,7 +1140,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
     case BUGFRAME_assert:
         /* ASSERT: decode the predicate string pointer. */
         predicate = bug_msg(bug);
-        if ( !is_kernel(predicate) )
+        if ( !is_kernel(predicate) && !is_module(predicate) )
             predicate = "<unknown>";
 
         printk("Assertion '%s' failed at %s%s:%d\n",
diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index a59c59d..bf5623f 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -17,6 +17,7 @@
 #include <xen/lib.h>
 #include <xen/string.h>
 #include <xen/spinlock.h>
+#include <xen/xsplice.h>
 #include <public/platform.h>
 #include <xen/guest_access.h>
 
@@ -101,6 +102,12 @@ bool_t is_active_kernel_text(unsigned long addr)
             (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
 }
 
+bool_t is_active_text(unsigned long addr)
+{
+    return is_active_kernel_text(addr) ||
+           is_active_module_text(addr);
+}
+
 const char *symbols_lookup(unsigned long addr,
                            unsigned long *symbolsize,
                            unsigned long *offset,
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 4476be5..982954b 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -40,6 +40,11 @@ struct payload {
     int nfuncs;
     void *module_address;
     size_t module_pages;
+    size_t core_size;
+    size_t core_text_size;
+
+    struct bug_frame *start_bug_frames[4];
+    struct bug_frame *stop_bug_frames[4];
 
     char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
 };
@@ -525,26 +530,27 @@ static void free_module(struct payload *payload)
     payload->module_pages = 0;
 }
 
-static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)
+static void alloc_section(struct xsplice_elf_sec *sec, size_t *size)
 {
-    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
+    size_t align_size = ROUNDUP(*size, sec->sec->sh_addralign);
     sec->sec->sh_entsize = align_size;
-    *core_size = sec->sec->sh_size + align_size;
+    *size = sec->sec->sh_size + align_size;
 }
 
 static int move_module(struct payload *payload, struct xsplice_elf *elf)
 {
     uint8_t *buf;
     int i;
-    size_t core_size = 0;
+    size_t size = 0;
 
     /* Allocate text regions */
     for ( i = 0; i < elf->hdr->e_shnum; i++ )
     {
         if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
              (SHF_ALLOC|SHF_EXECINSTR) )
-            alloc_section(&elf->sec[i], &core_size);
+            alloc_section(&elf->sec[i], &size);
     }
+    payload->core_text_size = size;
 
     /* Allocate rw data */
     for ( i = 0; i < elf->hdr->e_shnum; i++ )
@@ -552,7 +558,7 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
         if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
              !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
              (elf->sec[i].sec->sh_flags & SHF_WRITE) )
-            alloc_section(&elf->sec[i], &core_size);
+            alloc_section(&elf->sec[i], &size);
     }
 
     /* Allocate ro data */
@@ -561,15 +567,16 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
         if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
              !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
              !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
-            alloc_section(&elf->sec[i], &core_size);
+            alloc_section(&elf->sec[i], &size);
     }
+    payload->core_size = size;
 
-    buf = alloc_module(core_size);
+    buf = alloc_module(size);
     if ( !buf ) {
         printk(XENLOG_ERR "Could not allocate memory for module\n");
         return -ENOMEM;
     }
-    memset(buf, 0, core_size);
+    memset(buf, 0, size);
 
     for ( i = 0; i < elf->hdr->e_shnum; i++ )
     {
@@ -584,7 +591,7 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
     }
 
     payload->module_address = buf;
-    payload->module_pages = PFN_UP(core_size);
+    payload->module_pages = PFN_UP(size);
 
     return 0;
 }
@@ -659,6 +666,7 @@ static int find_special_sections(struct payload *payload,
                                  struct xsplice_elf *elf)
 {
     struct xsplice_elf_sec *sec;
+    int i;
 
     sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
     if ( !sec )
@@ -670,6 +678,19 @@ static int find_special_sections(struct payload *payload,
     payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
     payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
 
+    for ( i = 0; i < 4; i++ )
+    {
+        char str[14];
+
+        snprintf(str, sizeof str, ".bug_frames.%d", i);
+        sec = xsplice_elf_sec_by_name(elf, str);
+        if ( !sec )
+            continue;
+
+        payload->start_bug_frames[i] = (struct bug_frame *)sec->load_addr;
+        payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
+    }
+
     return 0;
 }
 
@@ -912,6 +933,72 @@ void do_xsplice(void)
     }
 }
 
+
+/*
+ * Functions for handling special sections.
+ */
+struct bug_frame *xsplice_find_bug(const char *eip, int *id)
+{
+    struct payload *data;
+    struct bug_frame *bug;
+    int i;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        for (i = 0; i < 4; i++) {
+            if (!data->start_bug_frames[i])
+                continue;
+            if ( !((void *)eip >= data->module_address &&
+                   (void *)eip < (data->module_address + data->core_text_size)))
+                continue;
+
+            for ( bug = data->start_bug_frames[i]; bug != data->stop_bug_frames[i]; ++bug ) {
+                if ( bug_loc(bug) == eip )
+                {
+                    *id = i;
+                    return bug;
+                }
+            }
+        }
+    }
+
+    return NULL;
+}
+
+bool_t is_module(const void *ptr)
+{
+    struct payload *data;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( ptr >= data->module_address &&
+             ptr < (data->module_address + data->core_size))
+            return true;
+    }
+
+    return false;
+}
+
+bool_t is_active_module_text(unsigned long addr)
+{
+    struct payload *data;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( (void *)addr >= data->module_address &&
+             (void *)addr < (data->module_address + data->core_text_size))
+            return true;
+    }
+
+    return false;
+}
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
index 548b64d..df57754 100644
--- a/xen/include/xen/kernel.h
+++ b/xen/include/xen/kernel.h
@@ -99,6 +99,7 @@ extern enum system_state {
 } system_state;
 
 bool_t is_active_kernel_text(unsigned long addr);
+bool_t is_active_text(unsigned long addr);
 
 #endif /* _LINUX_KERNEL_H */
 
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 507829c..772fa3a 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -22,6 +22,10 @@ extern void xsplice_printall(unsigned char key);
 
 void do_xsplice(void);
 
+struct bug_frame * xsplice_find_bug(const char *eip, int *id);
+bool_t is_module(const void *addr);
+bool_t is_active_module_text(unsigned long addr);
+
 /* Arch hooks */
 int xsplice_verify_elf(uint8_t *data, ssize_t len);
 int xsplice_perform_rel(struct xsplice_elf *elf,
-- 
2.4.3


* [PATCH v1 10/11] xsplice: Add support for exception tables
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (7 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 09/11] xsplice: Add support for bug frames Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-05 19:47   ` Konrad Rzeszutek Wilk
  2015-11-27 16:28   ` Martin Pohlack
  2015-11-03 18:16 ` [PATCH v1 11/11] xsplice: Add support for alternatives Ross Lagerwall
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Ross Lagerwall, Jan Beulich, Andrew Cooper

Add support for exception tables contained within xsplice modules. If an
exception occurs, search either the main exception table or a particular
active module's exception table, depending on the instruction pointer.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/extable.c        | 36 ++++++++++++++++++++++--------------
 xen/common/xsplice.c          | 41 +++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/uaccess.h |  5 +++++
 xen/include/xen/xsplice.h     |  2 ++
 4 files changed, 70 insertions(+), 14 deletions(-)
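
The idea of sorting a module's table once at load time and then using a
binary search at fault time can be shown in isolation. This is a minimal
sketch under stated assumptions, not the code in this patch: the sketch_*
names are hypothetical, entries hold absolute addresses (the real entries
are decoded through EX_FIELD), the standard qsort() stands in for the
hypervisor's sort(), and the range is half-open rather than the inclusive
first/last pair that search_one_extable() takes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified exception table entry. */
struct sketch_ex_entry {
    unsigned long addr;   /* address of the instruction that may fault */
    unsigned long cont;   /* address to continue at after the fault */
};

static int sketch_cmp_ex(const void *a, const void *b)
{
    const struct sketch_ex_entry *l = a, *r = b;

    if ( l->addr < r->addr )
        return -1;
    if ( l->addr > r->addr )
        return 1;
    return 0;
}

/* Sort the half-open range [start, stop) by faulting address. */
static void sketch_sort_extable(struct sketch_ex_entry *start,
                                struct sketch_ex_entry *stop)
{
    assert(start <= stop);
    qsort(start, (size_t)(stop - start), sizeof(*start), sketch_cmp_ex);
}

/* Binary search over [start, stop); returns the fixup address or 0. */
static unsigned long
sketch_search_extable(const struct sketch_ex_entry *start,
                      const struct sketch_ex_entry *stop,
                      unsigned long addr)
{
    while ( start < stop )
    {
        const struct sketch_ex_entry *mid = start + (stop - start) / 2;

        if ( mid->addr == addr )
            return mid->cont;
        if ( mid->addr < addr )
            start = mid + 1;
        else
            stop = mid;
    }

    return 0;
}
```

A return value of 0 means "no fixup registered", which matches how the
callers in this series treat a failed lookup.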

diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
index 89b5bcb..2787a92 100644
--- a/xen/arch/x86/extable.c
+++ b/xen/arch/x86/extable.c
@@ -4,6 +4,7 @@
 #include <xen/perfc.h>
 #include <xen/sort.h>
 #include <xen/spinlock.h>
+#include <xen/xsplice.h>
 #include <asm/uaccess.h>
 
 #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
@@ -18,7 +19,7 @@ static inline unsigned long ex_cont(const struct exception_table_entry *x)
 	return EX_FIELD(x, cont);
 }
 
-static int __init cmp_ex(const void *a, const void *b)
+static int cmp_ex(const void *a, const void *b)
 {
 	const struct exception_table_entry *l = a, *r = b;
 	unsigned long lip = ex_addr(l);
@@ -33,7 +34,7 @@ static int __init cmp_ex(const void *a, const void *b)
 }
 
 #ifndef swap_ex
-static void __init swap_ex(void *a, void *b, int size)
+static void swap_ex(void *a, void *b, int size)
 {
 	struct exception_table_entry *l = a, *r = b, tmp;
 	long delta = b - a;
@@ -46,19 +47,23 @@ static void __init swap_ex(void *a, void *b, int size)
 }
 #endif
 
-void __init sort_exception_tables(void)
+void sort_exception_table(struct exception_table_entry *start,
+                          struct exception_table_entry *stop)
 {
-    sort(__start___ex_table, __stop___ex_table - __start___ex_table,
-         sizeof(struct exception_table_entry), cmp_ex, swap_ex);
-    sort(__start___pre_ex_table,
-         __stop___pre_ex_table - __start___pre_ex_table,
+    sort(start, stop - start,
          sizeof(struct exception_table_entry), cmp_ex, swap_ex);
 }
 
-static inline unsigned long
-search_one_table(const struct exception_table_entry *first,
-                 const struct exception_table_entry *last,
-                 unsigned long value)
+void __init sort_exception_tables(void)
+{
+    sort_exception_table(__start___ex_table, __stop___ex_table);
+    sort_exception_table(__start___pre_ex_table, __stop___pre_ex_table);
+}
+
+unsigned long
+search_one_extable(const struct exception_table_entry *first,
+                   const struct exception_table_entry *last,
+                   unsigned long value)
 {
     const struct exception_table_entry *mid;
     long diff;
@@ -80,15 +85,18 @@ search_one_table(const struct exception_table_entry *first,
 unsigned long
 search_exception_table(unsigned long addr)
 {
-    return search_one_table(
-        __start___ex_table, __stop___ex_table-1, addr);
+    if ( likely(is_kernel(addr)) )
+        return search_one_extable(
+            __start___ex_table, __stop___ex_table-1, addr);
+    else
+        return search_module_extables(addr);
 }
 
 unsigned long
 search_pre_exception_table(struct cpu_user_regs *regs)
 {
     unsigned long addr = (unsigned long)regs->eip;
-    unsigned long fixup = search_one_table(
+    unsigned long fixup = search_one_extable(
         __start___pre_ex_table, __stop___pre_ex_table-1, addr);
     if ( fixup )
     {
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index 982954b..c5a403b 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -45,6 +45,10 @@ struct payload {
 
     struct bug_frame *start_bug_frames[4];
     struct bug_frame *stop_bug_frames[4];
+#ifdef CONFIG_X86
+    struct exception_table_entry *start_ex_table;
+    struct exception_table_entry *stop_ex_table;
+#endif
 
     char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
 };
@@ -691,6 +695,17 @@ static int find_special_sections(struct payload *payload,
         payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
     }
 
+#ifdef CONFIG_X86
+    sec = xsplice_elf_sec_by_name(elf, ".ex_table");
+    if ( sec )
+    {
+        payload->start_ex_table = (struct exception_table_entry *)sec->load_addr;
+        payload->stop_ex_table = (struct exception_table_entry *)(sec->load_addr + sec->sec->sh_size);
+
+        sort_exception_table(payload->start_ex_table, payload->stop_ex_table);
+    }
+#endif
+
     return 0;
 }
 
@@ -999,6 +1014,32 @@ bool_t is_active_module_text(unsigned long addr)
     return false;
 }
 
+#ifdef CONFIG_X86
+unsigned long search_module_extables(unsigned long addr)
+{
+    struct payload *data;
+    unsigned long ret;
+
+    /* No locking since this list is only ever changed during apply or revert
+     * context. */
+    list_for_each_entry ( data, &applied_list, applied_list )
+    {
+        if ( !data->start_ex_table )
+            continue;
+        if ( !((void *)addr >= data->module_address &&
+               (void *)addr < (data->module_address + data->core_text_size)))
+            continue;
+
+        ret = search_one_extable(data->start_ex_table, data->stop_ex_table - 1,
+                                 addr);
+        if ( ret )
+            return ret;
+    }
+
+    return 0;
+}
+#endif
+
 static int __init xsplice_init(void)
 {
     register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
diff --git a/xen/include/asm-x86/uaccess.h b/xen/include/asm-x86/uaccess.h
index 947470d..9e67bf0 100644
--- a/xen/include/asm-x86/uaccess.h
+++ b/xen/include/asm-x86/uaccess.h
@@ -276,6 +276,11 @@ extern struct exception_table_entry __start___pre_ex_table[];
 extern struct exception_table_entry __stop___pre_ex_table[];
 
 extern unsigned long search_exception_table(unsigned long);
+extern unsigned long search_one_extable(const struct exception_table_entry *first,
+                                        const struct exception_table_entry *last,
+                                        unsigned long value);
 extern void sort_exception_tables(void);
+extern void sort_exception_table(struct exception_table_entry *start,
+                                 struct exception_table_entry *stop);
 
 #endif /* __X86_UACCESS_H__ */
diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
index 772fa3a..485eb08 100644
--- a/xen/include/xen/xsplice.h
+++ b/xen/include/xen/xsplice.h
@@ -26,6 +26,8 @@ struct bug_frame * xsplice_find_bug(const char *eip, int *id);
 bool_t is_module(const void *addr);
 bool_t is_active_module_text(unsigned long addr);
 
+unsigned long search_module_extables(unsigned long addr);
+
 /* Arch hooks */
 int xsplice_verify_elf(uint8_t *data, ssize_t len);
 int xsplice_perform_rel(struct xsplice_elf *elf,
-- 
2.4.3


* [PATCH v1 11/11] xsplice: Add support for alternatives
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (8 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 10/11] xsplice: Add support for exception tables Ross Lagerwall
@ 2015-11-03 18:16 ` Ross Lagerwall
  2015-11-05 19:48   ` Konrad Rzeszutek Wilk
  2015-11-04 21:10 ` [PATCH v1 01/11] xsplice: Design document (v2) Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-03 18:16 UTC (permalink / raw)
  To: xen-devel; +Cc: Ross Lagerwall, Jan Beulich, Andrew Cooper

Add support for applying alternative sections within xsplice modules. At
module load time, apply any alternative sections that are found.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/Makefile             |  2 +-
 xen/arch/x86/alternative.c        | 12 ++++++------
 xen/common/xsplice.c              | 11 +++++++++++
 xen/include/asm-x86/alternative.h |  1 +
 4 files changed, 19 insertions(+), 7 deletions(-)
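
For readers unfamiliar with alternatives, the core of applying one entry is
"copy the replacement over the original and pad the rest with NOPs". This is
a minimal sketch under stated assumptions, not the code in alternative.c:
the sketch_* names and the feature predicate are hypothetical, the patch
target is an ordinary byte buffer, and padding uses the one-byte NOP (0x90)
instead of the ideal multi-byte NOPs the hypervisor selects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct alt_instr. */
struct sketch_alt {
    uint8_t *instr;               /* original instruction bytes */
    const uint8_t *replacement;   /* bytes to patch in */
    uint8_t instrlen;             /* length of the original instruction */
    uint8_t replacementlen;       /* length of the replacement, <= instrlen */
    int feature;                  /* hypothetical feature id gating the entry */
};

/* Example predicate for the sketch: pretend only feature 1 is present. */
static int sketch_has_feature_1(int feature)
{
    return feature == 1;
}

/* Apply every entry in [start, end) whose feature is reported present. */
static void sketch_apply_alternatives(struct sketch_alt *start,
                                      struct sketch_alt *end,
                                      int (*has_feature)(int))
{
    struct sketch_alt *a;

    for ( a = start; a < end; a++ )
    {
        assert(a->replacementlen <= a->instrlen);

        if ( !has_feature(a->feature) )
            continue;

        /* Copy the replacement, then pad the remainder with 1-byte NOPs. */
        memcpy(a->instr, a->replacement, a->replacementlen);
        memset(a->instr + a->replacementlen, 0x90,
               (size_t)(a->instrlen - a->replacementlen));
    }
}
```

Entries whose feature is absent are left untouched, so the original
instruction remains the fallback.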

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 6e05532..5dbe2e8 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -7,7 +7,7 @@ subdir-y += oprofile
 
 subdir-$(x86_64) += x86_64
 
-obj-bin-y += alternative.init.o
+obj-bin-y += alternative.o
 obj-y += apic.o
 obj-y += bitops.o
 obj-bin-y += bzimage.init.o
diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 46ac0fd..8d895ad 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -28,7 +28,7 @@
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 
 #ifdef K8_NOP1
-static const unsigned char k8nops[] __initconst = {
+static const unsigned char k8nops[] = {
     K8_NOP1,
     K8_NOP2,
     K8_NOP3,
@@ -52,7 +52,7 @@ static const unsigned char * const k8_nops[ASM_NOP_MAX+1] = {
 #endif
 
 #ifdef P6_NOP1
-static const unsigned char p6nops[] __initconst = {
+static const unsigned char p6nops[] = {
     P6_NOP1,
     P6_NOP2,
     P6_NOP3,
@@ -75,7 +75,7 @@ static const unsigned char * const p6_nops[ASM_NOP_MAX+1] = {
 };
 #endif
 
-static const unsigned char * const *ideal_nops __initdata = k8_nops;
+static const unsigned char * const *ideal_nops = k8_nops;
 
 static int __init mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
@@ -100,7 +100,7 @@ static void __init arch_init_ideal_nops(void)
 }
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
-static void __init add_nops(void *insns, unsigned int len)
+static void add_nops(void *insns, unsigned int len)
 {
     while ( len > 0 )
     {
@@ -127,7 +127,7 @@ static void __init add_nops(void *insns, unsigned int len)
  *
  * This routine is called with local interrupt disabled.
  */
-static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
+static void *text_poke_early(void *addr, const void *opcode, size_t len)
 {
     memcpy(addr, opcode, len);
     sync_core();
@@ -142,7 +142,7 @@ static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
  * APs have less capabilities than the boot processor are not handled.
  * Tough. Make sure you disable such features by hand.
  */
-static void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
+void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
 {
     struct alt_instr *a;
     u8 *instr, *replacement;
diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
index c5a403b..6a368af 100644
--- a/xen/common/xsplice.c
+++ b/xen/common/xsplice.c
@@ -682,6 +682,17 @@ static int find_special_sections(struct payload *payload,
     payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
     payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
 
+#ifdef CONFIG_X86
+    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
+    if ( sec )
+    {
+        local_irq_disable();
+        apply_alternatives((struct alt_instr *)sec->load_addr,
+                           (struct alt_instr *)(sec->load_addr + sec->sec->sh_size));
+        local_irq_enable();
+    }
+#endif
+
     for ( i = 0; i < 4; i++ )
     {
         char str[14];
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 23c9b9f..8e83572 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -23,6 +23,7 @@ struct alt_instr {
     u8  replacementlen;     /* length of new instruction, <= instrlen */
 };
 
+extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
 extern void alternative_instructions(void);
 
 #define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
-- 
2.4.3


* Re: [PATCH v1 01/11] xsplice: Design document (v2).
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (9 preceding siblings ...)
  2015-11-03 18:16 ` [PATCH v1 11/11] xsplice: Add support for alternatives Ross Lagerwall
@ 2015-11-04 21:10 ` Konrad Rzeszutek Wilk
  2015-11-05 10:49   ` Ross Lagerwall
  2015-11-10  9:55 ` Ross Lagerwall
  2015-11-27 12:48 ` Martin Pohlack
  12 siblings, 1 reply; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-04 21:10 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:15:58PM +0000, Ross Lagerwall wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> A mechanism is required to binarily patch the running hypervisor with new
> opcodes that have come about due to primarily security updates.
> 
> This document describes the design of the API that would allow us to
> upload to the hypervisor binary patches.
> 
> This document has been shaped by the input from:
>   Martin Pohlack <mpohlack@amazon.de>
>   Jan Beulich <jbeulich@suse.com>
> 
> Thank you!
> 
> Input-from: Martin Pohlack <mpohlack@amazon.de>
> Input-from: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  docs/misc/xsplice.markdown | 999 +++++++++++++++++++++++++++++++++++++++++++++


999 :-) What a nice number!

..snip..
> +## Design of payload format
> +
> +The payload **MUST** contain enough data to allow us to apply the update
> +and also safely reverse it. As such we **MUST** know:
> +
> + * The locations in memory to be patched. This can be determined dynamically
> +   via symbols or via virtual addresses.
> + * The new code that will be patched in.
> + * Signature to verify the payload.

Argh. We need to move the 'Signature to verify' item into the 'v2' section
as I don't think we can get that done in time.

> +
> +This binary format can be constructed using a custom binary format but
> +there are severe disadvantages to it:
> +
> + * The format might need to be changed and we need a mechanism to accommodate
> +   that.
> + * It has to be platform agnostic.
> + * Easily constructed using existing tools.
> +
> +As such having the payload in an ELF file is the sensible way. We would be
> +carrying the various sets of structures (and data) in the ELF sections under
> +different names and with definitions. The prefix for the ELF section name
> +would always be: *.xsplice* to match up to the names of the structures.
> +
> +Note that every structure has padding. This is added so that the hypervisor
> +can re-use those fields as it sees fit.
> +
> +Earlier design attempted to ineptly explain the relations of the ELF sections
> +to each other without using proper ELF mechanism (sh_info, sh_link, data
> +structures using Elf types, etc). This design will explain in detail
> +the structures and how they are used together and not dig in the ELF
> +format - except mention that the section names should match the
> +structure names.
> +
> +The xSplice payload is a relocatable ELF binary. A typical binary would have:
> +
> + * One or more .text sections
> + * Zero or more read-only data sections
> + * Zero or more data sections
> + * Relocations for each of these sections
> +
> +It may also have some architecture-specific sections. For example:
> +
> + * Alternatives instructions
> + * Bug frames
> + * Exception tables
> + * Relocations for each of these sections
> +
> +The xSplice core code loads the payload as a standard ELF binary, relocates it
> +and handles the architecture-specific sections as needed. This process is much
> +like what the Linux kernel module loader does. It contains no xSplice-specific
> +details and thus will not be discussed further.

What is 'it'? The 'process of what module loader does'?

> +
> +Importantly, the payload also contains a section with an array of structures
> +describing the functions to be patched:
> +<pre>
> +struct xsplice_patch_func {
> +    unsigned long new_addr;
> +    unsigned long new_size;
> +    unsigned long old_addr;
> +    unsigned long old_size;
> +    char *name;
> +    uint8_t pad[64];
> +};
> +</pre>

Uh, so 104 bytes? Or did you mean s/64/24/ so that the structure is nicely
padded to 64 bytes?

I think that is what you meant.
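
For reference, the arithmetic behind the two sizes can be checked with a small sketch. It assumes an LP64 target such as x86-64, where `unsigned long` and pointers are 8 bytes, and uses `uint64_t` stand-ins so the result does not depend on the build host. The struct names `patch_func_pad64` and `patch_func_pad24` are made up for the comparison and are not part of the patch.

```c
#include <stdint.h>

/* Layout as posted: five 8-byte fields followed by 64 bytes of padding. */
struct patch_func_pad64 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;          /* stand-in for 'char *name' on LP64 */
    uint8_t  pad[64];
};

/* Same fields with the padding reduced to 24 bytes. */
struct patch_func_pad24 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;          /* stand-in for 'char *name' on LP64 */
    uint8_t  pad[24];
};

/* 5 * 8 = 40 bytes of fields, plus the padding. */
_Static_assert(sizeof(struct patch_func_pad64) == 104, "40 + 64 bytes");
_Static_assert(sizeof(struct patch_func_pad24) == 64, "40 + 24 bytes");
```

If the intent is a 64-byte structure, a similar compile-time assertion next to the real definition would catch any later drift.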
> +
> +* `old_addr` is the address of the function to be patched and is filled in at
> +  compile time if the payload is statically linked and at run time if the
> +  payload is dynamically linked.
> +* `new_addr` is the address of the function that is replacing the old
> +  function. The address is filled in during relocation.
> +* `old_size` and `new_size` contain the sizes of the respective functions.
> +* `name` is used for looking up the old function address during dynamic
> +  linking.
> +
> +The size of the `xsplice_patch_func` array is determined from the ELF section
> +size.
> +
> +During patch apply, for each `xsplice_patch_func`, the core code inserts a
> +trampoline at `old_addr` to `new_addr`. During patch revert, for each
> +`xsplice_patch_func`, the core code copies the data from the undo buffer to
> +`old_addr`.
> +


* Re: [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
@ 2015-11-04 21:17   ` Konrad Rzeszutek Wilk
  2015-11-12 16:28   ` Jan Beulich
  2015-11-13 23:50   ` Daniel De Graaf
  2 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-04 21:17 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson,
	xen-devel, Daniel De Graaf

On Tue, Nov 03, 2015 at 06:15:59PM +0000, Ross Lagerwall wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

It is very hard for me to review my own code, but mostly
what I see are some quite simple things (really, really simple).

> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> new file mode 100644
> index 0000000..d984c8a
> --- /dev/null
> +++ b/xen/common/xsplice.c
> +struct payload {
> +    int32_t state;     /* One of XSPLICE_STATE_*. */
> +    int32_t rc;         /* 0 or -EXX. */
> +
> +    struct list_head   list;   /* Linked to 'payload_list'. */
> +
> +    char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
> +};

There is some extra space between 'char' and 'id'. Actually all of these look
like they were at the same indentation but are now a bit off.

It would be nice to align the comments and to shorten the space
between 'list_head   list' and between 'char  id'.


..snip..
> +int find_payload(xen_xsplice_id_t *id, bool_t need_lock, struct payload **f)
> +{
> +    struct payload *data;
> +    XEN_GUEST_HANDLE_PARAM(char) str;
> +    char name[XEN_XSPLICE_NAME_SIZE + 1] = { 0 }; /* 128 + 1 bytes on stack. Perhaps kzalloc? */

Jan was not too worried about the kzalloc so I think we can
remove the comment.

> +    int rc = -EINVAL;
> +
> +    rc = verify_id(id);
> +    if ( rc )
> +        return rc;
> +
> +    str = guest_handle_cast(id->name, char);
> +    if ( copy_from_guest(name, str, id->size) )
> +        return -EFAULT;
> +
> +    if ( need_lock )
> +        spin_lock(&payload_list_lock);
> +
> +    rc = -ENOENT;
> +    list_for_each_entry ( data, &payload_list, list )
> +    {
> +        if ( !strcmp(data->id, name) )
> +        {
> +            *f = data;
> +            rc = 0;
> +            break;
> +        }
> +    }
> +
> +    if ( need_lock )
> +        spin_unlock(&payload_list_lock);
> +
> +    return rc;
> +}
> +
> +

And we have an extra \n here.

.. And besides that I realized that there is some code that we
guard with CONFIG_ options (lock profile, gcov, kdump?, etc). We
should probably guard the xSplice code the same way.

Ross, I can do these changes easily. Unless you are itching
to do it now :-)


* Re: [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads.
  2015-11-03 18:16 ` [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads Ross Lagerwall
@ 2015-11-04 21:27   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-04 21:27 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel


..snip..

Probably need this:

/* This value was chosen ad hoc. It could be 42 too. */

> +#define MAX_LEN 11
> +static int list_func(int argc, char *argv[])
> +{

..snip..
> +

May be worth having a comment:
/* These MUST match the 'action_options[]' array. */
> +enum {
> +    ACTION_APPLY = 0,
> +    ACTION_REVERT = 1,
> +    ACTION_UNLOAD = 2,
> +    ACTION_CHECK = 3

And:
	ACTION_REPLACE = 4,

> +};
> +
> +struct {
> +    int allow; /* State it must be in to call function. */
> +    int expected; /* The state to be in after the function. */
> +    const char *name;
> +    int (*function)(xc_interface *xch, char *id);
> +    unsigned int executed; /* Has the function been called?. */
> +} action_options[] = {
> +    {   .allow = XSPLICE_STATE_CHECKED,
> +        .expected = XSPLICE_STATE_APPLIED,
> +        .name = "apply",
> +        .function = xc_xsplice_apply,
> +    },
> +    {   .allow = XSPLICE_STATE_APPLIED,
> +        .expected = XSPLICE_STATE_CHECKED,
> +        .name = "revert",
> +        .function = xc_xsplice_revert,
> +    },
> +    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
> +        .expected = -ENOENT,
> +        .name = "unload",
> +        .function = xc_xsplice_unload,
> +    },
> +    {   .allow = XSPLICE_STATE_CHECKED | XSPLICE_STATE_LOADED,
> +        .expected = XSPLICE_STATE_CHECKED,
> +        .name = "check",
> +        .function = xc_xsplice_check
> +    },
> +    {   .allow = XSPLICE_STATE_CHECKED,
> +        .expected = XSPLICE_STATE_APPLIED,
> +        .name = "replace",
> +        .function = xc_xsplice_replace,
> +    },
> +};
> +
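
One way to keep the enum and the table in step, rather than relying on a comment alone, is to index the initializers by the enum values. This is only a sketch: `ACTION_MAX`, `struct action_option` (cut down to a name) and `action_by_name()` are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <string.h>

enum action {
    ACTION_APPLY = 0,
    ACTION_REVERT,
    ACTION_UNLOAD,
    ACTION_CHECK,
    ACTION_REPLACE,
    ACTION_MAX              /* hypothetical sentinel: number of actions */
};

/* Cut-down stand-in for the entries of the tool's action_options[]. */
struct action_option {
    const char *name;
};

/* Each entry is tied to its enum value, so reordering cannot mismatch. */
static const struct action_option action_options[ACTION_MAX] = {
    [ACTION_APPLY]   = { .name = "apply" },
    [ACTION_REVERT]  = { .name = "revert" },
    [ACTION_UNLOAD]  = { .name = "unload" },
    [ACTION_CHECK]   = { .name = "check" },
    [ACTION_REPLACE] = { .name = "replace" },
};

/* Return the enum value for 'name', or -1 if there is no such action. */
static int action_by_name(const char *name)
{
    size_t i;

    for ( i = 0; i < ACTION_MAX; i++ )
        if ( action_options[i].name && !strcmp(action_options[i].name, name) )
            return (int)i;

    return -1;
}
```

With this shape, adding an enum value without a table entry leaves a NULL name, which the lookup skips instead of dereferencing.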

May want to have a comment saying what the delay is in human
values. Minutes? Seconds? Days?

> +#define RETRIES 300
> +#define DELAY 100000
> +

We may want to add a comment:

/* There are functions in action_options that are called in case
 * none of these match. */

> +struct {
> +    const char *name;
> +    int (*function)(int argc, char *argv[]);
> +} main_options[] = {
> +    { "help", help_func },
> +    { "list", list_func },
> +    { "upload", upload_func },
> +    { "all", all_func },


* Re: [PATCH v1 06/11] xsplice: Add helper elf routines
  2015-11-03 18:16 ` [PATCH v1 06/11] xsplice: Add helper elf routines Ross Lagerwall
@ 2015-11-04 21:49   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-04 21:49 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:16:03PM +0000, Ross Lagerwall wrote:
> Add some elf routines and data structures in preparation for loading an
> xsplice payload.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/common/Makefile           |   1 +
>  xen/common/xsplice_elf.c      | 122 ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/xsplice_elf.h |  44 +++++++++++++++
>  3 files changed, 167 insertions(+)
>  create mode 100644 xen/common/xsplice_elf.c
>  create mode 100644 xen/include/xen/xsplice_elf.h
> 
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 1b17c9d..de7c08a 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -57,6 +57,7 @@ obj-y += vsprintf.o
>  obj-y += wait.o
>  obj-y += xmalloc_tlsf.o
>  obj-y += xsplice.o
> +obj-y += xsplice_elf.o
>  
>  obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma unlzo unlz4 earlycpio,$(n).init.o)
>  
> diff --git a/xen/common/xsplice_elf.c b/xen/common/xsplice_elf.c
> new file mode 100644
> index 0000000..13a9229
> --- /dev/null
> +++ b/xen/common/xsplice_elf.c
> @@ -0,0 +1,122 @@

Do you want to add your company copyright header here?

> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/xsplice_elf.h>
> +
> +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
> +                                                const char *name)
> +{
> +    int i;

unsigned int ?

> +
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( !strcmp(name, elf->sec[i].name) )
> +            return &elf->sec[i];
> +    }
> +
> +    return NULL;
> +}
> +
> +static int elf_get_sections(struct xsplice_elf *elf, uint8_t *data)
> +{
> +    struct xsplice_elf_sec *sec;
> +    int i;

unsigned int;

Should we check e_shnum for out-of-bounds values? Hmm, it is a
uint16_t so not that bad..

> +
> +    sec = xmalloc_array(struct xsplice_elf_sec, elf->hdr->e_shnum);
> +    if ( !sec )
> +    {
> +        printk(XENLOG_ERR "Could not find section table\n");

Well that is not exactly right. It couldn't allocate the memory.

Perhaps we should say:
	"Could not allocate memory for %s which has %u sections!\n", elf->name, elf->hdr->e_shnum);

> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +#ifdef CONFIG_ARM_32
> +        sec[i].sec = (Elf32_Shdr *)(data + elf->hdr->e_shoff +
> +                                    i * elf->hdr->e_shentsize);
> +#else
> +        sec[i].sec = (Elf64_Shdr *)(data + elf->hdr->e_shoff +
> +                                    i * elf->hdr->e_shentsize);
> +#endif

> +        sec[i].data = data + sec[i].sec->sh_offset;

We should validate that the 'sec[i].data' pointers are not outside
the memory allocated for 'data'.
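
A possible shape for that validation, as a sketch only: `range_in_buffer()` is a hypothetical helper, and it assumes elf_get_sections() would be given the payload length ('len'), which the posted version does not pass in.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if [off, off + size) lies entirely inside a buffer of 'len'
 * bytes, 0 otherwise. The sum off + size is never computed, so hostile
 * values cannot wrap around and slip through.
 */
static int range_in_buffer(size_t len, uint64_t off, uint64_t size)
{
    return off <= len && size <= len - off;
}
```

The caller would check sh_offset and sh_size of each section header against 'len' before setting sec[i].data. A real check would probably skip SHT_NOBITS sections, which occupy no space in the file.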

> +    }
> +    elf->sec = sec;
> +
> +    return 0;
> +}
> +
> +static int elf_get_sym(struct xsplice_elf *elf, uint8_t *data)
> +{
> +    struct xsplice_elf_sec *symtab, *strtab_sec;
> +    struct xsplice_elf_sym *sym;
> +    const char *strtab;
> +    int i;
> +
> +    symtab = xsplice_elf_sec_by_name(elf, ".symtab");
> +    if ( !symtab )
> +    {
> +        printk(XENLOG_ERR "Could not find symbol table\n");
> +        return -EINVAL;
> +    }
> +
> +    strtab_sec = xsplice_elf_sec_by_name(elf, ".strtab");
> +    if ( !strtab_sec )
> +    {
> +        printk(XENLOG_ERR "Could not find string table\n");
> +        return -EINVAL;
> +    }
> +    strtab = (const char *)(data + strtab_sec->sec->sh_offset);

Should we check that 'strtab' does not point past the memory
allocated for 'data'?

> +
> +    elf->nsym = symtab->sec->sh_size / symtab->sec->sh_entsize;
> +
> +    sym = xmalloc_array(struct xsplice_elf_sym, elf->nsym);

Perhaps also a validity check on the elf->nsym?..
> +    if ( !sym )
> +    {
> +        printk(XENLOG_ERR "Could not allocate memory for symbols\n");
> +        return -ENOMEM;
> +    }
> +
> +    for ( i = 0; i < elf->nsym; i++ )
> +    {
> +#ifdef CONFIG_ARM_32
> +        sym[i].sym = (Elf32_Sym *)(symtab->data + i * symtab->sec->sh_entsize);
> +#else
> +        sym[i].sym = (Elf64_Sym *)(symtab->data + i * symtab->sec->sh_entsize);
> +#endif
> +        sym[i].name = strtab + sym[i].sym->st_name;

Could we check that 'sym[i].name' is not outside the memory allocated
for 'data'?
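
Something along these lines might do for the name lookups. It is a sketch under the assumption that the string table size (sh_size of .strtab) is at hand; `strtab_name()` is a hypothetical helper, not something in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return a pointer to the string at offset 'off' in a string table of
 * 'strtab_size' bytes, or NULL if the offset is out of range or the
 * string is not NUL-terminated inside the table.
 */
static const char *strtab_name(const char *strtab, size_t strtab_size,
                               uint32_t off)
{
    if ( off >= strtab_size )
        return NULL;

    /* Require a terminator before the end of the table. */
    if ( !memchr(strtab + off, '\0', strtab_size - off) )
        return NULL;

    return strtab + off;
}
```

The same helper would cover the section names taken from shstrtab in xsplice_elf_load().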
> +    }
> +    elf->sym = sym;
> +
> +    return 0;
> +}
> +
> +int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data, ssize_t len)
> +{
> +    const char *shstrtab;
> +    int i, rc;

unsigned int i;

> +
> +#ifdef CONFIG_ARM_32
> +    elf->hdr = (Elf32_Ehdr *)data;
> +#else
> +    elf->hdr = (Elf64_Ehdr *)data;
> +#endif
> +
> +    rc = elf_get_sections(elf, data);
> +    if ( rc )
> +        return rc;
> +
> +    shstrtab = (const char *)(data + elf->sec[elf->hdr->e_shstrndx].sec->sh_offset);

if ( (const uint8_t *)shstrtab >= data + len )
	return -EINVAL;

> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +        elf->sec[i].name = shstrtab + elf->sec[i].sec->sh_name;
> +
> +    rc = elf_get_sym(elf, data);
> +    if ( rc )
> +        return rc;
> +
> +    return 0;
> +}
> +
> +void xsplice_elf_free(struct xsplice_elf *elf)
> +{
> +    xfree(elf->sec);
> +    xfree(elf->sym);

elf->sec = NULL;
elf->sym = NULL;

just in case..
> +}
> diff --git a/xen/include/xen/xsplice_elf.h b/xen/include/xen/xsplice_elf.h
> new file mode 100644
> index 0000000..bac0053
> --- /dev/null
> +++ b/xen/include/xen/xsplice_elf.h
> @@ -0,0 +1,44 @@
> +#ifndef __XEN_XSPLICE_ELF_H__
> +#define __XEN_XSPLICE_ELF_H__
> +
> +#include <xen/types.h>
> +#include <xen/elfstructs.h>
> +
> +/* The following describes an Elf file as consumed by xsplice. */
> +struct xsplice_elf_sec {
> +#ifdef CONFIG_ARM_32
> +    Elf32_Shdr *sec;
> +#else
> +    Elf64_Shdr *sec;
> +#endif
> +    const char *name;
> +    const uint8_t *data;           /* A pointer to the data section */
> +    uint8_t *load_addr;            /* A pointer to the allocated destination */

Missing full stop.
> +};
> +
> +struct xsplice_elf_sym {
> +#ifdef CONFIG_ARM_32
> +    Elf32_Sym *sym;
> +#else
> +    Elf64_Sym *sym;
> +#endif
> +    const char *name;
> +};
> +
> +struct xsplice_elf {
> +#ifdef CONFIG_ARM_32
> +    Elf32_Ehdr *hdr;
> +#else
> +    Elf64_Ehdr *hdr;
> +#endif
> +    struct xsplice_elf_sec *sec;   /* Array of sections */
> +    struct xsplice_elf_sym *sym;   /* Array of symbols */

Missing full stop.
> +    int nsym;

unsigned int nsym;
> +};
> +
> +struct xsplice_elf_sec *xsplice_elf_sec_by_name(const struct xsplice_elf *elf,
> +                                                const char *name);
> +int xsplice_elf_load(struct xsplice_elf *elf, uint8_t *data, ssize_t len);
> +void xsplice_elf_free(struct xsplice_elf *elf);
> +
> +#endif /* __XEN_XSPLICE_ELF_H__ */
> -- 
> 2.4.3
> 
> 


* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-03 18:16 ` [PATCH v1 07/11] xsplice: Implement payload loading Ross Lagerwall
@ 2015-11-04 22:21   ` Konrad Rzeszutek Wilk
  2015-11-05 10:35     ` Jan Beulich
  2015-11-05 11:15     ` Ross Lagerwall
  0 siblings, 2 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-04 22:21 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:16:04PM +0000, Ross Lagerwall wrote:
> Add support for loading xsplice payloads. This is somewhat similar to
> the Linux kernel module loader, implementing the following steps:
> - Verify the elf file.
> - Parse the elf file.
> - Allocate a region of memory mapped within a free area of
>   [xen_virt_end, XEN_VIRT_END].
> - Copy allocated sections into the new region.
> - Resolve section symbols. All other symbols must be absolute addresses.
> - Perform relocations.
> - Process xsplice specific sections.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/arm/Makefile             |   1 +
>  xen/arch/arm/xsplice.c            |  23 ++++
>  xen/arch/x86/Makefile             |   1 +
>  xen/arch/x86/setup.c              |   7 +
>  xen/arch/x86/xsplice.c            |  90 ++++++++++++
>  xen/common/xsplice.c              | 282 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/x86_64/page.h |   2 +
>  xen/include/xen/xsplice.h         |  22 +++
>  8 files changed, 428 insertions(+)
>  create mode 100644 xen/arch/arm/xsplice.c
>  create mode 100644 xen/arch/x86/xsplice.c
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 1ef39f7..f785c07 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -39,6 +39,7 @@ obj-y += device.o
>  obj-y += decode.o
>  obj-y += processor.o
>  obj-y += smc.o
> +obj-y += xsplice.o
>  
>  #obj-bin-y += ....o
>  
> diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
> new file mode 100644
> index 0000000..8d85fa9
> --- /dev/null
> +++ b/xen/arch/arm/xsplice.c
> @@ -0,0 +1,23 @@
> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/xsplice_elf.h>
> +#include <xen/xsplice.h>
> +
> +int xsplice_verify_elf(uint8_t *data, ssize_t len)
> +{
> +    return -ENOSYS;
> +}
> +
> +int xsplice_perform_rel(struct xsplice_elf *elf,
> +                        struct xsplice_elf_sec *base,
> +                        struct xsplice_elf_sec *rela)
> +{
> +    return -ENOSYS;
> +}
> +
> +int xsplice_perform_rela(struct xsplice_elf *elf,
> +                         struct xsplice_elf_sec *base,
> +                         struct xsplice_elf_sec *rela)
> +{
> +    return -ENOSYS;
> +}
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index 39a8059..6e05532 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -61,6 +61,7 @@ obj-y += x86_emulate.o
>  obj-y += tboot.o
>  obj-y += hpet.o
>  obj-y += vm_event.o
> +obj-y += xsplice.o
>  obj-y += xstate.o
>  
>  obj-$(crash_debug) += gdbstub.o
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 4ed0110..a79c5e3 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -99,6 +99,9 @@ unsigned long __read_mostly xen_phys_start;
>  
>  unsigned long __read_mostly xen_virt_end;
>  
> +unsigned long __read_mostly module_virt_start;
> +unsigned long __read_mostly module_virt_end;
> +
>  DEFINE_PER_CPU(struct tss_struct, init_tss);
>  
>  char __section(".bss.stack_aligned") cpu0_stack[STACK_SIZE];
> @@ -1145,6 +1148,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                     ~((1UL << L2_PAGETABLE_SHIFT) - 1);
>      destroy_xen_mappings(xen_virt_end, XEN_VIRT_START + BOOTSTRAP_MAP_BASE);
>  
> +    module_virt_start = xen_virt_end;
> +    module_virt_end = XEN_VIRT_END - NR_CPUS * PAGE_SIZE;
> +    BUG_ON(module_virt_end <= module_virt_start);
> +
>      memguard_init();
>  
>      nr_pages = 0;
> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
> new file mode 100644
> index 0000000..dbff0d5
> --- /dev/null
> +++ b/xen/arch/x86/xsplice.c
> @@ -0,0 +1,90 @@

You would want to put the regular Citrix copyright header here.
> +#include <xen/lib.h>
> +#include <xen/errno.h>
> +#include <xen/xsplice_elf.h>
> +#include <xen/xsplice.h>
> +
> +int xsplice_verify_elf(uint8_t *data, ssize_t len)
> +{
> +
> +    Elf64_Ehdr *hdr = (Elf64_Ehdr *)data;
> +
> +    if ( len < (sizeof *hdr) ||
> +         !IS_ELF(*hdr) ||
> +         hdr->e_ident[EI_CLASS] != ELFCLASS64 ||
> +         hdr->e_ident[EI_DATA] != ELFDATA2LSB ||
> +         hdr->e_machine != EM_X86_64 )
> +    {
> +        printk(XENLOG_ERR "Invalid ELF file\n");

For audit reasons I think we should log at least the name (id) of
the payload. Could we pass that in as an argument and
print it here?

> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +int xsplice_perform_rel(struct xsplice_elf *elf,
> +                        struct xsplice_elf_sec *base,
> +                        struct xsplice_elf_sec *rela)
> +{
> +    printk(XENLOG_ERR "SHT_REL relocation unsupported\n");

"%s: SHT_REL relocation ..\n", elf->name);

> +    return -ENOSYS;
> +}
> +
> +int xsplice_perform_rela(struct xsplice_elf *elf,
> +                         struct xsplice_elf_sec *base,
> +                         struct xsplice_elf_sec *rela)
> +{
> +    Elf64_Rela *r;
> +    int symndx, i;

unsigned int

> +    uint64_t val;
> +    uint8_t *dest;
> +

Can you double check that rela->sec->sh_entsize is not zero first?

> +    for ( i = 0; i < (rela->sec->sh_size / rela->sec->sh_entsize); i++ )
> +    {
> +        r = (Elf64_Rela *)(rela->data + i * rela->sec->sh_entsize);

Can you check that 'r' is not past the memory allocated for 'data'?

> +        symndx = ELF64_R_SYM(r->r_info);
> +        dest = base->load_addr + r->r_offset;
> +        val = r->r_addend + elf->sym[symndx].sym->st_value;

Can you check that 'symndx' is not past what we allocated for elf->sym?
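
Roughly what I have in mind for the entry size and symbol index checks, as a sketch. `rela_entry_count()` and `symndx_valid()` are hypothetical helpers; 'min_entsize' would be sizeof(Elf64_Rela) in the x86 code, and 'nsym' would be elf->nsym.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Work out how many relocation entries a section holds. Returns 0 and
 * stores the count in *count, or -1 if the section header values are
 * unusable (zero or undersized entries, or a size that is not a whole
 * number of entries).
 */
static int rela_entry_count(uint64_t sh_size, uint64_t sh_entsize,
                            uint64_t min_entsize, uint64_t *count)
{
    if ( sh_entsize == 0 || sh_entsize < min_entsize )
        return -1;

    if ( sh_size % sh_entsize )
        return -1;

    *count = sh_size / sh_entsize;
    return 0;
}

/* A symbol index is usable only if it falls inside the symbol array. */
static int symndx_valid(uint64_t symndx, uint64_t nsym)
{
    return symndx < nsym;
}
```

Computing the count once before the loop also avoids dividing by sh_entsize on every iteration.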

> +
> +        switch ( ELF64_R_TYPE(r->r_info) )
> +        {
> +            case R_X86_64_NONE:
> +                break;
> +            case R_X86_64_64:
> +                *(uint64_t *)dest = val;
> +                break;
> +            case R_X86_64_32:
> +                *(uint32_t *)dest = val;
> +                if (val != *(uint32_t *)dest)
> +                    goto overflow;
> +                break;
> +            case R_X86_64_32S:
> +                *(int32_t *)dest = val;
> +                if ((int64_t)val != *(int32_t *)dest)
> +                    goto overflow;
> +                break;
> +            case R_X86_64_PLT32:
> +                /*
> +                 * Xen uses -fpic which normally uses PLT relocations
> +                 * except that it sets visibility to hidden which means
> +                 * that they are not used.  However, when gcc cannot
> +                 * inline memcpy it emits memcpy with default visibility
> +                 * which then creates a PLT relocation.  It can just be
> +                 * treated the same as R_X86_64_PC32.
> +                 */
> +                /* Fall through */
> +            case R_X86_64_PC32:
> +                *(uint32_t *)dest = val - (uint64_t)dest;
> +                break;
> +            default:
> +                printk(XENLOG_ERR "Unhandled relocation %lu\n",
> +                       ELF64_R_TYPE(r->r_info));
> +                return -EINVAL;
> +        }
> +    }
> +
> +    return 0;
> +
> + overflow:
> +    printk(XENLOG_ERR "Overflow in relocation %d in %s\n", i, rela->name);
> +    return -EOVERFLOW;
> +}
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index d984c8a..5e88c55 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -12,6 +12,7 @@
>  #include <xen/stdbool.h>
>  #include <xen/sched.h>
>  #include <xen/lib.h>
> +#include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  
> @@ -29,9 +30,15 @@ struct payload {
>  
>      struct list_head   list;   /* Linked to 'payload_list'. */
>  
> +    void *module_address;
> +    size_t module_pages;
> +
>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>  };
>  
> +static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
> +static void free_module(struct payload *payload);
> +
>  static const char *state2str(int32_t state)
>  {
>  #define STATE(x) [XSPLICE_STATE_##x] = #x
> @@ -140,6 +147,7 @@ static void __free_payload(struct payload *data)
>      list_del(&data->list);
>      payload_cnt --;
>      payload_version ++;
> +    free_module(data);
>      xfree(data);
>  }
>  
> @@ -178,6 +186,10 @@ static int xsplice_upload(xen_sysctl_xsplice_upload_t *upload)
>      if ( copy_from_guest(raw_data, upload->payload, upload->size) )
>          goto err_raw;
>  
> +    rc = load_module(data, raw_data, upload->size);
> +    if ( rc )
> +        goto err_raw;
> +
>      data->state = XSPLICE_STATE_LOADED;
>      data->rc = 0;
>      INIT_LIST_HEAD(&data->list);
> @@ -390,6 +402,276 @@ int xsplice_control(xen_sysctl_xsplice_op_t *xsplice)
>      return rc;
>  }
>  
> +
> +/*
> + * The following functions prepare an xSplice module to be executed by
> + * allocating space, loading the allocated sections, resolving symbols,
> + * performing relocations, etc.
> + */
> +#ifdef CONFIG_X86
> +static void *alloc_module(size_t size)

s/module/payload/
> +{
> +    mfn_t *mfn, *mfn_ptr;
> +    size_t pages, i;
> +    struct page_info *pg;
> +    unsigned long hole_start, hole_end, cur;
> +    struct payload *data, *data2;
> +
> +    ASSERT(size);
> +
> +    pages = PFN_UP(size);
> +    mfn = xmalloc_array(mfn_t, pages);
> +    if ( mfn == NULL )
> +        return NULL;
> +
> +    for ( i = 0; i < pages; i++ )
> +    {
> +        pg = alloc_domheap_page(NULL, 0);
> +        if ( pg == NULL )
> +            goto error;
> +        mfn[i] = _mfn(page_to_mfn(pg));
> +    }

This looks like 'vmalloc'. Why not use that?
(That explanation should be part of the commit description probably)

> +
> +    hole_start = (unsigned long)module_virt_start;
> +    hole_end = hole_start + pages * PAGE_SIZE;
> +    spin_lock(&payload_list_lock);
> +    list_for_each_entry ( data, &payload_list, list )
> +    {
> +        list_for_each_entry ( data2, &payload_list, list )
> +        {
> +            unsigned long start, end;
> +
> +            start = (unsigned long)data2->module_address;
> +            end = start + data2->module_pages * PAGE_SIZE;
> +            if ( hole_end > start && hole_start < end )
> +            {
> +                hole_start = end;
> +                hole_end = hole_start + pages * PAGE_SIZE;
> +                break;
> +            }
> +        }
> +        if ( &data2->list == &payload_list )
> +            break;
> +    }
> +    spin_unlock(&payload_list_lock);

This could be made into a nice function. 'find_hole' perhaps?
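
A sketch of what such a helper could look like. It is written against a plain array of ranges so it stands alone; the real one would walk payload_list under payload_list_lock instead. `struct used_range` and the exact signature are assumptions, and 'limit' is treated as an exclusive upper bound.

```c
#include <stddef.h>

/* A region already in use, as the half-open interval [start, end). */
struct used_range {
    unsigned long start, end;
};

/*
 * Find the lowest address >= base where 'size' bytes fit below 'limit'
 * without overlapping any used range. Returns 0 and stores the address
 * in *hole, or -1 if there is no room.
 */
static int find_hole(const struct used_range *used, size_t n,
                     unsigned long base, unsigned long limit,
                     unsigned long size, unsigned long *hole)
{
    unsigned long start = base;
    size_t i;

    for ( ; ; )
    {
        /* Give up once the candidate no longer fits below 'limit'. */
        if ( start > limit || size > limit - start )
            return -1;

        for ( i = 0; i < n; i++ )
        {
            if ( start + size > used[i].start && start < used[i].end )
            {
                /* Overlap: move past this range and rescan them all. */
                start = used[i].end;
                break;
            }
        }

        if ( i == n )
        {
            *hole = start;
            return 0;
        }
    }
}
```

Each range can push the candidate forward at most once, so the outer loop ends after at most n + 1 scans even when the ranges are unsorted.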

> +
> +    if ( hole_end >= module_virt_end )
> +        goto error;
> +
> +    for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE )
> +    {
> +        if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) )
> +        {
> +            if ( cur != hole_start )
> +                destroy_xen_mappings(hole_start, cur);

I think 'destroy_xen_mappings' is OK handling hole_start == cur.

> +            goto error;
> +        }
> +    }
> +    xfree(mfn);
> +    return (void *)hole_start;
> +
> + error:
> +    while ( i-- )
> +        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
> +    xfree(mfn);
> +    return NULL;
> +}
> +#else
> +static void *alloc_module(size_t size)

s/module/payload/
> +{
> +    return NULL;
> +}
> +#endif
> +
> +static void free_module(struct payload *payload)
> +{
> +    int i;

unsigned int;

> +    struct page_info *pg;
> +    PAGE_LIST_HEAD(pg_list);
> +    void *va = payload->module_address;
> +    unsigned long addr = (unsigned long)va;
> +
> +    if ( !payload->module_address )
> +        return;

How about:
	if ( !addr )
		return;
?

> +
> +    payload->module_address = NULL;
> +
> +    for ( i = 0; i < payload->module_pages; i++ )
> +        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
> +
> +    destroy_xen_mappings(addr, addr + payload->module_pages * PAGE_SIZE);
> +
> +    while ( (pg = page_list_remove_head(&pg_list)) != NULL )
> +        free_domheap_page(pg);
> +
> +    payload->module_pages = 0;
> +}
> +
> +static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)

s/alloc/compute/?

> +{
> +    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
> +    sec->sec->sh_entsize = align_size;
> +    *core_size = sec->sec->sh_size + align_size;
> +}
> +
> +static int move_module(struct payload *payload, struct xsplice_elf *elf)
> +{
> +    uint8_t *buf;
> +    int i;

unsigned int i;

> +    size_t core_size = 0;
> +
> +    /* Allocate text regions */

s/Allocate/Compute/

> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> +             (SHF_ALLOC|SHF_EXECINSTR) )
> +            alloc_section(&elf->sec[i], &core_size);
> +    }
> +
> +    /* Allocate rw data */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            alloc_section(&elf->sec[i], &core_size);
> +    }
> +
> +    /* Allocate ro data */
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> +             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> +            alloc_section(&elf->sec[i], &core_size);
> +    }
> +
> +    buf = alloc_module(core_size);
> +    if ( !buf ) {
> +        printk(XENLOG_ERR "Could not allocate memory for module\n");

"%s: Could not allocate %zu bytes for payload!\n", elf->name, core_size);

> +        return -ENOMEM;
> +    }
> +    memset(buf, 0, core_size);

Perhaps for fun it ought to be 'ud2' ?

> +
> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> +        {
> +            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
> +            memcpy(elf->sec[i].load_addr, elf->sec[i].data,
> +                   elf->sec[i].sec->sh_size);
> +            printk(XENLOG_DEBUG "Loaded %s at 0x%p\n",

Add %s: at the start .. 
> +                   elf->sec[i].name, elf->sec[i].load_addr);

which would be elf->name.

> +        }
> +    }
> +
> +    payload->module_address = buf;
> +    payload->module_pages = PFN_UP(core_size);

Instead of module could we name it payload?

> +
> +    return 0;
> +}
> +
> +static int resolve_symbols(struct xsplice_elf *elf)

s/resolve/check/

> +{
> +    int i;

unsigned int i;

> +
> +    for ( i = 1; i < elf->nsym; i++ )

Why 1? Please explain as comment.
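
For reference, the likely answer is that entry 0 of an ELF symbol table is the
reserved null symbol, whose st_shndx is SHN_UNDEF, so a loop starting at 0
would hit the "Unknown symbol" case straight away. A small sketch of the kind
of comment that would help, using hypothetical stand-in types rather than the
patch's struct xsplice_elf_sym:

```c
#include <stdint.h>

#define SKETCH_SHN_UNDEF 0      /* Same value as the ELF SHN_UNDEF. */

/* Hypothetical, cut-down symbol: only the field needed for the example. */
struct sketch_sym {
    uint16_t st_shndx;
};

/* Count undefined symbols, skipping the reserved entry at index 0. */
static unsigned int count_undefined(const struct sketch_sym *sym,
                                    unsigned int nsym)
{
    unsigned int i, undefined = 0;

    /*
     * Index 0 is the reserved null symbol (STN_UNDEF).  It is always
     * "undefined", so start at 1 to avoid reporting it as an error.
     */
    for ( i = 1; i < nsym; i++ )
        if ( sym[i].st_shndx == SKETCH_SHN_UNDEF )
            undefined++;

    return undefined;
}
```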


> +    {
> +        switch ( elf->sym[i].sym->st_shndx )
> +        {
> +            case SHN_COMMON:
> +                printk(XENLOG_ERR "Unexpected common symbol: %s\n",
> +                       elf->sym[i].name);

Please also include elf->name in the error.

> +                return -EINVAL;
> +                break;
> +            case SHN_UNDEF:
> +                printk(XENLOG_ERR "Unknown symbol: %s\n", elf->sym[i].name);

Ditto.
> +                return -ENOENT;
> +                break;
> +            case SHN_ABS:
> +                printk(XENLOG_DEBUG "Absolute symbol: %s => 0x%p\n",
> +                       elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
> +                break;
> +            default:
> +                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
> +                {
> +                    elf->sym[i].sym->st_value +=
> +                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
> +                    printk(XENLOG_DEBUG "Symbol resolved: %s => 0x%p\n",

Ditto;
> +                           elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
> +                }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int perform_relocs(struct xsplice_elf *elf)
> +{
> +    struct xsplice_elf_sec *rela, *base;
> +    int i, rc;
> +

unsigned int i;

> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> +    {
> +        rela = &elf->sec[i];
> +
> +        /* Is it a valid relocation section? */
> +        if ( rela->sec->sh_info >= elf->hdr->e_shnum )
> +            continue;

Um, don't we want to mark it as invalid or such?
Or overwrite it so we won't use it?

> +
> +        base = &elf->sec[rela->sec->sh_info];
> +
> +        /* Don't relocate non-allocated sections */
> +        if ( !(base->sec->sh_flags & SHF_ALLOC) )
> +            continue;

> +
> +        if ( elf->sec[i].sec->sh_type == SHT_RELA )
> +            rc = xsplice_perform_rela(elf, base, rela);
> +        else if ( elf->sec[i].sec->sh_type == SHT_REL )
> +            rc = xsplice_perform_rel(elf, base, rela);
> +
> +        if ( rc )
> +            return rc;
> +    }
> +
> +    return 0;
> +}
> +
> +static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
> +{
> +    struct xsplice_elf elf;

Wait a minute? We ditch it after this?

> +    int rc = 0;
> +
> +    rc = xsplice_verify_elf(raw, len);
> +    if ( rc )
> +        return rc;
> +
> +    rc = xsplice_elf_load(&elf, raw, len);
> +    if ( rc )
> +        return rc;
> +
> +    rc = move_module(payload, &elf);
> +    if ( rc )
> +        goto err_elf;
> +
> +    rc = resolve_symbols(&elf);
> +    if ( rc )
> +        goto err_module;
> +
> +    rc = perform_relocs(&elf);
> +    if ( rc )
> +        goto err_module;
> +

Shouldn't you call xsplice_elf_free(&elf) here? Or
hook up the elf to the 'struct payload'?


If not, who is going to clean up elf->sec and elf->sym when the
payload is unloaded?
> +    return 0;
> +
> + err_module:
> +    free_module(payload);
> + err_elf:
> +    xsplice_elf_free(&elf);
> +
> +    return rc;
> +}
> +
>  static int __init xsplice_init(void)
>  {
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
> index 19ab4d0..e6f08e9 100644
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
> @@ -38,6 +38,8 @@
>  #include <xen/pdx.h>
>  
>  extern unsigned long xen_virt_end;
> +extern unsigned long module_virt_start;
> +extern unsigned long module_virt_end;
>  
>  #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
>  #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index 41e28da..a3946a3 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -1,9 +1,31 @@
>  #ifndef __XEN_XSPLICE_H__
>  #define __XEN_XSPLICE_H__
>  
> +struct xsplice_elf;
> +struct xsplice_elf_sec;
> +struct xsplice_elf_sym;
> +
> +struct xsplice_patch_func {
> +    unsigned long new_addr;
> +    unsigned long new_size;
> +    unsigned long old_addr;
> +    unsigned long old_size;
> +    char *name;
> +    unsigned char undo[8];
> +};

We don't use them in this patch. They could be moved to another patch.
> +
>  struct xen_sysctl_xsplice_op;
>  int xsplice_control(struct xen_sysctl_xsplice_op *);
>  
>  extern void xsplice_printall(unsigned char key);
>  
> +/* Arch hooks */
> +int xsplice_verify_elf(uint8_t *data, ssize_t len);
> +int xsplice_perform_rel(struct xsplice_elf *elf,
> +                        struct xsplice_elf_sec *base,
> +                        struct xsplice_elf_sec *rela);
> +int xsplice_perform_rela(struct xsplice_elf *elf,
> +                         struct xsplice_elf_sec *base,
> +                         struct xsplice_elf_sec *rela);
> +
>  #endif /* __XEN_XSPLICE_H__ */
> -- 
> 2.4.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
@ 2015-11-05  3:17   ` Konrad Rzeszutek Wilk
  2015-11-05 11:45     ` Ross Lagerwall
  2015-11-05  3:19   ` Konrad Rzeszutek Wilk
  2015-11-27 13:51   ` Martin Pohlack
  2 siblings, 1 reply; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05  3:17 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Kevin Tian, Ian Campbell, Andrew Cooper, xen-devel, Jan Beulich,
	Stefano Stabellini, Jun Nakajima, Aravind Gopalakrishnan,
	Boris Ostrovsky, Suravee Suthikulpanit

On Tue, Nov 03, 2015 at 06:16:05PM +0000, Ross Lagerwall wrote:
> Implement support for the apply, revert and replace actions.
> 
> To perform an action on a payload, the hypercall sets up a data
> structure to schedule the work.  A hook is added in all the
> return-to-guest paths to check for work to do and execute it if needed.
> In this way, patches can be applied with all CPUs idle and without
> stacks.  The first CPU to do_xsplice() becomes the master and triggers a
> reschedule softirq to trigger all the other CPUs to enter do_xsplice()
> with no stack.  Once all CPUs have rendezvoused, all CPUs disable IRQs
> and NMIs are ignored. The system is then quiescent and the master
> performs the action.  After this, all CPUs enable IRQs and NMIs are
> re-enabled.
> 
> The action to perform is one of:
> - APPLY: For each function in the module, store the first 5 bytes of the
>   old function and replace it with a jump to the new function.
> - REVERT: Copy the previously stored bytes into the first 5 bytes of the
>   old function.
> - REPLACE: Revert each applied module and then apply the new module.
> 
> To prevent a deadlock with any other barrier in the system, the master
> will wait for up to 30ms before timing out.  I've taken some
> measurements and found the patch application to take about 100 μs on a
> 72 CPU system, whether idle or fully loaded.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/arm/xsplice.c      |   8 ++
>  xen/arch/x86/domain.c       |   4 +
>  xen/arch/x86/hvm/svm/svm.c  |   2 +
>  xen/arch/x86/hvm/vmx/vmcs.c |   2 +
>  xen/arch/x86/xsplice.c      |  19 ++++
>  xen/common/xsplice.c        | 264 ++++++++++++++++++++++++++++++++++++++++++--
>  xen/include/asm-arm/nmi.h   |  13 +++
>  xen/include/xen/xsplice.h   |   7 +-
>  8 files changed, 306 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
> index 8d85fa9..3c34eb3 100644
> --- a/xen/arch/arm/xsplice.c
> +++ b/xen/arch/arm/xsplice.c
> @@ -3,6 +3,14 @@
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  
> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> +}
> +
> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
> +{
> +}
> +
>  int xsplice_verify_elf(uint8_t *data, ssize_t len)
>  {
>      return -ENOSYS;
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fe3be30..4420cfc 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -36,6 +36,7 @@
>  #include <xen/cpu.h>
>  #include <xen/wait.h>
>  #include <xen/guest_access.h>
> +#include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  #include <asm/regs.h>
>  #include <asm/mc146818rtc.h>
> @@ -120,6 +121,7 @@ static void idle_loop(void)
>          (*pm_idle)();
>          do_tasklet();
>          do_softirq();
> +        do_xsplice();
>      }
>  }
>  
> @@ -136,6 +138,7 @@ void startup_cpu_idle_loop(void)
>  
>  static void noreturn continue_idle_domain(struct vcpu *v)
>  {
> +    do_xsplice();
>      reset_stack_and_jump(idle_loop);
>  }
>  
> @@ -143,6 +146,7 @@ static void noreturn continue_nonidle_domain(struct vcpu *v)
>  {
>      check_wakeup_from_wait();
>      mark_regs_dirty(guest_cpu_user_regs());
> +    do_xsplice();
>      reset_stack_and_jump(ret_from_intr);
>  }
>  
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 8de41fa..65bf7e9 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -26,6 +26,7 @@
>  #include <xen/hypercall.h>
>  #include <xen/domain_page.h>
>  #include <xen/xenoprof.h>
> +#include <xen/xsplice.h>
>  #include <asm/current.h>
>  #include <asm/io.h>
>  #include <asm/paging.h>
> @@ -1071,6 +1072,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
>  
>      hvm_do_resume(v);
>  
> +    do_xsplice();
>      reset_stack_and_jump(svm_asm_do_resume);
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 4ea1ad1..d996f47 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -25,6 +25,7 @@
>  #include <xen/kernel.h>
>  #include <xen/keyhandler.h>
>  #include <xen/vm_event.h>
> +#include <xen/xsplice.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -1685,6 +1686,7 @@ void vmx_do_resume(struct vcpu *v)
>      }
>  
>      hvm_do_resume(v);
> +    do_xsplice();
>      reset_stack_and_jump(vmx_asm_do_vmentry);
>  }
>  
> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
> index dbff0d5..31e4124 100644
> --- a/xen/arch/x86/xsplice.c
> +++ b/xen/arch/x86/xsplice.c
> @@ -3,6 +3,25 @@
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  
> +#define PATCH_INSN_SIZE 5
> +
> +void xsplice_apply_jmp(struct xsplice_patch_func *func)

Don't we want it to return 'int'?
> +{
> +    uint32_t val;
> +    uint8_t *old_ptr;
> +
> +    old_ptr = (uint8_t *)func->old_addr;
> +    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);

And perhaps use something which can catch an exception (#GP) so that
this can error out?
> +    *old_ptr++ = 0xe9; /* Relative jump */
> +    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
> +    memcpy(old_ptr, &val, sizeof val);
> +}
> +
> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
> +{
> +    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
> +}
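
To make the 'int' suggestion above concrete, here is a user-space sketch of
the same 5-byte patching that can fail instead of silently truncating the
displacement. Everything in it is an assumption for illustration: the struct
is a simplified stand-in for struct xsplice_patch_func, the code bytes live in
an ordinary buffer, and -ERANGE is just one possible error code. It does not
attempt the exception-catching write mentioned above.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5

/* Hypothetical, simplified stand-in for struct xsplice_patch_func. */
struct patch_func_sketch {
    uint8_t *code;       /* Writable view of the old function's first bytes. */
    uint64_t old_addr;   /* Address the old function executes at. */
    uint64_t new_addr;   /* Address of the replacement function. */
    uint8_t undo[8];     /* Saved bytes for the revert path. */
};

/* Write "jmp rel32" over the old function; fail if the target is too far. */
static int apply_jmp_sketch(struct patch_func_sketch *func)
{
    int64_t disp = (int64_t)(func->new_addr - func->old_addr) - PATCH_INSN_SIZE;
    int32_t rel;

    /* 0xe9 takes a signed 32-bit displacement from the end of the insn. */
    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -ERANGE;

    rel = (int32_t)disp;
    memcpy(func->undo, func->code, PATCH_INSN_SIZE);
    func->code[0] = 0xe9;
    memcpy(&func->code[1], &rel, sizeof(rel));

    return 0;
}

/* Restore the bytes saved by apply_jmp_sketch(). */
static void revert_jmp_sketch(struct patch_func_sketch *func)
{
    memcpy(func->code, func->undo, PATCH_INSN_SIZE);
}
```

With a return value, apply_payload() could stop at the first failing function
and keep the payload off the applied list.
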
> +
>  int xsplice_verify_elf(uint8_t *data, ssize_t len)
>  {
>  
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 5e88c55..4476be5 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -11,16 +11,21 @@
>  #include <xen/guest_access.h>
>  #include <xen/stdbool.h>
>  #include <xen/sched.h>
> +#include <xen/softirq.h>
>  #include <xen/lib.h>
> +#include <xen/wait.h>
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  
>  #include <asm/event.h>
> +#include <asm/nmi.h>
>  
>  static DEFINE_SPINLOCK(payload_list_lock);
>  static LIST_HEAD(payload_list);
>  
> +static LIST_HEAD(applied_list);
> +
>  static unsigned int payload_cnt;
>  static unsigned int payload_version = 1;
>  
> @@ -29,15 +34,34 @@ struct payload {
>      int32_t rc;         /* 0 or -EXX. */
>  
>      struct list_head   list;   /* Linked to 'payload_list'. */
> +    struct list_head   applied_list;   /* Linked to 'applied_list'. */
>  
> +    struct xsplice_patch_func *funcs;
> +    int nfuncs;

unsigned int nfuncs;

>      void *module_address;
>      size_t module_pages;
>  
>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>  };
>  
> +/* Defines an outstanding patching action. */
> +struct xsplice_work
> +{
> +    atomic_t semaphore;          /* Used for rendezvous */
> +    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled */
> +    struct payload *data;        /* The payload on which to act */
> +    volatile bool_t do_work;     /* Signals work to do */
> +    volatile bool_t ready;       /* Signals all CPUs synchronized */
> +    uint32_t cmd;                /* Action request. XSPLICE_ACTION_* */

Now since you have a pointer to 'data' can't you follow that for the
cmd? Or at least the 'data->state'?

Missing full stops.
> +};
> +
> +static DEFINE_SPINLOCK(xsplice_work_lock);
> +/* There can be only one outstanding patching action. */
> +static struct xsplice_work xsplice_work;
> +
>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
>  static void free_module(struct payload *payload);
> +static int schedule_work(struct payload *data, uint32_t cmd);
>  
>  static const char *state2str(int32_t state)
>  {
> @@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
>      case XSPLICE_ACTION_REVERT:
>          if ( data->state == XSPLICE_STATE_APPLIED )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_CHECKED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      case XSPLICE_ACTION_APPLY:
>          if ( (data->state == XSPLICE_STATE_CHECKED) )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_APPLIED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      case XSPLICE_ACTION_REPLACE:
>          if ( data->state == XSPLICE_STATE_CHECKED )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_CHECKED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      default:
> @@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf)
>      return 0;
>  }
>  
> +static int find_special_sections(struct payload *payload,
> +                                 struct xsplice_elf *elf)
> +{
> +    struct xsplice_elf_sec *sec;
> +
> +    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
> +    if ( !sec )
> +    {
> +        printk(XENLOG_ERR ".xsplice.funcs is missing\n");
> +        return -1;
> +    }
> +
> +    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
> +    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
> +
> +    return 0;
> +}

That looks like it should belong to another patch?
> +
>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>  {
>      struct xsplice_elf elf;
> @@ -662,6 +698,10 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>      if ( rc )
>          goto err_module;
>  
> +    rc = find_special_sections(payload, &elf);
> +    if ( rc )
> +        goto err_module;
> +

Ditto?
>      return 0;
>  
>   err_module:
> @@ -672,6 +712,206 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>      return rc;
>  }
>  
> +
> +/*
> + * The following functions get the CPUs into an appropriate state and
> + * apply (or revert) each of the module's functions.

s/module/payload/

> + */
> +
> +/*
> + * This function is executed having all other CPUs with no stack and IRQs
> + * disabled.

Well, there is some stack. For example from 'cpu_idle' - you have the
'cpu_idle' on the stack.

> + */
> +static int apply_payload(struct payload *data)
> +{
> +    int i;

unsigned int
> +
> +    printk(XENLOG_DEBUG "Applying payload: %s\n", data->id);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_apply_jmp(data->funcs + i);

And if this returns an error then we could skip adding
it to the applied_list..
> +

Also the patching in Linux seems to do some icache purging.
Should we use that?

> +    list_add_tail(&data->applied_list, &applied_list);
> +
> +    return 0;
> +}
> +
> +/*
> + * This function is executed having all other CPUs with no stack and IRQs
> + * disabled.
> + */
> +static int revert_payload(struct payload *data)
> +{
> +    int i;

unsigned int i;
> +
> +    printk(XENLOG_DEBUG "Reverting payload: %s\n", data->id);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_revert_jmp(data->funcs + i);
> +
> +    list_del(&data->applied_list);
> +
> +    return 0;
> +}
> +
> +/* Must be holding the payload_list lock */

Missing full stop.

Should that lock be called something else now? (It no longer protects
just the list, but also the scheduling action.)
> +static int schedule_work(struct payload *data, uint32_t cmd)
> +{
> +    /* Fail if an operation is already scheduled */
> +    if ( xsplice_work.do_work )
> +        return -EAGAIN;
> +

> +    xsplice_work.cmd = cmd;
> +    xsplice_work.data = data;
> +    atomic_set(&xsplice_work.semaphore, 0);
> +    atomic_set(&xsplice_work.irq_semaphore, 0);
> +    xsplice_work.ready = false;
> +    smp_mb();
> +    xsplice_work.do_work = true;
> +    smp_mb();

So this is your 'GO GO' signal right? I think you may want
to have 'smp_wmb()'.
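
The ordering being asked for is: all the fields of the work item must be
visible before the flag that tells other CPUs to look at them. A sketch of
that publish/poll pattern using C11 atomics as a stand-in for the Xen
barriers (the release store plays the role of a write barrier before setting
do_work). The struct and function names are hypothetical:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down version of struct xsplice_work. */
struct work_sketch {
    uint32_t cmd;
    void *data;
    atomic_bool do_work;
};

/* Fill in the work item, then publish it with a release store. */
static int publish_work(struct work_sketch *w, uint32_t cmd, void *data)
{
    /* Fail if an operation is already scheduled. */
    if ( atomic_load_explicit(&w->do_work, memory_order_acquire) )
        return -EAGAIN;

    w->cmd = cmd;
    w->data = data;
    /* Everything written above is visible to whoever observes 'true'. */
    atomic_store_explicit(&w->do_work, true, memory_order_release);

    return 0;
}

/* Returns true and the command if work has been published. */
static bool poll_work(struct work_sketch *w, uint32_t *cmd)
{
    if ( !atomic_load_explicit(&w->do_work, memory_order_acquire) )
        return false;

    *cmd = w->cmd;
    return true;
}
```
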
> +
> +    return 0;
> +}
> +

/me laughs. What a way to 'fix' the NMI watchdog.

> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> +{
> +    return 1;
> +}
> +
> +static void reschedule_fn(void *unused)
> +{
> +    smp_mb(); /* Synchronize with setting do_work */
> +    raise_softirq(SCHEDULE_SOFTIRQ);
> +}
> +
> +/*
> + * The main function which manages the work of quiescing the system and
> + * patching code.
> + */
> +void do_xsplice(void)
> +{
> +    int id;

unsigned int id;
> +    unsigned int total_cpus;
> +    nmi_callback_t saved_nmi_callback;
> +
> +    /* Fast path: no work to do */

Missing full stop.
> +    if ( likely(!xsplice_work.do_work) )
> +        return;
> +
> +    ASSERT(local_irq_is_enabled());
> +
> +    spin_lock(&xsplice_work_lock);
> +    id = atomic_read(&xsplice_work.semaphore);
> +    atomic_inc(&xsplice_work.semaphore);
> +    spin_unlock(&xsplice_work_lock);

Could you use 'atomic_inc_and_test' and then you can get
rid of the spinlock.
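
What the code needs here is the value of the counter before the increment, so
that exactly one CPU sees 0 and becomes the master. An inc-and-test style
helper only reports whether the result is zero, so a fetch-and-add style
primitive may be the better fit. A sketch with C11 atomics, which are only a
stand-in here since the Xen atomic API is different:

```c
#include <stdatomic.h>

/* Stand-in for xsplice_work.semaphore. */
static atomic_uint rendezvous_count;

/*
 * Register this CPU's arrival and return its arrival index.  The read and
 * the increment happen as one atomic step, so no spinlock is needed and
 * only one caller can ever get index 0 (the master).
 */
static unsigned int rendezvous_arrive(void)
{
    return atomic_fetch_add(&rendezvous_count, 1);
}
```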

> +
> +    total_cpus = num_online_cpus();

Which could change across these invocations.. Perhaps
during these calls we need to lock up CPU up/down code?

> +
> +    if ( id == 0 )
> +    {

Can you just make this its own function? Perhaps call it
'xsplice_do_single' or such?

> +        s_time_t timeout, start;
> +
> +        /* Trigger other CPUs to execute do_xsplice */

Missing full stop.
> +        smp_call_function(reschedule_fn, NULL, 0);
> +
> +        /* Wait for other CPUs with a timeout */

Missing full stop.
> +        start = NOW();
> +        timeout = start + MILLISECS(30);

Nah. That should come from the XSPLICE_ACTION_APPLY 'time'
parameter, which has a 'timeout' in it.

> +        while ( atomic_read(&xsplice_work.semaphore) != total_cpus &&
> +                NOW() < timeout )
> +            cpu_relax();
> +
> +        if ( atomic_read(&xsplice_work.semaphore) == total_cpus )
> +        {
> +            struct payload *data2;

s/data2/data/ ?
> +
> +            /* "Mask" NMIs */
> +            saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> +
> +            /* All CPUs are waiting, now signal to disable IRQs */
> +            xsplice_work.ready = true;
> +            smp_mb();
> +
> +            /* Wait for irqs to be disabled */
> +            while ( atomic_read(&xsplice_work.irq_semaphore) != (total_cpus - 1) )
> +                cpu_relax();
> +
> +            local_irq_disable();
> +            /* Now this function should be the only one on any stack.
> +             * No need to lock the payload list or applied list. */
> +            switch ( xsplice_work.cmd )
> +            {
> +                case XSPLICE_ACTION_APPLY:
> +                        xsplice_work.data->rc = apply_payload(xsplice_work.data);
> +                        if ( xsplice_work.data->rc == 0 )
> +                            xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> +                        break;
> +                case XSPLICE_ACTION_REVERT:
> +                        xsplice_work.data->rc = revert_payload(xsplice_work.data);
> +                        if ( xsplice_work.data->rc == 0 )
> +                            xsplice_work.data->state = XSPLICE_STATE_CHECKED;
> +                        break;
> +                case XSPLICE_ACTION_REPLACE:
> +                        list_for_each_entry ( data2, &payload_list, list )
> +                        {
> +                            if ( data2->state != XSPLICE_STATE_APPLIED )
> +                                continue;
> +
> +                            data2->rc = revert_payload(data2);
> +                            if ( data2->rc == 0 )
> +                                data2->state = XSPLICE_STATE_CHECKED;
> +                            else
> +                            {
> +                                xsplice_work.data->rc = -EINVAL;

Why not copy the error code (from data2->rc)?
> +                                break;
> +                            }
> +                        }
> +                        if ( xsplice_work.data->rc != -EINVAL )

And here you can just check for zero.
> +                        {
> +                            xsplice_work.data->rc = apply_payload(xsplice_work.data);
> +                            if ( xsplice_work.data->rc == 0 )
> +                                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> +                        }
> +                        break;
> +                default:
> +                        xsplice_work.data->rc = -EINVAL;
> +                        break;
> +            }
> +
> +            local_irq_enable();
> +            set_nmi_callback(saved_nmi_callback);
> +        }
> +        else
> +        {
> +            xsplice_work.data->rc = -EBUSY;
> +        }
> +
> +        xsplice_work.do_work = 0;
> +        smp_mb(); /* Synchronize with waiting CPUs */

Missing full stop.
> +    }
> +    else
> +    {
> +        /* Wait for all CPUs to rendezvous */

Missing full stop
> +        while ( xsplice_work.do_work && !xsplice_work.ready )
> +        {
> +            cpu_relax();
> +            smp_mb();
> +        }
> +
> +        /* Disable IRQs and signal */

Missing full stop.
> +        local_irq_disable();
> +        atomic_inc(&xsplice_work.irq_semaphore);
> +
> +        /* Wait for patching to complete */

Missing full stop.
> +        while ( xsplice_work.do_work )
> +        {
> +            cpu_relax();
> +            smp_mb();
> +        }
> +        local_irq_enable();
> +    }
> +}
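
To illustrate the two comments on the REPLACE case above (propagate the real
error code, then test for zero), here is a stand-alone sketch of that flow.
The types are hypothetical stand-ins for struct payload, an array replaces
the payload list, and the revert/apply helpers are stubs whose results are
forced for the example:

```c
#include <stddef.h>

enum sketch_state { SKETCH_CHECKED, SKETCH_APPLIED };

/* Hypothetical, cut-down version of struct payload. */
struct payload_sketch {
    enum sketch_state state;
    int rc;
    int revert_error;   /* Illustration only: forced result of reverting. */
};

/* Stubs standing in for revert_payload() and apply_payload(). */
static int revert_sketch(struct payload_sketch *p) { return p->revert_error; }
static int apply_sketch(struct payload_sketch *p) { (void)p; return 0; }

/* Revert everything that is applied, then apply 'target'. */
static int replace_sketch(struct payload_sketch *list, size_t n,
                          struct payload_sketch *target)
{
    size_t i;
    int rc = 0;

    for ( i = 0; i < n; i++ )
    {
        if ( list[i].state != SKETCH_APPLIED )
            continue;

        list[i].rc = revert_sketch(&list[i]);
        if ( list[i].rc )
        {
            rc = list[i].rc;    /* Keep the real error, not a fixed -EINVAL. */
            break;
        }
        list[i].state = SKETCH_CHECKED;
    }

    if ( rc == 0 )
    {
        rc = apply_sketch(target);
        if ( rc == 0 )
            target->state = SKETCH_APPLIED;
    }

    target->rc = rc;
    return rc;
}
```
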
> +
>  static int __init xsplice_init(void)
>  {
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
> index a60587e..82aff35 100644
> --- a/xen/include/asm-arm/nmi.h
> +++ b/xen/include/asm-arm/nmi.h
> @@ -4,6 +4,19 @@
>  #define register_guest_nmi_callback(a)  (-ENOSYS)
>  #define unregister_guest_nmi_callback() (-ENOSYS)
>  
> +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
> +
> +/**
> + * set_nmi_callback
> + *
> + * Set a handler for an NMI. Only one handler may be
> + * set. Return the old nmi callback handler.
> + */
> +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
> +{
> +    return NULL;
> +}
> +
>  #endif /* ASM_NMI_H */
>  /*
>   * Local variables:
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index a3946a3..507829c 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -11,7 +11,8 @@ struct xsplice_patch_func {
>      unsigned long old_addr;
>      unsigned long old_size;
>      char *name;
> -    unsigned char undo[8];
> +    uint8_t undo[8];
> +    uint8_t pad[56];

This should be in a different patch. As part of the
"xsplice: Implement payload loading"

>  };
>  
>  struct xen_sysctl_xsplice_op;
> @@ -19,6 +20,8 @@ int xsplice_control(struct xen_sysctl_xsplice_op *);
>  
>  extern void xsplice_printall(unsigned char key);
>  
> +void do_xsplice(void);
> +
>  /* Arch hooks */
>  int xsplice_verify_elf(uint8_t *data, ssize_t len);
>  int xsplice_perform_rel(struct xsplice_elf *elf,
> @@ -27,5 +30,7 @@ int xsplice_perform_rel(struct xsplice_elf *elf,
>  int xsplice_perform_rela(struct xsplice_elf *elf,
>                           struct xsplice_elf_sec *base,
>                           struct xsplice_elf_sec *rela);
> +void xsplice_apply_jmp(struct xsplice_patch_func *func);
> +void xsplice_revert_jmp(struct xsplice_patch_func *func);
>  
>  #endif /* __XEN_XSPLICE_H__ */
> -- 
> 2.4.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
  2015-11-05  3:17   ` Konrad Rzeszutek Wilk
@ 2015-11-05  3:19   ` Konrad Rzeszutek Wilk
  2015-11-27 13:51   ` Martin Pohlack
  2 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05  3:19 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Kevin Tian, Ian Campbell, Andrew Cooper, xen-devel, Jan Beulich,
	Stefano Stabellini, Jun Nakajima, Aravind Gopalakrishnan,
	Boris Ostrovsky, Suravee Suthikulpanit

.. snip..
> +void do_xsplice(void)
> +{
.. snip..
> +
> +        xsplice_work.do_work = 0;

= false

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-04 22:21   ` Konrad Rzeszutek Wilk
@ 2015-11-05 10:35     ` Jan Beulich
  2015-11-05 11:51       ` Ross Lagerwall
  2015-11-05 11:15     ` Ross Lagerwall
  1 sibling, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2015-11-05 10:35 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 04.11.15 at 23:21, <konrad.wilk@oracle.com> wrote:
>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>> +                         struct xsplice_elf_sec *base,
>> +                         struct xsplice_elf_sec *rela)
>> +{
>> +    Elf64_Rela *r;
>> +    int symndx, i;
> 
> unsigned int
> 
>> +    uint64_t val;
>> +    uint8_t *dest;
>> +
> 
> > Can you double check that rela->sec->sh_entsize is not zero first?

Perhaps not just not zero, but at least a certain minimum? Or even
equaling some sizeof()?
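
A sketch of the stricter variant, which insists on an exact entry size and on
the section size being a whole number of entries. The struct mirrors the
layout of Elf64_Rela but is declared locally so the example stands alone; the
helper name and the choice of -EINVAL are assumptions:

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of Elf64_Rela: offset, info and addend, 24 bytes in total. */
struct sketch_rela {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t r_addend;
};

/*
 * Validate a relocation section header before walking its entries.
 * On success, store the number of entries in *count and return 0.
 */
static int rela_entry_count(uint64_t sh_size, uint64_t sh_entsize,
                            uint64_t *count)
{
    /* Rejects zero as well as any unexpected entry size. */
    if ( sh_entsize != sizeof(struct sketch_rela) )
        return -EINVAL;

    /* A trailing partial entry means the section is malformed. */
    if ( sh_size % sh_entsize )
        return -EINVAL;

    *count = sh_size / sh_entsize;
    return 0;
}
```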

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 05/11] elf: Add relocation types to elfstructs.h
  2015-11-03 18:16 ` [PATCH v1 05/11] elf: Add relocation types to elfstructs.h Ross Lagerwall
@ 2015-11-05 10:38   ` Jan Beulich
  2015-11-05 11:52     ` Ross Lagerwall
  0 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2015-11-05 10:38 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Tim Deegan, Ian Jackson, Ian Campbell, xen-devel

>>> On 03.11.15 at 19:16, <ross.lagerwall@citrix.com> wrote:
> --- a/xen/include/xen/elfstructs.h
> +++ b/xen/include/xen/elfstructs.h
> @@ -348,6 +348,27 @@ typedef struct {
>  #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>  #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>  
> +/* x86-64 relocation types */
> +#define R_X86_64_NONE		0	/* No reloc */
> +#define R_X86_64_64		1	/* Direct 64 bit  */
> +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
> +#define R_X86_64_GOT32		3	/* 32 bit GOT entry */
> +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
> +#define R_X86_64_COPY		5	/* Copy symbol at runtime */
> +#define R_X86_64_GLOB_DAT	6	/* Create GOT entry */
> +#define R_X86_64_JUMP_SLOT	7	/* Create PLT entry */
> +#define R_X86_64_RELATIVE	8	/* Adjust by program base */
> +#define R_X86_64_GOTPCREL	9	/* 32 bit signed pc relative
> +					   offset to GOT */
> +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
> +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
> +#define R_X86_64_16		12	/* Direct 16 bit zero extended */
> +#define R_X86_64_PC16		13	/* 16 bit sign extended pc relative */
> +#define R_X86_64_8		14	/* Direct 8 bit sign extended  */
> +#define R_X86_64_PC8		15	/* 8 bit sign extended pc relative */
> +
> +#define R_X86_64_NUM		16

Since the set isn't complete anyway - any reason not to drop
everything that's of no relevance to xSplice?

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 01/11] xsplice: Design document (v2).
  2015-11-04 21:10 ` [PATCH v1 01/11] xsplice: Design document (v2) Konrad Rzeszutek Wilk
@ 2015-11-05 10:49   ` Ross Lagerwall
  2015-11-05 19:56     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-05 10:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich, xen-devel

On 11/04/2015 09:10 PM, Konrad Rzeszutek Wilk wrote:
snip
>> +The payload **MUST** contain enough data to allow us to apply the update
>> +and also safely reverse it. As such we **MUST** know:
>> +
>> + * The locations in memory to be patched. This can be determined dynamically
>> +   via symbols or via virtual addresses.
>> + * The new code that will be patched in.
>> + * Signature to verify the payload.
>
> Argh. We need to move the 'Signature to verify' in the 'v2' section
> as I don't think we can get that done in time.

No, not for V1.

>
>> +
>> +This binary format can be constructed using a custom binary format but
>> +there are severe disadvantages to it:
>> +
>> + * The format might need to be changed and we need a mechanism to accommodate
>> +   that.
>> + * It has to be platform agnostic.
>> + * Easily constructed using existing tools.
>> +
>> +As such having the payload in an ELF file is the sensible way. We would be
>> +carrying the various sets of structures (and data) in the ELF sections under
>> +different names and with definitions. The prefix for the ELF section name
>> +would always be: *.xsplice* to match up to the names of the structures.
>> +
>> +Note that every structure has padding. This is added so that the hypervisor
>> +can re-use those fields as it sees fit.
>> +
>> +Earlier design attempted to ineptly explain the relations of the ELF sections
>> +to each other without using proper ELF mechanism (sh_info, sh_link, data
>> +structures using Elf types, etc). This design will explain in detail
>> +the structures and how they are used together and not dig in the ELF
>> +format - except mention that the section names should match the
>> +structure names.
>> +
>> +The xSplice payload is a relocatable ELF binary. A typical binary would have:
>> +
>> + * One or more .text sections
>> + * Zero or more read-only data sections
>> + * Zero or more data sections
>> + * Relocations for each of these sections
>> +
>> +It may also have some architecture-specific sections. For example:
>> +
>> + * Alternatives instructions
>> + * Bug frames
>> + * Exception tables
>> + * Relocations for each of these sections
>> +
>> +The xSplice core code loads the payload as a standard ELF binary, relocates it
>> +and handles the architecture-specific sections as needed. This process is much
>> +like what the Linux kernel module loader does. It contains no xSplice-specific
>> +details and thus will not be discussed further.
>
> What is 'it'? The 'process of what module loader does'?

'It' refers to the process of module loading in the previous sentence.

>
>> +
>> +Importantly, the payload also contains a section with an array of structures
>> +describing the functions to be patched:
>> +<pre>
>> +struct xsplice_patch_func {
>> +    unsigned long new_addr;
>> +    unsigned long new_size;
>> +    unsigned long old_addr;
>> +    unsigned long old_size;
>> +    char *name;
>> +    uint8_t pad[64];
>> +};
>> +</pre>
>
> Uh, so 104 bytes ? Or did you mean to s/64/24/ so the structure is nicely
> padded to 64-bytes?
>
> I think that is what you meant.

OK. I'm not too fussed about exact sizes for V1 anyway, it's likely to 
change at some point.
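
For reference, the arithmetic behind the 104-byte and 64-byte figures can be checked at compile time. The sketch below assumes an LP64 target, where unsigned long and pointers are 8 bytes, and models that with fixed-width fields. The struct names patch_func_v1 and patch_func_padded are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as posted, with 'unsigned long' and 'char *' modelled as 64-bit. */
struct patch_func_v1 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;       /* Stands in for 'char *name' on LP64. */
    uint8_t pad[64];
};

/* Hypothetical variant with pad[24], rounding the structure to 64 bytes. */
struct patch_func_padded {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;
    uint8_t pad[24];
};

static_assert(sizeof(struct patch_func_v1) == 104, "5 * 8 + 64");
static_assert(sizeof(struct patch_func_padded) == 64, "5 * 8 + 24");
```

Either layout works for the loader as long as the tool generating payloads and the hypervisor agree on it.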

-- 
Ross Lagerwall

* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-04 22:21   ` Konrad Rzeszutek Wilk
  2015-11-05 10:35     ` Jan Beulich
@ 2015-11-05 11:15     ` Ross Lagerwall
  2015-11-05 20:12       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-05 11:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, Jan Beulich, xen-devel

On 11/04/2015 10:21 PM, Konrad Rzeszutek Wilk wrote:
snip
>>
>> +
>> +/*
>> + * The following functions prepare an xSplice module to be executed by
>> + * allocating space, loading the allocated sections, resolving symbols,
>> + * performing relocations, etc.
>> + */
>> +#ifdef CONFIG_X86
>> +static void *alloc_module(size_t size)
>
> s/module/payload/

My intention was that all the code which implements the "module loader" 
functionality (and is sort of independent from xSplice) uses the term 
"module" whereas the payload implies the loaded module + the other 
xSplice-specific bits. Your thoughts?

>> +{
>> +    mfn_t *mfn, *mfn_ptr;
>> +    size_t pages, i;
>> +    struct page_info *pg;
>> +    unsigned long hole_start, hole_end, cur;
>> +    struct payload *data, *data2;
>> +
>> +    ASSERT(size);
>> +
>> +    pages = PFN_UP(size);
>> +    mfn = xmalloc_array(mfn_t, pages);
>> +    if ( mfn == NULL )
>> +        return NULL;
>> +
>> +    for ( i = 0; i < pages; i++ )
>> +    {
>> +        pg = alloc_domheap_page(NULL, 0);
>> +        if ( pg == NULL )
>> +            goto error;
>> +        mfn[i] = _mfn(page_to_mfn(pg));
>> +    }
>
> This looks like 'vmalloc'. Why not use that?
> (That explanation should be part of the commit description probably)

vmalloc allocates pages and then maps them to an arbitrary virtual 
address with PAGE_HYPERVISOR. I needed to use a specific virtual address 
with PAGE_HYPERVISOR_RWX.
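
To make the "specific virtual address" part concrete: the loop quoted just below is a first-fit search for a free range inside the module region. A minimal, self-contained sketch of that kind of search follows. The struct range type, the array-based interface and the find_hole name (borrowed from the suggestion further down) are assumptions for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range {
    unsigned long start;   /* Inclusive. */
    unsigned long end;     /* Exclusive. */
};

/*
 * First-fit search: return the lowest address in [lo, hi) at which 'len'
 * bytes fit without overlapping any entry of used[], or 0 if none exists.
 * used[] need not be sorted. Address arithmetic is assumed not to overflow.
 */
unsigned long find_hole(const struct range *used, size_t n,
                        unsigned long lo, unsigned long hi,
                        unsigned long len)
{
    unsigned long start = lo;
    bool moved;

    assert(len != 0 && lo != 0 && lo < hi);

    do {
        moved = false;
        for ( size_t i = 0; i < n; i++ )
        {
            /* Does [start, start + len) overlap used[i]? */
            if ( start + len > used[i].start && start < used[i].end )
            {
                start = used[i].end;   /* Skip past the colliding range. */
                moved = true;
                break;                 /* Rescan from the first entry. */
            }
        }
    } while ( moved );

    return (start + len <= hi) ? start : 0;
}
```

Each collision moves the candidate strictly upwards, so the search terminates after at most one move per existing range.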

>
>> +
>> +    hole_start = (unsigned long)module_virt_start;
>> +    hole_end = hole_start + pages * PAGE_SIZE;
>> +    spin_lock(&payload_list_lock);
>> +    list_for_each_entry ( data, &payload_list, list )
>> +    {
>> +        list_for_each_entry ( data2, &payload_list, list )
>> +        {
>> +            unsigned long start, end;
>> +
>> +            start = (unsigned long)data2->module_address;
>> +            end = start + data2->module_pages * PAGE_SIZE;
>> +            if ( hole_end > start && hole_start < end )
>> +            {
>> +                hole_start = end;
>> +                hole_end = hole_start + pages * PAGE_SIZE;
>> +                break;
>> +            }
>> +        }
>> +        if ( &data2->list == &payload_list )
>> +            break;
>> +    }
>> +    spin_unlock(&payload_list_lock);
>
> This could be made in a nice function. 'find_hole' perhaps?
>
>> +
>> +    if ( hole_end >= module_virt_end )
>> +        goto error;
>> +
>> +    for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE )
>> +    {
>> +        if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) )
>> +        {
>> +            if ( cur != hole_start )
>> +                destroy_xen_mappings(hole_start, cur);
>
> I think 'destroy_xen_mappings' is OK handling hole_start == cur.
>
>> +            goto error;
>> +        }
>> +    }
>> +    xfree(mfn);
>> +    return (void *)hole_start;
>> +
>> + error:
>> +    while ( i-- )
>> +        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
>> +    xfree(mfn);
>> +    return NULL;
>> +}
>> +#else
>> +static void *alloc_module(size_t size)
>
> s/module/payload/
>> +{
>> +    return NULL;
>> +}
>> +#endif
>> +
>> +static void free_module(struct payload *payload)
>> +{
>> +    int i;
>
> unsigned int;
>
>> +    struct page_info *pg;
>> +    PAGE_LIST_HEAD(pg_list);
>> +    void *va = payload->module_address;
>> +    unsigned long addr = (unsigned long)va;
>> +
>> +    if ( !payload->module_address )
>> +        return;
>
> How about 'if ( !addr )
> 		return;
> ?
>
>> +
>> +    payload->module_address = NULL;
>> +
>> +    for ( i = 0; i < payload->module_pages; i++ )
>> +        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
>> +
>> +    destroy_xen_mappings(addr, addr + payload->module_pages * PAGE_SIZE);
>> +
>> +    while ( (pg = page_list_remove_head(&pg_list)) != NULL )
>> +        free_domheap_page(pg);
>> +
>> +    payload->module_pages = 0;
>> +}
>> +
>> +static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)
>
> s/alloc/compute/?
>
>> +{
>> +    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
>> +    sec->sec->sh_entsize = align_size;
>> +    *core_size = sec->sec->sh_size + align_size;
>> +}
>> +
>> +static int move_module(struct payload *payload, struct xsplice_elf *elf)
>> +{
>> +    uint8_t *buf;
>> +    int i;
>
> unsigned int i;
>
>> +    size_t core_size = 0;
>> +
>> +    /* Allocate text regions */
>
> s/Allocate/Compute/
>
>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>> +    {
>> +        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
>> +             (SHF_ALLOC|SHF_EXECINSTR) )
>> +            alloc_section(&elf->sec[i], &core_size);
>> +    }
>> +
>> +    /* Allocate rw data */
>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>> +    {
>> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>> +             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
>> +            alloc_section(&elf->sec[i], &core_size);
>> +    }
>> +
>> +    /* Allocate ro data */
>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>> +    {
>> +        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>> +             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>> +             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
>> +            alloc_section(&elf->sec[i], &core_size);
>> +    }
>> +
>> +    buf = alloc_module(core_size);
>> +    if ( !buf ) {
>> +        printk(XENLOG_ERR "Could not allocate memory for module\n");
>
> (%s: Could not allocate %u memory for payload!\n", elf->name, core_size);
>
>> +        return -ENOMEM;
>> +    }
>> +    memset(buf, 0, core_size);
>
> Perhaps for fun it ought to be 'ud2' ?

There's no point. It either gets memcpy'd over or needs to be set to 
zero for BSS.
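
A small sketch of why zero-filling is enough: sections with file contents are copied over the zeroed buffer, while SHT_NOBITS sections such as .bss have no bytes in the file and simply keep the zeros. The struct sec type and load_sections helper are simplified stand-ins rather than the xsplice_elf structures, and skipping the copy for SHT_NOBITS is an assumption about how .bss would be treated, not something the quoted loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHT_PROGBITS 1   /* Section has contents in the file. */
#define SHT_NOBITS   8   /* Section occupies memory but no file space. */

struct sec {
    uint32_t type;         /* SHT_* */
    size_t offset;         /* Offset assigned within the load buffer. */
    size_t size;
    const uint8_t *data;   /* File contents; unused for SHT_NOBITS. */
};

void load_sections(uint8_t *buf, size_t buf_size,
                   const struct sec *secs, size_t n)
{
    memset(buf, 0, buf_size);   /* Covers .bss and any alignment gaps. */

    for ( size_t i = 0; i < n; i++ )
    {
        /* Every section must lie entirely inside the buffer. */
        assert(secs[i].offset <= buf_size &&
               secs[i].size <= buf_size - secs[i].offset);

        if ( secs[i].type != SHT_NOBITS )
            memcpy(buf + secs[i].offset, secs[i].data, secs[i].size);
    }
}
```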

>
>> +
>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>> +    {
>> +        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
>> +        {
>> +            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
>> +            memcpy(elf->sec[i].load_addr, elf->sec[i].data,
>> +                   elf->sec[i].sec->sh_size);
>> +            printk(XENLOG_DEBUG "Loaded %s at 0x%p\n",
>
> Add %s: at the start ..
>> +                   elf->sec[i].name, elf->sec[i].load_addr);
>
> which would be elf->name.
>
>> +        }
>> +    }
>> +
>> +    payload->module_address = buf;
>> +    payload->module_pages = PFN_UP(core_size);
>
> Instead of module could we name it payload?

See comment above.

>
>> +
>> +    return 0;
>> +}
>> +
>> +static int resolve_symbols(struct xsplice_elf *elf)
>
> s/resolve/check/

No, this is resolving section symbols.

>
>> +{
>> +    int i;
>
> unsigned int;
>
>> +
>> +    for ( i = 1; i < elf->nsym; i++ )
>
> Why 1? Please explain as comment.

The first entry of an ELF symbol table is the "undefined symbol index". 
This code is expected to be read alongside the ELF specification :-)
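
For readers without the specification to hand: index 0 of every ELF symbol table is a reserved entry (STN_UNDEF), so a walk over real symbols starts at 1. The sketch below shows that with a cut-down symbol type. struct sym and count_unresolved are illustrative names; only the STN_UNDEF and SHN_UNDEF values come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STN_UNDEF 0   /* Reserved first entry of every symbol table. */
#define SHN_UNDEF 0   /* Symbol is not defined in this object. */

struct sym {
    uint16_t st_shndx;   /* Index of the section defining the symbol. */
    uint64_t st_value;
};

/* Count symbols that are still undefined, skipping the reserved entry. */
size_t count_unresolved(const struct sym *syms, size_t nsym)
{
    size_t unresolved = 0;

    assert(syms != NULL || nsym == 0);

    /* Entry STN_UNDEF also has st_shndx == SHN_UNDEF; it is not an error. */
    for ( size_t i = STN_UNDEF + 1; i < nsym; i++ )
        if ( syms[i].st_shndx == SHN_UNDEF )
            unresolved++;

    return unresolved;
}
```

Starting the loop at 0 would report the reserved entry as an unknown symbol for every payload.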

>
>
>> +    {
>> +        switch ( elf->sym[i].sym->st_shndx )
>> +        {
>> +            case SHN_COMMON:
>> +                printk(XENLOG_ERR "Unexpected common symbol: %s\n",
>> +                       elf->sym[i].name);
>
> Please also include elf->name in the error.
>
>> +                return -EINVAL;
>> +                break;
>> +            case SHN_UNDEF:
>> +                printk(XENLOG_ERR "Unknown symbol: %s\n", elf->sym[i].name);
>
> Ditto.
>> +                return -ENOENT;
>> +                break;
>> +            case SHN_ABS:
>> +                printk(XENLOG_DEBUG "Absolute symbol: %s => 0x%p\n",
>> +                       elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
>> +                break;
>> +            default:
>> +                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
>> +                {
>> +                    elf->sym[i].sym->st_value +=
>> +                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
>> +                    printk(XENLOG_DEBUG "Symbol resolved: %s => 0x%p\n",
>
> Ditto;
>> +                           elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
>> +                }
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int perform_relocs(struct xsplice_elf *elf)
>> +{
>> +    struct xsplice_elf_sec *rela, *base;
>> +    int i, rc;
>> +
>
> unsigned int i;
>
>> +    for ( i = 0; i < elf->hdr->e_shnum; i++ )
>> +    {
>> +        rela = &elf->sec[i];
>> +
>> +        /* Is it a valid relocation section? */
>> +        if ( rela->sec->sh_info >= elf->hdr->e_shnum )
>> +            continue;
>
> Um, don't we want to mark it as invalid or such?
> Or overwrite it so we won't use it?

The code doesn't use it. I don't understand the concern.

>
>> +
>> +        base = &elf->sec[rela->sec->sh_info];
>> +
>> +        /* Don't relocate non-allocated sections */
>> +        if ( !(base->sec->sh_flags & SHF_ALLOC) )
>> +            continue;
>
>> +
>> +        if ( elf->sec[i].sec->sh_type == SHT_RELA )
>> +            rc = xsplice_perform_rela(elf, base, rela);
>> +        else if ( elf->sec[i].sec->sh_type == SHT_REL )
>> +            rc = xsplice_perform_rel(elf, base, rela);
>> +
>> +        if ( rc )
>> +            return rc;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>> +{
>> +    struct xsplice_elf elf;
>
> Wait a minute? We ditch it after this?
>
>> +    int rc = 0;
>> +
>> +    rc = xsplice_verify_elf(raw, len);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    rc = xsplice_elf_load(&elf, raw, len);
>> +    if ( rc )
>> +        return rc;
>> +
>> +    rc = move_module(payload, &elf);
>> +    if ( rc )
>> +        goto err_elf;
>> +
>> +    rc = resolve_symbols(&elf);
>> +    if ( rc )
>> +        goto err_module;
>> +
>> +    rc = perform_relocs(&elf);
>> +    if ( rc )
>> +        goto err_module;
>> +
>
> Shouldn't you call xsplice_elf_free(&elf) here? Or
> hook up the elf to the 'struct payload'?
>
>
> If not, who is going to clean up elf->sec and elf->sym when the
> payload is unloaded?

Yes, I forgot to free it here.

>> +    return 0;
>> +
>> + err_module:
>> +    free_module(payload);
>> + err_elf:
>> +    xsplice_elf_free(&elf);
>> +
>> +    return rc;
>> +}
>> +
>>   static int __init xsplice_init(void)
>>   {
>>       register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
>> diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
>> index 19ab4d0..e6f08e9 100644
>> --- a/xen/include/asm-x86/x86_64/page.h
>> +++ b/xen/include/asm-x86/x86_64/page.h
>> @@ -38,6 +38,8 @@
>>   #include <xen/pdx.h>
>>
>>   extern unsigned long xen_virt_end;
>> +extern unsigned long module_virt_start;
>> +extern unsigned long module_virt_end;
>>
>>   #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
>>   #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
>> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
>> index 41e28da..a3946a3 100644
>> --- a/xen/include/xen/xsplice.h
>> +++ b/xen/include/xen/xsplice.h
>> @@ -1,9 +1,31 @@
>>   #ifndef __XEN_XSPLICE_H__
>>   #define __XEN_XSPLICE_H__
>>
>> +struct xsplice_elf;
>> +struct xsplice_elf_sec;
>> +struct xsplice_elf_sym;
>> +
>> +struct xsplice_patch_func {
>> +    unsigned long new_addr;
>> +    unsigned long new_size;
>> +    unsigned long old_addr;
>> +    unsigned long old_size;
>> +    char *name;
>> +    unsigned char undo[8];
>> +};
>
> We don't use them in this patch. They could be moved to another patch.

OK.

-- 
Ross Lagerwall

* Re: [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-05  3:17   ` Konrad Rzeszutek Wilk
@ 2015-11-05 11:45     ` Ross Lagerwall
  2015-11-05 20:08       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-05 11:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Kevin Tian, Ian Campbell, Andrew Cooper, xen-devel, Jan Beulich,
	Stefano Stabellini, Jun Nakajima, Aravind Gopalakrishnan,
	Boris Ostrovsky, Suravee Suthikulpanit

On 11/05/2015 03:17 AM, Konrad Rzeszutek Wilk wrote:
snip
>> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
>> index dbff0d5..31e4124 100644
>> --- a/xen/arch/x86/xsplice.c
>> +++ b/xen/arch/x86/xsplice.c
>> @@ -3,6 +3,25 @@
>>   #include <xen/xsplice_elf.h>
>>   #include <xen/xsplice.h>
>>
>> +#define PATCH_INSN_SIZE 5
>> +
>> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
>
> Don't we want for it to be 'int'

Only if an error is expected.

>> +{
>> +    uint32_t val;
>> +    uint8_t *old_ptr;
>> +
>> +    old_ptr = (uint8_t *)func->old_addr;
>> +    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
>
> And perhaps use something which can catch an exception (#GP) so that
> this can error out?

Why would this fail?
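
One constraint worth keeping in mind, though it is not a #GP: the displacement of the 0xe9 jump is a signed 32-bit value, so the replacement function has to be within roughly 2 GiB of the instruction being patched. A sketch of the encoding with that range check added follows. encode_jmp and its buffer-based interface are made up for illustration, and it writes to an ordinary byte array rather than live text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATCH_INSN_SIZE 5   /* 0xe9 opcode plus a 4-byte displacement. */

/*
 * Encode 'jmp new_addr' as it would appear at old_addr into insn[].
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
int encode_jmp(uint8_t insn[PATCH_INSN_SIZE],
               uint64_t old_addr, uint64_t new_addr)
{
    /* The displacement is relative to the end of the jump instruction. */
    int64_t disp = (int64_t)(new_addr - (old_addr + PATCH_INSN_SIZE));
    uint32_t rel32;

    assert(insn != NULL);

    if ( disp < INT32_MIN || disp > INT32_MAX )
        return -1;

    rel32 = (uint32_t)(int32_t)disp;
    insn[0] = 0xe9;   /* Relative jump. */
    /* x86 immediates are little-endian; store byte by byte. */
    for ( unsigned int i = 0; i < 4; i++ )
        insn[1 + i] = (uint8_t)(rel32 >> (8 * i));

    return 0;
}
```

With the module region placed close to the hypervisor text this should always hold, which would make a void return type reasonable.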

>> +    *old_ptr++ = 0xe9; /* Relative jump */
>> +    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
>> +    memcpy(old_ptr, &val, sizeof val);
>> +}
>> +
>> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
>> +{
>> +    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
>> +}
>> +
>>   int xsplice_verify_elf(uint8_t *data, ssize_t len)
>>   {
>>
>> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
>> index 5e88c55..4476be5 100644
>> --- a/xen/common/xsplice.c
>> +++ b/xen/common/xsplice.c
>> @@ -11,16 +11,21 @@
>>   #include <xen/guest_access.h>
>>   #include <xen/stdbool.h>
>>   #include <xen/sched.h>
>> +#include <xen/softirq.h>
>>   #include <xen/lib.h>
>> +#include <xen/wait.h>
>>   #include <xen/xsplice_elf.h>
>>   #include <xen/xsplice.h>
>>   #include <public/sysctl.h>
>>
>>   #include <asm/event.h>
>> +#include <asm/nmi.h>
>>
>>   static DEFINE_SPINLOCK(payload_list_lock);
>>   static LIST_HEAD(payload_list);
>>
>> +static LIST_HEAD(applied_list);
>> +
>>   static unsigned int payload_cnt;
>>   static unsigned int payload_version = 1;
>>
>> @@ -29,15 +34,34 @@ struct payload {
>>       int32_t rc;         /* 0 or -EXX. */
>>
>>       struct list_head   list;   /* Linked to 'payload_list'. */
>> +    struct list_head   applied_list;   /* Linked to 'applied_list'. */
>>
>> +    struct xsplice_patch_func *funcs;
>> +    int nfuncs;
>
> unsigned int;
>
>>       void *module_address;
>>       size_t module_pages;
>>
>>       char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>>   };
>>
>> +/* Defines an outstanding patching action. */
>> +struct xsplice_work
>> +{
>> +    atomic_t semaphore;          /* Used for rendezvous */
>> +    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled */
>> +    struct payload *data;        /* The payload on which to act */
>> +    volatile bool_t do_work;     /* Signals work to do */
>> +    volatile bool_t ready;       /* Signals all CPUs synchronized */
>> +    uint32_t cmd;                /* Action request. XSPLICE_ACTION_* */
>
> Now since you have a pointer to 'data' can't you follow that for the
> cmd? Or at least the 'data->state'?

I moved cmd out of the payload and into xsplice_work since cmd is only 
needed when there is work to do.
data->state contains the current state of the payload (i.e. before the 
action has been performed) so it provides no indication of what command 
needs to be performed.
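
Put differently, the state says where a payload is and the command says where it should go, and it is the pair that decides whether work may be scheduled. A sketch of that check, mirroring the transitions in the xsplice_action() hunk quoted further down. The enum values and the action_allowed helper are illustrative, not the real XSPLICE_* constants.

```c
#include <assert.h>
#include <stdbool.h>

enum state { STATE_CHECKED, STATE_APPLIED };       /* Where the payload is. */
enum cmd { CMD_APPLY, CMD_REVERT, CMD_REPLACE };   /* What was requested. */

/* Return true if 'cmd' may be scheduled for a payload in 'state'. */
bool action_allowed(enum state state, enum cmd cmd)
{
    switch ( cmd )
    {
    case CMD_APPLY:
    case CMD_REPLACE:
        return state == STATE_CHECKED;
    case CMD_REVERT:
        return state == STATE_APPLIED;
    }

    return false;   /* Unknown command. */
}
```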

>
> Missing full stops.
>> +};
>> +
>> +static DEFINE_SPINLOCK(xsplice_work_lock);
>> +/* There can be only one outstanding patching action. */
>> +static struct xsplice_work xsplice_work;
>> +
>>   static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
>>   static void free_module(struct payload *payload);
>> +static int schedule_work(struct payload *data, uint32_t cmd);
>>
>>   static const char *state2str(int32_t state)
>>   {
>> @@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
>>       case XSPLICE_ACTION_REVERT:
>>           if ( data->state == XSPLICE_STATE_APPLIED )
>>           {
>> -            /* No implementation yet. */
>> -            data->state = XSPLICE_STATE_CHECKED;
>> -            data->rc = 0;
>> -            rc = 0;
>> +            data->rc = -EAGAIN;
>> +            rc = schedule_work(data, action->cmd);
>>           }
>>           break;
>>       case XSPLICE_ACTION_APPLY:
>>           if ( (data->state == XSPLICE_STATE_CHECKED) )
>>           {
>> -            /* No implementation yet. */
>> -            data->state = XSPLICE_STATE_APPLIED;
>> -            data->rc = 0;
>> -            rc = 0;
>> +            data->rc = -EAGAIN;
>> +            rc = schedule_work(data, action->cmd);
>>           }
>>           break;
>>       case XSPLICE_ACTION_REPLACE:
>>           if ( data->state == XSPLICE_STATE_CHECKED )
>>           {
>> -            /* No implementation yet. */
>> -            data->state = XSPLICE_STATE_CHECKED;
>> -            data->rc = 0;
>> -            rc = 0;
>> +            data->rc = -EAGAIN;
>> +            rc = schedule_work(data, action->cmd);
>>           }
>>           break;
>>       default:
>> @@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf)
>>       return 0;
>>   }
>>
>> +static int find_special_sections(struct payload *payload,
>> +                                 struct xsplice_elf *elf)
>> +{
>> +    struct xsplice_elf_sec *sec;
>> +
>> +    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
>> +    if ( !sec )
>> +    {
>> +        printk(XENLOG_ERR ".xsplice.funcs is missing\n");
>> +        return -1;
>> +    }
>> +
>> +    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
>> +    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
>> +
>> +    return 0;
>> +}
>
> That looks like it should belong to another patch?

Why? The array of functions is specifically needed for applying a 
payload so the code belongs in this patch.

>> +
>>   static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>>   {
>>       struct xsplice_elf elf;
>> @@ -662,6 +698,10 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>>       if ( rc )
>>           goto err_module;
>>
>> +    rc = find_special_sections(payload, &elf);
>> +    if ( rc )
>> +        goto err_module;
>> +
>
> Ditto?
>>       return 0;
>>
>>    err_module:
>> @@ -672,6 +712,206 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>>       return rc;
>>   }
>>
>> +
>> +/*
>> + * The following functions get the CPUs into an appropriate state and
>> + * apply (or revert) each of the module's functions.
>
> s/module/payload/
>
>> + */
>> +
>> +/*
>> + * This function is executed having all other CPUs with no stack and IRQs
>> + * disabled.
>
> Well, there is some stack. For example from 'cpu_idle' - you have the
> 'cpu_idle' on the stack.
>
>> + */
>> +static int apply_payload(struct payload *data)
>> +{
>> +    int i;
>
> unsigned int
>> +
>> +    printk(XENLOG_DEBUG "Applying payload: %s\n", data->id);
>> +
>> +    for ( i = 0; i < data->nfuncs; i++ )
>> +        xsplice_apply_jmp(data->funcs + i);
>
> And if this returns an error then we could skip adding
> it to the applied_list..

Only if an error is expected.

>> +
>
> Also the patching in Linux seems to do some icache purging.
> Should we use that?

I didn't see that when I looked for it. The alternatives patching in Xen 
doesn't purge the icache (afaict). We need feedback from an x86 
maintainer here.

>
>> +    list_add_tail(&data->applied_list, &applied_list);
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * This function is executed having all other CPUs with no stack and IRQs
>> + * disabled.
>> + */
>> +static int revert_payload(struct payload *data)
>> +{
>> +    int i;
>
> unsigned int i;
>> +
>> +    printk(XENLOG_DEBUG "Reverting payload: %s\n", data->id);
>> +
>> +    for ( i = 0; i < data->nfuncs; i++ )
>> +        xsplice_revert_jmp(data->funcs + i);
>> +
>> +    list_del(&data->applied_list);
>> +
>> +    return 0;
>> +}
>> +
>> +/* Must be holding the payload_list lock */
>
> Missing full stop.
>
> Should that lock be called something else now? (Because it is certainly
> not protecting the list anymore - but also the scheduling action).

Hmm...

>> +static int schedule_work(struct payload *data, uint32_t cmd)
>> +{
>> +    /* Fail if an operation is already scheduled */
>> +    if ( xsplice_work.do_work )
>> +        return -EAGAIN;
>> +
>
>> +    xsplice_work.cmd = cmd;
>> +    xsplice_work.data = data;
>> +    atomic_set(&xsplice_work.semaphore, 0);
>> +    atomic_set(&xsplice_work.irq_semaphore, 0);
>> +    xsplice_work.ready = false;
>> +    smp_mb();
>> +    xsplice_work.do_work = true;
>> +    smp_mb();
>
> So this is your 'GO GO' signal right? I think you may want
> to have 'smp_wmb()'

A full review of the memory barriers and synchronization is needed by 
someone more knowledgeable than me.
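
Until that review happens, the intended ordering can at least be written down. The sketch below expresses the publish/observe pattern with C11 atomics rather than Xen's smp_mb()/smp_wmb(): the fields are written first, the flag is then set with release semantics, and a reader that sees the flag with acquire semantics is guaranteed to see the fields. The struct work layout and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct work {
    uint32_t cmd;          /* Plain field, published via do_work. */
    void *data;            /* Plain field, published via do_work. */
    atomic_bool do_work;   /* The "go" flag. */
};

/*
 * Writer side. Returns false if an operation is already outstanding.
 * Callers are assumed to be serialised by a lock, as schedule_work() is.
 */
bool publish_work(struct work *w, uint32_t cmd, void *data)
{
    if ( atomic_load_explicit(&w->do_work, memory_order_acquire) )
        return false;

    w->cmd = cmd;
    w->data = data;
    /* Release: the stores above become visible before the flag does. */
    atomic_store_explicit(&w->do_work, true, memory_order_release);

    return true;
}

/* Reader side. Returns true once work is visible; outputs are then valid. */
bool poll_work(struct work *w, uint32_t *cmd, void **data)
{
    if ( !atomic_load_explicit(&w->do_work, memory_order_acquire) )
        return false;

    *cmd = w->cmd;
    *data = w->data;

    return true;
}
```

In Xen terms the release store corresponds to a write barrier before setting do_work, and the acquire load to a read barrier after observing it.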

>> +
>> +    return 0;
>> +}
>> +
>
> /me laughs. What a way to 'fix' the NMI watchdog.

It comes directly from the alternatives code.

>
>> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>> +{
>> +    return 1;
>> +}
>> +
>> +static void reschedule_fn(void *unused)
>> +{
>> +    smp_mb(); /* Synchronize with setting do_work */
>> +    raise_softirq(SCHEDULE_SOFTIRQ);
>> +}
>> +
>> +/*
>> + * The main function which manages the work of quiescing the system and
>> + * patching code.
>> + */
>> +void do_xsplice(void)
>> +{
>> +    int id;
>
> unsigned int id;
>> +    unsigned int total_cpus;
>> +    nmi_callback_t saved_nmi_callback;
>> +
>> +    /* Fast path: no work to do */
>
> Missing full stop.
>> +    if ( likely(!xsplice_work.do_work) )
>> +        return;
>> +
>> +    ASSERT(local_irq_is_enabled());
>> +
>> +    spin_lock(&xsplice_work_lock);
>> +    id = atomic_read(&xsplice_work.semaphore);
>> +    atomic_inc(&xsplice_work.semaphore);
>> +    spin_unlock(&xsplice_work_lock);
>
> Could you use 'atomic_inc_and_test' and then you can get
> rid of the spinlock.

OK.
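
One wrinkle: as far as I understand it, atomic_inc_and_test() only reports whether the counter became zero, so the counter would have to start at -1 for the first arrival to be singled out. What the code really wants is the value before the increment, which in C11 terms is a fetch-and-add. The sketch below shows that. The arrive helper is illustrative and uses <stdatomic.h> rather than Xen's atomic_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Register the calling CPU at the rendezvous. Returns its arrival order:
 * 0 for the first CPU (which does the patching), 1 for the next, and so on.
 * The read and the increment are a single atomic step, so no lock is needed.
 */
int arrive(atomic_int *semaphore)
{
    assert(semaphore != NULL);

    return atomic_fetch_add(semaphore, 1);
}
```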

>
>> +
>> +    total_cpus = num_online_cpus();
>
> Which could change across these invocations.. Perhaps
> during these calls we need to lock up CPU up/down code?

OK.

>
>> +
>> +    if ( id == 0 )
>> +    {
>
> Can you just make this its own function? Perhaps call it
> 'xsplice_do_single' or such?
>
>> +        s_time_t timeout, start;
>> +
>> +        /* Trigger other CPUs to execute do_xsplice */
>
> Missing full stop.
>> +        smp_call_function(reschedule_fn, NULL, 0);
>> +
>> +        /* Wait for other CPUs with a timeout */
>
> Missing full stop.
>> +        start = NOW();
>> +        timeout = start + MILLISECS(30);
>
> Nah. That should be gotten from the XSPLICE_ACTION_APPLY 'time'
> parameter - which has an 'timeout' in it.
>
>> +        while ( atomic_read(&xsplice_work.semaphore) != total_cpus &&
>> +                NOW() < timeout )
>> +            cpu_relax();
>> +
>> +        if ( atomic_read(&xsplice_work.semaphore) == total_cpus )
>> +        {
>> +            struct payload *data2;
>
> s/data2/data/ ?
>> +
>> +            /* "Mask" NMIs */
>> +            saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
>> +
>> +            /* All CPUs are waiting, now signal to disable IRQs */
>> +            xsplice_work.ready = true;
>> +            smp_mb();
>> +
>> +            /* Wait for irqs to be disabled */
>> +            while ( atomic_read(&xsplice_work.irq_semaphore) != (total_cpus - 1) )
>> +                cpu_relax();
>> +
>> +            local_irq_disable();
>> +            /* Now this function should be the only one on any stack.
>> +             * No need to lock the payload list or applied list. */
>> +            switch ( xsplice_work.cmd )
>> +            {
>> +                case XSPLICE_ACTION_APPLY:
>> +                        xsplice_work.data->rc = apply_payload(xsplice_work.data);
>> +                        if ( xsplice_work.data->rc == 0 )
>> +                            xsplice_work.data->state = XSPLICE_STATE_APPLIED;
>> +                        break;
>> +                case XSPLICE_ACTION_REVERT:
>> +                        xsplice_work.data->rc = revert_payload(xsplice_work.data);
>> +                        if ( xsplice_work.data->rc == 0 )
>> +                            xsplice_work.data->state = XSPLICE_STATE_CHECKED;
>> +                        break;
>> +                case XSPLICE_ACTION_REPLACE:
>> +                        list_for_each_entry ( data2, &payload_list, list )
>> +                        {
>> +                            if ( data2->state != XSPLICE_STATE_APPLIED )
>> +                                continue;
>> +
>> +                            data2->rc = revert_payload(data2);
>> +                            if ( data2->rc == 0 )
>> +                                data2->state = XSPLICE_STATE_CHECKED;
>> +                            else
>> +                            {
>> +                                xsplice_work.data->rc = -EINVAL;
>
> Why not copy the error code (from data2->rc?)

No. Reverting a different payload updates the error code for that 
payload. The payload to-be-applied has failed because a dependent action 
has failed. This is not the same as the original error. The original 
error is visible through data2->rc.

>> +                                break;
>> +                            }
>> +                        }
>> +                        if ( xsplice_work.data->rc != -EINVAL )
>
> And here you can just check for zero.

No, because xsplice_work.data->rc is originally -EAGAIN (in progress). I 
suppose I could check for xsplice_work.data->rc == -EAGAIN.
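
A sketch of the REPLACE flow with the -EAGAIN check, to show which error ends up where. The types are cut down, apply_payload() is replaced by a stand-in that always succeeds, and the revert_result field is a hypothetical hook for exercising the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum state { STATE_CHECKED, STATE_APPLIED };

struct payload {
    enum state state;
    int rc;              /* 0, -EAGAIN while in progress, or an error. */
    int revert_result;   /* Hypothetical: what reverting this one returns. */
};

/* Revert every applied payload, then apply 'target' if all reverts worked. */
void replace(struct payload *all, size_t n, struct payload *target)
{
    assert(target != NULL);
    target->rc = -EAGAIN;   /* In progress. */

    for ( size_t i = 0; i < n; i++ )
    {
        if ( all[i].state != STATE_APPLIED )
            continue;

        all[i].rc = all[i].revert_result;   /* Original error stays here. */
        if ( all[i].rc != 0 )
        {
            target->rc = -EINVAL;           /* A dependent action failed. */
            break;
        }
        all[i].state = STATE_CHECKED;
    }

    if ( target->rc == -EAGAIN )            /* No revert failed. */
    {
        target->rc = 0;                     /* Stand-in for apply_payload(). */
        target->state = STATE_APPLIED;
    }
}
```

The failing payload keeps its own error code, while the payload being applied only records that something it depended on went wrong.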

>> +                        {
>> +                            xsplice_work.data->rc = apply_payload(xsplice_work.data);
>> +                            if ( xsplice_work.data->rc == 0 )
>> +                                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
>> +                        }
>> +                        break;
>> +                default:
>> +                        xsplice_work.data->rc = -EINVAL;
>> +                        break;
>> +            }
>> +
>> +            local_irq_enable();
>> +            set_nmi_callback(saved_nmi_callback);
>> +        }
>> +        else
>> +        {
>> +            xsplice_work.data->rc = -EBUSY;
>> +        }
>> +
>> +        xsplice_work.do_work = 0;
>> +        smp_mb(); /* Synchronize with waiting CPUs */
>
> Missing full stop.
>> +    }
>> +    else
>> +    {
>> +        /* Wait for all CPUs to rendezvous */
>
> Missing full stop
>> +        while ( xsplice_work.do_work && !xsplice_work.ready )
>> +        {
>> +            cpu_relax();
>> +            smp_mb();
>> +        }
>> +
>> +        /* Disable IRQs and signal */
>
> Missing full stop.
>> +        local_irq_disable();
>> +        atomic_inc(&xsplice_work.irq_semaphore);
>> +
>> +        /* Wait for patching to complete */
>
> Missing full stop.
>> +        while ( xsplice_work.do_work )
>> +        {
>> +            cpu_relax();
>> +            smp_mb();
>> +        }
>> +        local_irq_enable();
>> +    }
>> +}
>> +
>>   static int __init xsplice_init(void)
>>   {
>>       register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
>> diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
>> index a60587e..82aff35 100644
>> --- a/xen/include/asm-arm/nmi.h
>> +++ b/xen/include/asm-arm/nmi.h
>> @@ -4,6 +4,19 @@
>>   #define register_guest_nmi_callback(a)  (-ENOSYS)
>>   #define unregister_guest_nmi_callback() (-ENOSYS)
>>
>> +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
>> +
>> +/**
>> + * set_nmi_callback
>> + *
>> + * Set a handler for an NMI. Only one handler may be
>> + * set. Return the old nmi callback handler.
>> + */
>> +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
>> +{
>> +    return NULL;
>> +}
>> +
>>   #endif /* ASM_NMI_H */
>>   /*
>>    * Local variables:
>> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
>> index a3946a3..507829c 100644
>> --- a/xen/include/xen/xsplice.h
>> +++ b/xen/include/xen/xsplice.h
>> @@ -11,7 +11,8 @@ struct xsplice_patch_func {
>>       unsigned long old_addr;
>>       unsigned long old_size;
>>       char *name;
>> -    unsigned char undo[8];
>> +    uint8_t undo[8];
>> +    uint8_t pad[56];
>
> This should be in a different patch. As part of the
> "xsplice: Implement payload loading"

Oops, that's what I intended.

-- 
Ross Lagerwall

* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-05 10:35     ` Jan Beulich
@ 2015-11-05 11:51       ` Ross Lagerwall
  2015-11-05 12:13         ` Jan Beulich
  0 siblings, 1 reply; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-05 11:51 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

On 11/05/2015 10:35 AM, Jan Beulich wrote:
>>>> On 04.11.15 at 23:21, <konrad.wilk@oracle.com> wrote:
>>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>>> +                         struct xsplice_elf_sec *base,
>>> +                         struct xsplice_elf_sec *rela)
>>> +{
>>> +    Elf64_Rela *r;
>>> +    int symndx, i;
>>
>> unsigned int
>>
>>> +    uint64_t val;
>>> +    uint8_t *dest;
>>> +
>>
>> Can you double check that rela->sec->sh_entsize is not zero first?
>
> Perhaps not just not zero, but at least a certain minimum? Or even
> equaling some sizeof()?
>

Well, it only makes sense if rela->sec->sh_entsize == sizeof(Elf64_Rela),
so that is what I shall check for.
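
A sketch of such a check. The Elf64_Rela layout is the standard one (three 64-bit fields). struct shdr is a cut-down section header and rela_section_ok is a hypothetical helper name. The divisibility test on sh_size is an extra assumption beyond what was discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t r_offset;   /* Location to apply the relocation to. */
    uint64_t r_info;     /* Symbol index and relocation type. */
    int64_t  r_addend;   /* Constant addend. */
} Elf64_Rela;

struct shdr {
    uint64_t sh_size;      /* Total size of the section in bytes. */
    uint64_t sh_entsize;   /* Size of one table entry. */
};

/* A RELA section is usable only if it holds whole Elf64_Rela entries. */
bool rela_section_ok(const struct shdr *s)
{
    assert(s != NULL);

    if ( s->sh_entsize != sizeof(Elf64_Rela) )
        return false;   /* Also rejects 0, so the modulo below is safe. */

    return s->sh_size % s->sh_entsize == 0;
}
```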

-- 
Ross Lagerwall

* Re: [PATCH v1 05/11] elf: Add relocation types to elfstructs.h
  2015-11-05 10:38   ` Jan Beulich
@ 2015-11-05 11:52     ` Ross Lagerwall
  0 siblings, 0 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-05 11:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tim Deegan, Ian Jackson, Ian Campbell, xen-devel

On 11/05/2015 10:38 AM, Jan Beulich wrote:
>>>> On 03.11.15 at 19:16, <ross.lagerwall@citrix.com> wrote:
>> --- a/xen/include/xen/elfstructs.h
>> +++ b/xen/include/xen/elfstructs.h
>> @@ -348,6 +348,27 @@ typedef struct {
>>   #define	ELF64_R_TYPE(info)	((info) & 0xFFFFFFFF)
>>   #define ELF64_R_INFO(s,t) 	(((s) << 32) + (u_int32_t)(t))
>>
>> +/* x86-64 relocation types */
>> +#define R_X86_64_NONE		0	/* No reloc */
>> +#define R_X86_64_64		1	/* Direct 64 bit  */
>> +#define R_X86_64_PC32		2	/* PC relative 32 bit signed */
>> +#define R_X86_64_GOT32		3	/* 32 bit GOT entry */
>> +#define R_X86_64_PLT32		4	/* 32 bit PLT address */
>> +#define R_X86_64_COPY		5	/* Copy symbol at runtime */
>> +#define R_X86_64_GLOB_DAT	6	/* Create GOT entry */
>> +#define R_X86_64_JUMP_SLOT	7	/* Create PLT entry */
>> +#define R_X86_64_RELATIVE	8	/* Adjust by program base */
>> +#define R_X86_64_GOTPCREL	9	/* 32 bit signed pc relative
>> +					   offset to GOT */
>> +#define R_X86_64_32		10	/* Direct 32 bit zero extended */
>> +#define R_X86_64_32S		11	/* Direct 32 bit sign extended */
>> +#define R_X86_64_16		12	/* Direct 16 bit zero extended */
>> +#define R_X86_64_PC16		13	/* 16 bit sign extended pc relative */
>> +#define R_X86_64_8		14	/* Direct 8 bit sign extended  */
>> +#define R_X86_64_PC8		15	/* 8 bit sign extended pc relative */
>> +
>> +#define R_X86_64_NUM		16
>
> Since the set isn't complete anyway - any reason not to drop
> everything that's of no relevance to xSplice?
>

I copied these definitions from Linux (wrongly) assuming that they were 
complete. I shall remove the unused ones.

-- 
Ross Lagerwall


* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-05 11:51       ` Ross Lagerwall
@ 2015-11-05 12:13         ` Jan Beulich
  0 siblings, 0 replies; 40+ messages in thread
From: Jan Beulich @ 2015-11-05 12:13 UTC (permalink / raw)
  To: Ross Lagerwall, Konrad Rzeszutek Wilk
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, xen-devel

>>> On 05.11.15 at 12:51, <ross.lagerwall@citrix.com> wrote:
> On 11/05/2015 10:35 AM, Jan Beulich wrote:
>>>>> On 04.11.15 at 23:21, <konrad.wilk@oracle.com> wrote:
>>>> +int xsplice_perform_rela(struct xsplice_elf *elf,
>>>> +                         struct xsplice_elf_sec *base,
>>>> +                         struct xsplice_elf_sec *rela)
>>>> +{
>>>> +    Elf64_Rela *r;
>>>> +    int symndx, i;
>>>
>>> unsigned int
>>>
>>>> +    uint64_t val;
>>>> +    uint8_t *dest;
>>>> +
>>>
>>> Can you double check that rela->sec->sh_entsize is not zero first?
>>
>> Perhaps not just not zero, but at least a certain minimum? Or even
>> equaling some sizeof()?
>>
> 
> Well it only makes sense if rela->sec->sh_entsize == sizeof(Elf64_Rela)
> so that is what I shall check for.

The question whether to use == or >= really depends on whether
we expect (theoretical) additions to the structure to be backwards
compatible.
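
To make the two options concrete, here is a minimal sketch rather than the
patch's actual code. struct rela_entry is a stand-in for Elf64_Rela so that
the fragment compiles on its own, and the helper names are hypothetical.
Both policies reject a zero sh_entsize, so the later division is safe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Elf64_Rela so the sketch compiles on its own. */
struct rela_entry {
    uint64_t r_offset;
    uint64_t r_info;
    int64_t  r_addend;
};

/* Strict policy: every entry must be exactly one rela_entry. */
static bool rela_entsize_ok_strict(uint64_t sh_entsize)
{
    return sh_entsize == sizeof(struct rela_entry);
}

/* Lenient policy: entries may grow, provided the known fields still fit. */
static bool rela_entsize_ok_lenient(uint64_t sh_entsize)
{
    return sh_entsize >= sizeof(struct rela_entry);
}

/*
 * Number of entries in a section of sh_size bytes, or 0 if the section
 * cannot be walked safely (bad entry size, or a trailing partial entry).
 */
static uint64_t rela_count(uint64_t sh_size, uint64_t sh_entsize, bool strict)
{
    bool ok = strict ? rela_entsize_ok_strict(sh_entsize)
                     : rela_entsize_ok_lenient(sh_entsize);

    if ( !ok || sh_size % sh_entsize )
        return 0;

    return sh_size / sh_entsize;
}
```

With the lenient policy the loop over the section would also have to step by
sh_entsize rather than by sizeof(), which is the real cost of allowing >=.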

Jan


* Re: [PATCH v1 09/11] xsplice: Add support for bug frames
  2015-11-03 18:16 ` [PATCH v1 09/11] xsplice: Add support for bug frames Ross Lagerwall
@ 2015-11-05 19:43   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 19:43 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:16:06PM +0000, Ross Lagerwall wrote:
> Add support for handling bug frames contained with xsplice modules. If a
> trap occurs search either the kernel bug table or an applied module's

payload.
> bug table depending on the instruction pointer.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/x86/traps.c      |  30 ++++++++-----
>  xen/common/symbols.c      |   7 +++
>  xen/common/xsplice.c      | 107 +++++++++++++++++++++++++++++++++++++++++-----
>  xen/include/xen/kernel.h  |   1 +
>  xen/include/xen/xsplice.h |   4 ++
>  5 files changed, 129 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index b32f696..cd51cfd 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -48,6 +48,7 @@
>  #include <xen/kexec.h>
>  #include <xen/trace.h>
>  #include <xen/paging.h>
> +#include <xen/xsplice.h>
>  #include <xen/watchdog.h>
>  #include <asm/system.h>
>  #include <asm/io.h>
> @@ -1076,20 +1077,29 @@ void do_invalid_op(struct cpu_user_regs *regs)
>          return;
>      }
>  
> -    if ( !is_active_kernel_text(regs->eip) ||
> +    if ( !is_active_text(regs->eip) ||
>           __copy_from_user(bug_insn, eip, sizeof(bug_insn)) ||
>           memcmp(bug_insn, "\xf\xb", sizeof(bug_insn)) )
>          goto die;
>  
> -    for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
> +    if ( likely(is_active_kernel_text(regs->eip)) )
>      {
> -        while ( unlikely(bug == stop_frames[id]) )
> -            ++id;
> -        if ( bug_loc(bug) == eip )
> -            break;
> +        for ( bug = __start_bug_frames, id = 0; stop_frames[id]; ++bug )
> +        {
> +            while ( unlikely(bug == stop_frames[id]) )
> +                ++id;
> +            if ( bug_loc(bug) == eip )
> +                break;
> +        }
> +        if ( !stop_frames[id] )
> +            goto die;
> +    }
> +    else
> +    {
> +        bug = xsplice_find_bug(eip, &id);
> +        if ( !bug )
> +            goto die;
>      }
> -    if ( !stop_frames[id] )
> -        goto die;
>  
>      eip += sizeof(bug_insn);
>      if ( id == BUGFRAME_run_fn )
> @@ -1103,7 +1113,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
>  
>      /* WARN, BUG or ASSERT: decode the filename pointer and line number. */
>      filename = bug_ptr(bug);
> -    if ( !is_kernel(filename) )
> +    if ( !is_kernel(filename) && !is_module(filename) )
>          goto die;
>      fixup = strlen(filename);
>      if ( fixup > 50 )
> @@ -1130,7 +1140,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
>      case BUGFRAME_assert:
>          /* ASSERT: decode the predicate string pointer. */
>          predicate = bug_msg(bug);
> -        if ( !is_kernel(predicate) )
> +        if ( !is_kernel(predicate) && !is_module(predicate) )

is_payload ?
>              predicate = "<unknown>";
>  
>          printk("Assertion '%s' failed at %s%s:%d\n",
> diff --git a/xen/common/symbols.c b/xen/common/symbols.c
> index a59c59d..bf5623f 100644
> --- a/xen/common/symbols.c
> +++ b/xen/common/symbols.c
> @@ -17,6 +17,7 @@
>  #include <xen/lib.h>
>  #include <xen/string.h>
>  #include <xen/spinlock.h>
> +#include <xen/xsplice.h>
>  #include <public/platform.h>
>  #include <xen/guest_access.h>
>  
> @@ -101,6 +102,12 @@ bool_t is_active_kernel_text(unsigned long addr)
>              (system_state < SYS_STATE_active && is_kernel_inittext(addr)));
>  }
>  
> +bool_t is_active_text(unsigned long addr)
> +{
> +    return is_active_kernel_text(addr) ||
> +           is_active_module_text(addr);

Ditto?
> +}
> +
>  const char *symbols_lookup(unsigned long addr,
>                             unsigned long *symbolsize,
>                             unsigned long *offset,
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 4476be5..982954b 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -40,6 +40,11 @@ struct payload {
>      int nfuncs;
>      void *module_address;
>      size_t module_pages;
> +    size_t core_size;
> +    size_t core_text_size;
> +
> +    struct bug_frame *start_bug_frames[4];
> +    struct bug_frame *stop_bug_frames[4];

You need a #define for the 4 value.
>  
>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>  };
> @@ -525,26 +530,27 @@ static void free_module(struct payload *payload)
>      payload->module_pages = 0;
>  }
>  
> -static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)
> +static void alloc_section(struct xsplice_elf_sec *sec, size_t *size)
>  {
> -    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
> +    size_t align_size = ROUNDUP(*size, sec->sec->sh_addralign);
>      sec->sec->sh_entsize = align_size;
> -    *core_size = sec->sec->sh_size + align_size;
> +    *size = sec->sec->sh_size + align_size;
>  }

That looks to be unrelated to this patch. Perhaps squash it in earlier?

>  
>  static int move_module(struct payload *payload, struct xsplice_elf *elf)
>  {
>      uint8_t *buf;
>      int i;
> -    size_t core_size = 0;
> +    size_t size = 0;
>  
>      /* Allocate text regions */
>      for ( i = 0; i < elf->hdr->e_shnum; i++ )
>      {
>          if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
>               (SHF_ALLOC|SHF_EXECINSTR) )
> -            alloc_section(&elf->sec[i], &core_size);
> +            alloc_section(&elf->sec[i], &size);
>      }
> +    payload->core_text_size = size;
>  
>      /* Allocate rw data */
>      for ( i = 0; i < elf->hdr->e_shnum; i++ )
> @@ -552,7 +558,7 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
>          if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>               !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>               (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> -            alloc_section(&elf->sec[i], &core_size);
> +            alloc_section(&elf->sec[i], &size);
>      }
>  
>      /* Allocate ro data */
> @@ -561,15 +567,16 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
>          if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
>               !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
>               !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> -            alloc_section(&elf->sec[i], &core_size);
> +            alloc_section(&elf->sec[i], &size);
>      }
> +    payload->core_size = size;
>  
> -    buf = alloc_module(core_size);
> +    buf = alloc_module(size);
>      if ( !buf ) {
>          printk(XENLOG_ERR "Could not allocate memory for module\n");
>          return -ENOMEM;
>      }
> -    memset(buf, 0, core_size);
> +    memset(buf, 0, size);
>  
>      for ( i = 0; i < elf->hdr->e_shnum; i++ )
>      {
> @@ -584,7 +591,7 @@ static int move_module(struct payload *payload, struct xsplice_elf *elf)
>      }
>  
>      payload->module_address = buf;
> -    payload->module_pages = PFN_UP(core_size);
> +    payload->module_pages = PFN_UP(size);
>  
>      return 0;
>  }
> @@ -659,6 +666,7 @@ static int find_special_sections(struct payload *payload,
>                                   struct xsplice_elf *elf)
>  {
>      struct xsplice_elf_sec *sec;
> +    int i;

unsigned int 
>  
>      sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
>      if ( !sec )
> @@ -670,6 +678,19 @@ static int find_special_sections(struct payload *payload,
>      payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
>      payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
>  
> +    for ( i = 0; i < 4; i++ )

use the #define.
> +    {
> +        char str[14];

And this needs a #define as well.
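
Something along these lines would cover both constants. This is only a
sketch: XSPLICE_BUGFRAME_NR, XSPLICE_BUGFRAME_SEC_LEN and bugframe_sec_name()
are made-up names for illustration, not identifiers from the patch or from
Xen. Deriving the buffer size from the longest section name keeps the 14 from
drifting if the prefix ever changes.

```c
#include <stdio.h>

/* Hypothetical names, not taken from the patch or from Xen. */
#define XSPLICE_BUGFRAME_NR 4
/* ".bug_frames." (12 chars) + one digit + NUL = 14 bytes. */
#define XSPLICE_BUGFRAME_SEC_LEN sizeof(".bug_frames.0")

/* Build the section name for bug frame type idx; returns 0 on success. */
static int bugframe_sec_name(char *buf, size_t len, unsigned int idx)
{
    int n;

    if ( idx >= XSPLICE_BUGFRAME_NR )
        return -1;

    n = snprintf(buf, len, ".bug_frames.%u", idx);

    /* Treat truncation as an error instead of looking up a wrong name. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The single-digit assumption only holds while the count stays below 10.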
> +
> +        snprintf(str, sizeof str, ".bug_frames.%d", i);
> +        sec = xsplice_elf_sec_by_name(elf, str);
> +        if ( !sec )
> +            continue;
> +
> +        payload->start_bug_frames[i] = (struct bug_frame *)sec->load_addr;
> +        payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);

You probably should double check that the 'sh_size' is not greater than
4. 
> +    }
> +
>      return 0;
>  }
>  
> @@ -912,6 +933,72 @@ void do_xsplice(void)
>      }
>  }
>  
> +
> +/*
> + * Functions for handling special sections.
> + */
> +struct bug_frame *xsplice_find_bug(const char *eip, int *id)
> +{
> +    struct payload *data;
> +    struct bug_frame *bug;
> +    int i;

unsigned int;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */

Please use the other comment style:

/*
 * blah blah blah
 */

> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        for (i = 0; i < 4; i++) {
> +            if (!data->start_bug_frames[i])
> +                continue;
> +            if ( !((void *)eip >= data->module_address &&
> +                   (void *)eip < (data->module_address + data->core_text_size)))
> +                continue;
> +
> +            for ( bug = data->start_bug_frames[i]; bug != data->stop_bug_frames[i]; ++bug ) {

Could we ever have a situation where there is a NULL structure
between start_bug_frames and stop_bug_frames?

[Say the file is corrupted]
Perhaps we should double-check that in find_special_sections?
> +                if ( bug_loc(bug) == eip )
> +                {
> +                    *id = i;
> +                    return bug;
> +                }
> +            }
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +bool_t is_module(const void *ptr)
> +{
> +    struct payload *data;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( ptr >= data->module_address &&
> +             ptr < (data->module_address + data->core_size))
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +bool_t is_active_module_text(unsigned long addr)
> +{
> +    struct payload *data;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( (void *)addr >= data->module_address &&
> +             (void *)addr < (data->module_address + data->core_text_size))
> +            return true;
> +    }
> +
> +    return false;
> +}
> +

Those two look very very much the same. Could one of them call the
other?

Actually why not just have one? And do some casting?
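
Roughly what I have in mind, as a sketch only. struct payload_range is a
cut-down stand-in for struct payload, in_range() is a hypothetical helper,
and the walk over applied_list is left out. It assumes the text sits at the
start of the region, which is what move_module() arranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the patch's struct payload. */
struct payload_range {
    uintptr_t start;          /* module_address */
    size_t core_size;         /* text + data */
    size_t core_text_size;    /* text only, at the start of the region */
};

/* Hypothetical helper: does addr fall in [start, start + len)? */
static bool in_range(const struct payload_range *p, uintptr_t addr, size_t len)
{
    return addr >= p->start && addr - p->start < len;
}

/* Equivalent of is_module() for one payload. */
static bool payload_contains(const struct payload_range *p, const void *ptr)
{
    return in_range(p, (uintptr_t)ptr, p->core_size);
}

/* Equivalent of is_active_module_text() for one payload. */
static bool payload_contains_text(const struct payload_range *p,
                                  unsigned long addr)
{
    return in_range(p, addr, p->core_text_size);
}
```

The two public functions then differ only in which length they pass, and the
same helper could replace the open-coded range checks in xsplice_find_bug().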


>  static int __init xsplice_init(void)
>  {
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> diff --git a/xen/include/xen/kernel.h b/xen/include/xen/kernel.h
> index 548b64d..df57754 100644
> --- a/xen/include/xen/kernel.h
> +++ b/xen/include/xen/kernel.h
> @@ -99,6 +99,7 @@ extern enum system_state {
>  } system_state;
>  
>  bool_t is_active_kernel_text(unsigned long addr);
> +bool_t is_active_text(unsigned long addr);
>  
>  #endif /* _LINUX_KERNEL_H */
>  
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index 507829c..772fa3a 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -22,6 +22,10 @@ extern void xsplice_printall(unsigned char key);
>  
>  void do_xsplice(void);
>  
> +struct bug_frame * xsplice_find_bug(const char *eip, int *id);
> +bool_t is_module(const void *addr);
> +bool_t is_active_module_text(unsigned long addr);
> +
>  /* Arch hooks */
>  int xsplice_verify_elf(uint8_t *data, ssize_t len);
>  int xsplice_perform_rel(struct xsplice_elf *elf,
> -- 
> 2.4.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH v1 10/11] xsplice: Add support for exception tables
  2015-11-03 18:16 ` [PATCH v1 10/11] xsplice: Add support for exception tables Ross Lagerwall
@ 2015-11-05 19:47   ` Konrad Rzeszutek Wilk
  2015-11-27 16:28   ` Martin Pohlack
  1 sibling, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 19:47 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:16:07PM +0000, Ross Lagerwall wrote:
> Add support for exception tables contained within xsplice modules. If an
> exception occurs search either the main exception table or a particular
> active module's exception table depending on the instruction pointer.

s/module/payload/
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/x86/extable.c        | 36 ++++++++++++++++++++++--------------
>  xen/common/xsplice.c          | 41 +++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/uaccess.h |  5 +++++
>  xen/include/xen/xsplice.h     |  2 ++
>  4 files changed, 70 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
> index 89b5bcb..2787a92 100644
> --- a/xen/arch/x86/extable.c
> +++ b/xen/arch/x86/extable.c
> @@ -4,6 +4,7 @@
>  #include <xen/perfc.h>
>  #include <xen/sort.h>
>  #include <xen/spinlock.h>
> +#include <xen/xsplice.h>
>  #include <asm/uaccess.h>
>  
>  #define EX_FIELD(ptr, field) ((unsigned long)&(ptr)->field + (ptr)->field)
> @@ -18,7 +19,7 @@ static inline unsigned long ex_cont(const struct exception_table_entry *x)
>  	return EX_FIELD(x, cont);
>  }
>  
> -static int __init cmp_ex(const void *a, const void *b)
> +static int cmp_ex(const void *a, const void *b)
>  {
>  	const struct exception_table_entry *l = a, *r = b;
>  	unsigned long lip = ex_addr(l);
> @@ -33,7 +34,7 @@ static int __init cmp_ex(const void *a, const void *b)
>  }
>  
>  #ifndef swap_ex
> -static void __init swap_ex(void *a, void *b, int size)
> +static void swap_ex(void *a, void *b, int size)
>  {
>  	struct exception_table_entry *l = a, *r = b, tmp;
>  	long delta = b - a;
> @@ -46,19 +47,23 @@ static void __init swap_ex(void *a, void *b, int size)
>  }
>  #endif
>  
> -void __init sort_exception_tables(void)
> +void sort_exception_table(struct exception_table_entry *start,
> +                          struct exception_table_entry *stop)
>  {
> -    sort(__start___ex_table, __stop___ex_table - __start___ex_table,
> -         sizeof(struct exception_table_entry), cmp_ex, swap_ex);
> -    sort(__start___pre_ex_table,
> -         __stop___pre_ex_table - __start___pre_ex_table,
> +    sort(start, stop - start,
>           sizeof(struct exception_table_entry), cmp_ex, swap_ex);
>  }
>  
> -static inline unsigned long
> -search_one_table(const struct exception_table_entry *first,
> -                 const struct exception_table_entry *last,
> -                 unsigned long value)
> +void __init sort_exception_tables(void)
> +{
> +    sort_exception_table(__start___ex_table, __stop___ex_table);
> +    sort_exception_table(__start___pre_ex_table, __stop___pre_ex_table);
> +}
> +
> +unsigned long
> +search_one_extable(const struct exception_table_entry *first,
> +                   const struct exception_table_entry *last,
> +                   unsigned long value)
>  {
>      const struct exception_table_entry *mid;
>      long diff;
> @@ -80,15 +85,18 @@ search_one_table(const struct exception_table_entry *first,
>  unsigned long
>  search_exception_table(unsigned long addr)
>  {
> -    return search_one_table(
> -        __start___ex_table, __stop___ex_table-1, addr);
> +    if ( likely(is_kernel(addr)) )
> +        return search_one_extable(
> +            __start___ex_table, __stop___ex_table-1, addr);
> +    else
> +        return search_module_extables(addr);
>  }
>  
>  unsigned long
>  search_pre_exception_table(struct cpu_user_regs *regs)
>  {
>      unsigned long addr = (unsigned long)regs->eip;
> -    unsigned long fixup = search_one_table(
> +    unsigned long fixup = search_one_extable(
>          __start___pre_ex_table, __stop___pre_ex_table-1, addr);
>      if ( fixup )
>      {
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 982954b..c5a403b 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -45,6 +45,10 @@ struct payload {
>  
>      struct bug_frame *start_bug_frames[4];
>      struct bug_frame *stop_bug_frames[4];
> +#ifdef CONFIG_X86
> +    struct exception_table_entry *start_ex_table;
> +    struct exception_table_entry *stop_ex_table;
> +#endif
>  
>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>  };
> @@ -691,6 +695,17 @@ static int find_special_sections(struct payload *payload,
>          payload->stop_bug_frames[i] = (struct bug_frame *)(sec->load_addr + sec->sec->sh_size);
>      }
>  
> +#ifdef CONFIG_X86
> +    sec = xsplice_elf_sec_by_name(elf, ".ex_table");
> +    if ( sec )
> +    {
> +        payload->start_ex_table = (struct exception_table_entry *)sec->load_addr;
> +        payload->stop_ex_table = (struct exception_table_entry *)(sec->load_addr + sec->sec->sh_size);

Please double-check that sh_size is not some funny value. For example it
may be 0, or worse, not aligned to the size of 'struct
exception_table_entry'!
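
A check of roughly this shape would do. It is a sketch under assumptions:
sec_size_valid() is a hypothetical helper, and struct extable_entry is a
stand-in (two 32-bit relative offsets) used only to give the example an
entry size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct exception_table_entry; layout is an assumption. */
struct extable_entry {
    int32_t addr;
    int32_t cont;
};

/* Hypothetical helper: is a section a non-empty, whole number of entries? */
static bool sec_size_valid(uint64_t sh_size, size_t entry_size)
{
    return entry_size != 0 && sh_size != 0 && sh_size % entry_size == 0;
}

/* Example use for an exception table section of sh_size bytes. */
static bool extable_sec_valid(uint64_t sh_size)
{
    return sec_size_valid(sh_size, sizeof(struct extable_entry));
}
```

find_special_sections() could then bail out with an error before setting
start_ex_table and stop_ex_table or sorting anything.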

> +
> +        sort_exception_table(payload->start_ex_table, payload->stop_ex_table);
> +    }
> +#endif
> +
>      return 0;
>  }
>  
> @@ -999,6 +1014,32 @@ bool_t is_active_module_text(unsigned long addr)
>      return false;
>  }
>  
> +#ifdef CONFIG_X86
> +unsigned long search_module_extables(unsigned long addr)
> +{
> +    struct payload *data;
> +    unsigned long ret;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */
> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( !data->start_ex_table )
> +            continue;
> +        if ( !((void *)addr >= data->module_address &&
> +               (void *)addr < (data->module_address + data->core_text_size)))

The last ')' needs to have space before it.

> +            continue;
> +
> +        ret = search_one_extable(data->start_ex_table, data->stop_ex_table - 1,
> +                                 addr);
> +        if ( ret )
> +            return ret;
> +    }
> +
> +    return 0;
> +}
> +#endif
> +
>  static int __init xsplice_init(void)
>  {
>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> diff --git a/xen/include/asm-x86/uaccess.h b/xen/include/asm-x86/uaccess.h
> index 947470d..9e67bf0 100644
> --- a/xen/include/asm-x86/uaccess.h
> +++ b/xen/include/asm-x86/uaccess.h
> @@ -276,6 +276,11 @@ extern struct exception_table_entry __start___pre_ex_table[];
>  extern struct exception_table_entry __stop___pre_ex_table[];
>  
>  extern unsigned long search_exception_table(unsigned long);
> +extern unsigned long search_one_extable(const struct exception_table_entry *first,
> +                                        const struct exception_table_entry *last,
> +                                        unsigned long value);
>  extern void sort_exception_tables(void);
> +extern void sort_exception_table(struct exception_table_entry *start,
> +                                 struct exception_table_entry *stop);
>  
>  #endif /* __X86_UACCESS_H__ */
> diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> index 772fa3a..485eb08 100644
> --- a/xen/include/xen/xsplice.h
> +++ b/xen/include/xen/xsplice.h
> @@ -26,6 +26,8 @@ struct bug_frame * xsplice_find_bug(const char *eip, int *id);
>  bool_t is_module(const void *addr);
>  bool_t is_active_module_text(unsigned long addr);
>  
> +unsigned long search_module_extables(unsigned long addr);
> +
>  /* Arch hooks */
>  int xsplice_verify_elf(uint8_t *data, ssize_t len);
>  int xsplice_perform_rel(struct xsplice_elf *elf,
> -- 
> 2.4.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH v1 11/11] xsplice: Add support for alternatives
  2015-11-03 18:16 ` [PATCH v1 11/11] xsplice: Add support for alternatives Ross Lagerwall
@ 2015-11-05 19:48   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 19:48 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Tue, Nov 03, 2015 at 06:16:08PM +0000, Ross Lagerwall wrote:
> Add support for applying alternative sections within xsplice modules. At
> module load time, apply any alternative sections that are found.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/x86/Makefile             |  2 +-
>  xen/arch/x86/alternative.c        | 12 ++++++------
>  xen/common/xsplice.c              | 11 +++++++++++
>  xen/include/asm-x86/alternative.h |  1 +
>  4 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index 6e05532..5dbe2e8 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -7,7 +7,7 @@ subdir-y += oprofile
>  
>  subdir-$(x86_64) += x86_64
>  
> -obj-bin-y += alternative.init.o
> +obj-bin-y += alternative.o
>  obj-y += apic.o
>  obj-y += bitops.o
>  obj-bin-y += bzimage.init.o
> diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
> index 46ac0fd..8d895ad 100644
> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -28,7 +28,7 @@
>  extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
>  
>  #ifdef K8_NOP1
> -static const unsigned char k8nops[] __initconst = {
> +static const unsigned char k8nops[] = {
>      K8_NOP1,
>      K8_NOP2,
>      K8_NOP3,
> @@ -52,7 +52,7 @@ static const unsigned char * const k8_nops[ASM_NOP_MAX+1] = {
>  #endif
>  
>  #ifdef P6_NOP1
> -static const unsigned char p6nops[] __initconst = {
> +static const unsigned char p6nops[] = {
>      P6_NOP1,
>      P6_NOP2,
>      P6_NOP3,
> @@ -75,7 +75,7 @@ static const unsigned char * const p6_nops[ASM_NOP_MAX+1] = {
>  };
>  #endif
>  
> -static const unsigned char * const *ideal_nops __initdata = k8_nops;
> +static const unsigned char * const *ideal_nops = k8_nops;
>  
>  static int __init mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>  {
> @@ -100,7 +100,7 @@ static void __init arch_init_ideal_nops(void)
>  }
>  
>  /* Use this to add nops to a buffer, then text_poke the whole buffer. */
> -static void __init add_nops(void *insns, unsigned int len)
> +static void add_nops(void *insns, unsigned int len)
>  {
>      while ( len > 0 )
>      {
> @@ -127,7 +127,7 @@ static void __init add_nops(void *insns, unsigned int len)
>   *
>   * This routine is called with local interrupt disabled.
>   */
> -static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
> +static void *text_poke_early(void *addr, const void *opcode, size_t len)
>  {
>      memcpy(addr, opcode, len);
>      sync_core();
> @@ -142,7 +142,7 @@ static void *__init text_poke_early(void *addr, const void *opcode, size_t len)
>   * APs have less capabilities than the boot processor are not handled.
>   * Tough. Make sure you disable such features by hand.
>   */
> -static void __init apply_alternatives(struct alt_instr *start, struct alt_instr *end)
> +void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
>  {
>      struct alt_instr *a;
>      u8 *instr, *replacement;
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index c5a403b..6a368af 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -682,6 +682,17 @@ static int find_special_sections(struct payload *payload,
>      payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
>      payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
>  
> +#ifdef CONFIG_X86
> +    sec = xsplice_elf_sec_by_name(elf, ".altinstructions");
> +    if ( sec )
> +    {
> +        local_irq_disable();
> +        apply_alternatives((struct alt_instr *)sec->load_addr,
> +                           (struct alt_instr *)(sec->load_addr + sec->sec->sh_size));

Before we do that we need to double-check that 'sh_size' is the proper
size (size aligns with the size of the structure) and that it does not
have some funny value (0).

> +        local_irq_enable();
> +    }
> +#endif
> +
>      for ( i = 0; i < 4; i++ )
>      {
>          char str[14];
> diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
> index 23c9b9f..8e83572 100644
> --- a/xen/include/asm-x86/alternative.h
> +++ b/xen/include/asm-x86/alternative.h
> @@ -23,6 +23,7 @@ struct alt_instr {
>      u8  replacementlen;     /* length of new instruction, <= instrlen */
>  };
>  
> +extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end);
>  extern void alternative_instructions(void);
>  
>  #define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
> -- 
> 2.4.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH v1 01/11] xsplice: Design document (v2).
  2015-11-05 10:49   ` Ross Lagerwall
@ 2015-11-05 19:56     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 19:56 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich, xen-devel

On Thu, Nov 05, 2015 at 10:49:51AM +0000, Ross Lagerwall wrote:
> On 11/04/2015 09:10 PM, Konrad Rzeszutek Wilk wrote:
> snip
> >>+The payload **MUST** contain enough data to allow us to apply the update
> >>+and also safely reverse it. As such we **MUST** know:
> >>+
> >>+ * The locations in memory to be patched. This can be determined dynamically
> >>+   via symbols or via virtual addresses.
> >>+ * The new code that will be patched in.
> >>+ * Signature to verify the payload.
> >
> >Argh. We need to move the 'Signature to verify' in the 'v2' section
> >as I don't think we can get that done in time.
> 
> No, not for V1.
> 
> >
> >>+
> >>+This binary format can be constructed using an custom binary format but
> >>+there are severe disadvantages of it:
> >>+
> >>+ * The format might need to be changed and we need an mechanism to accommodate
> >>+   that.
> >>+ * It has to be platform agnostic.
> >>+ * Easily constructed using existing tools.
> >>+
> >>+As such having the payload in an ELF file is the sensible way. We would be
> >>+carrying the various sets of structures (and data) in the ELF sections under
> >>+different names and with definitions. The prefix for the ELF section name
> >>+would always be: *.xsplice* to match up to the names of the structures.
> >>+
> >>+Note that every structure has padding. This is added so that the hypervisor
> >>+can re-use those fields as it sees fit.
> >>+
> >>+Earlier design attempted to ineptly explain the relations of the ELF sections
> >>+to each other without using proper ELF mechanism (sh_info, sh_link, data
> >>+structures using Elf types, etc). This design will explain in detail
> >>+the structures and how they are used together and not dig in the ELF
> >>+format - except mention that the section names should match the
> >>+structure names.
> >>+
> >>+The xSplice payload is a relocatable ELF binary. A typical binary would have:
> >>+
> >>+ * One or more .text sections
> >>+ * Zero or more read-only data sections
> >>+ * Zero or more data sections
> >>+ * Relocations for each of these sections
> >>+
> >>+It may also have some architecture-specific sections. For example:
> >>+
> >>+ * Alternatives instructions
> >>+ * Bug frames
> >>+ * Exception tables
> >>+ * Relocations for each of these sections
> >>+
> >>+The xSplice core code loads the payload as a standard ELF binary, relocates it
> >>+and handles the architecture-specifc sections as needed. This process is much
> >>+like what the Linux kernel module loader does. It contains no xSplice-specific
> >>+details and thus will not be discussed further.
> >
> >What is 'it'? The 'process of what module loader does'?
> 
> 'It' refers to the process of module loading in the previous sentence.
> 
> >
> >>+
> >>+Importantly, the payload also contains a section with an array of structures
> >>+describing the functions to be patched:
> >>+<pre>
> >>+struct xsplice_patch_func {
> >>+    unsigned long new_addr;
> >>+    unsigned long new_size;
> >>+    unsigned long old_addr;
> >>+    unsigned long old_size;
> >>+    char *name;
> >>+    uint8_t pad[64];
> >>+};
> >>+<pre>
> >
> >Uh, so 104 bytes ? Or did you mean to s/64/24/ so the structure is nicely
> >padded to 64-bytes?
> >
> >I think that is what you meant.
> 
> OK. I'm not too fussed about exact sizes for V1 anyway, it's likely to
> change at some point.

Right. How do we want to handle that?

The original structure I wrote up was mostly copied from ksplice kmodsrc with
modifications, which meant it was capable of patching data, code, your
grandmother's sofa, you name it. But for v1 we want just the basics -
patching code.

However going forward we will want to expand and make changes - and some
companies may want to add fields and structures (and naturally also
add code to the hypervisor for that) and so on.

And at some point we will make mistakes and realize that some structures
are to be deprecated.

To make life easier for them we ought to provide some way to handle this.
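
As a strawman only (nothing like this exists in the series; the 'version'
field, the fixed-width types and the '_v1' naming are assumptions made up
for illustration): keep the structure at 64 bytes as discussed above and
carve a version out of the padding, so the hypervisor can refuse layouts it
does not understand.

```c
#include <stdint.h>

/* Hypothetical layout version; not part of the posted patches. */
#define XSPLICE_FUNC_VERSION 1

/*
 * Strawman variant of struct xsplice_patch_func using fixed-width fields
 * so that the size does not depend on the build's pointer width.
 */
struct xsplice_patch_func_v1 {
    uint64_t new_addr;
    uint64_t new_size;
    uint64_t old_addr;
    uint64_t old_size;
    uint64_t name;       /* Address of the name string. */
    uint32_t version;    /* Layout version (assumed field). */
    uint8_t  pad[20];    /* 5 * 8 + 4 + 20 = 64 bytes in total. */
};

_Static_assert(sizeof(struct xsplice_patch_func_v1) == 64,
               "xsplice_patch_func_v1 must stay 64 bytes");

/* Returns 1 if the entry uses a layout this code understands, else 0. */
int xsplice_func_version_ok(const struct xsplice_patch_func_v1 *f)
{
    return f->version == XSPLICE_FUNC_VERSION;
}
```

New fields would then come out of 'pad' with a version bump, and a
deprecated layout could be rejected at load time.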

Thoughts?


> 
> -- 
> Ross Lagerwall


* Re: [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-05 11:45     ` Ross Lagerwall
@ 2015-11-05 20:08       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 20:08 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Kevin Tian, Ian Campbell, Andrew Cooper, xen-devel, Jan Beulich,
	Stefano Stabellini, Jun Nakajima, Aravind Gopalakrishnan,
	Boris Ostrovsky, Suravee Suthikulpanit

On Thu, Nov 05, 2015 at 11:45:42AM +0000, Ross Lagerwall wrote:
> On 11/05/2015 03:17 AM, Konrad Rzeszutek Wilk wrote:
> snip
> >>diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
> >>index dbff0d5..31e4124 100644
> >>--- a/xen/arch/x86/xsplice.c
> >>+++ b/xen/arch/x86/xsplice.c
> >>@@ -3,6 +3,25 @@
> >>  #include <xen/xsplice_elf.h>
> >>  #include <xen/xsplice.h>
> >>
> >>+#define PATCH_INSN_SIZE 5
> >>+
> >>+void xsplice_apply_jmp(struct xsplice_patch_func *func)
> >
> >Don't we want for it to be 'int'
> 
> Only if an error is expected.
> 
> >>+{
> >>+    uint32_t val;
> >>+    uint8_t *old_ptr;
> >>+
> >>+    old_ptr = (uint8_t *)func->old_addr;
> >>+    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
> >
> >And perhaps use something which can catch an exception (#GP) so that
> >this can error out?
> 
> Why would this fail?

I was thinking of the page being read-only and us not having modified it (a
bug in our code). It would be good to actually report this instead of
taking a #GP and crashing the whole hypervisor.

> >>+    *old_ptr++ = 0xe9; /* Relative jump */
> >>+    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
> >>+    memcpy(old_ptr, &val, sizeof val);
> >>+}
> >>+
> >>+void xsplice_revert_jmp(struct xsplice_patch_func *func)
> >>+{
> >>+    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
> >>+}
> >>+
> >>  int xsplice_verify_elf(uint8_t *data, ssize_t len)
> >>  {
> >>
> >>diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> >>index 5e88c55..4476be5 100644
> >>--- a/xen/common/xsplice.c
> >>+++ b/xen/common/xsplice.c
> >>@@ -11,16 +11,21 @@
> >>  #include <xen/guest_access.h>
> >>  #include <xen/stdbool.h>
> >>  #include <xen/sched.h>
> >>+#include <xen/softirq.h>
> >>  #include <xen/lib.h>
> >>+#include <xen/wait.h>
> >>  #include <xen/xsplice_elf.h>
> >>  #include <xen/xsplice.h>
> >>  #include <public/sysctl.h>
> >>
> >>  #include <asm/event.h>
> >>+#include <asm/nmi.h>
> >>
> >>  static DEFINE_SPINLOCK(payload_list_lock);
> >>  static LIST_HEAD(payload_list);
> >>
> >>+static LIST_HEAD(applied_list);
> >>+
> >>  static unsigned int payload_cnt;
> >>  static unsigned int payload_version = 1;
> >>
> >>@@ -29,15 +34,34 @@ struct payload {
> >>      int32_t rc;         /* 0 or -EXX. */
> >>
> >>      struct list_head   list;   /* Linked to 'payload_list'. */
> >>+    struct list_head   applied_list;   /* Linked to 'applied_list'. */
> >>
> >>+    struct xsplice_patch_func *funcs;
> >>+    int nfuncs;
> >
> >unsigned int;
> >
> >>      void *module_address;
> >>      size_t module_pages;
> >>
> >>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
> >>  };
> >>
> >>+/* Defines an outstanding patching action. */
> >>+struct xsplice_work
> >>+{
> >>+    atomic_t semaphore;          /* Used for rendezvous */
> >>+    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled */
> >>+    struct payload *data;        /* The payload on which to act */
> >>+    volatile bool_t do_work;     /* Signals work to do */
> >>+    volatile bool_t ready;       /* Signals all CPUs synchronized */
> >>+    uint32_t cmd;                /* Action request. XSPLICE_ACTION_* */
> >
> >Now since you have a pointer to 'data' can't you follow that for the
> >cmd? Or at least the 'data->state'?
> 
> I moved cmd out of the payload and into xsplice_work since cmd is only
> needed when there is work to do.

OK. My thoughts were that if this gets wedged -  the user may want
to hit 'x' on the serial console to get an idea of what is wedged.

And the keyhandler can print the gory details of the payloads - along
with the one that is currently wedged and what command it is currently
doing. Having this in payload->cmd would work nicely.

But on the other hand - we could also just expand the keyhandler to
look in the xsplice_work and print the gory details of that.


> data->state contains the current state of the payload (i.e. before the
> action has been performed) so it provides no indication of what command
> needs to be performed.

Right, but previously the 'cmd' had the command it was to perform.
> 
> >
> >Missing full stops.
> >>+};
> >>+
> >>+static DEFINE_SPINLOCK(xsplice_work_lock);
> >>+/* There can be only one outstanding patching action. */
> >>+static struct xsplice_work xsplice_work;
> >>+
> >>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
> >>  static void free_module(struct payload *payload);
> >>+static int schedule_work(struct payload *data, uint32_t cmd);
> >>
> >>  static const char *state2str(int32_t state)
> >>  {
> >>@@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
> >>      case XSPLICE_ACTION_REVERT:
> >>          if ( data->state == XSPLICE_STATE_APPLIED )
> >>          {
> >>-            /* No implementation yet. */
> >>-            data->state = XSPLICE_STATE_CHECKED;
> >>-            data->rc = 0;
> >>-            rc = 0;
> >>+            data->rc = -EAGAIN;
> >>+            rc = schedule_work(data, action->cmd);
> >>          }
> >>          break;
> >>      case XSPLICE_ACTION_APPLY:
> >>          if ( (data->state == XSPLICE_STATE_CHECKED) )
> >>          {
> >>-            /* No implementation yet. */
> >>-            data->state = XSPLICE_STATE_APPLIED;
> >>-            data->rc = 0;
> >>-            rc = 0;
> >>+            data->rc = -EAGAIN;
> >>+            rc = schedule_work(data, action->cmd);
> >>          }
> >>          break;
> >>      case XSPLICE_ACTION_REPLACE:
> >>          if ( data->state == XSPLICE_STATE_CHECKED )
> >>          {
> >>-            /* No implementation yet. */
> >>-            data->state = XSPLICE_STATE_CHECKED;
> >>-            data->rc = 0;
> >>-            rc = 0;
> >>+            data->rc = -EAGAIN;
> >>+            rc = schedule_work(data, action->cmd);
> >>          }
> >>          break;
> >>      default:
> >>@@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf)
> >>      return 0;
> >>  }
> >>
> >>+static int find_special_sections(struct payload *payload,
> >>+                                 struct xsplice_elf *elf)
> >>+{
> >>+    struct xsplice_elf_sec *sec;
> >>+
> >>+    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
> >>+    if ( !sec )
> >>+    {
> >>+        printk(XENLOG_ERR ".xsplice.funcs is missing\n");
> >>+        return -1;
> >>+    }
> >>+
> >>+    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
> >>+    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);

You also need to verify that 'sh_size' is actually a multiple of the
structure size. As in, we don't want a size that leaves a remainder when
divided by 'sizeof *payload->funcs'.
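
Something along these lines (untested userspace sketch; 'struct patch_func'
is a stand-in for struct xsplice_patch_func and the 'count_funcs' helper is
a made-up name, not something in the patch):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct xsplice_patch_func as posted in this series. */
struct patch_func {
    unsigned long new_addr;
    unsigned long new_size;
    unsigned long old_addr;
    unsigned long old_size;
    char *name;
    uint8_t undo[8];
};

/*
 * Derive the number of entries from the section size, rejecting sections
 * whose size is not a whole number of entries.
 */
int count_funcs(size_t sh_size, unsigned int *nfuncs)
{
    if ( sh_size % sizeof(struct patch_func) )
        return -EINVAL;

    *nfuncs = sh_size / sizeof(struct patch_func);

    return 0;
}
```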

> >>+
> >>+    return 0;
> >>+}
> >
> >That looks like it should belong to another patch?
> 
> Why? The array of functions is specifically needed for applying a payload so
> the code belongs in this patch.

The title of the patch sounded like it would add the infrastructure
for patching. Not the pieces needed from ELF file.

But that may be just me.
> 
> >>+
> >>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
> >>  {
> >>      struct xsplice_elf elf;
> >>@@ -662,6 +698,10 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
> >>      if ( rc )
> >>          goto err_module;
> >>
> >>+    rc = find_special_sections(payload, &elf);
> >>+    if ( rc )
> >>+        goto err_module;
> >>+
> >
> >Ditto?
> >>      return 0;
> >>
> >>   err_module:
> >>@@ -672,6 +712,206 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
> >>      return rc;
> >>  }
> >>
> >>+
> >>+/*
> >>+ * The following functions get the CPUs into an appropriate state and
> >>+ * apply (or revert) each of the module's functions.
> >
> >s/module/payload/
> >
> >>+ */
> >>+
> >>+/*
> >>+ * This function is executed having all other CPUs with no stack and IRQs
> >>+ * disabled.
> >
> >Well, there is some stack. For example from 'cpu_idle' - you have the
> >'cpu_idle' on the stack.
> >
> >>+ */
> >>+static int apply_payload(struct payload *data)
> >>+{
> >>+    int i;
> >
> >unsigned int
> >>+
> >>+    printk(XENLOG_DEBUG "Applying payload: %s\n", data->id);
> >>+
> >>+    for ( i = 0; i < data->nfuncs; i++ )
> >>+        xsplice_apply_jmp(data->funcs + i);
> >
> >And if this returns an error then we could skip adding
> >it to the applied_list..
> 
> Only if an error is expected.
> 
> >>+
> >
> >Also the patching in Linux seems to do some icache purging.
> >Should we use that?
> 
> I didn't see that when I looked for it. The alternatives patching in Xen
> doesn't purge the icache (afaict). We need feedback from an x86 maintainer
> here.

Sorry, it is 'ftrace_modify_code_direct' - see 'sync_core', which
does a cpuid.

> 
> >
> >>+    list_add_tail(&data->applied_list, &applied_list);
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+/*
> >>+ * This function is executed having all other CPUs with no stack and IRQs
> >>+ * disabled.
> >>+ */
> >>+static int revert_payload(struct payload *data)
> >>+{
> >>+    int i;
> >
> >unsigned int i;
> >>+
> >>+    printk(XENLOG_DEBUG "Reverting payload: %s\n", data->id);
> >>+
> >>+    for ( i = 0; i < data->nfuncs; i++ )
> >>+        xsplice_revert_jmp(data->funcs + i);
> >>+
> >>+    list_del(&data->applied_list);
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+/* Must be holding the payload_list lock */
> >
> >Missing full stop.
> >
> >Should that lock be called something else now? (Because it is certainly
> >not protecting the list anymore - but also the scheduling action).
> 
> Hmm...
> 
> >>+static int schedule_work(struct payload *data, uint32_t cmd)
> >>+{
> >>+    /* Fail if an operation is already scheduled */
> >>+    if ( xsplice_work.do_work )
> >>+        return -EAGAIN;
> >>+
> >
> >>+    xsplice_work.cmd = cmd;
> >>+    xsplice_work.data = data;
> >>+    atomic_set(&xsplice_work.semaphore, 0);
> >>+    atomic_set(&xsplice_work.irq_semaphore, 0);
> >>+    xsplice_work.ready = false;
> >>+    smp_mb();
> >>+    xsplice_work.do_work = true;
> >>+    smp_mb();
> >
> >So this is your 'GO GO' signal right? I think you may want
> >to have 'smb_wmb()'
> 
> A full review of the memory barriers and synchronization is needed by
> someone more knowledgeable than me.
> 
> >>+
> >>+    return 0;
> >>+}
> >>+
> >
> >/me laughs. What a way to 'fix' the NMI watchdog.
> 
> It comes directly from the alternatives code.
> 
> >
> >>+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> >>+{
> >>+    return 1;
> >>+}
> >>+
> >>+static void reschedule_fn(void *unused)
> >>+{
> >>+    smp_mb(); /* Synchronize with setting do_work */
> >>+    raise_softirq(SCHEDULE_SOFTIRQ);
> >>+}
> >>+
> >>+/*
> >>+ * The main function which manages the work of quiescing the system and
> >>+ * patching code.
> >>+ */
> >>+void do_xsplice(void)
> >>+{
> >>+    int id;
> >
> >unsigned int id;
> >>+    unsigned int total_cpus;
> >>+    nmi_callback_t saved_nmi_callback;
> >>+
> >>+    /* Fast path: no work to do */
> >
> >Missing full stop.
> >>+    if ( likely(!xsplice_work.do_work) )
> >>+        return;
> >>+
> >>+    ASSERT(local_irq_is_enabled());
> >>+
> >>+    spin_lock(&xsplice_work_lock);
> >>+    id = atomic_read(&xsplice_work.semaphore);
> >>+    atomic_inc(&xsplice_work.semaphore);
> >>+    spin_unlock(&xsplice_work_lock);
> >
> >Could you use 'atomic_inc_and_test' and then you can get
> >rid of the spinlock.
> 
> OK.
> 
> >
> >>+
> >>+    total_cpus = num_online_cpus();
> >
> >Which could change across these invocations.. Perhaps
> >during these calls we need to lock up CPU up/down code?
> 
> OK.
> 
> >
> >>+
> >>+    if ( id == 0 )
> >>+    {
> >
> >Can you just make this its own function? Perhaps call it
> >'xsplice_do_single' or such?
> >
> >>+        s_time_t timeout, start;
> >>+
> >>+        /* Trigger other CPUs to execute do_xsplice */
> >
> >Missing full stop.
> >>+        smp_call_function(reschedule_fn, NULL, 0);
> >>+
> >>+        /* Wait for other CPUs with a timeout */
> >
> >Missing full stop.
> >>+        start = NOW();
> >>+        timeout = start + MILLISECS(30);
> >
> >Nah. That should be gotten from the XSPLICE_ACTION_APPLY 'time'
> >parameter - which has an 'timeout' in it.
> >
> >>+        while ( atomic_read(&xsplice_work.semaphore) != total_cpus &&
> >>+                NOW() < timeout )
> >>+            cpu_relax();
> >>+
> >>+        if ( atomic_read(&xsplice_work.semaphore) == total_cpus )
> >>+        {
> >>+            struct payload *data2;
> >
> >s/data2/data/ ?
> >>+
> >>+            /* "Mask" NMIs */
> >>+            saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> >>+
> >>+            /* All CPUs are waiting, now signal to disable IRQs */
> >>+            xsplice_work.ready = true;
> >>+            smp_mb();
> >>+
> >>+            /* Wait for irqs to be disabled */
> >>+            while ( atomic_read(&xsplice_work.irq_semaphore) != (total_cpus - 1) )
> >>+                cpu_relax();
> >>+
> >>+            local_irq_disable();
> >>+            /* Now this function should be the only one on any stack.
> >>+             * No need to lock the payload list or applied list. */
> >>+            switch ( xsplice_work.cmd )
> >>+            {
> >>+                case XSPLICE_ACTION_APPLY:
> >>+                        xsplice_work.data->rc = apply_payload(xsplice_work.data);
> >>+                        if ( xsplice_work.data->rc == 0 )
> >>+                            xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> >>+                        break;
> >>+                case XSPLICE_ACTION_REVERT:
> >>+                        xsplice_work.data->rc = revert_payload(xsplice_work.data);
> >>+                        if ( xsplice_work.data->rc == 0 )
> >>+                            xsplice_work.data->state = XSPLICE_STATE_CHECKED;
> >>+                        break;
> >>+                case XSPLICE_ACTION_REPLACE:
> >>+                        list_for_each_entry ( data2, &payload_list, list )
> >>+                        {
> >>+                            if ( data2->state != XSPLICE_STATE_APPLIED )
> >>+                                continue;
> >>+
> >>+                            data2->rc = revert_payload(data2);
> >>+                            if ( data2->rc == 0 )
> >>+                                data2->state = XSPLICE_STATE_CHECKED;
> >>+                            else
> >>+                            {
> >>+                                xsplice_work.data->rc = -EINVAL;
> >
> >Why not copy the error code (from data2->rc?)
> 
> No. Reverting a different payload updates the error code for that payload.
> The payload to-be-applied has failed because a dependent action has failed.
> This is not the same as the original error. The original error is visible
> through data2->rc.
> 
> >>+                                break;
> >>+                            }
> >>+                        }
> >>+                        if ( xsplice_work.data->rc != -EINVAL )
> >
> >And here you can just check for zero.
> 
> No, because xsplice_work.data->rc is originally -EAGAIN (in progress). I
> suppose I could check for xsplice_work.data->rc == -EAGAIN.

/me nods. That would be nicer I think?
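
I.e. roughly this (userspace mock-up, not Xen code; the types, the
'revert_fails' knob and the apply step after the check are assumptions on my
part, since the rest of the hunk is not quoted here):

```c
#include <errno.h>
#include <stddef.h>

enum state { STATE_CHECKED, STATE_APPLIED };

/* Minimal stand-in for struct payload. */
struct payload {
    enum state state;
    int rc;
    int revert_fails;    /* Mock-up knob: make the revert fail. */
};

static int revert_payload(struct payload *p)
{
    return p->revert_fails ? -EIO : 0;
}

static int apply_payload(struct payload *p)
{
    (void)p;
    return 0;
}

/* 'target->rc' is -EAGAIN (in progress) on entry. */
void do_replace(struct payload *target, struct payload *all, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        struct payload *p = &all[i];

        if ( p->state != STATE_APPLIED )
            continue;

        p->rc = revert_payload(p);
        if ( p->rc == 0 )
            p->state = STATE_CHECKED;
        else
        {
            /* The original error stays visible in p->rc. */
            target->rc = -EINVAL;
            break;
        }
    }

    /* Still "in progress" means every revert succeeded. */
    if ( target->rc == -EAGAIN )
    {
        target->rc = apply_payload(target);
        if ( target->rc == 0 )
            target->state = STATE_APPLIED;
    }
}
```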


* Re: [PATCH v1 07/11] xsplice: Implement payload loading
  2015-11-05 11:15     ` Ross Lagerwall
@ 2015-11-05 20:12       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-05 20:12 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Andrew Cooper, Stefano Stabellini, Ian Campbell, Jan Beulich, xen-devel

On Thu, Nov 05, 2015 at 11:15:34AM +0000, Ross Lagerwall wrote:
> On 11/04/2015 10:21 PM, Konrad Rzeszutek Wilk wrote:
> snip
> >>
> >>+
> >>+/*
> >>+ * The following functions prepare an xSplice module to be executed by
> >>+ * allocating space, loading the allocated sections, resolving symbols,
> >>+ * performing relocations, etc.
> >>+ */
> >>+#ifdef CONFIG_X86
> >>+static void *alloc_module(size_t size)
> >
> >s/module/payload/
> 
> My intention was that all the code which implements the "module loader"
> functionality (and is sort of independent from xSplice) uses the term
> "module" whereas the payload implies the loaded module + the other
> xSplice-specific bits. Your thoughts?

Aaah. I had module == payload in mind. 

> 
> >>+{
> >>+    mfn_t *mfn, *mfn_ptr;
> >>+    size_t pages, i;
> >>+    struct page_info *pg;
> >>+    unsigned long hole_start, hole_end, cur;
> >>+    struct payload *data, *data2;
> >>+
> >>+    ASSERT(size);
> >>+
> >>+    pages = PFN_UP(size);
> >>+    mfn = xmalloc_array(mfn_t, pages);
> >>+    if ( mfn == NULL )
> >>+        return NULL;
> >>+
> >>+    for ( i = 0; i < pages; i++ )
> >>+    {
> >>+        pg = alloc_domheap_page(NULL, 0);
> >>+        if ( pg == NULL )
> >>+            goto error;
> >>+        mfn[i] = _mfn(page_to_mfn(pg));
> >>+    }
> >
> >This looks like 'vmalloc'. Why not use that?
> >(That explanation should be part of the commit description probably)
> 
> vmalloc allocates pages and then maps them to an arbitrary virtual address
> with PAGE_HYPERVISOR. I needed to use a specific virtual address with
> PAGE_HYPERVISOR_RWX.

/me nods. 
> 
> >
> >>+
> >>+    hole_start = (unsigned long)module_virt_start;
> >>+    hole_end = hole_start + pages * PAGE_SIZE;
> >>+    spin_lock(&payload_list_lock);
> >>+    list_for_each_entry ( data, &payload_list, list )
> >>+    {
> >>+        list_for_each_entry ( data2, &payload_list, list )
> >>+        {
> >>+            unsigned long start, end;
> >>+
> >>+            start = (unsigned long)data2->module_address;
> >>+            end = start + data2->module_pages * PAGE_SIZE;
> >>+            if ( hole_end > start && hole_start < end )
> >>+            {
> >>+                hole_start = end;
> >>+                hole_end = hole_start + pages * PAGE_SIZE;
> >>+                break;
> >>+            }
> >>+        }
> >>+        if ( &data2->list == &payload_list )
> >>+            break;
> >>+    }
> >>+    spin_unlock(&payload_list_lock);
> >
> >This could be made in a nice function. 'find_hole' perhaps?
> >
> >>+
> >>+    if ( hole_end >= module_virt_end )
> >>+        goto error;
> >>+
> >>+    for ( cur = hole_start, mfn_ptr = mfn; pages--; ++mfn_ptr, cur += PAGE_SIZE )
> >>+    {
> >>+        if ( map_pages_to_xen(cur, mfn_x(*mfn_ptr), 1, PAGE_HYPERVISOR_RWX) )
> >>+        {
> >>+            if ( cur != hole_start )
> >>+                destroy_xen_mappings(hole_start, cur);
> >
> >I think 'destroy_xen_mappings' is OK handling hole_start == cur.
> >
> >>+            goto error;
> >>+        }
> >>+    }
> >>+    xfree(mfn);
> >>+    return (void *)hole_start;
> >>+
> >>+ error:
> >>+    while ( i-- )
> >>+        free_domheap_page(mfn_to_page(mfn_x(mfn[i])));
> >>+    xfree(mfn);
> >>+    return NULL;
> >>+}
> >>+#else
> >>+static void *alloc_module(size_t size)
> >
> >s/module/payload/
> >>+{
> >>+    return NULL;
> >>+}
> >>+#endif
> >>+
> >>+static void free_module(struct payload *payload)
> >>+{
> >>+    int i;
> >
> >unsigned int;
> >
> >>+    struct page_info *pg;
> >>+    PAGE_LIST_HEAD(pg_list);
> >>+    void *va = payload->module_address;
> >>+    unsigned long addr = (unsigned long)va;
> >>+
> >>+    if ( !payload->module_address )
> >>+        return;
> >
> >How about 'if ( !addr )
> >		return;
> >?
> >
> >>+
> >>+    payload->module_address = NULL;
> >>+
> >>+    for ( i = 0; i < payload->module_pages; i++ )
> >>+        page_list_add(vmap_to_page(va + i * PAGE_SIZE), &pg_list);
> >>+
> >>+    destroy_xen_mappings(addr, addr + payload->module_pages * PAGE_SIZE);
> >>+
> >>+    while ( (pg = page_list_remove_head(&pg_list)) != NULL )
> >>+        free_domheap_page(pg);
> >>+
> >>+    payload->module_pages = 0;
> >>+}
> >>+
> >>+static void alloc_section(struct xsplice_elf_sec *sec, size_t *core_size)
> >
> >s/alloc/compute/?
> >
> >>+{
> >>+    size_t align_size = ROUNDUP(*core_size, sec->sec->sh_addralign);
> >>+    sec->sec->sh_entsize = align_size;
> >>+    *core_size = sec->sec->sh_size + align_size;
> >>+}
> >>+
> >>+static int move_module(struct payload *payload, struct xsplice_elf *elf)
> >>+{
> >>+    uint8_t *buf;
> >>+    int i;
> >
> >unsigned int i;
> >
> >>+    size_t core_size = 0;
> >>+
> >>+    /* Allocate text regions */
> >
> >s/Allocate/Compute/
> >
> >>+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> >>+    {
> >>+        if ( (elf->sec[i].sec->sh_flags & (SHF_ALLOC|SHF_EXECINSTR)) ==
> >>+             (SHF_ALLOC|SHF_EXECINSTR) )
> >>+            alloc_section(&elf->sec[i], &core_size);
> >>+    }
> >>+
> >>+    /* Allocate rw data */
> >>+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> >>+    {
> >>+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> >>+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> >>+             (elf->sec[i].sec->sh_flags & SHF_WRITE) )
> >>+            alloc_section(&elf->sec[i], &core_size);
> >>+    }
> >>+
> >>+    /* Allocate ro data */
> >>+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> >>+    {
> >>+        if ( (elf->sec[i].sec->sh_flags & SHF_ALLOC) &&
> >>+             !(elf->sec[i].sec->sh_flags & SHF_EXECINSTR) &&
> >>+             !(elf->sec[i].sec->sh_flags & SHF_WRITE) )
> >>+            alloc_section(&elf->sec[i], &core_size);
> >>+    }
> >>+
> >>+    buf = alloc_module(core_size);
> >>+    if ( !buf ) {
> >>+        printk(XENLOG_ERR "Could not allocate memory for module\n");
> >
> >(%s: Could not allocate %u memory for payload!\n", elf->name, core_size);
> >
> >>+        return -ENOMEM;
> >>+    }
> >>+    memset(buf, 0, core_size);
> >
> >Perhaps for fun it ought to be 'ud2' ?
> 
> There's no point. It either gets memcpy'd over or needs to be set to zero
> for BSS.

Or 0xcc. I was thinking it may be good to have it poisoned so
that we would trip over bugs more easily. Maybe for debug builds?
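
Roughly like this (userspace mock-up; 'struct section', 'load_sections' and
the 'poison' flag are invented for the example). The gaps between sections
get poisoned while BSS-like sections are still zeroed explicitly:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0xcc    /* int3 on x86. */

/* Hypothetical, simplified view of a section to be loaded. */
struct section {
    size_t offset;          /* Offset into the payload buffer. */
    size_t size;
    const uint8_t *data;    /* NULL for BSS-like sections. */
};

void load_sections(uint8_t *buf, size_t buf_size,
                   const struct section *secs, size_t nsecs, int poison)
{
    size_t i;

    /* Poison (debug) or zero (release) the whole buffer first. */
    memset(buf, poison ? POISON_BYTE : 0, buf_size);

    for ( i = 0; i < nsecs; i++ )
    {
        if ( secs[i].data )
            memcpy(buf + secs[i].offset, secs[i].data, secs[i].size);
        else
            /* BSS must read as zero regardless of the poison. */
            memset(buf + secs[i].offset, 0, secs[i].size);
    }
}
```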

> 
> >
> >>+
> >>+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> >>+    {
> >>+        if ( elf->sec[i].sec->sh_flags & SHF_ALLOC )
> >>+        {
> >>+            elf->sec[i].load_addr = buf + elf->sec[i].sec->sh_entsize;
> >>+            memcpy(elf->sec[i].load_addr, elf->sec[i].data,
> >>+                   elf->sec[i].sec->sh_size);
> >>+            printk(XENLOG_DEBUG "Loaded %s at 0x%p\n",
> >
> >Add %s: at the start ..
> >>+                   elf->sec[i].name, elf->sec[i].load_addr);
> >
> >which would be elf->name.
> >
> >>+        }
> >>+    }
> >>+
> >>+    payload->module_address = buf;
> >>+    payload->module_pages = PFN_UP(core_size);
> >
> >Instead of module could we name it payload?
> 
> See comment above.
> 
> >
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static int resolve_symbols(struct xsplice_elf *elf)
> >
> >s/resolve/check/
> 
> No, this is resolving section symbols.
> 
> >
> >>+{
> >>+    int i;
> >
> >unsigned int;
> >
> >>+
> >>+    for ( i = 1; i < elf->nsym; i++ )
> >
> >Why 1? Please explain as comment.
> 
> The first entry of an ELF symbol table is the "undefined symbol index". This
> code is expected to be read alongside the ELF specification :-)
> 
> >
> >
> >>+    {
> >>+        switch ( elf->sym[i].sym->st_shndx )
> >>+        {
> >>+            case SHN_COMMON:
> >>+                printk(XENLOG_ERR "Unexpected common symbol: %s\n",
> >>+                       elf->sym[i].name);
> >
> >Please also include elf->name in the error.
> >
> >>+                return -EINVAL;
> >>+                break;
> >>+            case SHN_UNDEF:
> >>+                printk(XENLOG_ERR "Unknown symbol: %s\n", elf->sym[i].name);
> >
> >Ditto.
> >>+                return -ENOENT;
> >>+                break;
> >>+            case SHN_ABS:
> >>+                printk(XENLOG_DEBUG "Absolute symbol: %s => 0x%p\n",
> >>+                       elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
> >>+                break;
> >>+            default:
> >>+                if ( elf->sec[elf->sym[i].sym->st_shndx].sec->sh_flags & SHF_ALLOC )
> >>+                {
> >>+                    elf->sym[i].sym->st_value +=
> >>+                        (unsigned long)elf->sec[elf->sym[i].sym->st_shndx].load_addr;
> >>+                    printk(XENLOG_DEBUG "Symbol resolved: %s => 0x%p\n",
> >
> >Ditto;
> >>+                           elf->sym[i].name, (void *)elf->sym[i].sym->st_value);
> >>+                }
> >>+        }
> >>+    }
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static int perform_relocs(struct xsplice_elf *elf)
> >>+{
> >>+    struct xsplice_elf_sec *rela, *base;
> >>+    int i, rc;
> >>+
> >
> >unsigned int i;
> >
> >>+    for ( i = 0; i < elf->hdr->e_shnum; i++ )
> >>+    {
> >>+        rela = &elf->sec[i];
> >>+
> >>+        /* Is it a valid relocation section? */
> >>+        if ( rela->sec->sh_info >= elf->hdr->e_shnum )
> >>+            continue;
> >
> >Um, don't we want to mark it as invalid or such?
> >Or overwrite it so we won't use it?
> 
> The code doesn't use it. I don't understand the concern.

The comment. Perhaps change it to
/* Ignore invalid relocation sections.  */
> 
> >
> >>+
> >>+        base = &elf->sec[rela->sec->sh_info];
> >>+
> >>+        /* Don't relocate non-allocated sections */
> >>+        if ( !(base->sec->sh_flags & SHF_ALLOC) )
> >>+            continue;
> >
> >>+
> >>+        if ( elf->sec[i].sec->sh_type == SHT_RELA )
> >>+            rc = xsplice_perform_rela(elf, base, rela);
> >>+        else if ( elf->sec[i].sec->sh_type == SHT_REL )
> >>+            rc = xsplice_perform_rel(elf, base, rela);
> >>+
> >>+        if ( rc )
> >>+            return rc;
> >>+    }
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
> >>+{
> >>+    struct xsplice_elf elf;
> >
> >Wait a minute? We ditch it after this?
> >
> >>+    int rc = 0;
> >>+
> >>+    rc = xsplice_verify_elf(raw, len);
> >>+    if ( rc )
> >>+        return rc;
> >>+
> >>+    rc = xsplice_elf_load(&elf, raw, len);
> >>+    if ( rc )
> >>+        return rc;
> >>+
> >>+    rc = move_module(payload, &elf);
> >>+    if ( rc )
> >>+        goto err_elf;
> >>+
> >>+    rc = resolve_symbols(&elf);
> >>+    if ( rc )
> >>+        goto err_module;
> >>+
> >>+    rc = perform_relocs(&elf);
> >>+    if ( rc )
> >>+        goto err_module;
> >>+
> >
> >Shouldn't you call xsplice_elf_free(&elf) here? Or
> >hook up the elf to the 'struct payload'?
> >
> >
> >If not, who is going to clean up elf->sec and elf->sym when the
> >payload is unloaded?
> 
> Yes, I forgot to free it here.
> 
> >>+    return 0;
> >>+
> >>+ err_module:
> >>+    free_module(payload);
> >>+ err_elf:
> >>+    xsplice_elf_free(&elf);
> >>+
> >>+    return rc;
> >>+}
> >>+
> >>  static int __init xsplice_init(void)
> >>  {
> >>      register_keyhandler('x', xsplice_printall, "print xsplicing info", 1);
> >>diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
> >>index 19ab4d0..e6f08e9 100644
> >>--- a/xen/include/asm-x86/x86_64/page.h
> >>+++ b/xen/include/asm-x86/x86_64/page.h
> >>@@ -38,6 +38,8 @@
> >>  #include <xen/pdx.h>
> >>
> >>  extern unsigned long xen_virt_end;
> >>+extern unsigned long module_virt_start;
> >>+extern unsigned long module_virt_end;
> >>
> >>  #define spage_to_pdx(spg) (((spg) - spage_table)<<(SUPERPAGE_SHIFT-PAGE_SHIFT))
> >>  #define pdx_to_spage(pdx) (spage_table + ((pdx)>>(SUPERPAGE_SHIFT-PAGE_SHIFT)))
> >>diff --git a/xen/include/xen/xsplice.h b/xen/include/xen/xsplice.h
> >>index 41e28da..a3946a3 100644
> >>--- a/xen/include/xen/xsplice.h
> >>+++ b/xen/include/xen/xsplice.h
> >>@@ -1,9 +1,31 @@
> >>  #ifndef __XEN_XSPLICE_H__
> >>  #define __XEN_XSPLICE_H__
> >>
> >>+struct xsplice_elf;
> >>+struct xsplice_elf_sec;
> >>+struct xsplice_elf_sym;
> >>+
> >>+struct xsplice_patch_func {
> >>+    unsigned long new_addr;
> >>+    unsigned long new_size;
> >>+    unsigned long old_addr;
> >>+    unsigned long old_size;
> >>+    char *name;
> >>+    unsigned char undo[8];
> >>+};
> >
> >We don't use them in this patch. They could be moved to another patch.
> 
> OK.
> 
> -- 
> Ross Lagerwall


* Re: [PATCH v1 01/11] xsplice: Design document (v2).
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (10 preceding siblings ...)
  2015-11-04 21:10 ` [PATCH v1 01/11] xsplice: Design document (v2) Konrad Rzeszutek Wilk
@ 2015-11-10  9:55 ` Ross Lagerwall
  2015-11-27 12:48 ` Martin Pohlack
  12 siblings, 0 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-10  9:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich, xen-devel

On 11/03/2015 06:15 PM, Ross Lagerwall wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
snip
> +## Patching code
> +
> +The first mechanism to patch that comes to mind is in-place replacement.
> +That is, replace the affected code with new code. Unfortunately the x86
> +ISA is variable size which places limits on how much space we have available
> +to replace the instructions. That is not a problem if the change is smaller
> +than the original opcode and we can fill it with nops. Problems will
> +appear if the replacement code is longer.
> +
> +The second mechanism is by replacing the call or jump to the
> +old function with the address of the new function.
> +
> +A third mechanism is to add a jump to the new function at the
> +start of the old function.

Perhaps this document should be clarified to say what is actually 
implemented in this patch series? I.e. the third mechanism is the one 
that is actually implemented.
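
For reference, a userspace sketch of what that amounts to on x86. It is
modelled on xsplice_apply_jmp/xsplice_revert_jmp from patch 08, but operates
on plain buffers; 'struct patch_func' and its field names are simplified
stand-ins, and it assumes the displacement fits in 32 bits:

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5    /* 0xe9 opcode + 32-bit displacement. */

/* Simplified stand-in for struct xsplice_patch_func. */
struct patch_func {
    uint8_t *old_code;               /* Start of the function to patch. */
    uint8_t *new_code;               /* Start of the replacement. */
    uint8_t undo[PATCH_INSN_SIZE];   /* Saved original bytes. */
};

void apply_jmp(struct patch_func *func)
{
    uint32_t val;

    /* Save the bytes that are about to be overwritten. */
    memcpy(func->undo, func->old_code, PATCH_INSN_SIZE);

    /* The displacement is relative to the end of the jmp instruction. */
    val = (uint32_t)((uintptr_t)func->new_code -
                     (uintptr_t)func->old_code - PATCH_INSN_SIZE);

    func->old_code[0] = 0xe9;    /* Relative jump. */
    memcpy(func->old_code + 1, &val, sizeof(val));
}

void revert_jmp(struct patch_func *func)
{
    memcpy(func->old_code, func->undo, PATCH_INSN_SIZE);
}
```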

> +
> +### Example of trampoline and in-place splicing
> +
snip
> +## Addendum
> +
> +Implementation quirks should not be discussed in a design document.
> +
> +However these observations can provide aid when developing against this
> +document.
> +
> +
> +### Alternative assembler
> +
> +Alternative assembler is a mechanism to use different instructions depending
> +on what the CPU supports. This is done by providing multiple streams of code
> +that can be patched in - or if the CPU does not support it - padded with
> +`nop` operations. The alternative assembler macros cause the compiler to
> +expand the code to place the most generic code in place and emit a special
> +ELF .section header to tag this location. During run-time the hypervisor
> +can leave the areas alone or patch them with better suited opcodes.

Note that if you patch any function that does a copy to or from guest 
memory, alternatives support is _required_ on Broadwell hardware because 
of SMAP (it patches in stac and clac).

This is actually implemented in this patch series.

> +
> +
> +### When to patch
> +
> +During the discussion on the design two candidates bubbled up where
> +the call stack for each CPU would be deterministic. This would
> +minimize the chance of the patch not being applied due to safety
> +checks failing.
> +
> +#### Rendezvous code instead of stop_machine for patching
> +
> +The hypervisor's time rendezvous code runs synchronously across all CPUs
> +every second. Using the stop_machine to patch can stall the time rendezvous
> +code and result in NMI. As such having the patching be done at the tail
> +of rendezvous code should avoid this problem.
> +
> +However the entrance point for that code is
> +do_softirq->timer_softirq_action->time_calibration
> +which ends up calling on_selected_cpus on remote CPUs.
> +
> +The remote CPUs receive CALL_FUNCTION_VECTOR IPI and execute the
> +desired function.
> +
> +#### Before entering the guest code.
> +
> +Before we call VMXResume we check whether any soft IRQs need to be executed.
> +This is a good spot because all Xen stacks are effectively empty at
> +that point.
> +
> +To rendezvous all the CPUs, a barrier with a maximum timeout (which
> +could be adjusted), combined with forcing all other CPUs through the
> +hypervisor with IPIs, can be utilized to bring all the CPUs into lockstep.
> +
> +The approach is similar in concept to stop_machine and the time rendezvous
> +but is time-bound. However the local CPU stack is much shorter and
> +a lot more deterministic.

This is implemented in this patch series.

> +
> +### Compiling the hypervisor code
> +
> +Hotpatch generation often requires support for compiling the target
> +with -ffunction-sections / -fdata-sections.  Changes would have to
> +be done to the linker scripts to support this.

As implemented, the current tool for payload generation requires no 
changes to Xen.
It is also possible to create a payload by compiling a C file containing 
the replacement functions that is completely separate from Xen itself.

> +
> +
> +### Generation of xSplice ELF payloads
> +
> +The design of that is not discussed in this design.
> +
> +The author of this design envisions objdump and objcopy along
> +with special GCC parameters (see above) to create .o.xsplice files
> +which can be used to splice an ELF with the new payload.
> +
> +The ksplice or kpatching code can provide inspiration.

This is already implemented and the tool lives in a separate repo.

> +
> +### Exception tables and symbol tables growth
> +
> +We may need support for adapting or augmenting exception tables if
> +patching such code.  Hotpatches may need to bring their own small
> +exception tables (similar to how Linux modules support this).
> +
> +If supporting hotpatches that introduce additional exception-locations
> +is not important, one could also change the exception table in-place
> +and reorder it afterwards.

Almost every patch to a non-trivial function requires additional entries 
in the exception table and/or the bug frames.

This patch series allows each payload to introduce its own exception 
tables and bug frames to support this.

> +
> +### Security
> +
> +Only the privileged domain should be allowed to do this operation.
> +
> +
> +# v2: Not Yet Done
> +
> +
> +## Goals
> +
> +The v2 design must also have a mechanism for:
> +
> + *  A dependency mechanism for the payloads. To use that information to load:
> +    - The appropriate payload. To verify that payload is built against the
> +      hypervisor. This can be done via the `build-id` (see later sections),
> +      or via providing a copy of the old code - so that the hypervisor can
> +      verify it against the code in memory.
> +    - To construct an appropriate order of payloads to load in case they
> +      depend on each other.
> + * Be able to cope with symbol names in the ELF payload.

I don't understand what this means.

> + * Be able to patch .rodata, .bss, and .data sections.

To clarify, this patch series allows a payload to introduce new .rodata, 
.bss, and .data sections. So if you change a string in a function, the 
payload produced will have a new function which overrides the old one 
and a new .rodata section. The new function will reference the new 
string in the new .rodata section.

What is not implemented in this series is _in-place_ patching or 
updating of these sections. Hook functions (coming in V2) will allow 
custom modification of these sections during patch apply. IMO, 
performing automated in-place patching of these sections (along with 
in-place patching of text sections) is crazy :-)

> + * Further safety checks.

Such as?

> +
> +### Hypervisor ID (build-id)
> +
> +The build-id can help with:
> +
> +  * Prevent loading of wrong hotpatches (intended for other builds)
> +
> +  * Allow to identify suitable hotpatches on disk and help with runtime
> +    tooling (if laid out using build ID)
> +
> +The build-id (aka hypervisor id) can be easily obtained by utilizing
> +the ld --build-id option which (copied from ld):
> +
> +<pre>
> +--build-id
> +    --build-id=style
> +        Request creation of ".note.gnu.build-id" ELF note section.  The contents of the note are unique bits identifying this
> +        linked file.  style can be "uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash on the normative parts of the
> +        output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the output contents, or "0xhexstring" to use a
> +        chosen bit string specified as an even number of hexadecimal digits ("-" and ":" characters between digit pairs are
> +        ignored).  If style is omitted, "sha1" is used.
> +
> +        The "md5" and "sha1" styles produces an identifier that is always the same in an identical output file, but will be
> +        unique among all nonidentical output files.  It is not intended to be compared as a checksum for the file's contents.  A
> +        linked file may be changed later by other tools, but the build ID bit string identifying the original linked file does
> +        not change.
> +
> +        Passing "none" for style disables the setting from any "--build-id" options earlier on the command line.
> +
> +</pre>
> +
> +
> +### xSplice interdependencies
> +
> +xSplice patch interdependencies are tricky.
> +
> +These are the ways this can be addressed:
> + * A single large patch that subsumes and replaces all previous ones.
> +   Over the life-time of patching the hypervisor this large patch
> +   grows to accumulate all the code changes.
> + * Hotpatch stack - where a mechanism exists that loads the hotpatches
> +   in the same order they were built in. We would need a build-id
> +   of the hypervisor to make sure the hot-patches are built against the
> +   correct build.
> + * Payload containing the old code to check against that. That allows
> +   the hotpatches to be loaded independently (if they don't overlap) - or
> +   if the old code also contains previously patched code - even if they
> +   overlap.
> +
> +The disadvantage of the first large patch is that it can grow over
> +time and not provide a bisection mechanism to identify faulty patches.
> +
> +The hot-patch stack puts strict requirements on the order of the patches
> +being loaded and requires a hypervisor build-id to match against.
> +
> +The old code allows much more flexibility and an additional guard,
> +but is more complex to implement.
> +
> +### Symbol names
> +
> +
> +Xen as it is now, has a couple of non-unique symbol names which will
> +make runtime symbol identification hard.  Sometimes, static symbols
> +simply have the same name in C files, sometimes such symbols get
> +included via header files, and some C files are also compiled
> +multiple times and linked under different names (guest_walk.c).
> +
> +As such we need to modify the linker to make sure that the symbol
> +table qualifies also symbols by their source file name.
> +
> +For the awkward situations in which C files are compiled multiple
> +times, we would need some modification in the Xen code.
> +
> +
> +The convention for file-type symbols (that would allow to map many
> +symbols to their compilation unit) says that only the basename (i.e.,
> +without directories) is embedded.  This creates another layer of
> +confusion for duplicate file names in the build tree.

This should all be fixed since Jan's symbol patches so perhaps you can 
drop this section?

> +
> +That would have to be resolved.
> +
> +<pre>
> +> find . -name \*.c -print0 | xargs -0 -n1 basename | sort | uniq -c | sort -n | tail -n10
> +      3 shutdown.c
> +      3 sysctl.c
> +      3 time.c
> +      3 xenoprof.c
> +      4 gdbstub.c
> +      4 irq.c
> +      5 domain.c
> +      5 mm.c
> +      5 pci.c
> +      5 traps.c
> +</pre>
> +
> +### Handle inlined __LINE__
> +
> +
> +This problem is related to hotpatch construction
> +and potentially has influence on the design of the hotpatching
> +infrastructure in Xen.
> +
> +For example:
> +
> +We have file1.c with functions f1 and f2 (in that order).  f2 contains a
> +BUG() (or WARN()) macro and at that point embeds the source line number
> +into the generated code for f2.
> +
> +Now we want to hotpatch f1 and the hotpatch source-code patch adds 2
> +lines to f1 and as a consequence shifts out f2 by two lines.  The newly
> +constructed file1.o will now contain differences in both binary
> +functions f1 (because we actually changed it with the applied patch) and
> +f2 (because the contained BUG macro embeds the new line number).
> +
> +Without additional information, an algorithm comparing file1.o before
> +and after hotpatch application will determine both functions to be
> +changed and will have to include both into the binary hotpatch.
> +
> +Options:
> +
> +1. Transform source code patches for hotpatches to be line-neutral for
> +   each chunk.  This can be done in almost all cases with either
> +   reformatting of the source code or by introducing artificial
> +   preprocessor "#line n" directives to adjust for the introduced
> +   differences.
> +
> +   This approach is low-tech and simple.  Potentially generated
> +   backtraces and existing debug information refers to the original
> +   build and does not reflect hotpatching state except for actually
> +   hotpatched functions but should be mostly correct.
> +
> +2. Ignoring the problem and living with artificially large hotpatches
> +   that unnecessarily patch many functions.
> +
> +   This approach might lead to some very large hotpatches depending on
> +   content of specific source file.  It may also trigger pulling in
> +   functions into the hotpatch that cannot reasonably be hotpatched due
> +   to limitations of a hotpatching framework (init-sections, parts of
> +   the hotpatching framework itself, ...) and may thereby prevent us
> +   from patching a specific problem.
> +
> +   The decision between 1. and 2. can be made on a patch-by-patch
> +   basis.
> +
> +3. Introducing an indirection table for storing line numbers and
> +   treating that specially for binary diffing. Linux may follow
> +   this approach.
> +
> +   We might either use this indirection table for runtime use and patch
> +   that with each hotpatch (similarly to exception tables) or we might
> +   purely use it when building hotpatches to ignore functions that only
> +   differ at exactly the location where a line-number is embedded.

For BUG(), WARN(), etc., the line number is embedded into the bug frame, 
not the function itself. As implemented, the patch creation tool handles 
this just fine. Because the tool works on a function-by-function basis, 
only functions which have actually changed will be included.
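
To illustrate why a shifted line number need not alter a function's code, here is a much-simplified sketch. It is not Xen's actual bug frame layout: the struct fields and the `find_bug_frame` helper are invented for the example. The function body holds only the trapping instruction, while file and line live in a separate table keyed by that instruction's address.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical bug frame: the metadata lives outside the function. */
struct bug_frame {
    uintptr_t addr;       /* address of the trapping instruction */
    const char *file;     /* __FILE__ at the BUG()/WARN() site */
    unsigned int line;    /* __LINE__ at the BUG()/WARN() site */
};

/* Look up the frame for a faulting address; returns NULL if there is none. */
static const struct bug_frame *find_bug_frame(const struct bug_frame *tbl,
                                              size_t n, uintptr_t addr)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( tbl[i].addr == addr )
            return &tbl[i];

    return NULL;
}
```

Under this model, adding two lines to f1 would change only the `line` field of f2's table entry; the instructions of f2 stay byte-for-byte identical.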

> +
> +Similar considerations are true to a lesser extent for __FILE__, but it
> +could be argued that file renaming should be done outside of hotpatches.

The same applies for __FILE__ references in bug frames (e.g. BUG, WARN, 
ASSERT, etc).

> +
> +## Signature checking requirements.
> +
> +The signature checking requires that the layout of the data in memory
> +**MUST** be same for signature to be verified. This means that the payload
> +data layout in ELF format **MUST** match what the hypervisor would be
> +expecting such that it can properly do signature verification.
> +
> +The signature is based on all of the payload contiguously laid out
> +in memory. The signature is to be appended at the end of the ELF payload
> +prefixed with the string '~Module signature appended~\n', followed by
> +a signature header then followed by the signature, key identifier, and signer's
> +name.
> +
> +Specifically the signature header would be:
> +
> +<pre>
> +#define PKEY_ALGO_DSA       0
> +#define PKEY_ALGO_RSA       1
> +
> +#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
> +#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
> +
> +#define HASH_ALGO_MD4          0
> +#define HASH_ALGO_MD5          1
> +#define HASH_ALGO_SHA1         2
> +#define HASH_ALGO_RIPE_MD_160  3
> +#define HASH_ALGO_SHA256       4
> +#define HASH_ALGO_SHA384       5
> +#define HASH_ALGO_SHA512       6
> +#define HASH_ALGO_SHA224       7
> +#define HASH_ALGO_RIPE_MD_128  8
> +#define HASH_ALGO_RIPE_MD_256  9
> +#define HASH_ALGO_RIPE_MD_320 10
> +#define HASH_ALGO_WP_256      11
> +#define HASH_ALGO_WP_384      12
> +#define HASH_ALGO_WP_512      13
> +#define HASH_ALGO_TGR_128     14
> +#define HASH_ALGO_TGR_160     15
> +#define HASH_ALGO_TGR_192     16
> +
> +
> +struct elf_payload_signature {
> +	u8	algo;		/* Public-key crypto algorithm PKEY_ALGO_*. */
> +	u8	hash;		/* Digest algorithm: HASH_ALGO_*. */
> +	u8	id_type;	/* Key identifier type PKEY_ID*. */
> +	u8	signer_len;	/* Length of signer's name */
> +	u8	key_id_len;	/* Length of key identifier */
> +	u8	__pad[3];
> +	__be32	sig_len;	/* Length of signature data */
> +};
> +
> +</pre>
> +(Note that this has been borrowed from Linux module signature code.).
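
A minimal sketch of decoding the header above from raw bytes. It assumes the fields are packed in declaration order with no extra padding (12 bytes in total) and that `sig_len` is big-endian, as the `__be32` type suggests. Rejecting non-zero pad bytes is also an assumption, not something the design states. `parse_sig_header` and `struct sig_info` are names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_HEADER_SIZE 12  /* 5 x u8 + 3 pad bytes + 4-byte big-endian length */

/* Hypothetical host-order view of the decoded header. */
struct sig_info {
    uint8_t algo;        /* PKEY_ALGO_* */
    uint8_t hash;        /* HASH_ALGO_* */
    uint8_t id_type;     /* PKEY_ID_* */
    uint8_t signer_len;  /* length of signer's name */
    uint8_t key_id_len;  /* length of key identifier */
    uint32_t sig_len;    /* length of signature data, host byte order */
};

/* Returns 0 on success, -1 if the buffer is too short or the padding is set. */
static int parse_sig_header(const uint8_t *p, size_t len, struct sig_info *out)
{
    if ( len < SIG_HEADER_SIZE )
        return -1;

    /* Assumption: the three __pad bytes must be zero. */
    if ( p[5] || p[6] || p[7] )
        return -1;

    out->algo = p[0];
    out->hash = p[1];
    out->id_type = p[2];
    out->signer_len = p[3];
    out->key_id_len = p[4];

    /* Assemble the big-endian length byte by byte, independent of host order. */
    out->sig_len = ((uint32_t)p[8] << 24) | ((uint32_t)p[9] << 16) |
                   ((uint32_t)p[10] << 8) | (uint32_t)p[11];

    return 0;
}
```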
> +
> +
> +### .rodata sections
> +
> +The patching might require strings to be updated as well. As such we must be
> +also able to patch the strings as needed. This sounds simple - but the compiler
> +has a habit of coalescing strings that are the same - which means if we in-place
> +alter the strings - other users will be inadvertently affected as well.
> +
> +This is also where pointers to functions live - and we may need to patch this
> +as well. And switch-style jump tables.
> +
> +To guard against that we must be prepared to do patching similar to
> +trampoline patching or in-line depending on the flavour. If we can
> +do in-line patching we would need to:
> +
> + * alter `.rodata` to be writeable.
> + * inline patch.
> + * alter `.rodata` to be read-only.
> +
> +If we are doing trampoline patching we would need to:
> +
> + * allocate a new memory location for the string.
> + * all locations which use this string will have to be updated to use the
> +   offset to the string.
> + * mark the region RO when we are done.
> +
> +### .bss and .data sections.
> +
> +Patching writable data is not suitable as it is unclear what should be done
> +depending on the current state of data. As such it should not be attempted.

Handling of these sections is implemented in this patch series and 
should just work. See what I wrote above about how .rodata, .data, .bss 
is handled.

Hook functions (coming in V2) can be used to fix up writable data. E.g. 
a hook function might iterate over each domain and apply some 
transformation.

> +
> +
> +### Patching code which is in the stack.
> +
> +We should not patch the code which is on the stack. That can lead
> +to corruption.

Due to the way the patching code is done I believe that the only 
functions on any stack will be the xsplice code and domain.c:idle_loop 
(and it shouldn't matter if a trampoline is inserted at the beginning of 
idle_loop anyway). Perhaps this section could be merged with the "When 
to patch" section above?

> +
> +### Inline patching
> +
> +The hypervisor should verify that the in-place patching would fit within
> +the code or data.
> +
> +### Trampoline (e9 opcode)
> +
> +The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
> +we are limited to up to 2GB of virtual address to place the new code
> +from the old code. That should not be a problem since Xen hypervisor has
> +a very small footprint.
> +
> +However if we need - we can always add two trampolines. One at the 2GB
> +limit that calls the next trampoline.
> +
> +Please note there is a small limitation for trampolines in
> +function entries: The target function (+ trailing padding) must be able
> +to accommodate the trampoline. On x86 with +-2 GB relative jumps,
> +this means 5 bytes are required.
> +
> +Depending on compiler settings, there are several functions in Xen that
> +are smaller (without inter-function padding).
> +
> +<pre>
> +readelf -sW xen-syms | grep " FUNC " | \
> +    awk '{ if ($3 < 5) print $3, $4, $5, $8 }'
> +
> +...
> +3 FUNC LOCAL wbinvd_ipi
> +3 FUNC LOCAL shadow_l1_index
> +...
> +</pre>
> +A compile-time check for, e.g., a minimum alignment of functions or a
> +runtime check that verifies symbol size (+ padding to next symbols) for
> +that in the hypervisor is advised.
> +

The tool for generating payloads currently does perform a compile-time 
check to ensure the function to be replaced is large enough.
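
For illustration, the size and range constraints on the trampoline can be sketched as below. This is a minimal sketch under stated assumptions, not the code from this series or from the payload tool: `make_trampoline` and `TRAMPOLINE_SIZE` are names made up for the example, and addresses are passed as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

#define TRAMPOLINE_SIZE 5  /* 0xe9 opcode + 32-bit signed displacement */

/*
 * Hypothetical helper: encode "jmp rel32" at old_addr targeting new_addr.
 * Returns 0 on success, -1 if the old function cannot hold the trampoline
 * or the target is outside the +-2GB range of a rel32 jump.
 */
static int make_trampoline(uint8_t buf[TRAMPOLINE_SIZE],
                           uint64_t old_addr, size_t old_size,
                           uint64_t new_addr)
{
    int64_t disp;
    uint32_t rel;

    if ( old_size < TRAMPOLINE_SIZE )
        return -1;

    /* The displacement is relative to the end of the jmp instruction. */
    disp = (int64_t)(new_addr - (old_addr + TRAMPOLINE_SIZE));
    if ( disp > INT32_MAX || disp < INT32_MIN )
        return -1;

    rel = (uint32_t)(int32_t)disp;

    buf[0] = 0xe9;
    /* x86 stores the displacement little-endian. */
    buf[1] = rel & 0xff;
    buf[2] = (rel >> 8) & 0xff;
    buf[3] = (rel >> 16) & 0xff;
    buf[4] = (rel >> 24) & 0xff;

    return 0;
}
```

A 3-byte function such as wbinvd_ipi from the listing above would be rejected by the size check unless trailing padding is counted towards `old_size`.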

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
  2015-11-04 21:17   ` Konrad Rzeszutek Wilk
@ 2015-11-12 16:28   ` Jan Beulich
  2015-11-13 14:13     ` Konrad Rzeszutek Wilk
  2015-11-13 23:50   ` Daniel De Graaf
  2 siblings, 1 reply; 40+ messages in thread
From: Jan Beulich @ 2015-11-12 16:28 UTC (permalink / raw)
  To: Ross Lagerwall
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson,
	xen-devel, Daniel De Graaf

>>> On 03.11.15 at 19:15, <ross.lagerwall@citrix.com> wrote:
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
> v2: Rebased on keyhandler: rework keyhandler infrastructure
> v3: Fixed XSM.
> v4: Removed REVERTED state.
>     Split status and error code.
>     Add REPLACE action.
>     Separate payload data from the payload structure.
>     s/XSPLICE_ID_../XSPLICE_NAME_../

Odd - the subject says v1.

> --- /dev/null
> +++ b/xen/common/xsplice.c
> @@ -0,0 +1,398 @@
> +/*
> + * Copyright (c) 2015 Oracle and/or its affiliates. All rights reserved.
> + *
> + */
> +
> +#include <xen/smp.h>
> +#include <xen/keyhandler.h>
> +#include <xen/spinlock.h>
> +#include <xen/mm.h>
> +#include <xen/list.h>
> +#include <xen/guest_access.h>
> +#include <xen/stdbool.h>

No use of this header except in code shared with the tools.

Also I'd like to encourage you to sort all xen/ headers in a file
(and all public/ and asm/ ones, just that this doesn't apply here)
alphabetically in new code.

> +}
> +
> +
> +static int verify_payload(xen_sysctl_xsplice_upload_t *upload)

Double blank line above.

> +/*
> + * We MUST be holding the spinlock.
> + */

Which spinlock? Also this is a single line comment.

> +static void __free_payload(struct payload *data)

I see no reason for this function to have two underscores at the
beginning of its name.


> +err_raw:
> +    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
> +err_data:

Labels indented by at least one blank please.

> +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> +{
> +    xen_xsplice_status_t status;
> +    struct payload *data;
> +    unsigned int idx = 0, i = 0;
> +    int rc = 0;
> +    unsigned int ver = payload_version;
> +
> +    if ( list->nr > 1024 )
> +        return -E2BIG;
> +
> +    if ( list->pad != 0 )
> +        return -EINVAL;
> +
> +    if ( guest_handle_is_null(list->status) ||
> +         guest_handle_is_null(list->id) ||
> +         guest_handle_is_null(list->len) )
> +        return -EINVAL;

???

> +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> +         !guest_handle_okay(list->id, XEN_XSPLICE_NAME_SIZE * list->nr) ||
> +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
> +        return -EINVAL;
> +
> +    spin_lock(&payload_list_lock);
> +    if ( list->idx > payload_cnt )
> +    {
> +        spin_unlock(&payload_list_lock);
> +        return -EINVAL;
> +    }
> +
> +    list_for_each_entry( data, &payload_list, list )
> +    {
> +        uint32_t len;
> +
> +        if ( list->idx > i++ )
> +            continue;
> +
> +        status.state = data->state;
> +        status.rc = data->rc;
> +        len = strlen(data->id);
> +
> +        /* N.B. 'idx' != 'i'. */
> +        if ( copy_to_guest_offset(list->id, idx * XEN_XSPLICE_NAME_SIZE,
> +                                  data->id, len) ||
> +             copy_to_guest_offset(list->len, idx, &len, 1) ||
> +             copy_to_guest_offset(list->status, idx, &status, 1) )

You having used guest_handle_okay() above, all of these can be
__copy_to_guest_offset() afaict.

> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +        idx ++;

Spurious blank.

> +        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
> +        {
> +            break;
> +        }

Pointless braces.

Also - what about an input list->nr of zero?
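
To make the concern concrete: in the loop as posted, the bound is only tested after an entry has been copied, so with nr == 0 one entry would still be written. A minimal stand-alone sketch of a loop shape that tests the bound first follows. `list_window` and its plain-array arguments are invented for the example and stand in for the payload list and guest handles.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the listing loop: copy at most nr entries,
 * starting at entry 'start', from src[0..total) into dst.
 * Returns the number of entries copied; *remaining gets the count left over.
 */
static unsigned int list_window(const int *src, unsigned int total,
                                unsigned int start, unsigned int nr,
                                int *dst, unsigned int *remaining)
{
    unsigned int i, idx = 0;

    /* Test the bound before copying, so nr == 0 copies nothing. */
    for ( i = start; i < total && idx < nr; i++ )
        dst[idx++] = src[i];

    *remaining = i < total ? total - i : 0;

    return idx;
}
```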

> +    }
> +    list->nr = payload_cnt - i; /* Remaining amount. */
> +    spin_unlock(&payload_list_lock);
> +    list->version = ver;
> +
> +    /* And how many we have processed. */
> +    return rc ? rc : idx;

Please omit the middle expression in cases like this. To be honest I
can't help myself thinking that I've already made at least some of
these comments.
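
For reference, the form meant here is GNU C's conditional with an omitted middle operand (an extension accepted by GCC and Clang, not ISO C). A tiny sketch, with `processed_or_error` as a made-up name:

```c
/* Return rc if it is non-zero (an error), otherwise the count processed. */
static int processed_or_error(int rc, unsigned int idx)
{
    /* "a ?: b" yields a when a is non-zero, evaluating a only once. */
    return rc ?: (int)idx;
}
```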

> --- a/xen/include/public/sysctl.h
> +++ b/xen/include/public/sysctl.h
> @@ -766,6 +766,161 @@ struct xen_sysctl_tmem_op {
>  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
>  
> +/*
> + * XEN_SYSCTL_XSPLICE_op
> + *
> + * Refer to the docs/misc/xsplice.markdown for the design details
> + * of this hypercall.

To someone importing this header into another project, this
reference may be quite odd. Don't these get translated to
some html with a canonical place on the web?

> +struct xen_sysctl_xsplice_action {
> +    xen_xsplice_id_t id;                    /* IN, name of the patch. */
> +#define XSPLICE_ACTION_CHECK        1
> +#define XSPLICE_ACTION_UNLOAD       2
> +#define XSPLICE_ACTION_REVERT       3
> +#define XSPLICE_ACTION_APPLY        4
> +#define XSPLICE_ACTION_REPLACE      5
> +    uint32_t    cmd;                        /* IN: XSPLICE_ACTION_*. */
> +    uint32_t    pad;                        /* IN: MUST be zero. */
> +    uint64_aligned_t time;                  /* IN: Zero if no timeout. */
> +                                            /* Or upper bound of time (ms) */
> +                                            /* for operation to take. */

So if the field represents a timeout, why not call it such?

> +struct xen_sysctl_xsplice_op {
> +    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_* */
> +    uint32_t pad;                           /* IN: Always zero. */

Other "pad" fields get checked to be zero, but I wasn't able to
spot a check for this one.

> --- /dev/null
> +++ b/xen/include/xen/xsplice.h
> @@ -0,0 +1,9 @@
> +#ifndef __XEN_XSPLICE_H__
> +#define __XEN_XSPLICE_H__
> +
> +struct xen_sysctl_xsplice_op;
> +int xsplice_control(struct xen_sysctl_xsplice_op *);
> +
> +extern void xsplice_printall(unsigned char key);

What is this declaration good for? The function ought to be static
afaics.

Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-11-12 16:28   ` Jan Beulich
@ 2015-11-13 14:13     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 40+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-11-13 14:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Ian Campbell, Stefano Stabellini, Ian Jackson,
	xen-devel, Ross Lagerwall, Daniel De Graaf

On Thu, Nov 12, 2015 at 09:28:36AM -0700, Jan Beulich wrote:
> >>> On 03.11.15 at 19:15, <ross.lagerwall@citrix.com> wrote:
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> > ---
> > v2: Rebased on keyhandler: rework keyhandler infrastructure
> > v3: Fixed XSM.
> > v4: Removed REVERTED state.
> >     Split status and error code.
> >     Add REPLACE action.
> >     Separate payload data from the payload structure.
> >     s/XSPLICE_ID_../XSPLICE_NAME_../
> 
> Odd - the subject says v1.
> 
> > --- /dev/null
> > +++ b/xen/common/xsplice.c
> > @@ -0,0 +1,398 @@
> > +/*
> > + * Copyright (c) 2015 Oracle and/or its affiliates. All rights reserved.
> > + *
> > + */
> > +
> > +#include <xen/smp.h>
> > +#include <xen/keyhandler.h>
> > +#include <xen/spinlock.h>
> > +#include <xen/mm.h>
> > +#include <xen/list.h>
> > +#include <xen/guest_access.h>
> > +#include <xen/stdbool.h>
> 
> No use of this header except in code shared with the tools.
> 
> Also I'd like to encourage you to sort all xen/ headers in a file
> (and all public/ and asm/ ones, just that this doesn't apply here)
> alphabetically in new code.

I tried sorting it. I couldn't build the file. And then I put
it on the 'TODO yak-shaving pile' and hadn't yet touched it.

There has to be some automatic tool to help with figuring the
header dependencies for all the headers, not just the ones
I am using.
Not exactly sure how to make all of the head
> 
> > +}
> > +
> > +
> > +static int verify_payload(xen_sysctl_xsplice_upload_t *upload)
> 
> Double blank line above.
> 
> > +/*
> > + * We MUST be holding the spinlock.
> > + */
> 
> Which spinlock? Also this is a single line comment.
> 
> > +static void __free_payload(struct payload *data)
> 
> I see no reason for this function to have two underscores at the
> beginning of its name.
> 
> 
> > +err_raw:
> > +    free_xenheap_pages(raw_data, get_order_from_bytes(upload->size));
> > +err_data:
> 
> Labels indented by at least one blank please.
> 
> > +static int xsplice_list(xen_sysctl_xsplice_list_t *list)
> > +{
> > +    xen_xsplice_status_t status;
> > +    struct payload *data;
> > +    unsigned int idx = 0, i = 0;
> > +    int rc = 0;
> > +    unsigned int ver = payload_version;
> > +
> > +    if ( list->nr > 1024 )
> > +        return -E2BIG;
> > +
> > +    if ( list->pad != 0 )
> > +        return -EINVAL;
> > +
> > +    if ( guest_handle_is_null(list->status) ||
> > +         guest_handle_is_null(list->id) ||
> > +         guest_handle_is_null(list->len) )
> > +        return -EINVAL;
> 
> ???
> 
> > +    if ( !guest_handle_okay(list->status, sizeof(status) * list->nr) ||
> > +         !guest_handle_okay(list->id, XEN_XSPLICE_NAME_SIZE * list->nr) ||
> > +         !guest_handle_okay(list->len, sizeof(uint32_t) * list->nr) )
> > +        return -EINVAL;
> > +
> > +    spin_lock(&payload_list_lock);
> > +    if ( list->idx > payload_cnt )
> > +    {
> > +        spin_unlock(&payload_list_lock);
> > +        return -EINVAL;
> > +    }
> > +
> > +    list_for_each_entry( data, &payload_list, list )
> > +    {
> > +        uint32_t len;
> > +
> > +        if ( list->idx > i++ )
> > +            continue;
> > +
> > +        status.state = data->state;
> > +        status.rc = data->rc;
> > +        len = strlen(data->id);
> > +
> > +        /* N.B. 'idx' != 'i'. */
> > +        if ( copy_to_guest_offset(list->id, idx * XEN_XSPLICE_NAME_SIZE,
> > +                                  data->id, len) ||
> > +             copy_to_guest_offset(list->len, idx, &len, 1) ||
> > +             copy_to_guest_offset(list->status, idx, &status, 1) )
> 
> You having used guest_handle_okay() above, all of these can be
> __copy_to_guest_offset() afaict.
> 
> > +        {
> > +            rc = -EFAULT;
> > +            break;
> > +        }
> > +        idx ++;
> 
> Spurious blank.
> 
> > +        if ( hypercall_preempt_check() || (idx + 1 > list->nr) )
> > +        {
> > +            break;
> > +        }
> 
> Pointless braces.
> 
> Also - what about an input list->nr of zero?

Duh! Right.
> 
> > +    }
> > +    list->nr = payload_cnt - i; /* Remaining amount. */
> > +    spin_unlock(&payload_list_lock);
> > +    list->version = ver;
> > +
> > +    /* And how many we have processed. */
> > +    return rc ? rc : idx;
> 
> Please omit the middle expression in cases like this. To be honest I
> can't help myself thinking that I've already made at least some of
> these comments.

You did. I hadn't had a chance to address them. Sorry for wasting
your time by not having addressed them in the code yet.

> 
> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -766,6 +766,161 @@ struct xen_sysctl_tmem_op {
> >  typedef struct xen_sysctl_tmem_op xen_sysctl_tmem_op_t;
> >  DEFINE_XEN_GUEST_HANDLE(xen_sysctl_tmem_op_t);
> >  
> > +/*
> > + * XEN_SYSCTL_XSPLICE_op
> > + *
> > + * Refer to the docs/misc/xsplice.markdown for the design details
> > + * of this hypercall.
> 
> To someone importing this header into another project, this
> reference may be quite odd. Don't these get translated to
> some html with a canonical place on the web?
> 
> > +struct xen_sysctl_xsplice_action {
> > +    xen_xsplice_id_t id;                    /* IN, name of the patch. */
> > +#define XSPLICE_ACTION_CHECK        1
> > +#define XSPLICE_ACTION_UNLOAD       2
> > +#define XSPLICE_ACTION_REVERT       3
> > +#define XSPLICE_ACTION_APPLY        4
> > +#define XSPLICE_ACTION_REPLACE      5
> > +    uint32_t    cmd;                        /* IN: XSPLICE_ACTION_*. */
> > +    uint32_t    pad;                        /* IN: MUST be zero. */
> > +    uint64_aligned_t time;                  /* IN: Zero if no timeout. */
> > +                                            /* Or upper bound of time (ms) */
> > +                                            /* for operation to take. */
> 
> So if the field represents a timeout, why not call it such?

<brainfart>
> 
> > +struct xen_sysctl_xsplice_op {
> > +    uint32_t cmd;                           /* IN: XEN_SYSCTL_XSPLICE_* */
> > +    uint32_t pad;                           /* IN: Always zero. */
> 
> Other "pad" fields get checked to be zero, but I wasn't able to
> spot a check for this one.

<nods> Thanks for spotting that!
> 
> > --- /dev/null
> > +++ b/xen/include/xen/xsplice.h
> > @@ -0,0 +1,9 @@
> > +#ifndef __XEN_XSPLICE_H__
> > +#define __XEN_XSPLICE_H__
> > +
> > +struct xen_sysctl_xsplice_op;
> > +int xsplice_control(struct xen_sysctl_xsplice_op *);
> > +
> > +extern void xsplice_printall(unsigned char key);
> 
> What is this declaration good for? The function ought to be static
> afaics.

Yes, good spotting. The code was written before Andrew's big
keyhandler rewrite and after the rebase I forgot about this.

> 
> Jan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
  2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
  2015-11-04 21:17   ` Konrad Rzeszutek Wilk
  2015-11-12 16:28   ` Jan Beulich
@ 2015-11-13 23:50   ` Daniel De Graaf
  2 siblings, 0 replies; 40+ messages in thread
From: Daniel De Graaf @ 2015-11-13 23:50 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel
  Cc: Wei Liu, Stefano Stabellini, Ian Jackson, Ian Campbell

On 03/11/15 13:15, Ross Lagerwall wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> The implementation does not actually do any patching.
>
> It just adds the framework for doing the hypercalls,
> keeping track of ELF payloads, and the basic operations:
>   - query which payloads exist,
>   - query for specific payloads,
>   - check*1, apply*1, replace*1, and unload payloads.
>
> *1: Which of course in this patch are nops.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>


* Re: [PATCH v1 01/11] xsplice: Design document (v2).
  2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
                   ` (11 preceding siblings ...)
  2015-11-10  9:55 ` Ross Lagerwall
@ 2015-11-27 12:48 ` Martin Pohlack
  12 siblings, 0 replies; 40+ messages in thread
From: Martin Pohlack @ 2015-11-27 12:48 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel
  Cc: Tim Deegan, Ian Jackson, Ian Campbell, Jan Beulich

On 03.11.2015 19:15, Ross Lagerwall wrote:
[...]
> +struct xen_sysctl_xsplice_summary {  
> +    xen_xsplice_id_t    id;         /* IN, the name of the payload. */  

I still feel a bit confused about the ID vs. name thingy.  IMHO, each
payload should have a name (easily readable by a human, like
"xsa-148-xpatch") and an ID (to safely address it, like
sha1(payload)).

Here the variable is called id but described as name.

The definition of id contains a member called name again:

/*
 * Structure describing an ELF payload. Uniquely identifies the
 * payload. Should be human readable.
 * Recommended length is XEN_XSPLICE_NAME_SIZE.
 */
#define XEN_XSPLICE_NAME_SIZE 128
struct xen_xsplice_id {
    XEN_GUEST_HANDLE_64(char) name;         /* IN: pointer to name. */
    uint32_t    size;                       /* IN: size of name. May be up to
                                               XEN_XSPLICE_NAME_SIZE. */
    uint32_t    pad;                        /* IN: MUST be zero. */
};

If this thing is supposed to carry something like a build_id, we should
call it such throughout.  If it is supposed to carry a human-readable
name, let's call it name throughout.

> sgrep xen_xsplice_id_t
./tools/include/xen/sysctl.h:806:    xen_xsplice_id_t id;                    /* IN, name of the patch. */
./tools/include/xen/sysctl.h:838:    xen_xsplice_id_t id;                  /* IN, name of the payload. */
./tools/include/xen/sysctl.h:899:    xen_xsplice_id_t id;                    /* IN, name of the patch. */
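
To make the distinction concrete, here is a minimal sketch of what
"name plus ID" could look like. It is standalone C and purely
illustrative: struct payload_ident, XSPLICE_BUILD_ID_SIZE and
payload_ident_same() are hypothetical and not part of the posted
series. The 20-byte ID assumes a SHA-1 digest as suggested above.

```c
#include <stdint.h>
#include <string.h>

#define XSPLICE_NAME_SIZE     128  /* mirrors XEN_XSPLICE_NAME_SIZE */
#define XSPLICE_BUILD_ID_SIZE 20   /* assumption: SHA-1 digest length */

/* Hypothetical identity record: one field for humans, one for lookups. */
struct payload_ident {
    char    name[XSPLICE_NAME_SIZE + 1];      /* e.g. "xsa-148-xpatch" */
    uint8_t build_id[XSPLICE_BUILD_ID_SIZE];  /* e.g. sha1(payload) */
};

/* Two payloads are the same only if their IDs match; names may collide. */
static int payload_ident_same(const struct payload_ident *a,
                              const struct payload_ident *b)
{
    return memcmp(a->build_id, b->build_id, XSPLICE_BUILD_ID_SIZE) == 0;
}
```

With such a split, the comments in sysctl.h could say "ID" where the
field addresses a payload and "name" where it is only displayed.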


Martin
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


* Re: [PATCH v1 08/11] xsplice: Implement support for applying patches
  2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
  2015-11-05  3:17   ` Konrad Rzeszutek Wilk
  2015-11-05  3:19   ` Konrad Rzeszutek Wilk
@ 2015-11-27 13:51   ` Martin Pohlack
  2 siblings, 0 replies; 40+ messages in thread
From: Martin Pohlack @ 2015-11-27 13:51 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel
  Cc: Kevin Tian, Ian Campbell, Jun Nakajima, Andrew Cooper,
	Stefano Stabellini, Jan Beulich, Aravind Gopalakrishnan,
	Boris Ostrovsky, Suravee Suthikulpanit

On 03.11.2015 19:16, Ross Lagerwall wrote:
> Implement support for the apply, revert and replace actions.
> 
> To perform an action on a payload, the hypercall sets up a data
> structure to schedule the work.  A hook is added in all the
> return-to-guest paths to check for work to do and execute it if needed.
> In this way, patches can be applied with all CPUs idle and without
> stacks.  The first CPU to do_xsplice() becomes the master and triggers a
> reschedule softirq to trigger all the other CPUs to enter do_xsplice()
> with no stack.  Once all CPUs have rendezvoused, all CPUs disable IRQs
> and NMIs are ignored. The system is then quiescent and the master
> performs the action.  After this, all CPUs enable IRQs and NMIs are
> re-enabled.
> 
> The action to perform is one of:
> - APPLY: For each function in the module, store the first 5 bytes of the
>   old function and replace it with a jump to the new function.
> - REVERT: Copy the previously stored bytes into the first 5 bytes of the
>   old function.
> - REPLACE: Revert each applied module and then apply the new module.
> 
> To prevent a deadlock with any other barrier in the system, the master
> will wait for up to 30ms before timing out.  I've taken some
> measurements and found the patch application to take about 100 μs on a
> 72 CPU system, whether idle or fully loaded.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  xen/arch/arm/xsplice.c      |   8 ++
>  xen/arch/x86/domain.c       |   4 +
>  xen/arch/x86/hvm/svm/svm.c  |   2 +
>  xen/arch/x86/hvm/vmx/vmcs.c |   2 +
>  xen/arch/x86/xsplice.c      |  19 ++++
>  xen/common/xsplice.c        | 264 ++++++++++++++++++++++++++++++++++++++++++--
>  xen/include/asm-arm/nmi.h   |  13 +++
>  xen/include/xen/xsplice.h   |   7 +-
>  8 files changed, 306 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/arm/xsplice.c b/xen/arch/arm/xsplice.c
> index 8d85fa9..3c34eb3 100644
> --- a/xen/arch/arm/xsplice.c
> +++ b/xen/arch/arm/xsplice.c
> @@ -3,6 +3,14 @@
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  
> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> +}
> +
> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
> +{
> +}
> +
>  int xsplice_verify_elf(uint8_t *data, ssize_t len)
>  {
>      return -ENOSYS;
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fe3be30..4420cfc 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -36,6 +36,7 @@
>  #include <xen/cpu.h>
>  #include <xen/wait.h>
>  #include <xen/guest_access.h>
> +#include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  #include <asm/regs.h>
>  #include <asm/mc146818rtc.h>
> @@ -120,6 +121,7 @@ static void idle_loop(void)
>          (*pm_idle)();
>          do_tasklet();
>          do_softirq();
> +        do_xsplice();

Is there a requirement that this runs last?  If so, it deserves a
comment: /* must be last */

>      }
>  }
>  
> @@ -136,6 +138,7 @@ void startup_cpu_idle_loop(void)
>  
>  static void noreturn continue_idle_domain(struct vcpu *v)
>  {
> +    do_xsplice();
>      reset_stack_and_jump(idle_loop);
>  }
>  
> @@ -143,6 +146,7 @@ static void noreturn continue_nonidle_domain(struct vcpu *v)
>  {
>      check_wakeup_from_wait();
>      mark_regs_dirty(guest_cpu_user_regs());
> +    do_xsplice();
>      reset_stack_and_jump(ret_from_intr);
>  }
>  
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 8de41fa..65bf7e9 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -26,6 +26,7 @@
>  #include <xen/hypercall.h>
>  #include <xen/domain_page.h>
>  #include <xen/xenoprof.h>
> +#include <xen/xsplice.h>
>  #include <asm/current.h>
>  #include <asm/io.h>
>  #include <asm/paging.h>
> @@ -1071,6 +1072,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
>  
>      hvm_do_resume(v);
>  
> +    do_xsplice();
>      reset_stack_and_jump(svm_asm_do_resume);
>  }
>  
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 4ea1ad1..d996f47 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -25,6 +25,7 @@
>  #include <xen/kernel.h>
>  #include <xen/keyhandler.h>
>  #include <xen/vm_event.h>
> +#include <xen/xsplice.h>
>  #include <asm/current.h>
>  #include <asm/cpufeature.h>
>  #include <asm/processor.h>
> @@ -1685,6 +1686,7 @@ void vmx_do_resume(struct vcpu *v)
>      }
>  
>      hvm_do_resume(v);
> +    do_xsplice();
>      reset_stack_and_jump(vmx_asm_do_vmentry);
>  }
>  
> diff --git a/xen/arch/x86/xsplice.c b/xen/arch/x86/xsplice.c
> index dbff0d5..31e4124 100644
> --- a/xen/arch/x86/xsplice.c
> +++ b/xen/arch/x86/xsplice.c
> @@ -3,6 +3,25 @@
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  
> +#define PATCH_INSN_SIZE 5
> +
> +void xsplice_apply_jmp(struct xsplice_patch_func *func)
> +{
> +    uint32_t val;
> +    uint8_t *old_ptr;
> +
> +    old_ptr = (uint8_t *)func->old_addr;
> +    memcpy(func->undo, old_ptr, PATCH_INSN_SIZE);
> +    *old_ptr++ = 0xe9; /* Relative jump */
> +    val = func->new_addr - func->old_addr - PATCH_INSN_SIZE;
> +    memcpy(old_ptr, &val, sizeof val);
> +}
> +
> +void xsplice_revert_jmp(struct xsplice_patch_func *func)
> +{
> +    memcpy((void *)func->old_addr, func->undo, PATCH_INSN_SIZE);
> +}
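
The displacement arithmetic in xsplice_apply_jmp() can be illustrated
with a standalone sketch (plain C, not part of the patch). The helper
encode_jmp() is a hypothetical name. Unlike the quoted code, it refuses
targets that do not fit in a signed 32-bit displacement, which is an
added assumption about what one would want here, and it writes the
bytes in little-endian order explicitly.

```c
#include <stdint.h>

#define PATCH_INSN_SIZE 5   /* 0xe9 opcode + 32-bit displacement */

/*
 * Build "jmp rel32" from old_addr to new_addr into insn[].
 * The displacement is relative to the end of the 5-byte instruction.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_jmp(uint8_t insn[PATCH_INSN_SIZE],
                      uint64_t old_addr, uint64_t new_addr)
{
    /* Unsigned wrap-around is well defined; interpret as two's complement. */
    uint64_t diff = new_addr - (old_addr + PATCH_INSN_SIZE);
    uint32_t rel;

    /* Valid values are [0, INT32_MAX] or the sign-extended negatives. */
    if ( diff > (uint64_t)INT32_MAX && diff < (uint64_t)INT32_MIN )
        return -1;

    rel = (uint32_t)diff;
    insn[0] = 0xe9;                       /* Relative jump */
    insn[1] = (uint8_t)(rel & 0xff);
    insn[2] = (uint8_t)((rel >> 8) & 0xff);
    insn[3] = (uint8_t)((rel >> 16) & 0xff);
    insn[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}
```

For example, old_addr 0x1000 and new_addr 0x2000 give a displacement of
0x2000 - 0x1005 = 0xffb.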
> +
>  int xsplice_verify_elf(uint8_t *data, ssize_t len)
>  {
>  
> diff --git a/xen/common/xsplice.c b/xen/common/xsplice.c
> index 5e88c55..4476be5 100644
> --- a/xen/common/xsplice.c
> +++ b/xen/common/xsplice.c
> @@ -11,16 +11,21 @@
>  #include <xen/guest_access.h>
>  #include <xen/stdbool.h>
>  #include <xen/sched.h>
> +#include <xen/softirq.h>
>  #include <xen/lib.h>
> +#include <xen/wait.h>
>  #include <xen/xsplice_elf.h>
>  #include <xen/xsplice.h>
>  #include <public/sysctl.h>
>  
>  #include <asm/event.h>
> +#include <asm/nmi.h>
>  
>  static DEFINE_SPINLOCK(payload_list_lock);
>  static LIST_HEAD(payload_list);
>  
> +static LIST_HEAD(applied_list);
> +
>  static unsigned int payload_cnt;
>  static unsigned int payload_version = 1;
>  
> @@ -29,15 +34,34 @@ struct payload {
>      int32_t rc;         /* 0 or -EXX. */
>  
>      struct list_head   list;   /* Linked to 'payload_list'. */
> +    struct list_head   applied_list;   /* Linked to 'applied_list'. */
>  
> +    struct xsplice_patch_func *funcs;
> +    int nfuncs;
>      void *module_address;
>      size_t module_pages;
>  
>      char  id[XEN_XSPLICE_NAME_SIZE + 1];          /* Name of it. */
>  };
>  
> +/* Defines an outstanding patching action. */
> +struct xsplice_work
> +{
> +    atomic_t semaphore;          /* Used for rendezvous */
> +    atomic_t irq_semaphore;      /* Used to signal all IRQs disabled */
> +    struct payload *data;        /* The payload on which to act */
> +    volatile bool_t do_work;     /* Signals work to do */
> +    volatile bool_t ready;       /* Signals all CPUs synchronized */
> +    uint32_t cmd;                /* Action request. XSPLICE_ACTION_* */
> +};
> +
> +static DEFINE_SPINLOCK(xsplice_work_lock);
> +/* There can be only one outstanding patching action. */
> +static struct xsplice_work xsplice_work;
> +
>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len);
>  static void free_module(struct payload *payload);
> +static int schedule_work(struct payload *data, uint32_t cmd);
>  
>  static const char *state2str(int32_t state)
>  {
> @@ -341,28 +365,22 @@ static int xsplice_action(xen_sysctl_xsplice_action_t *action)
>      case XSPLICE_ACTION_REVERT:
>          if ( data->state == XSPLICE_STATE_APPLIED )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_CHECKED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      case XSPLICE_ACTION_APPLY:
>          if ( (data->state == XSPLICE_STATE_CHECKED) )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_APPLIED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      case XSPLICE_ACTION_REPLACE:
>          if ( data->state == XSPLICE_STATE_CHECKED )
>          {
> -            /* No implementation yet. */
> -            data->state = XSPLICE_STATE_CHECKED;
> -            data->rc = 0;
> -            rc = 0;
> +            data->rc = -EAGAIN;
> +            rc = schedule_work(data, action->cmd);
>          }
>          break;
>      default:
> @@ -637,6 +655,24 @@ static int perform_relocs(struct xsplice_elf *elf)
>      return 0;
>  }
>  
> +static int find_special_sections(struct payload *payload,
> +                                 struct xsplice_elf *elf)
> +{
> +    struct xsplice_elf_sec *sec;
> +
> +    sec = xsplice_elf_sec_by_name(elf, ".xsplice.funcs");
> +    if ( !sec )
> +    {
> +        printk(XENLOG_ERR ".xsplice.funcs is missing\n");
> +        return -1;
> +    }
> +
> +    payload->funcs = (struct xsplice_patch_func *)sec->load_addr;
> +    payload->nfuncs = sec->sec->sh_size / (sizeof *payload->funcs);
> +
> +    return 0;
> +}
> +
>  static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>  {
>      struct xsplice_elf elf;
> @@ -662,6 +698,10 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>      if ( rc )
>          goto err_module;
>  
> +    rc = find_special_sections(payload, &elf);
> +    if ( rc )
> +        goto err_module;
> +
>      return 0;
>  
>   err_module:
> @@ -672,6 +712,206 @@ static int load_module(struct payload *payload, uint8_t *raw, ssize_t len)
>      return rc;
>  }
>  
> +
> +/*
> + * The following functions get the CPUs into an appropriate state and
> + * apply (or revert) each of the module's functions.
> + */
> +
> +/*
> + * This function is executed having all other CPUs with no stack and IRQs
> + * disabled.

How would we guard against NMIs?

> + */
> +static int apply_payload(struct payload *data)
> +{
> +    int i;
> +
> +    printk(XENLOG_DEBUG "Applying payload: %s\n", data->id);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_apply_jmp(data->funcs + i);
> +
> +    list_add_tail(&data->applied_list, &applied_list);
> +
> +    return 0;
> +}
> +
> +/*
> + * This function is executed having all other CPUs with no stack and IRQs
> + * disabled.
> + */
> +static int revert_payload(struct payload *data)
> +{
> +    int i;
> +
> +    printk(XENLOG_DEBUG "Reverting payload: %s\n", data->id);
> +
> +    for ( i = 0; i < data->nfuncs; i++ )
> +        xsplice_revert_jmp(data->funcs + i);
> +
> +    list_del(&data->applied_list);
> +
> +    return 0;
> +}
> +
> +/* Must be holding the payload_list lock */
> +static int schedule_work(struct payload *data, uint32_t cmd)
> +{
> +    /* Fail if an operation is already scheduled */
> +    if ( xsplice_work.do_work )
> +        return -EAGAIN;
> +
> +    xsplice_work.cmd = cmd;
> +    xsplice_work.data = data;
> +    atomic_set(&xsplice_work.semaphore, 0);
> +    atomic_set(&xsplice_work.irq_semaphore, 0);
> +    xsplice_work.ready = false;
> +    smp_mb();
> +    xsplice_work.do_work = true;
> +    smp_mb();
> +
> +    return 0;
> +}
> +
> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> +{
> +    return 1;

Nice approach.  But it means that:
* At least do_nmi will not be safely patchable.  Any idea how we can
  come up with a list of similarly non-patchable functions?
* We will lose NMIs hitting us here.

> +}
> +
> +static void reschedule_fn(void *unused)
> +{
> +    smp_mb(); /* Synchronize with setting do_work */
> +    raise_softirq(SCHEDULE_SOFTIRQ);
> +}
> +
> +/*
> + * The main function which manages the work of quiescing the system and
> + * patching code.
> + */
> +void do_xsplice(void)
> +{
> +    int id;
> +    unsigned int total_cpus;
> +    nmi_callback_t saved_nmi_callback;
> +
> +    /* Fast path: no work to do */
> +    if ( likely(!xsplice_work.do_work) )
> +        return;
> +
> +    ASSERT(local_irq_is_enabled());
> +
> +    spin_lock(&xsplice_work_lock);
> +    id = atomic_read(&xsplice_work.semaphore);
> +    atomic_inc(&xsplice_work.semaphore);
> +    spin_unlock(&xsplice_work_lock);
> +

I have used something like this:

    /* Deal with CPU hotplugging etc., changes in cpu_online_map while
     * running this sync protocol.
     */
    if ( ! get_cpu_maps() )
        return -EBUSY;

and later in the last worker:

        put_cpu_maps();

> +    total_cpus = num_online_cpus();
> +
> +    if ( id == 0 )
> +    {
> +        s_time_t timeout, start;
> +
> +        /* Trigger other CPUs to execute do_xsplice */
> +        smp_call_function(reschedule_fn, NULL, 0);
> +
> +        /* Wait for other CPUs with a timeout */
> +        start = NOW();
> +        timeout = start + MILLISECS(30);
> +        while ( atomic_read(&xsplice_work.semaphore) != total_cpus &&
> +                NOW() < timeout )
> +            cpu_relax();
> +
> +        if ( atomic_read(&xsplice_work.semaphore) == total_cpus )
> +        {
> +            struct payload *data2;
> +
> +            /* "Mask" NMIs */
> +            saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
> +
> +            /* All CPUs are waiting, now signal to disable IRQs */
> +            xsplice_work.ready = true;
> +            smp_mb();
> +
> +            /* Wait for irqs to be disabled */

There is no timeout here.  If a "follower" is interrupted before it
disables its interrupts, the master waits arbitrarily long here for that
one follower.  There is no progress guarantee from followers that have
not yet disabled interrupts.

> +            while ( atomic_read(&xsplice_work.irq_semaphore) != (total_cpus - 1) )
> +                cpu_relax();
> +

And if you are interrupted here, all others will wait arbitrarily long
here for you.

> +            local_irq_disable();
> +            /* Now this function should be the only one on any stack.
> +             * No need to lock the payload list or applied list. */
> +            switch ( xsplice_work.cmd )
> +            {
> +                case XSPLICE_ACTION_APPLY:
> +                        xsplice_work.data->rc = apply_payload(xsplice_work.data);
> +                        if ( xsplice_work.data->rc == 0 )
> +                            xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> +                        break;
> +                case XSPLICE_ACTION_REVERT:
> +                        xsplice_work.data->rc = revert_payload(xsplice_work.data);
> +                        if ( xsplice_work.data->rc == 0 )
> +                            xsplice_work.data->state = XSPLICE_STATE_CHECKED;
> +                        break;
> +                case XSPLICE_ACTION_REPLACE:
> +                        list_for_each_entry ( data2, &payload_list, list )
> +                        {
> +                            if ( data2->state != XSPLICE_STATE_APPLIED )
> +                                continue;
> +
> +                            data2->rc = revert_payload(data2);
> +                            if ( data2->rc == 0 )
> +                                data2->state = XSPLICE_STATE_CHECKED;
> +                            else
> +                            {
> +                                xsplice_work.data->rc = -EINVAL;
> +                                break;
> +                            }
> +                        }
> +                        if ( xsplice_work.data->rc != -EINVAL )
> +                        {
> +                            xsplice_work.data->rc = apply_payload(xsplice_work.data);
> +                            if ( xsplice_work.data->rc == 0 )
> +                                xsplice_work.data->state = XSPLICE_STATE_APPLIED;
> +                        }
> +                        break;
> +                default:
> +                        xsplice_work.data->rc = -EINVAL;
> +                        break;
> +            }
> +
> +            local_irq_enable();
> +            set_nmi_callback(saved_nmi_callback);
> +        }
> +        else
> +        {
> +            xsplice_work.data->rc = -EBUSY;
> +        }
> +
> +        xsplice_work.do_work = 0;
> +        smp_mb(); /* Synchronize with waiting CPUs */
> +    }
> +    else

I call this the "follower"

> +    {
> +        /* Wait for all CPUs to rendezvous */
> +        while ( xsplice_work.do_work && !xsplice_work.ready )
> +        {
> +            cpu_relax();
> +            smp_mb();
> +        }

Even if the main CPU above has already decided to abort, you go through
the code below.  I am not sure whether a follower could get trapped in
here if do_xsplice is retried quickly.

> +
> +        /* Disable IRQs and signal */
> +        local_irq_disable();
> +        atomic_inc(&xsplice_work.irq_semaphore);
> +
> +        /* Wait for patching to complete */
> +        while ( xsplice_work.do_work )
> +        {
> +            cpu_relax();
> +            smp_mb();
> +        }
> +        local_irq_enable();
> +    }
> +}

Huh, that needs another deep dive.

Maybe it would be helpful to write this sync protocol up in text form.
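
As a starting point, my reading of the protocol in the patch is:

 1. Every CPU entering do_xsplice() with work pending takes a ticket.
 2. The CPU with ticket 0 is the master; all others are followers.
 3. The master waits until all CPUs have taken a ticket, then sets
    "ready".
 4. Each follower waits for "ready", disables IRQs and signals that.
 5. The master waits for all followers, performs the action and clears
    "do_work".
 6. Followers wait for "do_work" to clear and then continue.

The sketch below simulates these steps with threads in standalone C11.
It is an illustration under simplifying assumptions, not the patch:
struct work, rendezvous() and run_simulation() are hypothetical names,
the ticket is taken with an atomic fetch-and-add instead of the
spinlock, "disabling IRQs" is only a counter, and there are no timeouts
and no NMI handling.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NCPUS 4   /* hypothetical CPU count for the simulation */

struct work {
    atomic_int  arrived;   /* tickets handed out so far */
    atomic_int  irqs_off;  /* followers that "disabled IRQs" */
    atomic_bool do_work;   /* an action is pending */
    atomic_bool ready;     /* master saw every CPU arrive */
    atomic_int  masters;   /* bookkeeping: CPUs that became master */
    atomic_int  actions;   /* bookkeeping: times the action ran */
};

static void rendezvous(struct work *w)
{
    int id;

    if ( !atomic_load(&w->do_work) )      /* fast path: nothing to do */
        return;

    id = atomic_fetch_add(&w->arrived, 1);               /* step 1 */
    if ( id == 0 )                                       /* step 2 */
    {
        atomic_fetch_add(&w->masters, 1);
        while ( atomic_load(&w->arrived) != NCPUS )      /* step 3 */
            ;
        atomic_store(&w->ready, true);
        while ( atomic_load(&w->irqs_off) != NCPUS - 1 ) /* step 5 */
            ;
        atomic_fetch_add(&w->actions, 1);  /* stands in for patching */
        atomic_store(&w->do_work, false);
    }
    else
    {
        while ( atomic_load(&w->do_work) && !atomic_load(&w->ready) )
            ;                                            /* step 4 */
        atomic_fetch_add(&w->irqs_off, 1);
        while ( atomic_load(&w->do_work) )               /* step 6 */
            ;
    }
}

static void *cpu_thread(void *arg)
{
    rendezvous(arg);
    return NULL;
}

/* Schedule one action and let NCPUS simulated CPUs run the protocol. */
static void run_simulation(struct work *w)
{
    pthread_t cpus[NCPUS];
    int i;

    atomic_init(&w->arrived, 0);
    atomic_init(&w->irqs_off, 0);
    atomic_init(&w->ready, false);
    atomic_init(&w->masters, 0);
    atomic_init(&w->actions, 0);
    atomic_init(&w->do_work, true);

    for ( i = 0; i < NCPUS; i++ )
        if ( pthread_create(&cpus[i], NULL, cpu_thread, w) != 0 )
            abort();   /* a missing CPU would hang the rendezvous */
    for ( i = 0; i < NCPUS; i++ )
        pthread_join(cpus[i], NULL);
}
```

Writing it down this way makes the two unbounded waits (steps 3 to 5 on
the master side, step 6 on the follower side) easy to spot.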


* Re: [PATCH v1 10/11] xsplice: Add support for exception tables
  2015-11-03 18:16 ` [PATCH v1 10/11] xsplice: Add support for exception tables Ross Lagerwall
  2015-11-05 19:47   ` Konrad Rzeszutek Wilk
@ 2015-11-27 16:28   ` Martin Pohlack
  2015-11-27 17:05     ` Ross Lagerwall
  1 sibling, 1 reply; 40+ messages in thread
From: Martin Pohlack @ 2015-11-27 16:28 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 03.11.2015 19:16, Ross Lagerwall wrote:
> +#ifdef CONFIG_X86
> +unsigned long search_module_extables(unsigned long addr)
> +{
> +    struct payload *data;
> +    unsigned long ret;
> +
> +    /* No locking since this list is only ever changed during apply or revert
> +     * context. */

How do you make sure that no exception is triggered in the patching
process itself (also for future code changes)?

Could we use a lockless update on the list of module ex-tables?
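
To illustrate what I mean by a lockless update, here is a minimal
sketch in standalone C11, not Xen code. The names extable_node,
extable_publish() and extable_find() are hypothetical. It assumes a
single writer (patching is already serialized) and only covers adding
an entry; removal would need more care.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record: the text range covered by one module's ex-table. */
struct extable_node {
    unsigned long start, end;        /* [start, end) */
    struct extable_node *next;
};

static _Atomic(struct extable_node *) extable_head;

/*
 * Single-writer publish: fill in the node completely, then make it
 * visible with a release store so readers never see a half-built node.
 */
static void extable_publish(struct extable_node *n)
{
    n->next = atomic_load_explicit(&extable_head, memory_order_relaxed);
    atomic_store_explicit(&extable_head, n, memory_order_release);
}

/* Readers (e.g. an exception handler) walk the list without a lock. */
static const struct extable_node *extable_find(unsigned long addr)
{
    const struct extable_node *p;

    for ( p = atomic_load_explicit(&extable_head, memory_order_acquire);
          p != NULL; p = p->next )
        if ( addr >= p->start && addr < p->end )
            return p;

    return NULL;
}
```

A reader that runs concurrently with extable_publish() sees either the
old head or the new one, and both are valid lists.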

> +    list_for_each_entry ( data, &applied_list, applied_list )
> +    {
> +        if ( !data->start_ex_table )
> +            continue;
> +        if ( !((void *)addr >= data->module_address &&
> +               (void *)addr < (data->module_address + data->core_text_size)))
> +            continue;
> +
> +        ret = search_one_extable(data->start_ex_table, data->stop_ex_table - 1,
> +                                 addr);
> +        if ( ret )
> +            return ret;
> +    }
> +
> +    return 0;
> +}
> +#endif
> +

Martin



* Re: [PATCH v1 10/11] xsplice: Add support for exception tables
  2015-11-27 16:28   ` Martin Pohlack
@ 2015-11-27 17:05     ` Ross Lagerwall
  0 siblings, 0 replies; 40+ messages in thread
From: Ross Lagerwall @ 2015-11-27 17:05 UTC (permalink / raw)
  To: Martin Pohlack, xen-devel; +Cc: Andrew Cooper, Jan Beulich

On 11/27/2015 04:28 PM, Martin Pohlack wrote:
> On 03.11.2015 19:16, Ross Lagerwall wrote:
>> +#ifdef CONFIG_X86
>> +unsigned long search_module_extables(unsigned long addr)
>> +{
>> +    struct payload *data;
>> +    unsigned long ret;
>> +
>> +    /* No locking since this list is only ever changed during apply or revert
>> +     * context. */
>
> How do you make sure that no exception is triggered in the patching
> process itself (also for future code changes)?
>
> Could we use a lockless update on the list of module ex-tables?
>

That seems like overkill. The patching process either simply does 
list_add_tail or list_del to update the list, so unless you expect those 
to generate exceptions, I don't think you gain anything by doing 
lockless updates.

-- 
Ross Lagerwall


Thread overview: 40+ messages
2015-11-03 18:15 [PATCH v1 01/11] xsplice: Design document (v2) Ross Lagerwall
2015-11-03 18:15 ` [PATCH v1 02/11] xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op Ross Lagerwall
2015-11-04 21:17   ` Konrad Rzeszutek Wilk
2015-11-12 16:28   ` Jan Beulich
2015-11-13 14:13     ` Konrad Rzeszutek Wilk
2015-11-13 23:50   ` Daniel De Graaf
2015-11-03 18:16 ` [PATCH v1 03/11] libxc: Implementation of XEN_XSPLICE_op in libxc Ross Lagerwall
2015-11-03 18:16 ` [PATCH v1 04/11] xen-xsplice: Tool to manipulate xsplice payloads Ross Lagerwall
2015-11-04 21:27   ` Konrad Rzeszutek Wilk
2015-11-03 18:16 ` [PATCH v1 05/11] elf: Add relocation types to elfstructs.h Ross Lagerwall
2015-11-05 10:38   ` Jan Beulich
2015-11-05 11:52     ` Ross Lagerwall
2015-11-03 18:16 ` [PATCH v1 06/11] xsplice: Add helper elf routines Ross Lagerwall
2015-11-04 21:49   ` Konrad Rzeszutek Wilk
2015-11-03 18:16 ` [PATCH v1 07/11] xsplice: Implement payload loading Ross Lagerwall
2015-11-04 22:21   ` Konrad Rzeszutek Wilk
2015-11-05 10:35     ` Jan Beulich
2015-11-05 11:51       ` Ross Lagerwall
2015-11-05 12:13         ` Jan Beulich
2015-11-05 11:15     ` Ross Lagerwall
2015-11-05 20:12       ` Konrad Rzeszutek Wilk
2015-11-03 18:16 ` [PATCH v1 08/11] xsplice: Implement support for applying patches Ross Lagerwall
2015-11-05  3:17   ` Konrad Rzeszutek Wilk
2015-11-05 11:45     ` Ross Lagerwall
2015-11-05 20:08       ` Konrad Rzeszutek Wilk
2015-11-05  3:19   ` Konrad Rzeszutek Wilk
2015-11-27 13:51   ` Martin Pohlack
2015-11-03 18:16 ` [PATCH v1 09/11] xsplice: Add support for bug frames Ross Lagerwall
2015-11-05 19:43   ` Konrad Rzeszutek Wilk
2015-11-03 18:16 ` [PATCH v1 10/11] xsplice: Add support for exception tables Ross Lagerwall
2015-11-05 19:47   ` Konrad Rzeszutek Wilk
2015-11-27 16:28   ` Martin Pohlack
2015-11-27 17:05     ` Ross Lagerwall
2015-11-03 18:16 ` [PATCH v1 11/11] xsplice: Add support for alternatives Ross Lagerwall
2015-11-05 19:48   ` Konrad Rzeszutek Wilk
2015-11-04 21:10 ` [PATCH v1 01/11] xsplice: Design document (v2) Konrad Rzeszutek Wilk
2015-11-05 10:49   ` Ross Lagerwall
2015-11-05 19:56     ` Konrad Rzeszutek Wilk
2015-11-10  9:55 ` Ross Lagerwall
2015-11-27 12:48 ` Martin Pohlack
