Possible issue with x86_emulate when writing results back to memory

* Possible issue with x86_emulate when writing results back to memory
@ 2014-01-09 15:39 Simon Graham
  2014-01-09 15:53 ` Andrew Cooper
  2014-01-09 16:00 ` Jan Beulich
  0 siblings, 2 replies; 16+ messages in thread
From: Simon Graham @ 2014-01-09 15:39 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2886 bytes --]

We've been seeing very infrequent crashes in Windows VMs for a while where it appears that the top-byte in a longword that is supposed to hold a pointer is being set to zero - for example:

BUG CHECK 0000000A (00B49A40 00000002 00000001 8261092E)

The first parameter to keBugCheck is the faulting address - 00B49A40 in this case.

When we look at the dump, we are in a routine releasing a queued spinlock and the correct address that should have been used was 0xA3B49A40 and indeed memory contents in the windows dump have this value. Looking around some more, we see that the failing processor is executing the release queued spinlock code and another processor is executing the code to acquire the same queued spinlock and has recently written the 0xA3B49A40 value to the location where the failing instruction stream read it from.

If we look at the disassembly for the two code paths, the writing code does:

	mov dword ptr [edx],eax

and the reading code does the following to read this same value:

	mov ebx,dword ptr [eax]

On a hunch that this might be a problem with the x86_emulate code, I took a look at how the mov instruction would be emulated - in both cases where emulation can be done (hvm/emulate.c and mm/shadow/multi.c), the routines that write instruction results back to memory use memcpy() to actually copy the data. Looking at the implementation of memcpy in Xen, I see that, in a 64-bit build as ours is, it will use 'rep movsq' to move the data in quadwords and then uses 'rep movsb' to move the last 1-7 bytes -- so the instructions above will, I think, always use byte instructions for the four bytes.

Now, according to the X86 arch, 32-bit mov's are supposed to be atomic but based on the above they will not be and I am speculating that this is the cause of our occasional crash - the code path unlocking the spin lock on the other processor sees a partially written value in memory.

Does this seem a reasonable explanation of the issue? 

On the assumption that this is correct, I developed the attached patch (against 4.3.1) which updates all the code paths that are used to read and writeback the results of instruction emulation to use a simple assignment if the length is 2 or 4 bytes -- this doesn't fix the general case where you have a length > 8 but it does handle emulation of MOV instructions. Unfortunately, the use of emulation in the HVM code uses a generic routine for copying memory to the guest so every single place that guest memory is read or written will pay the penalty of the extra check for length - not sure if that is terrible or not. Since doing this we have not seen a single instance of the crash - but it's only been a month!

The attached patch is for discussion purposes only - if it is deemed acceptable I'll resubmit a proper patch request against unstable.

Simon Graham
Citrix Systems, Inc

[-- Attachment #2: fix-memcpy-in-x86-emulate --]
[-- Type: application/octet-stream, Size: 2838 bytes --]

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2dd9b7e..2ad603e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2602,6 +2602,26 @@ void hvm_task_switch(
     hvm_unmap_entry(nptss_desc);
 }
 
+/*
+ * Routine to make __hvm_copy appropriate to use for copying the
+ * results of instruction emulation back to guest memory - these
+ * typically require 64-bit, 32-bit and 16-bit writes to be atomic
+ * whereas memcpy is only atomic for 64-bit writes. This is still
+ * not 100% correct since copies larger than 64-bits will not be
+ * atomic for the last 2-6 bytes but should be good enough for
+ * instruction emulation
+ */
+static inline void __hvm_atomic_copy(
+    void *to, void *from, int count)
+{
+    if (count == sizeof(uint32_t))
+        *(uint32_t *)to = *(uint32_t *)from;
+    else if (count == sizeof(uint16_t))
+        *(uint16_t *)to = *(uint16_t *)from;
+    else
+        memcpy(to, from, count);
+}
+
 #define HVMCOPY_from_guest (0u<<0)
 #define HVMCOPY_to_guest   (1u<<0)
 #define HVMCOPY_no_fault   (0u<<1)
@@ -2701,7 +2721,7 @@ static enum hvm_copy_result __hvm_copy(
             }
             else
             {
-                memcpy(p, buf, count);
+                __hvm_atomic_copy(p, buf, count);
                 paging_mark_dirty(curr->domain, page_to_mfn(page));
             }
         }
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 3fed0b6..5e0da82 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4762,6 +4762,26 @@ static void emulate_unmap_dest(struct vcpu *v,
     atomic_inc(&v->domain->arch.paging.shadow.gtable_dirty_version);
 }
 
+/*
+ * Routine to make sh_x86_emulate_write appropriate to use for copying the
+ * results of instruction emulation back to guest memory - these
+ * typically require 64-bit, 32-bit and 16-bit writes to be atomic
+ * whereas memcpy is only atomic for 64-bit writes. This is still
+ * not 100% correct since copies larger than 64-bits will not be
+ * atomic for the last 2-6 bytes but should be good enough for
+ * instruction emulation
+ */
+static inline void __sh_atomic_write(
+    void *to, void *from, int count)
+{
+    if (count == sizeof(uint32_t))
+        *(uint32_t *)to = *(uint32_t *)from;
+    else if (count == sizeof(uint16_t))
+        *(uint16_t *)to = *(uint16_t *)from;
+    else
+        memcpy(to, from, count);
+}
+
 static int
 sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
                      u32 bytes, struct sh_emulate_ctxt *sh_ctxt)
@@ -4777,7 +4797,7 @@ sh_x86_emulate_write(struct vcpu *v, unsigned long vaddr, void *src,
         return (long)addr;
 
     paging_lock(v->domain);
-    memcpy(addr, src, bytes);
+    __sh_atomic_write(addr, src, bytes);
 
     if ( tb_init_done )
     {

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread