[PATCH] xen mmu: fix a race window causing leave_mm BUG()

From: Tian, Kevin @ 2011-04-29  4:10 UTC
  To: xen devel; +Cc: jeremy, MaoXiaoyun

[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]

    xen mmu: fix a race window causing leave_mm BUG()
    
    There is a race window in xen_drop_mm_ref(): a remote cpu may leave
    the dirty bitmap between the check on this cpu and the point where
    the remote cpu handles the drop request. By then the remote cpu's
    TLB state may already be TLBSTATE_OK, and leave_mm() hits BUG() in
    that state. So drop_other_mm_ref() needs to check that the TLB state
    is still lazy before calling leave_mm(). This bug is rarely observed
    in earlier kernels, but is made much more likely by commit
    831d52bc153971b70e64eccfbed2b232394f22f8, which clears the bitmap
    after changing the TLB state.
    
    Thanks to Maxiaoyun <tinnycloud@hotmail.com> for verifying it.
    
    Signed-off-by: Kevin Tian <kevin.tian@intel.com>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 4e5a611..74c6e4a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1260,7 +1260,7 @@ static void drop_other_mm_ref(void *info)
 
 	active_mm = percpu_read(cpu_tlbstate.active_mm);
 
-	if (active_mm == mm)
+	if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
 		leave_mm(smp_processor_id());
 
 	/* If this cpu still has a stale cr3 reference, then make sure
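
For readers following the race described in the commit message, here is a
minimal userspace sketch of the ordering, not kernel code. The names
cpu_model, model_leave_mm, model_drop_other_mm_ref and run_race are
hypothetical, and the model assumes the remote cpu returns to TLBSTATE_OK
with the same active_mm before it handles the drop request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's TLB states (values are arbitrary here). */
enum tlb_state { TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

/* Hypothetical model of one cpu's cpu_tlbstate plus its bit in the mm bitmap. */
struct cpu_model {
	const void *active_mm;
	enum tlb_state state;
	bool in_mask;   /* this cpu's bit in the mm's bitmap */
	bool bugged;    /* set where the real leave_mm() would BUG() */
};

/* Models leave_mm(): calling it while the state is TLBSTATE_OK is a bug. */
static void model_leave_mm(struct cpu_model *c)
{
	if (c->state == TLBSTATE_OK) {
		c->bugged = true;
		return;
	}
	c->in_mask = false;
}

/* Models drop_other_mm_ref(), with or without the extra state check. */
static void model_drop_other_mm_ref(struct cpu_model *c, const void *mm,
				    bool patched)
{
	if (c->active_mm == mm && (!patched || c->state != TLBSTATE_OK))
		model_leave_mm(c);
}

/* Replays the racy ordering; returns true if the modelled BUG() fires. */
bool run_race(bool patched)
{
	static const int mm_token = 0;  /* only its address is used, as an mm id */
	struct cpu_model remote = { &mm_token, TLBSTATE_LAZY, true, false };

	/* 1. This cpu sees the remote cpu's bit and decides to send a request. */
	bool send_request = remote.in_mask;

	/* 2. Before the request is handled, the remote cpu leaves lazy mode. */
	remote.state = TLBSTATE_OK;

	/* 3. The remote cpu finally handles the drop request. */
	if (send_request)
		model_drop_other_mm_ref(&remote, &mm_token, patched);

	return remote.bugged;
}

/* Usage: the old check trips the modelled BUG(), the new check does not. */
void race_model_selfcheck(void)
{
	assert(run_race(false));
	assert(!run_race(true));
}
```

In this model the only difference between the two runs is the added
state comparison, which mirrors the one-line change in the hunk above.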

[-- Attachment #2: 20100429_fix_leave_mm_bug.patch --]
[-- Type: application/octet-stream, Size: 1224 bytes --]

commit d49e9a336371c5ab171d9eccec922b0d0db9e67d
Author: Kevin Tian <kevin.tian@intel.com>
Date:   Fri Apr 29 10:42:05 2011 +0800

    xen mmu: fix a race window causing leave_mm BUG()
    
    There is a race window in xen_drop_mm_ref(): a remote cpu may leave
    the dirty bitmap between the check on this cpu and the point where
    the remote cpu handles the drop request. By then the remote cpu's
    TLB state may already be TLBSTATE_OK, and leave_mm() hits BUG() in
    that state. So drop_other_mm_ref() needs to check that the TLB state
    is still lazy before calling leave_mm(). This bug is rarely observed
    in earlier kernels, but is made much more likely by commit
    831d52bc153971b70e64eccfbed2b232394f22f8, which clears the bitmap
    after changing the TLB state.
    
    Thanks to Maxiaoyun <tinnycloud@hotmail.com> for verifying it.
    
    Signed-off-by: Kevin Tian <kevin.tian@intel.com>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 4e5a611..91c9527 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1260,7 +1260,7 @@ static void drop_other_mm_ref(void *info)
 
 	active_mm = percpu_read(cpu_tlbstate.active_mm);
 
-	if (active_mm == mm)
+	if (active_mm == mm && percpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
 		leave_mm(smp_processor_id());
 
 	/* If this cpu still has a stale cr3 reference, then make sure
