All of lore.kernel.org
 help / color / mirror / Atom feed
From: Teck Choon Giam <giamteckchoon@gmail.com>
To: MaoXiaoyun <tinnycloud@hotmail.com>
Cc: jeremy@goop.org, xen devel <xen-devel@lists.xensource.com>,
	keir@xen.org, ian.campbell@citrix.com, konrad.wilk@oracle.com,
	dave@ivt.com.au
Subject: Re: kernel BUG at arch/x86/xen/mmu.c:1872
Date: Mon, 11 Apr 2011 04:14:45 +0800	[thread overview]
Message-ID: <BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@mail.gmail.com> (raw)
In-Reply-To: <BLU157-w540B39FBA137B4D96278D2DAA90@phx.gbl>

[-- Attachment #1: Type: text/plain, Size: 4583 bytes --]

2011/4/10 MaoXiaoyun <tinnycloud@hotmail.com>:
> Hi Konrad & Jeremy:
>
>             I think we finally located the missing patch for this commit.
>             We test commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=c97f681f138039425c87f35ea46a92385d81e70e
>             which is works.
>
>             We test commit
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=221c64dbf860d37f841f40893bddf8d804aa55bd
>             which server crashed.
>
>              Later I found the comments for this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
>
>             So It looks like this fix is not applied on 2.6.32.36, Could you
> take a look at this?
>
>             Many thanks.
>
> =====================================================
>>Hi Konrad & Jeremy:
>>
>>     I'd like to open this BUG in a new thread, since the old thread is too
>> long for easy read.
>>
>>     We recently want to upgrade our kernel to 2.6.32, but unfortunately,
>> we confront a kernel crash bug.
>>Our test case is simple, start 24 win2003 HVMS on our physical machine, and
>> each HVM reboot
>>every 15minutes. The kernel will crash in half an hour.(That is crash on VM
>> second starts).
>>
>>Our test go much further.
>>We test different kernel version.
>>2.6.32.10
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=d945b014ac5df9592c478bf9486d97e8914aab59
>>2.6.32.11
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27f948a3bf365a5bc3d56119637a177d41147815
>>2.6.32.12
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=ba739f9abd3f659b907a824af1161926b420a2ce
>>2.6.32.13
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=f6fe6583b77a49b569eef1b66c3d761eec2e561b
>>2.6.32.15
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=27ed1b0e0dae5f1d5da5c76451bc84cb529128bd
>>2.6.32.21
>> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=69e50db231723596ed8ef9275d0068d6697f466a
>>
>>There are basic three different result we met.
>>
>>i1) grant table issue
>>The host still function, but use xm  dmesg, we have abnormal log.
>>please refer to the attched log of grant table
>>
>>i2) kernel crash on a different place.
>>Host die during the test, after reboot, we can see nothing abnormal in
>> /var/log/messages
>>
>>i3) kernel BUG at arch/x86/xen/mmu.c:1872;
>>Host die during the test, after reboot, we see the crash log in messages,
>> refer to the attached log of 2.6.32.36
>>Summary of the test result, can be classified in two:
>>
>>1) 2.6.32.10
>>30 machines involved the test, and three has issue (i1), and two has issue
>> (i2), *no* issue (i3)
>>Other machines run tests successfully till now, more than 8 hours
>>
>>2)2.6.32.11 or later version.
>>Each version containers 10 machine for tests, and all machine crashed in
>> less than half an hour.
>>
>>Conclusion:
>>1) grant table issue exists in all kernel version
>>2) kernerl crash at different place may exist in all kernel versions, but
>> not happen so frequently, 2 out of 30
>>3) We observe the major difference of issue i3), from the test, it looks
>> like it is introduced between the version
>>2.6.32.10 and 2.6.32.11.
>>
>>Hope this help to locate the bug.
>>Many thanks.
>>
>>
>

Hi,

Sorry, since this mmu related BUG has been troubled me for very
long... I really want to "kill" this BUG but my knowledge in kernel
hacking and/or xen is very limited.

While waiting for Jeremy or Konrad or others ...

Many thanks for spending time to track down this mmu related BUG.  I
have backported the commit from
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec
to 2.6.32.36 PVOPS kernel and patch attached.  I won't know whether
did I backport it correctly nor does it affects anything.  I am
currently testing the 2.6.32.36 PVOPS kernel with this patch applied
and also unset CONFIG_DEBUG_PAGEALLOC.  Currently running testcrash.sh
loop 1000 as I am unable to reproduce this mmu BUG 1872 in
testcrash.sh loop 100.  Please note that when CONFIG_DEBUG_PAGEALLOC
is unset, I can reproduce this mmu BUG 1872 easily within <50
testcrash.sh loop cycle with PVOPS version 2.6.32.24 to 2.6.32.36
kernel.  Now test with this backport patch to see whether I can
reproduce this mmu BUG... ...

Kindest regards,
Giam Teck Choon

[-- Attachment #2: vmalloc__eagerly_clear_ptes_on_vunmap.patch --]
[-- Type: text/x-patch, Size: 3393 bytes --]

Back port from commit http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=64141da587241301ce8638cc945f8b67853156ec

diff -urN a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
--- a/arch/x86/xen/mmu.c	2011-03-30 06:17:46.000000000 +0800
+++ b/arch/x86/xen/mmu.c	2011-04-11 02:17:54.000000000 +0800
@@ -2430,8 +2430,6 @@
 	x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
-
-	vmap_lazy_unmap = false;
 }
 
 /* Protected by xen_reservation_lock. */
diff -urN a/include/linux/vmalloc.h b/include/linux/vmalloc.h
--- a/include/linux/vmalloc.h	2011-03-30 06:17:46.000000000 +0800
+++ b/include/linux/vmalloc.h	2011-04-11 02:18:43.000000000 +0800
@@ -7,8 +7,6 @@
 
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 
-extern bool vmap_lazy_unmap;
-
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP	0x00000001	/* ioremap() and friends */
 #define VM_ALLOC	0x00000002	/* vmalloc() */
diff -urN a/mm/vmalloc.c b/mm/vmalloc.c
--- a/mm/vmalloc.c	2011-03-30 06:17:46.000000000 +0800
+++ b/mm/vmalloc.c	2011-04-11 02:25:38.000000000 +0800
@@ -31,8 +31,6 @@
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
-bool vmap_lazy_unmap __read_mostly = true;
-
 /*** Page table manipulation functions ***/
 
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end)
@@ -503,9 +501,6 @@
 {
 	unsigned int log;
 
-	if (!vmap_lazy_unmap)
-		return 0;
-
 	log = fls(num_online_cpus());
 
 	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
@@ -566,7 +561,6 @@
 			if (va->va_end > *end)
 				*end = va->va_end;
 			nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
-			unmap_vmap_area(va);
 			list_add_tail(&va->purge_list, &valist);
 			va->flags |= VM_LAZY_FREEING;
 			va->flags &= ~VM_LAZY_FREE;
@@ -612,10 +606,11 @@
 }
 
 /*
- * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
- * called for the correct range previously.
+ * Free a vmap area, caller ensuring that the area has been unmapped
+ * and flush_cache_vunmap had been called for the correct range
+ * previously.
  */
-static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+static void free_vmap_area_noflush(struct vmap_area *va)
 {
 	va->flags |= VM_LAZY_FREE;
 	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
@@ -624,6 +619,16 @@
 }
 
 /*
+ * Free and unmap a vmap area, caller ensuring flush_cache_vunmap had been
+ * called for the correct range previously.
+ */
+static void free_unmap_vmap_area_noflush(struct vmap_area *va)
+{
+	unmap_vmap_area(va);
+	free_vmap_area_noflush(va);
+}
+
+/*
  * Free and unmap a vmap area
  */
 static void free_unmap_vmap_area(struct vmap_area *va)
@@ -799,7 +804,7 @@
 	spin_unlock(&vmap_block_tree_lock);
 	BUG_ON(tmp != vb);
 
-	free_unmap_vmap_area_noflush(vb->va);
+	free_vmap_area_noflush(vb->va);
 	call_rcu(&vb->rcu_head, rcu_free_vb);
 }
 
@@ -936,6 +941,8 @@
 	rcu_read_unlock();
 	BUG_ON(!vb);
 
+	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+
 	spin_lock(&vb->lock);
 	BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));
 
@@ -988,7 +995,6 @@
 
 				s = vb->va->va_start + (i << PAGE_SHIFT);
 				e = vb->va->va_start + (j << PAGE_SHIFT);
-				vunmap_page_range(s, e);
 				flush = 1;
 
 				if (s < start)

[-- Attachment #3: testcrash.sh --]
[-- Type: application/x-sh, Size: 5573 bytes --]

[-- Attachment #4: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2011-04-10 20:14 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <COL0-MC1-F14hmBzxHs00230882@col0-mc1-f14.Col0.hotmail.com>
2011-04-08 11:24 ` kernel BUG at arch/x86/xen/mmu.c:1860! MaoXiaoyun
2011-04-08 11:46   ` MaoXiaoyun
2011-04-10  3:57   ` kernel BUG at arch/x86/xen/mmu.c:1872 MaoXiaoyun
2011-04-10  4:29   ` MaoXiaoyun
2011-04-10 13:57     ` MaoXiaoyun
2011-04-10 20:14       ` Teck Choon Giam [this message]
2011-04-11 12:16         ` Teck Choon Giam
2011-04-11 12:22           ` Teck Choon Giam
2011-04-11 12:31           ` MaoXiaoyun
2011-04-11 15:25             ` Teck Choon Giam
2011-04-12  3:30               ` MaoXiaoyun
2011-04-12 16:08                 ` Teck Choon Giam
2011-04-11 18:08             ` Jeremy Fitzhardinge
2011-04-12  3:35               ` MaoXiaoyun
2011-04-12  6:48                 ` Grant Table Error on 2.6.32.36 + Xen 4.0.1 MaoXiaoyun
2011-04-12  8:46                   ` Konrad Rzeszutek Wilk
2011-04-12  9:02                     ` MaoXiaoyun
2011-04-12  9:11                 ` Kernel BUG at arch/x86/mm/tlb.c:61 MaoXiaoyun
2011-04-12 10:00                   ` Konrad Rzeszutek Wilk
2011-04-12 10:10                     ` MaoXiaoyun
2011-04-14  6:16                     ` MaoXiaoyun
2011-04-14  7:26                       ` Teck Choon Giam
2011-04-14  7:56                         ` MaoXiaoyun
2011-04-14 11:16                           ` MaoXiaoyun
2011-04-15 12:23                             ` MaoXiaoyun
2011-04-15 21:22                               ` Jeremy Fitzhardinge
2011-04-18 15:20                                 ` MaoXiaoyun
2011-04-25  3:15                                 ` MaoXiaoyun
2011-04-26  5:52                                   ` Tian, Kevin
2011-04-26  7:04                                     ` MaoXiaoyun
2011-04-26  8:31                                       ` Tian, Kevin
2011-04-28 23:29                                     ` Jeremy Fitzhardinge
2011-04-29  0:19                                       ` Tian, Kevin
2011-04-29  1:50                                         ` MaoXiaoyun
2011-04-29  1:57                                           ` Tian, Kevin
2011-04-25  4:42                                 ` MaoXiaoyun
2011-04-25 12:54                                   ` MaoXiaoyun
2011-04-25 13:11                                     ` MaoXiaoyun
2011-04-25 15:05                                       ` MaoXiaoyun
2011-04-26  5:55                                         ` Tian, Kevin
2011-04-12 16:32               ` kernel BUG at arch/x86/xen/mmu.c:1872 Teck Choon Giam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BANLkTimgh_iip27zkDPNV9r7miwbxHmdVg@mail.gmail.com \
    --to=giamteckchoon@gmail.com \
    --cc=dave@ivt.com.au \
    --cc=ian.campbell@citrix.com \
    --cc=jeremy@goop.org \
    --cc=keir@xen.org \
    --cc=konrad.wilk@oracle.com \
    --cc=tinnycloud@hotmail.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.