Hi:

As I go through the code: at tlb.c:60, it looks like cpu_tlbstate.state is TLBSTATE_OK, which indicates user space, while the caller at mmu.c:1512 (active_mm == mm) indicates kernel space; that is the conflict. Also, the CPU that panicked was processing an IPI at the time; could something be wrong with the CPU mask?

Thanks.

======arch/x86/mm/tlb.c===
58 void leave_mm(int cpu)
59 {
60 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
61 		BUG();
62 	cpumask_clear_cpu(cpu,
63 			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
64 	load_cr3(swapper_pg_dir);
65 }
66 EXPORT_SYMBOL_GPL(leave_mm);

======arch/x86/xen/mmu.c===
1502 #ifdef CONFIG_SMP
1503 /* Another cpu may still have their %cr3 pointing at the pagetable, so
1504    we need to repoint it somewhere else before we can unpin it. */
1505 static void drop_other_mm_ref(void *info)
1506 {
1507 	struct mm_struct *mm = info;
1508 	struct mm_struct *active_mm;
1509
1510 	active_mm = percpu_read(cpu_tlbstate.active_mm);
1511
1512 	if (active_mm == mm)
1513 		leave_mm(smp_processor_id());
1514
1515 	/* If this cpu still has a stale cr3 reference, then make sure
1516 	   it has been flushed. */
1517 	if (percpu_read(xen_current_cr3) == __pa(mm->pgd))
1518 		load_cr3(swapper_pg_dir);
1519 }

> Date: Thu, 14 Apr 2011 15:26:14 +0800
> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> From: giamteckchoon@gmail.com
> To: tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com; jeremy@goop.org; konrad.wilk@oracle.com
>
> 2011/4/14 MaoXiaoyun :
> > Hi:
> >
> > I've run the test with "cpuidle=0 cpufreq=none"; two machines crashed.
> >
> > blktap_sysfs_destroy
> > blktap_sysfs_destroy
> > blktap_sysfs_create: adding attributes for dev ffff8800ad581000
> > blktap_sysfs_create: adding attributes for dev ffff8800a48e3e00
> > ------------[ cut here ]------------
> > kernel BUG at arch/x86/mm/tlb.c:61!
> > invalid opcode: 0000 [#1] SMP
> > last sysfs file: /sys/block/tapdeve/dev
> > CPU 0
> > Modules linked in: 8021q garp blktap xen_netback xen_blkback blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy bnx2 serio_raw snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_timer i2c_core snd iTCO_wdt pata_acpi soundcore iTCO_vendor_support ata_generic snd_page_alloc pcspkr ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table]
> > Pid: 8022, comm: khelper Not tainted 2.6.32.36xen #1 Tecal RH2285
> > RIP: e030:[] [] leave_mm+0x15/0x46
> > RSP: e02b:ffff88002803ee48 EFLAGS: 00010046
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81675980
> > RDX: ffff88002803ee78 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff88002803ee48 R08: ffff8800a4929000 R09: dead000000200200
> > R10: dead000000100100 R11: ffffffff81447292 R12: ffff88012ba07b80
> > R13: ffff880028046020 R14: 00000000000004fb R15: 0000000000000000
> > FS: 00007f410af416e0(0000) GS:ffff88002803b000(0000) knlGS:0000000000000000
> > CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 0000000000469000 CR3: 00000000ad639000 CR4: 0000000000002660
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process khelper (pid: 8022, threadinfo ffff8800a4846000, task ffff8800a9ed0000)
> > Stack:
> > ffff88002803ee68 ffffffff8100e4a4 0000000000000001 ffff880097de3b88
> > <0> ffff88002803ee98 ffffffff81087224 ffff88002803ee78 ffff88002803ee78
> > <0> ffff88015f808180 00000000000004fb ffff88002803eea8 ffffffff810100e8
> > Call Trace:
> >
> > [] drop_other_mm_ref+0x2a/0x53
> > [] generic_smp_call_function_single_interrupt+0xd8/0xfc
> > [] xen_call_function_single_interrupt+0x13/0x28
> > [] handle_IRQ_event+0x66/0x120
> > [] handle_percpu_irq+0x41/0x6e
> > [] __xen_evtchn_do_upcall+0x1ab/0x27d
> > [] xen_evtchn_do_upcall+0x33/0x46
> > [] xen_do_hypervisor_callback+0x1e/0x30
> >
> > [] ? _spin_unlock_irqrestore+0x15/0x17
> > [] ? xen_restore_fl_direct_end+0x0/0x1
> > [] ? flush_old_exec+0x3ac/0x500
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? load_elf_binary+0x398/0x17ef
> > [] ? need_resched+0x23/0x2d
> >
> > [] ? process_measurement+0xc0/0xd7
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? search_binary_handler+0xc8/0x255
> > [] ? do_execve+0x1c3/0x29e
> > [] ? sys_execve+0x43/0x5d
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? kernel_execve+0x68/0xd0
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? xen_restore_fl_direct_end+0x0/0x1
> > [] ? ____call_usermodehelper+0x113/0x11e
> > [] ? child_rip+0xa/0x20
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? int_ret_from_sys_call+0x7/0x1b
> > [] ? retint_restore_args+0x5/0x6
> > [] ? child_rip+0x0/0x20
> > Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8
> > RIP [] leave_mm+0x15/0x46
> > RSP
> > ---[ end trace 1522f17fdfc9162d ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
> > Pid: 8022, comm: khelper Tainted: G D 2.6.32.36xen #1
> > Call Trace:
> > [] panic+0xe0/0x19a
> > [] ? init_amd+0x296/0x37a
> > [] ? xen_force_evtchn_callback+0xd/0xf
> > [] ? check_events+0x12/0x20
> > [] ? xen_restore_fl_direct_end+0x0/0x1
> > [] ? print_oops_end_marker+0x23/0x25
> > [] oops_end+0xb6/0xc6
> > [] die+0x5a/0x63
> > [] do_trap+0x115/0x124
> > [] do_invalid_op+0x9c/0xa5
> > [] ? leave_mm+0x15/0x46
> > [] ? xen_clocksource_read+0x21/0x23
> > [] ? HYPERVISOR_vcpu_op+0xf/0x11
> > [] ? xen_vcpuop_set_next_event+0x52/0x67
> > [] invalid_op+0x1b/0x20
> > [] ? _spin_unlock_irqrestore+0x15/0x17
> > [] ? leave_mm+0x15/0x46
> > [] drop_other_mm_ref+0x2a/0x53
> > [] generic_smp_call_function_single_interrupt+0xd8/0xfc
> > [] xen_call_function_single_interrupt+0x13/0x28
> > [] handle_IRQ_event+0x66/0x120
> > [] handle_percpu_irq+0x41/0x6e
> > [] __xen_evtchn_do_upcall+0x1ab/0x27d
> > [] xen_evtchn_do_upcall+0x33/0x46
> > [] xen_do_hypervisor_callback+0x1e/0x30
> > [] ? _spin_unlock_irqrestore+0x15/0x17
> > [] ? xen_restore_fl_direct_end+0x0/0x1
> > [] ? flush_old_exec+0x3ac/0x500
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? load_elf_binary+0x398/0x17ef
> > [] ? need_resched+0x23/0x2d
> > [] ? process_measurement+0xc0/0xd7
> > [] ? load_elf_binary+0x0/0x17ef
> > [] ? search_binary_handler+0xc8/0x255
> > [] ? do_execve+0x1c3/0x29e
> > [] ? sys_execve+0x43/0x5d
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? kernel_execve+0x68/0xd0
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? xen_restore_fl_direct_end+0x0/0x1
> > [] ? ____call_usermodehelper+0x113/0x11e
> > [] ? child_rip+0xa/0x20
> > [] ? __call_usermodehelper+0x0/0x6f
> > [] ? int_ret_from_sys_call+0x7/0x1b
> > [] ? retint_restore_args+0x5/0x6
> > [] ? child_rip+0x0/0x20
> > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >
> >> Date: Tue, 12 Apr 2011 06:00:00 -0400
> >> From: konrad.wilk@oracle.com
> >> To: tinnycloud@hotmail.com
> >> CC: xen-devel@lists.xensource.com; giamteckchoon@gmail.com; jeremy@goop.org
> >> Subject: Re: Kernel BUG at arch/x86/mm/tlb.c:61
> >>
> >> On Tue, Apr 12, 2011 at 05:11:51PM +0800, MaoXiaoyun wrote:
> >> >
> >> > Hi:
> >> >
> >> > We are using pvops kernel 2.6.32.36 + Xen 4.0.1, but have run into a
> >> > kernel panic.
> >> >
> >> > 2.6.32.36 Kernel:
> >> > http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=bb1a15e55ec665a64c8a9c6bd699b1f16ac01ff4
> >> > Xen 4.0.1: http://xenbits.xen.org/hg/xen-4.0-testing.hg/rev/b536ebfba183
> >> >
> >> > Our test is simple: 24 HVMs (Win2003) on a single host, each HVM
> >> > looping through a restart every 15 minutes.
> >>
> >> What is the storage that you are using for your guests? AoE? Local disks?
> >>
> >> > About 17 machines are involved in the test; after a 10-hour run, one
> >> > of them crashed at arch/x86/mm/tlb.c:61.
> >> >
> >> > Currently I am running tests with "cpuidle=0 cpufreq=none" based on
> >> > Teck's suggestion.
> >> >
> >> > Any comments would be appreciated. Thanks.
> >> >
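For anyone following along, here is a small user-space model of the leave_mm()/drop_other_mm_ref() check quoted at the top of this message. It is a sketch under stated assumptions, not kernel code: struct tlb_state, in_mm_cpumask, model_leave_mm(), model_drop_other_mm_ref() and model_drop_other_mm_ref_guarded() are all hypothetical stand-ins, and TLBSTATE_OK == 1 / TLBSTATE_LAZY == 2 are assumed from the 2.6.32 x86 headers. That assumption matches the Code: bytes in the oops, if I read them correctly: 65 8b 04 25 c8 55 01 00 loads the per-CPU state (mov %gs:0x155c8,%eax), ff c8 decrements it, and 75 04 skips the ud2 (0f 0b) only if the result is non-zero, so BUG() fired because the state read back as 1. The model shows that drop_other_mm_ref() compares only active_mm against mm, so a CPU that is actively running on mm (TLBSTATE_OK, not lazy) when the IPI arrives goes straight into the BUG() path. The guarded variant is hypothetical, shown only to make that condition explicit; it is not a tested fix.

```c
/* Assumed values, mirroring the 2.6.32 x86 definitions:
 *   TLBSTATE_OK   - CPU is actively using active_mm
 *   TLBSTATE_LAZY - CPU is in lazy TLB mode (e.g. a kernel thread
 *                   running on a borrowed mm)                        */
#define TLBSTATE_OK   1
#define TLBSTATE_LAZY 2

/* Hypothetical stand-ins; not the kernel's real structures. */
struct mm_struct { int id; };

struct tlb_state {                  /* models one CPU's cpu_tlbstate  */
	struct mm_struct *active_mm;
	int state;
	int in_mm_cpumask;          /* models this CPU's bit in
	                               mm_cpumask(active_mm)           */
};

/* Models leave_mm(): returns -1 where the kernel would hit BUG() at
 * tlb.c:61; otherwise clears this CPU's cpumask bit (the real code
 * also reloads cr3 with swapper_pg_dir, which is not modelled).     */
int model_leave_mm(struct tlb_state *ts)
{
	if (ts->state == TLBSTATE_OK)
		return -1;
	ts->in_mm_cpumask = 0;
	return 0;
}

/* Models drop_other_mm_ref() as quoted: only compares active_mm. */
int model_drop_other_mm_ref(struct tlb_state *ts, struct mm_struct *mm)
{
	if (ts->active_mm == mm)
		return model_leave_mm(ts);
	return 0;
}

/* Hypothetical guarded variant: only calls leave_mm() in lazy mode. */
int model_drop_other_mm_ref_guarded(struct tlb_state *ts,
				    struct mm_struct *mm)
{
	if (ts->active_mm == mm && ts->state != TLBSTATE_OK)
		return model_leave_mm(ts);
	return 0;
}
```

For example, with a tlb_state of { &mm, TLBSTATE_OK, 1 }, model_drop_other_mm_ref() returns -1 (the BUG() path) and leaves the cpumask bit set, while the guarded variant returns 0 without touching anything; with TLBSTATE_LAZY, either variant clears in_mm_cpumask and returns 0.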