* Ceph kernel client - kernel crashes
From: Giorgos Kappes @ 2012-05-08 15:43 UTC
To: ceph-devel

Hi,

When I run debootstrap to install a base Debian Squeeze system on a Ceph directory, the client's kernel crashes with the following message:

I: Retrieving Release
I: Validating Packages
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Found additional required dependencies: insserv libbz2-1.0 libdb4.8 libslang2
I: Found additional base dependencies: libnfnetlink0 libsqlite3-0
I: Checking component main on http://ftp.us.debian.org/debian...
I: Validating libacl1
...
I: Extracting xz-utils...
I: Extracting zlib1g...
W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
[ 759.776151] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
[ 759.776182] IP: [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
[ 759.776195] PGD c42b067 PUD c42c067 PMD c42d067 PTE 801000000c445067
[ 759.776209] Oops: 0011 [#1] SMP
[ 759.776219] CPU 0
[ 759.776224] Modules linked in: pcspkr [last unloaded: scsi_wait_scan]
[ 759.776237]
[ 759.776244] Pid: 0, comm: swapper/0 Tainted: G W 3.2.11 #2
[ 759.776255] RIP: e030:[<ffffe8fffffe4ab0>] [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
[ 759.776267] RSP: e02b:ffff88001ffaae98 EFLAGS: 00010296
[ 759.776274] RAX: ffff880012d7a900 RBX: ffff88001ffb5960 RCX: ffffe8fffffe4ab0
[ 759.776302] RDX: ffff88000d1a9b00 RSI: 000000000000000f RDI: ffff88000d1a9b00
[ 759.776309] RBP: ffffffff81c1fa80 R08: ffff88001eb74000 R09: 000000018010000f
[ 759.776317] R10: 000000008010000f R11: ffffffff818055f5 R12: ffff88001ffb5990
[ 759.776324] R13: ffff88000c5ea880 R14: 0000000000000001 R15: 000000000000000a
[ 759.776334] FS: 00007f21095a4740(0000) GS:ffff88001ffa7000(0000) knlGS:0000000000000000
[ 759.776342] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 759.776349] CR2: ffffe8fffffe4ab0 CR3: 0000000012e28000 CR4: 0000000000002660
[ 759.776356] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 759.776364] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 759.776372] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[ 759.776379] Stack:
[ 759.776384] ffffffff81099405 0000000000000001 ffff880012d7a900 ffff88001ffaaeb0
[ 759.776397] 0000000000000048 ffffffff81c01fd8 0000000000000100 0000000000000001
[ 759.776409] 0000000000000009 ffffffff81c01fd8 ffffffff81099898 ffffffff81c01fd8
[ 759.776422] Call Trace:
[ 759.776427] <IRQ>
[ 759.776438] [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
[ 759.776447] [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
[ 759.776457] [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
[ 759.776465] [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
[ 759.776475] [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[ 759.776484] [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
[ 759.776493] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[ 759.776501] [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
[ 759.776508] [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
[ 759.776516] [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
[ 759.776523] <EOI>
[ 759.776531] [<ffffffff81006f3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 759.776539] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 759.776547] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 759.776556] [<ffffffff8163969b>] ? cpuidle_idle_call+0x16/0x1af
[ 759.776564] [<ffffffff810068dc>] ? xen_safe_halt+0xc/0x15
[ 759.776572] [<ffffffff810150a6>] ? default_idle+0x4b/0x84
[ 759.776580] [<ffffffff8100ddf6>] ? cpu_idle+0xb9/0xef
[ 759.776588] [<ffffffff81cf7bff>] ? start_kernel+0x395/0x3a0
[ 759.776596] [<ffffffff81cfa536>] ? xen_start_kernel+0x593/0x598
[ 759.776602] Code: e8 ff ff 80 4a fe ff ff e8 ff ff 0b 00 00 00 01 00 00 00 fa ff ff ff fa ff ff ff 06 00 00 00 02 00 00 00 05 00 00 00 cc cc cc cc <00> 9b 1a 0d 00 88 ff ff 00 0f b7 1e 00 88 ff ff 01 00 00 00 00
[ 759.776699] RIP [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
[ 759.776712] RSP <ffff88001ffaae98>
[ 759.776717] CR2: ffffe8fffffe4ab0
[ 759.776725] ---[ end trace 36924001333caa12 ]---
[ 759.776731] Kernel panic - not syncing: Fatal exception in interrupt
[ 759.776739] Pid: 0, comm: swapper/0 Tainted: G D W 3.2.11 #2
[ 759.776745] Call Trace:
[ 759.776749] <IRQ> [<ffffffff81764003>] ? panic+0x92/0x1a0
[ 759.776771] [<ffffffff810478c0>] ? kmsg_dump+0x41/0xdd
[ 759.776779] [<ffffffff81766cc1>] ? oops_end+0xa9/0xb6
[ 759.776788] [<ffffffff8102ec7d>] ? no_context+0x1ff/0x20c
[ 759.776795] [<ffffffff81768d9f>] ? do_page_fault+0x1ad/0x34c
[ 759.776805] [<ffffffff8106dfb3>] ? tick_nohz_handler+0xcb/0xcb
[ 759.776813] [<ffffffff8102c12a>] ? pvclock_clocksource_read+0x46/0xb4
[ 759.776821] [<ffffffff81006eb3>] ? xen_vcpuop_set_next_event+0x4d/0x61
[ 759.776829] [<ffffffff8106cdcc>] ? clockevents_program_event+0x99/0xb8
[ 759.776837] [<ffffffff817663b5>] ? page_fault+0x25/0x30
[ 759.776845] [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
[ 759.776853] [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
[ 759.776861] [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
[ 759.776868] [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
[ 759.776876] [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[ 759.776883] [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
[ 759.776891] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[ 759.776898] [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
[ 759.776905] [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
[ 759.776913] [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
[ 759.776919] <EOI> [<ffffffff81006f3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 759.776931] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 759.780132] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 759.780132] [<ffffffff8163969b>] ? cpuidle_idle_call+0x16/0x1af
[ 759.780132] [<ffffffff810068dc>] ? xen_safe_halt+0xc/0x15
[ 759.780132] [<ffffffff810150a6>] ? default_idle+0x4b/0x84
[ 759.780132] [<ffffffff8100ddf6>] ? cpu_idle+0xb9/0xef
[ 759.780132] [<ffffffff81cf7bff>] ? start_kernel+0x395/0x3a0

My simple cluster consists of 3 nodes in total. Each node is a Xen domU guest running Linux kernel 3.2.6 and Ceph 0.43. For reference, here is my configuration:

; -------------------------------------------------------------------------------------------
;
; ceph ceph.conf file.
;
; This file defines cluster membership, the various locations
; that Ceph stores data, and any other runtime options.

[global]
; enable secure authentication
auth supported = cephx

; keyring placement
keyring = /etc/ceph/$name.keyring

; allow ourselves to open a lot of files
; max open files = 131072

; set log file
; log file = /var/log/ceph/$name.log
; log_to_syslog = true ; uncomment this line to log to syslog

; set up pid files
; pid file = /var/run/ceph/$name.pid

; If you want to run an IPv6 cluster, set this to true. Dual-stack isn't possible
; ms bind ipv6 = true

; monitors
[mon]
mon data = /mnt/store/$name

[mon.a]
host = sm-ceph0
mon addr = 192.168.2.254:6789

[mds]
; where the mds keeps its secret encryption keys
;keyring = /data/keyring.$name

[mds.a]
host = sm-ceph0

[osd]
; This is where the btrfs volume will be mounted.
osd data = /mnt/store/$name

; This is a file-based journal.
osd journal = /mnt/store/$name/$name.journal
osd journal size = 1000 ; journal size, in megabytes

; You can change the number of recovery operations to speed up recovery
; or slow it down if your machines can't handle it
; osd recovery max active = 3

[osd.0]
host = sm-ceph0
btrfs devs = /dev/xvda3
; If you want to specify some other mount options, you can do so.
; The default values are rw,noatime
; btrfs options = rw,noatime

[osd.1]
host = sm-ceph1
btrfs devs = /dev/xvda3

[osd.2]
host = sm-ceph2
btrfs devs = /dev/xvda3
; -------------------------------------------------------------------------------------------

My Ceph kernel client is another Xen domU node running Linux kernel 3.2.11. I have also tried a native client, with the same result. Please note that the crash happens only on the client side.

Your help would be greatly appreciated.

Thanks,
Giorgos Kappes

-----------------------------------------------------------
Giorgos Kappes
Website: http://www.cs.uoi.gr/~gkappes
email: geokapp@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
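A note on reading the ceph.conf above: a daemon section such as [osd.0] picks up settings from its own section first, then from its type section ([osd]), then from [global], and the $name metavariable expands to type.id (for example, osd.0). The following is a minimal sketch of that lookup using Python's standard configparser, assuming configparser is close enough to Ceph's own parser for this excerpt (Ceph does not use it); the helpers `load` and `effective` are hypothetical names for illustration.

```python
import configparser

# Excerpt of the ceph.conf above (comments trimmed, unchanged values).
CEPH_CONF = """
[global]
auth supported = cephx
keyring = /etc/ceph/$name.keyring

[mon]
mon data = /mnt/store/$name

[mon.a]
host = sm-ceph0
mon addr = 192.168.2.254:6789

[osd]
osd data = /mnt/store/$name
osd journal = /mnt/store/$name/$name.journal
osd journal size = 1000 ; journal size, in megabytes

[osd.0]
host = sm-ceph0
btrfs devs = /dev/xvda3

[osd.1]
host = sm-ceph1
btrfs devs = /dev/xvda3
"""


def load(text):
    # interpolation=None: ceph.conf uses $name, not configparser's %(...)s syntax.
    # inline_comment_prefixes strips trailing " ; ..." comments from values.
    parser = configparser.ConfigParser(
        interpolation=None,
        comment_prefixes=(";", "#"),
        inline_comment_prefixes=(";",),
    )
    parser.read_string(text)
    return parser


def effective(parser, daemon, key):
    """Look up `key` for a daemon like "osd.0": [osd.0], then [osd], then [global]."""
    dtype, _, did = daemon.partition(".")
    for section in (daemon, dtype, "global"):
        if parser.has_section(section) and parser.has_option(section, key):
            value = parser.get(section, key)
            # Expand the metavariables ($name is $type.$id).
            return (value.replace("$name", daemon)
                         .replace("$type", dtype)
                         .replace("$id", did))
    return None  # not set anywhere: the daemon would use a built-in default


conf = load(CEPH_CONF)
print(effective(conf, "osd.0", "osd journal"))  # /mnt/store/osd.0/osd.0.journal
print(effective(conf, "mon.a", "keyring"))      # /etc/ceph/mon.a.keyring
```

With this layout, all three OSDs share one [osd] block and differ only in host, which matches the three-node cluster described in the message.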
* Re: Ceph kernel client - kernel crashes
From: Tommi Virtanen @ 2012-05-08 19:18 UTC
To: Giorgos Kappes; +Cc: ceph-devel

On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote:
> When I am running deboostrap to install a base Debian Squeeze system
> on a Ceph directory the client's kernel crashes with the following
> message:
>
> I: Extracting zlib1g...
> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
> [ 759.776151] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> [ 759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
...
> [ 759.776438] [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
> [ 759.776447] [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
> [ 759.776457] [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
> [ 759.776465] [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
> [ 759.776475] [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
> [ 759.776484] [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
> [ 759.776493] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
> [ 759.776501] [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
> [ 759.776508] [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
> [ 759.776516] [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
...
> My simple cluster consists of 3 nodes in total. Each node is a Xen
> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
> reference, here is my configuration:
...
> My Ceph kernel client is another Xen domU node running the Linux
> kernel 3.2.11. I have also tried a native client with the same result.
> Please note that this bug happens only in the client side.
> Your help would be greatly appreciated.
Your backtrace includes Xen code -- can you reproduce this bug with a mainline kernel, without Xen at all?

Also, the error reported comes from NX (no-execute) page protection. It would be useful to know what happens without NX.
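The NX question can be checked against the number printed after "Oops:" in a page-fault report. On x86 that number is the page-fault error code: bit 0 set means a protection violation on a present page (clear means the page was not present), bit 1 a write, bit 2 a user-mode access, bit 3 a reserved bit set in a paging entry, and bit 4 an instruction fetch. The small Python decoder below is a sketch for reading those reports; `decode_pf_error` is a hypothetical helper, and it only applies to page-fault oopses, not to "general protection fault" or "invalid opcode" reports.

```python
# Bit meanings of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation on a present page", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Decode the hex value printed after "Oops:" in a page-fault oops."""
    value = int(code, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if value & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


# Error codes that appear in the reports in this thread.
for code in ("0011", "0002", "0000"):
    print(code, decode_pf_error(code))
```

Read this way, "Oops: 0011" is a kernel-mode instruction fetch from a page that is present but not executable, which is exactly the case NX catches; disabling NX would let such a fetch proceed rather than remove whatever put the CPU at that address.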
* Re: Ceph kernel client - kernel crashes
From: Giorgos Kappes @ 2012-05-10 18:00 UTC
To: Tommi Virtanen; +Cc: ceph-devel

Sorry for my late response. I reproduced the bug above with Linux kernel 3.3.4, without Xen:

uname -a
Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux

The trace is shown below:
----------------------------------------------------
[ 763.984023] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 763.984177] BUG: unable to handle kernel paging request at ffff880037bd0800
[ 763.984402] IP: [<ffff880037bd0800>] 0xffff880037bd07ff
[ 763.984568] PGD 1806063 PUD 180a063 PMD 8000000037a001e3
[ 763.984845] Oops: 0011 [#1] SMP
[ 763.985058] CPU 3
[ 763.985124] Modules linked in: cbc netconsole loop snd_pcm snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[ 763.988002]
[ 763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[ 763.988002] RIP: 0010:[<ffff880037bd0800>] [<ffff880037bd0800>] 0xffff880037bd07ff
[ 763.988002] RSP: 0018:ffff8800bfcc3e78 EFLAGS: 00010292
[ 763.988002] RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800
[ 763.988002] RDX: ffff880037bd1600 RSI: 00000000b9b6a040 RDI: ffff880037bd1600
[ 763.988002] RBP: ffffffff81820080 R08: ffff8800b9dd0b00 R09: 000000018020001c
[ 763.988002] R10: 000000008020001c R11: ffffffff816075c0 R12: ffff8800bfcce7a0
[ 763.988002] R13: ffff8800b97745b0 R14: 0000000000000003 R15: 000000000000000a
[ 763.988002] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000) knlGS:0000000000000000
[ 763.988002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 763.988002] CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0
[ 763.988002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 763.988002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 763.988002] Process swapper/3 (pid: 0, threadinfo ffff8800bbae0000, task ffff8800bbad8000)
[ 763.988002] Stack:
[ 763.988002] ffffffff8109b44d ffff8800bbacd820 ffff8800b97745b0 ffff8800bbae0010
[ 763.988002] ffff8800bbad8000 ffff8800bfcc3ea0 0000000000000048 ffff8800bbae1fd8
[ 763.988002] 0000000000000100 0000000000000001 0000000000000009 ffff8800bbae1fd8
[ 763.988002] Call Trace:
[ 763.988002] <IRQ>
[ 763.988002] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[ 763.988002] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[ 763.988002] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[ 763.988002] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[ 763.988002] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[ 763.988002] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[ 763.988002] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[ 763.988002] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[ 763.988002] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[ 763.988002] <EOI>
[ 763.988002] [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[ 763.988002] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[ 763.988002] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[ 763.988002] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[ 763.988002] [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
[ 763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00 88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42 b9 00
[ 763.988002] RIP [<ffff880037bd0800>] 0xffff880037bd07ff
[ 763.988002] RSP <ffff8800bfcc3e78>
[ 763.988002] CR2: ffff880037bd0800
[ 763.988002] ---[ end trace 614049dc850267ac ]---
[ 763.988002] Kernel panic - not syncing: Fatal exception in interrupt
[ 763.997833] ------------[ cut here ]------------
[ 763.997936] WARNING: at arch/x86/kernel/smp.c:120 update_process_times+0x57/0x63()
[ 763.998072] Hardware name: ProLiant DL160 G5
[ 763.998171] Modules linked in: cbc netconsole loop snd_pcm snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[ 764.001205] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1
[ 764.001311] Call Trace:
[ 764.001404] <IRQ> [<ffffffff81038bb0>] ? warn_slowpath_common+0x78/0x8c
[ 764.001573] [<ffffffff81044937>] ? update_process_times+0x57/0x63
[ 764.001681] [<ffffffff81075dbe>] ? tick_sched_timer+0x65/0x8b
[ 764.001788] [<ffffffff810561bd>] ? __run_hrtimer+0xb2/0x13d
[ 764.001832] [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
[ 764.001832] [<ffffffff81056482>] ? hrtimer_interrupt+0xd8/0x1a7
[ 764.001832] [<ffffffff81025c5c>] ? smp_apic_timer_interrupt+0x80/0x93
[ 764.001832] [<ffffffff81025c89>] ? native_safe_apic_wait_icr_idle+0x1a/0x49
[ 764.001832] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[ 764.001832] [<ffffffff81056eaa>] ? up+0xe/0x36
[ 764.001832] [<ffffffff815ca3ec>] ? panic+0x189/0x1c9
[ 764.001832] [<ffffffff815ca353>] ? panic+0xf0/0x1c9
[ 764.001832] [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
[ 764.001832] [<ffffffff815cd05e>] ? oops_end+0xaa/0xb7
[ 764.001832] [<ffffffff8102eaca>] ? no_context+0x254/0x263
[ 764.001832] [<ffffffff815cf187>] ? do_page_fault+0x1ad/0x34c
[ 764.001832] [<ffffffff814c6b67>] ? __netif_receive_skb+0x44d/0x491
[ 764.001832] [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
[ 764.001832] [<ffffffff814c6f4f>] ? netif_receive_skb+0x71/0x77
[ 764.001832] [<ffffffff814c74bd>] ? napi_gro_receive+0x1f/0x2c
[ 764.001832] [<ffffffff814c7029>] ? napi_skb_finish+0x1c/0x31
[ 764.001832] [<ffffffffa008cc74>] ? tg3_poll_work+0x8f9/0xb66 [tg3]
[ 764.001832] [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
[ 764.001832] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[ 764.001832] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[ 764.001832] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[ 764.001832] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[ 764.001832] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[ 764.001832] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[ 764.001832] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[ 764.001832] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[ 764.001832] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[ 764.001832] <EOI> [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[ 764.001832] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[ 764.001832] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[ 764.001832] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[ 764.001832] [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
[ 764.001832] ---[ end trace 614049dc850267ad ]---
----------------------------------------------------

Also, as you suggested, I disabled NX by passing "noexec=off" on the kernel command line.
Unfortunately, the crash still happens:
----------------------------------------------------
[ 703.168022] BUG: unable to handle kernel paging request at ffff87ffbfa0e22b
[ 703.168293] IP: [<ffff8800b9767200>] 0xffff8800b97671ff
[ 703.168457] PGD 0
[ 703.168613] Oops: 0002 [#1] SMP
[ 703.168831] CPU 0
[ 703.168896] Modules linked in: cbc netconsole loop tpm_tis snd_pcm snd_timer snd soundcore shpchp pci_hotplug snd_page_alloc tpm i5400_edac rng_core tpm_bios edac_core i5k_amb processor pcspkr thermal_sys evdev button sd_mod crc_t10dif usbhid hid sg sr_mod cdrom ata_generic uhci_hcd piix ide_core ehci_hcd ata_piix tg3 libphy usbcore usb_common libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[ 703.172001]
[ 703.172001] Pid: 0, comm: swapper/0 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[ 703.172001] RIP: 0010:[<ffff8800b9767200>] [<ffff8800b9767200>] 0xffff8800b97671ff
[ 703.172001] RSP: 0018:ffff8800bfc03e78 EFLAGS: 00010292
[ 703.172001] RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200
[ 703.172001] RDX: ffff8800b92b9000 RSI: ffff8800b8901800 RDI: ffff8800b92b9000
[ 703.172001] RBP: ffffffff81820080 R08: ffff8800b8f77f00 R09: 000000018020000a
[ 703.172001] R10: 000000008020000a R11: ffff8800bfc0e600 R12: ffff8800bfc0e7a0
[ 703.172001] R13: ffff8800ba177370 R14: 0000000000000005 R15: 000000000000000a
[ 703.172001] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
[ 703.172001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 703.172001] CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0
[ 703.172001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 703.172001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 703.172001] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, task ffffffff8180d020)
[ 703.172001] Stack:
[ 703.172001] ffffffff8109b44d 0000000000000000 ffff880037a02900 ffffffff81800010
[ 703.172001] ffffffff8180d020 ffffffff81801fd8 0000000000000048 ffffffff81801fd8
[ 703.172001] 0000000000000100 0000000000000001 0000000000000009 ffffffff81801fd8
[ 703.172001] Call Trace:
[ 703.172001] <IRQ>
[ 703.172001] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[ 703.172001] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[ 703.172001] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[ 703.172001] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[ 703.172001] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[ 703.172001] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[ 703.172001] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[ 703.172001] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[ 703.172001] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[ 703.172001] <EOI>
[ 703.172001] [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[ 703.172001] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[ 703.172001] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[ 703.172001] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[ 703.172001] [<ffffffff818c1c06>] ? start_kernel+0x395/0x3a0
[ 703.172001] [<ffffffff818c13d1>] ? x86_64_start_kernel+0x102/0x10f
[ 703.172001] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 90 2b b9 00 88 ff ff 20 ac c1 bf 00 88 ff ff 20 98 a7 b8 00
[ 703.172001] RIP [<ffff8800b9767200>] 0xffff8800b97671ff
[ 703.172001] RSP <ffff8800bfc03e78>
[ 703.172001] CR2: ffff87ffbfa0e22b
[ 703.172001] ---[ end trace 15e08c2db2033830 ]---
[ 703.172001] Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------

The strange thing is that the crash traces do not contain any Ceph-related calls. However, the crash happens only when debootstrap installs a base Debian system into a Ceph directory; debootstrap completes successfully when the target directory is on NFS or on a local file system.
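One way to compare the three reports is to line up the instruction pointer (RIP) with the faulting address (CR2) and with RCX. The sketch below pulls those registers out of the register dumps with regular expressions; the excerpts are copied from the reports in this thread, the regexes assume exactly this x86-64 oops format, and `registers` is a hypothetical helper. In both NX reports RIP, RCX and CR2 are equal, so the fault was on fetching the very instruction the CPU had just been sent to; with noexec=off, RIP again equals RCX (a heap address) but CR2 differs, which suggests the CPU executed those bytes as code and the resulting write faulted. Since the top stack slot in each case is a return address inside __rcu_process_callbacks, this is consistent with, though not proof of, a bad callback pointer reaching RCU.

```python
import re

# Register-dump excerpts copied from the three reports in this thread.
OOPSES = {
    "3.2.11/Xen, NX on": """
RIP: e030:[<ffffe8fffffe4ab0>] [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
RAX: ffff880012d7a900 RBX: ffff88001ffb5960 RCX: ffffe8fffffe4ab0
CR2: ffffe8fffffe4ab0 CR3: 0000000012e28000 CR4: 0000000000002660
""",
    "3.3.4, NX on": """
RIP: 0010:[<ffff880037bd0800>] [<ffff880037bd0800>] 0xffff880037bd07ff
RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800
CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0
""",
    "3.3.4, noexec=off": """
RIP: 0010:[<ffff8800b9767200>] [<ffff8800b9767200>] 0xffff8800b97671ff
RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200
CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0
""",
}


def registers(text):
    """Return RIP, RCX and CR2 from an x86-64 oops register dump as integers."""
    rip = re.search(r"RIP: [0-9a-f]{4}:\[<([0-9a-f]{16})>\]", text).group(1)
    rcx = re.search(r"RCX: ([0-9a-f]{16})", text).group(1)
    cr2 = re.search(r"CR2: ([0-9a-f]{16})", text).group(1)
    return {"rip": int(rip, 16), "rcx": int(rcx, 16), "cr2": int(cr2, 16)}


for label, text in OOPSES.items():
    regs = registers(text)
    print(f"{label}: RIP==RCX {regs['rip'] == regs['rcx']}, "
          f"RIP==CR2 {regs['rip'] == regs['cr2']}")
```

This is only a reading aid for the logs; it does not identify which code freed or overwrote the callback.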
Furthermore, a different crash occurs when I try to remove a non-empty Ceph directory:
******************************************************
root@node33:/mnt# rm debian -r
rm: cannot remove `debian/etc': Directory not empty
Write failed: Broken pipe
******************************************************
The crash trace is shown below:
----------------------------------------------------
[74576.543412] libceph: client0 fsid 9b3222ac-fce2-44eb-8599-d39da02d2393
[74576.651197] libceph: mon0 192.168.2.254:6789 session established
[75143.963663] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[75143.963771] IP: [<ffffffff811061cd>] path_init+0x218/0x2cc
[75143.963827] PGD 37a63067 PUD 37b4b067 PMD 0
[75143.963880] Oops: 0000 [#1] SMP
[75143.963928] CPU 3
[75143.963935] Modules linked in: cbc netconsole loop i5400_edac snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: netconsole]
[75143.964390]
[75143.964426] Pid: 3861, comm: rm Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[75143.964485] RIP: 0010:[<ffffffff811061cd>] [<ffffffff811061cd>] path_init+0x218/0x2cc
[75143.964570] RSP: 0018:ffff880037b45d58 EFLAGS: 00010202
[75143.964618] RAX: 0000000000000000 RBX: ffff8800b8975000 RCX: ffff880037b45ea8
[75143.964672] RDX: ffff8800b929a900 RSI: ffff880037b45d74 RDI: ffff8800b9d0e830
[75143.964727] RBP: 0000000000000050 R08: ffff880037b45de0 R09: 0000000000000000
[75143.964781] R10: 0000006e7265746c R11: 0000000001406d90 R12: ffff880037b45ea8
[75143.964835] R13: ffff8800b929a900 R14: ffff880037b45de0 R15: 0000000000000003
[75143.964890] FS: 00007fb7a9b47700(0000) GS:ffff8800bfcc0000(0000) knlGS:0000000000000000
[75143.964974] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75143.965023] CR2: 0000000000000030 CR3: 00000000379d5000 CR4: 00000000000006e0
[75143.965078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75143.965132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75143.965187] Process rm (pid: 3861, threadinfo ffff880037b44000, task ffff8800b91fed00)
[75143.965269] Stack:
[75143.965306] 00000000b9f76000 000000005088218e ffff8800b9f76280 00000000b9f76280
[75143.965397] ffff880037b45ea8 0000000000000050 ffff8800b8975000 0000000000000010
[75143.965489] 00000000013f2030 ffffffff811072b4 ffffffff81060362 dead000000100100
[75143.965581] Call Trace:
[75143.965622] [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
[75143.965674] [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
[75143.965725] [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
[75143.965775] [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
[75143.965826] [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
[75143.965877] [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
[75143.965927] [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
[75143.965977] [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
[75143.966031] [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
[75143.966082] [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
[75143.966133] Code: 04 01 e9 a1 00 00 00 48 8d 74 24 1c e8 ae 6f ff ff 49 89 c5 b8 f7 ff ff ff 4d 85 ed 0f 84 b0 00 00 00 49 8b 45 18 80 3b 00 74 28 <48> 8b 78 30 b8 ec ff ff ff 0f b7 17 81 e2 00 f0 00 00 81 fa 00
[75143.966507] RIP [<ffffffff811061cd>] path_init+0x218/0x2cc
[75143.966558] RSP <ffff880037b45d58>
[75143.966600] CR2: 0000000000000030
[75143.967124] ---[ end trace 18e2f523c5af9a38 ]---
[75143.967322] general protection fault: 0000 [#2] SMP
[75143.967542] CPU 3
[75143.967607] Modules linked in: cbc netconsole loop i5400_edac snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: netconsole]
[75143.970715]
[75143.970805] Pid: 3861, comm: rm Tainted: G D 3.3.4 #1 HP ProLiant DL160 G5
[75143.971058] RIP: 0010:[<ffffffff810fa947>] [<ffffffff810fa947>] filp_close+0x2d/0x6c
[75143.971085] RSP: 0018:ffff880037b45a48 EFLAGS: 00010206
[75143.971085] RAX: 0012080800000000 RBX: ffff8800b910d300 RCX: 0000000000000000
[75143.971085] RDX: 0000000000000000 RSI: ffff8800b9712c00 RDI: ffff8800b910d300
[75143.971085] RBP: ffff8800b9712c00 R08: 0000000000016870 R09: 00007fff71600000
[75143.971085] R10: 0000000000000001 R11: ffff880037b459a8 R12: 0000000000000000
[75143.971085] R13: ffff8800bb51f6c0 R14: 0000000000000004 R15: 0000000000000000
[75143.971085] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000) knlGS:0000000000000000
[75143.971085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75143.971085] CR2: 0000000000000030 CR3: 0000000001805000 CR4: 00000000000006e0
[75143.971085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75143.971085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75143.971085] Process rm (pid: 3861, threadinfo ffff880037b44000, task ffff8800b91fed00)
[75143.971085] Stack:
[75143.971085] ffff8800b9712c00 0000000000000007 0000000000000000 ffffffff8103aaad
[75143.971085] 0000000000000009 ffff8800b91fed00 ffff8800b91ff218 0000000000000009
[75143.971085] ffff8800b8ca0700 ffff8800b92c0880 0000000000000001 ffffffff8103bfe4
[75143.971085] Call Trace:
[75143.971085] [<ffffffff8103aaad>] ? put_files_struct+0x67/0xbf
[75143.971085] [<ffffffff8103bfe4>] ? do_exit+0x2aa/0x7e1
[75143.971085] [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
[75143.971085] [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
[75143.971085] [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
[75143.971085] [<ffffffff8102eaca>] ? no_context+0x254/0x263
[75143.971085] [<ffffffff81368f16>] ? ceph_writepages_start+0xbb4/0xbee
[75143.971085] [<ffffffff815cf1ef>] ? do_page_fault+0x215/0x34c
[75143.971085] [<ffffffff8136a1e5>] ? __cap_is_valid+0x19/0x9a
[75143.971085] [<ffffffff8136ba47>] ? ceph_encode_inode_release+0xed/0x2b2
[75143.971085] [<ffffffff81063c10>] ? update_curr+0xfb/0x130
[75143.971085] [<ffffffff8100d6fe>] ? __switch_to+0x20b/0x35f
[75143.971085] [<ffffffff81063c10>] ? update_curr+0xfb/0x130
[75143.971085] [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
[75143.971085] [<ffffffff811061cd>] ? path_init+0x218/0x2cc
[75143.971085] [<ffffffff811061b3>] ? path_init+0x1fe/0x2cc
[75143.971085] [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
[75143.971085] [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
[75143.971085] [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
[75143.971085] [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
[75143.971085] [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
[75143.971085] [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
[75143.971085] [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
[75143.971085] [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
[75143.971085] [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
[75143.971085] [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
[75143.971085] Code: 55 48 89 f5 53 48 89 fb 48 8b 47 30 48 85 c0 75 11 48 c7 c7 6e d5 72 81 45 31 e4 e8 f0 fa 4c 00 eb 40 48 8b 47 20 48 85 c0 74 10 <48> 8b 40 60 48 85 c0 74 07 ff d0 41 89 c4 eb 03 45 31 e4 f6 43
[75143.971085] RIP [<ffffffff810fa947>] filp_close+0x2d/0x6c
[75143.971085] RSP <ffff880037b45a48>
[75143.988721] ---[ end trace 18e2f523c5af9a39 ]---
[75143.988826] Fixing recursive fault but reboot is needed!
[75146.018276] ------------[ cut here ]------------
[75146.018399] kernel BUG at mm/slub.c:3442!
[75146.018498] invalid opcode: 0000 [#3] SMP
[75146.018718] CPU 1
[75146.018789] Modules linked in: cbc netconsole loop i5400_edac snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: netconsole]
[75146.021908]
[75146.021999] Pid: 1137, comm: kworker/1:0 Tainted: G D 3.3.4 #1 HP ProLiant DL160 G5
[75146.022236] RIP: 0010:[<ffffffff810f55df>] [<ffffffff810f55df>] kfree+0x59/0xc2
[75146.022236] RSP: 0018:ffff8800bbb35b90 EFLAGS: 00010246
[75146.022236] RAX: 0100000000000400 RBX: ffff8800bfcdab70 RCX: ffff8800b9f762c8
[75146.022236] RDX: ffff8800bfc4e550 RSI: 0000000000000000 RDI: ffffea0002ff3680
[75146.022236] RBP: ffffffff815a2a50 R08: 0000000000000000 R09: ffffffff814f9620
[75146.022236] R10: 000000000000000d R11: ffff8800b9d0d000 R12: ffffffff815a2a47
[75146.022236] R13: ffff8800ba15d400 R14: ffff8800b9d0d271 R15: ffff8800b9d0d000
[75146.022236] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000) knlGS:0000000000000000
[75146.022236] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75146.022236] CR2: ffffffffff600400 CR3: 0000000037b13000 CR4: 00000000000006e0
[75146.022236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75146.022236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75146.022236] Process kworker/1:0 (pid: 1137, threadinfo ffff8800bbb34000, task ffff8800b90b0000)
[75146.022236] Stack:
[75146.022236] ffff8800b8fde500 ffffffff815a2a50 ffff8800b9f72800 ffffffff815a2a47
[75146.022236] ffff8800b8fde588 ffffffff81372b8e 0000001b00004040 ffff8800b9f72a68
[75146.022236] ffff8800b9f72800 ffffffff8137513d ffff8800ba15d400 ffff8800b9f72a68
[75146.022236] Call Trace:
[75146.022236] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.022236] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.022236] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.022236] [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
[75146.022236] [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
[75146.022236] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.022236] [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
[75146.022236] [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
[75146.022236] [<ffffffff813dce42>] ? crc32c+0x56/0x7c
[75146.022236] [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
[75146.022236] [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
[75146.022236] [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
[75146.022236] [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
[75146.022236] [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
[75146.022236] [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
[75146.022236] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.022236] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.022236] [<ffffffff81052b82>] ? kthread+0x81/0x89
[75146.022236] [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
[75146.022236] [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
[75146.022236] [<ffffffff815d3a20>] ? gs_change+0x13/0x13
[75146.022236] Code: 00 48 83 c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 76 7d 48 89 df e8 ad de ff ff 48 89 c7 48 8b 00 84 c0 78 14 66 f7 07 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 e8 03 fd ff 4c 8b 54 24 18 4c 8b 4f
[75146.022236] RIP [<ffffffff810f55df>] kfree+0x59/0xc2
[75146.022236] RSP <ffff8800bbb35b90>
[75146.031675] ---[ end trace 18e2f523c5af9a3a ]---
[75146.031809] BUG: unable to handle kernel paging request at fffffffffffffff8
[75146.032058] IP: [<ffffffff81052783>] kthread_data+0x7/0xc
[75146.032221] PGD 1807067 PUD 1808067 PMD 0
[75146.032494] Oops: 0000 [#4] SMP
[75146.032706] CPU 1
[75146.032771] Modules linked in: cbc netconsole loop i5400_edac snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: netconsole]
[75146.035616]
[75146.035616] Pid: 1137, comm: kworker/1:0 Tainted: G D 3.3.4 #1 HP ProLiant DL160 G5
[75146.035616] RIP: 0010:[<ffffffff81052783>] [<ffffffff81052783>] kthread_data+0x7/0xc
[75146.035616] RSP: 0018:ffff8800bbb35900 EFLAGS: 00010002
[75146.035616] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[75146.035616] RDX: ffffffff819a87b0 RSI: 0000000000000001 RDI: ffff8800b90b0000
[75146.035616] RBP: ffff8800b90b0000 R08: 0000000000000400 R09: ffffffff81013c7c
[75146.035616] R10: ffff8800b90b0000 R11: ffff8800b90b0518 R12: ffff8800b90b02f8
[75146.035616] R13: ffff8800bbb359c8 R14: 0000000000000001 R15: 0000000000000001
[75146.035616] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000) knlGS:0000000000000000
[75146.035616] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75146.035616] CR2: fffffffffffffff8 CR3: 0000000037b13000 CR4: 00000000000006e0
[75146.035616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75146.035616] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75146.035616] Process kworker/1:0 (pid: 1137, threadinfo ffff8800bbb34000, task ffff8800b90b0000)
[75146.035616] Stack:
[75146.035616] ffffffff8104e2a2 ffff8800bfc53340 ffffffff815cb1d1 0000000000000001
[75146.035616] 0000000000000296 0000000000013340 ffff8800bbb35fd8 0000000000013340
[75146.035616] ffff8800bbb35fd8 0000000000013340 ffff8800b90b0000 0000000000013340
[75146.035616] Call Trace:
[75146.035616] [<ffffffff8104e2a2>] ? wq_worker_sleeping+0x8/0x82
[75146.035616] [<ffffffff815cb1d1>] ? __schedule+0x166/0x4fc
[75146.035616] [<ffffffff8103c517>] ? do_exit+0x7dd/0x7e1
[75146.035616] [<ffffffff815ca46c>] ? printk+0x40/0x4c
[75146.035616] [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
[75146.035616] [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
[75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616] [<ffffffff8100ef69>] ? do_invalid_op+0x8b/0x95
[75146.035616] [<ffffffff810f55df>] ? kfree+0x59/0xc2
[75146.035616] [<ffffffff81515b94>] ? inet_recvmsg+0x64/0x75
[75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616] [<ffffffff815d389b>] ? invalid_op+0x1b/0x20
[75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616] [<ffffffff814f9620>] ? tcp_recvmsg+0x773/0x95e
[75146.035616] [<ffffffff810f55df>] ? kfree+0x59/0xc2
[75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.035616] [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
[75146.035616] [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
[75146.035616] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.035616] [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
[75146.035616] [<ffffffff814b3d5e>] ?
kernel_recvmsg+0x34/0x3f [75146.035616] [<ffffffff813dce42>] ? crc32c+0x56/0x7c [75146.035616] [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f [75146.035616] [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8 [75146.035616] [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49 [75146.035616] [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4 [75146.035616] [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb [75146.035616] [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249 [75146.035616] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb [75146.035616] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb [75146.035616] [<ffffffff81052b82>] ? kthread+0x81/0x89 [75146.035616] [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10 [75146.035616] [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53 [75146.035616] [<ffffffff815d3a20>] ? gs_change+0x13/0x13 [75146.035616] Code: 41 5e 41 5f c3 41 bf ea ff ff ff eb 97 90 90 90 65 48 8b 04 25 c0 c6 00 00 48 8b 80 a0 02 00 00 8b 40 f0 c3 48 8b 87 a0 02 00 00 <48> 8b 40 f8 c3 48 3b 3d 51 5f 95 00 75 08 0f bf 87 6a 06 00 00 [75146.035616] RIP [<ffffffff81052783>] kthread_data+0x7/0xc [75146.035616] RSP <ffff8800bbb35900> [75146.035616] CR2: fffffffffffffff8 [75146.035616] ---[ end trace 18e2f523c5af9a3b ]--- [75146.035616] Fixing recursive fault but reboot is needed! [75206.036002] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies) [75206.036265] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1 [75206.036371] Call Trace: [75206.036464] <IRQ> [<ffffffff8109b7b3>] ? __rcu_pending+0x21a/0x336 [75206.036635] [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb [75206.036740] [<ffffffff8109b9cc>] ? rcu_check_callbacks+0xa7/0xe7 [75206.036846] [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb [75206.036951] [<ffffffff81044911>] ? 
update_process_times+0x31/0x63 ---------------------------------------------------- Thanks a lot, Giorgos Kappes On Tue, May 8, 2012 at 10:18 PM, Tommi Virtanen <tv@inktank.com> wrote: > On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote: >> When I am running debootstrap to install a base Debian Squeeze system >> on a Ceph directory the client's kernel crashes with the following >> message: >> >> I: Extracting zlib1g... >> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc >> [ 759.776151] kernel tried to execute NX-protected page - exploit >> attempt? (uid: 0) >> [ 759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0 > ... >> [ 759.776438] [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8 >> [ 759.776447] [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56 >> [ 759.776457] [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0 >> [ 759.776465] [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54 >> [ 759.776475] [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205 >> [ 759.776484] [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30 >> [ 759.776493] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79 >> [ 759.776501] [<ffffffff8104c942>] ? irq_exit+0x44/0xb5 >> [ 759.776508] [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32 >> [ 759.776516] [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30 > ... >> My simple cluster consists of 3 nodes in total. Each node is a Xen >> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For >> reference, here is my configuration: > ... >> My Ceph kernel client is another Xen domU node running the Linux >> kernel 3.2.11. I have also tried a native client with the same result. >> Please note that this bug happens only on the client side. >> Your help would be greatly appreciated. > > Your backtrace includes Xen code in it -- can you reproduce this bug > with a mainline kernel, without Xen at all? > > Also, the error encountered is from the NX security subsystem.
It > would be nice to know what would happen without NX. ----------------------------------------------------------- Giorgos Kappes Website: http://www.cs.uoi.gr/~gkappes email: geokapp@gmail.com
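Tommi's observation that the backtrace contains Xen code can be checked mechanically. The sketch below is a hypothetical helper (`trace_frames` is an illustrative name, not an existing tool) that reads an oops log on standard input, extracts the function name from each call-trace entry of the form `[<address>] ? name+0xoffset/0xlength`, and prints, once each, the names that match an extended regular expression. It assumes the entry format seen in the traces in this thread, and it still works when several entries share one line, as they do in this archive.

```shell
#!/bin/sh
# trace_frames PATTERN
# Hypothetical helper: reads a kernel oops log on stdin and prints, once
# each, the call-trace symbol names that match the extended regex PATTERN.
# A call-trace entry is assumed to look like
#   [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
# (the "? " marker is optional) and several entries may share one line.
# Pipeline stages:
#   1. grep -o puts every "[<addr>] ? name+0xoff/0xlen" entry on its own line;
#   2. sed strips the address prefix and the +offset/length suffix;
#   3. grep keeps only names matching PATTERN; sort -u removes duplicates.
trace_frames() {
    grep -oE '\[<[0-9a-f]+>] (\? )?[A-Za-z0-9_.]+\+0x[0-9a-f]+/0x[0-9a-f]+' |
        sed -E 's/^\[<[0-9a-f]+>] (\? )?//; s/\+0x.*$//' |
        grep -E -- "$1" |
        sort -u
}

# Example (oops.txt is a placeholder for a saved console log):
#   trace_frames xen  < oops.txt
#   trace_frames ceph < oops.txt
```

Applied to the first trace in this thread, `trace_frames xen` would include `__xen_evtchn_do_upcall`, `xen_evtchn_do_upcall` and `xen_do_hypervisor_callback`. Because the helper matches only on symbol names, frames that appear to belong to the Ceph client but lack "ceph" in their names (for example `con_work` and `dispatch` in the kworker trace) need their own pattern.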
* Re: Ceph kernel client - kernel craches 2012-05-10 18:00 ` Giorgos Kappes @ 2012-05-17 22:49 ` Josh Durgin 0 siblings, 0 replies; 4+ messages in thread From: Josh Durgin @ 2012-05-17 22:49 UTC (permalink / raw) To: Giorgos Kappes; +Cc: ceph-devel Sorry your mail fell through the cracks before. I filed http://tracker.newdream.net/issues/2445 to track the ceph-related crashes. Alex, do you think the first crash is related to ceph at all? Josh On 05/10/2012 11:00 AM, Giorgos Kappes wrote: > Sorry for my late response. I reproduced the above bug with the Linux > kernel 3.3.4 and without using XEN: > > uname -a > Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux > > The trace is shown below: > > ---------------------------------------------------- > [ 763.984023] kernel tried to execute NX-protected page - exploit > attempt? (uid: 0) > [ 763.984177] BUG: unable to handle kernel paging request at ffff880037bd0800 > [ 763.984402] IP: [<ffff880037bd0800>] 0xffff880037bd07ff > [ 763.984568] PGD 1806063 PUD 180a063 PMD 8000000037a001e3 > [ 763.984845] Oops: 0011 [#1] SMP > [ 763.985058] CPU 3 > [ 763.985124] Modules linked in: cbc netconsole loop snd_pcm > snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac > tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys > button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod > cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core > usbcore usb_common tg3 libphy mptsas mptscsih mptbase > scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan] > [ 763.988002] > [ 763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5 > [ 763.988002] RIP: 0010:[<ffff880037bd0800>] [<ffff880037bd0800>] > 0xffff880037bd07ff > [ 763.988002] RSP: 0018:ffff8800bfcc3e78 EFLAGS: 00010292 > [ 763.988002] RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800 > [ 763.988002] RDX: ffff880037bd1600 RSI: 00000000b9b6a040 RDI: ffff880037bd1600 > [ 
763.988002] RBP: ffffffff81820080 R08: ffff8800b9dd0b00 R09: 000000018020001c > [ 763.988002] R10: 000000008020001c R11: ffffffff816075c0 R12: ffff8800bfcce7a0 > [ 763.988002] R13: ffff8800b97745b0 R14: 0000000000000003 R15: 000000000000000a > [ 763.988002] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000) > knlGS:0000000000000000 > [ 763.988002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 763.988002] CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0 > [ 763.988002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 763.988002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 763.988002] Process swapper/3 (pid: 0, threadinfo ffff8800bbae0000, > task ffff8800bbad8000) > [ 763.988002] Stack: > [ 763.988002] ffffffff8109b44d ffff8800bbacd820 ffff8800b97745b0 > ffff8800bbae0010 > [ 763.988002] ffff8800bbad8000 ffff8800bfcc3ea0 0000000000000048 > ffff8800bbae1fd8 > [ 763.988002] 0000000000000100 0000000000000001 0000000000000009 > ffff8800bbae1fd8 > [ 763.988002] Call Trace: > [ 763.988002]<IRQ> > [ 763.988002] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335 > [ 763.988002] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56 > [ 763.988002] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0 > [ 763.988002] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d > [ 763.988002] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30 > [ 763.988002] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79 > [ 763.988002] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1 > [ 763.988002] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93 > [ 763.988002] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80 > [ 763.988002]<EOI> > [ 763.988002] [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33 > [ 763.988002] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc > [ 763.988002] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc > [ 763.988002] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7 > [ 763.988002] [<ffffffff815c43c6>] ? 
start_secondary+0x270/0x275 > [ 763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00 > 88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00<00> 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42 > b9 00 > [ 763.988002] RIP [<ffff880037bd0800>] 0xffff880037bd07ff > [ 763.988002] RSP<ffff8800bfcc3e78> > [ 763.988002] CR2: ffff880037bd0800 > [ 763.988002] ---[ end trace 614049dc850267ac ]--- > [ 763.988002] Kernel panic - not syncing: Fatal exception in interrupt > [ 763.997833] ------------[ cut here ]------------ > [ 763.997936] WARNING: at arch/x86/kernel/smp.c:120 > update_process_times+0x57/0x63() > [ 763.998072] Hardware name: ProLiant DL160 G5 > [ 763.998171] Modules linked in: cbc netconsole loop snd_pcm > snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac > tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys > button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod > cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core > usbcore usb_common tg3 libphy mptsas mptscsih mptbase > scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan] > [ 764.001205] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1 > [ 764.001311] Call Trace: > [ 764.001404]<IRQ> [<ffffffff81038bb0>] ? warn_slowpath_common+0x78/0x8c > [ 764.001573] [<ffffffff81044937>] ? update_process_times+0x57/0x63 > [ 764.001681] [<ffffffff81075dbe>] ? tick_sched_timer+0x65/0x8b > [ 764.001788] [<ffffffff810561bd>] ? __run_hrtimer+0xb2/0x13d > [ 764.001832] [<ffffffff81013ca9>] ? read_tsc+0x5/0x16 > [ 764.001832] [<ffffffff81056482>] ? hrtimer_interrupt+0xd8/0x1a7 > [ 764.001832] [<ffffffff81025c5c>] ? smp_apic_timer_interrupt+0x80/0x93 > [ 764.001832] [<ffffffff81025c89>] ? native_safe_apic_wait_icr_idle+0x1a/0x49 > [ 764.001832] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80 > [ 764.001832] [<ffffffff81056eaa>] ? up+0xe/0x36 > [ 764.001832] [<ffffffff815ca3ec>] ? 
panic+0x189/0x1c9 > [ 764.001832] [<ffffffff815ca353>] ? panic+0xf0/0x1c9 > [ 764.001832] [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef > [ 764.001832] [<ffffffff815cd05e>] ? oops_end+0xaa/0xb7 > [ 764.001832] [<ffffffff8102eaca>] ? no_context+0x254/0x263 > [ 764.001832] [<ffffffff815cf187>] ? do_page_fault+0x1ad/0x34c > [ 764.001832] [<ffffffff814c6b67>] ? __netif_receive_skb+0x44d/0x491 > [ 764.001832] [<ffffffff81013ca9>] ? read_tsc+0x5/0x16 > [ 764.001832] [<ffffffff814c6f4f>] ? netif_receive_skb+0x71/0x77 > [ 764.001832] [<ffffffff814c74bd>] ? napi_gro_receive+0x1f/0x2c > [ 764.001832] [<ffffffff814c7029>] ? napi_skb_finish+0x1c/0x31 > [ 764.001832] [<ffffffffa008cc74>] ? tg3_poll_work+0x8f9/0xb66 [tg3] > [ 764.001832] [<ffffffff815cc5f5>] ? page_fault+0x25/0x30 > [ 764.001832] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335 > [ 764.001832] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56 > [ 764.001832] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0 > [ 764.001832] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d > [ 764.001832] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30 > [ 764.001832] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79 > [ 764.001832] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1 > [ 764.001832] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93 > [ 764.001832] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80 > [ 764.001832]<EOI> [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33 > [ 764.001832] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc > [ 764.001832] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc > [ 764.001832] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7 > [ 764.001832] [<ffffffff815c43c6>] ? start_secondary+0x270/0x275 > [ 764.001832] ---[ end trace 614049dc850267ad ]--- > > ---------------------------------------------------- > > Also, as you noted, I disabled the NX bit by passing "noexec=off" to > the kernel. 
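Before the next trace is read as evidence that NX is not involved, it helps to confirm that `noexec=off` actually reached the running kernel. A minimal sketch, assuming a POSIX shell on the client: `nx_cmdline_status` is a hypothetical helper name, and the boot-log wording mentioned in its comments is an assumption, not something quoted in this thread.

```shell
#!/bin/sh
# nx_cmdline_status [CMDLINE]
# Hypothetical helper: reports whether a kernel command line contains the
# noexec=off option. With no argument it reads /proc/cmdline, i.e. the
# command line of the running kernel. It only inspects text; it cannot
# tell whether the kernel actually honoured the option.
nx_cmdline_status() {
    _cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so only a whole option matches (not e.g. noexec=offx).
    case " $_cmdline " in
        *" noexec=off "*) echo "noexec=off present" ;;
        *)                echo "noexec=off absent" ;;
    esac
}

# On the client, the boot log can then be cross-checked. On x86 kernels of
# this era a line starting with "NX (Execute Disable) protection:" is
# expected (exact wording assumed):
#   nx_cmdline_status
#   dmesg | grep 'NX (Execute Disable)'
```

Checking the command line rather than trusting the bootloader configuration catches the common case where the option was added to a config file that the booted entry does not use.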
> Unfortunately, the bug is still happening: > > ---------------------------------------------------- > [ 703.168022] BUG: unable to handle kernel paging request at ffff87ffbfa0e22b > [ 703.168293] IP: [<ffff8800b9767200>] 0xffff8800b97671ff > [ 703.168457] PGD 0 > [ 703.168613] Oops: 0002 [#1] SMP > [ 703.168831] CPU 0 > [ 703.168896] Modules linked in: cbc netconsole loop tpm_tis snd_pcm > snd_timer snd soundcore shpchp pci_hotplug snd_page_alloc tpm > i5400_edac rng_core tpm_bios edac_core i5k_amb processor pcspkr > thermal_sys evdev button sd_mod crc_t10dif usbhid hid sg sr_mod cdrom > ata_generic uhci_hcd piix ide_core ehci_hcd ata_piix tg3 libphy > usbcore usb_common libata mptsas mptscsih mptbase scsi_transport_sas > scsi_mod [last unloaded: scsi_wait_scan] > [ 703.172001] > [ 703.172001] Pid: 0, comm: swapper/0 Not tainted 3.3.4 #1 HP ProLiant DL160 G5 > [ 703.172001] RIP: 0010:[<ffff8800b9767200>] [<ffff8800b9767200>] > 0xffff8800b97671ff > [ 703.172001] RSP: 0018:ffff8800bfc03e78 EFLAGS: 00010292 > [ 703.172001] RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200 > [ 703.172001] RDX: ffff8800b92b9000 RSI: ffff8800b8901800 RDI: ffff8800b92b9000 > [ 703.172001] RBP: ffffffff81820080 R08: ffff8800b8f77f00 R09: 000000018020000a > [ 703.172001] R10: 000000008020000a R11: ffff8800bfc0e600 R12: ffff8800bfc0e7a0 > [ 703.172001] R13: ffff8800ba177370 R14: 0000000000000005 R15: 000000000000000a > [ 703.172001] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) > knlGS:0000000000000000 > [ 703.172001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 703.172001] CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0 > [ 703.172001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 703.172001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [ 703.172001] Process swapper/0 (pid: 0, threadinfo ffffffff81800000, > task ffffffff8180d020) > [ 703.172001] Stack: > [ 703.172001] ffffffff8109b44d 
0000000000000000 ffff880037a02900 > ffffffff81800010 > [ 703.172001] ffffffff8180d020 ffffffff81801fd8 0000000000000048 > ffffffff81801fd8 > [ 703.172001] 0000000000000100 0000000000000001 0000000000000009 > ffffffff81801fd8 > [ 703.172001] Call Trace: > [ 703.172001]<IRQ> > [ 703.172001] [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335 > [ 703.172001] [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56 > [ 703.172001] [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0 > [ 703.172001] [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d > [ 703.172001] [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30 > [ 703.172001] [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79 > [ 703.172001] [<ffffffff8103e186>] ? irq_exit+0x44/0xb1 > [ 703.172001] [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93 > [ 703.172001] [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80 > [ 703.172001]<EOI> > [ 703.172001] [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33 > [ 703.172001] [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc > [ 703.172001] [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc > [ 703.172001] [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7 > [ 703.172001] [<ffffffff818c1c06>] ? start_kernel+0x395/0x3a0 > [ 703.172001] [<ffffffff818c13d1>] ? x86_64_start_kernel+0x102/0x10f > [ 703.172001] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00<00> 90 2b b9 00 88 ff ff 20 ac c1 bf 00 88 ff ff 20 98 a7 > b8 00 > [ 703.172001] RIP [<ffff8800b9767200>] 0xffff8800b97671ff > [ 703.172001] RSP<ffff8800bfc03e78> > [ 703.172001] CR2: ffff87ffbfa0e22b > [ 703.172001] ---[ end trace 15e08c2db2033830 ]--- > [ 703.172001] Kernel panic - not syncing: Fatal exception in interrupt > ---------------------------------------------------- > > The strange thing is that the crash traces do not contain any calls > related to Ceph.
> However, this bug only happens when running debootstrap to install a > base Debian system > into a Ceph directory. Debootstrap completes successfully when the > target directory is > under NFS or on a local file system. > > Furthermore, a different crash occurs when trying to remove a > non-empty Ceph directory: > ****************************************************** > root@node33:/mnt# rm debian -r > rm: cannot remove `debian/etc': Directory not empty > Write failed: Broken pipe > ****************************************************** > The crash trace is shown below: > > ---------------------------------------------------- > > [74576.543412] libceph: client0 fsid 9b3222ac-fce2-44eb-8599-d39da02d2393 > [74576.651197] libceph: mon0 192.168.2.254:6789 session established > [75143.963663] BUG: unable to handle kernel NULL pointer dereference > at 0000000000000030 > [75143.963771] IP: [<ffffffff811061cd>] path_init+0x218/0x2cc > [75143.963827] PGD 37a63067 PUD 37b4b067 PMD 0 > [75143.963880] Oops: 0000 [#1] SMP > [75143.963928] CPU 3 > [75143.963935] Modules linked in: cbc netconsole loop i5400_edac > snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc > tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp > pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom > ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore > usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas > scsi_mod [last unloaded: netconsole] > [75143.964390] > [75143.964426] Pid: 3861, comm: rm Not tainted 3.3.4 #1 HP ProLiant DL160 G5 > [75143.964485] RIP: 0010:[<ffffffff811061cd>] [<ffffffff811061cd>] > path_init+0x218/0x2cc > [75143.964570] RSP: 0018:ffff880037b45d58 EFLAGS: 00010202 > [75143.964618] RAX: 0000000000000000 RBX: ffff8800b8975000 RCX: ffff880037b45ea8 > [75143.964672] RDX: ffff8800b929a900 RSI: ffff880037b45d74 RDI: ffff8800b9d0e830 > [75143.964727] RBP: 0000000000000050 R08: ffff880037b45de0 R09: 0000000000000000 
> [75143.964781] R10: 0000006e7265746c R11: 0000000001406d90 R12: ffff880037b45ea8 > [75143.964835] R13: ffff8800b929a900 R14: ffff880037b45de0 R15: 0000000000000003 > [75143.964890] FS: 00007fb7a9b47700(0000) GS:ffff8800bfcc0000(0000) > knlGS:0000000000000000 > [75143.964974] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [75143.965023] CR2: 0000000000000030 CR3: 00000000379d5000 CR4: 00000000000006e0 > [75143.965078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [75143.965132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [75143.965187] Process rm (pid: 3861, threadinfo ffff880037b44000, > task ffff8800b91fed00) > [75143.965269] Stack: > [75143.965306] 00000000b9f76000 000000005088218e ffff8800b9f76280 > 00000000b9f76280 > [75143.965397] ffff880037b45ea8 0000000000000050 ffff8800b8975000 > 0000000000000010 > [75143.965489] 00000000013f2030 ffffffff811072b4 ffffffff81060362 > dead000000100100 > [75143.965581] Call Trace: > [75143.965622] [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b > [75143.965674] [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5 > [75143.965725] [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a > [75143.965775] [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f > [75143.965826] [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c > [75143.965877] [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e > [75143.965927] [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7 > [75143.965977] [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f > [75143.966031] [<ffffffff810fa97e>] ? filp_close+0x64/0x6c > [75143.966082] [<ffffffff815d2679>] ? 
system_call_fastpath+0x16/0x1b > [75143.966133] Code: 04 01 e9 a1 00 00 00 48 8d 74 24 1c e8 ae 6f ff > ff 49 89 c5 b8 f7 ff ff ff 4d 85 ed 0f 84 b0 00 00 00 49 8b 45 18 80 > 3b 00 74 28<48> 8b 78 30 b8 ec ff ff ff 0f b7 17 81 e2 00 f0 00 00 81 > fa 00 > [75143.966507] RIP [<ffffffff811061cd>] path_init+0x218/0x2cc > [75143.966558] RSP<ffff880037b45d58> > [75143.966600] CR2: 0000000000000030 > [75143.967124] ---[ end trace 18e2f523c5af9a38 ]--- > [75143.967322] general protection fault: 0000 [#2] SMP > [75143.967542] CPU 3 > [75143.967607] Modules linked in: cbc netconsole loop i5400_edac > snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc > tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp > pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom > ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore > usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas > scsi_mod [last unloaded: netconsole] > [75143.970715] > [75143.970805] Pid: 3861, comm: rm Tainted: G D 3.3.4 #1 HP > ProLiant DL160 G5 > [75143.971058] RIP: 0010:[<ffffffff810fa947>] [<ffffffff810fa947>] > filp_close+0x2d/0x6c > [75143.971085] RSP: 0018:ffff880037b45a48 EFLAGS: 00010206 > [75143.971085] RAX: 0012080800000000 RBX: ffff8800b910d300 RCX: 0000000000000000 > [75143.971085] RDX: 0000000000000000 RSI: ffff8800b9712c00 RDI: ffff8800b910d300 > [75143.971085] RBP: ffff8800b9712c00 R08: 0000000000016870 R09: 00007fff71600000 > [75143.971085] R10: 0000000000000001 R11: ffff880037b459a8 R12: 0000000000000000 > [75143.971085] R13: ffff8800bb51f6c0 R14: 0000000000000004 R15: 0000000000000000 > [75143.971085] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000) > knlGS:0000000000000000 > [75143.971085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [75143.971085] CR2: 0000000000000030 CR3: 0000000001805000 CR4: 00000000000006e0 > [75143.971085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [75143.971085] DR3: 
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > [75143.971085] Process rm (pid: 3861, threadinfo ffff880037b44000, > task ffff8800b91fed00) > [75143.971085] Stack: > [75143.971085] ffff8800b9712c00 0000000000000007 0000000000000000 > ffffffff8103aaad > [75143.971085] 0000000000000009 ffff8800b91fed00 ffff8800b91ff218 > 0000000000000009 > [75143.971085] ffff8800b8ca0700 ffff8800b92c0880 0000000000000001 > ffffffff8103bfe4 > [75143.971085] Call Trace: > [75143.971085] [<ffffffff8103aaad>] ? put_files_struct+0x67/0xbf > [75143.971085] [<ffffffff8103bfe4>] ? do_exit+0x2aa/0x7e1 > [75143.971085] [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef > [75143.971085] [<ffffffff815cd01a>] ? oops_end+0x66/0xb7 > [75143.971085] [<ffffffff815cd066>] ? oops_end+0xb2/0xb7 > [75143.971085] [<ffffffff8102eaca>] ? no_context+0x254/0x263 > [75143.971085] [<ffffffff81368f16>] ? ceph_writepages_start+0xbb4/0xbee > [75143.971085] [<ffffffff815cf1ef>] ? do_page_fault+0x215/0x34c > [75143.971085] [<ffffffff8136a1e5>] ? __cap_is_valid+0x19/0x9a > [75143.971085] [<ffffffff8136ba47>] ? ceph_encode_inode_release+0xed/0x2b2 > [75143.971085] [<ffffffff81063c10>] ? update_curr+0xfb/0x130 > [75143.971085] [<ffffffff8100d6fe>] ? __switch_to+0x20b/0x35f > [75143.971085] [<ffffffff81063c10>] ? update_curr+0xfb/0x130 > [75143.971085] [<ffffffff815cc5f5>] ? page_fault+0x25/0x30 > [75143.971085] [<ffffffff811061cd>] ? path_init+0x218/0x2cc > [75143.971085] [<ffffffff811061b3>] ? path_init+0x1fe/0x2cc > [75143.971085] [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b > [75143.971085] [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5 > [75143.971085] [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a > [75143.971085] [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f > [75143.971085] [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c > [75143.971085] [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e > [75143.971085] [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7 > [75143.971085] [<ffffffff8112a9a5>] ? 
fsnotify_find_inode_mark+0x23/0x2f > [75143.971085] [<ffffffff810fa97e>] ? filp_close+0x64/0x6c > [75143.971085] [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b > [75143.971085] Code: 55 48 89 f5 53 48 89 fb 48 8b 47 30 48 85 c0 75 > 11 48 c7 c7 6e d5 72 81 45 31 e4 e8 f0 fa 4c 00 eb 40 48 8b 47 20 48 > 85 c0 74 10<48> 8b 40 60 48 85 c0 74 07 ff d0 41 89 c4 eb 03 45 31 e4 > f6 43 > [75143.971085] RIP [<ffffffff810fa947>] filp_close+0x2d/0x6c > [75143.971085] RSP<ffff880037b45a48> > [75143.988721] ---[ end trace 18e2f523c5af9a39 ]--- > [75143.988826] Fixing recursive fault but reboot is needed! > [75146.018276] ------------[ cut here ]------------ > [75146.018399] kernel BUG at mm/slub.c:3442! > [75146.018498] invalid opcode: 0000 [#3] SMP > [75146.018718] CPU 1 > [75146.018789] Modules linked in: cbc netconsole loop i5400_edac > snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc > tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp > pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom > ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore > usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas > scsi_mod [last unloaded: netconsole] > [75146.021908] > [75146.021999] Pid: 1137, comm: kworker/1:0 Tainted: G D > 3.3.4 #1 HP ProLiant DL160 G5 > [75146.022236] RIP: 0010:[<ffffffff810f55df>] [<ffffffff810f55df>] > kfree+0x59/0xc2 > [75146.022236] RSP: 0018:ffff8800bbb35b90 EFLAGS: 00010246 > [75146.022236] RAX: 0100000000000400 RBX: ffff8800bfcdab70 RCX: ffff8800b9f762c8 > [75146.022236] RDX: ffff8800bfc4e550 RSI: 0000000000000000 RDI: ffffea0002ff3680 > [75146.022236] RBP: ffffffff815a2a50 R08: 0000000000000000 R09: ffffffff814f9620 > [75146.022236] R10: 000000000000000d R11: ffff8800b9d0d000 R12: ffffffff815a2a47 > [75146.022236] R13: ffff8800ba15d400 R14: ffff8800b9d0d271 R15: ffff8800b9d0d000 > [75146.022236] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000) > knlGS:0000000000000000 > 
[75146.022236] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.022236] CR2: ffffffffff600400 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.022236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.022236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.022236] Process kworker/1:0 (pid: 1137, threadinfo ffff8800bbb34000, task ffff8800b90b0000)
> [75146.022236] Stack:
> [75146.022236] ffff8800b8fde500 ffffffff815a2a50 ffff8800b9f72800 ffffffff815a2a47
> [75146.022236] ffff8800b8fde588 ffffffff81372b8e 0000001b00004040 ffff8800b9f72a68
> [75146.022236] ffff8800b9f72800 ffffffff8137513d ffff8800ba15d400 ffff8800b9f72a68
> [75146.022236] Call Trace:
> [75146.022236] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.022236] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.022236] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.022236] [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.022236] [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
> [75146.022236] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.022236] [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
> [75146.022236] [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
> [75146.022236] [<ffffffff813dce42>] ? crc32c+0x56/0x7c
> [75146.022236] [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.022236] [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
> [75146.022236] [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
> [75146.022236] [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
> [75146.022236] [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
> [75146.022236] [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
> [75146.022236] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.022236] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.022236] [<ffffffff81052b82>] ? kthread+0x81/0x89
> [75146.022236] [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
> [75146.022236] [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
> [75146.022236] [<ffffffff815d3a20>] ? gs_change+0x13/0x13
> [75146.022236] Code: 00 48 83 c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 76 7d 48 89 df e8 ad de ff ff 48 89 c7 48 8b 00 84 c0 78 14 66 f7 07 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 e8 03 fd ff 4c 8b 54 24 18 4c 8b 4f
> [75146.022236] RIP [<ffffffff810f55df>] kfree+0x59/0xc2
> [75146.022236] RSP <ffff8800bbb35b90>
> [75146.031675] ---[ end trace 18e2f523c5af9a3a ]---
> [75146.031809] BUG: unable to handle kernel paging request at fffffffffffffff8
> [75146.032058] IP: [<ffffffff81052783>] kthread_data+0x7/0xc
> [75146.032221] PGD 1807067 PUD 1808067 PMD 0
> [75146.032494] Oops: 0000 [#4] SMP
> [75146.032706] CPU 1
> [75146.032771] Modules linked in: cbc netconsole loop i5400_edac snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: netconsole]
> [75146.035616]
> [75146.035616] Pid: 1137, comm: kworker/1:0 Tainted: G D 3.3.4 #1 HP ProLiant DL160 G5
> [75146.035616] RIP: 0010:[<ffffffff81052783>] [<ffffffff81052783>] kthread_data+0x7/0xc
> [75146.035616] RSP: 0018:ffff8800bbb35900 EFLAGS: 00010002
> [75146.035616] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> [75146.035616] RDX: ffffffff819a87b0 RSI: 0000000000000001 RDI: ffff8800b90b0000
> [75146.035616] RBP: ffff8800b90b0000 R08: 0000000000000400 R09: ffffffff81013c7c
> [75146.035616] R10: ffff8800b90b0000 R11: ffff8800b90b0518 R12: ffff8800b90b02f8
> [75146.035616] R13: ffff8800bbb359c8 R14: 0000000000000001 R15: 0000000000000001
> [75146.035616] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000) knlGS:0000000000000000
> [75146.035616] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.035616] CR2: fffffffffffffff8 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.035616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.035616] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.035616] Process kworker/1:0 (pid: 1137, threadinfo ffff8800bbb34000, task ffff8800b90b0000)
> [75146.035616] Stack:
> [75146.035616] ffffffff8104e2a2 ffff8800bfc53340 ffffffff815cb1d1 0000000000000001
> [75146.035616] 0000000000000296 0000000000013340 ffff8800bbb35fd8 0000000000013340
> [75146.035616] ffff8800bbb35fd8 0000000000013340 ffff8800b90b0000 0000000000013340
> [75146.035616] Call Trace:
> [75146.035616] [<ffffffff8104e2a2>] ? wq_worker_sleeping+0x8/0x82
> [75146.035616] [<ffffffff815cb1d1>] ? __schedule+0x166/0x4fc
> [75146.035616] [<ffffffff8103c517>] ? do_exit+0x7dd/0x7e1
> [75146.035616] [<ffffffff815ca46c>] ? printk+0x40/0x4c
> [75146.035616] [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
> [75146.035616] [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
> [75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [<ffffffff8100ef69>] ? do_invalid_op+0x8b/0x95
> [75146.035616] [<ffffffff810f55df>] ? kfree+0x59/0xc2
> [75146.035616] [<ffffffff81515b94>] ? inet_recvmsg+0x64/0x75
> [75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [<ffffffff815d389b>] ? invalid_op+0x1b/0x20
> [75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [<ffffffff814f9620>] ? tcp_recvmsg+0x773/0x95e
> [75146.035616] [<ffffffff810f55df>] ? kfree+0x59/0xc2
> [75146.035616] [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.035616] [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.035616] [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
> [75146.035616] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.035616] [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
> [75146.035616] [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
> [75146.035616] [<ffffffff813dce42>] ? crc32c+0x56/0x7c
> [75146.035616] [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.035616] [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
> [75146.035616] [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
> [75146.035616] [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
> [75146.035616] [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
> [75146.035616] [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
> [75146.035616] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.035616] [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.035616] [<ffffffff81052b82>] ? kthread+0x81/0x89
> [75146.035616] [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
> [75146.035616] [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
> [75146.035616] [<ffffffff815d3a20>] ? gs_change+0x13/0x13
> [75146.035616] Code: 41 5e 41 5f c3 41 bf ea ff ff ff eb 97 90 90 90 65 48 8b 04 25 c0 c6 00 00 48 8b 80 a0 02 00 00 8b 40 f0 c3 48 8b 87 a0 02 00 00 <48> 8b 40 f8 c3 48 3b 3d 51 5f 95 00 75 08 0f bf 87 6a 06 00 00
> [75146.035616] RIP [<ffffffff81052783>] kthread_data+0x7/0xc
> [75146.035616] RSP <ffff8800bbb35900>
> [75146.035616] CR2: fffffffffffffff8
> [75146.035616] ---[ end trace 18e2f523c5af9a3b ]---
> [75146.035616] Fixing recursive fault but reboot is needed!
> [75206.036002] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 3, t=15002 jiffies)
> [75206.036265] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1
> [75206.036371] Call Trace:
> [75206.036464] <IRQ> [<ffffffff8109b7b3>] ? __rcu_pending+0x21a/0x336
> [75206.036635] [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
> [75206.036740] [<ffffffff8109b9cc>] ? rcu_check_callbacks+0xa7/0xe7
> [75206.036846] [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
> [75206.036951] [<ffffffff81044911>] ? update_process_times+0x31/0x63
>
> ----------------------------------------------------
>
> Thanks a lot,
> Giorgos Kappes
>
> On Tue, May 8, 2012 at 10:18 PM, Tommi Virtanen <tv@inktank.com> wrote:
>> On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote:
>>> When I am running debootstrap to install a base Debian Squeeze system
>>> on a Ceph directory the client's kernel crashes with the following
>>> message:
>>>
>>> I: Extracting zlib1g...
>>> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
>>> [ 759.776151] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>>> [ 759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
>> ...
>>> [ 759.776438] [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
>>> [ 759.776447] [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
>>> [ 759.776457] [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
>>> [ 759.776465] [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
>>> [ 759.776475] [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
>>> [ 759.776484] [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
>>> [ 759.776493] [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
>>> [ 759.776501] [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
>>> [ 759.776508] [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
>>> [ 759.776516] [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
>> ...
>>> My simple cluster consists of 3 nodes in total. Each node is a Xen
>>> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
>>> reference, here is my configuration:
>> ...
>>> My Ceph kernel client is another Xen domU node running the Linux
>>> kernel 3.2.11. I have also tried a native client with the same result.
>>> Please note that this bug happens only on the client side.
>>> Your help would be greatly appreciated.
>>
>> Your backtrace includes Xen code in it -- can you reproduce this bug
>> with a mainline kernel, without Xen at all?
>>
>> Also, the error encountered is from the NX security subsystem. It
>> would be nice to know what would happen without NX.
>
> -----------------------------------------------------------
> Giorgos Kappes
> Website: http://www.cs.uoi.gr/~gkappes
> email: geokapp@gmail.com
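The 3.3.4 trace from the HP ProLiant DL160 G5 and the earlier 3.2.11 Xen domU trace are easier to compare when only the function frames are pulled out of the raw oops text. Below is a minimal Python sketch for doing that. It is not part of Ceph or of the kernel's own tooling; the names `FRAME_RE`, `parse_frames` and `frames_matching` are hypothetical, and the regular expression assumes the `[<address>] ? symbol+0xoffset/0xsize` layout printed in these x86-64 traces.

```python
import re
from typing import List, Tuple

# Hypothetical helper: matches one frame in the layout used by the x86-64
# oopses quoted above, e.g.
#   [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
# The "? " marker (the kernel's flag for an unreliable frame) is optional,
# so "RIP [<...>] kfree+0x59/0xc2" lines are picked up as well.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][A-Za-z0-9_.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log: str) -> List[Tuple[str, int, int]]:
    """Return (symbol, offset, size) for every frame found in `log`, in order."""
    return [
        (m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16))
        for m in FRAME_RE.finditer(log)
    ]


def frames_matching(log: str, prefix: str) -> List[str]:
    """Return only the symbol names that start with `prefix`, in trace order."""
    return [sym for sym, _, _ in parse_frames(log) if sym.startswith(prefix)]


if __name__ == "__main__":
    # Four frames copied from the 3.3.4 trace quoted above.
    sample = """
[75146.022236] [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.022236] [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.022236] [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
[75146.022236] [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
"""
    print(frames_matching(sample, "ceph_"))
    print(parse_frames(sample)[0])
```

The sketch only splits out symbol names and offsets; it does not resolve addresses, so decoding a site such as `kfree+0x59` still needs the vmlinux and debug info that match the crashing kernel.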
Thread overview: 4+ messages
2012-05-08 15:43 Ceph kernel client - kernel craches Giorgos Kappes
2012-05-08 19:18 ` Tommi Virtanen
2012-05-10 18:00 ` Giorgos Kappes
2012-05-17 22:49 ` Josh Durgin