* Ceph kernel client - kernel craches
From: Giorgos Kappes @ 2012-05-08 15:43 UTC
  To: ceph-devel

Hi,

When I run debootstrap to install a base Debian Squeeze system into a
directory on a Ceph mount, the client's kernel crashes with the
following message:

I: Retrieving Release
I: Validating Packages
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Found additional required dependencies: insserv libbz2-1.0 libdb4.8 libslang2
I: Found additional base dependencies: libnfnetlink0 libsqlite3-0
I: Checking component main on http://ftp.us.debian.org/debian...
I: Validating libacl1
...
I: Extracting xz-utils...
I: Extracting zlib1g...
W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
[  759.776151] kernel tried to execute NX-protected page - exploit
attempt? (uid: 0)
[  759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
[  759.776182] IP: [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
[  759.776195] PGD c42b067 PUD c42c067 PMD c42d067 PTE 801000000c445067
[  759.776209] Oops: 0011 [#1] SMP
[  759.776219] CPU 0
[  759.776224] Modules linked in: pcspkr [last unloaded: scsi_wait_scan]
[  759.776237]
[  759.776244] Pid: 0, comm: swapper/0 Tainted: G        W    3.2.11 #2
[  759.776255] RIP: e030:[<ffffe8fffffe4ab0>]  [<ffffe8fffffe4ab0>]
0xffffe8fffffe4aaf
[  759.776267] RSP: e02b:ffff88001ffaae98  EFLAGS: 00010296
[  759.776274] RAX: ffff880012d7a900 RBX: ffff88001ffb5960 RCX: ffffe8fffffe4ab0
[  759.776302] RDX: ffff88000d1a9b00 RSI: 000000000000000f RDI: ffff88000d1a9b00
[  759.776309] RBP: ffffffff81c1fa80 R08: ffff88001eb74000 R09: 000000018010000f
[  759.776317] R10: 000000008010000f R11: ffffffff818055f5 R12: ffff88001ffb5990
[  759.776324] R13: ffff88000c5ea880 R14: 0000000000000001 R15: 000000000000000a
[  759.776334] FS:  00007f21095a4740(0000) GS:ffff88001ffa7000(0000)
knlGS:0000000000000000
[  759.776342] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  759.776349] CR2: ffffe8fffffe4ab0 CR3: 0000000012e28000 CR4: 0000000000002660
[  759.776356] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  759.776364] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  759.776372] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000,
task ffffffff81c0d020)
[  759.776379] Stack:
[  759.776384]  ffffffff81099405 0000000000000001 ffff880012d7a900
ffff88001ffaaeb0
[  759.776397]  0000000000000048 ffffffff81c01fd8 0000000000000100
0000000000000001
[  759.776409]  0000000000000009 ffffffff81c01fd8 ffffffff81099898
ffffffff81c01fd8
[  759.776422] Call Trace:
[  759.776427]  <IRQ>
[  759.776438]  [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
[  759.776447]  [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
[  759.776457]  [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
[  759.776465]  [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
[  759.776475]  [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[  759.776484]  [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
[  759.776493]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[  759.776501]  [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
[  759.776508]  [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
[  759.776516]  [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
[  759.776523]  <EOI>
[  759.776531]  [<ffffffff81006f3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[  759.776539]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[  759.776547]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[  759.776556]  [<ffffffff8163969b>] ? cpuidle_idle_call+0x16/0x1af
[  759.776564]  [<ffffffff810068dc>] ? xen_safe_halt+0xc/0x15
[  759.776572]  [<ffffffff810150a6>] ? default_idle+0x4b/0x84
[  759.776580]  [<ffffffff8100ddf6>] ? cpu_idle+0xb9/0xef
[  759.776588]  [<ffffffff81cf7bff>] ? start_kernel+0x395/0x3a0
[  759.776596]  [<ffffffff81cfa536>] ? xen_start_kernel+0x593/0x598
[  759.776602] Code: e8 ff ff 80 4a fe ff ff e8 ff ff 0b 00 00 00 01
00 00 00 fa ff ff ff fa ff ff ff 06 00 00 00 02 00 00 00 05 00 00 00
cc cc cc cc <00> 9b 1a 0d 00 88 ff ff 00 0f b7 1e 00 88 ff ff 01 00 00
00 00
[  759.776699] RIP  [<ffffe8fffffe4ab0>] 0xffffe8fffffe4aaf
[  759.776712]  RSP <ffff88001ffaae98>
[  759.776717] CR2: ffffe8fffffe4ab0
[  759.776725] ---[ end trace 36924001333caa12 ]---
[  759.776731] Kernel panic - not syncing: Fatal exception in interrupt
[  759.776739] Pid: 0, comm: swapper/0 Tainted: G      D W    3.2.11 #2
[  759.776745] Call Trace:
[  759.776749]  <IRQ>  [<ffffffff81764003>] ? panic+0x92/0x1a0
[  759.776771]  [<ffffffff810478c0>] ? kmsg_dump+0x41/0xdd
[  759.776779]  [<ffffffff81766cc1>] ? oops_end+0xa9/0xb6
[  759.776788]  [<ffffffff8102ec7d>] ? no_context+0x1ff/0x20c
[  759.776795]  [<ffffffff81768d9f>] ? do_page_fault+0x1ad/0x34c
[  759.776805]  [<ffffffff8106dfb3>] ? tick_nohz_handler+0xcb/0xcb
[  759.776813]  [<ffffffff8102c12a>] ? pvclock_clocksource_read+0x46/0xb4
[  759.776821]  [<ffffffff81006eb3>] ? xen_vcpuop_set_next_event+0x4d/0x61
[  759.776829]  [<ffffffff8106cdcc>] ? clockevents_program_event+0x99/0xb8
[  759.776837]  [<ffffffff817663b5>] ? page_fault+0x25/0x30
[  759.776845]  [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
[  759.776853]  [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
[  759.776861]  [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
[  759.776868]  [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
[  759.776876]  [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
[  759.776883]  [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
[  759.776891]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
[  759.776898]  [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
[  759.776905]  [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
[  759.776913]  [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
[  759.776919]  <EOI>  [<ffffffff81006f3f>] ?
xen_restore_fl_direct_reloc+0x4/0x4
[  759.776931]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[  759.780132]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[  759.780132]  [<ffffffff8163969b>] ? cpuidle_idle_call+0x16/0x1af
[  759.780132]  [<ffffffff810068dc>] ? xen_safe_halt+0xc/0x15
[  759.780132]  [<ffffffff810150a6>] ? default_idle+0x4b/0x84
[  759.780132]  [<ffffffff8100ddf6>] ? cpu_idle+0xb9/0xef
[  759.780132]  [<ffffffff81cf7bff>] ? start_kernel+0x395/0x3a0



My simple cluster consists of 3 nodes in total. Each node is a Xen
domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
reference, here is my configuration:

; -------------------------------------------------------------------------------------------
;
; ceph ceph.conf file.
;
; This file defines cluster membership, the various locations
; that Ceph stores data, and any other runtime options.

[global]
        ; enable secure authentication
        auth supported = cephx

        ; keyring placement
        keyring = /etc/ceph/$name.keyring
        ; allow ourselves to open a lot of files
        ; max open files = 131072

        ; set log file
        ; log file = /var/log/ceph/$name.log
        ; log_to_syslog = true        ; uncomment this line to log to syslog

        ; set up pid files
        ; pid file = /var/run/ceph/$name.pid

        ; If you want to run an IPv6 cluster, set this to true.
        ; Dual-stack isn't possible.
        ; ms bind ipv6 = true

; monitors
[mon]
        mon data = /mnt/store/$name

[mon.a]
        host = sm-ceph0
        mon addr = 192.168.2.254:6789

[mds]
        ; where the mds keeps its secret encryption keys
        ;keyring = /data/keyring.$name

[mds.a]
        host = sm-ceph0

[osd]
        ; This is where the btrfs volume will be mounted.
        osd data = /mnt/store/$name

        ; This is a file-based journal.
        osd journal = /mnt/store/$name/$name.journal
        osd journal size = 1000 ; journal size, in megabytes

        ; You can change the number of recovery operations to speed up recovery
        ; or slow it down if your machines can't handle it
        ; osd recovery max active = 3

[osd.0]
        host = sm-ceph0
        btrfs devs = /dev/xvda3

        ; If you want to specify some other mount options, you can do so.
        ; The default values are rw,noatime
        ; btrfs options = rw,noatime

[osd.1]
        host = sm-ceph1
        btrfs devs = /dev/xvda3

[osd.2]
        host = sm-ceph2
        btrfs devs = /dev/xvda3
; -------------------------------------------------------------------------------------------
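
To double-check which daemons a configuration like the one above places
on which host, a minimal Python sketch follows. It is only an
illustration: the SAMPLE text is a trimmed copy of the file above, the
helper name daemons_by_host is hypothetical, and Python's configparser is
used as a stand-in reader, not Ceph's own parser.

```python
import configparser

# Trimmed copy of the ceph.conf above (assumption: only the keys needed
# for this illustration are kept).
SAMPLE = """
[global]
        auth supported = cephx
[mon.a]
        host = sm-ceph0
        mon addr = 192.168.2.254:6789
[mds.a]
        host = sm-ceph0
[osd]
        osd journal size = 1000 ; journal size, in megabytes
[osd.0]
        host = sm-ceph0
        btrfs devs = /dev/xvda3
[osd.1]
        host = sm-ceph1
        btrfs devs = /dev/xvda3
[osd.2]
        host = sm-ceph2
        btrfs devs = /dev/xvda3
"""


def daemons_by_host(text):
    """Map each 'host' value to the list of sections that declare it."""
    # Strip indentation so configparser never treats a line as a
    # continuation of the previous value.
    flat = "\n".join(line.strip() for line in text.splitlines())
    parser = configparser.ConfigParser(
        comment_prefixes=(";", "#"),
        inline_comment_prefixes=(";",),
        interpolation=None,  # "$name" is a Ceph metavariable, leave it alone
    )
    parser.read_string(flat)
    hosts = {}
    for section in parser.sections():
        if parser.has_option(section, "host"):
            hosts.setdefault(parser.get(section, "host"), []).append(section)
    return hosts


if __name__ == "__main__":
    for host, daemons in sorted(daemons_by_host(SAMPLE).items()):
        print(host, ", ".join(daemons))
```

With the sample above, sm-ceph0 carries the monitor, the MDS and osd.0,
while sm-ceph1 and sm-ceph2 carry one OSD each.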
My Ceph kernel client is another Xen domU node running Linux kernel
3.2.11. I have also tried a native (non-Xen) client, with the same
result. Please note that this bug happens only on the client side.
Your help would be greatly appreciated.

Thanks,
Giorgos Kappes


-----------------------------------------------------------
Giorgos Kappes
Website: http://www.cs.uoi.gr/~gkappes
email: geokapp@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Ceph kernel client - kernel craches
From: Tommi Virtanen @ 2012-05-08 19:18 UTC
  To: Giorgos Kappes; +Cc: ceph-devel

On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote:
> When I am running deboostrap to install a base Debian Squeeze system
> on a Ceph directory the client's kernel crashes with the following
> message:
>
> I: Extracting zlib1g...
> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
> [  759.776151] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> [  759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
...
> [  759.776438]  [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
> [  759.776447]  [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
> [  759.776457]  [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
> [  759.776465]  [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
> [  759.776475]  [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
> [  759.776484]  [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
> [  759.776493]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
> [  759.776501]  [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
> [  759.776508]  [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
> [  759.776516]  [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
...
> My simple cluster consists of 3 nodes in total. Each node is a Xen
> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
> reference, here is my configuration:
...
> My Ceph kernel client is another Xen domU node running the Linux
> kernel 3.2.11. I have also tried a native client with the same result.
> Please note that this bug happens only in the client side.
> Your help would be greatly appreciated.

Your backtrace includes Xen code in it -- can you reproduce this bug
with a mainline kernel, without Xen at all?

Also, the error encountered is from the NX security subsystem. It
would be nice to know what would happen without NX.


* Re: Ceph kernel client - kernel craches
From: Giorgos Kappes @ 2012-05-10 18:00 UTC
  To: Tommi Virtanen; +Cc: ceph-devel

Sorry for my late response. I reproduced the above bug with Linux
kernel 3.3.4 and without using Xen:

uname -a
Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux

The trace is shown below:

----------------------------------------------------
[  763.984023] kernel tried to execute NX-protected page - exploit
attempt? (uid: 0)
[  763.984177] BUG: unable to handle kernel paging request at ffff880037bd0800
[  763.984402] IP: [<ffff880037bd0800>] 0xffff880037bd07ff
[  763.984568] PGD 1806063 PUD 180a063 PMD 8000000037a001e3
[  763.984845] Oops: 0011 [#1] SMP
[  763.985058] CPU 3
[  763.985124] Modules linked in: cbc netconsole loop snd_pcm
snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
usbcore usb_common tg3 libphy mptsas mptscsih mptbase
scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[  763.988002]
[  763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[  763.988002] RIP: 0010:[<ffff880037bd0800>]  [<ffff880037bd0800>]
0xffff880037bd07ff
[  763.988002] RSP: 0018:ffff8800bfcc3e78  EFLAGS: 00010292
[  763.988002] RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800
[  763.988002] RDX: ffff880037bd1600 RSI: 00000000b9b6a040 RDI: ffff880037bd1600
[  763.988002] RBP: ffffffff81820080 R08: ffff8800b9dd0b00 R09: 000000018020001c
[  763.988002] R10: 000000008020001c R11: ffffffff816075c0 R12: ffff8800bfcce7a0
[  763.988002] R13: ffff8800b97745b0 R14: 0000000000000003 R15: 000000000000000a
[  763.988002] FS:  0000000000000000(0000) GS:ffff8800bfcc0000(0000)
knlGS:0000000000000000
[  763.988002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  763.988002] CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0
[  763.988002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  763.988002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  763.988002] Process swapper/3 (pid: 0, threadinfo ffff8800bbae0000,
task ffff8800bbad8000)
[  763.988002] Stack:
[  763.988002]  ffffffff8109b44d ffff8800bbacd820 ffff8800b97745b0
ffff8800bbae0010
[  763.988002]  ffff8800bbad8000 ffff8800bfcc3ea0 0000000000000048
ffff8800bbae1fd8
[  763.988002]  0000000000000100 0000000000000001 0000000000000009
ffff8800bbae1fd8
[  763.988002] Call Trace:
[  763.988002]  <IRQ>
[  763.988002]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[  763.988002]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[  763.988002]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[  763.988002]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[  763.988002]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[  763.988002]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[  763.988002]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[  763.988002]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[  763.988002]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[  763.988002]  <EOI>
[  763.988002]  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[  763.988002]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[  763.988002]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[  763.988002]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[  763.988002]  [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
[  763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00
88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 <00> 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42
b9 00
[  763.988002] RIP  [<ffff880037bd0800>] 0xffff880037bd07ff
[  763.988002]  RSP <ffff8800bfcc3e78>
[  763.988002] CR2: ffff880037bd0800
[  763.988002] ---[ end trace 614049dc850267ac ]---
[  763.988002] Kernel panic - not syncing: Fatal exception in interrupt
[  763.997833] ------------[ cut here ]------------
[  763.997936] WARNING: at arch/x86/kernel/smp.c:120
update_process_times+0x57/0x63()
[  763.998072] Hardware name: ProLiant DL160 G5
[  763.998171] Modules linked in: cbc netconsole loop snd_pcm
snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
usbcore usb_common tg3 libphy mptsas mptscsih mptbase
scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[  764.001205] Pid: 0, comm: swapper/3 Tainted: G      D      3.3.4 #1
[  764.001311] Call Trace:
[  764.001404]  <IRQ>  [<ffffffff81038bb0>] ? warn_slowpath_common+0x78/0x8c
[  764.001573]  [<ffffffff81044937>] ? update_process_times+0x57/0x63
[  764.001681]  [<ffffffff81075dbe>] ? tick_sched_timer+0x65/0x8b
[  764.001788]  [<ffffffff810561bd>] ? __run_hrtimer+0xb2/0x13d
[  764.001832]  [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
[  764.001832]  [<ffffffff81056482>] ? hrtimer_interrupt+0xd8/0x1a7
[  764.001832]  [<ffffffff81025c5c>] ? smp_apic_timer_interrupt+0x80/0x93
[  764.001832]  [<ffffffff81025c89>] ? native_safe_apic_wait_icr_idle+0x1a/0x49
[  764.001832]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[  764.001832]  [<ffffffff81056eaa>] ? up+0xe/0x36
[  764.001832]  [<ffffffff815ca3ec>] ? panic+0x189/0x1c9
[  764.001832]  [<ffffffff815ca353>] ? panic+0xf0/0x1c9
[  764.001832]  [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
[  764.001832]  [<ffffffff815cd05e>] ? oops_end+0xaa/0xb7
[  764.001832]  [<ffffffff8102eaca>] ? no_context+0x254/0x263
[  764.001832]  [<ffffffff815cf187>] ? do_page_fault+0x1ad/0x34c
[  764.001832]  [<ffffffff814c6b67>] ? __netif_receive_skb+0x44d/0x491
[  764.001832]  [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
[  764.001832]  [<ffffffff814c6f4f>] ? netif_receive_skb+0x71/0x77
[  764.001832]  [<ffffffff814c74bd>] ? napi_gro_receive+0x1f/0x2c
[  764.001832]  [<ffffffff814c7029>] ? napi_skb_finish+0x1c/0x31
[  764.001832]  [<ffffffffa008cc74>] ? tg3_poll_work+0x8f9/0xb66 [tg3]
[  764.001832]  [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
[  764.001832]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[  764.001832]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[  764.001832]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[  764.001832]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[  764.001832]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[  764.001832]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[  764.001832]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[  764.001832]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[  764.001832]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[  764.001832]  <EOI>  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[  764.001832]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[  764.001832]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[  764.001832]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[  764.001832]  [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
[  764.001832] ---[ end trace 614049dc850267ad ]---

----------------------------------------------------
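
As a reading aid for the RIP values in these traces, here is a small
Python sketch that classifies an address against the x86_64 virtual
memory layout. The ranges are an assumption taken from the layout
documented for kernels of this era (Documentation/x86/x86_64/mm.txt in
3.x trees); the names REGIONS and classify are hypothetical.

```python
# Assumed x86_64 layout for 3.x kernels (inclusive ranges).
REGIONS = [
    (0x0000000000000000, 0x00007FFFFFFFFFFF, "user space"),
    (0xFFFF880000000000, 0xFFFFC7FFFFFFFFFF, "direct mapping of physical memory"),
    (0xFFFFC90000000000, 0xFFFFE8FFFFFFFFFF, "vmalloc/ioremap space"),
    (0xFFFFEA0000000000, 0xFFFFEAFFFFFFFFFF, "virtual memory map"),
    (0xFFFFFFFF80000000, 0xFFFFFFFF9FFFFFFF, "kernel text mapping"),
    (0xFFFFFFFFA0000000, 0xFFFFFFFFFFEFFFFF, "module mapping space"),
]


def classify(addr_hex):
    """Return the name of the region that contains the given hex address."""
    addr = int(addr_hex, 16)
    for low, high, name in REGIONS:
        if low <= addr <= high:
            return name
    return "hole / unmapped"


if __name__ == "__main__":
    for label, addr in [
        ("RIP, Xen client", "ffffe8fffffe4ab0"),
        ("RIP, native 3.3.4", "ffff880037bd0800"),
        ("caller on stack", "ffffffff8109b44d"),
    ]:
        print(f"{label:20s} {addr} -> {classify(addr)}")
```

Under that assumed layout, both faulting RIP values fall in data regions
rather than in kernel text, while the caller found on the stack
(__rcu_process_callbacks) is in kernel text. That pattern would be
consistent with an RCU callback pointer that no longer points at a
function, though the traces alone do not prove it.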

Also, as you suggested, I disabled NX protection by passing
"noexec=off" on the kernel command line. Unfortunately, the bug still
happens:

----------------------------------------------------
[  703.168022] BUG: unable to handle kernel paging request at ffff87ffbfa0e22b
[  703.168293] IP: [<ffff8800b9767200>] 0xffff8800b97671ff
[  703.168457] PGD 0
[  703.168613] Oops: 0002 [#1] SMP
[  703.168831] CPU 0
[  703.168896] Modules linked in: cbc netconsole loop tpm_tis snd_pcm
snd_timer snd soundcore shpchp pci_hotplug snd_page_alloc tpm
i5400_edac rng_core tpm_bios edac_core i5k_amb processor pcspkr
thermal_sys evdev button sd_mod crc_t10dif usbhid hid sg sr_mod cdrom
ata_generic uhci_hcd piix ide_core ehci_hcd ata_piix tg3 libphy
usbcore usb_common libata mptsas mptscsih mptbase scsi_transport_sas
scsi_mod [last unloaded: scsi_wait_scan]
[  703.172001]
[  703.172001] Pid: 0, comm: swapper/0 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[  703.172001] RIP: 0010:[<ffff8800b9767200>]  [<ffff8800b9767200>]
0xffff8800b97671ff
[  703.172001] RSP: 0018:ffff8800bfc03e78  EFLAGS: 00010292
[  703.172001] RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200
[  703.172001] RDX: ffff8800b92b9000 RSI: ffff8800b8901800 RDI: ffff8800b92b9000
[  703.172001] RBP: ffffffff81820080 R08: ffff8800b8f77f00 R09: 000000018020000a
[  703.172001] R10: 000000008020000a R11: ffff8800bfc0e600 R12: ffff8800bfc0e7a0
[  703.172001] R13: ffff8800ba177370 R14: 0000000000000005 R15: 000000000000000a
[  703.172001] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000)
knlGS:0000000000000000
[  703.172001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  703.172001] CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0
[  703.172001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  703.172001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  703.172001] Process swapper/0 (pid: 0, threadinfo ffffffff81800000,
task ffffffff8180d020)
[  703.172001] Stack:
[  703.172001]  ffffffff8109b44d 0000000000000000 ffff880037a02900
ffffffff81800010
[  703.172001]  ffffffff8180d020 ffffffff81801fd8 0000000000000048
ffffffff81801fd8
[  703.172001]  0000000000000100 0000000000000001 0000000000000009
ffffffff81801fd8
[  703.172001] Call Trace:
[  703.172001]  <IRQ>
[  703.172001]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[  703.172001]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[  703.172001]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
[  703.172001]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
[  703.172001]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
[  703.172001]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
[  703.172001]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
[  703.172001]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
[  703.172001]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
[  703.172001]  <EOI>
[  703.172001]  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
[  703.172001]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
[  703.172001]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
[  703.172001]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
[  703.172001]  [<ffffffff818c1c06>] ? start_kernel+0x395/0x3a0
[  703.172001]  [<ffffffff818c13d1>] ? x86_64_start_kernel+0x102/0x10f
[  703.172001] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 <00> 90 2b b9 00 88 ff ff 20 ac c1 bf 00 88 ff ff 20 98 a7
b8 00
[  703.172001] RIP  [<ffff8800b9767200>] 0xffff8800b97671ff
[  703.172001]  RSP <ffff8800bfc03e78>
[  703.172001] CR2: ffff87ffbfa0e22b
[  703.172001] ---[ end trace 15e08c2db2033830 ]---
[  703.172001] Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------
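
The "Oops:" value printed for a page fault is the x86 page-fault error
code in hexadecimal, and decoding it shows how the traces in this thread
differ. The following Python sketch decodes the standard bits; the
function name decode_oops is hypothetical. It applies only to page-fault
oopses, not to reports such as "general protection fault" or "invalid
opcode".

```python
def decode_oops(code_hex):
    """Decode an x86 page-fault error code as printed after 'Oops:'."""
    code = int(code_hex, 16)
    if code & 0x10:        # bit 4: fault was an instruction fetch
        access = "instruction fetch"
    elif code & 0x02:      # bit 1: fault was a write
        access = "write"
    else:
        access = "read"
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x01 else "page not present",
        "access": access,
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x04 else "kernel",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit": bool(code & 0x08),
    }


if __name__ == "__main__":
    for code in ("0011", "0002", "0000"):
        print(code, decode_oops(code))
```

Read this way, 0011 is a kernel-mode instruction fetch from a present but
non-executable page (the NX report), 0002 is a kernel-mode write to a
page that is not present, and 0000 is a kernel-mode read from a page that
is not present.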

The strange thing is that these crash traces do not contain any calls
related to Ceph. However, the bug only happens when running debootstrap
to install a base Debian system into a Ceph directory. Debootstrap
completes successfully when the target directory is on NFS or on a
local file system.
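
Checking a trace for Ceph-related frames can be done mechanically. The
Python sketch below extracts symbol names from call-trace lines and
filters those with a "ceph" prefix. The helper names are hypothetical,
and the sample lines are copied from traces in this thread. Two caveats:
frames marked with "?" are addresses found on the stack that the unwinder
could not verify, and libceph functions without the prefix (con_work and
dispatch, for example) are not caught by this simple filter.

```python
import re

# Matches e.g. "[<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335"
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(log):
    """Return (symbol, reliable) pairs; reliable is False for '?' frames."""
    return [(m.group(3), m.group(2) is None) for m in FRAME.finditer(log)]


def ceph_symbols(log):
    """Return the symbols in the log whose name starts with 'ceph'."""
    return [sym for sym, _ in frames(log) if sym.startswith("ceph")]


SAMPLE = """\
[  763.988002]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
[  763.988002]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
[75143.971085]  [<ffffffff81368f16>] ? ceph_writepages_start+0xbb4/0xbee
[75143.971085]  [<ffffffff8136ba47>] ? ceph_encode_inode_release+0xed/0x2b2
"""

if __name__ == "__main__":
    print(ceph_symbols(SAMPLE))
```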

Furthermore, a different crash occurs when trying to remove a
non-empty Ceph directory:
******************************************************
root@node33:/mnt# rm debian -r
rm: cannot remove `debian/etc': Directory not empty
Write failed: Broken pipe
******************************************************
The crash trace is shown below:

----------------------------------------------------

[74576.543412] libceph: client0 fsid 9b3222ac-fce2-44eb-8599-d39da02d2393
[74576.651197] libceph: mon0 192.168.2.254:6789 session established
[75143.963663] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000030
[75143.963771] IP: [<ffffffff811061cd>] path_init+0x218/0x2cc
[75143.963827] PGD 37a63067 PUD 37b4b067 PMD 0
[75143.963880] Oops: 0000 [#1] SMP
[75143.963928] CPU 3
[75143.963935] Modules linked in: cbc netconsole loop i5400_edac
snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
scsi_mod [last unloaded: netconsole]
[75143.964390]
[75143.964426] Pid: 3861, comm: rm Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[75143.964485] RIP: 0010:[<ffffffff811061cd>]  [<ffffffff811061cd>]
path_init+0x218/0x2cc
[75143.964570] RSP: 0018:ffff880037b45d58  EFLAGS: 00010202
[75143.964618] RAX: 0000000000000000 RBX: ffff8800b8975000 RCX: ffff880037b45ea8
[75143.964672] RDX: ffff8800b929a900 RSI: ffff880037b45d74 RDI: ffff8800b9d0e830
[75143.964727] RBP: 0000000000000050 R08: ffff880037b45de0 R09: 0000000000000000
[75143.964781] R10: 0000006e7265746c R11: 0000000001406d90 R12: ffff880037b45ea8
[75143.964835] R13: ffff8800b929a900 R14: ffff880037b45de0 R15: 0000000000000003
[75143.964890] FS:  00007fb7a9b47700(0000) GS:ffff8800bfcc0000(0000)
knlGS:0000000000000000
[75143.964974] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75143.965023] CR2: 0000000000000030 CR3: 00000000379d5000 CR4: 00000000000006e0
[75143.965078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75143.965132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75143.965187] Process rm (pid: 3861, threadinfo ffff880037b44000,
task ffff8800b91fed00)
[75143.965269] Stack:
[75143.965306]  00000000b9f76000 000000005088218e ffff8800b9f76280
00000000b9f76280
[75143.965397]  ffff880037b45ea8 0000000000000050 ffff8800b8975000
0000000000000010
[75143.965489]  00000000013f2030 ffffffff811072b4 ffffffff81060362
dead000000100100
[75143.965581] Call Trace:
[75143.965622]  [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
[75143.965674]  [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
[75143.965725]  [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
[75143.965775]  [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
[75143.965826]  [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
[75143.965877]  [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
[75143.965927]  [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
[75143.965977]  [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
[75143.966031]  [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
[75143.966082]  [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
[75143.966133] Code: 04 01 e9 a1 00 00 00 48 8d 74 24 1c e8 ae 6f ff
ff 49 89 c5 b8 f7 ff ff ff 4d 85 ed 0f 84 b0 00 00 00 49 8b 45 18 80
3b 00 74 28 <48> 8b 78 30 b8 ec ff ff ff 0f b7 17 81 e2 00 f0 00 00 81
fa 00
[75143.966507] RIP  [<ffffffff811061cd>] path_init+0x218/0x2cc
[75143.966558]  RSP <ffff880037b45d58>
[75143.966600] CR2: 0000000000000030
[75143.967124] ---[ end trace 18e2f523c5af9a38 ]---
[75143.967322] general protection fault: 0000 [#2] SMP
[75143.967542] CPU 3
[75143.967607] Modules linked in: cbc netconsole loop i5400_edac
snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
scsi_mod [last unloaded: netconsole]
[75143.970715]
[75143.970805] Pid: 3861, comm: rm Tainted: G      D      3.3.4 #1 HP
ProLiant DL160 G5
[75143.971058] RIP: 0010:[<ffffffff810fa947>]  [<ffffffff810fa947>]
filp_close+0x2d/0x6c
[75143.971085] RSP: 0018:ffff880037b45a48  EFLAGS: 00010206
[75143.971085] RAX: 0012080800000000 RBX: ffff8800b910d300 RCX: 0000000000000000
[75143.971085] RDX: 0000000000000000 RSI: ffff8800b9712c00 RDI: ffff8800b910d300
[75143.971085] RBP: ffff8800b9712c00 R08: 0000000000016870 R09: 00007fff71600000
[75143.971085] R10: 0000000000000001 R11: ffff880037b459a8 R12: 0000000000000000
[75143.971085] R13: ffff8800bb51f6c0 R14: 0000000000000004 R15: 0000000000000000
[75143.971085] FS:  0000000000000000(0000) GS:ffff8800bfcc0000(0000)
knlGS:0000000000000000
[75143.971085] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75143.971085] CR2: 0000000000000030 CR3: 0000000001805000 CR4: 00000000000006e0
[75143.971085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75143.971085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75143.971085] Process rm (pid: 3861, threadinfo ffff880037b44000,
task ffff8800b91fed00)
[75143.971085] Stack:
[75143.971085]  ffff8800b9712c00 0000000000000007 0000000000000000
ffffffff8103aaad
[75143.971085]  0000000000000009 ffff8800b91fed00 ffff8800b91ff218
0000000000000009
[75143.971085]  ffff8800b8ca0700 ffff8800b92c0880 0000000000000001
ffffffff8103bfe4
[75143.971085] Call Trace:
[75143.971085]  [<ffffffff8103aaad>] ? put_files_struct+0x67/0xbf
[75143.971085]  [<ffffffff8103bfe4>] ? do_exit+0x2aa/0x7e1
[75143.971085]  [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
[75143.971085]  [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
[75143.971085]  [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
[75143.971085]  [<ffffffff8102eaca>] ? no_context+0x254/0x263
[75143.971085]  [<ffffffff81368f16>] ? ceph_writepages_start+0xbb4/0xbee
[75143.971085]  [<ffffffff815cf1ef>] ? do_page_fault+0x215/0x34c
[75143.971085]  [<ffffffff8136a1e5>] ? __cap_is_valid+0x19/0x9a
[75143.971085]  [<ffffffff8136ba47>] ? ceph_encode_inode_release+0xed/0x2b2
[75143.971085]  [<ffffffff81063c10>] ? update_curr+0xfb/0x130
[75143.971085]  [<ffffffff8100d6fe>] ? __switch_to+0x20b/0x35f
[75143.971085]  [<ffffffff81063c10>] ? update_curr+0xfb/0x130
[75143.971085]  [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
[75143.971085]  [<ffffffff811061cd>] ? path_init+0x218/0x2cc
[75143.971085]  [<ffffffff811061b3>] ? path_init+0x1fe/0x2cc
[75143.971085]  [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
[75143.971085]  [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
[75143.971085]  [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
[75143.971085]  [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
[75143.971085]  [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
[75143.971085]  [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
[75143.971085]  [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
[75143.971085]  [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
[75143.971085]  [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
[75143.971085]  [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
[75143.971085] Code: 55 48 89 f5 53 48 89 fb 48 8b 47 30 48 85 c0 75
11 48 c7 c7 6e d5 72 81 45 31 e4 e8 f0 fa 4c 00 eb 40 48 8b 47 20 48
85 c0 74 10 <48> 8b 40 60 48 85 c0 74 07 ff d0 41 89 c4 eb 03 45 31 e4
f6 43
[75143.971085] RIP  [<ffffffff810fa947>] filp_close+0x2d/0x6c
[75143.971085]  RSP <ffff880037b45a48>
[75143.988721] ---[ end trace 18e2f523c5af9a39 ]---
[75143.988826] Fixing recursive fault but reboot is needed!
[75146.018276] ------------[ cut here ]------------
[75146.018399] kernel BUG at mm/slub.c:3442!
[75146.018498] invalid opcode: 0000 [#3] SMP
[75146.018718] CPU 1
[75146.018789] Modules linked in: cbc netconsole loop i5400_edac
snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
scsi_mod [last unloaded: netconsole]
[75146.021908]
[75146.021999] Pid: 1137, comm: kworker/1:0 Tainted: G      D
3.3.4 #1 HP ProLiant DL160 G5
[75146.022236] RIP: 0010:[<ffffffff810f55df>]  [<ffffffff810f55df>]
kfree+0x59/0xc2
[75146.022236] RSP: 0018:ffff8800bbb35b90  EFLAGS: 00010246
[75146.022236] RAX: 0100000000000400 RBX: ffff8800bfcdab70 RCX: ffff8800b9f762c8
[75146.022236] RDX: ffff8800bfc4e550 RSI: 0000000000000000 RDI: ffffea0002ff3680
[75146.022236] RBP: ffffffff815a2a50 R08: 0000000000000000 R09: ffffffff814f9620
[75146.022236] R10: 000000000000000d R11: ffff8800b9d0d000 R12: ffffffff815a2a47
[75146.022236] R13: ffff8800ba15d400 R14: ffff8800b9d0d271 R15: ffff8800b9d0d000
[75146.022236] FS:  0000000000000000(0000) GS:ffff8800bfc40000(0000)
knlGS:0000000000000000
[75146.022236] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75146.022236] CR2: ffffffffff600400 CR3: 0000000037b13000 CR4: 00000000000006e0
[75146.022236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75146.022236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75146.022236] Process kworker/1:0 (pid: 1137, threadinfo
ffff8800bbb34000, task ffff8800b90b0000)
[75146.022236] Stack:
[75146.022236]  ffff8800b8fde500 ffffffff815a2a50 ffff8800b9f72800
ffffffff815a2a47
[75146.022236]  ffff8800b8fde588 ffffffff81372b8e 0000001b00004040
ffff8800b9f72a68
[75146.022236]  ffff8800b9f72800 ffffffff8137513d ffff8800ba15d400
ffff8800b9f72a68
[75146.022236] Call Trace:
[75146.022236]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.022236]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.022236]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.022236]  [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
[75146.022236]  [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
[75146.022236]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.022236]  [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
[75146.022236]  [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
[75146.022236]  [<ffffffff813dce42>] ? crc32c+0x56/0x7c
[75146.022236]  [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
[75146.022236]  [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
[75146.022236]  [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
[75146.022236]  [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
[75146.022236]  [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
[75146.022236]  [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
[75146.022236]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.022236]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.022236]  [<ffffffff81052b82>] ? kthread+0x81/0x89
[75146.022236]  [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
[75146.022236]  [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
[75146.022236]  [<ffffffff815d3a20>] ? gs_change+0x13/0x13
[75146.022236] Code: 00 48 83 c5 10 48 83 7d 00 00 eb e6 48 83 fb 10
76 7d 48 89 df e8 ad de ff ff 48 89 c7 48 8b 00 84 c0 78 14 66 f7 07
00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 e8 03 fd ff 4c 8b 54 24 18 4c
8b 4f
[75146.022236] RIP  [<ffffffff810f55df>] kfree+0x59/0xc2
[75146.022236]  RSP <ffff8800bbb35b90>
[75146.031675] ---[ end trace 18e2f523c5af9a3a ]---
[75146.031809] BUG: unable to handle kernel paging request at fffffffffffffff8
[75146.032058] IP: [<ffffffff81052783>] kthread_data+0x7/0xc
[75146.032221] PGD 1807067 PUD 1808067 PMD 0
[75146.032494] Oops: 0000 [#4] SMP
[75146.032706] CPU 1
[75146.032771] Modules linked in: cbc netconsole loop i5400_edac
snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
scsi_mod [last unloaded: netconsole]
[75146.035616]
[75146.035616] Pid: 1137, comm: kworker/1:0 Tainted: G      D
3.3.4 #1 HP ProLiant DL160 G5
[75146.035616] RIP: 0010:[<ffffffff81052783>]  [<ffffffff81052783>]
kthread_data+0x7/0xc
[75146.035616] RSP: 0018:ffff8800bbb35900  EFLAGS: 00010002
[75146.035616] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[75146.035616] RDX: ffffffff819a87b0 RSI: 0000000000000001 RDI: ffff8800b90b0000
[75146.035616] RBP: ffff8800b90b0000 R08: 0000000000000400 R09: ffffffff81013c7c
[75146.035616] R10: ffff8800b90b0000 R11: ffff8800b90b0518 R12: ffff8800b90b02f8
[75146.035616] R13: ffff8800bbb359c8 R14: 0000000000000001 R15: 0000000000000001
[75146.035616] FS:  0000000000000000(0000) GS:ffff8800bfc40000(0000)
knlGS:0000000000000000
[75146.035616] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[75146.035616] CR2: fffffffffffffff8 CR3: 0000000037b13000 CR4: 00000000000006e0
[75146.035616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[75146.035616] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[75146.035616] Process kworker/1:0 (pid: 1137, threadinfo
ffff8800bbb34000, task ffff8800b90b0000)
[75146.035616] Stack:
[75146.035616]  ffffffff8104e2a2 ffff8800bfc53340 ffffffff815cb1d1
0000000000000001
[75146.035616]  0000000000000296 0000000000013340 ffff8800bbb35fd8
0000000000013340
[75146.035616]  ffff8800bbb35fd8 0000000000013340 ffff8800b90b0000
0000000000013340
[75146.035616] Call Trace:
[75146.035616]  [<ffffffff8104e2a2>] ? wq_worker_sleeping+0x8/0x82
[75146.035616]  [<ffffffff815cb1d1>] ? __schedule+0x166/0x4fc
[75146.035616]  [<ffffffff8103c517>] ? do_exit+0x7dd/0x7e1
[75146.035616]  [<ffffffff815ca46c>] ? printk+0x40/0x4c
[75146.035616]  [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
[75146.035616]  [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
[75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616]  [<ffffffff8100ef69>] ? do_invalid_op+0x8b/0x95
[75146.035616]  [<ffffffff810f55df>] ? kfree+0x59/0xc2
[75146.035616]  [<ffffffff81515b94>] ? inet_recvmsg+0x64/0x75
[75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616]  [<ffffffff815d389b>] ? invalid_op+0x1b/0x20
[75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616]  [<ffffffff814f9620>] ? tcp_recvmsg+0x773/0x95e
[75146.035616]  [<ffffffff810f55df>] ? kfree+0x59/0xc2
[75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
[75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
[75146.035616]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.035616]  [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
[75146.035616]  [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
[75146.035616]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
[75146.035616]  [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
[75146.035616]  [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
[75146.035616]  [<ffffffff813dce42>] ? crc32c+0x56/0x7c
[75146.035616]  [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
[75146.035616]  [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
[75146.035616]  [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
[75146.035616]  [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
[75146.035616]  [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
[75146.035616]  [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
[75146.035616]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.035616]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
[75146.035616]  [<ffffffff81052b82>] ? kthread+0x81/0x89
[75146.035616]  [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
[75146.035616]  [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
[75146.035616]  [<ffffffff815d3a20>] ? gs_change+0x13/0x13
[75146.035616] Code: 41 5e 41 5f c3 41 bf ea ff ff ff eb 97 90 90 90
65 48 8b 04 25 c0 c6 00 00 48 8b 80 a0 02 00 00 8b 40 f0 c3 48 8b 87
a0 02 00 00 <48> 8b 40 f8 c3 48 3b 3d 51 5f 95 00 75 08 0f bf 87 6a 06
00 00
[75146.035616] RIP  [<ffffffff81052783>] kthread_data+0x7/0xc
[75146.035616]  RSP <ffff8800bbb35900>
[75146.035616] CR2: fffffffffffffff8
[75146.035616] ---[ end trace 18e2f523c5af9a3b ]---
[75146.035616] Fixing recursive fault but reboot is needed!
[75206.036002] INFO: rcu_sched detected stalls on CPUs/tasks: { 1}
(detected by 3, t=15002 jiffies)
[75206.036265] Pid: 0, comm: swapper/3 Tainted: G      D      3.3.4 #1
[75206.036371] Call Trace:
[75206.036464]  <IRQ>  [<ffffffff8109b7b3>] ? __rcu_pending+0x21a/0x336
[75206.036635]  [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
[75206.036740]  [<ffffffff8109b9cc>] ? rcu_check_callbacks+0xa7/0xe7
[75206.036846]  [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
[75206.036951]  [<ffffffff81044911>] ? update_process_times+0x31/0x63

----------------------------------------------------
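
For anyone comparing these traces, here is a minimal Python sketch (not part of the original report) that lists the stack frames whose symbol names mention Ceph. It assumes the log uses the "[<address>] ? symbol+0xOFF/0xSIZE" frame format seen above. The function names frame_symbols and ceph_symbols are hypothetical, and a name-based match will miss any Ceph function whose name does not contain "ceph".

```python
import re

# Matches entries such as "[<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47".
# The "? " marker is optional; the symbol name is captured.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(log_text):
    """Return the symbol names of all symbol+offset/size entries, in order."""
    return FRAME_RE.findall(log_text)


def ceph_symbols(log_text):
    """Return only the symbols whose name contains 'ceph' (a heuristic)."""
    return [name for name in frame_symbols(log_text) if "ceph" in name]


if __name__ == "__main__":
    sample = (
        "[75146.022236]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47\n"
        "[75146.022236]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e\n"
        "[75146.022236]  [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145\n"
    )
    for name in ceph_symbols(sample):
        print(name)
```

Running it over a saved copy of each trace gives a quick way to check whether a given oops passes through Ceph code at all.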

Thanks a lot,
Giorgos Kappes

On Tue, May 8, 2012 at 10:18 PM, Tommi Virtanen <tv@inktank.com> wrote:
> On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote:
>> When I am running debootstrap to install a base Debian Squeeze system
>> on a Ceph directory the client's kernel crashes with the following
>> message:
>>
>> I: Extracting zlib1g...
>> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
>> [  759.776151] kernel tried to execute NX-protected page - exploit
>> attempt? (uid: 0)
>> [  759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
> ...
>> [  759.776438]  [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
>> [  759.776447]  [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
>> [  759.776457]  [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
>> [  759.776465]  [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
>> [  759.776475]  [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
>> [  759.776484]  [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
>> [  759.776493]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
>> [  759.776501]  [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
>> [  759.776508]  [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
>> [  759.776516]  [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
> ...
>> My simple cluster consists of 3 nodes in total. Each node is a Xen
>> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
>> reference, here is my configuration:
> ...
>> My Ceph kernel client is another Xen domU node running the Linux
>> kernel 3.2.11. I have also tried a native client with the same result.
>> Please note that this bug happens only on the client side.
>> Your help would be greatly appreciated.
>
> Your backtrace includes Xen code in it -- can you reproduce this bug
> with a mainline kernel, without Xen at all?
>
> Also, the error encountered is from the NX security subsystem. It
> would be nice to know what would happen without NX.



-----------------------------------------------------------
Giorgos Kappes
Website: http://www.cs.uoi.gr/~gkappes
email: geokapp@gmail.com


* Re: Ceph kernel client - kernel craches
  2012-05-10 18:00   ` Giorgos Kappes
@ 2012-05-17 22:49     ` Josh Durgin
  0 siblings, 0 replies; 4+ messages in thread
From: Josh Durgin @ 2012-05-17 22:49 UTC (permalink / raw)
  To: Giorgos Kappes; +Cc: ceph-devel

Sorry your mail fell through the cracks before. I filed
http://tracker.newdream.net/issues/2445 to track the ceph-related
crashes. Alex, do you think the first crash is related to ceph at all?

Josh

On 05/10/2012 11:00 AM, Giorgos Kappes wrote:
> Sorry for my late response. I reproduced the above bug with the Linux
> kernel 3.3.4 and without using Xen:
>
> uname -a
> Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux
>
> The trace is shown below:
>
> ----------------------------------------------------
> [  763.984023] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> [  763.984177] BUG: unable to handle kernel paging request at ffff880037bd0800
> [  763.984402] IP: [<ffff880037bd0800>] 0xffff880037bd07ff
> [  763.984568] PGD 1806063 PUD 180a063 PMD 8000000037a001e3
> [  763.984845] Oops: 0011 [#1] SMP
> [  763.985058] CPU 3
> [  763.985124] Modules linked in: cbc netconsole loop snd_pcm
> snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
> tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
> button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
> cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
> usbcore usb_common tg3 libphy mptsas mptscsih mptbase
> scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
> [  763.988002]
> [  763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [  763.988002] RIP: 0010:[<ffff880037bd0800>]  [<ffff880037bd0800>]
> 0xffff880037bd07ff
> [  763.988002] RSP: 0018:ffff8800bfcc3e78  EFLAGS: 00010292
> [  763.988002] RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800
> [  763.988002] RDX: ffff880037bd1600 RSI: 00000000b9b6a040 RDI: ffff880037bd1600
> [  763.988002] RBP: ffffffff81820080 R08: ffff8800b9dd0b00 R09: 000000018020001c
> [  763.988002] R10: 000000008020001c R11: ffffffff816075c0 R12: ffff8800bfcce7a0
> [  763.988002] R13: ffff8800b97745b0 R14: 0000000000000003 R15: 000000000000000a
> [  763.988002] FS:  0000000000000000(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [  763.988002] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  763.988002] CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0
> [  763.988002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  763.988002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  763.988002] Process swapper/3 (pid: 0, threadinfo ffff8800bbae0000,
> task ffff8800bbad8000)
> [  763.988002] Stack:
> [  763.988002]  ffffffff8109b44d ffff8800bbacd820 ffff8800b97745b0
> ffff8800bbae0010
> [  763.988002]  ffff8800bbad8000 ffff8800bfcc3ea0 0000000000000048
> ffff8800bbae1fd8
> [  763.988002]  0000000000000100 0000000000000001 0000000000000009
> ffff8800bbae1fd8
> [  763.988002] Call Trace:
> [  763.988002]  <IRQ>
> [  763.988002]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
> [  763.988002]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
> [  763.988002]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
> [  763.988002]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
> [  763.988002]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
> [  763.988002]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
> [  763.988002]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
> [  763.988002]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
> [  763.988002]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
> [  763.988002]  <EOI>
> [  763.988002]  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
> [  763.988002]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
> [  763.988002]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
> [  763.988002]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
> [  763.988002]  [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
> [  763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00
> 88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 <00> 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42
> b9 00
> [  763.988002] RIP  [<ffff880037bd0800>] 0xffff880037bd07ff
> [  763.988002]  RSP <ffff8800bfcc3e78>
> [  763.988002] CR2: ffff880037bd0800
> [  763.988002] ---[ end trace 614049dc850267ac ]---
> [  763.988002] Kernel panic - not syncing: Fatal exception in interrupt
> [  763.997833] ------------[ cut here ]------------
> [  763.997936] WARNING: at arch/x86/kernel/smp.c:120
> update_process_times+0x57/0x63()
> [  763.998072] Hardware name: ProLiant DL160 G5
> [  763.998171] Modules linked in: cbc netconsole loop snd_pcm
> snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
> tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
> button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
> cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
> usbcore usb_common tg3 libphy mptsas mptscsih mptbase
> scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
> [  764.001205] Pid: 0, comm: swapper/3 Tainted: G      D      3.3.4 #1
> [  764.001311] Call Trace:
> [  764.001404]  <IRQ>  [<ffffffff81038bb0>] ? warn_slowpath_common+0x78/0x8c
> [  764.001573]  [<ffffffff81044937>] ? update_process_times+0x57/0x63
> [  764.001681]  [<ffffffff81075dbe>] ? tick_sched_timer+0x65/0x8b
> [  764.001788]  [<ffffffff810561bd>] ? __run_hrtimer+0xb2/0x13d
> [  764.001832]  [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
> [  764.001832]  [<ffffffff81056482>] ? hrtimer_interrupt+0xd8/0x1a7
> [  764.001832]  [<ffffffff81025c5c>] ? smp_apic_timer_interrupt+0x80/0x93
> [  764.001832]  [<ffffffff81025c89>] ? native_safe_apic_wait_icr_idle+0x1a/0x49
> [  764.001832]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
> [  764.001832]  [<ffffffff81056eaa>] ? up+0xe/0x36
> [  764.001832]  [<ffffffff815ca3ec>] ? panic+0x189/0x1c9
> [  764.001832]  [<ffffffff815ca353>] ? panic+0xf0/0x1c9
> [  764.001832]  [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
> [  764.001832]  [<ffffffff815cd05e>] ? oops_end+0xaa/0xb7
> [  764.001832]  [<ffffffff8102eaca>] ? no_context+0x254/0x263
> [  764.001832]  [<ffffffff815cf187>] ? do_page_fault+0x1ad/0x34c
> [  764.001832]  [<ffffffff814c6b67>] ? __netif_receive_skb+0x44d/0x491
> [  764.001832]  [<ffffffff81013ca9>] ? read_tsc+0x5/0x16
> [  764.001832]  [<ffffffff814c6f4f>] ? netif_receive_skb+0x71/0x77
> [  764.001832]  [<ffffffff814c74bd>] ? napi_gro_receive+0x1f/0x2c
> [  764.001832]  [<ffffffff814c7029>] ? napi_skb_finish+0x1c/0x31
> [  764.001832]  [<ffffffffa008cc74>] ? tg3_poll_work+0x8f9/0xb66 [tg3]
> [  764.001832]  [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
> [  764.001832]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
> [  764.001832]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
> [  764.001832]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
> [  764.001832]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
> [  764.001832]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
> [  764.001832]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
> [  764.001832]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
> [  764.001832]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
> [  764.001832]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
> [  764.001832]  <EOI>  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
> [  764.001832]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
> [  764.001832]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
> [  764.001832]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
> [  764.001832]  [<ffffffff815c43c6>] ? start_secondary+0x270/0x275
> [  764.001832] ---[ end trace 614049dc850267ad ]---
>
> ----------------------------------------------------
>
> Also, as you suggested, I disabled the NX bit by passing "noexec=off" to
> the kernel. Unfortunately, the bug still occurs:
>
> ----------------------------------------------------
> [  703.168022] BUG: unable to handle kernel paging request at ffff87ffbfa0e22b
> [  703.168293] IP: [<ffff8800b9767200>] 0xffff8800b97671ff
> [  703.168457] PGD 0
> [  703.168613] Oops: 0002 [#1] SMP
> [  703.168831] CPU 0
> [  703.168896] Modules linked in: cbc netconsole loop tpm_tis snd_pcm
> snd_timer snd soundcore shpchp pci_hotplug snd_page_alloc tpm
> i5400_edac rng_core tpm_bios edac_core i5k_amb processor pcspkr
> thermal_sys evdev button sd_mod crc_t10dif usbhid hid sg sr_mod cdrom
> ata_generic uhci_hcd piix ide_core ehci_hcd ata_piix tg3 libphy
> usbcore usb_common libata mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: scsi_wait_scan]
> [  703.172001]
> [  703.172001] Pid: 0, comm: swapper/0 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [  703.172001] RIP: 0010:[<ffff8800b9767200>]  [<ffff8800b9767200>]
> 0xffff8800b97671ff
> [  703.172001] RSP: 0018:ffff8800bfc03e78  EFLAGS: 00010292
> [  703.172001] RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200
> [  703.172001] RDX: ffff8800b92b9000 RSI: ffff8800b8901800 RDI: ffff8800b92b9000
> [  703.172001] RBP: ffffffff81820080 R08: ffff8800b8f77f00 R09: 000000018020000a
> [  703.172001] R10: 000000008020000a R11: ffff8800bfc0e600 R12: ffff8800bfc0e7a0
> [  703.172001] R13: ffff8800ba177370 R14: 0000000000000005 R15: 000000000000000a
> [  703.172001] FS:  0000000000000000(0000) GS:ffff8800bfc00000(0000)
> knlGS:0000000000000000
> [  703.172001] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  703.172001] CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0
> [  703.172001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  703.172001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  703.172001] Process swapper/0 (pid: 0, threadinfo ffffffff81800000,
> task ffffffff8180d020)
> [  703.172001] Stack:
> [  703.172001]  ffffffff8109b44d 0000000000000000 ffff880037a02900
> ffffffff81800010
> [  703.172001]  ffffffff8180d020 ffffffff81801fd8 0000000000000048
> ffffffff81801fd8
> [  703.172001]  0000000000000100 0000000000000001 0000000000000009
> ffffffff81801fd8
> [  703.172001] Call Trace:
> [  703.172001]  <IRQ>
> [  703.172001]  [<ffffffff8109b44d>] ? __rcu_process_callbacks+0x1e9/0x335
> [  703.172001]  [<ffffffff8109b8fb>] ? rcu_process_callbacks+0x2c/0x56
> [  703.172001]  [<ffffffff8103e3b1>] ? __do_softirq+0xc4/0x1a0
> [  703.172001]  [<ffffffff8102515b>] ? lapic_next_event+0x18/0x1d
> [  703.172001]  [<ffffffff815d3b1c>] ? call_softirq+0x1c/0x30
> [  703.172001]  [<ffffffff8100fba3>] ? do_softirq+0x3f/0x79
> [  703.172001]  [<ffffffff8103e186>] ? irq_exit+0x44/0xb1
> [  703.172001]  [<ffffffff81025c61>] ? smp_apic_timer_interrupt+0x85/0x93
> [  703.172001]  [<ffffffff815d311e>] ? apic_timer_interrupt+0x6e/0x80
> [  703.172001]  <EOI>
> [  703.172001]  [<ffffffff810145e1>] ? native_sched_clock+0x28/0x33
> [  703.172001]  [<ffffffff810152f6>] ? mwait_idle+0x8c/0xbc
> [  703.172001]  [<ffffffff810152ae>] ? mwait_idle+0x44/0xbc
> [  703.172001]  [<ffffffff8100de94>] ? cpu_idle+0xb9/0xf7
> [  703.172001]  [<ffffffff818c1c06>] ? start_kernel+0x395/0x3a0
> [  703.172001]  [<ffffffff818c13d1>] ? x86_64_start_kernel+0x102/0x10f
> [  703.172001] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 <00> 90 2b b9 00 88 ff ff 20 ac c1 bf 00 88 ff ff 20 98 a7
> b8 00
> [  703.172001] RIP  [<ffff8800b9767200>] 0xffff8800b97671ff
> [  703.172001]  RSP <ffff8800bfc03e78>
> [  703.172001] CR2: ffff87ffbfa0e22b
> [  703.172001] ---[ end trace 15e08c2db2033830 ]---
> [  703.172001] Kernel panic - not syncing: Fatal exception in interrupt
> ----------------------------------------------------
>
> The strange thing is that the crash traces do not contain any calls
> related to Ceph. However, this bug only happens when running
> debootstrap to install a base Debian system into a Ceph directory.
> Debootstrap completes successfully when the target directory is
> under NFS or on a local file system.
>
> Furthermore, a different crash occurs when trying to remove a
> non-empty Ceph directory:
> ******************************************************
> root@node33:/mnt# rm debian -r
> rm: cannot remove `debian/etc': Directory not empty
> Write failed: Broken pipe
> ******************************************************
> The crash trace is shown below:
>
> ----------------------------------------------------
>
> [74576.543412] libceph: client0 fsid 9b3222ac-fce2-44eb-8599-d39da02d2393
> [74576.651197] libceph: mon0 192.168.2.254:6789 session established
> [75143.963663] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000030
> [75143.963771] IP: [<ffffffff811061cd>] path_init+0x218/0x2cc
> [75143.963827] PGD 37a63067 PUD 37b4b067 PMD 0
> [75143.963880] Oops: 0000 [#1] SMP
> [75143.963928] CPU 3
> [75143.963935] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75143.964390]
> [75143.964426] Pid: 3861, comm: rm Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [75143.964485] RIP: 0010:[<ffffffff811061cd>]  [<ffffffff811061cd>]
> path_init+0x218/0x2cc
> [75143.964570] RSP: 0018:ffff880037b45d58  EFLAGS: 00010202
> [75143.964618] RAX: 0000000000000000 RBX: ffff8800b8975000 RCX: ffff880037b45ea8
> [75143.964672] RDX: ffff8800b929a900 RSI: ffff880037b45d74 RDI: ffff8800b9d0e830
> [75143.964727] RBP: 0000000000000050 R08: ffff880037b45de0 R09: 0000000000000000
> [75143.964781] R10: 0000006e7265746c R11: 0000000001406d90 R12: ffff880037b45ea8
> [75143.964835] R13: ffff8800b929a900 R14: ffff880037b45de0 R15: 0000000000000003
> [75143.964890] FS:  00007fb7a9b47700(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [75143.964974] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75143.965023] CR2: 0000000000000030 CR3: 00000000379d5000 CR4: 00000000000006e0
> [75143.965078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75143.965132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75143.965187] Process rm (pid: 3861, threadinfo ffff880037b44000,
> task ffff8800b91fed00)
> [75143.965269] Stack:
> [75143.965306]  00000000b9f76000 000000005088218e ffff8800b9f76280
> 00000000b9f76280
> [75143.965397]  ffff880037b45ea8 0000000000000050 ffff8800b8975000
> 0000000000000010
> [75143.965489]  00000000013f2030 ffffffff811072b4 ffffffff81060362
> dead000000100100
> [75143.965581] Call Trace:
> [75143.965622]  [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
> [75143.965674]  [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
> [75143.965725]  [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
> [75143.965775]  [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
> [75143.965826]  [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
> [75143.965877]  [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
> [75143.965927]  [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
> [75143.965977]  [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
> [75143.966031]  [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
> [75143.966082]  [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
> [75143.966133] Code: 04 01 e9 a1 00 00 00 48 8d 74 24 1c e8 ae 6f ff
> ff 49 89 c5 b8 f7 ff ff ff 4d 85 ed 0f 84 b0 00 00 00 49 8b 45 18 80
> 3b 00 74 28 <48> 8b 78 30 b8 ec ff ff ff 0f b7 17 81 e2 00 f0 00 00 81
> fa 00
> [75143.966507] RIP  [<ffffffff811061cd>] path_init+0x218/0x2cc
> [75143.966558]  RSP <ffff880037b45d58>
> [75143.966600] CR2: 0000000000000030
> [75143.967124] ---[ end trace 18e2f523c5af9a38 ]---
> [75143.967322] general protection fault: 0000 [#2] SMP
> [75143.967542] CPU 3
> [75143.967607] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75143.970715]
> [75143.970805] Pid: 3861, comm: rm Tainted: G      D      3.3.4 #1 HP
> ProLiant DL160 G5
> [75143.971058] RIP: 0010:[<ffffffff810fa947>]  [<ffffffff810fa947>]
> filp_close+0x2d/0x6c
> [75143.971085] RSP: 0018:ffff880037b45a48  EFLAGS: 00010206
> [75143.971085] RAX: 0012080800000000 RBX: ffff8800b910d300 RCX: 0000000000000000
> [75143.971085] RDX: 0000000000000000 RSI: ffff8800b9712c00 RDI: ffff8800b910d300
> [75143.971085] RBP: ffff8800b9712c00 R08: 0000000000016870 R09: 00007fff71600000
> [75143.971085] R10: 0000000000000001 R11: ffff880037b459a8 R12: 0000000000000000
> [75143.971085] R13: ffff8800bb51f6c0 R14: 0000000000000004 R15: 0000000000000000
> [75143.971085] FS:  0000000000000000(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [75143.971085] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75143.971085] CR2: 0000000000000030 CR3: 0000000001805000 CR4: 00000000000006e0
> [75143.971085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75143.971085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75143.971085] Process rm (pid: 3861, threadinfo ffff880037b44000,
> task ffff8800b91fed00)
> [75143.971085] Stack:
> [75143.971085]  ffff8800b9712c00 0000000000000007 0000000000000000
> ffffffff8103aaad
> [75143.971085]  0000000000000009 ffff8800b91fed00 ffff8800b91ff218
> 0000000000000009
> [75143.971085]  ffff8800b8ca0700 ffff8800b92c0880 0000000000000001
> ffffffff8103bfe4
> [75143.971085] Call Trace:
> [75143.971085]  [<ffffffff8103aaad>] ? put_files_struct+0x67/0xbf
> [75143.971085]  [<ffffffff8103bfe4>] ? do_exit+0x2aa/0x7e1
> [75143.971085]  [<ffffffff810390ee>] ? kmsg_dump+0x53/0xef
> [75143.971085]  [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
> [75143.971085]  [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
> [75143.971085]  [<ffffffff8102eaca>] ? no_context+0x254/0x263
> [75143.971085]  [<ffffffff81368f16>] ? ceph_writepages_start+0xbb4/0xbee
> [75143.971085]  [<ffffffff815cf1ef>] ? do_page_fault+0x215/0x34c
> [75143.971085]  [<ffffffff8136a1e5>] ? __cap_is_valid+0x19/0x9a
> [75143.971085]  [<ffffffff8136ba47>] ? ceph_encode_inode_release+0xed/0x2b2
> [75143.971085]  [<ffffffff81063c10>] ? update_curr+0xfb/0x130
> [75143.971085]  [<ffffffff8100d6fe>] ? __switch_to+0x20b/0x35f
> [75143.971085]  [<ffffffff81063c10>] ? update_curr+0xfb/0x130
> [75143.971085]  [<ffffffff815cc5f5>] ? page_fault+0x25/0x30
> [75143.971085]  [<ffffffff811061cd>] ? path_init+0x218/0x2cc
> [75143.971085]  [<ffffffff811061b3>] ? path_init+0x1fe/0x2cc
> [75143.971085]  [<ffffffff811072b4>] ? path_lookupat+0x2c/0x30b
> [75143.971085]  [<ffffffff81060362>] ? try_to_wake_up+0x1a5/0x1a5
> [75143.971085]  [<ffffffff811075b1>] ? do_path_lookup+0x1e/0x9a
> [75143.971085]  [<ffffffff81107bf1>] ? user_path_parent+0x3a/0x5f
> [75143.971085]  [<ffffffff810f3484>] ? virt_to_head_page+0x9/0x2c
> [75143.971085]  [<ffffffff81107e96>] ? do_unlinkat+0x1d/0x15e
> [75143.971085]  [<ffffffff81109e94>] ? vfs_readdir+0x91/0xa7
> [75143.971085]  [<ffffffff8112a9a5>] ? fsnotify_find_inode_mark+0x23/0x2f
> [75143.971085]  [<ffffffff810fa97e>] ? filp_close+0x64/0x6c
> [75143.971085]  [<ffffffff815d2679>] ? system_call_fastpath+0x16/0x1b
> [75143.971085] Code: 55 48 89 f5 53 48 89 fb 48 8b 47 30 48 85 c0 75
> 11 48 c7 c7 6e d5 72 81 45 31 e4 e8 f0 fa 4c 00 eb 40 48 8b 47 20 48
> 85 c0 74 10 <48> 8b 40 60 48 85 c0 74 07 ff d0 41 89 c4 eb 03 45 31 e4
> f6 43
> [75143.971085] RIP  [<ffffffff810fa947>] filp_close+0x2d/0x6c
> [75143.971085]  RSP <ffff880037b45a48>
> [75143.988721] ---[ end trace 18e2f523c5af9a39 ]---
> [75143.988826] Fixing recursive fault but reboot is needed!
> [75146.018276] ------------[ cut here ]------------
> [75146.018399] kernel BUG at mm/slub.c:3442!
> [75146.018498] invalid opcode: 0000 [#3] SMP
> [75146.018718] CPU 1
> [75146.018789] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75146.021908]
> [75146.021999] Pid: 1137, comm: kworker/1:0 Tainted: G      D
> 3.3.4 #1 HP ProLiant DL160 G5
> [75146.022236] RIP: 0010:[<ffffffff810f55df>]  [<ffffffff810f55df>]
> kfree+0x59/0xc2
> [75146.022236] RSP: 0018:ffff8800bbb35b90  EFLAGS: 00010246
> [75146.022236] RAX: 0100000000000400 RBX: ffff8800bfcdab70 RCX: ffff8800b9f762c8
> [75146.022236] RDX: ffff8800bfc4e550 RSI: 0000000000000000 RDI: ffffea0002ff3680
> [75146.022236] RBP: ffffffff815a2a50 R08: 0000000000000000 R09: ffffffff814f9620
> [75146.022236] R10: 000000000000000d R11: ffff8800b9d0d000 R12: ffffffff815a2a47
> [75146.022236] R13: ffff8800ba15d400 R14: ffff8800b9d0d271 R15: ffff8800b9d0d000
> [75146.022236] FS:  0000000000000000(0000) GS:ffff8800bfc40000(0000)
> knlGS:0000000000000000
> [75146.022236] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.022236] CR2: ffffffffff600400 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.022236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.022236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.022236] Process kworker/1:0 (pid: 1137, threadinfo
> ffff8800bbb34000, task ffff8800b90b0000)
> [75146.022236] Stack:
> [75146.022236]  ffff8800b8fde500 ffffffff815a2a50 ffff8800b9f72800
> ffffffff815a2a47
> [75146.022236]  ffff8800b8fde588 ffffffff81372b8e 0000001b00004040
> ffff8800b9f72a68
> [75146.022236]  ffff8800b9f72800 ffffffff8137513d ffff8800ba15d400
> ffff8800b9f72a68
> [75146.022236] Call Trace:
> [75146.022236]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.022236]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.022236]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.022236]  [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.022236]  [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
> [75146.022236]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.022236]  [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
> [75146.022236]  [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
> [75146.022236]  [<ffffffff813dce42>] ? crc32c+0x56/0x7c
> [75146.022236]  [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.022236]  [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
> [75146.022236]  [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
> [75146.022236]  [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
> [75146.022236]  [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
> [75146.022236]  [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
> [75146.022236]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.022236]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.022236]  [<ffffffff81052b82>] ? kthread+0x81/0x89
> [75146.022236]  [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
> [75146.022236]  [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
> [75146.022236]  [<ffffffff815d3a20>] ? gs_change+0x13/0x13
> [75146.022236] Code: 00 48 83 c5 10 48 83 7d 00 00 eb e6 48 83 fb 10
> 76 7d 48 89 df e8 ad de ff ff 48 89 c7 48 8b 00 84 c0 78 14 66 f7 07
> 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 e8 03 fd ff 4c 8b 54 24 18 4c
> 8b 4f
> [75146.022236] RIP  [<ffffffff810f55df>] kfree+0x59/0xc2
> [75146.022236]  RSP <ffff8800bbb35b90>
> [75146.031675] ---[ end trace 18e2f523c5af9a3a ]---
> [75146.031809] BUG: unable to handle kernel paging request at fffffffffffffff8
> [75146.032058] IP: [<ffffffff81052783>] kthread_data+0x7/0xc
> [75146.032221] PGD 1807067 PUD 1808067 PMD 0
> [75146.032494] Oops: 0000 [#4] SMP
> [75146.032706] CPU 1
> [75146.032771] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75146.035616]
> [75146.035616] Pid: 1137, comm: kworker/1:0 Tainted: G      D
> 3.3.4 #1 HP ProLiant DL160 G5
> [75146.035616] RIP: 0010:[<ffffffff81052783>]  [<ffffffff81052783>]
> kthread_data+0x7/0xc
> [75146.035616] RSP: 0018:ffff8800bbb35900  EFLAGS: 00010002
> [75146.035616] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> [75146.035616] RDX: ffffffff819a87b0 RSI: 0000000000000001 RDI: ffff8800b90b0000
> [75146.035616] RBP: ffff8800b90b0000 R08: 0000000000000400 R09: ffffffff81013c7c
> [75146.035616] R10: ffff8800b90b0000 R11: ffff8800b90b0518 R12: ffff8800b90b02f8
> [75146.035616] R13: ffff8800bbb359c8 R14: 0000000000000001 R15: 0000000000000001
> [75146.035616] FS:  0000000000000000(0000) GS:ffff8800bfc40000(0000)
> knlGS:0000000000000000
> [75146.035616] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.035616] CR2: fffffffffffffff8 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.035616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.035616] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.035616] Process kworker/1:0 (pid: 1137, threadinfo
> ffff8800bbb34000, task ffff8800b90b0000)
> [75146.035616] Stack:
> [75146.035616]  ffffffff8104e2a2 ffff8800bfc53340 ffffffff815cb1d1
> 0000000000000001
> [75146.035616]  0000000000000296 0000000000013340 ffff8800bbb35fd8
> 0000000000013340
> [75146.035616]  ffff8800bbb35fd8 0000000000013340 ffff8800b90b0000
> 0000000000013340
> [75146.035616] Call Trace:
> [75146.035616]  [<ffffffff8104e2a2>] ? wq_worker_sleeping+0x8/0x82
> [75146.035616]  [<ffffffff815cb1d1>] ? __schedule+0x166/0x4fc
> [75146.035616]  [<ffffffff8103c517>] ? do_exit+0x7dd/0x7e1
> [75146.035616]  [<ffffffff815ca46c>] ? printk+0x40/0x4c
> [75146.035616]  [<ffffffff815cd01a>] ? oops_end+0x66/0xb7
> [75146.035616]  [<ffffffff815cd066>] ? oops_end+0xb2/0xb7
> [75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616]  [<ffffffff8100ef69>] ? do_invalid_op+0x8b/0x95
> [75146.035616]  [<ffffffff810f55df>] ? kfree+0x59/0xc2
> [75146.035616]  [<ffffffff81515b94>] ? inet_recvmsg+0x64/0x75
> [75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616]  [<ffffffff815d389b>] ? invalid_op+0x1b/0x20
> [75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616]  [<ffffffff814f9620>] ? tcp_recvmsg+0x773/0x95e
> [75146.035616]  [<ffffffff810f55df>] ? kfree+0x59/0xc2
> [75146.035616]  [<ffffffff815a2a50>] ? ceph_msg_kfree+0x47/0x47
> [75146.035616]  [<ffffffff815a2a47>] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.035616]  [<ffffffff8137513d>] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.035616]  [<ffffffff8137510e>] ? encode_caps_cb+0x2f9/0x2f9
> [75146.035616]  [<ffffffff81372b8e>] ? kref_put+0x34/0x3e
> [75146.035616]  [<ffffffff813778d3>] ? dispatch+0xe05/0x132c
> [75146.035616]  [<ffffffff814b3d5e>] ? kernel_recvmsg+0x34/0x3f
> [75146.035616]  [<ffffffff813dce42>] ? crc32c+0x56/0x7c
> [75146.035616]  [<ffffffff815a39c7>] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.035616]  [<ffffffff815a658b>] ? con_work+0x15ac/0x17a8
> [75146.035616]  [<ffffffff8104483f>] ? lock_timer_base+0x25/0x49
> [75146.035616]  [<ffffffff815a4fdf>] ? ceph_fault+0x2b4/0x2b4
> [75146.035616]  [<ffffffff8104ef8a>] ? process_one_work+0x1cd/0x2eb
> [75146.035616]  [<ffffffff8104f1d6>] ? worker_thread+0x12e/0x249
> [75146.035616]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.035616]  [<ffffffff8104f0a8>] ? process_one_work+0x2eb/0x2eb
> [75146.035616]  [<ffffffff81052b82>] ? kthread+0x81/0x89
> [75146.035616]  [<ffffffff815d3a24>] ? kernel_thread_helper+0x4/0x10
> [75146.035616]  [<ffffffff81052b01>] ? kthread_freezable_should_stop+0x53/0x53
> [75146.035616]  [<ffffffff815d3a20>] ? gs_change+0x13/0x13
> [75146.035616] Code: 41 5e 41 5f c3 41 bf ea ff ff ff eb 97 90 90 90
> 65 48 8b 04 25 c0 c6 00 00 48 8b 80 a0 02 00 00 8b 40 f0 c3 48 8b 87
> a0 02 00 00 <48> 8b 40 f8 c3 48 3b 3d 51 5f 95 00 75 08 0f bf 87 6a 06
> 00 00
> [75146.035616] RIP  [<ffffffff81052783>] kthread_data+0x7/0xc
> [75146.035616]  RSP <ffff8800bbb35900>
> [75146.035616] CR2: fffffffffffffff8
> [75146.035616] ---[ end trace 18e2f523c5af9a3b ]---
> [75146.035616] Fixing recursive fault but reboot is needed!
> [75206.036002] INFO: rcu_sched detected stalls on CPUs/tasks: { 1}
> (detected by 3, t=15002 jiffies)
> [75206.036265] Pid: 0, comm: swapper/3 Tainted: G      D      3.3.4 #1
> [75206.036371] Call Trace:
> [75206.036464]  <IRQ>  [<ffffffff8109b7b3>] ? __rcu_pending+0x21a/0x336
> [75206.036635]  [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
> [75206.036740]  [<ffffffff8109b9cc>] ? rcu_check_callbacks+0xa7/0xe7
> [75206.036846]  [<ffffffff81075d59>] ? tick_nohz_handler+0xcb/0xcb
> [75206.036951]  [<ffffffff81044911>] ? update_process_times+0x31/0x63
>
> ----------------------------------------------------
>
> Thanks a lot,
> Giorgos Kappes
>
> On Tue, May 8, 2012 at 10:18 PM, Tommi Virtanen <tv@inktank.com> wrote:
>> On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes <geokapp@gmail.com> wrote:
>>> When I am running debootstrap to install a base Debian Squeeze system
>>> on a Ceph directory the client's kernel crashes with the following
>>> message:
>>>
>>> I: Extracting zlib1g...
>>> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
>>> [  759.776151] kernel tried to execute NX-protected page - exploit
>>> attempt? (uid: 0)
>>> [  759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
>> ...
>>> [  759.776438]  [<ffffffff81099405>] ? __rcu_process_callbacks+0x1c7/0x2f8
>>> [  759.776447]  [<ffffffff81099898>] ? rcu_process_callbacks+0x2c/0x56
>>> [  759.776457]  [<ffffffff8104cb72>] ? __do_softirq+0xc4/0x1a0
>>> [  759.776465]  [<ffffffff81096875>] ? handle_percpu_irq+0x3d/0x54
>>> [  759.776475]  [<ffffffff8150efb6>] ? __xen_evtchn_do_upcall+0x1c7/0x205
>>> [  759.776484]  [<ffffffff8176e52c>] ? call_softirq+0x1c/0x30
>>> [  759.776493]  [<ffffffff8100fa47>] ? do_softirq+0x3f/0x79
>>> [  759.776501]  [<ffffffff8104c942>] ? irq_exit+0x44/0xb5
>>> [  759.776508]  [<ffffffff8150ffc6>] ? xen_evtchn_do_upcall+0x27/0x32
>>> [  759.776516]  [<ffffffff8176e57e>] ? xen_do_hypervisor_callback+0x1e/0x30
>> ...
>>> My simple cluster consists of 3 nodes in total. Each node is a Xen
>>> domU guest running the Linux kernel 3.2.6 and ceph 0.43. For
>>> reference, here is my configuration:
>> ...
>>> My Ceph kernel client is another Xen domU node running the Linux
>>> kernel 3.2.11. I have also tried a native client with the same result.
>>> Please note that this bug happens only on the client side.
>>> Your help would be greatly appreciated.
>>
>> Your backtrace includes Xen code in it -- can you reproduce this bug
>> with a mainline kernel, without Xen at all?
>>
>> Also, the error encountered is from the NX security subsystem. It
>> would be nice to know what would happen without NX.
>
>
>
> -----------------------------------------------------------
> Giorgos Kappes
> Website: http://www.cs.uoi.gr/~gkappes
> email: geokapp@gmail.com

Thread overview: 4+ messages
2012-05-08 15:43 Ceph kernel client - kernel craches Giorgos Kappes
2012-05-08 19:18 ` Tommi Virtanen
2012-05-10 18:00   ` Giorgos Kappes
2012-05-17 22:49     ` Josh Durgin