* General Protection Fault in 3.8.5
From: Travis Rhoden @ 2013-05-07  1:48 UTC
  To: ceph-devel

Hey folks,

We have two servers that map a lot of RBDs (20 to 30 each so far),
using the RBD kernel module.  They are running Ubuntu 12.10, and I
originally saw a lot of kernel panics (obviously from Ceph) when
running a 3.5.7 kernel.

I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
module, and the kernel panics from Ceph went away...and were replaced
by these nebulous "General Protection Faults" whose cause I couldn't
really pin down.

Today we saw one that actually had a Ceph backtrace in it, so I wanted
to throw it on here:

May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
0000 [#3] SMP
May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
Penguin Computing Relion 1751/X8DTU
May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
kmem_cache_alloc_trace+0x5f/0x140
May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
0018:ffff880624cb1a98  EFLAGS: 00010202
May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
RBX: ffff88032ddc46d0 RCX: 000000000003c867
May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
RSI: 0000000000008050 RDI: 0000000000016c80
May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
R08: ffff880333d76c80 R09: 0000000000000002
May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
R11: 000000000000000d R12: ffff880333802200
May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
R14: ffffffffa023901e R15: 0000000000008050
May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
0000000000000000(0000) GS:ffff880333d60000(0000)
knlGS:0000000000000000
May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
CR3: 0000000001c0d000 CR4: 00000000000007e0
May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
(pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
0000000000000000 0000000000000060 0000000000000000
May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
0000000000000000 ffff880624cb1b28 ffffffffa023901e
May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
ceph_x_handle_reply+0xbd/0x110 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
ceph_handle_auth_reply+0x18c/0x200 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
handle_auth_reply.isra.12+0xa0/0x230 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
dispatch+0xbd/0x120 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
process_message+0xa5/0xc0 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
try_read+0x2e1/0x430 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
con_work+0x8f/0x140 [libceph]
May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
process_one_work+0x141/0x490
May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
worker_thread+0x168/0x400
May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
manage_workers+0x120/0x120
May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
kthread+0xc0/0xd0
May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
flush_kthread_worker+0xb0/0xb0
May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
ret_from_fork+0x7c/0xb0
May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
flush_kthread_worker+0xb0/0xb0
May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
kmem_cache_alloc_trace+0x5f/0x140
May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
20e9b6a1bb611aba ]---

I'm not sure whether the problem started here or not.  I mentioned
that the previous GPFs were nebulous -- one thing most of them have
had in common is that the fault almost always came from nfsd (this one
didn't -- it's the first and only time I've seen this one).  However,
I am using NFS to re-export some RBDs (to provide access to multiple
clients), so Ceph is still in the picture on those.

I know it's not a lot to go on, but any advice would be appreciated.

 - Travis


* Re: General Protection Fault in 3.8.5
From: Sage Weil @ 2013-05-07  2:54 UTC
  To: Travis Rhoden; +Cc: ceph-devel

On Mon, 6 May 2013, Travis Rhoden wrote:
> Hey folks,
> 
> We have two servers that map a lot of RBDs (20 to 30 each so far),
> using the RBD kernel module.  They are running Ubuntu 12.10, and I
> originally saw a lot of kernel panics (obviously from Ceph) when
> running a 3.5.7 kernel.
> 
> I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
> module, and the kernel panics from Ceph went away...and were replaced
> by these nebulous "General Protection Faults" whose cause I couldn't
> really pin down.
> 
> Today we saw one that actually had a Ceph backtrace in it, so I wanted
> to throw it on here:
> 
> May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
> 0000 [#3] SMP
> May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
> libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
> lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
> edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
> lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
> async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
> myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
> May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
> May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
> kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
> Penguin Computing Relion 1751/X8DTU
> May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
> 0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
> kmem_cache_alloc_trace+0x5f/0x140
> May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
> 0018:ffff880624cb1a98  EFLAGS: 00010202
> May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
> RBX: ffff88032ddc46d0 RCX: 000000000003c867
> May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
> RSI: 0000000000008050 RDI: 0000000000016c80
> May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
> R08: ffff880333d76c80 R09: 0000000000000002
> May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
> R11: 000000000000000d R12: ffff880333802200
> May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
> R14: ffffffffa023901e R15: 0000000000008050
> May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
> 0000000000000000(0000) GS:ffff880333d60000(0000)
> knlGS:0000000000000000
> May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
> CR3: 0000000001c0d000 CR4: 00000000000007e0
> May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
> (pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
> May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
> May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
> 0000000000000000 0000000000000060 0000000000000000
> May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
> ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
> May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
> 0000000000000000 ffff880624cb1b28 ffffffffa023901e
> May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
> May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
> get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
> ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
> ceph_x_handle_reply+0xbd/0x110 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
> ceph_handle_auth_reply+0x18c/0x200 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
> handle_auth_reply.isra.12+0xa0/0x230 [libceph]

Ah, this is in the auth code.  There was a series of patches that fixed
the locking and a few other things that just went upstream for 3.10.  I'll
prepare some patches to backport those fixes to stable kernels (3.8 and
3.4).  It could easily explain your crashes.
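
To make the failure mode concrete: the ticket-handler table is shared
state, and two connection workers handling auth replies at the same
time can both run its lookup-or-allocate path.  The sketch below is a
simplified userspace analogue -- the names, the pthread mutex, and the
list layout are all illustrative, not the actual libceph rbtree code --
but the 3.10 series has the same shape as the "locked" variant, in
that it serializes the auth ops behind one mutex.

/*
 * Illustrative userspace sketch only -- hypothetical names, not the
 * libceph source.  A lookup-or-allocate on a shared table is safe
 * only if the lookup and the insert are atomic with respect to each
 * other; otherwise concurrent workers can corrupt the table, and a
 * later traversal follows a garbage pointer (the GPF above fires in
 * kmem_cache_alloc_trace() called from just such a path).
 */
#include <pthread.h>
#include <stdlib.h>

struct ticket_handler {
    int service;
    struct ticket_handler *next;    /* stand-in for the rbtree linkage */
};

static struct ticket_handler *handlers;         /* shared table */
static pthread_mutex_t auth_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Racy shape: the window between a failed lookup and the insert lets
 * two callers publish entries concurrently, tearing the list. */
static struct ticket_handler *get_handler_racy(int service)
{
    struct ticket_handler *th;

    for (th = handlers; th; th = th->next)
        if (th->service == service)
            return th;
    th = calloc(1, sizeof(*th));
    if (!th)
        return NULL;
    th->service = service;
    th->next = handlers;            /* unsynchronized shared write */
    handlers = th;
    return th;
}

/* Fixed shape: one mutex makes lookup-or-allocate atomic. */
static struct ticket_handler *get_handler_locked(int service)
{
    struct ticket_handler *th;

    pthread_mutex_lock(&auth_mutex);
    for (th = handlers; th; th = th->next)
        if (th->service == service)
            goto out;
    th = calloc(1, sizeof(*th));
    if (th) {
        th->service = service;
        th->next = handlers;
        handlers = th;
    }
out:
    pthread_mutex_unlock(&auth_mutex);
    return th;
}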

Thanks!
sage


> May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
> dispatch+0xbd/0x120 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
> process_message+0xa5/0xc0 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
> try_read+0x2e1/0x430 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
> con_work+0x8f/0x140 [libceph]
> May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
> process_one_work+0x141/0x490
> May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
> worker_thread+0x168/0x400
> May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
> manage_workers+0x120/0x120
> May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
> kthread+0xc0/0xd0
> May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
> flush_kthread_worker+0xb0/0xb0
> May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
> ret_from_fork+0x7c/0xb0
> May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
> flush_kthread_worker+0xb0/0xb0
> May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
> 4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
> 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
> 48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
> May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
> kmem_cache_alloc_trace+0x5f/0x140
> May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
> May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
> 20e9b6a1bb611aba ]---
> 
> I'm not sure whether the problem started here or not.  I mentioned
> that the previous GPFs were nebulous -- one thing most of them have
> had in common is that the fault almost always came from nfsd (this one
> didn't -- it's the first and only time I've seen this one).  However,
> I am using NFS to re-export some RBDs (to provide access to multiple
> clients), so Ceph is still in the picture on those.
> 
> I know it's not a lot to go on, but any advice would be appreciated.
> 
>  - Travis


* Re: General Protection Fault in 3.8.5
From: Travis Rhoden @ 2013-05-07 14:54 UTC
  To: Sage Weil; +Cc: ceph-devel

Thanks Sage, I'll monitor the 3.8 point releases and update when I see
a release with those changes.

 - Travis

On Mon, May 6, 2013 at 10:54 PM, Sage Weil <sage@inktank.com> wrote:
> On Mon, 6 May 2013, Travis Rhoden wrote:
>> Hey folks,
>>
>> We have two servers that map a lot of RBDs (20 to 30 each so far),
>> using the RBD kernel module.  They are running Ubuntu 12.10, and I
>> originally saw a lot of kernel panics (obviously from Ceph) when
>> running a 3.5.7 kernel.
>>
>> I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
>> module, and the kernel panics from Ceph went away...and were replaced
>> by these nebulous "General Protection Faults" whose cause I couldn't
>> really pin down.
>>
>> Today we saw one that actually had a Ceph backtrace in it, so I wanted
>> to throw it on here:
>>
>> May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
>> 0000 [#3] SMP
>> May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
>> libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
>> lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
>> edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
>> lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
>> async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
>> myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
>> May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
>> May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
>> kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
>> Penguin Computing Relion 1751/X8DTU
>> May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
>> 0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
>> kmem_cache_alloc_trace+0x5f/0x140
>> May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
>> 0018:ffff880624cb1a98  EFLAGS: 00010202
>> May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
>> RBX: ffff88032ddc46d0 RCX: 000000000003c867
>> May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
>> RSI: 0000000000008050 RDI: 0000000000016c80
>> May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
>> R08: ffff880333d76c80 R09: 0000000000000002
>> May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
>> R11: 000000000000000d R12: ffff880333802200
>> May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
>> R14: ffffffffa023901e R15: 0000000000008050
>> May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
>> 0000000000000000(0000) GS:ffff880333d60000(0000)
>> knlGS:0000000000000000
>> May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
>> 0000 CR0: 000000008005003b
>> May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
>> CR3: 0000000001c0d000 CR4: 00000000000007e0
>> May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
>> (pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
>> May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
>> May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
>> 0000000000000000 0000000000000060 0000000000000000
>> May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
>> ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
>> May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
>> 0000000000000000 ffff880624cb1b28 ffffffffa023901e
>> May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
>> May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
>> get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
>> ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
>> ceph_x_handle_reply+0xbd/0x110 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
>> ceph_handle_auth_reply+0x18c/0x200 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
>> handle_auth_reply.isra.12+0xa0/0x230 [libceph]
>
> Ah, this is in the auth code.  There was a series of patches that fixed
> the locking and a few other things that just went upstream for 3.10.  I'll
> prepare some patches to backport those fixes to stable kernels (3.8 and
> 3.4).  It could easily explain your crashes.
>
> Thanks!
> sage
>
>
>> May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
>> dispatch+0xbd/0x120 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
>> process_message+0xa5/0xc0 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
>> try_read+0x2e1/0x430 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
>> con_work+0x8f/0x140 [libceph]
>> May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
>> process_one_work+0x141/0x490
>> May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
>> worker_thread+0x168/0x400
>> May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
>> manage_workers+0x120/0x120
>> May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
>> kthread+0xc0/0xd0
>> May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
>> flush_kthread_worker+0xb0/0xb0
>> May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
>> ret_from_fork+0x7c/0xb0
>> May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
>> flush_kthread_worker+0xb0/0xb0
>> May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
>> 4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
>> 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
>> 48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
>> May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
>> kmem_cache_alloc_trace+0x5f/0x140
>> May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
>> May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
>> 20e9b6a1bb611aba ]---
>>
>> I'm not sure whether the problem started here or not.  I mentioned
>> that the previous GPFs were nebulous -- one thing most of them have
>> had in common is that the fault almost always came from nfsd (this one
>> didn't -- it's the first and only time I've seen this one).  However,
>> I am using NFS to re-export some RBDs (to provide access to multiple
>> clients), so Ceph is still in the picture on those.
>>
>> I know it's not a lot to go on, but any advice would be appreciated.
>>
>>  - Travis


* Re: General Protection Fault in 3.8.5
From: Travis Rhoden @ 2013-05-20 15:45 UTC
  To: Sage Weil; +Cc: ceph-devel

Sage,

Did a patch for the auth code get submitted for the 3.8 kernel?  I hit
this again over the weekend.  Looks slightly different than the last
one, but still in the auth code.

May 18 13:26:15 nfs1 kernel: [999560.730733] BUG: unable to handle
kernel paging request at ffff880640000000
May 18 13:26:15 nfs1 kernel: [999560.737818] IP: [<ffffffff8135ca9d>]
memcpy+0xd/0x110
May 18 13:26:15 nfs1 kernel: [999560.742974] PGD 1c0e063 PUD 0
May 18 13:26:15 nfs1 kernel: [999560.746150] Oops: 0000 [#1] SMP
May 18 13:26:15 nfs1 kernel: [999560.749498] Modules linked in: btrfs
zlib_deflate ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs
ext2 rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd
coretemp sunrpc kvm gpio_ich psmouse microcode serio_raw i7core_edac
ioatdma lpc_ich edac_core ipmi_si mac_hid ipmi_devintf ipmi_msghandler
bonding lp parport tcp_bic raid10 raid456 async_pq async_xor xor
async_memcpy async_raid6_recov hid_generic usbhid hid raid6_pq
async_tx igb ahci myri10ge raid1 ptp libahci raid0 dca pps_core
multipath linear
May 18 13:26:15 nfs1 kernel: [999560.796421] CPU 0
May 18 13:26:15 nfs1 kernel: [999560.798353] Pid: 26234, comm:
kworker/0:0 Not tainted 3.8.5-030805-generic #201303281651 Penguin
Computing Relion 1751/X8DTU
May 18 13:26:15 nfs1 kernel: [999560.809827] RIP:
0010:[<ffffffff8135ca9d>]  [<ffffffff8135ca9d>] memcpy+0xd/0x110
May 18 13:26:15 nfs1 kernel: [999560.817403] RSP:
0018:ffff88062dc3dc40  EFLAGS: 00010246
May 18 13:26:15 nfs1 kernel: [999560.822794] RAX: ffffc90017f4301a
RBX: ffff880323ba4300 RCX: 1ffff100c2f035b2
May 18 13:26:15 nfs1 kernel: [999560.830003] RDX: 0000000000000000
RSI: ffff880640000000 RDI: ffffc9002c335952
May 18 13:26:15 nfs1 kernel: [999560.837209] RBP: ffff88062dc3dc98
R08: ffffc90043b52000 R09: ffff88062dc3dad4
May 18 13:26:15 nfs1 kernel: [999560.844417] R10: ffff88027a45f0e8
R11: ffff88033fffbec0 R12: ffffc90017f4301a
May 18 13:26:15 nfs1 kernel: [999560.851626] R13: 000000002bc0d708
R14: ffff880628407120 R15: 000000002bc0d6c8
May 18 13:26:15 nfs1 kernel: [999560.858834] FS:
0000000000000000(0000) GS:ffff880333c00000(0000)
knlGS:0000000000000000
May 18 13:26:15 nfs1 kernel: [999560.867000] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
May 18 13:26:15 nfs1 kernel: [999560.872824] CR2: ffff880640000000
CR3: 0000000001c0d000 CR4: 00000000000007f0
May 18 13:26:15 nfs1 kernel: [999560.880032] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
May 18 13:26:15 nfs1 kernel: [999560.887239] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 18 13:26:15 nfs1 kernel: [999560.894446] Process kworker/0:0 (pid:
26234, threadinfo ffff88062dc3c000, task ffff88032d8845c0)
May 18 13:26:15 nfs1 kernel: [999560.903298] Stack:
May 18 13:26:15 nfs1 kernel: [999560.905399]  ffffffffa0368a54
ffffffffa035b60d 2bc0d6c8a0368d12 0000000000000098
May 18 13:26:15 nfs1 kernel: [999560.912942]  00000000000000c0
ffffffffa03687bc ffff880323ba4300 ffff880322fec4d8
May 18 13:26:15 nfs1 kernel: [999560.920471]  ffff880628407120
ffff88032bdf5c40 ffff880322fec420 ffff88062dc3dcd8
May 18 13:26:15 nfs1 kernel: [999560.928016] Call Trace:
May 18 13:26:15 nfs1 kernel: [999560.930561]  [<ffffffffa0368a54>] ?
ceph_x_build_authorizer.isra.6+0x144/0x1e0 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.938727]  [<ffffffffa035b60d>] ?
ceph_buffer_release+0x2d/0x50 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.945761]  [<ffffffffa03687bc>] ?
ceph_x_destroy_authorizer+0x2c/0x40 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.953315]  [<ffffffffa0368d2e>]
ceph_x_create_authorizer+0x6e/0xd0 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.960609]  [<ffffffffa035db49>]
get_authorizer+0x89/0xc0 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.967035]  [<ffffffffa0357704>]
prepare_write_connect+0xb4/0x210 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.974161]  [<ffffffffa035b2a5>]
try_read+0x3d5/0x430 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.980249]  [<ffffffffa035b38f>]
con_work+0x8f/0x140 [libceph]
May 18 13:26:15 nfs1 kernel: [999560.986242]  [<ffffffff81078c31>]
process_one_work+0x141/0x490
May 18 13:26:15 nfs1 kernel: [999560.992153]  [<ffffffff81079b08>]
worker_thread+0x168/0x400
May 18 13:26:15 nfs1 kernel: [999560.997800]  [<ffffffff810799a0>] ?
manage_workers+0x120/0x120
May 18 13:26:15 nfs1 kernel: [999561.003713]  [<ffffffff8107eff0>]
kthread+0xc0/0xd0
May 18 13:26:15 nfs1 kernel: [999561.008669]  [<ffffffff8107ef30>] ?
flush_kthread_worker+0xb0/0xb0
May 18 13:26:15 nfs1 kernel: [999561.014927]  [<ffffffff816f532c>]
ret_from_fork+0x7c/0xb0
May 18 13:26:15 nfs1 kernel: [999561.020401]  [<ffffffff8107ef30>] ?
flush_kthread_worker+0xb0/0xb0
May 18 13:26:15 nfs1 kernel: [999561.026657] Code: 2b 43 50 88 43 4e
48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90
90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20
4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
May 18 13:26:15 nfs1 kernel: [999561.046667] RIP  [<ffffffff8135ca9d>]
memcpy+0xd/0x110
May 18 13:26:15 nfs1 kernel: [999561.051903]  RSP <ffff88062dc3dc40>
May 18 13:26:15 nfs1 kernel: [999561.055477] CR2: ffff880640000000
May 18 13:26:15 nfs1 kernel: [999561.058894] ---[ end trace
2fa4f8a71fe96709 ]---

Thanks!

 - Travis

On Tue, May 7, 2013 at 10:54 AM, Travis Rhoden <trhoden@gmail.com> wrote:
> Thanks Sage, I'll monitor the 3.8 point releases and update when I see
> a release with those changes.
>
>  - Travis
>
> On Mon, May 6, 2013 at 10:54 PM, Sage Weil <sage@inktank.com> wrote:
>> On Mon, 6 May 2013, Travis Rhoden wrote:
>>> Hey folks,
>>>
>>> We have two servers that map a lot of RBDs (20 to 30 each so far),
>>> using the RBD kernel module.  They are running Ubuntu 12.10, and I
>>> originally saw a lot of kernel panics (obviously from Ceph) when
>>> running a 3.5.7 kernel.
>>>
>>> I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
>>> module, and the kernel panics from Ceph went away...and were replaced
>>> by these nebulous "General Protection Faults" whose cause I couldn't
>>> really pin down.
>>>
>>> Today we saw one that actually had a Ceph backtrace in it, so I wanted
>>> to throw it on here:
>>>
>>> May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
>>> 0000 [#3] SMP
>>> May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
>>> libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
>>> lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
>>> edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
>>> lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
>>> async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
>>> myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
>>> May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
>>> May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
>>> kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
>>> Penguin Computing Relion 1751/X8DTU
>>> May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
>>> 0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
>>> kmem_cache_alloc_trace+0x5f/0x140
>>> May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
>>> 0018:ffff880624cb1a98  EFLAGS: 00010202
>>> May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
>>> RBX: ffff88032ddc46d0 RCX: 000000000003c867
>>> May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
>>> RSI: 0000000000008050 RDI: 0000000000016c80
>>> May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
>>> R08: ffff880333d76c80 R09: 0000000000000002
>>> May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
>>> R11: 000000000000000d R12: ffff880333802200
>>> May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
>>> R14: ffffffffa023901e R15: 0000000000008050
>>> May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
>>> 0000000000000000(0000) GS:ffff880333d60000(0000)
>>> knlGS:0000000000000000
>>> May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
>>> 0000 CR0: 000000008005003b
>>> May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
>>> CR3: 0000000001c0d000 CR4: 00000000000007e0
>>> May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
>>> DR1: 0000000000000000 DR2: 0000000000000000
>>> May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
>>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
>>> (pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
>>> May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
>>> May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
>>> 0000000000000000 0000000000000060 0000000000000000
>>> May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
>>> ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
>>> May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
>>> 0000000000000000 ffff880624cb1b28 ffffffffa023901e
>>> May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
>>> May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
>>> get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
>>> ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
>>> ceph_x_handle_reply+0xbd/0x110 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
>>> ceph_handle_auth_reply+0x18c/0x200 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
>>> handle_auth_reply.isra.12+0xa0/0x230 [libceph]
>>
>> Ah, this is in the auth code.  There was a series of patches that fixed
>> the locking and a few other things that just went upstream for 3.10.  I'll
>> prepare some patches to backport those fixes to stable kernels (3.8 and
>> 3.4).  It could easily explain your crashes.
>>
>> Thanks!
>> sage
>>
>>
>>> May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
>>> dispatch+0xbd/0x120 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
>>> process_message+0xa5/0xc0 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
>>> try_read+0x2e1/0x430 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
>>> con_work+0x8f/0x140 [libceph]
>>> May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
>>> process_one_work+0x141/0x490
>>> May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
>>> worker_thread+0x168/0x400
>>> May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
>>> manage_workers+0x120/0x120
>>> May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
>>> kthread+0xc0/0xd0
>>> May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
>>> flush_kthread_worker+0xb0/0xb0
>>> May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
>>> ret_from_fork+0x7c/0xb0
>>> May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
>>> flush_kthread_worker+0xb0/0xb0
>>> May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
>>> 4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
>>> 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
>>> 48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
>>> May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
>>> kmem_cache_alloc_trace+0x5f/0x140
>>> May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
>>> May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
>>> 20e9b6a1bb611aba ]---
>>>
>>> I'm not sure whether the problem started here or not.  I mentioned
>>> that the previous GPFs were nebulous -- one thing most of them have
>>> had in common is that the fault almost always came from nfsd (this one
>>> didn't -- it's the first and only time I've seen this one).  However,
>>> I am using NFS to re-export some RBDs (to provide access to multiple
>>> clients), so Ceph is still in the picture on those.
>>>
>>> I know it's not a lot to go on, but any advice would be appreciated.
>>>
>>>  - Travis


* Re: General Protection Fault in 3.8.5
From: Sage Weil @ 2013-05-20 16:29 UTC
  To: Travis Rhoden; +Cc: ceph-devel

Hi Travis,

The fixes for this locking just went upstream for 3.10.  We'll be sending 
to Greg KH for the stable kernels shortly.
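
For what it's worth, the new trace fits the same theme: the
authorizer build path copies from ticket state that a concurrent
worker can free and replace underneath it.  A rough userspace sketch
of that shape (hypothetical names and layout, not the actual libceph
code):

/*
 * Illustrative sketch only.  Reading the shared ticket blob without
 * a lock means a reply handler can free/replace it between the
 * length check and the copy, so the memcpy reads through a stale
 * pointer -- consistent with the "unable to handle kernel paging
 * request" in the quoted trace below.
 */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct ticket_blob {
    size_t len;
    unsigned char *data;
};

static struct ticket_blob *current_ticket;  /* swapped on ticket replies */
static pthread_mutex_t auth_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Racy shape: t->data may be freed between the check and the copy. */
static int build_authorizer_racy(unsigned char *dst, size_t dstlen)
{
    struct ticket_blob *t = current_ticket;

    if (!t || t->len > dstlen)
        return -1;
    memcpy(dst, t->data, t->len);   /* may read freed memory */
    return 0;
}

/* Fixed shape: hold the mutex the reply handler holds while swapping
 * the blob, so the copy always sees a stable ticket. */
static int build_authorizer_locked(unsigned char *dst, size_t dstlen)
{
    int ret = -1;

    pthread_mutex_lock(&auth_mutex);
    if (current_ticket && current_ticket->len <= dstlen) {
        memcpy(dst, current_ticket->data, current_ticket->len);
        ret = 0;
    }
    pthread_mutex_unlock(&auth_mutex);
    return ret;
}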

sage


On Mon, 20 May 2013, Travis Rhoden wrote:

> Sage,
> 
> Did a patch for the auth code get submitted for the 3.8 kernel?  I hit
> this again over the weekend.  Looks slightly different than the last
> one, but still in the auth code.
> 
> May 18 13:26:15 nfs1 kernel: [999560.730733] BUG: unable to handle
> kernel paging request at ffff880640000000
> May 18 13:26:15 nfs1 kernel: [999560.737818] IP: [<ffffffff8135ca9d>]
> memcpy+0xd/0x110
> May 18 13:26:15 nfs1 kernel: [999560.742974] PGD 1c0e063 PUD 0
> May 18 13:26:15 nfs1 kernel: [999560.746150] Oops: 0000 [#1] SMP
> May 18 13:26:15 nfs1 kernel: [999560.749498] Modules linked in: btrfs
> zlib_deflate ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs
> ext2 rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd
> coretemp sunrpc kvm gpio_ich psmouse microcode serio_raw i7core_edac
> ioatdma lpc_ich edac_core ipmi_si mac_hid ipmi_devintf ipmi_msghandler
> bonding lp parport tcp_bic raid10 raid456 async_pq async_xor xor
> async_memcpy async_raid6_recov hid_generic usbhid hid raid6_pq
> async_tx igb ahci myri10ge raid1 ptp libahci raid0 dca pps_core
> multipath linear
> May 18 13:26:15 nfs1 kernel: [999560.796421] CPU 0
> May 18 13:26:15 nfs1 kernel: [999560.798353] Pid: 26234, comm:
> kworker/0:0 Not tainted 3.8.5-030805-generic #201303281651 Penguin
> Computing Relion 1751/X8DTU
> May 18 13:26:15 nfs1 kernel: [999560.809827] RIP:
> 0010:[<ffffffff8135ca9d>]  [<ffffffff8135ca9d>] memcpy+0xd/0x110
> May 18 13:26:15 nfs1 kernel: [999560.817403] RSP:
> 0018:ffff88062dc3dc40  EFLAGS: 00010246
> May 18 13:26:15 nfs1 kernel: [999560.822794] RAX: ffffc90017f4301a
> RBX: ffff880323ba4300 RCX: 1ffff100c2f035b2
> May 18 13:26:15 nfs1 kernel: [999560.830003] RDX: 0000000000000000
> RSI: ffff880640000000 RDI: ffffc9002c335952
> May 18 13:26:15 nfs1 kernel: [999560.837209] RBP: ffff88062dc3dc98
> R08: ffffc90043b52000 R09: ffff88062dc3dad4
> May 18 13:26:15 nfs1 kernel: [999560.844417] R10: ffff88027a45f0e8
> R11: ffff88033fffbec0 R12: ffffc90017f4301a
> May 18 13:26:15 nfs1 kernel: [999560.851626] R13: 000000002bc0d708
> R14: ffff880628407120 R15: 000000002bc0d6c8
> May 18 13:26:15 nfs1 kernel: [999560.858834] FS:
> 0000000000000000(0000) GS:ffff880333c00000(0000)
> knlGS:0000000000000000
> May 18 13:26:15 nfs1 kernel: [999560.867000] CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> May 18 13:26:15 nfs1 kernel: [999560.872824] CR2: ffff880640000000
> CR3: 0000000001c0d000 CR4: 00000000000007f0
> May 18 13:26:15 nfs1 kernel: [999560.880032] DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> May 18 13:26:15 nfs1 kernel: [999560.887239] DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 18 13:26:15 nfs1 kernel: [999560.894446] Process kworker/0:0 (pid:
> 26234, threadinfo ffff88062dc3c000, task ffff88032d8845c0)
> May 18 13:26:15 nfs1 kernel: [999560.903298] Stack:
> May 18 13:26:15 nfs1 kernel: [999560.905399]  ffffffffa0368a54
> ffffffffa035b60d 2bc0d6c8a0368d12 0000000000000098
> May 18 13:26:15 nfs1 kernel: [999560.912942]  00000000000000c0
> ffffffffa03687bc ffff880323ba4300 ffff880322fec4d8
> May 18 13:26:15 nfs1 kernel: [999560.920471]  ffff880628407120
> ffff88032bdf5c40 ffff880322fec420 ffff88062dc3dcd8
> May 18 13:26:15 nfs1 kernel: [999560.928016] Call Trace:
> May 18 13:26:15 nfs1 kernel: [999560.930561]  [<ffffffffa0368a54>] ?
> ceph_x_build_authorizer.isra.6+0x144/0x1e0 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.938727]  [<ffffffffa035b60d>] ?
> ceph_buffer_release+0x2d/0x50 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.945761]  [<ffffffffa03687bc>] ?
> ceph_x_destroy_authorizer+0x2c/0x40 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.953315]  [<ffffffffa0368d2e>]
> ceph_x_create_authorizer+0x6e/0xd0 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.960609]  [<ffffffffa035db49>]
> get_authorizer+0x89/0xc0 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.967035]  [<ffffffffa0357704>]
> prepare_write_connect+0xb4/0x210 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.974161]  [<ffffffffa035b2a5>]
> try_read+0x3d5/0x430 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.980249]  [<ffffffffa035b38f>]
> con_work+0x8f/0x140 [libceph]
> May 18 13:26:15 nfs1 kernel: [999560.986242]  [<ffffffff81078c31>]
> process_one_work+0x141/0x490
> May 18 13:26:15 nfs1 kernel: [999560.992153]  [<ffffffff81079b08>]
> worker_thread+0x168/0x400
> May 18 13:26:15 nfs1 kernel: [999560.997800]  [<ffffffff810799a0>] ?
> manage_workers+0x120/0x120
> May 18 13:26:15 nfs1 kernel: [999561.003713]  [<ffffffff8107eff0>]
> kthread+0xc0/0xd0
> May 18 13:26:15 nfs1 kernel: [999561.008669]  [<ffffffff8107ef30>] ?
> flush_kthread_worker+0xb0/0xb0
> May 18 13:26:15 nfs1 kernel: [999561.014927]  [<ffffffff816f532c>]
> ret_from_fork+0x7c/0xb0
> May 18 13:26:15 nfs1 kernel: [999561.020401]  [<ffffffff8107ef30>] ?
> flush_kthread_worker+0xb0/0xb0
> May 18 13:26:15 nfs1 kernel: [999561.026657] Code: 2b 43 50 88 43 4e
> 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90
> 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20
> 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> May 18 13:26:15 nfs1 kernel: [999561.046667] RIP  [<ffffffff8135ca9d>]
> memcpy+0xd/0x110
> May 18 13:26:15 nfs1 kernel: [999561.051903]  RSP <ffff88062dc3dc40>
> May 18 13:26:15 nfs1 kernel: [999561.055477] CR2: ffff880640000000
> May 18 13:26:15 nfs1 kernel: [999561.058894] ---[ end trace
> 2fa4f8a71fe96709 ]---
> 
> Thanks!
> 
>  - Travis
> 
> On Tue, May 7, 2013 at 10:54 AM, Travis Rhoden <trhoden@gmail.com> wrote:
> > Thanks Sage, I'll monitor the 3.8 point releases and update when I see
> > a release with those changes.
> >
> >  - Travis
> >
> > On Mon, May 6, 2013 at 10:54 PM, Sage Weil <sage@inktank.com> wrote:
> >> On Mon, 6 May 2013, Travis Rhoden wrote:
> >>> Hey folks,
> >>>
> >>> We have two servers that map a lot of RBDs (20 to 30 each so far),
> >>> using the RBD kernel module.  They are running Ubuntu 12.10, and I
> >>> originally saw a lot of kernel panics (obviously from Ceph) when
> >>> running a 3.5.7 kernel.
> >>>
> >>> I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
> >>> module, and the kernel panics from Ceph went away...and were replaced
> >>> by these nebulous "General Protection Faults" whose cause I couldn't
> >>> really pin down.
> >>>
> >>> Today we saw one that actually had a Ceph backtrace in it, so I wanted
> >>> to throw it on here:
> >>>
> >>> May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
> >>> 0000 [#3] SMP
> >>> May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
> >>> libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
> >>> lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
> >>> edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
> >>> lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
> >>> async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
> >>> myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
> >>> May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
> >>> May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
> >>> kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
> >>> Penguin Computing Relion 1751/X8DTU
> >>> May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
> >>> 0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
> >>> kmem_cache_alloc_trace+0x5f/0x140
> >>> May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
> >>> 0018:ffff880624cb1a98  EFLAGS: 00010202
> >>> May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
> >>> RBX: ffff88032ddc46d0 RCX: 000000000003c867
> >>> May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
> >>> RSI: 0000000000008050 RDI: 0000000000016c80
> >>> May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
> >>> R08: ffff880333d76c80 R09: 0000000000000002
> >>> May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
> >>> R11: 000000000000000d R12: ffff880333802200
> >>> May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
> >>> R14: ffffffffa023901e R15: 0000000000008050
> >>> May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
> >>> 0000000000000000(0000) GS:ffff880333d60000(0000)
> >>> knlGS:0000000000000000
> >>> May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
> >>> 0000 CR0: 000000008005003b
> >>> May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
> >>> CR3: 0000000001c0d000 CR4: 00000000000007e0
> >>> May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
> >>> DR1: 0000000000000000 DR2: 0000000000000000
> >>> May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
> >>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>> May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
> >>> (pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
> >>> May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
> >>> May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
> >>> 0000000000000000 0000000000000060 0000000000000000
> >>> May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
> >>> ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
> >>> May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
> >>> 0000000000000000 ffff880624cb1b28 ffffffffa023901e
> >>> May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
> >>> May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
> >>> get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
> >>> ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
> >>> ceph_x_handle_reply+0xbd/0x110 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
> >>> ceph_handle_auth_reply+0x18c/0x200 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
> >>> handle_auth_reply.isra.12+0xa0/0x230 [libceph]
> >>
> >> Ah, this is in the auth code.  There was a series of patches that fixed
> >> the locking and a few other things that just went upstream for 3.10.  I'll
> >> prepare some patches to backport those fixes to stable kernels (3.8 and
> >> 3.4).  It could easily explain your crashes.
> >>
> >> Thanks!
> >> sage
> >>
> >>
> >>> May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
> >>> dispatch+0xbd/0x120 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
> >>> process_message+0xa5/0xc0 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
> >>> try_read+0x2e1/0x430 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
> >>> con_work+0x8f/0x140 [libceph]
> >>> May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
> >>> process_one_work+0x141/0x490
> >>> May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
> >>> worker_thread+0x168/0x400
> >>> May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
> >>> manage_workers+0x120/0x120
> >>> May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
> >>> kthread+0xc0/0xd0
> >>> May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
> >>> flush_kthread_worker+0xb0/0xb0
> >>> May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
> >>> ret_from_fork+0x7c/0xb0
> >>> May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
> >>> flush_kthread_worker+0xb0/0xb0
> >>> May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
> >>> 4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
> >>> 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
> >>> 48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
> >>> May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
> >>> kmem_cache_alloc_trace+0x5f/0x140
> >>> May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
> >>> May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
> >>> 20e9b6a1bb611aba ]---
> >>>
> >>> I'm not sure whether the problem started here or not.  I mentioned
> >>> that the previous GPFs were nebulous -- one thing most of them have
> >>> had in common is that the fault almost always came from nfsd (this one
> >>> didn't -- it's the first and only time I've seen this one).  However,
> >>> I am using NFS to re-export some RBDs (to provide access to multiple
> >>> clients), so Ceph is still in the picture on those.
> >>>
> >>> I know it's not a lot to go on, but any advice would be appreciated.
> >>>
> >>>  - Travis


* Re: General Protection Fault in 3.8.5
From: Stefan Priebe - Profihost AG @ 2013-05-20 18:15 UTC
  To: Sage Weil; +Cc: Travis Rhoden, ceph-devel

On 20.05.2013 at 18:29, Sage Weil <sage@inktank.com> wrote:

> Hi Travis,
> 
> The fixes for this locking just went upstream for 3.10.  We'll be sending 
> to Greg KH for the stable kernels shortly.
> 
> sage

But this won't change anything for 3.8, as that series is EOL.

> 
> 
> On Mon, 20 May 2013, Travis Rhoden wrote:
> 
>> Sage,
>> 
>> Did a patch for the auth code get submitted for the 3.8 kernel?  I hit
>> this again over the weekend.  Looks slightly different than the last
>> one, but still in the auth code.
>> 
>> May 18 13:26:15 nfs1 kernel: [999560.730733] BUG: unable to handle
>> kernel paging request at ffff880640000000
>> May 18 13:26:15 nfs1 kernel: [999560.737818] IP: [<ffffffff8135ca9d>]
>> memcpy+0xd/0x110
>> May 18 13:26:15 nfs1 kernel: [999560.742974] PGD 1c0e063 PUD 0
>> May 18 13:26:15 nfs1 kernel: [999560.746150] Oops: 0000 [#1] SMP
>> May 18 13:26:15 nfs1 kernel: [999560.749498] Modules linked in: btrfs
>> zlib_deflate ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs
>> ext2 rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd
>> coretemp sunrpc kvm gpio_ich psmouse microcode serio_raw i7core_edac
>> ioatdma lpc_ich edac_core ipmi_si mac_hid ipmi_devintf ipmi_msghandler
>> bonding lp parport tcp_bic raid10 raid456 async_pq async_xor xor
>> async_memcpy async_raid6_recov hid_generic usbhid hid raid6_pq
>> async_tx igb ahci myri10ge raid1 ptp libahci raid0 dca pps_core
>> multipath linear
>> May 18 13:26:15 nfs1 kernel: [999560.796421] CPU 0
>> May 18 13:26:15 nfs1 kernel: [999560.798353] Pid: 26234, comm:
>> kworker/0:0 Not tainted 3.8.5-030805-generic #201303281651 Penguin
>> Computing Relion 1751/X8DTU
>> May 18 13:26:15 nfs1 kernel: [999560.809827] RIP:
>> 0010:[<ffffffff8135ca9d>]  [<ffffffff8135ca9d>] memcpy+0xd/0x110
>> May 18 13:26:15 nfs1 kernel: [999560.817403] RSP:
>> 0018:ffff88062dc3dc40  EFLAGS: 00010246
>> May 18 13:26:15 nfs1 kernel: [999560.822794] RAX: ffffc90017f4301a
>> RBX: ffff880323ba4300 RCX: 1ffff100c2f035b2
>> May 18 13:26:15 nfs1 kernel: [999560.830003] RDX: 0000000000000000
>> RSI: ffff880640000000 RDI: ffffc9002c335952
>> May 18 13:26:15 nfs1 kernel: [999560.837209] RBP: ffff88062dc3dc98
>> R08: ffffc90043b52000 R09: ffff88062dc3dad4
>> May 18 13:26:15 nfs1 kernel: [999560.844417] R10: ffff88027a45f0e8
>> R11: ffff88033fffbec0 R12: ffffc90017f4301a
>> May 18 13:26:15 nfs1 kernel: [999560.851626] R13: 000000002bc0d708
>> R14: ffff880628407120 R15: 000000002bc0d6c8
>> May 18 13:26:15 nfs1 kernel: [999560.858834] FS:
>> 0000000000000000(0000) GS:ffff880333c00000(0000)
>> knlGS:0000000000000000
>> May 18 13:26:15 nfs1 kernel: [999560.867000] CS:  0010 DS: 0000 ES:
>> 0000 CR0: 000000008005003b
>> May 18 13:26:15 nfs1 kernel: [999560.872824] CR2: ffff880640000000
>> CR3: 0000000001c0d000 CR4: 00000000000007f0
>> May 18 13:26:15 nfs1 kernel: [999560.880032] DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> May 18 13:26:15 nfs1 kernel: [999560.887239] DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> May 18 13:26:15 nfs1 kernel: [999560.894446] Process kworker/0:0 (pid:
>> 26234, threadinfo ffff88062dc3c000, task ffff88032d8845c0)
>> May 18 13:26:15 nfs1 kernel: [999560.903298] Stack:
>> May 18 13:26:15 nfs1 kernel: [999560.905399]  ffffffffa0368a54
>> ffffffffa035b60d 2bc0d6c8a0368d12 0000000000000098
>> May 18 13:26:15 nfs1 kernel: [999560.912942]  00000000000000c0
>> ffffffffa03687bc ffff880323ba4300 ffff880322fec4d8
>> May 18 13:26:15 nfs1 kernel: [999560.920471]  ffff880628407120
>> ffff88032bdf5c40 ffff880322fec420 ffff88062dc3dcd8
>> May 18 13:26:15 nfs1 kernel: [999560.928016] Call Trace:
>> May 18 13:26:15 nfs1 kernel: [999560.930561]  [<ffffffffa0368a54>] ?
>> ceph_x_build_authorizer.isra.6+0x144/0x1e0 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.938727]  [<ffffffffa035b60d>] ?
>> ceph_buffer_release+0x2d/0x50 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.945761]  [<ffffffffa03687bc>] ?
>> ceph_x_destroy_authorizer+0x2c/0x40 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.953315]  [<ffffffffa0368d2e>]
>> ceph_x_create_authorizer+0x6e/0xd0 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.960609]  [<ffffffffa035db49>]
>> get_authorizer+0x89/0xc0 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.967035]  [<ffffffffa0357704>]
>> prepare_write_connect+0xb4/0x210 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.974161]  [<ffffffffa035b2a5>]
>> try_read+0x3d5/0x430 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.980249]  [<ffffffffa035b38f>]
>> con_work+0x8f/0x140 [libceph]
>> May 18 13:26:15 nfs1 kernel: [999560.986242]  [<ffffffff81078c31>]
>> process_one_work+0x141/0x490
>> May 18 13:26:15 nfs1 kernel: [999560.992153]  [<ffffffff81079b08>]
>> worker_thread+0x168/0x400
>> May 18 13:26:15 nfs1 kernel: [999560.997800]  [<ffffffff810799a0>] ?
>> manage_workers+0x120/0x120
>> May 18 13:26:15 nfs1 kernel: [999561.003713]  [<ffffffff8107eff0>]
>> kthread+0xc0/0xd0
>> May 18 13:26:15 nfs1 kernel: [999561.008669]  [<ffffffff8107ef30>] ?
>> flush_kthread_worker+0xb0/0xb0
>> May 18 13:26:15 nfs1 kernel: [999561.014927]  [<ffffffff816f532c>]
>> ret_from_fork+0x7c/0xb0
>> May 18 13:26:15 nfs1 kernel: [999561.020401]  [<ffffffff8107ef30>] ?
>> flush_kthread_worker+0xb0/0xb0
>> May 18 13:26:15 nfs1 kernel: [999561.026657] Code: 2b 43 50 88 43 4e
>> 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90
>> 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20
>> 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
>> May 18 13:26:15 nfs1 kernel: [999561.046667] RIP  [<ffffffff8135ca9d>]
>> memcpy+0xd/0x110
>> May 18 13:26:15 nfs1 kernel: [999561.051903]  RSP <ffff88062dc3dc40>
>> May 18 13:26:15 nfs1 kernel: [999561.055477] CR2: ffff880640000000
>> May 18 13:26:15 nfs1 kernel: [999561.058894] ---[ end trace
>> 2fa4f8a71fe96709 ]---
>> 
>> Thanks!
>> 
>> - Travis
>> 
>> On Tue, May 7, 2013 at 10:54 AM, Travis Rhoden <trhoden@gmail.com> wrote:
>>> Thanks Sage, I'll monitor the 3.8 point releases and update when I see
>>> a release with those changes.
>>> 
>>> - Travis
>>> 
>>> On Mon, May 6, 2013 at 10:54 PM, Sage Weil <sage@inktank.com> wrote:
>>>> On Mon, 6 May 2013, Travis Rhoden wrote:
>>>>> Hey folks,
>>>>> 
>>>>> We have two servers that map a lot of RBDs (20 to 30 each so far),
>>>>> using the RBD kernel module.  They are running Ubuntu 12.10, and I
>>>>> originally saw a lot of kernel panics (obviously from Ceph) when
>>>>> running a 3.5.7 kernel.
>>>>> 
>>>>> I upgraded a while back to a 3.8.5 kernel to get a much newer RBD
>>>>> module, and the kernel panics from Ceph went away...and were replaced
>>>>> by these nebulous "General Protection Faults" whose cause I couldn't
>>>>> really pin down.
>>>>> 
>>>>> Today we saw one that actually had a Ceph backtrace in it, so I wanted
>>>>> to throw it on here:
>>>>> 
>>>>> May  6 23:02:58 nfs1 kernel: [295972.423165] general protection fault:
>>>>> 0000 [#3] SMP
>>>>> May  6 23:02:58 nfs1 kernel: [295972.428252] Modules linked in: rbd
>>>>> libceph libcrc32c coretemp nfsd kvm nfs_acl auth_rpcgss nfs fscache
>>>>> lockd sunrpc gpio_ich psmouse microcode serio_raw i7core_edac ipmi_si
>>>>> edac_core lpc_ich ioatdma ipmi_devintf mac_hid ipmi_msghandler bonding
>>>>> lp parport tcp_bic raid10 raid456 async_pq async_xor xor async_memcpy
>>>>> async_raid6_recov hid_generic raid6_pq usbhid async_tx hid igb raid1
>>>>> myri10ge raid0 ahci ptp libahci dca pps_core multipath linear
>>>>> May  6 23:02:58 nfs1 kernel: [295972.468114] CPU 17
>>>>> May  6 23:02:58 nfs1 kernel: [295972.470133] Pid: 15920, comm:
>>>>> kworker/17:2 Tainted: G      D      3.8.5-030805-generic #201303281651
>>>>> Penguin Computing Relion 1751/X8DTU
>>>>> May  6 23:02:58 nfs1 kernel: [295972.482635] RIP:
>>>>> 0010:[<ffffffff811851ff>]  [<ffffffff811851ff>]
>>>>> kmem_cache_alloc_trace+0x5f/0x140
>>>>> May  6 23:02:58 nfs1 kernel: [295972.491686] RSP:
>>>>> 0018:ffff880624cb1a98  EFLAGS: 00010202
>>>>> May  6 23:02:58 nfs1 kernel: [295972.497074] RAX: 0000000000000000
>>>>> RBX: ffff88032ddc46d0 RCX: 000000000003c867
>>>>> May  6 23:02:58 nfs1 kernel: [295972.504283] RDX: 000000000003c866
>>>>> RSI: 0000000000008050 RDI: 0000000000016c80
>>>>> May  6 23:02:58 nfs1 kernel: [295972.511490] RBP: ffff880624cb1ae8
>>>>> R08: ffff880333d76c80 R09: 0000000000000002
>>>>> May  6 23:02:58 nfs1 kernel: [295972.518697] R10: ffff88032ce40070
>>>>> R11: 000000000000000d R12: ffff880333802200
>>>>> May  6 23:02:58 nfs1 kernel: [295972.525906] R13: 2e0460b9275465f2
>>>>> R14: ffffffffa023901e R15: 0000000000008050
>>>>> May  6 23:02:58 nfs1 kernel: [295972.533113] FS:
>>>>> 0000000000000000(0000) GS:ffff880333d60000(0000)
>>>>> knlGS:0000000000000000
>>>>> May  6 23:02:58 nfs1 kernel: [295972.541274] CS:  0010 DS: 0000 ES:
>>>>> 0000 CR0: 000000008005003b
>>>>> May  6 23:02:58 nfs1 kernel: [295972.547095] CR2: 00007fbf9467f2b0
>>>>> CR3: 0000000001c0d000 CR4: 00000000000007e0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.554305] DR0: 0000000000000000
>>>>> DR1: 0000000000000000 DR2: 0000000000000000
>>>>> May  6 23:02:58 nfs1 kernel: [295972.561512] DR3: 0000000000000000
>>>>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> May  6 23:02:58 nfs1 kernel: [295972.568720] Process kworker/17:2
>>>>> (pid: 15920, threadinfo ffff880624cb0000, task ffff88032b600000)
>>>>> May  6 23:02:58 nfs1 kernel: [295972.577656] Stack:
>>>>> May  6 23:02:58 nfs1 kernel: [295972.579756]  0000000000000000
>>>>> 0000000000000000 0000000000000060 0000000000000000
>>>>> May  6 23:02:58 nfs1 kernel: [295972.587292]  0000000000000000
>>>>> ffff88032ddc46d0 0000000000000004 ffff88032ddc46c0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.594819]  ffff88032b432b30
>>>>> 0000000000000000 ffff880624cb1b28 ffffffffa023901e
>>>>> May  6 23:02:58 nfs1 kernel: [295972.602347] Call Trace:
>>>>> May  6 23:02:58 nfs1 kernel: [295972.604886]  [<ffffffffa023901e>]
>>>>> get_ticket_handler.isra.4+0x5e/0xc0 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.612271]  [<ffffffffa02394b4>]
>>>>> ceph_x_proc_ticket_reply+0x274/0x440 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.619740]  [<ffffffffa023973d>]
>>>>> ceph_x_handle_reply+0xbd/0x110 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.626696]  [<ffffffffa023765c>]
>>>>> ceph_handle_auth_reply+0x18c/0x200 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.633988]  [<ffffffffa022d590>]
>>>>> handle_auth_reply.isra.12+0xa0/0x230 [libceph]
>>>> 
>>>> Ah, this is in the auth code.  There was a series of patches that fixed
>>>> the locking and a few other things that just went upstream for 3.10.  I'll
>>>> prepare some patches to backport those fixes to stable kernels (3.8 and
>>>> 3.4).  It could easily explain your crashes.  (A simplified sketch of the
>>>> race follows this message.)
>>>> 
>>>> Thanks!
>>>> sage
>>>> 
>>>> 
>>>>> May  6 23:02:58 nfs1 kernel: [295972.641457]  [<ffffffffa022e87d>]
>>>>> dispatch+0xbd/0x120 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.647450]  [<ffffffffa0228205>]
>>>>> process_message+0xa5/0xc0 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.653966]  [<ffffffffa022c1b1>]
>>>>> try_read+0x2e1/0x430 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.660048]  [<ffffffffa022c38f>]
>>>>> con_work+0x8f/0x140 [libceph]
>>>>> May  6 23:02:58 nfs1 kernel: [295972.666043]  [<ffffffff81078c31>]
>>>>> process_one_work+0x141/0x490
>>>>> May  6 23:02:58 nfs1 kernel: [295972.671952]  [<ffffffff81079b08>]
>>>>> worker_thread+0x168/0x400
>>>>> May  6 23:02:58 nfs1 kernel: [295972.677601]  [<ffffffff810799a0>] ?
>>>>> manage_workers+0x120/0x120
>>>>> May  6 23:02:58 nfs1 kernel: [295972.683513]  [<ffffffff8107eff0>]
>>>>> kthread+0xc0/0xd0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.688469]  [<ffffffff8107ef30>] ?
>>>>> flush_kthread_worker+0xb0/0xb0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.694726]  [<ffffffff816f532c>]
>>>>> ret_from_fork+0x7c/0xb0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.700203]  [<ffffffff8107ef30>] ?
>>>>> flush_kthread_worker+0xb0/0xb0
>>>>> May  6 23:02:58 nfs1 kernel: [295972.706456] Code: 00 4d 8b 04 24 65
>>>>> 4c 03 04 25 08 dc 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 cf 00 00
>>>>> 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65
>>>>> 48 0f c7 0f 0f 94 c0 84 c0 74 c2 49
>>>>> May  6 23:02:58 nfs1 kernel: [295972.726468] RIP  [<ffffffff811851ff>]
>>>>> kmem_cache_alloc_trace+0x5f/0x140
>>>>> May  6 23:02:58 nfs1 kernel: [295972.733182]  RSP <ffff880624cb1a98>
>>>>> May  6 23:02:58 nfs1 kernel: [295972.736838] ---[ end trace
>>>>> 20e9b6a1bb611aba ]---
>>>>> 
>>>>> I'm not sure whether the problem started here or not.  I mentioned
>>>>> that the previous GPFs were nebulous -- the one thing most of them
>>>>> have had in common is that they almost always come from nfsd (this
>>>>> one doesn't -- it's the first and only time I've seen this one).
>>>>> However, I am using NFS to re-export some RBDs (to provide access to
>>>>> multiple clients), so Ceph is still in the picture on those.
>>>>> 
>>>>> I know it's not a lot to go on, but any advice would be appreciated.
>>>>> 
>>>>> - Travis
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread
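
A note for readers hitting the same trace: the fault above fires in
get_ticket_handler() in libceph's auth code, consistent with Sage's
diagnosis of missing locking around shared ticket state.  The sketch
below illustrates that class of race and the shape of the fix in
simplified form -- the struct layout and the lookup_or_create_ticket()
and process_ticket_reply() helpers are invented stand-ins for
illustration, not the literal upstream code:

/*
 * Simplified, illustrative sketch -- not the actual net/ceph/auth_x.c.
 */
struct ticket_handler {
        struct rb_node node;
        int service;
        /* ticket blob, expiry, ... */
};

struct auth_client {
        struct mutex mutex;          /* what the locking fix adds */
        struct rb_root ticket_root;  /* ticket handlers, shared state */
};

static struct ticket_handler *
get_ticket_handler(struct auth_client *ac, int service)
{
        /*
         * Without serialization, the monitor-reply path and a
         * reconnecting connection can both walk and modify ticket_root
         * at once; a lookup racing with a concurrent insert or free
         * chases a stale pointer, which is one way to end up with a
         * GPF inside kmem_cache_alloc_trace() as in the trace above.
         */
        lockdep_assert_held(&ac->mutex);        /* held once fixed */
        return lookup_or_create_ticket(&ac->ticket_root, service);
}

/* With the fix, every entry into the auth state takes the mutex: */
static int handle_auth_reply(struct auth_client *ac, void *buf, int len)
{
        int ret;

        mutex_lock(&ac->mutex);
        ret = process_ticket_reply(ac, buf, len); /* may look up tickets */
        mutex_unlock(&ac->mutex);
        return ret;
}

The real series merged for 3.10 touches more than this, but the shape --
one mutex taken at every entry point into the auth state -- is the part
relevant to these traces.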

* Re: General Protection Fault in 3.8.5
  2013-05-20 18:15         ` Stefan Priebe - Profihost AG
@ 2013-05-20 18:18           ` Sage Weil
  2013-05-21 15:53             ` Travis Rhoden
  0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2013-05-20 18:18 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Travis Rhoden, ceph-devel

On Mon, 20 May 2013, Stefan Priebe - Profihost AG wrote:
> On 20.05.2013 at 18:29, Sage Weil <sage@inktank.com> wrote:
> 
> > Hi Travis,
> > 
> > The fixes for this locking just went upstream for 3.10.  We'll be sending 
> > to Greg KH for the stable kernels shortly.
> > 
> > sage
> 
> But this won't change anything for 3.8, as it is EOL.

Yeah, it'll go to 3.9 and 3.4.

sage


> 
> > 
> > 
> > On Mon, 20 May 2013, Travis Rhoden wrote:
> > 
> >> Sage,
> >> 
> >> Did a patch for the auth code get submitted for the 3.8 kernel?  I hit
> >> this again over the weekend.  It looks slightly different from the last
> >> one, but it's still in the auth code.  (A sketch of this second failure
> >> mode follows this message.)
> >> 
> >> May 18 13:26:15 nfs1 kernel: [999560.730733] BUG: unable to handle
> >> kernel paging request at ffff880640000000
> >> May 18 13:26:15 nfs1 kernel: [999560.737818] IP: [<ffffffff8135ca9d>]
> >> memcpy+0xd/0x110
> >> May 18 13:26:15 nfs1 kernel: [999560.742974] PGD 1c0e063 PUD 0
> >> May 18 13:26:15 nfs1 kernel: [999560.746150] Oops: 0000 [#1] SMP
> >> May 18 13:26:15 nfs1 kernel: [999560.749498] Modules linked in: btrfs
> >> zlib_deflate ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs
> >> ext2 rbd libceph libcrc32c nfsd nfs_acl auth_rpcgss nfs fscache lockd
> >> coretemp sunrpc kvm gpio_ich psmouse microcode serio_raw i7core_edac
> >> ioatdma lpc_ich edac_core ipmi_si mac_hid ipmi_devintf ipmi_msghandler
> >> bonding lp parport tcp_bic raid10 raid456 async_pq async_xor xor
> >> async_memcpy async_raid6_recov hid_generic usbhid hid raid6_pq
> >> async_tx igb ahci myri10ge raid1 ptp libahci raid0 dca pps_core
> >> multipath linear
> >> May 18 13:26:15 nfs1 kernel: [999560.796421] CPU 0
> >> May 18 13:26:15 nfs1 kernel: [999560.798353] Pid: 26234, comm:
> >> kworker/0:0 Not tainted 3.8.5-030805-generic #201303281651 Penguin
> >> Computing Relion 1751/X8DTU
> >> May 18 13:26:15 nfs1 kernel: [999560.809827] RIP:
> >> 0010:[<ffffffff8135ca9d>]  [<ffffffff8135ca9d>] memcpy+0xd/0x110
> >> May 18 13:26:15 nfs1 kernel: [999560.817403] RSP:
> >> 0018:ffff88062dc3dc40  EFLAGS: 00010246
> >> May 18 13:26:15 nfs1 kernel: [999560.822794] RAX: ffffc90017f4301a
> >> RBX: ffff880323ba4300 RCX: 1ffff100c2f035b2
> >> May 18 13:26:15 nfs1 kernel: [999560.830003] RDX: 0000000000000000
> >> RSI: ffff880640000000 RDI: ffffc9002c335952
> >> May 18 13:26:15 nfs1 kernel: [999560.837209] RBP: ffff88062dc3dc98
> >> R08: ffffc90043b52000 R09: ffff88062dc3dad4
> >> May 18 13:26:15 nfs1 kernel: [999560.844417] R10: ffff88027a45f0e8
> >> R11: ffff88033fffbec0 R12: ffffc90017f4301a
> >> May 18 13:26:15 nfs1 kernel: [999560.851626] R13: 000000002bc0d708
> >> R14: ffff880628407120 R15: 000000002bc0d6c8
> >> May 18 13:26:15 nfs1 kernel: [999560.858834] FS:
> >> 0000000000000000(0000) GS:ffff880333c00000(0000)
> >> knlGS:0000000000000000
> >> May 18 13:26:15 nfs1 kernel: [999560.867000] CS:  0010 DS: 0000 ES:
> >> 0000 CR0: 000000008005003b
> >> May 18 13:26:15 nfs1 kernel: [999560.872824] CR2: ffff880640000000
> >> CR3: 0000000001c0d000 CR4: 00000000000007f0
> >> May 18 13:26:15 nfs1 kernel: [999560.880032] DR0: 0000000000000000
> >> DR1: 0000000000000000 DR2: 0000000000000000
> >> May 18 13:26:15 nfs1 kernel: [999560.887239] DR3: 0000000000000000
> >> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> May 18 13:26:15 nfs1 kernel: [999560.894446] Process kworker/0:0 (pid:
> >> 26234, threadinfo ffff88062dc3c000, task ffff88032d8845c0)
> >> May 18 13:26:15 nfs1 kernel: [999560.903298] Stack:
> >> May 18 13:26:15 nfs1 kernel: [999560.905399]  ffffffffa0368a54
> >> ffffffffa035b60d 2bc0d6c8a0368d12 0000000000000098
> >> May 18 13:26:15 nfs1 kernel: [999560.912942]  00000000000000c0
> >> ffffffffa03687bc ffff880323ba4300 ffff880322fec4d8
> >> May 18 13:26:15 nfs1 kernel: [999560.920471]  ffff880628407120
> >> ffff88032bdf5c40 ffff880322fec420 ffff88062dc3dcd8
> >> May 18 13:26:15 nfs1 kernel: [999560.928016] Call Trace:
> >> May 18 13:26:15 nfs1 kernel: [999560.930561]  [<ffffffffa0368a54>] ?
> >> ceph_x_build_authorizer.isra.6+0x144/0x1e0 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.938727]  [<ffffffffa035b60d>] ?
> >> ceph_buffer_release+0x2d/0x50 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.945761]  [<ffffffffa03687bc>] ?
> >> ceph_x_destroy_authorizer+0x2c/0x40 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.953315]  [<ffffffffa0368d2e>]
> >> ceph_x_create_authorizer+0x6e/0xd0 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.960609]  [<ffffffffa035db49>]
> >> get_authorizer+0x89/0xc0 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.967035]  [<ffffffffa0357704>]
> >> prepare_write_connect+0xb4/0x210 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.974161]  [<ffffffffa035b2a5>]
> >> try_read+0x3d5/0x430 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.980249]  [<ffffffffa035b38f>]
> >> con_work+0x8f/0x140 [libceph]
> >> May 18 13:26:15 nfs1 kernel: [999560.986242]  [<ffffffff81078c31>]
> >> process_one_work+0x141/0x490
> >> May 18 13:26:15 nfs1 kernel: [999560.992153]  [<ffffffff81079b08>]
> >> worker_thread+0x168/0x400
> >> May 18 13:26:15 nfs1 kernel: [999560.997800]  [<ffffffff810799a0>] ?
> >> manage_workers+0x120/0x120
> >> May 18 13:26:15 nfs1 kernel: [999561.003713]  [<ffffffff8107eff0>]
> >> kthread+0xc0/0xd0
> >> May 18 13:26:15 nfs1 kernel: [999561.008669]  [<ffffffff8107ef30>] ?
> >> flush_kthread_worker+0xb0/0xb0
> >> May 18 13:26:15 nfs1 kernel: [999561.014927]  [<ffffffff816f532c>]
> >> ret_from_fork+0x7c/0xb0
> >> May 18 13:26:15 nfs1 kernel: [999561.020401]  [<ffffffff8107ef30>] ?
> >> flush_kthread_worker+0xb0/0xb0
> >> May 18 13:26:15 nfs1 kernel: [999561.026657] Code: 2b 43 50 88 43 4e
> >> 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90
> >> 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20
> >> 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
> >> May 18 13:26:15 nfs1 kernel: [999561.046667] RIP  [<ffffffff8135ca9d>]
> >> memcpy+0xd/0x110
> >> May 18 13:26:15 nfs1 kernel: [999561.051903]  RSP <ffff88062dc3dc40>
> >> May 18 13:26:15 nfs1 kernel: [999561.055477] CR2: ffff880640000000
> >> May 18 13:26:15 nfs1 kernel: [999561.058894] ---[ end trace
> >> 2fa4f8a71fe96709 ]---
> >> 
> >> Thanks!
> >> 
> >> - Travis
> >> 
> >> On Tue, May 7, 2013 at 10:54 AM, Travis Rhoden <trhoden@gmail.com> wrote:
> >>> Thanks Sage, I'll monitor the 3.8 point releases and update when I see
> >>> a release with those changes.
> >>> 
> >>> - Travis
> >>> 
> >>> On Mon, May 6, 2013 at 10:54 PM, Sage Weil <sage@inktank.com> wrote:
> >>>> On Mon, 6 May 2013, Travis Rhoden wrote:
> >>>>> [... original report and oops quoted in full above ...]

^ permalink raw reply	[flat|nested] 9+ messages in thread
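
The memcpy() fault in this second trace, reached from
ceph_x_build_authorizer() with ceph_x_destroy_authorizer() and
ceph_buffer_release() on the same stack, reads like the same missing
serialization seen from the other side: an authorizer rebuilt from
ticket state that a concurrent path has already freed.  A hedged sketch
of that hazard, reusing the simplified auth_client from the sketch after
the first message (names again invented for illustration, not the
literal kernel code):

/* Simplified, illustrative sketch -- not the actual kernel code. */
struct authorizer {
        void *ticket_blob;      /* points into shared ticket state */
        size_t blob_len;
};

static int build_authorizer(struct auth_client *ac,
                            struct authorizer *au, void *dst)
{
        /*
         * Unsafe without ac->mutex: an invalidate/destroy on another
         * worker can release ticket_blob while this memcpy() is still
         * reading it, which would show up as an "unable to handle
         * kernel paging request" at memcpy(), as above.
         */
        memcpy(dst, au->ticket_blob, au->blob_len);
        return 0;
}

/*
 * The cure is the same serialization as in the earlier sketch: hold
 * ac->mutex across both building and destroying/invalidating an
 * authorizer, so the blob cannot be freed mid-copy.
 */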

* Re: General Protection Fault in 3.8.5
  2013-05-20 18:18           ` Sage Weil
@ 2013-05-21 15:53             ` Travis Rhoden
  2013-06-10 19:26               ` Travis Rhoden
  0 siblings, 1 reply; 9+ messages in thread
From: Travis Rhoden @ 2013-05-21 15:53 UTC (permalink / raw)
  To: Sage Weil; +Cc: Stefan Priebe - Profihost AG, ceph-devel

Roger. Thanks for the heads-up on that.

On Mon, May 20, 2013 at 2:18 PM, Sage Weil <sage@inktank.com> wrote:
> On Mon, 20 May 2013, Stefan Priebe - Profihost AG wrote:
>> On 20.05.2013 at 18:29, Sage Weil <sage@inktank.com> wrote:
>>
>> > Hi Travis,
>> >
>> > The fixes for this locking just went upstream for 3.10.  We'll be sending
>> > to Greg KH for the stable kernels shortly.
>> >
>> > sage
>>
>> But this won't change anything for 3.8, as it is EOL.
>
> Yeah, it'll go to 3.9 and 3.4.
>
> sage
>
>
>> [... remainder of the quoted thread, reproduced in full above ...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: General Protection Fault in 3.8.5
  2013-05-21 15:53             ` Travis Rhoden
@ 2013-06-10 19:26               ` Travis Rhoden
  0 siblings, 0 replies; 9+ messages in thread
From: Travis Rhoden @ 2013-06-10 19:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: Stefan Priebe - Profihost AG, ceph-devel

Any idea whether these patches have been merged upstream yet?  I've been
keeping an eye out for them.

The 3.4.48 and 3.9.5 Ubuntu kernel builds each came out last Friday, but I
still haven't seen any Ceph/RBD updates in either of them.

On Tue, May 21, 2013 at 11:53 AM, Travis Rhoden <trhoden@gmail.com> wrote:
> Roger. Thanks for the heads-up on that.
>
> On Mon, May 20, 2013 at 2:18 PM, Sage Weil <sage@inktank.com> wrote:
>> On Mon, 20 May 2013, Stefan Priebe - Profihost AG wrote:
>>> On 20.05.2013 at 18:29, Sage Weil <sage@inktank.com> wrote:
>>>
>>> > Hi Travis,
>>> >
>>> > The fixes for this locking just went upstream for 3.10.  We'll be sending
>>> > to Greg KH for the stable kernels shortly.
>>> >
>>> > sage
>>>
>>> But this won't change anything for 3.8, as it is EOL.
>>
>> Yeah, it'll go to 3.9 and 3.4.
>>
>> sage
>>
>>
>>> [... remainder of the quoted thread, reproduced in full above ...]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread

Thread overview: 9+ messages
2013-05-07  1:48 General Protection Fault in 3.8.5 Travis Rhoden
2013-05-07  2:54 ` Sage Weil
2013-05-07 14:54   ` Travis Rhoden
2013-05-20 15:45     ` Travis Rhoden
2013-05-20 16:29       ` Sage Weil
2013-05-20 18:15         ` Stefan Priebe - Profihost AG
2013-05-20 18:18           ` Sage Weil
2013-05-21 15:53             ` Travis Rhoden
2013-06-10 19:26               ` Travis Rhoden
