* ceph kernel bug
From: Martin Mailand @ 2011-09-10 21:12 UTC
To: ceph-devel

Hi,
I hit the following bug. My setup is very simple: I have two OSDs (osd1
and osd2) and one monitor. On the fourth machine I mount ceph via the rbd
device and use the rbd device for a qemu instance. When I reboot one of
the two OSDs, I hit this bug reproducibly. All machines run kernel
3.1.0-rc5 and ceph 0.34-1natty from the newdream repo.

Regards,
martin

[ 105.746163] libceph: osd2 192.168.42.114:6800 socket closed
[ 105.757635] libceph: osd2 192.168.42.114:6800 connection failed
[ 106.040203] libceph: osd2 192.168.42.114:6800 connection failed
[ 107.040231] libceph: osd2 192.168.42.114:6800 connection failed
[ 109.040508] libceph: osd2 192.168.42.114:6800 connection failed
[ 113.050453] libceph: osd2 192.168.42.114:6800 connection failed
[ 121.060191] libceph: osd2 192.168.42.114:6800 connection failed
[ 137.090484] libceph: osd2 192.168.42.114:6800 connection failed
[ 198.237123] ------------[ cut here ]------------
[ 198.246419] kernel BUG at net/ceph/messenger.c:2193!
[ 198.246949] invalid opcode: 0000 [#1] SMP
[ 198.246949] CPU 0
[ 198.246949] Modules linked in: rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables nv_tco bridge stp kvm_amd kvm radeon ttm psmouse drm_kms_helper drm i2c_algo_bit k10temp i2c_nforce2 shpchp amd64_edac_mod serio_raw edac_core edac_mce_amd lp parport ses enclosure aacraid forcedeth
[ 198.246949]
[ 198.246949] Pid: 10, comm: kworker/0:1 Not tainted 3.1.0-rc5-custom #1 Supermicro H8DM8-2/H8DM8-2
[ 198.246949] RIP: 0010:[<ffffffffa02d83f1>] [<ffffffffa02d83f1>] ceph_con_send+0x111/0x120 [libceph]
[ 198.246949] RSP: 0018:ffff880405cd5bc0 EFLAGS: 00010202
[ 198.246949] RAX: ffff880803fe7878 RBX: ffff880403fb8030 RCX: ffff880803fd1650
[ 198.246949] RDX: ffff880405cd5fd8 RSI: ffff880803fe7800 RDI: ffff880403fb81a8
[ 198.246949] RBP: ffff880405cd5be0 R08: ffff880405cd5b70 R09: 0000000000000002
[ 198.246949] R10: 0000000000000002 R11: 0000000000000072 R12: ffff880403fb81a8
[ 198.246949] R13: ffff880803fe7800 R14: ffff880803fd1660 R15: ffff880803fd1650
[ 198.246949] FS: 00007fea65610700(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[ 198.246949] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 198.246949] CR2: 00007f61e407f000 CR3: 0000000001a05000 CR4: 00000000000006f0
[ 198.246949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 198.246949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 198.246949] Process kworker/0:1 (pid: 10, threadinfo ffff880405cd4000, task ffff880405cc5bc0)
[ 198.246949] Stack:
[ 198.246949]  ffff880405cd5be0 ffff880804fb5800 ffff880803fd1630 ffff880803fd15a8
[ 198.246949]  ffff880405cd5c30 ffffffffa02dd8ad ffff880803fd1480 ffff880803fd1600
[ 198.246949]  ffff880405cd5c30 ffff8803fde4c644 ffff880803fd15a8 0000000000000000
[ 198.246949] Call Trace:
[ 198.246949]  [<ffffffffa02dd8ad>] send_queued+0xed/0x130 [libceph]
[ 198.246949]  [<ffffffffa02dfd81>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[ 198.246949]  [<ffffffffa02d711c>] ? ceph_msg_new+0x15c/0x230 [libceph]
[ 198.246949]  [<ffffffffa02e01e0>] dispatch+0x150/0x360 [libceph]
[ 198.246949]  [<ffffffffa02da54f>] con_work+0x214f/0x21d0 [libceph]
[ 198.246949]  [<ffffffffa02d8400>] ? ceph_con_send+0x120/0x120 [libceph]
[ 198.246949]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[ 198.246949]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[ 198.246949]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[ 198.246949]  [<ffffffff81086496>] kthread+0x96/0xa0
[ 198.246949]  [<ffffffff815e5bb4>] kernel_thread_helper+0x4/0x10
[ 198.246949]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[ 198.246949]  [<ffffffff815e5bb0>] ? gs_change+0x13/0x13
[ 198.246949] Code: 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8d be 88 00 00 00 48 c7 c6 70 a8 2d a0 e8 dd 9c 00 e1 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[ 198.246949] RIP [<ffffffffa02d83f1>] ceph_con_send+0x111/0x120 [libceph]
[ 198.246949] RSP <ffff880405cd5bc0>
[ 198.927024] ---[ end trace 03cb81299b093f05 ]---
[ 198.940010] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 198.949892] IP: [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 198.949892] PGD 1a07067 PUD 1a08067 PMD 0
[ 198.949892] Oops: 0000 [#2] SMP
[ 198.949892] CPU 0
[ 198.949892] Modules linked in: rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables nv_tco bridge stp kvm_amd kvm radeon ttm psmouse drm_kms_helper drm i2c_algo_bit k10temp i2c_nforce2 shpchp amd64_edac_mod serio_raw edac_core edac_mce_amd lp parport ses enclosure aacraid forcedeth
[ 198.949892]
[ 198.949892] Pid: 10, comm: kworker/0:1 Tainted: G D 3.1.0-rc5-custom #1 Supermicro H8DM8-2/H8DM8-2
[ 198.949892] RIP: 0010:[<ffffffff810868f0>] [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 198.949892] RSP: 0018:ffff880405cd5868 EFLAGS: 00010096
[ 198.949892] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 198.949892] RDX: ffff880405cc5bc0 RSI: 0000000000000000 RDI: ffff880405cc5bc0
[ 198.949892] RBP: ffff880405cd5868 R08: 0000000000989680 R09: 0000000000000000
[ 198.949892] R10: 0000000000000400 R11: 0000000000000006 R12: ffff880405cc5f88
[ 198.949892] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880405cc5e90
[ 198.949892] FS: 00007fea65610700(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[ 198.949892] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 198.949892] CR2: fffffffffffffff8 CR3: 0000000001a05000 CR4: 00000000000006f0
[ 198.949892] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 198.949892] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 198.949892] Process kworker/0:1 (pid: 10, threadinfo ffff880405cd4000, task ffff880405cc5bc0)
[ 198.949892] Stack:
[ 198.949892]  ffff880405cd5888 ffffffff81082345 ffff880405cd5888 ffff88040fc13080
[ 198.949892]  ffff880405cd5918 ffffffff815d9092 ffff880405e5a558 ffff880405cc5bc0
[ 198.949892]  ffff880405cd58d8 ffff880405cd5fd8 ffff880405cd4000 ffff880405cd5fd8
[ 198.949892] Call Trace:
[ 198.949892]  [<ffffffff81082345>] wq_worker_sleeping+0x15/0xa0
[ 198.949892]  [<ffffffff815d9092>] __schedule+0x5c2/0x8b0
[ 198.949892]  [<ffffffff812caf96>] ? put_io_context+0x46/0x70
[ 198.949892]  [<ffffffff8105b72f>] schedule+0x3f/0x60
[ 198.949892]  [<ffffffff81068223>] do_exit+0x5e3/0x8a0
[ 198.949892]  [<ffffffff815dcc4f>] oops_end+0xaf/0xf0
[ 198.949892]  [<ffffffff8101689b>] die+0x5b/0x90
[ 198.949892]  [<ffffffff815dc354>] do_trap+0xc4/0x170
[ 198.949892]  [<ffffffff81013f25>] do_invalid_op+0x95/0xb0
[ 198.949892]  [<ffffffffa02d83f1>] ? ceph_con_send+0x111/0x120 [libceph]
[ 198.949892]  [<ffffffffa02e276a>] ? ceph_calc_pg_acting+0x2a/0x90 [libceph]
[ 198.949892]  [<ffffffff815e5a2b>] invalid_op+0x1b/0x20
[ 198.949892]  [<ffffffffa02d83f1>] ? ceph_con_send+0x111/0x120 [libceph]
[ 198.949892]  [<ffffffffa02dd8ad>] send_queued+0xed/0x130 [libceph]
[ 198.949892]  [<ffffffffa02dfd81>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[ 198.949892]  [<ffffffffa02d711c>] ? ceph_msg_new+0x15c/0x230 [libceph]
[ 198.949892]  [<ffffffffa02e01e0>] dispatch+0x150/0x360 [libceph]
[ 198.949892]  [<ffffffffa02da54f>] con_work+0x214f/0x21d0 [libceph]
[ 198.949892]  [<ffffffffa02d8400>] ? ceph_con_send+0x120/0x120 [libceph]
[ 198.949892]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[ 198.949892]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[ 198.949892]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[ 198.949892]  [<ffffffff81086496>] kthread+0x96/0xa0
[ 198.949892]  [<ffffffff815e5bb4>] kernel_thread_helper+0x4/0x10
[ 198.949892]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[ 198.949892]  [<ffffffff815e5bb0>] ? gs_change+0x13/0x13
[ 198.949892] Code: 5e 41 5f c9 c3 be 3e 01 00 00 48 c7 c7 5b 3a 7d 81 e8 85 d3 fd ff e9 84 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00
[ 198.949892]  8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[ 198.949892] RIP [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 198.949892] RSP <ffff880405cd5868>
[ 198.949892] CR2: fffffffffffffff8
[ 198.949892] ---[ end trace 03cb81299b093f06 ]---
[ 198.949892] Fixing recursive fault but reboot is needed!
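For context: on 3.1-era kernels an rbd image is mapped through the kernel's sysfs interface rather than a dedicated CLI. A minimal sketch of the mapping step described above; the monitor address, client name, pool, and image name are illustrative assumptions, not the reporter's actual values:

# Sketch only: map an rbd image via sysfs (3.1-era interface).
# The address, client name, pool, and image name are made up.
echo "192.168.42.115:6789 name=admin rbd myimage" > /sys/bus/rbd/add
ls /dev/rbd0    # the mapped block device, usable as a qemu disk
# Rebooting one OSD host while I/O is in flight is what triggers the oops.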
* Re: ceph kernel bug
From: Sage Weil @ 2011-09-10 22:47 UTC
To: Martin Mailand; +Cc: ceph-devel

Hi Martin,

Is this reproducible? If so, does the patch below fix it?

Thanks!
sage

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5634216..dcd3475 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -31,6 +31,7 @@ static void __unregister_linger_request(struct ceph_osd_client *osdc,
 					    struct ceph_osd_request *req);
 static int __send_request(struct ceph_osd_client *osdc,
 			  struct ceph_osd_request *req);
+static void __cancel_request(struct ceph_osd_request *req);
 
 static int op_needs_trail(int op)
 {
@@ -571,6 +572,7 @@ static void __kick_osd_requests(struct ceph_osd_client *osdc,
 		return;
 
 	list_for_each_entry(req, &osd->o_requests, r_osd_item) {
+		__cancel_request(req);
 		list_move(&req->r_req_lru_item, &osdc->req_unsent);
 		dout("requeued %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);

On Sat, 10 Sep 2011, Martin Mailand wrote:
> Hi,
> I hit the following bug. My setup is very simple: I have two OSDs (osd1
> and osd2) and one monitor. On the fourth machine I mount ceph via the
> rbd device and use the rbd device for a qemu instance. When I reboot
> one of the two OSDs, I hit this bug reproducibly. All machines run
> kernel 3.1.0-rc5 and ceph 0.34-1natty from the newdream repo.
>
> Regards,
> martin
>
> [...]
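The intent of the patch above: when an OSD's connection drops and its in-flight requests are kicked back onto the unsent queue, each request's message must first be revoked from the old connection; requeuing a message that is still registered with a connection is what trips the BUG_ON in ceph_con_send. For reference, a sketch of the __cancel_request helper the patch forward-declares, reconstructed to match the 3.1-era osd_client.c (treat the exact body as an assumption, not verbatim source):

/* Sketch of the 3.1-era helper: revoke a sent request's message from
 * its old OSD connection so the message can be requeued safely. */
static void __cancel_request(struct ceph_osd_request *req)
{
	if (req->r_sent) {
		ceph_con_revoke(&req->r_osd->o_con, req->r_request);
		req->r_sent = 0;
	}
}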
* Re: ceph kernel bug
From: Martin Mailand @ 2011-09-10 23:46 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,
no, it did not fix it. Here is the new trace.

Regards,
martin

[ 182.721180] libceph: osd2 192.168.42.114:6800 socket closed
[ 182.732642] libceph: osd2 192.168.42.114:6800 connection failed
[ 183.040233] libceph: osd2 192.168.42.114:6800 connection failed
[ 184.040204] libceph: osd2 192.168.42.114:6800 connection failed
[ 186.040244] libceph: osd2 192.168.42.114:6800 connection failed
[ 190.060233] libceph: osd2 192.168.42.114:6800 connection failed
[ 198.060214] libceph: osd2 192.168.42.114:6800 connection failed
[ 213.964994] ------------[ cut here ]------------
[ 213.974288] kernel BUG at net/ceph/messenger.c:2193!
[ 213.974470] invalid opcode: 0000 [#1] SMP
[ 213.974470] CPU 0
[ 213.974470] Modules linked in: rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables nv_tco bridge stp kvm_amd kvm radeon lp psmouse shpchp parport i2c_nforce2 amd64_edac_mod ttm drm_kms_helper drm edac_core i2c_algo_bit edac_mce_amd serio_raw k10temp ses enclosure aacraid forcedeth
[ 213.974470]
[ 213.974470] Pid: 10, comm: kworker/0:1 Not tainted 3.1.0-rc5-custom #3 Supermicro H8DM8-2/H8DM8-2
[ 213.974470] RIP: 0010:[<ffffffffa02cf3f1>] [<ffffffffa02cf3f1>] ceph_con_send+0x111/0x120 [libceph]
[ 213.974470] RSP: 0018:ffff880405cddbd0 EFLAGS: 00010283
[ 213.974470] RAX: ffff880403e93c78 RBX: ffff880803f97030 RCX: ffff8808034d2e50
[ 213.974470] RDX: ffff880405cddfd8 RSI: ffff880403e93c00 RDI: ffff880803f971a8
[ 213.974470] RBP: ffff880405cddbf0 R08: ffff88040fc0de40 R09: 000000000000fffb
[ 213.974470] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880803f971a8
[ 213.974470] R13: ffff880403e93c00 R14: ffff8808034d2e60 R15: ffff8808034d2e50
[ 213.974470] FS: 00007f5909978720(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[ 213.974470] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 213.974470] CR2: ffffffffff600400 CR3: 0000000404e6f000 CR4: 00000000000006f0
[ 213.974470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 213.974470] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 213.974470] Process kworker/0:1 (pid: 10, threadinfo ffff880405cdc000, task ffff880405cb5bc0)
[ 213.974470] Stack:
[ 213.974470]  ffff880405cddbf0 ffff880403e0ac00 ffff8808034d2e30 ffff8808034d2da8
[ 213.974470]  ffff880405cddc40 ffffffffa02d490d ffff8808034d2c80 ffff8808034d2e00
[ 213.974470]  ffff880405cddc40 ffff8804041d1c91 ffff8808034d2da8 0000000000000000
[ 213.974470] Call Trace:
[ 213.974470]  [<ffffffffa02d490d>] send_queued+0xed/0x130 [libceph]
[ 213.974470]  [<ffffffffa02d6d91>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[ 213.974470]  [<ffffffffa02d331f>] dispatch+0x10f/0x580 [libceph]
[ 213.974470]  [<ffffffffa02d154f>] con_work+0x214f/0x21d0 [libceph]
[ 213.974470]  [<ffffffffa02cf400>] ? ceph_con_send+0x120/0x120 [libceph]
[ 213.974470]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[ 213.974470]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[ 213.974470]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[ 213.974470]  [<ffffffff81086496>] kthread+0x96/0xa0
[ 213.974470]  [<ffffffff815e5bb4>] kernel_thread_helper+0x4/0x10
[ 213.974470]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[ 213.974470]  [<ffffffff815e5bb0>] ? gs_change+0x13/0x13
[ 213.974470] Code: 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8d be 88 00 00 00 48 c7 c6 70 18 2d a0 e8 dd 2c 01 e1 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[ 213.974470] RIP [<ffffffffa02cf3f1>] ceph_con_send+0x111/0x120 [libceph]
[ 213.974470] RSP <ffff880405cddbd0>
[ 214.640753] ---[ end trace 837698aee31a73fc ]---
[ 214.653687] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 214.663571] IP: [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 214.663571] PGD 1a07067 PUD 1a08067 PMD 0
[ 214.663571] Oops: 0000 [#2] SMP
[ 214.663571] CPU 0
[ 214.663571] Modules linked in: rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables nv_tco bridge stp kvm_amd kvm radeon lp psmouse shpchp parport i2c_nforce2 amd64_edac_mod ttm drm_kms_helper drm edac_core i2c_algo_bit edac_mce_amd serio_raw k10temp ses enclosure aacraid forcedeth
[ 214.663571]
[ 214.663571] Pid: 10, comm: kworker/0:1 Tainted: G D 3.1.0-rc5-custom #3 Supermicro H8DM8-2/H8DM8-2
[ 214.663571] RIP: 0010:[<ffffffff810868f0>] [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 214.663571] RSP: 0018:ffff880405cdd878 EFLAGS: 00010096
[ 214.663571] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 214.663571] RDX: ffff880405cb5bc0 RSI: 0000000000000000 RDI: ffff880405cb5bc0
[ 214.663571] RBP: ffff880405cdd878 R08: 0000000000989680 R09: 0000000000000000
[ 214.663571] R10: 0000000000000400 R11: 0000000000000006 R12: ffff880405cb5f88
[ 214.663571] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880405cb5e90
[ 214.663571] FS: 00007f5909978720(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[ 214.663571] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 214.663571] CR2: fffffffffffffff8 CR3: 0000000404e6f000 CR4: 00000000000006f0
[ 214.663571] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 214.663571] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 214.663571] Process kworker/0:1 (pid: 10, threadinfo ffff880405cdc000, task ffff880405cb5bc0)
[ 214.663571] Stack:
[ 214.663571]  ffff880405cdd898 ffffffff81082345 ffff880405cdd898 ffff88040fc13080
[ 214.663571]  ffff880405cdd928 ffffffff815d9092 ffff8804050938b8 ffff880405cb5bc0
[ 214.663571]  ffff880405cdd8e8 ffff880405cddfd8 ffff880405cdc000 ffff880405cddfd8
[ 214.663571] Call Trace:
[ 214.663571]  [<ffffffff81082345>] wq_worker_sleeping+0x15/0xa0
[ 214.663571]  [<ffffffff815d9092>] __schedule+0x5c2/0x8b0
[ 214.663571]  [<ffffffff812caf96>] ? put_io_context+0x46/0x70
[ 214.663571]  [<ffffffff8105b72f>] schedule+0x3f/0x60
[ 214.663571]  [<ffffffff81068223>] do_exit+0x5e3/0x8a0
[ 214.663571]  [<ffffffff815dcc4f>] oops_end+0xaf/0xf0
[ 214.663571]  [<ffffffff8101689b>] die+0x5b/0x90
[ 214.663571]  [<ffffffff815dc354>] do_trap+0xc4/0x170
[ 214.663571]  [<ffffffff81013f25>] do_invalid_op+0x95/0xb0
[ 214.663571]  [<ffffffffa02cf3f1>] ? ceph_con_send+0x111/0x120 [libceph]
[ 214.663571]  [<ffffffff812e9759>] ? vsnprintf+0x479/0x620
[ 214.663571]  [<ffffffff8103be49>] ? default_spin_lock_flags+0x9/0x10
[ 214.663571]  [<ffffffff815e5a2b>] invalid_op+0x1b/0x20
[ 214.663571]  [<ffffffffa02cf3f1>] ? ceph_con_send+0x111/0x120 [libceph]
[ 214.663571]  [<ffffffffa02d490d>] send_queued+0xed/0x130 [libceph]
[ 214.663571]  [<ffffffffa02d6d91>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[ 214.663571]  [<ffffffffa02d331f>] dispatch+0x10f/0x580 [libceph]
[ 214.663571]  [<ffffffffa02d154f>] con_work+0x214f/0x21d0 [libceph]
[ 214.663571]  [<ffffffffa02cf400>] ? ceph_con_send+0x120/0x120 [libceph]
[ 214.663571]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[ 214.663571]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[ 214.663571]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[ 214.663571]  [<ffffffff81086496>] kthread+0x96/0xa0
[ 214.663571]  [<ffffffff815e5bb4>] kernel_thread_helper+0x4/0x10
[ 214.663571]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[ 214.663571]  [<ffffffff815e5bb0>] ? gs_change+0x13/0x13
[ 214.663571] Code: 5e 41 5f c9 c3 be 3e 01 00 00 48 c7 c7 5b 3a 7d 81 e8 85 d3 fd ff e9 84 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00
[ 214.663571]  8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[ 214.663571] RIP [<ffffffff810868f0>] kthread_data+0x10/0x20
[ 214.663571] RSP <ffff880405cdd878>
[ 214.663571] CR2: fffffffffffffff8
[ 214.663571] ---[ end trace 837698aee31a73fd ]---
[ 214.663571] Fixing recursive fault but reboot is needed!

Sage Weil wrote:
> Hi Martin,
>
> Is this reproducible? If so, does the patch below fix it?
>
> Thanks!
> sage
>
> [...]
* Re: ceph kernel bug
From: Martin Mailand @ 2011-09-15 19:41 UTC
To: martin; +Cc: Sage Weil, ceph-devel

Hi Sage,
I am still hitting this in -rc6. It happens every time I stop an OSD.
Do you need more information to reproduce it?

Best Regards,
martin

[103159.164630] libceph: osd0 192.168.42.113:6800 socket closed
[103169.153484] ------------[ cut here ]------------
[103169.162935] kernel BUG at net/ceph/messenger.c:2193!
[103169.163332] invalid opcode: 0000 [#1] SMP
[103169.163332] CPU 0
[103169.163332] Modules linked in: btrfs zlib_deflate rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_amd kvm bridge nv_tco stp radeon ttm drm_kms_helper drm lp parport i2c_algo_bit amd64_edac_mod i2c_nforce2 edac_core edac_mce_amd k10temp shpchp psmouse serio_raw ses enclosure aacraid forcedeth
[103169.163332]
[103169.163332] Pid: 4405, comm: kworker/0:1 Not tainted 3.1.0-rc6 #1 Supermicro H8DM8-2/H8DM8-2
[103169.163332] RIP: 0010:[<ffffffffa02b73f1>] [<ffffffffa02b73f1>] ceph_con_send+0x111/0x120 [libceph]
[103169.163332] RSP: 0018:ffff88031c5b3bd0 EFLAGS: 00010202
[103169.163332] RAX: ffff88040502c678 RBX: ffff88040452b030 RCX: ffff88031c8a9e50
[103169.163332] RDX: ffff88031c5b3fd8 RSI: ffff88040502c600 RDI: ffff88040452b1a8
[103169.163332] RBP: ffff88031c5b3bf0 R08: ffff88040fc0de40 R09: 0000000000000002
[103169.163332] R10: 0000000000000002 R11: 0000000000000072 R12: ffff88040452b1a8
[103169.163332] R13: ffff88040502c600 R14: ffff88031c8a9e60 R15: ffff88031c8a9e50
[103169.163332] FS: 00007f6d43dd2700(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[103169.163332] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[103169.163332] CR2: ffffffffff600400 CR3: 0000000403fb1000 CR4: 00000000000006f0
[103169.163332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[103169.163332] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[103169.163332] Process kworker/0:1 (pid: 4405, threadinfo ffff88031c5b2000, task ffff880405cd5bc0)
[103169.163332] Stack:
[103169.163332]  ffff88031c5b3bf0 ffff880404632a00 ffff88031c8a9e30 ffff88031c8a9da8
[103169.163332]  ffff88031c5b3c40 ffffffffa02bc8ad ffff88031c8a9c80 ffff88031c8a9e00
[103169.163332]  ffff88031c5b3c40 ffff8804045b7151 ffff88031c8a9da8 0000000000000000
[103169.163332] Call Trace:
[103169.163332]  [<ffffffffa02bc8ad>] send_queued+0xed/0x130 [libceph]
[103169.163332]  [<ffffffffa02bed81>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[103169.163332]  [<ffffffffa02bb31f>] dispatch+0x10f/0x580 [libceph]
[103169.163332]  [<ffffffffa02b954f>] con_work+0x214f/0x21d0 [libceph]
[103169.163332]  [<ffffffffa02b7400>] ? ceph_con_send+0x120/0x120 [libceph]
[103169.163332]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[103169.163332]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[103169.163332]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[103169.163332]  [<ffffffff81086496>] kthread+0x96/0xa0
[103169.163332]  [<ffffffff815e5c34>] kernel_thread_helper+0x4/0x10
[103169.163332]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[103169.163332]  [<ffffffff815e5c30>] ? gs_change+0x13/0x13
[103169.163332] Code: 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8d be 88 00 00 00 48 c7 c6 70 98 2b a0 e8 1d ad 02 e1 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[103169.163332] RIP [<ffffffffa02b73f1>] ceph_con_send+0x111/0x120 [libceph]
[103169.163332] RSP <ffff88031c5b3bd0>
[103169.805672] ---[ end trace 49d197af1dff5a93 ]---
[103169.818910] BUG: unable to handle kernel paging request at fffffffffffffff8
[103169.828781] IP: [<ffffffff810868f0>] kthread_data+0x10/0x20
[103169.828781] PGD 1a07067 PUD 1a08067 PMD 0
[103169.828781] Oops: 0000 [#2] SMP
[103169.828781] CPU 0
[103169.828781] Modules linked in: btrfs zlib_deflate rbd libceph libcrc32c ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_amd kvm bridge nv_tco stp radeon ttm drm_kms_helper drm lp parport i2c_algo_bit amd64_edac_mod i2c_nforce2 edac_core edac_mce_amd k10temp shpchp psmouse serio_raw ses enclosure aacraid forcedeth
[103169.828781]
[103169.828781] Pid: 4405, comm: kworker/0:1 Tainted: G D 3.1.0-rc6 #1 Supermicro H8DM8-2/H8DM8-2
[103169.828781] RIP: 0010:[<ffffffff810868f0>] [<ffffffff810868f0>] kthread_data+0x10/0x20
[103169.828781] RSP: 0018:ffff88031c5b3878 EFLAGS: 00010096
[103169.828781] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[103169.828781] RDX: ffff880405cd5bc0 RSI: 0000000000000000 RDI: ffff880405cd5bc0
[103169.828781] RBP: ffff88031c5b3878 R08: 0000000000989680 R09: 0000000000000000
[103169.828781] R10: 0000000000000400 R11: 0000000000000005 R12: ffff880405cd5f88
[103169.828781] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880405cd5e90
[103169.828781] FS: 00007f6d43dd2700(0000) GS:ffff88040fc00000(0000) knlGS:0000000000000000
[103169.828781] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[103169.828781] CR2: fffffffffffffff8 CR3: 0000000403fb1000 CR4: 00000000000006f0
[103169.828781] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[103169.828781] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[103169.828781] Process kworker/0:1 (pid: 4405, threadinfo ffff88031c5b2000, task ffff880405cd5bc0)
[103169.828781] Stack:
[103169.828781]  ffff88031c5b3898 ffffffff81082345 ffff88031c5b3898 ffff88040fc13080
[103169.828781]  ffff88031c5b3928 ffffffff815d9142 ffff88031c5b38c8 ffff880405cd5bc0
[103169.828781]  ffff880405cd5bc0 ffff88031c5b3fd8 ffff88031c5b2000 ffff88031c5b3fd8
[103169.828781] Call Trace:
[103169.828781]  [<ffffffff81082345>] wq_worker_sleeping+0x15/0xa0
[103169.828781]  [<ffffffff815d9142>] __schedule+0x5c2/0x8b0
[103169.828781]  [<ffffffff8105b72f>] schedule+0x3f/0x60
[103169.828781]  [<ffffffff81068223>] do_exit+0x5e3/0x8a0
[103169.828781]  [<ffffffff815dcccf>] oops_end+0xaf/0xf0
[103169.828781]  [<ffffffff8101689b>] die+0x5b/0x90
[103169.828781]  [<ffffffff815dc3d4>] do_trap+0xc4/0x170
[103169.828781]  [<ffffffff81013f25>] do_invalid_op+0x95/0xb0
[103169.828781]  [<ffffffffa02b73f1>] ? ceph_con_send+0x111/0x120 [libceph]
[103169.828781]  [<ffffffff8103be49>] ? default_spin_lock_flags+0x9/0x10
[103169.828781]  [<ffffffff815e5aab>] invalid_op+0x1b/0x20
[103169.828781]  [<ffffffffa02b73f1>] ? ceph_con_send+0x111/0x120 [libceph]
[103169.828781]  [<ffffffffa02bc8ad>] send_queued+0xed/0x130 [libceph]
[103169.828781]  [<ffffffffa02bed81>] ceph_osdc_handle_map+0x261/0x3b0 [libceph]
[103169.828781]  [<ffffffffa02bb31f>] dispatch+0x10f/0x580 [libceph]
[103169.828781]  [<ffffffffa02b954f>] con_work+0x214f/0x21d0 [libceph]
[103169.828781]  [<ffffffffa02b7400>] ? ceph_con_send+0x120/0x120 [libceph]
[103169.828781]  [<ffffffff8108110d>] process_one_work+0x11d/0x430
[103169.828781]  [<ffffffff81081c69>] worker_thread+0x169/0x360
[103169.828781]  [<ffffffff81081b00>] ? manage_workers.clone.21+0x240/0x240
[103169.828781]  [<ffffffff81086496>] kthread+0x96/0xa0
[103169.828781]  [<ffffffff815e5c34>] kernel_thread_helper+0x4/0x10
[103169.828781]  [<ffffffff81086400>] ? flush_kthread_worker+0xb0/0xb0
[103169.828781]  [<ffffffff815e5c30>] ? gs_change+0x13/0x13
[103169.828781] Code: 5e 41 5f c9 c3 be 3e 01 00 00 48 c7 c7 54 3a 7d 81 e8 85 d3 fd ff e9 84 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b 87 70 03 00 00
[103169.828781]  8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[103169.828781] RIP [<ffffffff810868f0>] kthread_data+0x10/0x20
[103169.828781] RSP <ffff88031c5b3878>
[103169.828781] CR2: fffffffffffffff8
[103169.828781] ---[ end trace 49d197af1dff5a94 ]---
[103169.828781] Fixing recursive fault but reboot is needed!

> Sage Weil wrote:
>> Hi Martin,
>>
>> Is this reproducible? If so, does the patch below fix it?
>>
>> Thanks!
>> sage
>>
>> [...]
* Re: ceph kernel bug
From: Sage Weil @ 2011-09-15 20:06 UTC
To: Martin Mailand; +Cc: ceph-devel

On Thu, 15 Sep 2011, Martin Mailand wrote:
> Hi Sage,
> I am still hitting this in -rc6. It happens every time I stop an OSD.
> Do you need more information to reproduce it?

Oh, great to hear it's easy to reproduce! I was trying (in my uml
environment) and failing.

Can you run the script below right before stopping the osd, and send the
dmesg output along? (Or attach it to http://tracker.newdream.net/issues/1382.)

Thanks!
sage

#!/bin/sh -x

# Toggle ceph dynamic debug output via debugfs.
p() {
	echo "$*" > /sys/kernel/debug/dynamic_debug/control
}

# Enable all ceph/libceph/rbd debug output, silence messenger.c as a
# whole, then re-enable just its '---'/'===' marker lines.
p 'module ceph +p'
p 'module libceph +p'
p 'module rbd +p'
p 'file net/ceph/messenger.c -p'
p 'file' `grep -- --- /sys/kernel/debug/dynamic_debug/control | grep ceph \
	| awk '{print $1}' | sed 's/:/ line /'` '+p'
p 'file' `grep -- === /sys/kernel/debug/dynamic_debug/control | grep ceph \
	| awk '{print $1}' | sed 's/:/ line /'` '+p'
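A suggested way to capture the requested output (standard tooling, not part of the instructions above; the script filename is hypothetical):

# Clear the ring buffer, enable debugging, trigger the crash, save the log.
dmesg -c > /dev/null        # print and discard, clearing the ring buffer
sh enable-ceph-debug.sh     # the script above, saved locally
# ... stop the osd and wait for the oops ...
dmesg > ceph-debug.log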
* Re: ceph kernel bug
From: Martin Mailand @ 2011-09-15 21:10 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,
that's quite a bit of output, so I put it in a pastebin:
http://pastebin.com/9CNJk0Pw

Best Regards,
martin

Sage Weil wrote:
> On Thu, 15 Sep 2011, Martin Mailand wrote:
>> Hi Sage,
>> I am still hitting this in -rc6. It happens every time I stop an OSD.
>> Do you need more information to reproduce it?
>
> Oh, great to hear it's easy to reproduce! I was trying (in my uml
> environment) and failing.
>
> Can you run the script below right before stopping the osd, and send the
> dmesg output along? (Or attach it to http://tracker.newdream.net/issues/1382.)
>
> Thanks!
> sage
>
> [...]
* Re: ceph kernel bug
From: Sage Weil @ 2011-09-15 22:54 UTC
To: Martin Mailand; +Cc: ceph-devel

On Thu, 15 Sep 2011, Martin Mailand wrote:
> Hi Sage,
> that's quite a bit of output, so I put it in a pastebin:
> http://pastebin.com/9CNJk0Pw

Any chance you can include the output of 'objdump -rdS libceph.ko'?
ceph.ko too, for good measure.

This looks like a slightly different crash than the one on that bug!

Thanks!
sage
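Producing those dumps looks roughly like this (the module paths are the usual install locations for a self-built kernel and may differ; objdump -S only interleaves source when the modules were built with debug info):

cd /lib/modules/$(uname -r)/kernel/net/ceph
objdump -rdS libceph.ko > libceph.ko_dump
cd ../../fs/ceph
objdump -rdS ceph.ko > ceph.ko_dump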
* Re: ceph kernel bug
From: Martin Mailand @ 2011-09-16 9:05 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,
I reran the test and I think I triggered the first bug again:
http://pastebin.com/ydNm0pff

I also made the dumps for you:
http://tuxadero.com/multistorage/ceph.ko_dump
http://tuxadero.com/multistorage/libceph.ko_dump

Best Regards,
martin

On 16.09.2011 00:54, Sage Weil wrote:
> On Thu, 15 Sep 2011, Martin Mailand wrote:
>> Hi Sage,
>> that's quite a bit of output, so I put it in a pastebin:
>> http://pastebin.com/9CNJk0Pw
>
> Any chance you can include the output of 'objdump -rdS libceph.ko'?
> ceph.ko too, for good measure.
>
> This looks like a slightly different crash than the one on that bug!
>
> Thanks!
> sage
* Re: ceph kernel bug
From: Sage Weil @ 2011-09-16 18:22 UTC
To: Martin Mailand; +Cc: ceph-devel

Hi Martin,

Thanks, this was enough to help me reproduce it, and I believe I have a
correct fix (it's working for me). Can you try commit 935b639 'libceph:
fix linger request requeuing' (for-linus branch of
git://github.com/NewDreamNetwork/ceph-client.git) and confirm that it
fixes things for you as well?

Thanks!
sage

On Fri, 16 Sep 2011, Martin Mailand wrote:
> Hi Sage,
> I reran the test and I think I triggered the first bug again:
> http://pastebin.com/ydNm0pff
>
> I also made the dumps for you:
> http://tuxadero.com/multistorage/ceph.ko_dump
> http://tuxadero.com/multistorage/libceph.ko_dump
>
> [...]
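Testing the referenced fix means building a kernel from that branch; fetching it is standard git, with the branch name and commit id taken from the mail above:

git clone git://github.com/NewDreamNetwork/ceph-client.git
cd ceph-client
git checkout for-linus
git log --oneline -1 935b639    # libceph: fix linger request requeuing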
* Re: ceph kernel bug
From: Martin Mailand @ 2011-09-16 20:17 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,
yes, it fixes things for me as well.

Best Regards,
martin

Sage Weil wrote:
> Hi Martin,
>
> Thanks, this was enough to help me reproduce it, and I believe I have a
> correct fix (it's working for me). Can you try commit 935b639 'libceph:
> fix linger request requeuing' (for-linus branch of
> git://github.com/NewDreamNetwork/ceph-client.git) and confirm that it
> fixes things for you as well?
>
> Thanks!
> sage
Thread overview: 10 messages
2011-09-10 21:12 ceph kernel bug  Martin Mailand
2011-09-10 22:47 ` Sage Weil
2011-09-10 23:46   ` Martin Mailand
2011-09-15 19:41     ` Martin Mailand
2011-09-15 20:06       ` Sage Weil
2011-09-15 21:10         ` Martin Mailand
2011-09-15 22:54           ` Sage Weil
2011-09-16  9:05             ` Martin Mailand
2011-09-16 18:22               ` Sage Weil
2011-09-16 20:17                 ` Martin Mailand