From mboxrd@z Thu Jan 1 00:00:00 1970
From: Danny Kukawka
Subject: Kernel crashes with RBD
Date: Thu, 12 Apr 2012 00:30:49 +0200
Message-ID: <4F860619.5040802@bisect.de>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig077AE7730B68A3172ED91855"
Return-path:
Received: from wp188.webpack.hosteurope.de ([80.237.132.195]:58100 "EHLO
 wp188.webpack.hosteurope.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1759447Ab2DKWxY (ORCPT );
 Wed, 11 Apr 2012 18:53:24 -0400
Sender: ceph-devel-owner@vger.kernel.org
List-ID:
To: ceph-devel@vger.kernel.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig077AE7730B68A3172ED91855
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable

Hi,

we are currently testing Ceph with RBD on a cluster with 1 Gbit and
10 Gbit interfaces. While we see no kernel crashes with RBD when the
cluster runs on the 1 Gbit interfaces, we see very frequent kernel
crashes on the 10 Gbit network while running tests with e.g. fio
against the RBDs.

I've tested with kernel v3.0 and also with 3.3.0 (with the patches
from the 'for-linus' branch of ceph-client.git at git.kernel.org).

With more client machines running tests, the crashes occur even
sooner. The issue is fully reproducible here.

Has anyone seen similar problems? See the backtrace below.

Regards
Danny

PID: 10902  TASK: ffff88032a9a2080  CPU: 0  COMMAND: "kworker/0:0"
 #0 [ffff8803235fd950] machine_kexec at ffffffff810265ee
 #1 [ffff8803235fd9a0] crash_kexec at ffffffff810a3bda
 #2 [ffff8803235fda70] oops_end at ffffffff81444688
 #3 [ffff8803235fda90] __bad_area_nosemaphore at ffffffff81032a35
 #4 [ffff8803235fdb50] do_page_fault at ffffffff81446d3e
 #5 [ffff8803235fdc50] page_fault at ffffffff81443865
    [exception RIP: read_partial_message+816]
    RIP: ffffffffa041e500  RSP: ffff8803235fdd00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 00000000000009d7  RCX: 0000000000008000
    RDX: 0000000000000000  RSI: 00000000000009d7  RDI: ffffffff813c8d78
    RBP: ffff880328827030   R8: 00000000000009d7   R9: 0000000000004000
    R10: 0000000000000000  R11: ffffffff81205800  R12: 0000000000000000
    R13: 0000000000000069  R14: ffff88032a9bc780  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff8803235fdd38] thread_return at ffffffff81440e82
 #7 [ffff8803235fdd78] try_read at ffffffffa041ed58 [libceph]
 #8 [ffff8803235fddf8] con_work at ffffffffa041fb2e [libceph]
 #9 [ffff8803235fde28] process_one_work at ffffffff8107487c
#10 [ffff8803235fde78] worker_thread at ffffffff8107740a
#11 [ffff8803235fdee8] kthread at ffffffff8107b736
#12 [ffff8803235fdf48] kernel_thread_helper at ffffffff8144c144

--------------enig077AE7730B68A3172ED91855
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iJwEAQECAAYFAk+GBigACgkQ9DHLX79LmTKdXgQAklecH/LsSaFDBlaZa/EKd7ON
J4YAjCaJ+2bOmngSqO34OuperO/yMygseVbrN/Z38W9a8WsLFFNyZoLPyNxUzrxq
4Pa+0q6/1xZy3VGuDQL/EMbV2qXFWFTKhxE/O+C3qh8HvWRc8v4cad7xZsG7WX3K
WFSooT0Be3ltIY0ivVg=
=S7AD
-----END PGP SIGNATURE-----

--------------enig077AE7730B68A3172ED91855--