From: Ilya Dryomov <idryomov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Ceph Development
	<ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Ceph Users <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
Subject: Re: general protection fault: 0000 [#1] SMP
Date: Thu, 12 Oct 2017 12:50:52 +0200
Message-ID: <CAOi1vP9E13ZSzCoWv+8vz__kFOuhNE9cJVJW2nA5hM+boGc4Bw@mail.gmail.com>
In-Reply-To: <1507803838.5310.9.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 12, 2017 at 12:23 PM, Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 2017-10-12 at 09:12 +0200, Ilya Dryomov wrote:
>> On Wed, Oct 11, 2017 at 4:40 PM, Olivier Bonvalet <ceph.list-PaEMFeTk6C1QFI55V6+gNQ@public.gmane.org> wrote:
>> > Hi,
>> >
>> > I had a "general protection fault: 0000" with the Ceph RBD kernel client.
>> > I'm not sure how to read the call trace; is it Ceph related?
>> >
>> >
>> > Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
>> > Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
>> > Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp usb_common pps_core mdio
>> > Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: kworker/1:0 Not tainted 4.13-dae-dom0 #2
>> > Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
>> > Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr ceph_con_workfn [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.898355] task: ffff8801ce434280 task.stack: ffffc900151bc000
>> > Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
>> > Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:ffffc900151bfac0 EFLAGS: 00010202
>> > Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: ffff8801b63df000 RBX: ffff88021b41be00 RCX: 0000000004df0000
>> > Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 0000000004df0000 RSI: 4450736e24806564 RDI: ffff8801b63df000
>> > Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ffffea0005fdd8c8 R08: ffff88028545d618 R09: 0000000000000010
>> > Oct 11 16:15:11 lorunde kernel: [311418.902104] R10: 0000000000000000 R11: ffff880215815000 R12: 0000000000000000
>> > Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: ffff8802158156c0 R14: 0000000000000000 R15: ffff8801ce434280
>> > Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  0000000000000000(0000) GS:ffff880285440000(0000) knlGS:ffff880285440000
>> > Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 000055a8461cfc20 CR3: 0000000001809000 CR4: 0000000000042660
>> > Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
>> > Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
>> > Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
>> > Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
>> > Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
>> > Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? __sk_mem_raise_allocated+0x4e/0x220
>> > Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? __tcp_push_pending_frames+0x28/0x90
>> > Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
>> > Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
>> > Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
>> > Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
>> > Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? ceph_osdc_start_request+0x23/0x30 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
>> > Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
>> > Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
>> > Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
>> > Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
>> > Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? kthread_create_on_node+0x40/0x40
>> > Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
>> > Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
>> > Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>> > Oct 11 16:15:11 lorunde kernel: [311418.921094] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900151bfac0
>> > Oct 11 16:15:11 lorunde kernel: [311418.921970] ---[ end trace 904278a63cb49fca ]---
>>
>> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
>> ceph-induced, it looks like.  I don't remember seeing anything similar
>> in the context of krbd.
>>
>> This is a Xen dom0 kernel, right?  What did the workload look like?
>> Can you provide dmesg before the crash?
>>
>
> ...and to be clear:
>
> (gdb) list *(memcpy_erms+0x6)
> 0xffffffff8188f136 is at arch/x86/lib/memcpy_64.S:54.
> 49       * simpler than memcpy. Use memcpy_erms when possible.
> 50       */
> 51      ENTRY(memcpy_erms)
> 52              movq %rdi, %rax
> 53              movq %rdx, %rcx
> 54              rep movsb
> 55              ret
> 56      ENDPROC(memcpy_erms)
> 57
> 58      ENTRY(memcpy_orig)
>
> So either %rsi or %rdi held a bogus address at the time of the crash,
> most likely. If you have a vmcore, you may be able to dig in with crash
> and tell which address it was, and trace back up the call stack to where
> it came from.

I suspect src-side bustage: %rsi (4450736e24806564) doesn't look like
a kernel pointer, whereas %rdi does.
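
One way to sanity-check the src-side suspicion is to test whether the
two pointers in the register dump are canonical x86-64 addresses.  A
non-canonical address raises a general protection fault rather than a
page fault, which would fit the oops.  The sketch below is only an
illustration: the helper name is_canonical() is made up, and 48-bit
virtual addressing (4-level paging) is an assumption about this machine.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes va_bits of virtual address space (48 for 4-level paging):
    bits 63 down to va_bits - 1 must be all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 0x1ffff for va_bits=48
    return top == 0 or top == all_ones


# Values copied from the register dump in the oops.
regs = {
    "rsi (src)": 0x4450736E24806564,
    "rdi (dst)": 0xFFFF8801B63DF000,
}

for name, value in regs.items():
    verdict = "canonical" if is_canonical(value) else "non-canonical"
    print(f"{name}: {value:#018x} {verdict}")
```

Under that assumption %rdi comes out canonical and %rsi does not, so
the source pointer is the one to chase.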

>
> That said... %rcx looks quite large -- 81723392 bytes still to go in the
> copy. This might be a case where the copy length got screwed up somehow
> and it overran its bounds.

Yeah, suspiciously large.  I don't think it copied a single byte
though: %rcx never got decremented (it still equals %rdx, and %rdi
still equals %rax).
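
The register arithmetic behind that can be spelled out.  memcpy_erms
saves the destination in %rax and the length in %rdx before "rep
movsb", which then decrements %rcx and advances %rdi as it goes, so
comparing the pairs shows how far the copy got.  This is just a rough
sketch using the values from the oops; the variable names are mine.

```python
# Values copied from the register dump in the oops.
rax = 0xFFFF8801B63DF000  # original destination (movq %rdi, %rax)
rdi = 0xFFFF8801B63DF000  # current destination, advanced by rep movsb
rdx = 0x0000000004DF0000  # original length
rcx = 0x0000000004DF0000  # bytes still to copy, decremented by rep movsb

bytes_requested = rdx
bytes_copied = rdx - rcx
dst_advanced = rdi - rax

print(f"requested: {bytes_requested} bytes ({bytes_requested / 2**20:.4f} MiB)")
print(f"copied:    {bytes_copied} bytes")
print(f"dst moved: {dst_advanced} bytes")
```

Both differences are zero, so the fault hit on the very first byte, and
the requested length works out to the 81723392 bytes Jeff mentioned.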

Thanks,

                Ilya
