* general protection fault: 0000 [#1] SMP
@ 2017-10-11 14:40 Olivier Bonvalet
  2017-10-12  7:12 ` [ceph-users] " Ilya Dryomov
  0 siblings, 1 reply; 18+ messages in thread
From: Olivier Bonvalet @ 2017-10-11 14:40 UTC (permalink / raw)
  To: Ceph Users, ceph-devel-u79uwXL29TY76Z2rM5mHXA

Hi,

I hit a "general protection fault: 0000" with the Ceph RBD kernel client.
I'm not sure how to read the call trace; is it Ceph-related?


Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp usb_common pps_core mdio
Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: kworker/1:0 Not tainted 4.13-dae-dom0 #2
Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr ceph_con_workfn [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.898355] task: ffff8801ce434280 task.stack: ffffc900151bc000
Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:ffffc900151bfac0 EFLAGS: 00010202
Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: ffff8801b63df000 RBX: ffff88021b41be00 RCX: 0000000004df0000
Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 0000000004df0000 RSI: 4450736e24806564 RDI: ffff8801b63df000
Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ffffea0005fdd8c8 R08: ffff88028545d618 R09: 0000000000000010
Oct 11 16:15:11 lorunde kernel: [311418.902104] R10: 0000000000000000 R11: ffff880215815000 R12: 0000000000000000
Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: ffff8802158156c0 R14: 0000000000000000 R15: ffff8801ce434280
Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  0000000000000000(0000) GS:ffff880285440000(0000) knlGS:ffff880285440000
Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 000055a8461cfc20 CR3: 0000000001809000 CR4: 0000000000042660
Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? __sk_mem_raise_allocated+0x4e/0x220
Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? __tcp_push_pending_frames+0x28/0x90
Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? ceph_osdc_start_request+0x23/0x30 [libceph]
Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? kthread_create_on_node+0x40/0x40
Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 
Oct 11 16:15:11 lorunde kernel: [311418.921094] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900151bfac0
Oct 11 16:15:11 lorunde kernel: [311418.921970] ---[ end trace 904278a63cb49fca ]---


* Re: [ceph-users] general protection fault: 0000 [#1] SMP
  2017-10-11 14:40 general protection fault: 0000 [#1] SMP Olivier Bonvalet
@ 2017-10-12  7:12 ` Ilya Dryomov
       [not found]   ` <CAOi1vP--q8y696g5W_AUmR9Yxe5Xop3BH3xjEQG6_pmQmXO6kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-10-12 10:23   ` Jeff Layton
  0 siblings, 2 replies; 18+ messages in thread
From: Ilya Dryomov @ 2017-10-12  7:12 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Ceph Users, Ceph Development

On Wed, Oct 11, 2017 at 4:40 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> Hi,
>
> I had a "general protection fault: 0000" with Ceph RBD kernel client.
> Not sure how to read the call, is it Ceph related ?
>
>
> Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
> Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
> Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp usb_common pps_core mdio
> Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: kworker/1:0 Not tainted 4.13-dae-dom0 #2
> Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
> Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> Oct 11 16:15:11 lorunde kernel: [311418.898355] task: ffff8801ce434280 task.stack: ffffc900151bc000
> Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
> Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:ffffc900151bfac0 EFLAGS: 00010202
> Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: ffff8801b63df000 RBX: ffff88021b41be00 RCX: 0000000004df0000
> Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 0000000004df0000 RSI: 4450736e24806564 RDI: ffff8801b63df000
> Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ffffea0005fdd8c8 R08: ffff88028545d618 R09: 0000000000000010
> Oct 11 16:15:11 lorunde kernel: [311418.902104] R10: 0000000000000000 R11: ffff880215815000 R12: 0000000000000000
> Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: ffff8802158156c0 R14: 0000000000000000 R15: ffff8801ce434280
> Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  0000000000000000(0000) GS:ffff880285440000(0000) knlGS:ffff880285440000
> Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 000055a8461cfc20 CR3: 0000000001809000 CR4: 0000000000042660
> Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
> Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
> Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
> Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
> Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
> Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? __sk_mem_raise_allocated+0x4e/0x220
> Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? __tcp_push_pending_frames+0x28/0x90
> Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
> Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
> Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
> Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
> Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 [libceph]
> Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 [libceph]
> Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 [libceph]
> Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? ceph_osdc_start_request+0x23/0x30 [libceph]
> Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
> Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
> Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
> Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
> Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
> Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? kthread_create_on_node+0x40/0x40
> Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
> Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
> Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> Oct 11 16:15:11 lorunde kernel: [311418.921094] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900151bfac0
> Oct 11 16:15:11 lorunde kernel: [311418.921970] ---[ end trace 904278a63cb49fca ]---

It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
ceph-induced, it looks like.  I don't remember seeing anything similar
in the context of krbd.

This is a Xen dom0 kernel, right?  What did the workload look like?
Can you provide dmesg before the crash?

Thanks,

                Ilya


* Re :  general protection fault: 0000 [#1] SMP
       [not found]   ` <CAOi1vP--q8y696g5W_AUmR9Yxe5Xop3BH3xjEQG6_pmQmXO6kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-12  7:26     ` Olivier Bonvalet
  2017-10-12 13:58       ` Re : [ceph-users] " Luis Henriques
  0 siblings, 1 reply; 18+ messages in thread
From: Olivier Bonvalet @ 2017-10-12  7:26 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Users, Ceph Development

On Thursday, 12 October 2017 at 09:12 +0200, Ilya Dryomov wrote:
> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
> ceph-induced, it looks like.  I don't remember seeing anything
> similar
> in the context of krbd.
> 
> This is a Xen dom0 kernel, right?  What did the workload look like?
> Can you provide dmesg before the crash?

Hi,

Yes, it's a Xen dom0 kernel: Linux 4.13.3, Xen 4.8.2, with an old
Ceph 0.94.10 (so, Hammer).

Before this error, I had this in the logs:

Oct 11 16:00:41 lorunde kernel: [310548.899082] libceph: read_partial_message ffff88021a910200 data crc 2306836368 != exp. 2215155875
Oct 11 16:00:41 lorunde kernel: [310548.899841] libceph: osd117 10.0.0.31:6804 bad crc/signature
Oct 11 16:02:25 lorunde kernel: [310652.695015] libceph: read_partial_message ffff880220b10100 data crc 842840543 != exp. 2657161714
Oct 11 16:02:25 lorunde kernel: [310652.695731] libceph: osd3 10.0.0.26:6804 bad crc/signature
Oct 11 16:07:24 lorunde kernel: [310952.485202] libceph: read_partial_message ffff88025d1aa400 data crc 938978341 != exp. 4154366769
Oct 11 16:07:24 lorunde kernel: [310952.485870] libceph: osd117 10.0.0.31:6804 bad crc/signature
Oct 11 16:10:44 lorunde kernel: [311151.841812] libceph: read_partial_message ffff880260300400 data crc 2988747958 != exp. 319958859
Oct 11 16:10:44 lorunde kernel: [311151.842672] libceph: osd9 10.0.0.51:6802 bad crc/signature
Oct 11 16:10:57 lorunde kernel: [311165.211412] libceph: read_partial_message ffff8802208b8300 data crc 369498361 != exp. 906022772
Oct 11 16:10:57 lorunde kernel: [311165.212135] libceph: osd87 10.0.0.5:6800 bad crc/signature
Oct 11 16:12:27 lorunde kernel: [311254.635767] libceph: read_partial_message ffff880236f9a000 data crc 2586662963 != exp. 2886241494
Oct 11 16:12:27 lorunde kernel: [311254.636493] libceph: osd90 10.0.0.5:6814 bad crc/signature
Oct 11 16:14:31 lorunde kernel: [311378.808191] libceph: read_partial_message ffff88027e633c00 data crc 1102363051 != exp. 679243837
Oct 11 16:14:31 lorunde kernel: [311378.808889] libceph: osd13 10.0.0.21:6804 bad crc/signature
Oct 11 16:15:01 lorunde kernel: [311409.431034] libceph: read_partial_message ffff88024ce0a800 data crc 2467415342 != exp. 1753860323
Oct 11 16:15:01 lorunde kernel: [311409.431718] libceph: osd111 10.0.0.30:6804 bad crc/signature
Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
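
To see which OSDs the errors came from, the "bad crc/signature" lines can be tallied with a short script. This is only a sketch: the sample text is copied from the log above, and the names BAD_CRC and count_bad_crc are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the log above; in practice they would be read
# from dmesg or syslog output.
LOG = """\
Oct 11 16:00:41 lorunde kernel: [310548.899841] libceph: osd117 10.0.0.31:6804 bad crc/signature
Oct 11 16:02:25 lorunde kernel: [310652.695731] libceph: osd3 10.0.0.26:6804 bad crc/signature
Oct 11 16:07:24 lorunde kernel: [310952.485870] libceph: osd117 10.0.0.31:6804 bad crc/signature
Oct 11 16:10:44 lorunde kernel: [311151.842672] libceph: osd9 10.0.0.51:6802 bad crc/signature
Oct 11 16:10:57 lorunde kernel: [311165.212135] libceph: osd87 10.0.0.5:6800 bad crc/signature
Oct 11 16:12:27 lorunde kernel: [311254.636493] libceph: osd90 10.0.0.5:6814 bad crc/signature
Oct 11 16:14:31 lorunde kernel: [311378.808889] libceph: osd13 10.0.0.21:6804 bad crc/signature
Oct 11 16:15:01 lorunde kernel: [311409.431718] libceph: osd111 10.0.0.30:6804 bad crc/signature
"""

# Matches only the "libceph: osdN addr:port bad crc/signature" lines.
BAD_CRC = re.compile(r"libceph: (osd\d+) (\S+) bad crc/signature")


def count_bad_crc(text):
    """Return a Counter mapping (osd, address) to the number of bad-CRC reports."""
    hits = Counter()
    for line in text.splitlines():
        match = BAD_CRC.search(line)
        if match:
            hits[(match.group(1), match.group(2))] += 1
    return hits


if __name__ == "__main__":
    for (osd, addr), n in count_bad_crc(LOG).most_common():
        print(f"{osd:8s} {addr:18s} {n}")
```

On this sample only osd117 shows up twice; the other reports are spread over different OSDs on six hosts.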


We had to switch to TCP Cubic (instead of a badly configured TCP BBR, without FQ) to reduce the data CRC errors.
But since we still had some errors, last night we rebooted all the OSD nodes into Linux 4.4.91, instead of Linux 4.9.47 & 4.9.53.

In the last 7 hours we haven't had any data CRC errors from the OSDs, but we had one from a MON, without a hang or crash.

About the workload: the Xen VMs are mainly LAMP servers, i.e. HTTP traffic handled by nginx or Apache, PHP, and MySQL databases.

Thanks,

Olivier


* Re: [ceph-users] general protection fault: 0000 [#1] SMP
  2017-10-12  7:12 ` [ceph-users] " Ilya Dryomov
       [not found]   ` <CAOi1vP--q8y696g5W_AUmR9Yxe5Xop3BH3xjEQG6_pmQmXO6kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-12 10:23   ` Jeff Layton
       [not found]     ` <1507803838.5310.9.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2017-10-12 10:23 UTC (permalink / raw)
  To: Ilya Dryomov, Olivier Bonvalet; +Cc: Ceph Users, Ceph Development

On Thu, 2017-10-12 at 09:12 +0200, Ilya Dryomov wrote:
> On Wed, Oct 11, 2017 at 4:40 PM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> > Hi,
> > 
> > I had a "general protection fault: 0000" with Ceph RBD kernel client.
> > Not sure how to read the call, is it Ceph related ?
> > 
> > 
> > Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
> > Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
> > Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp usb_common pps_core mdio
> > Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: kworker/1:0 Not tainted 4.13-dae-dom0 #2
> > Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
> > Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > Oct 11 16:15:11 lorunde kernel: [311418.898355] task: ffff8801ce434280 task.stack: ffffc900151bc000
> > Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
> > Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:ffffc900151bfac0 EFLAGS: 00010202
> > Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: ffff8801b63df000 RBX: ffff88021b41be00 RCX: 0000000004df0000
> > Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 0000000004df0000 RSI: 4450736e24806564 RDI: ffff8801b63df000
> > Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ffffea0005fdd8c8 R08: ffff88028545d618 R09: 0000000000000010
> > Oct 11 16:15:11 lorunde kernel: [311418.902104] R10: 0000000000000000 R11: ffff880215815000 R12: 0000000000000000
> > Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: ffff8802158156c0 R14: 0000000000000000 R15: ffff8801ce434280
> > Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  0000000000000000(0000) GS:ffff880285440000(0000) knlGS:ffff880285440000
> > Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 000055a8461cfc20 CR3: 0000000001809000 CR4: 0000000000042660
> > Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
> > Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
> > Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
> > Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
> > Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
> > Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? __sk_mem_raise_allocated+0x4e/0x220
> > Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? __tcp_push_pending_frames+0x28/0x90
> > Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
> > Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
> > Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
> > Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
> > Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 [libceph]
> > Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 [libceph]
> > Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 [libceph]
> > Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? ceph_osdc_start_request+0x23/0x30 [libceph]
> > Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
> > Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
> > Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
> > Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
> > Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
> > Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? kthread_create_on_node+0x40/0x40
> > Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
> > Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
> > Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> > Oct 11 16:15:11 lorunde kernel: [311418.921094] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900151bfac0
> > Oct 11 16:15:11 lorunde kernel: [311418.921970] ---[ end trace 904278a63cb49fca ]---
> 
> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
> ceph-induced, it looks like.  I don't remember seeing anything similar
> in the context of krbd.
> 
> This is a Xen dom0 kernel, right?  What did the workload look like?
> Can you provide dmesg before the crash?
> 

...and to be clear:

(gdb) list *(memcpy_erms+0x6)
0xffffffff8188f136 is at arch/x86/lib/memcpy_64.S:54.
49	 * simpler than memcpy. Use memcpy_erms when possible.
50	 */
51	ENTRY(memcpy_erms)
52		movq %rdi, %rax
53		movq %rdx, %rcx
54		rep movsb
55		ret
56	ENDPROC(memcpy_erms)
57	
58	ENTRY(memcpy_orig)
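
The listing can be cross-checked against the "Code:" line of the oops, where the byte in angle brackets marks the instruction RIP pointed at. A minimal sketch, assuming the byte string is copied by hand from the trace; split_at_fault is a hypothetical helper, not a kernel tool:

```python
# The "Code:" line from the oops; the byte in <...> is where RIP pointed.
CODE = (
    "43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 "
    "f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 "
    "0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38"
)


def split_at_fault(code_line):
    """Return (bytes before RIP, bytes from RIP onwards) as lists of ints."""
    before, after = [], []
    seen_marker = False
    for token in code_line.split():
        if token.startswith("<"):
            seen_marker = True
            token = token.strip("<>")
        (after if seen_marker else before).append(int(token, 16))
    return before, after


before, after = split_at_fault(CODE)

# RIP is memcpy_erms+0x6, so the function starts six bytes before the marker.
prologue = bytes(before[-6:])
faulting = bytes(after[:2])

print("prologue:", prologue.hex(" "))  # encodes movq %rdi,%rax; movq %rdx,%rcx
print("faulting:", faulting.hex(" "))  # f3 a4 is the encoding of rep movsb
```

The six bytes before the marker encode the two movq instructions and the marked bytes are `f3 a4`, i.e. `rep movsb`, which agrees with line 54 of the listing.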

So either %rsi or %rdi held a bogus address at the time of the crash,
most likely. If you have a vmcore, you may be able to dig in with crash
and tell which address it was, and trace back up the call stack to where
it came from.

That said... %rcx looks quite large -- 81723392 bytes still to go in the
copy. This might be a case where the copy length got screwed up somehow
and it overran its bounds.
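
For reference, the figure comes straight from the hex value of %rcx in the register dump. A small sketch of the conversion, assuming 4 KiB pages (the usual x86-64 base page size; PAGE_SIZE here is my own constant, not read from the system):

```python
RCX = 0x0000000004DF0000  # remaining byte count for rep movsb, from the oops
PAGE_SIZE = 4096          # assumption: 4 KiB base pages

remaining = RCX
print(remaining)               # 81723392 bytes
print(remaining / 2**20)       # the same length in MiB
print(remaining // PAGE_SIZE)  # number of whole 4 KiB pages
```

That is roughly 78 MiB, or close to twenty thousand pages, which is far more than one would expect for a single copy on this path.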
-- 
Jeff Layton <jlayton@redhat.com>


* Re: general protection fault: 0000 [#1] SMP
       [not found]     ` <1507803838.5310.9.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2017-10-12 10:50       ` Ilya Dryomov
  0 siblings, 0 replies; 18+ messages in thread
From: Ilya Dryomov @ 2017-10-12 10:50 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Ceph Development, Ceph Users

On Thu, Oct 12, 2017 at 12:23 PM, Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, 2017-10-12 at 09:12 +0200, Ilya Dryomov wrote:
>> On Wed, Oct 11, 2017 at 4:40 PM, Olivier Bonvalet <ceph.list-PaEMFeTk6C1QFI55V6+gNQ@public.gmane.org> wrote:
>> > Hi,
>> >
>> > I had a "general protection fault: 0000" with Ceph RBD kernel client.
>> > Not sure how to read the call, is it Ceph related ?
>> >
>> >
>> > Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
>> > Oct 11 16:15:11 lorunde kernel: [311418.891855] Modules linked in: cpuid binfmt_misc nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag xt_physdev br_netfilter iptable_filter xen_netback loop xen_blkback cbc rbd libceph xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_ssif intel_rapl iosf_mbi sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel iTCO_wdt pcbc iTCO_vendor_support mxm_wmi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit drm_kms_helper intel_rapl_perf ttm drm syscopyarea sysfillrect efi_pstore sysimgblt fb_sys_fops lpc_ich efivars mfd_core evdev ioatdma shpchp acpi_power_meter ipmi_si wmi button ipmi_devintf ipmi_msghandler bridge efivarfs ip_tables x_tables autofs4 dm_mod dax raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq
>> > Oct 11 16:15:11 lorunde kernel: [311418.895403]  libcrc32c raid1 raid0 multipath linear md_mod hid_generic usbhid i2c_i801 crc32c_intel i2c_core xhci_pci ahci ixgbe xhci_hcd libahci ehci_pci ehci_hcd libata usbcore dca ptp usb_common pps_core mdio
>> > Oct 11 16:15:11 lorunde kernel: [311418.896551] CPU: 1 PID: 4916 Comm: kworker/1:0 Not tainted 4.13-dae-dom0 #2
>> > Oct 11 16:15:11 lorunde kernel: [311418.897134] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
>> > Oct 11 16:15:11 lorunde kernel: [311418.897745] Workqueue: ceph-msgr ceph_con_workfn [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.898355] task: ffff8801ce434280 task.stack: ffffc900151bc000
>> > Oct 11 16:15:11 lorunde kernel: [311418.899007] RIP: e030:memcpy_erms+0x6/0x10
>> > Oct 11 16:15:11 lorunde kernel: [311418.899616] RSP: e02b:ffffc900151bfac0 EFLAGS: 00010202
>> > Oct 11 16:15:11 lorunde kernel: [311418.900228] RAX: ffff8801b63df000 RBX: ffff88021b41be00 RCX: 0000000004df0000
>> > Oct 11 16:15:11 lorunde kernel: [311418.900848] RDX: 0000000004df0000 RSI: 4450736e24806564 RDI: ffff8801b63df000
>> > Oct 11 16:15:11 lorunde kernel: [311418.901479] RBP: ffffea0005fdd8c8 R08: ffff88028545d618 R09: 0000000000000010
>> > Oct 11 16:15:11 lorunde kernel: [311418.902104] R10: 0000000000000000 R11: ffff880215815000 R12: 0000000000000000
>> > Oct 11 16:15:11 lorunde kernel: [311418.902723] R13: ffff8802158156c0 R14: 0000000000000000 R15: ffff8801ce434280
>> > Oct 11 16:15:11 lorunde kernel: [311418.903359] FS:  0000000000000000(0000) GS:ffff880285440000(0000) knlGS:ffff880285440000
>> > Oct 11 16:15:11 lorunde kernel: [311418.903994] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > Oct 11 16:15:11 lorunde kernel: [311418.904627] CR2: 000055a8461cfc20 CR3: 0000000001809000 CR4: 0000000000042660
>> > Oct 11 16:15:11 lorunde kernel: [311418.905271] Call Trace:
>> > Oct 11 16:15:11 lorunde kernel: [311418.905909]  ? skb_copy_ubufs+0xef/0x290
>> > Oct 11 16:15:11 lorunde kernel: [311418.906548]  ? skb_clone+0x82/0x90
>> > Oct 11 16:15:11 lorunde kernel: [311418.907225]  ? tcp_transmit_skb+0x74/0x930
>> > Oct 11 16:15:11 lorunde kernel: [311418.907858]  ? tcp_write_xmit+0x1bd/0xfb0
>> > Oct 11 16:15:11 lorunde kernel: [311418.908490]  ? __sk_mem_raise_allocated+0x4e/0x220
>> > Oct 11 16:15:11 lorunde kernel: [311418.909122]  ? __tcp_push_pending_frames+0x28/0x90
>> > Oct 11 16:15:11 lorunde kernel: [311418.909755]  ? do_tcp_sendpages+0x4fc/0x590
>> > Oct 11 16:15:11 lorunde kernel: [311418.910386]  ? tcp_sendpage+0x7c/0xa0
>> > Oct 11 16:15:11 lorunde kernel: [311418.911026]  ? inet_sendpage+0x37/0xe0
>> > Oct 11 16:15:11 lorunde kernel: [311418.911655]  ? kernel_sendpage+0x12/0x20
>> > Oct 11 16:15:11 lorunde kernel: [311418.912297]  ? ceph_tcp_sendpage+0x5c/0xc0 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.912926]  ? ceph_tcp_recvmsg+0x53/0x70 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.913553]  ? ceph_con_workfn+0xd08/0x22a0 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.914179]  ? ceph_osdc_start_request+0x23/0x30 [libceph]
>> > Oct 11 16:15:11 lorunde kernel: [311418.914807]  ? rbd_img_obj_request_submit+0x1ac/0x3c0 [rbd]
>> > Oct 11 16:15:11 lorunde kernel: [311418.915458]  ? process_one_work+0x1ad/0x340
>> > Oct 11 16:15:11 lorunde kernel: [311418.916083]  ? worker_thread+0x45/0x3f0
>> > Oct 11 16:15:11 lorunde kernel: [311418.916706]  ? kthread+0xf2/0x130
>> > Oct 11 16:15:11 lorunde kernel: [311418.917327]  ? process_one_work+0x340/0x340
>> > Oct 11 16:15:11 lorunde kernel: [311418.917946]  ? kthread_create_on_node+0x40/0x40
>> > Oct 11 16:15:11 lorunde kernel: [311418.918565]  ? do_group_exit+0x35/0xa0
>> > Oct 11 16:15:11 lorunde kernel: [311418.919215]  ? ret_from_fork+0x25/0x30
>> > Oct 11 16:15:11 lorunde kernel: [311418.919826] Code: 43 4e 5b eb ec eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
>> > Oct 11 16:15:11 lorunde kernel: [311418.921094] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900151bfac0
>> > Oct 11 16:15:11 lorunde kernel: [311418.921970] ---[ end trace 904278a63cb49fca ]---
>>
>> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
>> ceph-induced, it looks like.  I don't remember seeing anything similar
>> in the context of krbd.
>>
>> This is a Xen dom0 kernel, right?  What did the workload look like?
>> Can you provide dmesg before the crash?
>>
>
> ...and to be clear:
>
> (gdb) list *(memcpy_erms+0x6)
> 0xffffffff8188f136 is at arch/x86/lib/memcpy_64.S:54.
> 49       * simpler than memcpy. Use memcpy_erms when possible.
> 50       */
> 51      ENTRY(memcpy_erms)
> 52              movq %rdi, %rax
> 53              movq %rdx, %rcx
> 54              rep movsb
> 55              ret
> 56      ENDPROC(memcpy_erms)
> 57
> 58      ENTRY(memcpy_orig)
>
> So either %rsi or %rdi held a bogus address at the time of the crash,
> most likely. If you have a vmcore, you may be able to dig in with crash
> and tell which address it was, and trace back up the call stack to where
> it came from.

I suspect src-side bustage.
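
One way to back that up from the register dump alone: on x86-64 an access through a non-canonical address raises a general protection fault rather than a page fault, and %rsi is not canonical while %rdi is. A minimal sketch, assuming 48-bit virtual addresses (4-level paging); is_canonical is a name invented for this example:

```python
RSI = 0x4450736E24806564  # source pointer at the time of the fault
RDI = 0xFFFF8801B63DF000  # destination pointer


def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits - 1) are all equal (x86-64 canonical form)."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


print("rsi canonical:", is_canonical(RSI))
print("rdi canonical:", is_canonical(RDI))

# Dumping the raw bytes is a quick way to judge whether the value looks
# like data rather than a pointer.
print(RSI.to_bytes(8, "little"))
```

Only the source pointer fails the check, which fits a GPF on the very first byte of the copy.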

>
> That said... %rcx looks quite large -- 81723392 bytes still to go in the
> copy. This might be a case where the copy length got screwed up somehow
> and it overran its bounds.

Yeah, suspiciously large.  I don't think it copied a single byte
though: %rcx never got decremented.
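
The arithmetic behind that, as a tiny sketch: memcpy_erms copies the length from %rdx into %rcx before `rep movsb`, and `rep movsb` decrements %rcx once per byte copied, so the difference at fault time is the number of bytes already moved. Values are taken by hand from the oops:

```python
RDX = 0x0000000004DF0000  # length argument passed to memcpy_erms
RCX = 0x0000000004DF0000  # rep movsb counter at the time of the fault

copied = RDX - RCX
print(copied)  # 0: the fault hit on the first byte
```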

Thanks,

                Ilya


* Re: Re : [ceph-users] general protection fault: 0000 [#1] SMP
  2017-10-12  7:26     ` Re : " Olivier Bonvalet
@ 2017-10-12 13:58       ` Luis Henriques
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Henriques @ 2017-10-12 13:58 UTC (permalink / raw)
  To: Olivier Bonvalet; +Cc: Jeff Layton, Ilya Dryomov, Ceph Users, Ceph Development

Olivier Bonvalet <ceph.list@daevel.fr> writes:

> On Thursday, 12 October 2017 at 09:12 +0200, Ilya Dryomov wrote:
>> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
>> ceph-induced, it looks like.  I don't remember seeing anything
>> similar
>> in the context of krbd.
>> 
>> This is a Xen dom0 kernel, right?  What did the workload look like?
>> Can you provide dmesg before the crash?
>
> Hi,
>
> yes it's a Xen dom0 kernel. Linux 4.13.3, Xen 4.8.2, with an old
> 0.94.10 Ceph (so, Hammer).
>
> Before this error, I add this in logs :
>
> Oct 11 16:00:41 lorunde kernel: [310548.899082] libceph: read_partial_message ffff88021a910200 data crc 2306836368 != exp. 2215155875
> Oct 11 16:00:41 lorunde kernel: [310548.899841] libceph: osd117 10.0.0.31:6804 bad crc/signature
> Oct 11 16:02:25 lorunde kernel: [310652.695015] libceph: read_partial_message ffff880220b10100 data crc 842840543 != exp. 2657161714
> Oct 11 16:02:25 lorunde kernel: [310652.695731] libceph: osd3 10.0.0.26:6804 bad crc/signature
> Oct 11 16:07:24 lorunde kernel: [310952.485202] libceph: read_partial_message ffff88025d1aa400 data crc 938978341 != exp. 4154366769
> Oct 11 16:07:24 lorunde kernel: [310952.485870] libceph: osd117 10.0.0.31:6804 bad crc/signature
> Oct 11 16:10:44 lorunde kernel: [311151.841812] libceph: read_partial_message ffff880260300400 data crc 2988747958 != exp. 319958859
> Oct 11 16:10:44 lorunde kernel: [311151.842672] libceph: osd9 10.0.0.51:6802 bad crc/signature
> Oct 11 16:10:57 lorunde kernel: [311165.211412] libceph: read_partial_message ffff8802208b8300 data crc 369498361 != exp. 906022772
> Oct 11 16:10:57 lorunde kernel: [311165.212135] libceph: osd87 10.0.0.5:6800 bad crc/signature
> Oct 11 16:12:27 lorunde kernel: [311254.635767] libceph: read_partial_message ffff880236f9a000 data crc 2586662963 != exp. 2886241494
> Oct 11 16:12:27 lorunde kernel: [311254.636493] libceph: osd90 10.0.0.5:6814 bad crc/signature
> Oct 11 16:14:31 lorunde kernel: [311378.808191] libceph: read_partial_message ffff88027e633c00 data crc 1102363051 != exp. 679243837
> Oct 11 16:14:31 lorunde kernel: [311378.808889] libceph: osd13 10.0.0.21:6804 bad crc/signature
> Oct 11 16:15:01 lorunde kernel: [311409.431034] libceph: read_partial_message ffff88024ce0a800 data crc 2467415342 != exp. 1753860323
> Oct 11 16:15:01 lorunde kernel: [311409.431718] libceph: osd111 10.0.0.30:6804 bad crc/signature
> Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
>
>
> We had to switch to TCP Cubic (instead of a badly configured TCP BBR, without FQ) to reduce the data crc errors.
> But since we still had some errors, last night we rebooted all the OSD nodes into Linux 4.4.91, instead of Linux 4.9.47 & 4.9.53.
>
> In the last 7 hours we haven't had any data crc errors from an OSD, but we had one from a MON, without a hang or crash.

Since there are a bunch of errors before the GPF, I suspect this bug is
related to error paths that haven't been thoroughly tested (as is usually
the case for error paths, I guess).

My initial guess was a race in ceph_con_workfn:

 - An error returned from try_read() would cause a delayed retry (in
   function con_fault())
 - con_fault_finish() would then trigger a ceph_con_close/ceph_con_open in
   osd_fault.
 - the delayed retry kicks in, and the above close+open, which includes
   releasing con->in_msg and con->out_msg, could cause this GPF.

Unfortunately, I haven't yet been able to find any race there (probably
because there is none), but maybe there's a small window where this could
occur.

I wonder if this occurred only once, or if this is something that is
easily triggerable.
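
To make the suspected sequence above concrete, here is a minimal userspace
sketch of the kind of guard that would make a late retry harmless. It is not
libceph code: model_con, delayed_work_model, queue_retry(),
close_and_reopen() and run_retry() are all hypothetical names, the
generation counter is an assumption of this sketch, and the real code would
also need to hold the connection lock around these checks.

```c
/* Userspace model only: every name here is hypothetical, not libceph code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum con_state { CON_OPEN, CON_CLOSED };

struct model_con {
    enum con_state state;
    unsigned long generation;   /* bumped on every close+open cycle */
    char *in_msg;               /* stands in for con->in_msg */
};

struct delayed_work_model {
    struct model_con *con;
    unsigned long generation;   /* generation seen when the retry was queued */
};

/* Queue a retry, remembering which incarnation of the connection it is for. */
static struct delayed_work_model queue_retry(struct model_con *con)
{
    struct delayed_work_model w;

    assert(con != NULL);
    w.con = con;
    w.generation = con->generation;
    return w;
}

/* Close then reopen: releases the in-flight message, as the fault path would. */
static void close_and_reopen(struct model_con *con)
{
    free(con->in_msg);
    con->in_msg = NULL;
    con->generation++;
    con->state = CON_OPEN;
}

/* Returns true if the retry touched in_msg, false if it bailed out as stale. */
static bool run_retry(const struct delayed_work_model *w)
{
    struct model_con *con = w->con;

    if (con->state != CON_OPEN || w->generation != con->generation)
        return false;           /* stale: the connection was reset meanwhile */
    if (con->in_msg == NULL)
        return false;           /* nothing in flight to work on */
    con->in_msg[0] = 'x';       /* message still belongs to this incarnation */
    return true;
}
```

In this model a retry queued before close_and_reopen() sees a changed
generation and returns without touching the released message. Whether the
real worker has an equivalent window is exactly the open question.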

Cheers,
-- 
Luis


* Re: general protection fault: 0000 [#1] SMP
       [not found] <550186fd-f426-08a6-8b32-e2818717b06a@molgen.mpg.de>
@ 2017-05-04 10:49 ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2017-05-04 10:49 UTC (permalink / raw)
  To: Paul Menzel, linux-nfs; +Cc: it+linux-nfs, J. Bruce Fields

On Thu, 2017-05-04 at 11:36 +0200, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> Rebooting a system running Linux 4.8.4 to Linux 4.9.25, the general 
> protection fault below showed up requiring another reboot of the system. 
> After that, the problem couldn’t be reproduced.
> 
> ```
> > [ 4110.000731] general protection fault: 0000 [#1] SMP
> > [ 4110.000748] Modules linked in: af_packet nfsv4 nfs xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables 8021q garp mrp stp llc nfsd ixgbe 3w_9xxx auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ipv6 autofs4 unix
> > [ 4110.000942] CPU: 4 PID: 3677 Comm: grep Not tainted 4.9.25.mx64.152 #1
> > [ 4110.000959] Hardware name: Dell Inc. PowerEdge R720/0X6H47, BIOS 2.0.19 08/29/2013
> > [ 4110.000981] task: ffff88080687b1c0 task.stack: ffffc9000ecfc000
> > [ 4110.000999] RIP: 0010:[<ffffffffa0229db1>]  [<ffffffffa0229db1>] nfs4_put_open_state+0x51/0xe0 [nfsv4]
> > [ 4110.001034] RSP: 0018:ffffc9000ecffaf0  EFLAGS: 00010246
> > [ 4110.001051] RAX: dead000000000200 RBX: ffff8807ee04e780 RCX: 0000000000000001
> > [ 4110.001071] RDX: dead000000000100 RSI: ffff88080b68c240 RDI: ffff8807ea17b638
> > [ 4110.001093] RBP: ffffc9000ecffb08 R08: 0000000000008000 R09: ffff8807ee04e780
> > [ 4110.001114] R10: 0000000000000000 R11: ffff8807eaf55240 R12: ffff88080b68c200
> > [ 4110.001135] R13: ffff8807ea17b5b8 R14: ffff88080b68c200 R15: ffff8807e9047380
> > [ 4110.001157] FS:  00007f7846809700(0000) GS:ffff88080f880000(0000) knlGS:0000000000000000
> > [ 4110.001181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4110.001198] CR2: 0000000000661000 CR3: 00000007e9df8000 CR4: 00000000000406e0
> > [ 4110.001462] Stack:
> > [ 4110.001711]  ffff880808b7f400 ffff8807e90ab000 ffff8807e90a0800 ffffc9000ecffb28
> > [ 4110.002229]  ffffffffa0216b99 ffff880808b7f400 ffff88080b68c200 ffffc9000ecffbe8
> > [ 4110.002747]  ffffffffa021d030 ffffc900024000c0 0000000000000000 ffff8807e90473f8
> > [ 4110.003261] Call Trace:
> > [ 4110.003613]  [<ffffffffa0216b99>] nfs4_opendata_put+0x59/0xb0 [nfsv4]
> > [ 4110.003873]  [<ffffffffa021d030>] nfs4_do_open.constprop.54+0x420/0x7a0 [nfsv4]
> > [ 4110.004372]  [<ffffffffa021d44e>] nfs4_atomic_open+0xe/0x20 [nfsv4]
> > [ 4110.004634]  [<ffffffffa022c4d0>] nfs4_file_open+0xf0/0x220 [nfsv4]
> > [ 4110.004896]  [<ffffffffa01e8967>] ? nfs_permission+0xe7/0x1b0 [nfs]
> > [ 4110.005158]  [<ffffffff81086094>] ? try_to_wake_up+0x184/0x390
> > [ 4110.005419]  [<ffffffffa022c3e0>] ? nfs4_try_mount+0x60/0x60 [nfsv4]
> > [ 4110.005679]  [<ffffffff8119534f>] do_dentry_open.isra.1+0x15f/0x2f0
> > [ 4110.005938]  [<ffffffff811962ee>] vfs_open+0x4e/0x70
> > [ 4110.006195]  [<ffffffff811a6687>] path_openat+0x557/0x12b0
> > [ 4110.006453]  [<ffffffff811a8591>] do_filp_open+0x81/0xe0
> > [ 4110.006712]  [<ffffffff8149b306>] ? tty_ldisc_deref+0x16/0x20
> > [ 4110.006971]  [<ffffffff811a75c1>] ? getname_flags+0x61/0x210
> > [ 4110.007229]  [<ffffffff811b580f>] ? __alloc_fd+0x3f/0x170
> > [ 4110.007487]  [<ffffffff811966e9>] do_sys_open+0x139/0x200
> > [ 4110.007744]  [<ffffffff811967e4>] SyS_openat+0x14/0x20
> > [ 4110.008003]  [<ffffffff81a972e0>] entry_SYSCALL_64_fastpath+0x13/0x94
> > [ 4110.008261] Code: 00 4c 8b 6f ac 49 8d 74 24 40 e8 8b 65 1b e1 85 c0 0f 84 86 00 00 00 49 8d bd 80 00 00 00 e8 07 d2 86 e1 48 8b 53 10 48 8b 43 18 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 10 
> > [ 4110.009334] RIP  [<ffffffffa0229db1>] nfs4_put_open_state+0x51/0xe0 [nfsv4]
> > [ 4110.009606]  RSP <ffffc9000ecffaf0>
> > [ 4110.010444] ---[ end trace 321fb30dd41845f9 ]---```
> 
> Please find the full log attached.
> 

(FWIW, this is a client-side crash, but you cc'ed Bruce and me, who are
the server maintainers)

This one doesn't look familiar to me.

It crashed while tearing down the opendata. On my machine that
instruction offset corresponds to one of the list_del calls in
nfs4_put_open_state. Your offset may be different. What you may want to
do is grab the debuginfo for that kernel (if it's stripped) and use gdb
to open nfsv4.ko and then do something like:

    (gdb) list *(nfs4_put_open_state+0x51)

That should give you a listing around the exact spot of the crash.
Without a vmcore here though, you may be out of luck on really tracking
this down.
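
One more hint is visible in the quoted register dump: RDX is
dead000000000100 and RAX is dead000000000200. On kernels of that era, and
assuming the usual x86_64 setting of CONFIG_ILLEGAL_POINTER_VALUE, those are
the LIST_POISON1 and LIST_POISON2 values that list_del() writes into an
entry after unlinking it, which would be consistent with a list_del() on an
entry that had already been deleted. The sketch below is a userspace model
of that behaviour, not the kernel source; list_del_checked(),
list_entry_poisoned() and is_list_poison() are hypothetical helpers.

```c
/* Userspace sketch modelled on include/linux/list.h; not the kernel source. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4.9-era x86_64 with CONFIG_ILLEGAL_POINTER_VALUE set to
 * 0xdead000000000000. Other kernels and configurations may differ. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 ((struct list_head *)(uintptr_t)(0x100 + POISON_POINTER_DELTA))
#define LIST_POISON2 ((struct list_head *)(uintptr_t)(0x200 + POISON_POINTER_DELTA))

struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* True if the entry has already been through a poisoning delete. */
static bool list_entry_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/* Unlink and poison; refuse (return false) instead of faulting on a repeat. */
static bool list_del_checked(struct list_head *entry)
{
    if (list_entry_poisoned(entry))
        return false;   /* the kernel would fault on next->prev = prev here */
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return true;
}

/* Does a raw register value from an oops look like a list poison pointer? */
static bool is_list_poison(uint64_t reg)
{
    return reg == 0x100 + POISON_POINTER_DELTA ||
           reg == 0x200 + POISON_POINTER_DELTA;
}
```

Feeding the RDX and RAX values from the oops to is_list_poison() flags both,
while an ordinary slab address such as the RBX value does not match.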

-- 
Jeff Layton <jlayton@poochiereds.net>


* general protection fault: 0000 [#1] SMP
@ 2011-03-14 17:41 Justin P. Mattock
  0 siblings, 0 replies; 18+ messages in thread
From: Justin P. Mattock @ 2011-03-14 17:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I've seen this before. This time, though, once it fired off the screen
got kind of garbled, but the system was still usable.
Note: I am having an issue with radeon (while trying to figure out another
issue), so maybe this is part of what I am seeing:

[ 7820.017431] general protection fault: 0000 [#1] SMP
[ 7820.017538] last sysfs file: 
/sys/devices/pci0000:00/0000:00:1d.3/usb5/5-1/5-1:1.0/bluetooth/hci0/hci0:46/input14/capabilities/sw
[ 7820.017628] CPU 0
[ 7820.017656] Modules linked in: evdev hidp xfrm4_mode_transport xcbc 
rmd160 sha512_generic rfcomm sco bnep l2cap radeon drm_kms_helper 
ipt_REJECT xt_tcpudp ipt_LOG iptable_nat nf_nat xt_state 
nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
iptable_filter ip_tables x_tables btusb bluetooth psmouse thermal fan 
container button ac battery video ath9k ath9k_common ath9k_hw ath ttm 
drm aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null des_generic cast5 
blowfish serpent camellia twofish_generic twofish_x86_64 twofish_common 
ctr ah4 esp4 authenc firewire_ohci firewire_core uhci_hcd ehci_hcd 
coretemp acpi_cpufreq processor mperf appletouch applesmc
[ 7820.018011]
[ 7820.018011] Pid: 508, comm: kswapd0 Not tainted 
2.6.38-rc8-00123-g41d5502 #2 Apple Computer, Inc. MacBookPro2,2/Mac-F42187C8
[ 7820.018011] RIP: 0010:[<ffffffff8110eb45>]  [<ffffffff8110eb45>] 
evict+0x10/0x88
[ 7820.018011] RSP: 0000:ffff88003d645cb0  EFLAGS: 00010282
[ 7820.018011] RAX: e08e66c08e66d88e RBX: ffff88000008f050 RCX: 
0000000000000025
[ 7820.018011] RDX: ffff8800223e8188 RSI: ffffffff8116a98f RDI: 
ffff88000008f050
[ 7820.018011] RBP: ffff88003d645cc0 R08: 0000000000000080 R09: 
ffff880039cf6f80
[ 7820.018011] R10: ffff88003d645b80 R11: ffff880039cf70d8 R12: 
ffff88003d645d00
[ 7820.018011] R13: ffff88000008f108 R14: 0000000000000080 R15: 
0000000000000080
[ 7820.018011] FS:  0000000000000000(0000) GS:ffff88003ee00000(0000) 
knlGS:0000000000000000
[ 7820.018011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7820.018011] CR2: 0000000000a9e000 CR3: 0000000008f26000 CR4: 
00000000000006f0
[ 7820.018011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 7820.018011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[ 7820.018011] Process kswapd0 (pid: 508, threadinfo ffff88003d644000, 
task ffff88003d669560)
[ 7820.018011] Stack:
[ 7820.018011]  ffff880039cf7050 ffff88000008f050 ffff88003d645cf0 
ffffffff8110f17d
[ 7820.018011]  0000000000000000 ffff880005417a78 ffff880005417a88 
0000000000000000
[ 7820.018011]  ffff88003d645d40 ffffffff8110f9bb ffff8800223e8188 
ffff880002ba3a88
[ 7820.018011] Call Trace:
[ 7820.018011]  [<ffffffff8110f17d>] dispose_list+0x47/0xe3
[ 7820.018011]  [<ffffffff8110f9bb>] shrink_icache_memory+0x281/0x2b3
[ 7820.018011]  [<ffffffff810d04dc>] shrink_slab+0xde/0x162
[ 7820.018011]  [<ffffffff810d0b21>] kswapd+0x5c1/0x9af
[ 7820.018011]  [<ffffffff810d0560>] ? kswapd+0x0/0x9af
[ 7820.018011]  [<ffffffff8107bdaf>] kthread+0x7d/0x85
[ 7820.018011]  [<ffffffff8102e064>] kernel_thread_helper+0x4/0x10
[ 7820.018011]  [<ffffffff8107bd32>] ? kthread+0x0/0x85
[ 7820.018011]  [<ffffffff8102e060>] ? kernel_thread_helper+0x0/0x10
[ 7820.018011] Code: 10 81 be 07 00 00 00 e8 2f cc 33 00 48 c7 83 88 00 
00 00 60 00 00 00 5f 5b c9 c3 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 
47 18 <48> 8b 40 30 48 8b 40 28 48 85 c0 74 04 ff d0 eb 20 48 83 bf 10
[ 7820.018011] RIP  [<ffffffff8110eb45>] evict+0x10/0x88
[ 7820.018011]  RSP <ffff88003d645cb0>
[ 7820.028800] ---[ end trace 506063ca7889c564 ]---


full dmesg here:
http://fpaste.org/wzgx/

Justin P. Mattock



* Re: general protection fault: 0000 [#1] SMP
  2010-11-22 20:25       ` Hugh Dickins
@ 2010-11-22 21:44         ` Justin P. Mattock
  0 siblings, 0 replies; 18+ messages in thread
From: Justin P. Mattock @ 2010-11-22 21:44 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Jesper Juhl, Linux Kernel Mailing List

On 11/22/2010 12:25 PM, Hugh Dickins wrote:
> On Mon, 22 Nov 2010, Justin P. Mattock wrote:
>
>> <---- cut -------->
>>
>> cleaned the thread up.. Anyways after doing some debugging with gdb and with
>> valgrind for an application that keeps segfaulting I noticed this in dmesg:
>>
>> [ 3028.571941] FIREWALL:INPUT IN=wlan0 OUT=
>> MAC=ff:ff:ff:ff:ff:ff:00:23:12:65:cb:02:08:00 SRC=0.0.0.0
>> DST=255.255.255.255 LEN=328 TOS=0x00 PREC=0x00 TTL=255 ID=57984 PROTO=UDP
>> SPT=68 DPT=67 LEN=308
>> [ 3061.177655] BUG: Bad page state in process make  pfn:2134c
>> [ 3061.177661] page:ffffea00007438a0 count:0 mapcount:0 mapping:   (null)
>> index:0x507
>> [ 3061.177663] page flags: 0x4000000000000008(uptodate)
>> [ 3061.177669] Pid: 5691, comm: make Not tainted 2.6.37-rc2-00039-g0211924
>> #7
>> [ 3061.177671] Call Trace:
>> [ 3061.177680]  [<ffffffff810c5900>] ? dump_page+0xc0/0xc5
>> [ 3061.177684]  [<ffffffff810c5f18>] bad_page+0xd8/0xea
>> [ 3061.177688]  [<ffffffff810c7aeb>] get_page_from_freelist+0x344/0x4a0
>> [ 3061.177693]  [<ffffffff811bf123>] ? inode_has_perm+0x68/0x6a
>> [ 3061.177697]  [<ffffffff810c7d6b>] __alloc_pages_nodemask+0x124/0x645
>> [ 3061.177701]  [<ffffffff810f7cf2>] ? __dentry_open+0x194/0x2a1
>> [ 3061.177705]  [<ffffffff810dc155>] handle_mm_fault+0x2a8/0x82f
>> [ 3061.177710]  [<ffffffff811056ec>] ? do_filp_open+0x1f3/0x646
>> [ 3061.177714]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
>> [ 3061.177719]  [<ffffffff81447d8e>] do_page_fault+0x3ec/0x411
>> [ 3061.177722]  [<ffffffff810f4b95>] ? free_debug_processing+0x1c5/0x208
>> [ 3061.177726]  [<ffffffff81103958>] ? getname+0x2c/0x1be
>> [ 3061.177728]  [<ffffffff810f4d08>] ? __slab_free+0x130/0x145
>> [ 3061.177732]  [<ffffffff81444e25>] page_fault+0x25/0x30
>> [ 3061.177734] Disabling lock debugging due to kernel taint
>> [ 3126.418774] type=1400 audit(1290451825.417:178): avc:  denied
>>
>> From what I remember, running the app under valgrind made it take a while
>> to load, but I am unsure if that is the reason for the above message.
>
> This particular error is almost certainly fixed by rc3's patch below.
> Whether your earlier errors are a side-effect of the same Uptodate bug
> I cannot say: it's conceivable, but I don't see it as likely.  Maybe
> you should just move up to rc3 and see what happens with that.
>
> Hugh
>
> From: Markus Trippelsdorf<markus@trippelsdorf.de>
> Date: Thu, 18 Nov 2010 02:46:06 +0000 (-0500)
> Subject: ext4: fix setting random pages PageUptodate
> X-Git-Tag: v2.6.37-rc3~1^2~5
> X-Git-Url: http://127.0.0.1:1234/?p=.git;a=commitdiff_plain;h=08da1193d2c8c7a25d0cef7f85d0b9f1ad7c583a
>
> ext4: fix setting random pages PageUptodate
>
> ext4_end_bio calls put_page and kmem_cache_free before calling
> SetPageUptodate(). This can result in setting the PageUptodate bit on
> random pages and causes the following BUG:
>
>   BUG: Bad page state in process rm  pfn:52e54
>   page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
>   arch kernel: page flags: 0x4000000000000008(uptodate)
>
> Fix the problem by moving put_io_page() after the SetPageUptodate() call.
>
> Thanks to Hugh Dickins for analyzing this problem.
>
> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
> Tested-by: Markus Trippelsdorf<markus@trippelsdorf.de>
> Signed-off-by: Markus Trippelsdorf<markus@trippelsdorf.de>
> Signed-off-by: "Theodore Ts'o"<tytso@mit.edu>
> ---
>
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index 7f5451c..beacce1 100644
> --- a/fs/ext4/page-io.c
> +++ b/fs/ext4/page-io.c
> @@ -237,8 +237,6 @@ static void ext4_end_bio(struct bio *bio, int error)
>   			} while (bh != head);
>   		}
>
> -		put_io_page(io_end->pages[i]);
> -
>   		/*
>   		 * If this is a partial write which happened to make
>   		 * all buffers uptodate then we can optimize away a
> @@ -248,6 +246,8 @@ static void ext4_end_bio(struct bio *bio, int error)
>   		 */
>   		if (!partial_write)
>   			SetPageUptodate(page);
> +
> +		put_io_page(io_end->pages[i]);
>   	}
>   	io_end->num_io_pages = 0;
>   	inode = io_end->inode;
>


alright.. will do..

Justin P. Mattock


* Re: general protection fault: 0000 [#1] SMP
  2010-11-22 19:01     ` Justin P. Mattock
@ 2010-11-22 20:25       ` Hugh Dickins
  2010-11-22 21:44         ` Justin P. Mattock
  0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2010-11-22 20:25 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: Jesper Juhl, Linux Kernel Mailing List

On Mon, 22 Nov 2010, Justin P. Mattock wrote:

> <---- cut -------->
> 
> cleaned the thread up.. Anyways after doing some debugging with gdb and with
> valgrind for an application that keeps segfaulting I noticed this in dmesg:
> 
> [ 3028.571941] FIREWALL:INPUT IN=wlan0 OUT=
> MAC=ff:ff:ff:ff:ff:ff:00:23:12:65:cb:02:08:00 SRC=0.0.0.0
> DST=255.255.255.255 LEN=328 TOS=0x00 PREC=0x00 TTL=255 ID=57984 PROTO=UDP
> SPT=68 DPT=67 LEN=308
> [ 3061.177655] BUG: Bad page state in process make  pfn:2134c
> [ 3061.177661] page:ffffea00007438a0 count:0 mapcount:0 mapping:   (null)
> index:0x507
> [ 3061.177663] page flags: 0x4000000000000008(uptodate)
> [ 3061.177669] Pid: 5691, comm: make Not tainted 2.6.37-rc2-00039-g0211924
> #7
> [ 3061.177671] Call Trace:
> [ 3061.177680]  [<ffffffff810c5900>] ? dump_page+0xc0/0xc5
> [ 3061.177684]  [<ffffffff810c5f18>] bad_page+0xd8/0xea
> [ 3061.177688]  [<ffffffff810c7aeb>] get_page_from_freelist+0x344/0x4a0
> [ 3061.177693]  [<ffffffff811bf123>] ? inode_has_perm+0x68/0x6a
> [ 3061.177697]  [<ffffffff810c7d6b>] __alloc_pages_nodemask+0x124/0x645
> [ 3061.177701]  [<ffffffff810f7cf2>] ? __dentry_open+0x194/0x2a1
> [ 3061.177705]  [<ffffffff810dc155>] handle_mm_fault+0x2a8/0x82f
> [ 3061.177710]  [<ffffffff811056ec>] ? do_filp_open+0x1f3/0x646
> [ 3061.177714]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> [ 3061.177719]  [<ffffffff81447d8e>] do_page_fault+0x3ec/0x411
> [ 3061.177722]  [<ffffffff810f4b95>] ? free_debug_processing+0x1c5/0x208
> [ 3061.177726]  [<ffffffff81103958>] ? getname+0x2c/0x1be
> [ 3061.177728]  [<ffffffff810f4d08>] ? __slab_free+0x130/0x145
> [ 3061.177732]  [<ffffffff81444e25>] page_fault+0x25/0x30
> [ 3061.177734] Disabling lock debugging due to kernel taint
> [ 3126.418774] type=1400 audit(1290451825.417:178): avc:  denied
> 
> From what I remember, running the app under valgrind made it take a while
> to load, but I am unsure if that is the reason for the above message.

This particular error is almost certainly fixed by rc3's patch below.
Whether your earlier errors are a side-effect of the same Uptodate bug
I cannot say: it's conceivable, but I don't see it as likely.  Maybe
you should just move up to rc3 and see what happens with that.

Hugh

From: Markus Trippelsdorf <markus@trippelsdorf.de>
Date: Thu, 18 Nov 2010 02:46:06 +0000 (-0500)
Subject: ext4: fix setting random pages PageUptodate
X-Git-Tag: v2.6.37-rc3~1^2~5
X-Git-Url: http://127.0.0.1:1234/?p=.git;a=commitdiff_plain;h=08da1193d2c8c7a25d0cef7f85d0b9f1ad7c583a

ext4: fix setting random pages PageUptodate

ext4_end_bio calls put_page and kmem_cache_free before calling
SetPageUptodate(). This can result in setting the PageUptodate bit on
random pages and causes the following BUG:

 BUG: Bad page state in process rm  pfn:52e54
 page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
 arch kernel: page flags: 0x4000000000000008(uptodate)

Fix the problem by moving put_io_page() after the SetPageUptodate() call.

Thanks to Hugh Dickins for analyzing this problem.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 7f5451c..beacce1 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -237,8 +237,6 @@ static void ext4_end_bio(struct bio *bio, int error)
 			} while (bh != head);
 		}
 
-		put_io_page(io_end->pages[i]);
-
 		/*
 		 * If this is a partial write which happened to make
 		 * all buffers uptodate then we can optimize away a
@@ -248,6 +246,8 @@ static void ext4_end_bio(struct bio *bio, int error)
 		 */
 		if (!partial_write)
 			SetPageUptodate(page);
+
+		put_io_page(io_end->pages[i]);
 	}
 	io_end->num_io_pages = 0;
 	inode = io_end->inode;
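
The ordering rule behind the patch above can be shown with a small
userspace model: once the last reference is dropped, the page may be freed
and reused, so every write to it has to happen first. This is only a sketch
under simplifying assumptions. model_page, page_alloc(), page_put() and
end_io_fixed() are hypothetical stand-ins and do not reproduce the ext4
structures.

```c
/* Userspace model of "write first, drop the reference last"; not ext4 code. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_page {
    int refcount;
    bool uptodate;
};

static int pages_freed;  /* counts how many pages the model has released */

/* Allocate a page holding a single reference. */
static struct model_page *page_alloc(void)
{
    struct model_page *p = calloc(1, sizeof(*p));

    assert(p != NULL);
    p->refcount = 1;
    return p;
}

/* Drop one reference; returns true if it was the last and the page was freed. */
static bool page_put(struct model_page *p)
{
    if (--p->refcount > 0)
        return false;
    free(p);
    pages_freed++;
    return true;
}

/* Fixed ordering: finish every write to the page, then drop the reference. */
static bool end_io_fixed(struct model_page *p, bool partial_write,
                         bool *was_uptodate)
{
    if (!partial_write)
        p->uptodate = true;
    *was_uptodate = p->uptodate;   /* read while we still hold our reference */
    return page_put(p);            /* p must not be touched after this line */
}
```

With the two steps swapped, the flag write could land in memory that has
already been handed to another user, which is what the "Bad page state"
report with only the uptodate flag set suggests.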


* Re: general protection fault: 0000 [#1] SMP
  2010-11-20 22:32   ` Jesper Juhl
  2010-11-20 23:21     ` Justin P. Mattock
@ 2010-11-22 19:01     ` Justin P. Mattock
  2010-11-22 20:25       ` Hugh Dickins
  1 sibling, 1 reply; 18+ messages in thread
From: Justin P. Mattock @ 2010-11-22 19:01 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List

<---- cut -------->

cleaned the thread up.. Anyways after doing some debugging with gdb and 
with valgrind for an application that keeps segfaulting I noticed this 
in dmesg:

[ 3028.571941] FIREWALL:INPUT IN=wlan0 OUT= 
MAC=ff:ff:ff:ff:ff:ff:00:23:12:65:cb:02:08:00 SRC=0.0.0.0 
DST=255.255.255.255 LEN=328 TOS=0x00 PREC=0x00 TTL=255 ID=57984 
PROTO=UDP SPT=68 DPT=67 LEN=308
[ 3061.177655] BUG: Bad page state in process make  pfn:2134c
[ 3061.177661] page:ffffea00007438a0 count:0 mapcount:0 mapping: 
   (null) index:0x507
[ 3061.177663] page flags: 0x4000000000000008(uptodate)
[ 3061.177669] Pid: 5691, comm: make Not tainted 
2.6.37-rc2-00039-g0211924 #7
[ 3061.177671] Call Trace:
[ 3061.177680]  [<ffffffff810c5900>] ? dump_page+0xc0/0xc5
[ 3061.177684]  [<ffffffff810c5f18>] bad_page+0xd8/0xea
[ 3061.177688]  [<ffffffff810c7aeb>] get_page_from_freelist+0x344/0x4a0
[ 3061.177693]  [<ffffffff811bf123>] ? inode_has_perm+0x68/0x6a
[ 3061.177697]  [<ffffffff810c7d6b>] __alloc_pages_nodemask+0x124/0x645
[ 3061.177701]  [<ffffffff810f7cf2>] ? __dentry_open+0x194/0x2a1
[ 3061.177705]  [<ffffffff810dc155>] handle_mm_fault+0x2a8/0x82f
[ 3061.177710]  [<ffffffff811056ec>] ? do_filp_open+0x1f3/0x646
[ 3061.177714]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
[ 3061.177719]  [<ffffffff81447d8e>] do_page_fault+0x3ec/0x411
[ 3061.177722]  [<ffffffff810f4b95>] ? free_debug_processing+0x1c5/0x208
[ 3061.177726]  [<ffffffff81103958>] ? getname+0x2c/0x1be
[ 3061.177728]  [<ffffffff810f4d08>] ? __slab_free+0x130/0x145
[ 3061.177732]  [<ffffffff81444e25>] page_fault+0x25/0x30
[ 3061.177734] Disabling lock debugging due to kernel taint
[ 3126.418774] type=1400 audit(1290451825.417:178): avc:  denied

From what I remember, running the app under valgrind made it take a while
to load, but I am unsure if that is the reason for the above message.

Justin P. Mattock


* Re: general protection fault: 0000 [#1] SMP
  2010-11-20 22:32   ` Jesper Juhl
@ 2010-11-20 23:21     ` Justin P. Mattock
  2010-11-22 19:01     ` Justin P. Mattock
  1 sibling, 0 replies; 18+ messages in thread
From: Justin P. Mattock @ 2010-11-20 23:21 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List

On 11/20/2010 02:32 PM, Jesper Juhl wrote:
> On Sat, 20 Nov 2010, Jesper Juhl wrote:
>
>> On Sat, 20 Nov 2010, Justin Mattock wrote:
>>
>>> Ive seen this before, but could not reproduce for a bisect.. basically
>>> what I remember doing
>>> was building webkit(let sit and compile) passed out, woke up at 5AM
>>> closed the lid on the machine,few hrs later
>>> woke up, went for a run, came back opened the lid and this:
>>>
>>> [43925.668053] general protection fault: 0000 [#1] SMP
>>> [43925.668059] last sysfs file: /sys/devices/platform/applesmc.768/light
>>> [43925.668061] CPU 0
>>> [43925.668063] Modules linked in: firewire_sbp2 radeon sco bnep ttm
>>> drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
>>> nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
>>> iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
>>> sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
>>> i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
>>> sha256_generic cbc des_generic cast5 blowfish serpent camellia
>>> twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
>>> uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
>>> acpi_cpufreq processor mperf appletouch applesmc uvcvideo
>>> [43925.668120]
>>> [43925.668123] Pid: 27262, comm: make Not tainted
>>> 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
>>> [43925.668126] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
>>> inode_has_perm+0x53/0x6a
>>> [43925.668135] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
>>> [43925.668137] RAX: ffff88003826a208 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
>>> [43925.668140] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff88002feacc00
>>> [43925.668142] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
>>> [43925.668145] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
>>> [43925.668147] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
>>> [43925.668150] FS:  00007f4f786b3700(0000) GS:ffff88003ee00000(0000)
>>> knlGS:0000000000000000
>>> [43925.668153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [43925.668155] CR2: 00007f4f78637000 CR3: 00000000383ac000 CR4: 00000000000006e0
>>> [43925.668158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [43925.668161] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [43925.668163] Process make (pid: 27262, threadinfo ffff88003c5a4000,
>>> task ffff880001afb410)
>>> [43925.668165] Stack:
>>> [43925.668167]  ffff880038a98060 0000000000000000 ffff88003c5a5c48
>>> ffffffff81182b7c
>>> [43925.668172]  ffff88003cab2688 ffff880024da9990 ffff88003caa18d8
>>> ffff880038a98060
>>> [43925.668177]  ffff880024da98b0 ffffea0000a54940 ffff88003c5a5c78
>>> ffff88003d402500
>>> [43925.668182] Call Trace:
>>> [43925.668189]  [<ffffffff81182b7c>] ? jbd2_journal_stop+0x21e/0x230
>>> [43925.668193]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
>>> [43925.668196]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
>>> [43925.668201]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
>>> [43925.668205]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
>>> [43925.668211]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
>>> [43925.668215]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
>>> [43925.668218]  [<ffffffff81102e86>] may_open+0x9e/0x11e
>>> [43925.668221]  [<ffffffff8110373e>] do_last+0x542/0x6fa
>>> [43925.668225]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
>>> [43925.668228]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
>>> [43925.668232]  [<ffffffff81103958>] ? getname+0x2c/0x1be
>>> [43925.668236]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
>>> [43925.668240]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
>>> [43925.668243]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
>>> [43925.668248]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
>>> [43925.668250] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
>>> ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
>>> 48 89 75 90<41>  0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
>>> ff ff
>>> [43925.668288] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
>>> [43925.668291]  RSP<ffff88003c5a5bc8>
>>> [43925.668295] ---[ end trace 75bdddc506717838 ]---
>>> [43934.866252] general protection fault: 0000 [#2] SMP
>>> [43934.866257] last sysfs file: /sys/devices/platform/applesmc.768/light
>>> [43934.866260] CPU 0
>>> [43934.866261] Modules linked in: firewire_sbp2 radeon sco bnep ttm
>>> drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
>>> nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
>>> iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
>>> sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
>>> i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
>>> sha256_generic cbc des_generic cast5 blowfish serpent camellia
>>> twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
>>> uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
>>> acpi_cpufreq processor mperf appletouch applesmc uvcvideo
>>> [43934.866318]
>>> [43934.866321] Pid: 27283, comm: make Tainted: G      D
>>> 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
>>> [43934.866324] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
>>> inode_has_perm+0x53/0x6a
>>> [43934.866334] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
>>> [43934.866336] RAX: ffff88003807a958 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
>>> [43934.866339] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff880034b01700
>>> [43934.866341] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
>>> [43934.866343] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
>>> [43934.866346] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
>>> [43934.866349] FS:  00007fdf0a661700(0000) GS:ffff88003ee00000(0000)
>>> knlGS:0000000000000000
>>> [43934.866352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [43934.866354] CR2: 00007fdf0a5e5000 CR3: 0000000029800000 CR4: 00000000000006e0
>>> [43934.866357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [43934.866359] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [43934.866362] Process make (pid: 27283, threadinfo ffff88003c5a4000,
>>> task ffff880001afb410)
>>> [43934.866364] Stack:
>>> [43934.866366]  ffff88002f398a50 ffff880024da9990 000000003c5a5c78
>>> ffffffff81810be8
>>> [43934.866371]  0020000000000001 0000000000000001 0000000000001000
>>> ffff880037bc0a00
>>> [43934.866375]  0000000000001000 ffffea0000a54940 ffff88003c5a5d18
>>> ffff88003d402500
>>> [43934.866380] Call Trace:
>>> [43934.866385]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
>>> [43934.866389]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
>>> [43934.866395]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
>>> [43934.866398]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
>>> [43934.866404]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
>>> [43934.866409]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
>>> [43934.866412]  [<ffffffff81102e86>] may_open+0x9e/0x11e
>>> [43934.866415]  [<ffffffff8110373e>] do_last+0x542/0x6fa
>>> [43934.866419]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
>>> [43934.866422]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
>>> [43934.866426]  [<ffffffff81103958>] ? getname+0x2c/0x1be
>>> [43934.866430]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
>>> [43934.866433]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
>>> [43934.866437]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
>>> [43934.866441]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
>>> [43934.866443] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
>>> ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
>>> 48 89 75 90<41>  0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
>>> ff ff
>>> [43934.866481] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
>>> [43934.866484]  RSP<ffff88003c5a5bc8>
>>> [43934.866488] ---[ end trace 75bdddc506717839 ]---
>>>
>> [...]
>>
>> Hmm, ok, I have no idea about the root cause of this problem, but I did
>> notice one thing about selinux_cred_free() that's different than most
>> other freeing functions in the kernel. It does not accept a NULL value.
>> Most other freeing functions will just return if passed NULL, but
>> selinux_cred_free() will crash.
>> I wonder if it would make sense to add a NULL 'short circuit' to that
>> function? If so, please pick up the patch below.
>>
>>
>> Signed-off-by: Jesper Juhl<jj@chaosbits.net>
>> ---
>>   hooks.c |    6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 65fa8bf..d088532 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -3193,11 +3193,11 @@ static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
>>    */
>>   static void selinux_cred_free(struct cred *cred)
>>   {
>> -	struct task_security_struct *tsec = cred->security;
>> -
>> +	if (!cred)
>> +		return;
>>   	BUG_ON((unsigned long) cred->security<  PAGE_SIZE);
>>   	cred->security = (void *) 0x7UL;
>> -	kfree(tsec);
>> +	kfree(cred->security);
>>   }
>>
>>   /*
>>
>
> Arrgh, sent the wrong (early version) patch. This is what it should have
> been:
>
>
> Signed-off-by: Jesper Juhl<jj@chaosbits.net>
> ---
>   hooks.c |    5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 65fa8bf..00f28dc 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3193,9 +3193,12 @@ static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
>    */
>   static void selinux_cred_free(struct cred *cred)
>   {
> -	struct task_security_struct *tsec = cred->security;
> +	struct task_security_struct *tsec;
>
> +	if (!cred)
> +		return;
>   	BUG_ON((unsigned long) cred->security<  PAGE_SIZE);
> +	tsec = cred->security;
>   	cred->security = (void *) 0x7UL;
>   	kfree(tsec);
>   }
>
>
>


sure.. I'll load this patch in.. I will post if I see anything out of 
the ordinary.
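
For reference, the difference between the two versions of the patch quoted
above is easy to miss. The early version overwrites cred->security with the
0x7 poison value and then passes that same field to kfree(), so it would
free the poison value and leak the real blob. The second version saves the
pointer first. A minimal userspace sketch of the corrected pattern follows,
assuming nothing about the real SELinux structures; model_cred,
model_free() and model_cred_free() are hypothetical names.

```c
/* Userspace model of a NULL-tolerant, poisoning free; not SELinux code. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096UL
#define CRED_POISON ((void *)(uintptr_t)0x7UL)

struct model_cred {
    void *security;
};

static int blobs_freed;  /* counts real frees so the behaviour is observable */

/* Like kfree()/free(): a NULL argument is silently accepted. */
static void model_free(void *p)
{
    if (!p)
        return;
    free(p);
    blobs_freed++;
}

static void model_cred_free(struct model_cred *cred)
{
    void *tsec;

    if (!cred)
        return;                     /* the proposed NULL short circuit */
    /* Mirrors the BUG_ON: a value below one page is NULL or the poison. */
    assert((uintptr_t)cred->security >= MODEL_PAGE_SIZE);
    tsec = cred->security;          /* save the real pointer first */
    cred->security = CRED_POISON;   /* then poison the field */
    model_free(tsec);               /* free the saved pointer, not the poison */
}
```

Calling model_cred_free(NULL) is a no-op, and a second call on the same
object trips the assertion instead of freeing the poison value.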

Justin P. Mattock


* Re: general protection fault: 0000 [#1] SMP
  2010-11-20 22:28 ` Jesper Juhl
@ 2010-11-20 22:32   ` Jesper Juhl
  2010-11-20 23:21     ` Justin P. Mattock
  2010-11-22 19:01     ` Justin P. Mattock
  0 siblings, 2 replies; 18+ messages in thread
From: Jesper Juhl @ 2010-11-20 22:32 UTC (permalink / raw)
  To: Justin Mattock; +Cc: Linux Kernel Mailing List

On Sat, 20 Nov 2010, Jesper Juhl wrote:

> On Sat, 20 Nov 2010, Justin Mattock wrote:
> 
> > Ive seen this before, but could not reproduce for a bisect.. basically
> > what I remember doing
> > was building webkit(let sit and compile) passed out, woke up at 5AM
> > closed the lid on the machine,few hrs later
> > woke up, went for a run, came back opened the lid and this:
> > 
> > [43925.668053] general protection fault: 0000 [#1] SMP
> > [43925.668059] last sysfs file: /sys/devices/platform/applesmc.768/light
> > [43925.668061] CPU 0
> > [43925.668063] Modules linked in: firewire_sbp2 radeon sco bnep ttm
> > drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
> > nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> > iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
> > sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
> > i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
> > sha256_generic cbc des_generic cast5 blowfish serpent camellia
> > twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
> > uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
> > acpi_cpufreq processor mperf appletouch applesmc uvcvideo
> > [43925.668120]
> > [43925.668123] Pid: 27262, comm: make Not tainted
> > 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
> > [43925.668126] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
> > inode_has_perm+0x53/0x6a
> > [43925.668135] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
> > [43925.668137] RAX: ffff88003826a208 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
> > [43925.668140] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff88002feacc00
> > [43925.668142] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
> > [43925.668145] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
> > [43925.668147] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
> > [43925.668150] FS:  00007f4f786b3700(0000) GS:ffff88003ee00000(0000)
> > knlGS:0000000000000000
> > [43925.668153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [43925.668155] CR2: 00007f4f78637000 CR3: 00000000383ac000 CR4: 00000000000006e0
> > [43925.668158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [43925.668161] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [43925.668163] Process make (pid: 27262, threadinfo ffff88003c5a4000,
> > task ffff880001afb410)
> > [43925.668165] Stack:
> > [43925.668167]  ffff880038a98060 0000000000000000 ffff88003c5a5c48
> > ffffffff81182b7c
> > [43925.668172]  ffff88003cab2688 ffff880024da9990 ffff88003caa18d8
> > ffff880038a98060
> > [43925.668177]  ffff880024da98b0 ffffea0000a54940 ffff88003c5a5c78
> > ffff88003d402500
> > [43925.668182] Call Trace:
> > [43925.668189]  [<ffffffff81182b7c>] ? jbd2_journal_stop+0x21e/0x230
> > [43925.668193]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
> > [43925.668196]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
> > [43925.668201]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> > [43925.668205]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
> > [43925.668211]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
> > [43925.668215]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
> > [43925.668218]  [<ffffffff81102e86>] may_open+0x9e/0x11e
> > [43925.668221]  [<ffffffff8110373e>] do_last+0x542/0x6fa
> > [43925.668225]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
> > [43925.668228]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> > [43925.668232]  [<ffffffff81103958>] ? getname+0x2c/0x1be
> > [43925.668236]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
> > [43925.668240]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
> > [43925.668243]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
> > [43925.668248]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
> > [43925.668250] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
> > ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
> > 48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
> > ff ff
> > [43925.668288] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
> > [43925.668291]  RSP <ffff88003c5a5bc8>
> > [43925.668295] ---[ end trace 75bdddc506717838 ]---
> > [43934.866252] general protection fault: 0000 [#2] SMP
> > [43934.866257] last sysfs file: /sys/devices/platform/applesmc.768/light
> > [43934.866260] CPU 0
> > [43934.866261] Modules linked in: firewire_sbp2 radeon sco bnep ttm
> > drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
> > nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> > iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
> > sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
> > i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
> > sha256_generic cbc des_generic cast5 blowfish serpent camellia
> > twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
> > uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
> > acpi_cpufreq processor mperf appletouch applesmc uvcvideo
> > [43934.866318]
> > [43934.866321] Pid: 27283, comm: make Tainted: G      D
> > 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
> > [43934.866324] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
> > inode_has_perm+0x53/0x6a
> > [43934.866334] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
> > [43934.866336] RAX: ffff88003807a958 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
> > [43934.866339] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff880034b01700
> > [43934.866341] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
> > [43934.866343] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
> > [43934.866346] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
> > [43934.866349] FS:  00007fdf0a661700(0000) GS:ffff88003ee00000(0000)
> > knlGS:0000000000000000
> > [43934.866352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [43934.866354] CR2: 00007fdf0a5e5000 CR3: 0000000029800000 CR4: 00000000000006e0
> > [43934.866357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [43934.866359] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [43934.866362] Process make (pid: 27283, threadinfo ffff88003c5a4000,
> > task ffff880001afb410)
> > [43934.866364] Stack:
> > [43934.866366]  ffff88002f398a50 ffff880024da9990 000000003c5a5c78
> > ffffffff81810be8
> > [43934.866371]  0020000000000001 0000000000000001 0000000000001000
> > ffff880037bc0a00
> > [43934.866375]  0000000000001000 ffffea0000a54940 ffff88003c5a5d18
> > ffff88003d402500
> > [43934.866380] Call Trace:
> > [43934.866385]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
> > [43934.866389]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
> > [43934.866395]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> > [43934.866398]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
> > [43934.866404]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
> > [43934.866409]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
> > [43934.866412]  [<ffffffff81102e86>] may_open+0x9e/0x11e
> > [43934.866415]  [<ffffffff8110373e>] do_last+0x542/0x6fa
> > [43934.866419]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
> > [43934.866422]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> > [43934.866426]  [<ffffffff81103958>] ? getname+0x2c/0x1be
> > [43934.866430]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
> > [43934.866433]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
> > [43934.866437]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
> > [43934.866441]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
> > [43934.866443] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
> > ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
> > 48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
> > ff ff
> > [43934.866481] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
> > [43934.866484]  RSP <ffff88003c5a5bc8>
> > [43934.866488] ---[ end trace 75bdddc506717839 ]---
> > 
> [...]
> 
> Hmm, ok, I have no idea about the root cause of this problem, but I did 
> notice one thing about selinux_cred_free() that's different than most 
> other freeing functions in the kernel. It does not accept a NULL value.
> Most other freeing functions will just return if passed NULL, but 
> selinux_cred_free() will crash.
> I wonder if it would make sense to add a NULL 'short circuit' to that 
> function? If so, please pick up the patch below.
> 
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
>  hooks.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 65fa8bf..d088532 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -3193,11 +3193,11 @@ static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
>   */
>  static void selinux_cred_free(struct cred *cred)
>  {
> -	struct task_security_struct *tsec = cred->security;
> -
> +	if (!cred)
> +		return;
>  	BUG_ON((unsigned long) cred->security < PAGE_SIZE);
>  	cred->security = (void *) 0x7UL;
> -	kfree(tsec);
> +	kfree(cred->security);
>  }
>  
>  /*
> 

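Arrgh, sent the wrong (early version) patch; that one called kfree() on
cred->security after it had already been overwritten with the poison value.
This is what it should have been: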


Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 hooks.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 65fa8bf..00f28dc 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3193,9 +3193,12 @@ static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
  */
 static void selinux_cred_free(struct cred *cred)
 {
-	struct task_security_struct *tsec = cred->security;
+	struct task_security_struct *tsec;
 
+	if (!cred)
+		return;
 	BUG_ON((unsigned long) cred->security < PAGE_SIZE);
+	tsec = cred->security;
 	cred->security = (void *) 0x7UL;
 	kfree(tsec);
 }
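
For readers following the two versions of the patch, here is a minimal userspace sketch of the pattern the corrected version follows: return early on NULL, save the pointer, poison the field, then free the saved pointer. The names demo_cred, demo_tsec, demo_cred_free, DEMO_POISON and DEMO_PAGE_SIZE are hypothetical stand-ins, not kernel identifiers, and free() stands in for kfree(). The kernel code uses BUG_ON() for the small-pointer check; this sketch simply returns instead, which is an assumption made so the example can run outside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures touched by the patch. */
struct demo_tsec { int sid; };
struct demo_cred { void *security; };

#define DEMO_PAGE_SIZE 4096UL                       /* assumed page size */
#define DEMO_POISON ((void *)(uintptr_t)0x7UL)      /* same marker value as the patch */

/*
 * Release the security blob attached to a credential.
 * Returns 1 if a blob was freed, 0 if there was nothing to free.
 */
int demo_cred_free(struct demo_cred *cred)
{
    struct demo_tsec *tsec;

    if (!cred)                       /* NULL short circuit, like kfree(NULL) */
        return 0;

    /* The kernel would BUG_ON() here; the sketch just bails out. */
    if ((uintptr_t)cred->security < DEMO_PAGE_SIZE)
        return 0;

    tsec = cred->security;           /* save the real pointer first ... */
    cred->security = DEMO_POISON;    /* ... then poison the field ... */
    free(tsec);                      /* ... and free the saved pointer */
    return 1;
}
```

In this sketch demo_cred_free(NULL) is a no-op, and a second call on the same credential sees the poison value and does nothing. The early version of the patch would correspond to calling free(cred->security) after the poison assignment, which hands 0x7 to the allocator instead of the real pointer.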



-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.



* Re: general protection fault: 0000 [#1] SMP
  2010-11-20 16:35 Justin Mattock
@ 2010-11-20 22:28 ` Jesper Juhl
  2010-11-20 22:32   ` Jesper Juhl
  0 siblings, 1 reply; 18+ messages in thread
From: Jesper Juhl @ 2010-11-20 22:28 UTC (permalink / raw)
  To: Justin Mattock; +Cc: Linux Kernel Mailing List

On Sat, 20 Nov 2010, Justin Mattock wrote:

> Ive seen this before, but could not reproduce for a bisect.. basically
> what I remember doing
> was building webkit(let sit and compile) passed out, woke up at 5AM
> closed the lid on the machine,few hrs later
> woke up, went for a run, came back opened the lid and this:
> 
> [43925.668053] general protection fault: 0000 [#1] SMP
> [43925.668059] last sysfs file: /sys/devices/platform/applesmc.768/light
> [43925.668061] CPU 0
> [43925.668063] Modules linked in: firewire_sbp2 radeon sco bnep ttm
> drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
> nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
> sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
> i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
> sha256_generic cbc des_generic cast5 blowfish serpent camellia
> twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
> uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
> acpi_cpufreq processor mperf appletouch applesmc uvcvideo
> [43925.668120]
> [43925.668123] Pid: 27262, comm: make Not tainted
> 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
> [43925.668126] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
> inode_has_perm+0x53/0x6a
> [43925.668135] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
> [43925.668137] RAX: ffff88003826a208 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
> [43925.668140] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff88002feacc00
> [43925.668142] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
> [43925.668145] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
> [43925.668147] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
> [43925.668150] FS:  00007f4f786b3700(0000) GS:ffff88003ee00000(0000)
> knlGS:0000000000000000
> [43925.668153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [43925.668155] CR2: 00007f4f78637000 CR3: 00000000383ac000 CR4: 00000000000006e0
> [43925.668158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [43925.668161] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [43925.668163] Process make (pid: 27262, threadinfo ffff88003c5a4000,
> task ffff880001afb410)
> [43925.668165] Stack:
> [43925.668167]  ffff880038a98060 0000000000000000 ffff88003c5a5c48
> ffffffff81182b7c
> [43925.668172]  ffff88003cab2688 ffff880024da9990 ffff88003caa18d8
> ffff880038a98060
> [43925.668177]  ffff880024da98b0 ffffea0000a54940 ffff88003c5a5c78
> ffff88003d402500
> [43925.668182] Call Trace:
> [43925.668189]  [<ffffffff81182b7c>] ? jbd2_journal_stop+0x21e/0x230
> [43925.668193]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
> [43925.668196]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
> [43925.668201]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> [43925.668205]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
> [43925.668211]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
> [43925.668215]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
> [43925.668218]  [<ffffffff81102e86>] may_open+0x9e/0x11e
> [43925.668221]  [<ffffffff8110373e>] do_last+0x542/0x6fa
> [43925.668225]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
> [43925.668228]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> [43925.668232]  [<ffffffff81103958>] ? getname+0x2c/0x1be
> [43925.668236]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
> [43925.668240]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
> [43925.668243]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
> [43925.668248]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
> [43925.668250] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
> ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
> 48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
> ff ff
> [43925.668288] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
> [43925.668291]  RSP <ffff88003c5a5bc8>
> [43925.668295] ---[ end trace 75bdddc506717838 ]---
> [43934.866252] general protection fault: 0000 [#2] SMP
> [43934.866257] last sysfs file: /sys/devices/platform/applesmc.768/light
> [43934.866260] CPU 0
> [43934.866261] Modules linked in: firewire_sbp2 radeon sco bnep ttm
> drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
> nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
> iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
> sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
> i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
> sha256_generic cbc des_generic cast5 blowfish serpent camellia
> twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
> uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
> acpi_cpufreq processor mperf appletouch applesmc uvcvideo
> [43934.866318]
> [43934.866321] Pid: 27283, comm: make Tainted: G      D
> 2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
> [43934.866324] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
> inode_has_perm+0x53/0x6a
> [43934.866334] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
> [43934.866336] RAX: ffff88003807a958 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
> [43934.866339] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff880034b01700
> [43934.866341] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
> [43934.866343] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
> [43934.866346] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
> [43934.866349] FS:  00007fdf0a661700(0000) GS:ffff88003ee00000(0000)
> knlGS:0000000000000000
> [43934.866352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [43934.866354] CR2: 00007fdf0a5e5000 CR3: 0000000029800000 CR4: 00000000000006e0
> [43934.866357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [43934.866359] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [43934.866362] Process make (pid: 27283, threadinfo ffff88003c5a4000,
> task ffff880001afb410)
> [43934.866364] Stack:
> [43934.866366]  ffff88002f398a50 ffff880024da9990 000000003c5a5c78
> ffffffff81810be8
> [43934.866371]  0020000000000001 0000000000000001 0000000000001000
> ffff880037bc0a00
> [43934.866375]  0000000000001000 ffffea0000a54940 ffff88003c5a5d18
> ffff88003d402500
> [43934.866380] Call Trace:
> [43934.866385]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
> [43934.866389]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
> [43934.866395]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> [43934.866398]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
> [43934.866404]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
> [43934.866409]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
> [43934.866412]  [<ffffffff81102e86>] may_open+0x9e/0x11e
> [43934.866415]  [<ffffffff8110373e>] do_last+0x542/0x6fa
> [43934.866419]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
> [43934.866422]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
> [43934.866426]  [<ffffffff81103958>] ? getname+0x2c/0x1be
> [43934.866430]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
> [43934.866433]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
> [43934.866437]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
> [43934.866441]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
> [43934.866443] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
> ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
> 48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
> ff ff
> [43934.866481] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
> [43934.866484]  RSP <ffff88003c5a5bc8>
> [43934.866488] ---[ end trace 75bdddc506717839 ]---
> 
[...]

Hmm, ok, I have no idea about the root cause of this problem, but I did
notice one thing about selinux_cred_free() that's different from most
other freeing functions in the kernel: it does not accept a NULL value.
Most other freeing functions will just return if passed NULL, but
selinux_cred_free() will crash, because it dereferences cred before
checking it.
I wonder if it would make sense to add a NULL 'short circuit' to that
function? If so, please pick up the patch below.


Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 hooks.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 65fa8bf..d088532 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3193,11 +3193,11 @@ static int selinux_cred_alloc_blank(struct cred *cred, gfp_t gfp)
  */
 static void selinux_cred_free(struct cred *cred)
 {
-	struct task_security_struct *tsec = cred->security;
-
+	if (!cred)
+		return;
 	BUG_ON((unsigned long) cred->security < PAGE_SIZE);
 	cred->security = (void *) 0x7UL;
-	kfree(tsec);
+	kfree(cred->security);
 }
 
 /*


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.



* general protection fault: 0000 [#1] SMP
@ 2010-11-20 16:35 Justin Mattock
  2010-11-20 22:28 ` Jesper Juhl
  0 siblings, 1 reply; 18+ messages in thread
From: Justin Mattock @ 2010-11-20 16:35 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I've seen this before, but could not reproduce it for a bisect. Basically,
what I remember doing was building webkit (left it to sit and compile),
passed out, woke up at 5AM and closed the lid on the machine. A few hours
later I woke up, went for a run, came back, opened the lid and got this:

[43925.668053] general protection fault: 0000 [#1] SMP
[43925.668059] last sysfs file: /sys/devices/platform/applesmc.768/light
[43925.668061] CPU 0
[43925.668063] Modules linked in: firewire_sbp2 radeon sco bnep ttm
drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
sha256_generic cbc des_generic cast5 blowfish serpent camellia
twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
acpi_cpufreq processor mperf appletouch applesmc uvcvideo
[43925.668120]
[43925.668123] Pid: 27262, comm: make Not tainted
2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
[43925.668126] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
inode_has_perm+0x53/0x6a
[43925.668135] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
[43925.668137] RAX: ffff88003826a208 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
[43925.668140] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff88002feacc00
[43925.668142] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
[43925.668145] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
[43925.668147] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
[43925.668150] FS:  00007f4f786b3700(0000) GS:ffff88003ee00000(0000)
knlGS:0000000000000000
[43925.668153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43925.668155] CR2: 00007f4f78637000 CR3: 00000000383ac000 CR4: 00000000000006e0
[43925.668158] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43925.668161] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[43925.668163] Process make (pid: 27262, threadinfo ffff88003c5a4000,
task ffff880001afb410)
[43925.668165] Stack:
[43925.668167]  ffff880038a98060 0000000000000000 ffff88003c5a5c48
ffffffff81182b7c
[43925.668172]  ffff88003cab2688 ffff880024da9990 ffff88003caa18d8
ffff880038a98060
[43925.668177]  ffff880024da98b0 ffffea0000a54940 ffff88003c5a5c78
ffff88003d402500
[43925.668182] Call Trace:
[43925.668189]  [<ffffffff81182b7c>] ? jbd2_journal_stop+0x21e/0x230
[43925.668193]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
[43925.668196]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
[43925.668201]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
[43925.668205]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
[43925.668211]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
[43925.668215]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
[43925.668218]  [<ffffffff81102e86>] may_open+0x9e/0x11e
[43925.668221]  [<ffffffff8110373e>] do_last+0x542/0x6fa
[43925.668225]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
[43925.668228]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
[43925.668232]  [<ffffffff81103958>] ? getname+0x2c/0x1be
[43925.668236]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
[43925.668240]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
[43925.668243]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
[43925.668248]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
[43925.668250] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
ff ff
[43925.668288] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
[43925.668291]  RSP <ffff88003c5a5bc8>
[43925.668295] ---[ end trace 75bdddc506717838 ]---
[43934.866252] general protection fault: 0000 [#2] SMP
[43934.866257] last sysfs file: /sys/devices/platform/applesmc.768/light
[43934.866260] CPU 0
[43934.866261] Modules linked in: firewire_sbp2 radeon sco bnep ttm
drm_kms_helper drm ipt_LOG iptable_nat nf_nat xt_state
nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
iptable_filter ip_tables x_tables ath9k ath9k_common video ath9k_hw
sky2 firewire_ohci battery ac ath evdev joydev button firewire_core
i2c_i801 kvm_intel aes_x86_64 lzo zlib ipcomp xfrm_ipcomp crypto_null
sha256_generic cbc des_generic cast5 blowfish serpent camellia
twofish_generic twofish_x86_64 twofish_common ctr ah4 esp4 authenc
uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap bluetooth coretemp
acpi_cpufreq processor mperf appletouch applesmc uvcvideo
[43934.866318]
[43934.866321] Pid: 27283, comm: make Tainted: G      D
2.6.37-rc2-00037-g7957f0a-dirty #6 Mac-F42187C8/MacBookPro2,2
[43934.866324] RIP: 0010:[<ffffffff811bf10a>]  [<ffffffff811bf10a>]
inode_has_perm+0x53/0x6a
[43934.866334] RSP: 0018:ffff88003c5a5bc8  EFLAGS: 00010282
[43934.866336] RAX: ffff88003807a958 RBX: ffff88000008ed80 RCX: ffff88003c5a5c68
[43934.866339] RDX: 0000000000000002 RSI: ffff88000008ed80 RDI: ffff880034b01700
[43934.866341] RBP: ffff88003c5a5c58 R08: ffff88003c5a5c68 R09: 00000000000000d5
[43934.866343] R10: 050366048b660e04 R11: 0000000000000000 R12: 0000000000000024
[43934.866346] R13: 00000000ffffffd8 R14: 0000000000000000 R15: 0000000000000000
[43934.866349] FS:  00007fdf0a661700(0000) GS:ffff88003ee00000(0000)
knlGS:0000000000000000
[43934.866352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43934.866354] CR2: 00007fdf0a5e5000 CR3: 0000000029800000 CR4: 00000000000006e0
[43934.866357] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43934.866359] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[43934.866362] Process make (pid: 27283, threadinfo ffff88003c5a4000,
task ffff880001afb410)
[43934.866364] Stack:
[43934.866366]  ffff88002f398a50 ffff880024da9990 000000003c5a5c78
ffffffff81810be8
[43934.866371]  0020000000000001 0000000000000001 0000000000001000
ffff880037bc0a00
[43934.866375]  0000000000001000 ffffea0000a54940 ffff88003c5a5d18
ffff88003d402500
[43934.866380] Call Trace:
[43934.866385]  [<ffffffff811be4bb>] ? selinux_cred_free+0xb/0x27
[43934.866389]  [<ffffffff811be441>] ? selinux_file_alloc_security+0x4a/0xb9
[43934.866395]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
[43934.866398]  [<ffffffff811bf853>] selinux_inode_permission+0xd2/0xd4
[43934.866404]  [<ffffffff811bbf9c>] security_inode_permission+0x1c/0x1e
[43934.866409]  [<ffffffff81101ab2>] inode_permission+0x87/0x93
[43934.866412]  [<ffffffff81102e86>] may_open+0x9e/0x11e
[43934.866415]  [<ffffffff8110373e>] do_last+0x542/0x6fa
[43934.866419]  [<ffffffff811056ec>] do_filp_open+0x1f3/0x646
[43934.866422]  [<ffffffff810f4226>] ? check_object+0x13b/0x1eb
[43934.866426]  [<ffffffff81103958>] ? getname+0x2c/0x1be
[43934.866430]  [<ffffffff8110eca8>] ? alloc_fd+0x111/0x123
[43934.866433]  [<ffffffff810f7a84>] do_sys_open+0x5b/0xf8
[43934.866437]  [<ffffffff810f7b4a>] sys_open+0x1b/0x1d
[43934.866441]  [<ffffffff8102b542>] system_call_fastpath+0x16/0x1b
[43934.866443] Code: 02 00 00 44 8b 48 04 48 85 c9 75 1f 4c 8d 85 70
ff ff ff b9 22 00 00 00 4c 89 c7 44 89 d8 f3 ab c6 85 70 ff ff ff 01
48 89 75 90 <41> 0f b7 42 20 89 d1 41 8b 72 1c 89 c2 44 89 cf e8 99 e7
ff ff
[43934.866481] RIP  [<ffffffff811bf10a>] inode_has_perm+0x53/0x6a
[43934.866484]  RSP <ffff88003c5a5bc8>
[43934.866488] ---[ end trace 75bdddc506717839 ]---



The system seems usable after this. Reproducing it for a bisect seems
impossible at this point (if I can, I will post).

-- 
Justin P. Mattock


* general protection fault: 0000 [#1] SMP
@ 2010-07-03 22:59 Justin P. Mattock
  0 siblings, 0 replies; 18+ messages in thread
From: Justin P. Mattock @ 2010-07-03 22:59 UTC (permalink / raw)
  To: Linux Kernel Mailing List

The kernel barfed this up on waking up from suspend. I've tried to
reproduce this but haven't managed to (I will see if I can, then do a bisect).
Also, I've had to revert commit 6a4f3b52377 due to another issue,
so maybe that is a factor in this.

[10384.818511] general protection fault: 0000 [#1] SMP
[10384.818517] last sysfs file: /sys/devices/platform/applesmc.768/light
[10384.818520] CPU 1
[10384.818522] Modules linked in: radeon ttm drm_kms_helper drm sco xcbc 
bnep rmd160 sha512_generic xt_tcpudp ipt_LOG iptable_nat nf_nat xt_state 
nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
iptable_filter ip_tables x_tables ath9k ath9k_common firewire_ohci 
firewire_core battery ath9k_hw ac video evdev ohci1394 sky2 ath joydev 
button thermal i2c_i801 hid_magicmouse aes_x86_64 lzo lzo_compress zlib 
ipcomp xfrm_ipcomp crypto_null sha256_generic cbc des_generic cast5 
blowfish serpent camellia twofish twofish_common ctr ah4 esp4 authenc 
raw1394 ieee1394 uhci_hcd ehci_hcd hci_uart rfcomm btusb hidp l2cap 
bluetooth coretemp acpi_cpufreq processor mperf appletouch applesmc uvcvideo
[10384.818594]
[10384.818598] Pid: 409, comm: kswapd0 Not tainted 
2.6.35-rc3-00398-g5a847c7-dirty #13 Mac-F42187C8/MacBookPro2,2
[10384.818601] RIP: 0010:[<ffffffff810b7487>]  [<ffffffff810b7487>] 
find_get_pages+0x62/0xc0
[10384.818611] RSP: 0018:ffff88003e011b40  EFLAGS: 00010293
[10384.818614] RAX: ffff88000008f000 RBX: ffff88003e011bf0 RCX: 
0000000000000003
[10384.818617] RDX: ffff88003e011c08 RSI: 0000000000000001 RDI: 
8ed88ec88ce88b66
[10384.818620] RBP: ffff88003e011b90 R08: 8ed88ec88ce88b6e R09: 
0000000000000002
[10384.818623] R10: ffff88000008f050 R11: ffff88000008f050 R12: 
ffffffffffffffff
[10384.818626] R13: 000000000000000e R14: 0000000000000000 R15: 
0000000000000003
[10384.818629] FS:  0000000000000000(0000) GS:ffff880001b00000(0000) 
knlGS:0000000000000000
[10384.818632] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[10384.818635] CR2: 00007f1a8989b000 CR3: 000000000166d000 CR4: 
00000000000006e0
[10384.818638] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[10384.818641] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[10384.818644] Process kswapd0 (pid: 409, threadinfo ffff88003e010000, 
task ffff88003eded490)
[10384.818646] Stack:
[10384.818648]  ffff88003e011b70 ffffffff810c0e85 ffff880018a2afe0 
0000000e0001fad8
[10384.818652] <0> ffff880018a30c68 ffff88003e011be0 0000000000000000 
ffff88003e011be0
[10384.818657] <0> ffffffffffffffff ffff880018a2afd8 ffff88003e011bb0 
ffffffff810bed06
[10384.818663] Call Trace:
[10384.818669]  [<ffffffff810c0e85>] ? __remove_mapping+0xa5/0xbe
[10384.818674]  [<ffffffff810bed06>] pagevec_lookup+0x1d/0x26
[10384.818678]  [<ffffffff810bfb78>] invalidate_mapping_pages+0xe7/0x10b
[10384.818683]  [<ffffffff810fdc4a>] shrink_icache_memory+0x10a/0x227
[10384.818687]  [<ffffffff810c21fc>] shrink_slab+0xd6/0x147
[10384.818691]  [<ffffffff810c25d2>] balance_pgdat+0x365/0x5b4
[10384.818695]  [<ffffffff810c29c7>] kswapd+0x1a6/0x1bc
[10384.818700]  [<ffffffff81070d75>] ? autoremove_wake_function+0x0/0x34
[10384.818704]  [<ffffffff810c2821>] ? kswapd+0x0/0x1bc
[10384.818707]  [<ffffffff81070953>] kthread+0x7a/0x82
[10384.818712]  [<ffffffff81027264>] kernel_thread_helper+0x4/0x10
[10384.818716]  [<ffffffff810708d9>] ? kthread+0x0/0x82
[10384.818719]  [<ffffffff81027260>] ? kernel_thread_helper+0x0/0x10
[10384.818721] Code: f5 d0 11 00 48 89 da 89 45 cc 31 c9 eb 64 48 8b 02 
48 8b 38 40 f6 c7 01 49 0f 45 fc 48 85 ff 74 4b 48 83 ff ff 74 c8 4c 8d 
47 08 <8b> 77 08 85 f6 74 dc 44 8d 4e 01 89 f0 f0 45 0f b1 08 39 f0 74
[10384.818762] RIP  [<ffffffff810b7487>] find_get_pages+0x62/0xc0
[10384.818767]  RSP <ffff88003e011b40>
[10384.818770] ---[ end trace 594fde37483e4533 ]---
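
A note on reading the dump above: the marked byte sequence `<8b> 77 08` in the Code line appears to decode as `mov 0x8(%rdi),%esi`, and RDI holds 8ed88ec88ce88b66 (R08 is that value plus 8). On x86-64, touching a non-canonical address raises a general protection fault rather than a page fault, which would fit this report. The sketch below only checks that property; the helper name `is_canonical` and the 48-bit address width are assumptions for illustration, not something stated in the thread.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    With 48-bit virtual addresses (assumed here), bits 63..47 must be
    either all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


# Register values copied from the oops above.
rdi = 0x8ED88EC88CE88B66
r08 = 0x8ED88EC88CE88B6E
rbp = 0xFFFF88003E011B90

for name, value in (("RDI", rdi), ("R08", r08), ("RBP", rbp)):
    state = "canonical" if is_canonical(value) else "NON-canonical"
    print(f"{name} = {value:016x} is {state}")
```

A non-canonical pointer like this usually means the pointer itself was corrupted; the check says nothing about what overwrote it.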



Justin P. Mattock


* Re: general protection fault: 0000 [1] SMP
  2006-01-30  8:54 general protection fault: 0000 [1] SMP Martin Klier
@ 2006-02-17 13:25 ` Martin Klier
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Klier @ 2006-02-17 13:25 UTC (permalink / raw)
  To: linux-admin

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

Hi there,

> Jan 28 00:27:54 svdbslx002 kernel: general protection fault: 0000 [1] SMP
> Jan 28 00:27:54 svdbslx002 kernel: CPU 1
> Jan 28 00:27:54 svdbslx002 kernel: Pid: 14706, comm: ps Tainted: P   U
> (2.6.5-7.201-smp SLES9_SP2_BRANCH-20050825
> 0620450000)

I have found the culprit (Novell support helped a lot):
The 2.6.5-7.201 kernel for x86_64 suffered from a race condition through which
32-bit system calls could cause Oopses. As far as I understood, the problem
occurred in memory allocation for 32-bit code.

Maybe you want to read 
http://www.x86-64.org/lists/discuss/msg05795.html
or
http://groups.google.de/group/fa.linux.kernel/browse_thread/thread/d3d72f301833e6e2/948440747db94d2b?lnk=st&q=kernel%3A+general+protection+fault%3A+0000+%5B1%5D+SMP&rnum=1&hl=de#948440747db94d2b

It will help to upgrade to the 2.6.5-7.244 kernel, available as part of SP3 or
as the separate patch-10731, "Recommended update for Linux kernel",
http://support.novell.com/cgi-bin/search/searchtid.cgi?/psdb/309c95cc337c1c860f8b7fd1ef14067a.html

Take care: you may have to rethink your raw device permission management
after patching to -7.244. I don't know why, but since the upgrade the
raw binary messes up the given file permissions/ownership/group each time you
(re)map a raw device.

Regards,
-- 
Kind regards

i.A. Martin Klier
Server Administration / Databases

A.T.U - Auto-Teile-Unger
Dr.-Kilian-Straße 11
92637 Weiden



* general protection fault: 0000 [1] SMP
@ 2006-01-30  8:54 Martin Klier
  2006-02-17 13:25 ` Martin Klier
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Klier @ 2006-01-30  8:54 UTC (permalink / raw)
  To: linux-admin

[-- Attachment #1: Type: text/plain, Size: 3342 bytes --]

Dear list,

Has anybody seen something like this before?

------------------snip----------------------
Jan 28 00:27:54 svdbslx002 kernel: general protection fault: 0000 [1] SMP
Jan 28 00:27:54 svdbslx002 kernel: CPU 1
Jan 28 00:27:54 svdbslx002 kernel: Pid: 14706, comm: ps Tainted: P   U   
(2.6.5-7.201-smp SLES9_SP2_BRANCH-20050825
0620450000)
Jan 28 00:27:54 svdbslx002 kernel: RIP: 0010:[<ffffffff80177f4b>] 
<ffffffff80177f4b>{get_user_pages+267}
Jan 28 00:27:54 svdbslx002 kernel: RSP: 0018:00000101d2e8dd58  EFLAGS: 
00010202
Jan 28 00:27:54 svdbslx002 kernel: RAX: 00009cd0f0009ff8 RBX: 00000000ffffe000 
RCX: 0000010000000000
Jan 28 00:27:54 svdbslx002 kernel: RDX: 00009bd0f0009ff8 RSI: 000ffffffffff000 
RDI: ffffffff803d4f80
Jan 28 00:27:54 svdbslx002 kernel: RBP: 00000101d8d6cc00 R08: 0000000000000000 
R09: 0000000000000001
Jan 28 00:27:54 svdbslx002 kernel: R10: 0000000000000001 R11: 0000000000000246 
R12: 0000000000000000
Jan 28 00:27:54 svdbslx002 kernel: R13: 0000000000000000 R14: 000001017b9163e0 
R15: 0000000000000001
Jan 28 00:27:54 svdbslx002 kernel: FS:  00000000417ff960(0000) 
GS:ffffffff80562f00(0000) knlGS:00000000557d56a0
Jan 28 00:27:54 svdbslx002 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 
000000008005003b
Jan 28 00:27:54 svdbslx002 kernel: CR2: 0000007fbfffbc90 CR3: 000000017ff54000 
CR4: 00000000000006e0
Jan 28 00:27:54 svdbslx002 kernel: Process ps (pid: 14706, threadinfo 
00000101d2e8c000, task 000001015fecf330)
Jan 28 00:27:54 svdbslx002 kernel: Stack: 00000101e9e263a0 ffffffff8019c54b 
00000101d2e8de98 0000000000000000
Jan 28 00:27:54 svdbslx002 kernel:        000fffffeff80106 0000000000000106 
00000010d2e8dde8 00000000ffffe000
Jan 28 00:27:54 svdbslx002 kernel:        0000000000000001 00000101d2e8de18
Jan 28 00:27:54 svdbslx002 kernel: Call 
Trace:<ffffffff8019c54b>{real_lookup+123} <ffffffff80145493>{access_process
_vm+179}
Jan 28 00:27:54 svdbslx002 kernel:        
<ffffffff801c4d02>{proc_pid_cmdline+146} <ffffffff801c418f>{proc_info_rea
d+111}
Jan 28 00:27:54 svdbslx002 kernel:        <ffffffff8018d234>{vfs_read+244} 
<ffffffff8018d48d>{sys_read+157}
Jan 28 00:27:54 svdbslx002 kernel:        <ffffffff80189e87>{sys_open+231} 
<ffffffff801107d4>{system_call+124}
Jan 28 00:27:54 svdbslx002 kernel:
Jan 28 00:27:54 svdbslx002 kernel:
Jan 28 00:27:54 svdbslx002 kernel: Code: 48 8b 00 48 c1 eb 09 81 e3 f8 0f 00 
00 48 21 f0 48 01 d8 48
Jan 28 00:27:54 svdbslx002 kernel: RIP <ffffffff80177f4b>{get_user_pages+267} 
RSP <00000101d2e8dd58>
------------------snip----------------------
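
Because the mail client wrapped the log, some symbols in the call trace above are split across lines (for example `access_process` / `_vm+179`). A minimal sketch for rejoining the fragments and listing the `<address>{symbol+offset}` entries follows. The function name `extract_trace` and the regular expressions are illustrative assumptions, and the offsets are assumed to be decimal.

```python
import re

# Syslog prefix as it appears in this report, e.g.
# "Jan 28 00:27:54 svdbslx002 kernel: "
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d+ [\d:]{8} \S+ kernel: ?")
# One trace entry, e.g. "<ffffffff8019c54b>{real_lookup+123}"
ENTRY = re.compile(r"<([0-9a-f]{16})>\{(\w+)\+(\d+)\}")


def extract_trace(lines):
    """Return (address, symbol, offset) tuples from mail-wrapped oops lines."""
    # Strip the syslog prefix where present and concatenate all fragments,
    # so that symbols split by the mail wrap are rejoined.
    joined = "".join(PREFIX.sub("", line.rstrip("\n")) for line in lines)
    return [(int(addr, 16), sym, int(off))
            for addr, sym, off in ENTRY.findall(joined)]


sample = [
    "Jan 28 00:27:54 svdbslx002 kernel: Call ",
    "Trace:<ffffffff8019c54b>{real_lookup+123} <ffffffff80145493>{access_process",
    "_vm+179}",
    "Jan 28 00:27:54 svdbslx002 kernel:        ",
    "<ffffffff801c4d02>{proc_pid_cmdline+146} <ffffffff801c418f>{proc_info_rea",
    "d+111}",
]

for addr, sym, off in extract_trace(sample):
    print(f"{addr:016x} {sym}+{off}")
```

Concatenating without inserting spaces is a deliberate choice here: the wrap points fall in the middle of identifiers, so adding a separator would break the symbol names again.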

On the web, I found several *shrug*s and vague hints at an NMI watchdog issue,
but I have not seen a real solution anywhere. Can YOU tell me more?
-- 
Kind regards

i.A. Martin Klier
PC User Support / Linux Servers
IT Asset Management

A.T.U - Auto-Teile-Unger
Dr.-Kilian-Straße 11
92637 Weiden

Phone:    (0961) 306-5663
Fax:      (0961) 306-5982

Internet: www.atu.de

This e-mail contains confidential and/or legally protected
information. If you are not the intended recipient or have received this
e-mail in error, please notify the sender immediately and destroy this
e-mail. Unauthorized copying and unauthorized forwarding of this
e-mail are not permitted.



end of thread, other threads:[~2017-10-12 13:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2017-10-11 14:40 general protection fault: 0000 [#1] SMP Olivier Bonvalet
2017-10-12  7:12 ` [ceph-users] " Ilya Dryomov
     [not found]   ` <CAOi1vP--q8y696g5W_AUmR9Yxe5Xop3BH3xjEQG6_pmQmXO6kA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-12  7:26     ` Re : " Olivier Bonvalet
2017-10-12 13:58       ` Re : [ceph-users] " Luis Henriques
2017-10-12 10:23   ` Jeff Layton
     [not found]     ` <1507803838.5310.9.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2017-10-12 10:50       ` Ilya Dryomov
     [not found] <550186fd-f426-08a6-8b32-e2818717b06a@molgen.mpg.de>
2017-05-04 10:49 ` Jeff Layton
  -- strict thread matches above, loose matches on Subject: below --
2011-03-14 17:41 Justin P. Mattock
2010-11-20 16:35 Justin Mattock
2010-11-20 22:28 ` Jesper Juhl
2010-11-20 22:32   ` Jesper Juhl
2010-11-20 23:21     ` Justin P. Mattock
2010-11-22 19:01     ` Justin P. Mattock
2010-11-22 20:25       ` Hugh Dickins
2010-11-22 21:44         ` Justin P. Mattock
2010-07-03 22:59 Justin P. Mattock
2006-01-30  8:54 general protection fault: 0000 [1] SMP Martin Klier
2006-02-17 13:25 ` Martin Klier
