* applesmc oops in 3.10/3.11 @ 2013-09-25 19:06 ` Josh Boyer 0 siblings, 0 replies; 86+ messages in thread From: Josh Boyer @ 2013-09-25 19:06 UTC (permalink / raw) To: Henrik Rydberg Cc: khali, Guenter Roeck, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla Hi All, Chris has reported[1] an issue with the applesmc driver in the 3.10 and 3.11 kernels. This seems to be a bit transient, but the oops is consistent across those releases. This is with a MacBook Pro 4,1. The backtrace is below. Any ideas on this one? josh [1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719 BUG: unable to handle kernel paging request at ffff880322aed005 IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] PGD 1fe5067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211 uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211 iTCO_vendor_support microcode input_polldev snd_hda_intel snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc loop CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1 Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS MBP41.88Z.00C1.B03.0802271651 02/27/08 task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000 RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286 RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4 RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000 RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100 R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000 R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84 FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0 Stack: 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000 ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001 ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8 Call Trace: [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc] [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc] [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc] [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc] [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc] [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70 [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0 [<ffffffff813f1f77>] driver_probe_device+0x87/0x390 [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390 [<ffffffff813f22bb>] __device_attach+0x3b/0x40 [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0 [<ffffffff813f1e78>] device_attach+0x88/0xa0 [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0 [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0 [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0 [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120 [<ffffffffa05e3000>] ? 0xffffffffa05e2fff [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [<ffffffff81052483>] ? set_memory_nx+0x43/0x50 [<ffffffff810cc58d>] load_module+0x1bbd/0x2660 [<ffffffff810c8890>] ? store_uevent+0x40/0x40 [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0 [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89 e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80> 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] RSP <ffff8801343ffa18> ^ permalink raw reply [flat|nested] 86+ messages in thread
* [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-25 19:06 ` Josh Boyer 0 siblings, 0 replies; 86+ messages in thread From: Josh Boyer @ 2013-09-25 19:06 UTC (permalink / raw) To: Henrik Rydberg Cc: khali, Guenter Roeck, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla Hi All, Chris has reported[1] an issue with the applesmc driver in the 3.10 and 3.11 kernels. This seems to be a bit transient, but the oops is consistent across those releases. This is with a MacBook Pro 4,1. The backtrace is below. Any ideas on this one? josh [1] https://bugzilla.redhat.com/show_bug.cgi?id\x1011719 BUG: unable to handle kernel paging request at ffff880322aed005 IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] PGD 1fe5067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211 uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211 iTCO_vendor_support microcode input_polldev snd_hda_intel snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc loop CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1 Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS MBP41.88Z.00C1.B03.0802271651 02/27/08 task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000 RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286 RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4 RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000 RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100 R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000 R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84 FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0 Stack: 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000 ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001 ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8 Call Trace: [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc] [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc] [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc] [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc] [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc] [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70 [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0 [<ffffffff813f1f77>] driver_probe_device+0x87/0x390 [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390 [<ffffffff813f22bb>] __device_attach+0x3b/0x40 [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0 [<ffffffff813f1e78>] device_attach+0x88/0xa0 [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0 [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0 [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0 [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120 [<ffffffffa05e3000>] ? 0xffffffffa05e2fff [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [<ffffffff81052483>] ? set_memory_nx+0x43/0x50 [<ffffffff810cc58d>] load_module+0x1bbd/0x2660 [<ffffffff810c8890>] ? store_uevent+0x40/0x40 [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0 [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89 e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80> 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] RSP <ffff8801343ffa18> _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-25 19:06 ` [lm-sensors] " Josh Boyer @ 2013-09-25 19:56 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-25 19:56 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > Hi All, > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > and 3.11 kernels. This seems to be a bit transient, but the oops is > consistent across those releases. This is with a MacBook Pro 4,1. > The backtrace is below. > > Any ideas on this one? > Some of the bug reports suggest that there are related messages in at least some of the kernel logs. Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail This suggests that initialization may be attempted more than once. The key cache is allocated only once, but the number of keys is read for each attempt. No idea if that can happen, but if the number of keys can increase after the first initialization attempt you would have an explanation for the crash. Guenter > josh > > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719 > > BUG: unable to handle kernel paging request at ffff880322aed005 > IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > PGD 1fe5067 PUD 0 > Oops: 0000 [#1] SMP > Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211 > uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211 > iTCO_vendor_support microcode input_polldev snd_hda_intel > snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops > videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir > snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill > snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq > mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c > xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci > ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi > i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc > loop > CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1 > Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS > MBP41.88Z.00C1.B03.0802271651 02/27/08 > task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000 > RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>] > applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286 > RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4 > RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000 > RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100 > R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000 > R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84 > FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0 > Stack: > 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000 > ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001 > ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8 > Call Trace: > [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc] > [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc] > [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc] > [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc] > [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc] > [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70 > [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0 > [<ffffffff813f1f77>] driver_probe_device+0x87/0x390 > [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390 > [<ffffffff813f22bb>] __device_attach+0x3b/0x40 > [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0 > [<ffffffff813f1e78>] device_attach+0x88/0xa0 > [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0 > [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0 > [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0 > [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120 > [<ffffffffa05e3000>] ? 0xffffffffa05e2fff > [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc] > [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 > [<ffffffff81052483>] ? set_memory_nx+0x43/0x50 > [<ffffffff810cc58d>] load_module+0x1bbd/0x2660 > [<ffffffff810c8890>] ? store_uevent+0x40/0x40 > [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0 > [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b > Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89 > e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80> > 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f > RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > RSP <ffff8801343ffa18> > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-25 19:56 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-25 19:56 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > Hi All, > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > and 3.11 kernels. This seems to be a bit transient, but the oops is > consistent across those releases. This is with a MacBook Pro 4,1. > The backtrace is below. > > Any ideas on this one? > Some of the bug reports suggest that there are related messages in at least some of the kernel logs. Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail This suggests that initialization may be attempted more than once. The key cache is allocated only once, but the number of keys is read for each attempt. No idea if that can happen, but if the number of keys can increase after the first initialization attempt you would have an explanation for the crash. Guenter > josh > > [1] https://bugzilla.redhat.com/show_bug.cgi?id\x1011719 > > BUG: unable to handle kernel paging request at ffff880322aed005 > IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > PGD 1fe5067 PUD 0 > Oops: 0000 [#1] SMP > Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211 > uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211 > iTCO_vendor_support microcode input_polldev snd_hda_intel > snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops > videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir > snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill > snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq > mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c > xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci > ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi > i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc > loop > CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1 > Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS > MBP41.88Z.00C1.B03.0802271651 02/27/08 > task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000 > RIP: 0010:[<ffffffffa05dd345>] [<ffffffffa05dd345>] > applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > RSP: 0018:ffff8801343ffa18 EFLAGS: 00010286 > RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4 > RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000 > RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100 > R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000 > R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84 > FS: 00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0 > Stack: > 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000 > ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001 > ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8 > Call Trace: > [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc] > [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc] > [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc] > [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc] > [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc] > [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70 > [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0 > [<ffffffff813f1f77>] driver_probe_device+0x87/0x390 > [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390 > [<ffffffff813f22bb>] __device_attach+0x3b/0x40 > [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0 > [<ffffffff813f1e78>] device_attach+0x88/0xa0 > [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0 > [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0 > [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0 > [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120 > [<ffffffffa05e3000>] ? 0xffffffffa05e2fff > [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc] > [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 > [<ffffffff81052483>] ? set_memory_nx+0x43/0x50 > [<ffffffff810cc58d>] load_module+0x1bbd/0x2660 > [<ffffffff810c8890>] ? store_uevent+0x40/0x40 > [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0 > [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b > Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89 > e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80> > 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f > RIP [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc] > RSP <ffff8801343ffa18> > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-25 19:56 ` [lm-sensors] " Guenter Roeck @ 2013-09-25 21:48 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-25 21:48 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote: > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > > Hi All, > > > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > > and 3.11 kernels. This seems to be a bit transient, but the oops is > > consistent across those releases. This is with a MacBook Pro 4,1. > > The backtrace is below. > > > > Any ideas on this one? > > > Some of the bug reports suggest that there are related messages in at least some > of the kernel logs. > > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail > > This suggests that initialization may be attempted more than once. The key cache > is allocated only once, but the number of keys is read for each attempt. > > No idea if that can happen, but if the number of keys can increase after > the first initialization attempt you would have an explanation for the crash. Good idea, and easy enough to test with the patch below. diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 62c2e32..d962c4d 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void) { struct applesmc_registers *s = &smcreg; bool left_light_sensor, right_light_sensor; + unsigned int count; u8 tmp[1]; int ret; if (s->init_complete) return 0; - ret = read_register_count(&s->key_count); + ret = read_register_count(&count); if (ret) return ret; + if (s->cache && s->key_count != count) { + pr_warn("key count changed from %d to %d\n", s->key_count, count); + kfree(s->cache); + s->cache = NULL; + } + s->key_count = count; + if (!s->cache) s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); if (!s->cache) Thanks, Henrik ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-25 21:48 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-25 21:48 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote: > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > > Hi All, > > > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > > and 3.11 kernels. This seems to be a bit transient, but the oops is > > consistent across those releases. This is with a MacBook Pro 4,1. > > The backtrace is below. > > > > Any ideas on this one? > > > Some of the bug reports suggest that there are related messages in at least some > of the kernel logs. > > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail > > This suggests that initialization may be attempted more than once. The key cache > is allocated only once, but the number of keys is read for each attempt. > > No idea if that can happen, but if the number of keys can increase after > the first initialization attempt you would have an explanation for the crash. Good idea, and easy enough to test with the patch below. diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 62c2e32..d962c4d 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void) { struct applesmc_registers *s = &smcreg; bool left_light_sensor, right_light_sensor; + unsigned int count; u8 tmp[1]; int ret; if (s->init_complete) return 0; - ret = read_register_count(&s->key_count); + ret = read_register_count(&count); if (ret) return ret; + if (s->cache && s->key_count != count) { + pr_warn("key count changed from %d to %d\n", s->key_count, count); + kfree(s->cache); + s->cache = NULL; + } + s->key_count = count; + if (!s->cache) s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); if (!s->cache) Thanks, Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-25 21:48 ` [lm-sensors] " Henrik Rydberg @ 2013-09-25 22:08 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-25 22:08 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 11:48:07PM +0200, Henrik Rydberg wrote: > On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote: > > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > > > Hi All, > > > > > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > > > and 3.11 kernels. This seems to be a bit transient, but the oops is > > > consistent across those releases. This is with a MacBook Pro 4,1. > > > The backtrace is below. > > > > > > Any ideas on this one? > > > > > Some of the bug reports suggest that there are related messages in at least some > > of the kernel logs. > > > > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 > > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail > > > > This suggests that initialization may be attempted more than once. The key cache > > is allocated only once, but the number of keys is read for each attempt. > > > > No idea if that can happen, but if the number of keys can increase after > > the first initialization attempt you would have an explanation for the crash. > > Good idea, and easy enough to test with the patch below. > Should we apply this patch even though it may not solve the specific problem ? Again, not sure if the key count can change, but the current code is at the very least inconsistent, as it keeps reading the key count without updating or verifying the cache size. Thanks, Guenter > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 62c2e32..d962c4d 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void) > { > struct applesmc_registers *s = &smcreg; > bool left_light_sensor, right_light_sensor; > + unsigned int count; > u8 tmp[1]; > int ret; > > if (s->init_complete) > return 0; > > - ret = read_register_count(&s->key_count); > + ret = read_register_count(&count); > if (ret) > return ret; > > + if (s->cache && s->key_count != count) { > + pr_warn("key count changed from %d to %d\n", s->key_count, count); > + kfree(s->cache); > + s->cache = NULL; > + } > + s->key_count = count; > + > if (!s->cache) > s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); > if (!s->cache) > > Thanks, > Henrik > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-25 22:08 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-25 22:08 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Wed, Sep 25, 2013 at 11:48:07PM +0200, Henrik Rydberg wrote: > On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote: > > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote: > > > Hi All, > > > > > > Chris has reported[1] an issue with the applesmc driver in the 3.10 > > > and 3.11 kernels. This seems to be a bit transient, but the oops is > > > consistent across those releases. This is with a MacBook Pro 4,1. > > > The backtrace is below. > > > > > > Any ideas on this one? > > > > > Some of the bug reports suggest that there are related messages in at least some > > of the kernel logs. > > > > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01 > > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail > > > > This suggests that initialization may be attempted more than once. The key cache > > is allocated only once, but the number of keys is read for each attempt. > > > > No idea if that can happen, but if the number of keys can increase after > > the first initialization attempt you would have an explanation for the crash. > > Good idea, and easy enough to test with the patch below. > Should we apply this patch even though it may not solve the specific problem ? Again, not sure if the key count can change, but the current code is at the very least inconsistent, as it keeps reading the key count without updating or verifying the cache size. Thanks, Guenter > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 62c2e32..d962c4d 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void) > { > struct applesmc_registers *s = &smcreg; > bool left_light_sensor, right_light_sensor; > + unsigned int count; > u8 tmp[1]; > int ret; > > if (s->init_complete) > return 0; > > - ret = read_register_count(&s->key_count); > + ret = read_register_count(&count); > if (ret) > return ret; > > + if (s->cache && s->key_count != count) { > + pr_warn("key count changed from %d to %d\n", s->key_count, count); > + kfree(s->cache); > + s->cache = NULL; > + } > + s->key_count = count; > + > if (!s->cache) > s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); > if (!s->cache) > > Thanks, > Henrik > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-25 22:08 ` [lm-sensors] " Guenter Roeck @ 2013-09-26 6:34 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 6:34 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > > > This suggests that initialization may be attempted more than once. The key cache > > > is allocated only once, but the number of keys is read for each attempt. > > > > > > No idea if that can happen, but if the number of keys can increase after > > > the first initialization attempt you would have an explanation for the crash. > > > > Good idea, and easy enough to test with the patch below. > > > Should we apply this patch even though it may not solve the specific problem ? Yes, why not - it certainly won't hurt. I am running it right now, so it is at least run-tested. > Again, not sure if the key count can change, but the current code is at the very > least inconsistent, as it keeps reading the key count without updating or > verifying the cache size. Yes - I agree that the error state is far-fetched, but it is hard to see any other logical explanation. There is of course always the possibility that the problem is somewhere else completely. Proper patch attached. Thanks, Henrik --- >From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Thu, 26 Sep 2013 08:33:16 +0200 Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding After reports from Chris and Josh Boyer of a rare crash in applesmc, Guenter pointed at the initialization problem fixed below. The patch has not been verified to fix the crash, but should be applied regardless. Reported-by: <jwboyer@fedoraproject.org> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> --- drivers/hwmon/applesmc.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 62c2e32..98814d1 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void) { struct applesmc_registers *s = &smcreg; bool left_light_sensor, right_light_sensor; + unsigned int count; u8 tmp[1]; int ret; if (s->init_complete) return 0; - ret = read_register_count(&s->key_count); + ret = read_register_count(&count); if (ret) return ret; + if (s->cache && s->key_count != count) { + pr_warn("key count changed from %d to %d\n", + s->key_count, count); + kfree(s->cache); + s->cache = NULL; + } + s->key_count = count; + if (!s->cache) s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); if (!s->cache) -- 1.8.3.4 ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-26 6:34 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 6:34 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > > > This suggests that initialization may be attempted more than once. The key cache > > > is allocated only once, but the number of keys is read for each attempt. > > > > > > No idea if that can happen, but if the number of keys can increase after > > > the first initialization attempt you would have an explanation for the crash. > > > > Good idea, and easy enough to test with the patch below. > > > Should we apply this patch even though it may not solve the specific problem ? Yes, why not - it certainly won't hurt. I am running it right now, so it is at least run-tested. > Again, not sure if the key count can change, but the current code is at the very > least inconsistent, as it keeps reading the key count without updating or > verifying the cache size. Yes - I agree that the error state is far-fetched, but it is hard to see any other logical explanation. There is of course always the possibility that the problem is somewhere else completely. Proper patch attached. Thanks, Henrik --- From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Thu, 26 Sep 2013 08:33:16 +0200 Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding After reports from Chris and Josh Boyer of a rare crash in applesmc, Guenter pointed at the initialization problem fixed below. The patch has not been verified to fix the crash, but should be applied regardless. Reported-by: <jwboyer@fedoraproject.org> Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> --- drivers/hwmon/applesmc.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 62c2e32..98814d1 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void) { struct applesmc_registers *s = &smcreg; bool left_light_sensor, right_light_sensor; + unsigned int count; u8 tmp[1]; int ret; if (s->init_complete) return 0; - ret = read_register_count(&s->key_count); + ret = read_register_count(&count); if (ret) return ret; + if (s->cache && s->key_count != count) { + pr_warn("key count changed from %d to %d\n", + s->key_count, count); + kfree(s->cache); + s->cache = NULL; + } + s->key_count = count; + if (!s->cache) s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); if (!s->cache) -- 1.8.3.4 _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-26 6:34 ` [lm-sensors] " Henrik Rydberg @ 2013-09-26 10:36 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-26 10:36 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/25/2013 11:34 PM, Henrik Rydberg wrote: >>>> This suggests that initialization may be attempted more than once. The key cache >>>> is allocated only once, but the number of keys is read for each attempt. >>>> >>>> No idea if that can happen, but if the number of keys can increase after >>>> the first initialization attempt you would have an explanation for the crash. >>> >>> Good idea, and easy enough to test with the patch below. >>> >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > There are also ACPI conflicts in each of the bug reports I looked at. Can this play a role, or is that "normal" on apple systems ? Thanks, Guenter > Proper patch attached. > > Thanks, > Henrik > > --- > >>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 62c2e32..98814d1 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void) > { > struct applesmc_registers *s = &smcreg; > bool left_light_sensor, right_light_sensor; > + unsigned int count; > u8 tmp[1]; > int ret; > > if (s->init_complete) > return 0; > > - ret = read_register_count(&s->key_count); > + ret = read_register_count(&count); > if (ret) > return ret; > > + if (s->cache && s->key_count != count) { > + pr_warn("key count changed from %d to %d\n", > + s->key_count, count); > + kfree(s->cache); > + s->cache = NULL; > + } > + s->key_count = count; > + > if (!s->cache) > s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); > if (!s->cache) > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-26 10:36 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-26 10:36 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/25/2013 11:34 PM, Henrik Rydberg wrote: >>>> This suggests that initialization may be attempted more than once. The key cache >>>> is allocated only once, but the number of keys is read for each attempt. >>>> >>>> No idea if that can happen, but if the number of keys can increase after >>>> the first initialization attempt you would have an explanation for the crash. >>> >>> Good idea, and easy enough to test with the patch below. >>> >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > There are also ACPI conflicts in each of the bug reports I looked at. Can this play a role, or is that "normal" on apple systems ? Thanks, Guenter > Proper patch attached. > > Thanks, > Henrik > > --- > >>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 62c2e32..98814d1 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void) > { > struct applesmc_registers *s = &smcreg; > bool left_light_sensor, right_light_sensor; > + unsigned int count; > u8 tmp[1]; > int ret; > > if (s->init_complete) > return 0; > > - ret = read_register_count(&s->key_count); > + ret = read_register_count(&count); > if (ret) > return ret; > > + if (s->cache && s->key_count != count) { > + pr_warn("key count changed from %d to %d\n", > + s->key_count, count); > + kfree(s->cache); > + s->cache = NULL; > + } > + s->key_count = count; > + > if (!s->cache) > s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL); > if (!s->cache) > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-26 10:36 ` [lm-sensors] " Guenter Roeck @ 2013-09-26 11:13 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 11:13 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > >Yes - I agree that the error state is far-fetched, but it is hard to > >see any other logical explanation. There is of course always the > >possibility that the problem is somewhere else completely. > > > There are also ACPI conflicts in each of the bug reports I looked at. > Can this play a role, or is that "normal" on apple systems ? I honstestly don't know. On my own machine (MBA3,1), I have problems with the PCIe NIC somehow overwriting the graphics buffer, resulting in a corrupted screen at boot. After a suspend/resume, that problem is gone, but there is surely something fishy going on... Thanks, Henrik ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-26 11:13 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 11:13 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > >Yes - I agree that the error state is far-fetched, but it is hard to > >see any other logical explanation. There is of course always the > >possibility that the problem is somewhere else completely. > > > There are also ACPI conflicts in each of the bug reports I looked at. > Can this play a role, or is that "normal" on apple systems ? I honstestly don't know. On my own machine (MBA3,1), I have problems with the PCIe NIC somehow overwriting the graphics buffer, resulting in a corrupted screen at boot. After a suspend/resume, that problem is gone, but there is surely something fishy going on... Thanks, Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-26 6:34 ` [lm-sensors] " Henrik Rydberg @ 2013-09-26 10:53 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-26 10:53 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/25/2013 11:34 PM, Henrik Rydberg wrote: >>>> This suggests that initialization may be attempted more than once. The key cache >>>> is allocated only once, but the number of keys is read for each attempt. >>>> >>>> No idea if that can happen, but if the number of keys can increase after >>>> the first initialization attempt you would have an explanation for the crash. >>> >>> Good idea, and easy enough to test with the patch below. >>> >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > > Proper patch attached. > > Thanks, > Henrik > > --- > >>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) Applied. I'll run my sanity tests before I send it upstream. I'll also Cc: stable. Interesting is that this started to happen with 3.10, even though I did not find any relevant changes in the driver. Is it possible that changed boot timing (ie reduced boot time) exposes this problem ? Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-26 10:53 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-26 10:53 UTC (permalink / raw) To: Henrik Rydberg Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/25/2013 11:34 PM, Henrik Rydberg wrote: >>>> This suggests that initialization may be attempted more than once. The key cache >>>> is allocated only once, but the number of keys is read for each attempt. >>>> >>>> No idea if that can happen, but if the number of keys can increase after >>>> the first initialization attempt you would have an explanation for the crash. >>> >>> Good idea, and easy enough to test with the patch below. >>> >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > > Proper patch attached. > > Thanks, > Henrik > > --- > >>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) Applied. I'll run my sanity tests before I send it upstream. I'll also Cc: stable. Interesting is that this started to happen with 3.10, even though I did not find any relevant changes in the driver. Is it possible that changed boot timing (ie reduced boot time) exposes this problem ? Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-26 10:53 ` [lm-sensors] " Guenter Roeck @ 2013-09-26 11:11 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 11:11 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > Applied. I'll run my sanity tests before I send it upstream. > I'll also Cc: stable. Great. > Interesting is that this started to happen with 3.10, even though > I did not find any relevant changes in the driver. Is it possible that > changed boot timing (ie reduced boot time) exposes this problem ? I was thinking the same thing - if the SMC takes some time to boot up, it could very well be the culprit. Imagine that, the kernel is too fast. ;-) Thanks, Henrik ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-26 11:11 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-09-26 11:11 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla > Applied. I'll run my sanity tests before I send it upstream. > I'll also Cc: stable. Great. > Interesting is that this started to happen with 3.10, even though > I did not find any relevant changes in the driver. Is it possible that > changed boot timing (ie reduced boot time) exposes this problem ? I was thinking the same thing - if the SMC takes some time to boot up, it could very well be the culprit. Imagine that, the kernel is too fast. ;-) Thanks, Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-26 6:34 ` [lm-sensors] " Henrik Rydberg @ 2013-09-27 16:21 ` Josh Boyer -1 siblings, 0 replies; 86+ messages in thread From: Josh Boyer @ 2013-09-27 16:21 UTC (permalink / raw) To: Henrik Rydberg Cc: Guenter Roeck, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >> > > This suggests that initialization may be attempted more than once. The key cache >> > > is allocated only once, but the number of keys is read for each attempt. >> > > >> > > No idea if that can happen, but if the number of keys can increase after >> > > the first initialization attempt you would have an explanation for the crash. >> > >> > Good idea, and easy enough to test with the patch below. >> > >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > > Proper patch attached. > > Thanks, > Henrik > > --- > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) Thanks for the quick reply. I'll get this rolled into our kernels soon. josh ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 16:21 ` Josh Boyer 0 siblings, 0 replies; 86+ messages in thread From: Josh Boyer @ 2013-09-27 16:21 UTC (permalink / raw) To: Henrik Rydberg Cc: Guenter Roeck, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >> > > This suggests that initialization may be attempted more than once. The key cache >> > > is allocated only once, but the number of keys is read for each attempt. >> > > >> > > No idea if that can happen, but if the number of keys can increase after >> > > the first initialization attempt you would have an explanation for the crash. >> > >> > Good idea, and easy enough to test with the patch below. >> > >> Should we apply this patch even though it may not solve the specific problem ? > > Yes, why not - it certainly won't hurt. I am running it right now, so > it is at least run-tested. > >> Again, not sure if the key count can change, but the current code is at the very >> least inconsistent, as it keeps reading the key count without updating or >> verifying the cache size. > > Yes - I agree that the error state is far-fetched, but it is hard to > see any other logical explanation. There is of course always the > possibility that the problem is somewhere else completely. > > Proper patch attached. > > Thanks, > Henrik > > --- > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Thu, 26 Sep 2013 08:33:16 +0200 > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > Guenter pointed at the initialization problem fixed below. The patch > has not been verified to fix the crash, but should be applied > regardless. > > Reported-by: <jwboyer@fedoraproject.org> > Suggested-by: Guenter Roeck <linux@roeck-us.net> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > --- > drivers/hwmon/applesmc.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) Thanks for the quick reply. I'll get this rolled into our kernels soon. josh _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 16:21 ` [lm-sensors] " Josh Boyer @ 2013-09-27 17:12 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 17:12 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >> > > This suggests that initialization may be attempted more than once. The key cache > >> > > is allocated only once, but the number of keys is read for each attempt. > >> > > > >> > > No idea if that can happen, but if the number of keys can increase after > >> > > the first initialization attempt you would have an explanation for the crash. > >> > > >> > Good idea, and easy enough to test with the patch below. > >> > > >> Should we apply this patch even though it may not solve the specific problem ? > > > > Yes, why not - it certainly won't hurt. I am running it right now, so > > it is at least run-tested. > > > >> Again, not sure if the key count can change, but the current code is at the very > >> least inconsistent, as it keeps reading the key count without updating or > >> verifying the cache size. > > > > Yes - I agree that the error state is far-fetched, but it is hard to > > see any other logical explanation. There is of course always the > > possibility that the problem is somewhere else completely. > > > > Proper patch attached. > > > > Thanks, > > Henrik > > > > --- > > > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > > From: Henrik Rydberg <rydberg@euromail.se> > > Date: Thu, 26 Sep 2013 08:33:16 +0200 > > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > > Guenter pointed at the initialization problem fixed below. The patch > > has not been verified to fix the crash, but should be applied > > regardless. > > > > Reported-by: <jwboyer@fedoraproject.org> > > Suggested-by: Guenter Roeck <linux@roeck-us.net> > > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > > --- > > drivers/hwmon/applesmc.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > Thanks for the quick reply. I'll get this rolled into our kernels soon. > I sent a pull request to Linus, so you should be able to pull it from the upstream kernel shortly. Would be great to get feedback if the patch solves the problem (or doesn't). Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 17:12 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 17:12 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >> > > This suggests that initialization may be attempted more than once. The key cache > >> > > is allocated only once, but the number of keys is read for each attempt. > >> > > > >> > > No idea if that can happen, but if the number of keys can increase after > >> > > the first initialization attempt you would have an explanation for the crash. > >> > > >> > Good idea, and easy enough to test with the patch below. > >> > > >> Should we apply this patch even though it may not solve the specific problem ? > > > > Yes, why not - it certainly won't hurt. I am running it right now, so > > it is at least run-tested. > > > >> Again, not sure if the key count can change, but the current code is at the very > >> least inconsistent, as it keeps reading the key count without updating or > >> verifying the cache size. > > > > Yes - I agree that the error state is far-fetched, but it is hard to > > see any other logical explanation. There is of course always the > > possibility that the problem is somewhere else completely. > > > > Proper patch attached. > > > > Thanks, > > Henrik > > > > --- > > > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > > From: Henrik Rydberg <rydberg@euromail.se> > > Date: Thu, 26 Sep 2013 08:33:16 +0200 > > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > > Guenter pointed at the initialization problem fixed below. The patch > > has not been verified to fix the crash, but should be applied > > regardless. > > > > Reported-by: <jwboyer@fedoraproject.org> > > Suggested-by: Guenter Roeck <linux@roeck-us.net> > > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > > --- > > drivers/hwmon/applesmc.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > Thanks for the quick reply. I'll get this rolled into our kernels soon. > I sent a pull request to Linus, so you should be able to pull it from the upstream kernel shortly. Would be great to get feedback if the patch solves the problem (or doesn't). Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 17:12 ` [lm-sensors] " Guenter Roeck @ 2013-09-27 17:41 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-09-27 17:41 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>> >>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>> the first initialization attempt you would have an explanation for the crash. >>>>> >>>>> Good idea, and easy enough to test with the patch below. >>>>> >>>> Should we apply this patch even though it may not solve the specific problem ? >>> >>> Yes, why not - it certainly won't hurt. I am running it right now, so >>> it is at least run-tested. >>> >>>> Again, not sure if the key count can change, but the current code is at the very >>>> least inconsistent, as it keeps reading the key count without updating or >>>> verifying the cache size. >>> >>> Yes - I agree that the error state is far-fetched, but it is hard to >>> see any other logical explanation. There is of course always the >>> possibility that the problem is somewhere else completely. >>> >>> Proper patch attached. >>> >>> Thanks, >>> Henrik >>> >>> --- >>> >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>> From: Henrik Rydberg <rydberg@euromail.se> >>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>> >>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>> Guenter pointed at the initialization problem fixed below. The patch >>> has not been verified to fix the crash, but should be applied >>> regardless. >>> >>> Reported-by: <jwboyer@fedoraproject.org> >>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>> --- >>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> Thanks for the quick reply. I'll get this rolled into our kernels soon. >> > I sent a pull request to Linus, so you should be able to pull it from > the upstream kernel shortly. Would be great to get feedback if the patch > solves the problem (or doesn't). I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 17:41 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-09-27 17:41 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>> >>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>> the first initialization attempt you would have an explanation for the crash. >>>>> >>>>> Good idea, and easy enough to test with the patch below. >>>>> >>>> Should we apply this patch even though it may not solve the specific problem ? >>> >>> Yes, why not - it certainly won't hurt. I am running it right now, so >>> it is at least run-tested. >>> >>>> Again, not sure if the key count can change, but the current code is at the very >>>> least inconsistent, as it keeps reading the key count without updating or >>>> verifying the cache size. >>> >>> Yes - I agree that the error state is far-fetched, but it is hard to >>> see any other logical explanation. There is of course always the >>> possibility that the problem is somewhere else completely. >>> >>> Proper patch attached. >>> >>> Thanks, >>> Henrik >>> >>> --- >>> >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>> From: Henrik Rydberg <rydberg@euromail.se> >>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>> >>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>> Guenter pointed at the initialization problem fixed below. The patch >>> has not been verified to fix the crash, but should be applied >>> regardless. >>> >>> Reported-by: <jwboyer@fedoraproject.org> >>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>> --- >>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> Thanks for the quick reply. I'll get this rolled into our kernels soon. >> > I sent a pull request to Linus, so you should be able to pull it from > the upstream kernel shortly. Would be great to get feedback if the patch > solves the problem (or doesn't). I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 17:41 ` [lm-sensors] " Chris Murphy @ 2013-09-27 17:59 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 17:59 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: > > On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >>>>>> This suggests that initialization may be attempted more than once. The key cache > >>>>>> is allocated only once, but the number of keys is read for each attempt. > >>>>>> > >>>>>> No idea if that can happen, but if the number of keys can increase after > >>>>>> the first initialization attempt you would have an explanation for the crash. > >>>>> > >>>>> Good idea, and easy enough to test with the patch below. > >>>>> > >>>> Should we apply this patch even though it may not solve the specific problem ? > >>> > >>> Yes, why not - it certainly won't hurt. I am running it right now, so > >>> it is at least run-tested. > >>> > >>>> Again, not sure if the key count can change, but the current code is at the very > >>>> least inconsistent, as it keeps reading the key count without updating or > >>>> verifying the cache size. > >>> > >>> Yes - I agree that the error state is far-fetched, but it is hard to > >>> see any other logical explanation. There is of course always the > >>> possibility that the problem is somewhere else completely. > >>> > >>> Proper patch attached. > >>> > >>> Thanks, > >>> Henrik > >>> > >>> --- > >>> > >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > >>> From: Henrik Rydberg <rydberg@euromail.se> > >>> Date: Thu, 26 Sep 2013 08:33:16 +0200 > >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > >>> > >>> After reports from Chris and Josh Boyer of a rare crash in applesmc, > >>> Guenter pointed at the initialization problem fixed below. The patch > >>> has not been verified to fix the crash, but should be applied > >>> regardless. > >>> > >>> Reported-by: <jwboyer@fedoraproject.org> > >>> Suggested-by: Guenter Roeck <linux@roeck-us.net> > >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > >>> --- > >>> drivers/hwmon/applesmc.c | 11 ++++++++++- > >>> 1 file changed, 10 insertions(+), 1 deletion(-) > >> > >> Thanks for the quick reply. I'll get this rolled into our kernels soon. > >> > > I sent a pull request to Linus, so you should be able to pull it from > > the upstream kernel shortly. Would be great to get feedback if the patch > > solves the problem (or doesn't). > > I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. > When you do, can you possibly trigger an event based on the warning added with the patch ? This might help us to identify if the problem fixed with the patch actually happens. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 17:59 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 17:59 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: > > On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >>>>>> This suggests that initialization may be attempted more than once. The key cache > >>>>>> is allocated only once, but the number of keys is read for each attempt. > >>>>>> > >>>>>> No idea if that can happen, but if the number of keys can increase after > >>>>>> the first initialization attempt you would have an explanation for the crash. > >>>>> > >>>>> Good idea, and easy enough to test with the patch below. > >>>>> > >>>> Should we apply this patch even though it may not solve the specific problem ? > >>> > >>> Yes, why not - it certainly won't hurt. I am running it right now, so > >>> it is at least run-tested. > >>> > >>>> Again, not sure if the key count can change, but the current code is at the very > >>>> least inconsistent, as it keeps reading the key count without updating or > >>>> verifying the cache size. > >>> > >>> Yes - I agree that the error state is far-fetched, but it is hard to > >>> see any other logical explanation. There is of course always the > >>> possibility that the problem is somewhere else completely. > >>> > >>> Proper patch attached. > >>> > >>> Thanks, > >>> Henrik > >>> > >>> --- > >>> > >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > >>> From: Henrik Rydberg <rydberg@euromail.se> > >>> Date: Thu, 26 Sep 2013 08:33:16 +0200 > >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > >>> > >>> After reports from Chris and Josh Boyer of a rare crash in applesmc, > >>> Guenter pointed at the initialization problem fixed below. The patch > >>> has not been verified to fix the crash, but should be applied > >>> regardless. > >>> > >>> Reported-by: <jwboyer@fedoraproject.org> > >>> Suggested-by: Guenter Roeck <linux@roeck-us.net> > >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > >>> --- > >>> drivers/hwmon/applesmc.c | 11 ++++++++++- > >>> 1 file changed, 10 insertions(+), 1 deletion(-) > >> > >> Thanks for the quick reply. I'll get this rolled into our kernels soon. > >> > > I sent a pull request to Linus, so you should be able to pull it from > > the upstream kernel shortly. Would be great to get feedback if the patch > > solves the problem (or doesn't). > > I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. > When you do, can you possibly trigger an event based on the warning added with the patch ? This might help us to identify if the problem fixed with the patch actually happens. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 17:59 ` [lm-sensors] " Guenter Roeck @ 2013-09-27 18:03 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-09-27 18:03 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >> >> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>> >>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>> >>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>> >>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>> >>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>> it is at least run-tested. >>>>> >>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>> verifying the cache size. >>>>> >>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>> see any other logical explanation. There is of course always the >>>>> possibility that the problem is somewhere else completely. >>>>> >>>>> Proper patch attached. >>>>> >>>>> Thanks, >>>>> Henrik >>>>> >>>>> --- >>>>> >>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>> >>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>> has not been verified to fix the crash, but should be applied >>>>> regardless. >>>>> >>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>> --- >>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>> >>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>> >>> I sent a pull request to Linus, so you should be able to pull it from >>> the upstream kernel shortly. Would be great to get feedback if the patch >>> solves the problem (or doesn't). >> >> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >> > When you do, can you possibly trigger an event based on the warning added > with the patch ? This might help us to identify if the problem fixed > with the patch actually happens. I don't understand the question. I'm uncertain how to trigger, and also what event. Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 18:03 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-09-27 18:03 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >> >> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>> >>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>> >>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>> >>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>> >>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>> it is at least run-tested. >>>>> >>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>> verifying the cache size. >>>>> >>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>> see any other logical explanation. There is of course always the >>>>> possibility that the problem is somewhere else completely. >>>>> >>>>> Proper patch attached. >>>>> >>>>> Thanks, >>>>> Henrik >>>>> >>>>> --- >>>>> >>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>> >>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>> has not been verified to fix the crash, but should be applied >>>>> regardless. >>>>> >>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>> --- >>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>> >>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>> >>> I sent a pull request to Linus, so you should be able to pull it from >>> the upstream kernel shortly. Would be great to get feedback if the patch >>> solves the problem (or doesn't). >> >> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >> > When you do, can you possibly trigger an event based on the warning added > with the patch ? This might help us to identify if the problem fixed > with the patch actually happens. I don't understand the question. I'm uncertain how to trigger, and also what event. Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 18:03 ` [lm-sensors] " Chris Murphy @ 2013-09-27 23:33 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 23:33 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/27/2013 11:03 AM, Chris Murphy wrote: > > On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>> >>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>> >>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>> >>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>> >>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>> >>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>> it is at least run-tested. >>>>>> >>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>> verifying the cache size. >>>>>> >>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>> see any other logical explanation. There is of course always the >>>>>> possibility that the problem is somewhere else completely. >>>>>> >>>>>> Proper patch attached. >>>>>> >>>>>> Thanks, >>>>>> Henrik >>>>>> >>>>>> --- >>>>>> >>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>> >>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>> has not been verified to fix the crash, but should be applied >>>>>> regardless. >>>>>> >>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>> --- >>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>> >>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>> >>>> I sent a pull request to Linus, so you should be able to pull it from >>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>> solves the problem (or doesn't). >>> >>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>> >> When you do, can you possibly trigger an event based on the warning added >> with the patch ? This might help us to identify if the problem fixed >> with the patch actually happens. > > I don't understand the question. I'm uncertain how to trigger, and also what event. > The patch includes a new warning message. pr_warn("key count changed from %d to %d\n", s->key_count, count); It would be great if there would be a means to detect if this message is seen in a kernel log, because it would show that the potential crash condition fixed with the patch was actually encountered. This would help us to determine if we actually fixed the problem or not. Of course, we'll know if is wasn't fixed if the system still crashes. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 23:33 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 23:33 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On 09/27/2013 11:03 AM, Chris Murphy wrote: > > On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>> >>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>> >>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>> >>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>> >>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>> >>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>> it is at least run-tested. >>>>>> >>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>> verifying the cache size. >>>>>> >>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>> see any other logical explanation. There is of course always the >>>>>> possibility that the problem is somewhere else completely. >>>>>> >>>>>> Proper patch attached. >>>>>> >>>>>> Thanks, >>>>>> Henrik >>>>>> >>>>>> --- >>>>>> >>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>> >>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>> has not been verified to fix the crash, but should be applied >>>>>> regardless. >>>>>> >>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>> --- >>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>> >>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>> >>>> I sent a pull request to Linus, so you should be able to pull it from >>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>> solves the problem (or doesn't). >>> >>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>> >> When you do, can you possibly trigger an event based on the warning added >> with the patch ? This might help us to identify if the problem fixed >> with the patch actually happens. > > I don't understand the question. I'm uncertain how to trigger, and also what event. > The patch includes a new warning message. pr_warn("key count changed from %d to %d\n", s->key_count, count); It would be great if there would be a means to detect if this message is seen in a kernel log, because it would show that the potential crash condition fixed with the patch was actually encountered. This would help us to determine if we actually fixed the problem or not. Of course, we'll know if is wasn't fixed if the system still crashes. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 23:33 ` [lm-sensors] " Guenter Roeck @ 2013-10-01 1:57 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-01 1:57 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On 09/27/2013 11:03 AM, Chris Murphy wrote: >> >> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>>> >>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>>> >>>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>>> >>>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>>> >>>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>>> >>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>>> it is at least run-tested. >>>>>>> >>>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>>> verifying the cache size. >>>>>>> >>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>>> see any other logical explanation. There is of course always the >>>>>>> possibility that the problem is somewhere else completely. >>>>>>> >>>>>>> Proper patch attached. >>>>>>> >>>>>>> Thanks, >>>>>>> Henrik >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>>> >>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>>> has not been verified to fix the crash, but should be applied >>>>>>> regardless. >>>>>>> >>>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>>> --- >>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>> >>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>>> >>>>> I sent a pull request to Linus, so you should be able to pull it from >>>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>>> solves the problem (or doesn't). >>>> >>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>>> >>> When you do, can you possibly trigger an event based on the warning added >>> with the patch ? This might help us to identify if the problem fixed >>> with the patch actually happens. >> >> I don't understand the question. I'm uncertain how to trigger, and also what event. >> > > The patch includes a new warning message. > > pr_warn("key count changed from %d to %d\n", > s->key_count, count); > > It would be great if there would be a means to detect if this message is seen > in a kernel log, because it would show that the potential crash condition > fixed with the patch was actually encountered. This would help us to determine > if we actually fixed the problem or not. > > Of course, we'll know if is wasn't fixed if the system still crashes. Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. [ 10.886016] applesmc: key count changed from 261 to 1174405121 Attaching new full dmesg to the bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11 Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 1:57 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-01 1:57 UTC (permalink / raw) To: Guenter Roeck Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On 09/27/2013 11:03 AM, Chris Murphy wrote: >> >> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>>> >>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>>> >>>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>>> >>>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>>> >>>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>>> >>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>>> it is at least run-tested. >>>>>>> >>>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>>> verifying the cache size. >>>>>>> >>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>>> see any other logical explanation. There is of course always the >>>>>>> possibility that the problem is somewhere else completely. >>>>>>> >>>>>>> Proper patch attached. >>>>>>> >>>>>>> Thanks, >>>>>>> Henrik >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>>> >>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>>> has not been verified to fix the crash, but should be applied >>>>>>> regardless. >>>>>>> >>>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>>> --- >>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>> >>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>>> >>>>> I sent a pull request to Linus, so you should be able to pull it from >>>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>>> solves the problem (or doesn't). >>>> >>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>>> >>> When you do, can you possibly trigger an event based on the warning added >>> with the patch ? This might help us to identify if the problem fixed >>> with the patch actually happens. >> >> I don't understand the question. I'm uncertain how to trigger, and also what event. >> > > The patch includes a new warning message. > > pr_warn("key count changed from %d to %d\n", > s->key_count, count); > > It would be great if there would be a means to detect if this message is seen > in a kernel log, because it would show that the potential crash condition > fixed with the patch was actually encountered. This would help us to determine > if we actually fixed the problem or not. > > Of course, we'll know if is wasn't fixed if the system still crashes. Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. [ 10.886016] applesmc: key count changed from 261 to 1174405121 Attaching new full dmesg to the bug report: https://bugzilla.redhat.com/show_bug.cgi?id\x1011719#c11 Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 1:57 ` [lm-sensors] " Chris Murphy @ 2013-10-01 3:37 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 3:37 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 09/30/2013 06:57 PM, Chris Murphy wrote: > > On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On 09/27/2013 11:03 AM, Chris Murphy wrote: >>> >>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>>>> >>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>>> >>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>>>> >>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>>>> >>>>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>>>> >>>>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>>>> >>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>>>> it is at least run-tested. >>>>>>>> >>>>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>>>> verifying the cache size. >>>>>>>> >>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>>>> see any other logical explanation. There is of course always the >>>>>>>> possibility that the problem is somewhere else completely. >>>>>>>> >>>>>>>> Proper patch attached. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Henrik >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>>>> >>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>>>> has not been verified to fix the crash, but should be applied >>>>>>>> regardless. >>>>>>>> >>>>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>>>> --- >>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>>>> >>>>>> I sent a pull request to Linus, so you should be able to pull it from >>>>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>>>> solves the problem (or doesn't). >>>>> >>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>>>> >>>> When you do, can you possibly trigger an event based on the warning added >>>> with the patch ? This might help us to identify if the problem fixed >>>> with the patch actually happens. >>> >>> I don't understand the question. I'm uncertain how to trigger, and also what event. >>> >> >> The patch includes a new warning message. >> >> pr_warn("key count changed from %d to %d\n", >> s->key_count, count); >> >> It would be great if there would be a means to detect if this message is seen >> in a kernel log, because it would show that the potential crash condition >> fixed with the patch was actually encountered. This would help us to determine >> if we actually fixed the problem or not. >> >> Of course, we'll know if is wasn't fixed if the system still crashes. > > Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > [ 10.886016] applesmc: key count changed from 261 to 1174405121 > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. Which I guess explains the subsequent memory allocation error in the log. Henrik, any idea what might be going on ? Is it possible that the previous command failure leaves some state machine in a bad state ? Guenter > Attaching new full dmesg to the bug report: > https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11 > > Chris > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 3:37 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 3:37 UTC (permalink / raw) To: Chris Murphy Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 09/30/2013 06:57 PM, Chris Murphy wrote: > > On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On 09/27/2013 11:03 AM, Chris Murphy wrote: >>> >>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote: >>>>> >>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>>> >>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: >>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: >>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache >>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt. >>>>>>>>>>> >>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after >>>>>>>>>>> the first initialization attempt you would have an explanation for the crash. >>>>>>>>>> >>>>>>>>>> Good idea, and easy enough to test with the patch below. >>>>>>>>>> >>>>>>>>> Should we apply this patch even though it may not solve the specific problem ? >>>>>>>> >>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so >>>>>>>> it is at least run-tested. >>>>>>>> >>>>>>>>> Again, not sure if the key count can change, but the current code is at the very >>>>>>>>> least inconsistent, as it keeps reading the key count without updating or >>>>>>>>> verifying the cache size. >>>>>>>> >>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to >>>>>>>> see any other logical explanation. There is of course always the >>>>>>>> possibility that the problem is somewhere else completely. >>>>>>>> >>>>>>>> Proper patch attached. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Henrik >>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 >>>>>>>> From: Henrik Rydberg <rydberg@euromail.se> >>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200 >>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding >>>>>>>> >>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc, >>>>>>>> Guenter pointed at the initialization problem fixed below. The patch >>>>>>>> has not been verified to fix the crash, but should be applied >>>>>>>> regardless. >>>>>>>> >>>>>>>> Reported-by: <jwboyer@fedoraproject.org> >>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net> >>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> >>>>>>>> --- >>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++- >>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> Thanks for the quick reply. I'll get this rolled into our kernels soon. >>>>>>> >>>>>> I sent a pull request to Linus, so you should be able to pull it from >>>>>> the upstream kernel shortly. Would be great to get feedback if the patch >>>>>> solves the problem (or doesn't). >>>>> >>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both. >>>>> >>>> When you do, can you possibly trigger an event based on the warning added >>>> with the patch ? This might help us to identify if the problem fixed >>>> with the patch actually happens. >>> >>> I don't understand the question. I'm uncertain how to trigger, and also what event. >>> >> >> The patch includes a new warning message. >> >> pr_warn("key count changed from %d to %d\n", >> s->key_count, count); >> >> It would be great if there would be a means to detect if this message is seen >> in a kernel log, because it would show that the potential crash condition >> fixed with the patch was actually encountered. This would help us to determine >> if we actually fixed the problem or not. >> >> Of course, we'll know if is wasn't fixed if the system still crashes. > > Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > [ 10.886016] applesmc: key count changed from 261 to 1174405121 > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. Which I guess explains the subsequent memory allocation error in the log. Henrik, any idea what might be going on ? Is it possible that the previous command failure leaves some state machine in a bad state ? Guenter > Attaching new full dmesg to the bug report: > https://bugzilla.redhat.com/show_bug.cgi?id\x1011719#c11 > > Chris > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 3:37 ` [lm-sensors] " Guenter Roeck @ 2013-10-01 10:55 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-01 10:55 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > > >[ 10.886016] applesmc: key count changed from 261 to 1174405121 > > > > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > Which I guess explains the subsequent memory allocation error in the log. > > Henrik, any idea what might be going on ? Is it possible that the previous > command failure leaves some state machine in a bad state ? I seem to recall a report on another similar state problem on newer machines, so maybe, yes. Older machines seem fine, I have never encountered the problem myself. Here is a patch to test that theory. It has been tested to be pretty harmless on two different generations. I really really do not want to add an 'if (value is insane)' check ;-) Thanks, Henrik >From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Tue, 1 Oct 2013 12:16:09 +0200 Subject: [PATCH] applesmc remedy take 1 Conjectured problem: there are remnant bytes ready on the data line which corrupts the read after a failure. Remedy: assuming bit0 is the read valid line, try to flush it before starting a new command. Exception: the write-number-of-bytes-to-read command seems to differ between models (it may not be needed on the newest), so do not try to flush the data at that particular point. Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem has not been reproduced, so the actual effect of this patch is unknown. --- drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..f6eaf6d1 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -186,10 +186,23 @@ static int wait_read(void) * send_byte - Write to SMC port, retrying when necessary. Callers * must hold applesmc_lock. */ -static int send_byte(u8 cmd, u16 port) +static int send_byte(u8 cmd, u16 port, bool flush) { - u8 status; - int us; + u8 status, data; + int us, nskip; + + if (flush) { + /* read the data port until bit0 is cleared */ + for (nskip = 0; nskip < 16; nskip++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (nskip) + pr_warn("flushed %d bytes\n", nskip); + } outb(cmd, port); for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) { @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port) static int send_command(u8 cmd) { - return send_byte(cmd, APPLESMC_CMD_PORT); + return send_byte(cmd, APPLESMC_CMD_PORT, true); } static int send_argument(const char *key) @@ -223,7 +236,7 @@ static int send_argument(const char *key) int i; for (i = 0; i < 4; i++) - if (send_byte(key[i], APPLESMC_DATA_PORT)) + if (send_byte(key[i], APPLESMC_DATA_PORT, true)) return -EIO; return 0; } @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } - if (send_byte(len, APPLESMC_DATA_PORT)) { + if (send_byte(len, APPLESMC_DATA_PORT, false)) { pr_warn("%.4s: read len fail\n", key); return -EIO; } @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len) return -EIO; } - if (send_byte(len, APPLESMC_DATA_PORT)) { + if (send_byte(len, APPLESMC_DATA_PORT, true)) { pr_warn("%.4s: write len fail\n", key); return -EIO; } for (i = 0; i < len; i++) { - if (send_byte(buffer[i], APPLESMC_DATA_PORT)) { + if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) { pr_warn("%s: write data fail\n", key); return -EIO; } -- 1.8.3.4 ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 10:55 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-01 10:55 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > > >[ 10.886016] applesmc: key count changed from 261 to 1174405121 > > > > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > Which I guess explains the subsequent memory allocation error in the log. > > Henrik, any idea what might be going on ? Is it possible that the previous > command failure leaves some state machine in a bad state ? I seem to recall a report on another similar state problem on newer machines, so maybe, yes. Older machines seem fine, I have never encountered the problem myself. Here is a patch to test that theory. It has been tested to be pretty harmless on two different generations. I really really do not want to add an 'if (value is insane)' check ;-) Thanks, Henrik From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Tue, 1 Oct 2013 12:16:09 +0200 Subject: [PATCH] applesmc remedy take 1 Conjectured problem: there are remnant bytes ready on the data line which corrupts the read after a failure. Remedy: assuming bit0 is the read valid line, try to flush it before starting a new command. Exception: the write-number-of-bytes-to-read command seems to differ between models (it may not be needed on the newest), so do not try to flush the data at that particular point. Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem has not been reproduced, so the actual effect of this patch is unknown. --- drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..f6eaf6d1 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -186,10 +186,23 @@ static int wait_read(void) * send_byte - Write to SMC port, retrying when necessary. Callers * must hold applesmc_lock. */ -static int send_byte(u8 cmd, u16 port) +static int send_byte(u8 cmd, u16 port, bool flush) { - u8 status; - int us; + u8 status, data; + int us, nskip; + + if (flush) { + /* read the data port until bit0 is cleared */ + for (nskip = 0; nskip < 16; nskip++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (nskip) + pr_warn("flushed %d bytes\n", nskip); + } outb(cmd, port); for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) { @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port) static int send_command(u8 cmd) { - return send_byte(cmd, APPLESMC_CMD_PORT); + return send_byte(cmd, APPLESMC_CMD_PORT, true); } static int send_argument(const char *key) @@ -223,7 +236,7 @@ static int send_argument(const char *key) int i; for (i = 0; i < 4; i++) - if (send_byte(key[i], APPLESMC_DATA_PORT)) + if (send_byte(key[i], APPLESMC_DATA_PORT, true)) return -EIO; return 0; } @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } - if (send_byte(len, APPLESMC_DATA_PORT)) { + if (send_byte(len, APPLESMC_DATA_PORT, false)) { pr_warn("%.4s: read len fail\n", key); return -EIO; } @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len) return -EIO; } - if (send_byte(len, APPLESMC_DATA_PORT)) { + if (send_byte(len, APPLESMC_DATA_PORT, true)) { pr_warn("%.4s: write len fail\n", key); return -EIO; } for (i = 0; i < len; i++) { - if (send_byte(buffer[i], APPLESMC_DATA_PORT)) { + if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) { pr_warn("%s: write data fail\n", key); return -EIO; } -- 1.8.3.4 _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 10:55 ` [lm-sensors] " Henrik Rydberg @ 2013-10-01 15:19 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 15:19 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > > > > >[ 10.886016] applesmc: key count changed from 261 to 1174405121 > > > > > > > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > > Which I guess explains the subsequent memory allocation error in the log. > > > > Henrik, any idea what might be going on ? Is it possible that the previous > > command failure leaves some state machine in a bad state ? > > I seem to recall a report on another similar state problem on newer > machines, so maybe, yes. Older machines seem fine, I have never > encountered the problem myself. Here is a patch to test that > theory. It has been tested to be pretty harmless on two different > generations. > > I really really do not want to add an 'if (value is insane)' check ;-) > Chris, any chance you can load this patch on an affected machine so we can get test feedback ? This one is too experimental to submit upstream without knowing that it really fixes the problem. Thanks, Guenter > Thanks, > Henrik > > From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Tue, 1 Oct 2013 12:16:09 +0200 > Subject: [PATCH] applesmc remedy take 1 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Exception: the write-number-of-bytes-to-read command seems to differ > between models (it may not be needed on the newest), so do not try to > flush the data at that particular point. > > Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem > has not been reproduced, so the actual effect of this patch is unknown. > --- > drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++-------- > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..f6eaf6d1 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -186,10 +186,23 @@ static int wait_read(void) > * send_byte - Write to SMC port, retrying when necessary. Callers > * must hold applesmc_lock. > */ > -static int send_byte(u8 cmd, u16 port) > +static int send_byte(u8 cmd, u16 port, bool flush) > { > - u8 status; > - int us; > + u8 status, data; > + int us, nskip; > + > + if (flush) { > + /* read the data port until bit0 is cleared */ > + for (nskip = 0; nskip < 16; nskip++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (nskip) > + pr_warn("flushed %d bytes\n", nskip); > + } > > outb(cmd, port); > for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) { > @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port) > > static int send_command(u8 cmd) > { > - return send_byte(cmd, APPLESMC_CMD_PORT); > + return send_byte(cmd, APPLESMC_CMD_PORT, true); > } > > static int send_argument(const char *key) > @@ -223,7 +236,7 @@ static int send_argument(const char *key) > int i; > > for (i = 0; i < 4; i++) > - if (send_byte(key[i], APPLESMC_DATA_PORT)) > + if (send_byte(key[i], APPLESMC_DATA_PORT, true)) > return -EIO; > return 0; > } > @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > - if (send_byte(len, APPLESMC_DATA_PORT)) { > + if (send_byte(len, APPLESMC_DATA_PORT, false)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > } > @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len) > return -EIO; > } > > - if (send_byte(len, APPLESMC_DATA_PORT)) { > + if (send_byte(len, APPLESMC_DATA_PORT, true)) { > pr_warn("%.4s: write len fail\n", key); > return -EIO; > } > > for (i = 0; i < len; i++) { > - if (send_byte(buffer[i], APPLESMC_DATA_PORT)) { > + if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) { > pr_warn("%s: write data fail\n", key); > return -EIO; > } > -- > 1.8.3.4 > > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 15:19 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 15:19 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > > > > > >[ 10.886016] applesmc: key count changed from 261 to 1174405121 > > > > > > > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > > Which I guess explains the subsequent memory allocation error in the log. > > > > Henrik, any idea what might be going on ? Is it possible that the previous > > command failure leaves some state machine in a bad state ? > > I seem to recall a report on another similar state problem on newer > machines, so maybe, yes. Older machines seem fine, I have never > encountered the problem myself. Here is a patch to test that > theory. It has been tested to be pretty harmless on two different > generations. > > I really really do not want to add an 'if (value is insane)' check ;-) > Chris, any chance you can load this patch on an affected machine so we can get test feedback ? This one is too experimental to submit upstream without knowing that it really fixes the problem. Thanks, Guenter > Thanks, > Henrik > > From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Tue, 1 Oct 2013 12:16:09 +0200 > Subject: [PATCH] applesmc remedy take 1 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Exception: the write-number-of-bytes-to-read command seems to differ > between models (it may not be needed on the newest), so do not try to > flush the data at that particular point. > > Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem > has not been reproduced, so the actual effect of this patch is unknown. > --- > drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++-------- > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..f6eaf6d1 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -186,10 +186,23 @@ static int wait_read(void) > * send_byte - Write to SMC port, retrying when necessary. Callers > * must hold applesmc_lock. > */ > -static int send_byte(u8 cmd, u16 port) > +static int send_byte(u8 cmd, u16 port, bool flush) > { > - u8 status; > - int us; > + u8 status, data; > + int us, nskip; > + > + if (flush) { > + /* read the data port until bit0 is cleared */ > + for (nskip = 0; nskip < 16; nskip++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (nskip) > + pr_warn("flushed %d bytes\n", nskip); > + } > > outb(cmd, port); > for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) { > @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port) > > static int send_command(u8 cmd) > { > - return send_byte(cmd, APPLESMC_CMD_PORT); > + return send_byte(cmd, APPLESMC_CMD_PORT, true); > } > > static int send_argument(const char *key) > @@ -223,7 +236,7 @@ static int send_argument(const char *key) > int i; > > for (i = 0; i < 4; i++) > - if (send_byte(key[i], APPLESMC_DATA_PORT)) > + if (send_byte(key[i], APPLESMC_DATA_PORT, true)) > return -EIO; > return 0; > } > @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > - if (send_byte(len, APPLESMC_DATA_PORT)) { > + if (send_byte(len, APPLESMC_DATA_PORT, false)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > } > @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len) > return -EIO; > } > > - if (send_byte(len, APPLESMC_DATA_PORT)) { > + if (send_byte(len, APPLESMC_DATA_PORT, true)) { > pr_warn("%.4s: write len fail\n", key); > return -EIO; > } > > for (i = 0; i < len; i++) { > - if (send_byte(buffer[i], APPLESMC_DATA_PORT)) { > + if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) { > pr_warn("%s: write data fail\n", key); > return -EIO; > } > -- > 1.8.3.4 > > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 15:19 ` [lm-sensors] " Guenter Roeck @ 2013-10-01 15:33 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-01 15:33 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>> >>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>> >>> >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>> Which I guess explains the subsequent memory allocation error in the log. >>> >>> Henrik, any idea what might be going on ? Is it possible that the previous >>> command failure leaves some state machine in a bad state ? >> >> I seem to recall a report on another similar state problem on newer >> machines, so maybe, yes. Older machines seem fine, I have never >> encountered the problem myself. Here is a patch to test that >> theory. It has been tested to be pretty harmless on two different >> generations. >> >> I really really do not want to add an 'if (value is insane)' check ;-) >> > Chris, > > any chance you can load this patch on an affected machine so we can get > test feedback ? This one is too experimental to submit upstream without > knowing that it really fixes the problem. Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 15:33 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-01 15:33 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>> >>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>> >>> >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>> Which I guess explains the subsequent memory allocation error in the log. >>> >>> Henrik, any idea what might be going on ? Is it possible that the previous >>> command failure leaves some state machine in a bad state ? >> >> I seem to recall a report on another similar state problem on newer >> machines, so maybe, yes. Older machines seem fine, I have never >> encountered the problem myself. Here is a patch to test that >> theory. It has been tested to be pretty harmless on two different >> generations. >> >> I really really do not want to add an 'if (value is insane)' check ;-) >> > Chris, > > any chance you can load this patch on an affected machine so we can get > test feedback ? This one is too experimental to submit upstream without > knowing that it really fixes the problem. Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 15:33 ` [lm-sensors] " Chris Murphy @ 2013-10-01 16:24 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 16:24 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: > > On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > >>>> > >>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 > >>>> > >>> > >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > >>> Which I guess explains the subsequent memory allocation error in the log. > >>> > >>> Henrik, any idea what might be going on ? Is it possible that the previous > >>> command failure leaves some state machine in a bad state ? > >> > >> I seem to recall a report on another similar state problem on newer > >> machines, so maybe, yes. Older machines seem fine, I have never > >> encountered the problem myself. Here is a patch to test that > >> theory. It has been tested to be pretty harmless on two different > >> generations. > >> > >> I really really do not want to add an 'if (value is insane)' check ;-) > >> > > Chris, > > > > any chance you can load this patch on an affected machine so we can get > > test feedback ? This one is too experimental to submit upstream without > > knowing that it really fixes the problem. > > Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). > Only requirement is that it also includes the previous patch, so it would be optimal if you can apply it on top of the previous image. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-01 16:24 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-01 16:24 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: > > On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > >>>> > >>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 > >>>> > >>> > >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > >>> Which I guess explains the subsequent memory allocation error in the log. > >>> > >>> Henrik, any idea what might be going on ? Is it possible that the previous > >>> command failure leaves some state machine in a bad state ? > >> > >> I seem to recall a report on another similar state problem on newer > >> machines, so maybe, yes. Older machines seem fine, I have never > >> encountered the problem myself. Here is a patch to test that > >> theory. It has been tested to be pretty harmless on two different > >> generations. > >> > >> I really really do not want to add an 'if (value is insane)' check ;-) > >> > > Chris, > > > > any chance you can load this patch on an affected machine so we can get > > test feedback ? This one is too experimental to submit upstream without > > knowing that it really fixes the problem. > > Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). > Only requirement is that it also includes the previous patch, so it would be optimal if you can apply it on top of the previous image. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-01 16:24 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 1:09 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 1:09 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >> >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>> >>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>> >>>>> >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>> >>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>> command failure leaves some state machine in a bad state ? >>>> >>>> I seem to recall a report on another similar state problem on newer >>>> machines, so maybe, yes. Older machines seem fine, I have never >>>> encountered the problem myself. Here is a patch to test that >>>> theory. It has been tested to be pretty harmless on two different >>>> generations. >>>> >>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>> >>> Chris, >>> >>> any chance you can load this patch on an affected machine so we can get >>> test feedback ? This one is too experimental to submit upstream without >>> knowing that it really fixes the problem. >> >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >> > Only requirement is that it also includes the previous patch, so it would be > optimal if you can apply it on top of the previous image. Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 1:09 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 1:09 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >> >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>> >>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>> >>>>> >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>> >>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>> command failure leaves some state machine in a bad state ? >>>> >>>> I seem to recall a report on another similar state problem on newer >>>> machines, so maybe, yes. Older machines seem fine, I have never >>>> encountered the problem myself. Here is a patch to test that >>>> theory. It has been tested to be pretty harmless on two different >>>> generations. >>>> >>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>> >>> Chris, >>> >>> any chance you can load this patch on an affected machine so we can get >>> test feedback ? This one is too experimental to submit upstream without >>> knowing that it really fixes the problem. >> >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >> > Only requirement is that it also includes the previous patch, so it would be > optimal if you can apply it on top of the previous image. Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 1:09 ` [lm-sensors] " Chris Murphy @ 2013-10-02 3:51 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 3:51 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: > > On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: > >> > >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > >> > >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > >>>>>> > >>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 > >>>>>> > >>>>> > >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > >>>>> Which I guess explains the subsequent memory allocation error in the log. > >>>>> > >>>>> Henrik, any idea what might be going on ? Is it possible that the previous > >>>>> command failure leaves some state machine in a bad state ? > >>>> > >>>> I seem to recall a report on another similar state problem on newer > >>>> machines, so maybe, yes. Older machines seem fine, I have never > >>>> encountered the problem myself. Here is a patch to test that > >>>> theory. It has been tested to be pretty harmless on two different > >>>> generations. > >>>> > >>>> I really really do not want to add an 'if (value is insane)' check ;-) > >>>> > >>> Chris, > >>> > >>> any chance you can load this patch on an affected machine so we can get > >>> test feedback ? This one is too experimental to submit upstream without > >>> knowing that it really fixes the problem. > >> > >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). > >> > > Only requirement is that it also includes the previous patch, so it would be > > optimal if you can apply it on top of the previous image. > > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: > > [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 > [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 > Hi Chris, That only means that you did not hit the problem. There may be some secondary trigger (cold boot ? coffee on the cpu ?). One thing I have seen in all logs is the earlier "send_byte fail" message, so I think that is a pre-requisite. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 3:51 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 3:51 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: > > On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: > >> > >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: > >> > >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: > >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. > >>>>>> > >>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 > >>>>>> > >>>>> > >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. > >>>>> Which I guess explains the subsequent memory allocation error in the log. > >>>>> > >>>>> Henrik, any idea what might be going on ? Is it possible that the previous > >>>>> command failure leaves some state machine in a bad state ? > >>>> > >>>> I seem to recall a report on another similar state problem on newer > >>>> machines, so maybe, yes. Older machines seem fine, I have never > >>>> encountered the problem myself. Here is a patch to test that > >>>> theory. It has been tested to be pretty harmless on two different > >>>> generations. > >>>> > >>>> I really really do not want to add an 'if (value is insane)' check ;-) > >>>> > >>> Chris, > >>> > >>> any chance you can load this patch on an affected machine so we can get > >>> test feedback ? This one is too experimental to submit upstream without > >>> knowing that it really fixes the problem. > >> > >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). > >> > > Only requirement is that it also includes the previous patch, so it would be > > optimal if you can apply it on top of the previous image. > > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: > > [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 > [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 > Hi Chris, That only means that you did not hit the problem. There may be some secondary trigger (cold boot ? coffee on the cpu ?). One thing I have seen in all logs is the earlier "send_byte fail" message, so I think that is a pre-requisite. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 3:51 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 3:55 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 3:55 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: >> >> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >>>> >>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>>>> >>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>>>> >>>>>>> >>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>>>> >>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>>>> command failure leaves some state machine in a bad state ? >>>>>> >>>>>> I seem to recall a report on another similar state problem on newer >>>>>> machines, so maybe, yes. Older machines seem fine, I have never >>>>>> encountered the problem myself. Here is a patch to test that >>>>>> theory. It has been tested to be pretty harmless on two different >>>>>> generations. >>>>>> >>>>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>>>> >>>>> Chris, >>>>> >>>>> any chance you can load this patch on an affected machine so we can get >>>>> test feedback ? This one is too experimental to submit upstream without >>>>> knowing that it really fixes the problem. >>>> >>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >>>> >>> Only requirement is that it also includes the previous patch, so it would be >>> optimal if you can apply it on top of the previous image. >> >> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >> >> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 >> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >> > > Hi Chris, > > That only means that you did not hit the problem. There may be some secondary > trigger (cold boot ? coffee on the cpu ?). > > One thing I have seen in all logs is the earlier "send_byte fail" message, so > I think that is a pre-requisite. I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it. Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 3:55 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 3:55 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: >> >> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >>>> >>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>>>> >>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>>>> >>>>>>> >>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>>>> >>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>>>> command failure leaves some state machine in a bad state ? >>>>>> >>>>>> I seem to recall a report on another similar state problem on newer >>>>>> machines, so maybe, yes. Older machines seem fine, I have never >>>>>> encountered the problem myself. Here is a patch to test that >>>>>> theory. It has been tested to be pretty harmless on two different >>>>>> generations. >>>>>> >>>>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>>>> >>>>> Chris, >>>>> >>>>> any chance you can load this patch on an affected machine so we can get >>>>> test feedback ? This one is too experimental to submit upstream without >>>>> knowing that it really fixes the problem. >>>> >>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >>>> >>> Only requirement is that it also includes the previous patch, so it would be >>> optimal if you can apply it on top of the previous image. >> >> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >> >> [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 >> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >> > > Hi Chris, > > That only means that you did not hit the problem. There may be some secondary > trigger (cold boot ? coffee on the cpu ?). > > One thing I have seen in all logs is the earlier "send_byte fail" message, so > I think that is a pre-requisite. I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it. Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 3:55 ` [lm-sensors] " Chris Murphy @ 2013-10-02 4:02 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 4:02 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/01/2013 08:55 PM, Chris Murphy wrote: > > On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: >>> >>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >>>>> >>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>>> >>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>>>>> >>>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>>>>> >>>>>>>> >>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>>>>> >>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>>>>> command failure leaves some state machine in a bad state ? >>>>>>> >>>>>>> I seem to recall a report on another similar state problem on newer >>>>>>> machines, so maybe, yes. Older machines seem fine, I have never >>>>>>> encountered the problem myself. Here is a patch to test that >>>>>>> theory. It has been tested to be pretty harmless on two different >>>>>>> generations. >>>>>>> >>>>>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>>>>> >>>>>> Chris, >>>>>> >>>>>> any chance you can load this patch on an affected machine so we can get >>>>>> test feedback ? This one is too experimental to submit upstream without >>>>>> knowing that it really fixes the problem. >>>>> >>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >>>>> >>>> Only requirement is that it also includes the previous patch, so it would be >>>> optimal if you can apply it on top of the previous image. >>> >>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >>> >>> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 >>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >>> >> >> Hi Chris, >> >> That only means that you did not hit the problem. There may be some secondary >> trigger (cold boot ? coffee on the cpu ?). >> >> One thing I have seen in all logs is the earlier "send_byte fail" message, so >> I think that is a pre-requisite. > > I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it. > I am sure you didn't pour coffee onto the CPU yet :) Basic rule of testing: You'll only hit the problem again after you are convinced that it was magically fixed by a completely unrelated change. Cheers, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 4:02 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 4:02 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/01/2013 08:55 PM, Chris Murphy wrote: > > On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote: >>> >>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>> >>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote: >>>>> >>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote: >>>>> >>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote: >>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. >>>>>>>>> >>>>>>>>> [ 10.886016] applesmc: key count changed from 261 to 1174405121 >>>>>>>>> >>>>>>>> >>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001. >>>>>>>> Which I guess explains the subsequent memory allocation error in the log. >>>>>>>> >>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous >>>>>>>> command failure leaves some state machine in a bad state ? >>>>>>> >>>>>>> I seem to recall a report on another similar state problem on newer >>>>>>> machines, so maybe, yes. Older machines seem fine, I have never >>>>>>> encountered the problem myself. Here is a patch to test that >>>>>>> theory. It has been tested to be pretty harmless on two different >>>>>>> generations. >>>>>>> >>>>>>> I really really do not want to add an 'if (value is insane)' check ;-) >>>>>>> >>>>>> Chris, >>>>>> >>>>>> any chance you can load this patch on an affected machine so we can get >>>>>> test feedback ? This one is too experimental to submit upstream without >>>>>> knowing that it really fixes the problem. >>>>> >>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012). >>>>> >>>> Only requirement is that it also includes the previous patch, so it would be >>>> optimal if you can apply it on top of the previous image. >>> >>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >>> >>> [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 >>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >>> >> >> Hi Chris, >> >> That only means that you did not hit the problem. There may be some secondary >> trigger (cold boot ? coffee on the cpu ?). >> >> One thing I have seen in all logs is the earlier "send_byte fail" message, so >> I think that is a pre-requisite. > > I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it. > I am sure you didn't pour coffee onto the CPU yet :) Basic rule of testing: You'll only hit the problem again after you are convinced that it was magically fixed by a completely unrelated change. Cheers, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 3:51 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 9:53 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 9:53 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: > > > > [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 > > [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 > > > > One thing I have seen in all logs is the earlier "send_byte fail" message, so > I think that is a pre-requisite. Not necessarily - it could be that the patch actually fixes the root cause. One possible scenario is that on recent SMCs, some of the commands produce more data than we actually read. This would eventually lead to both data corruption and overflow somwhere in the SMC internals. If the original SMC error is interpreted as a read buffer overflow, then that problem should be fixed with this patch. Even so, the patch does require more testing. It would be great to get coverage on all models we can find. Thanks, Henrik ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 9:53 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 9:53 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: > > > > [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 > > [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 > > > > One thing I have seen in all logs is the earlier "send_byte fail" message, so > I think that is a pre-requisite. Not necessarily - it could be that the patch actually fixes the root cause. One possible scenario is that on recent SMCs, some of the commands produce more data than we actually read. This would eventually lead to both data corruption and overflow somwhere in the SMC internals. If the original SMC error is interpreted as a read buffer overflow, then that problem should be fixed with this patch. Even so, the patch does require more testing. It would be great to get coverage on all models we can find. Thanks, Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 9:53 ` [lm-sensors] " Henrik Rydberg @ 2013-10-02 13:30 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 13:30 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/02/2013 02:53 AM, Henrik Rydberg wrote: >>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >>> >>> [ 13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1 >>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >>> >> >> One thing I have seen in all logs is the earlier "send_byte fail" message, so >> I think that is a pre-requisite. > > Not necessarily - it could be that the patch actually fixes the root > cause. One possible scenario is that on recent SMCs, some of the > commands produce more data than we actually read. This would > eventually lead to both data corruption and overflow somwhere in the > SMC internals. If the original SMC error is interpreted as a read > buffer overflow, then that problem should be fixed with this patch. > Good point. But shouldn't we at least get the "flushed %d bytes" warning message in this case ? Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 13:30 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 13:30 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/02/2013 02:53 AM, Henrik Rydberg wrote: >>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it: >>> >>> [ 13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1 >>> [ 13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10 >>> >> >> One thing I have seen in all logs is the earlier "send_byte fail" message, so >> I think that is a pre-requisite. > > Not necessarily - it could be that the patch actually fixes the root > cause. One possible scenario is that on recent SMCs, some of the > commands produce more data than we actually read. This would > eventually lead to both data corruption and overflow somwhere in the > SMC internals. If the original SMC error is interpreted as a read > buffer overflow, then that problem should be fixed with this patch. > Good point. But shouldn't we at least get the "flushed %d bytes" warning message in this case ? Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 13:30 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 16:34 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 16:34 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>I think that is a pre-requisite. > > > >Not necessarily - it could be that the patch actually fixes the root > >cause. One possible scenario is that on recent SMCs, some of the > >commands produce more data than we actually read. This would > >eventually lead to both data corruption and overflow somwhere in the > >SMC internals. If the original SMC error is interpreted as a read > >buffer overflow, then that problem should be fixed with this patch. > > > > Good point. > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? The explanation I have there is that the (newer) SMC needs the application to read the 'no more bytes' or it will get confused. It makes sense, if the number of bytes to read is no longer specified. Thanks, Henrik ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 16:34 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 16:34 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>I think that is a pre-requisite. > > > >Not necessarily - it could be that the patch actually fixes the root > >cause. One possible scenario is that on recent SMCs, some of the > >commands produce more data than we actually read. This would > >eventually lead to both data corruption and overflow somwhere in the > >SMC internals. If the original SMC error is interpreted as a read > >buffer overflow, then that problem should be fixed with this patch. > > > > Good point. > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? The explanation I have there is that the (newer) SMC needs the application to read the 'no more bytes' or it will get confused. It makes sense, if the number of bytes to read is no longer specified. Thanks, Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 16:34 ` [lm-sensors] " Henrik Rydberg @ 2013-10-02 16:47 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 16:47 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > >>I think that is a pre-requisite. > > > > > >Not necessarily - it could be that the patch actually fixes the root > > >cause. One possible scenario is that on recent SMCs, some of the > > >commands produce more data than we actually read. This would > > >eventually lead to both data corruption and overflow somwhere in the > > >SMC internals. If the original SMC error is interpreted as a read > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > Good point. > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > The explanation I have there is that the (newer) SMC needs the > application to read the 'no more bytes' or it will get confused. It > makes sense, if the number of bytes to read is no longer specified. > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? That might make sense. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 16:47 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 16:47 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > >>I think that is a pre-requisite. > > > > > >Not necessarily - it could be that the patch actually fixes the root > > >cause. One possible scenario is that on recent SMCs, some of the > > >commands produce more data than we actually read. This would > > >eventually lead to both data corruption and overflow somwhere in the > > >SMC internals. If the original SMC error is interpreted as a read > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > Good point. > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > The explanation I have there is that the (newer) SMC needs the > application to read the 'no more bytes' or it will get confused. It > makes sense, if the number of bytes to read is no longer specified. > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? That might make sense. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 16:47 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 17:24 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 17:24 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > > >>I think that is a pre-requisite. > > > > > > > >Not necessarily - it could be that the patch actually fixes the root > > > >cause. One possible scenario is that on recent SMCs, some of the > > > >commands produce more data than we actually read. This would > > > >eventually lead to both data corruption and overflow somwhere in the > > > >SMC internals. If the original SMC error is interpreted as a read > > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > > > > Good point. > > > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > > > The explanation I have there is that the (newer) SMC needs the > > application to read the 'no more bytes' or it will get confused. It > > makes sense, if the number of bytes to read is no longer specified. > > > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > That might make sense. It also points at the possibility of a smaller patch to test, but I have not had the time to check this very deeply myself: Thanks, Henrik >From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Wed, 2 Oct 2013 19:15:03 +0200 Subject: [PATCH] applesmc remedy take 2 Conjectured problem: there are remnant bytes ready on the data line which corrupts the read after a failure. Remedy: assuming bit0 is the read valid line, try to flush it before starting a new command. Tests by Chris suggests reading the status is enough for the problem to go away, which is consistent with a change in the SMC interface, where the number of bytes to read is no longer specified, but found out by reading until end of data. Tested on a MacBookAir3,1, but the original problem has not been reproduced. --- drivers/hwmon/applesmc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..c0ff350 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -230,6 +230,7 @@ static int send_argument(const char *key) static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) { + u8 status, data; int i; if (send_command(cmd) || send_argument(key)) { @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } + /* This has no effect on newer (2012) SMCs */ if (send_byte(len, APPLESMC_DATA_PORT)) { pr_warn("%.4s: read len fail\n", key); return -EIO; @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) buffer[i] = inb(APPLESMC_DATA_PORT); } + /* Read the data port until bit0 is cleared */ + for (i = 0; i < 16; i++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (i) + pr_warn("flushed %d bytes, last value is: %d\n", i, data); + return 0; } -- 1.8.3.4 ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 17:24 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-02 17:24 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > > >>I think that is a pre-requisite. > > > > > > > >Not necessarily - it could be that the patch actually fixes the root > > > >cause. One possible scenario is that on recent SMCs, some of the > > > >commands produce more data than we actually read. This would > > > >eventually lead to both data corruption and overflow somwhere in the > > > >SMC internals. If the original SMC error is interpreted as a read > > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > > > > Good point. > > > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > > > The explanation I have there is that the (newer) SMC needs the > > application to read the 'no more bytes' or it will get confused. It > > makes sense, if the number of bytes to read is no longer specified. > > > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > That might make sense. It also points at the possibility of a smaller patch to test, but I have not had the time to check this very deeply myself: Thanks, Henrik From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Wed, 2 Oct 2013 19:15:03 +0200 Subject: [PATCH] applesmc remedy take 2 Conjectured problem: there are remnant bytes ready on the data line which corrupts the read after a failure. Remedy: assuming bit0 is the read valid line, try to flush it before starting a new command. Tests by Chris suggests reading the status is enough for the problem to go away, which is consistent with a change in the SMC interface, where the number of bytes to read is no longer specified, but found out by reading until end of data. Tested on a MacBookAir3,1, but the original problem has not been reproduced. --- drivers/hwmon/applesmc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..c0ff350 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -230,6 +230,7 @@ static int send_argument(const char *key) static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) { + u8 status, data; int i; if (send_command(cmd) || send_argument(key)) { @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } + /* This has no effect on newer (2012) SMCs */ if (send_byte(len, APPLESMC_DATA_PORT)) { pr_warn("%.4s: read len fail\n", key); return -EIO; @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) buffer[i] = inb(APPLESMC_DATA_PORT); } + /* Read the data port until bit0 is cleared */ + for (i = 0; i < 16; i++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (i) + pr_warn("flushed %d bytes, last value is: %d\n", i, data); + return 0; } -- 1.8.3.4 _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 17:24 ` [lm-sensors] " Henrik Rydberg @ 2013-10-02 18:02 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 18:02 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > > > >>I think that is a pre-requisite. > > > > > > > > > >Not necessarily - it could be that the patch actually fixes the root > > > > >cause. One possible scenario is that on recent SMCs, some of the > > > > >commands produce more data than we actually read. This would > > > > >eventually lead to both data corruption and overflow somwhere in the > > > > >SMC internals. If the original SMC error is interpreted as a read > > > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > > > > > > > Good point. > > > > > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > > > > > The explanation I have there is that the (newer) SMC needs the > > > application to read the 'no more bytes' or it will get confused. It > > > makes sense, if the number of bytes to read is no longer specified. > > > > > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > > That might make sense. > > It also points at the possibility of a smaller patch to test, but I > have not had the time to check this very deeply myself: > I like this patch much more than the previous patch. Chris, can you test it ? Thanks, Guenter > Thanks, > Henrik > > From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] applesmc remedy take 2 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Tests by Chris suggests reading the status is enough for the problem > to go away, which is consistent with a change in the SMC interface, > where the number of bytes to read is no longer specified, but found > out by reading until end of data. > > Tested on a MacBookAir3,1, but the original problem has not been > reproduced. > --- > drivers/hwmon/applesmc.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..c0ff350 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -230,6 +230,7 @@ static int send_argument(const char *key) > > static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > { > + u8 status, data; > int i; > > if (send_command(cmd) || send_argument(key)) { > @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > + /* This has no effect on newer (2012) SMCs */ > if (send_byte(len, APPLESMC_DATA_PORT)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > buffer[i] = inb(APPLESMC_DATA_PORT); > } > > + /* Read the data port until bit0 is cleared */ > + for (i = 0; i < 16; i++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (i) > + pr_warn("flushed %d bytes, last value is: %d\n", i, data); > + > return 0; > } > > -- > 1.8.3.4 > > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 18:02 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 18:02 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so > > > > >>I think that is a pre-requisite. > > > > > > > > > >Not necessarily - it could be that the patch actually fixes the root > > > > >cause. One possible scenario is that on recent SMCs, some of the > > > > >commands produce more data than we actually read. This would > > > > >eventually lead to both data corruption and overflow somwhere in the > > > > >SMC internals. If the original SMC error is interpreted as a read > > > > >buffer overflow, then that problem should be fixed with this patch. > > > > > > > > > > > > > Good point. > > > > > > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > > > > > > The explanation I have there is that the (newer) SMC needs the > > > application to read the 'no more bytes' or it will get confused. It > > > makes sense, if the number of bytes to read is no longer specified. > > > > > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > > That might make sense. > > It also points at the possibility of a smaller patch to test, but I > have not had the time to check this very deeply myself: > I like this patch much more than the previous patch. Chris, can you test it ? Thanks, Guenter > Thanks, > Henrik > > From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] applesmc remedy take 2 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Tests by Chris suggests reading the status is enough for the problem > to go away, which is consistent with a change in the SMC interface, > where the number of bytes to read is no longer specified, but found > out by reading until end of data. > > Tested on a MacBookAir3,1, but the original problem has not been > reproduced. > --- > drivers/hwmon/applesmc.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..c0ff350 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -230,6 +230,7 @@ static int send_argument(const char *key) > > static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > { > + u8 status, data; > int i; > > if (send_command(cmd) || send_argument(key)) { > @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > + /* This has no effect on newer (2012) SMCs */ > if (send_byte(len, APPLESMC_DATA_PORT)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > buffer[i] = inb(APPLESMC_DATA_PORT); > } > > + /* Read the data port until bit0 is cleared */ > + for (i = 0; i < 16; i++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (i) > + pr_warn("flushed %d bytes, last value is: %d\n", i, data); > + > return 0; > } > > -- > 1.8.3.4 > > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 18:02 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 18:33 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 18:33 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so >>>>>>> I think that is a pre-requisite. >>>>>> >>>>>> Not necessarily - it could be that the patch actually fixes the root >>>>>> cause. One possible scenario is that on recent SMCs, some of the >>>>>> commands produce more data than we actually read. This would >>>>>> eventually lead to both data corruption and overflow somwhere in the >>>>>> SMC internals. If the original SMC error is interpreted as a read >>>>>> buffer overflow, then that problem should be fixed with this patch. >>>>>> >>>>> >>>>> Good point. >>>>> >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? >>>> >>>> The explanation I have there is that the (newer) SMC needs the >>>> application to read the 'no more bytes' or it will get confused. It >>>> makes sense, if the number of bytes to read is no longer specified. >>>> >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? >>> That might make sense. >> >> It also points at the possibility of a smaller patch to test, but I >> have not had the time to check this very deeply myself: >> > I like this patch much more than the previous patch. Chris, can you test it ? Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. Chris Murphy ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 18:33 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 18:33 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so >>>>>>> I think that is a pre-requisite. >>>>>> >>>>>> Not necessarily - it could be that the patch actually fixes the root >>>>>> cause. One possible scenario is that on recent SMCs, some of the >>>>>> commands produce more data than we actually read. This would >>>>>> eventually lead to both data corruption and overflow somwhere in the >>>>>> SMC internals. If the original SMC error is interpreted as a read >>>>>> buffer overflow, then that problem should be fixed with this patch. >>>>>> >>>>> >>>>> Good point. >>>>> >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? >>>> >>>> The explanation I have there is that the (newer) SMC needs the >>>> application to read the 'no more bytes' or it will get confused. It >>>> makes sense, if the number of bytes to read is no longer specified. >>>> >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? >>> That might make sense. >> >> It also points at the possibility of a smaller patch to test, but I >> have not had the time to check this very deeply myself: >> > I like this patch much more than the previous patch. Chris, can you test it ? Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. Chris Murphy _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 18:33 ` [lm-sensors] " Chris Murphy @ 2013-10-02 20:59 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 20:59 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: > > On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>>>>>> I think that is a pre-requisite. > >>>>>> > >>>>>> Not necessarily - it could be that the patch actually fixes the root > >>>>>> cause. One possible scenario is that on recent SMCs, some of the > >>>>>> commands produce more data than we actually read. This would > >>>>>> eventually lead to both data corruption and overflow somwhere in the > >>>>>> SMC internals. If the original SMC error is interpreted as a read > >>>>>> buffer overflow, then that problem should be fixed with this patch. > >>>>>> > >>>>> > >>>>> Good point. > >>>>> > >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > >>>> > >>>> The explanation I have there is that the (newer) SMC needs the > >>>> application to read the 'no more bytes' or it will get confused. It > >>>> makes sense, if the number of bytes to read is no longer specified. > >>>> > >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > >>> That might make sense. > >> > >> It also points at the possibility of a smaller patch to test, but I > >> have not had the time to check this very deeply myself: > >> > > I like this patch much more than the previous patch. Chris, can you test it ? > > Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. > None, if I understand correctly and if the patch really fixes the root cause of the problem. Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 20:59 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 20:59 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: > > On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>>>>>> I think that is a pre-requisite. > >>>>>> > >>>>>> Not necessarily - it could be that the patch actually fixes the root > >>>>>> cause. One possible scenario is that on recent SMCs, some of the > >>>>>> commands produce more data than we actually read. This would > >>>>>> eventually lead to both data corruption and overflow somwhere in the > >>>>>> SMC internals. If the original SMC error is interpreted as a read > >>>>>> buffer overflow, then that problem should be fixed with this patch. > >>>>>> > >>>>> > >>>>> Good point. > >>>>> > >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > >>>> > >>>> The explanation I have there is that the (newer) SMC needs the > >>>> application to read the 'no more bytes' or it will get confused. It > >>>> makes sense, if the number of bytes to read is no longer specified. > >>>> > >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > >>> That might make sense. > >> > >> It also points at the possibility of a smaller patch to test, but I > >> have not had the time to check this very deeply myself: > >> > > I like this patch much more than the previous patch. Chris, can you test it ? > > Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. > None, if I understand correctly and if the patch really fixes the root cause of the problem. Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 20:59 ` [lm-sensors] " Guenter Roeck @ 2013-10-02 21:34 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 21:34 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: >> >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so >>>>>>>>> I think that is a pre-requisite. >>>>>>>> >>>>>>>> Not necessarily - it could be that the patch actually fixes the root >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the >>>>>>>> commands produce more data than we actually read. This would >>>>>>>> eventually lead to both data corruption and overflow somwhere in the >>>>>>>> SMC internals. If the original SMC error is interpreted as a read >>>>>>>> buffer overflow, then that problem should be fixed with this patch. >>>>>>>> >>>>>>> >>>>>>> Good point. >>>>>>> >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? >>>>>> >>>>>> The explanation I have there is that the (newer) SMC needs the >>>>>> application to read the 'no more bytes' or it will get confused. It >>>>>> makes sense, if the number of bytes to read is no longer specified. >>>>>> >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? >>>>> That might make sense. >>>> >>>> It also points at the possibility of a smaller patch to test, but I >>>> have not had the time to check this very deeply myself: >>>> >>> I like this patch much more than the previous patch. Chris, can you test it ? >> >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. >> > None, if I understand correctly and if the patch really fixes the root cause > of the problem. A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs. Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 21:34 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-02 21:34 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: >> >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: >> >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so >>>>>>>>> I think that is a pre-requisite. >>>>>>>> >>>>>>>> Not necessarily - it could be that the patch actually fixes the root >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the >>>>>>>> commands produce more data than we actually read. This would >>>>>>>> eventually lead to both data corruption and overflow somwhere in the >>>>>>>> SMC internals. If the original SMC error is interpreted as a read >>>>>>>> buffer overflow, then that problem should be fixed with this patch. >>>>>>>> >>>>>>> >>>>>>> Good point. >>>>>>> >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? >>>>>> >>>>>> The explanation I have there is that the (newer) SMC needs the >>>>>> application to read the 'no more bytes' or it will get confused. It >>>>>> makes sense, if the number of bytes to read is no longer specified. >>>>>> >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? >>>>> That might make sense. >>>> >>>> It also points at the possibility of a smaller patch to test, but I >>>> have not had the time to check this very deeply myself: >>>> >>> I like this patch much more than the previous patch. Chris, can you test it ? >> >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. >> > None, if I understand correctly and if the patch really fixes the root cause > of the problem. A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs. Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 21:34 ` [lm-sensors] " Chris Murphy @ 2013-10-02 23:32 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 23:32 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 03:34:41PM -0600, Chris Murphy wrote: > > On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: > >> > >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> > >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>>>>>>>> I think that is a pre-requisite. > >>>>>>>> > >>>>>>>> Not necessarily - it could be that the patch actually fixes the root > >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the > >>>>>>>> commands produce more data than we actually read. This would > >>>>>>>> eventually lead to both data corruption and overflow somwhere in the > >>>>>>>> SMC internals. If the original SMC error is interpreted as a read > >>>>>>>> buffer overflow, then that problem should be fixed with this patch. > >>>>>>>> > >>>>>>> > >>>>>>> Good point. > >>>>>>> > >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > >>>>>> > >>>>>> The explanation I have there is that the (newer) SMC needs the > >>>>>> application to read the 'no more bytes' or it will get confused. It > >>>>>> makes sense, if the number of bytes to read is no longer specified. > >>>>>> > >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > >>>>> That might make sense. > >>>> > >>>> It also points at the possibility of a smaller patch to test, but I > >>>> have not had the time to check this very deeply myself: > >>>> > >>> I like this patch much more than the previous patch. Chris, can you test it ? > >> > >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. > >> > > None, if I understand correctly and if the patch really fixes the root cause > > of the problem. > > A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs. > Yes, that could be a possible trigger. I thought it might be triggered by faster boot (as one gets with 3.10 and 3.11), but slow boot is just as likely. Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-02 23:32 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-02 23:32 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 02, 2013 at 03:34:41PM -0600, Chris Murphy wrote: > > On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote: > >> > >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote: > >> > >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote: > >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote: > >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote: > >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so > >>>>>>>>> I think that is a pre-requisite. > >>>>>>>> > >>>>>>>> Not necessarily - it could be that the patch actually fixes the root > >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the > >>>>>>>> commands produce more data than we actually read. This would > >>>>>>>> eventually lead to both data corruption and overflow somwhere in the > >>>>>>>> SMC internals. If the original SMC error is interpreted as a read > >>>>>>>> buffer overflow, then that problem should be fixed with this patch. > >>>>>>>> > >>>>>>> > >>>>>>> Good point. > >>>>>>> > >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ? > >>>>>> > >>>>>> The explanation I have there is that the (newer) SMC needs the > >>>>>> application to read the 'no more bytes' or it will get confused. It > >>>>>> makes sense, if the number of bytes to read is no longer specified. > >>>>>> > >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ? > >>>>> That might make sense. > >>>> > >>>> It also points at the possibility of a smaller patch to test, but I > >>>> have not had the time to check this very deeply myself: > >>>> > >>> I like this patch much more than the previous patch. Chris, can you test it ? > >> > >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model. > >> > > None, if I understand correctly and if the patch really fixes the root cause > > of the problem. > > A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs. > Yes, that could be a possible trigger. I thought it might be triggered by faster boot (as one gets with 3.10 and 3.11), but slow boot is just as likely. Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-02 17:24 ` [lm-sensors] " Henrik Rydberg @ 2013-10-07 23:42 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-07 23:42 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/02/2013 10:24 AM, Henrik Rydberg wrote: >>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] applesmc remedy take 2 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Tests by Chris suggests reading the status is enough for the problem > to go away, which is consistent with a change in the SMC interface, > where the number of bytes to read is no longer specified, but found > out by reading until end of data. > > Tested on a MacBookAir3,1, but the original problem has not been > reproduced. So, what should we do with this patch ? Apply it ? Thanks, Guenter > --- > drivers/hwmon/applesmc.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..c0ff350 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -230,6 +230,7 @@ static int send_argument(const char *key) > > static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > { > + u8 status, data; > int i; > > if (send_command(cmd) || send_argument(key)) { > @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > + /* This has no effect on newer (2012) SMCs */ > if (send_byte(len, APPLESMC_DATA_PORT)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > buffer[i] = inb(APPLESMC_DATA_PORT); > } > > + /* Read the data port until bit0 is cleared */ > + for (i = 0; i < 16; i++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (i) > + pr_warn("flushed %d bytes, last value is: %d\n", i, data); > + > return 0; > } > > ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-07 23:42 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-07 23:42 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On 10/02/2013 10:24 AM, Henrik Rydberg wrote: >>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] applesmc remedy take 2 > > Conjectured problem: there are remnant bytes ready on the data line > which corrupts the read after a failure. > > Remedy: assuming bit0 is the read valid line, try to flush it before > starting a new command. > > Tests by Chris suggests reading the status is enough for the problem > to go away, which is consistent with a change in the SMC interface, > where the number of bytes to read is no longer specified, but found > out by reading until end of data. > > Tested on a MacBookAir3,1, but the original problem has not been > reproduced. So, what should we do with this patch ? Apply it ? Thanks, Guenter > --- > drivers/hwmon/applesmc.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c > index 98814d1..c0ff350 100644 > --- a/drivers/hwmon/applesmc.c > +++ b/drivers/hwmon/applesmc.c > @@ -230,6 +230,7 @@ static int send_argument(const char *key) > > static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > { > + u8 status, data; > int i; > > if (send_command(cmd) || send_argument(key)) { > @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > return -EIO; > } > > + /* This has no effect on newer (2012) SMCs */ > if (send_byte(len, APPLESMC_DATA_PORT)) { > pr_warn("%.4s: read len fail\n", key); > return -EIO; > @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) > buffer[i] = inb(APPLESMC_DATA_PORT); > } > > + /* Read the data port until bit0 is cleared */ > + for (i = 0; i < 16; i++) { > + udelay(APPLESMC_MIN_WAIT); > + status = inb(APPLESMC_CMD_PORT); > + if (!(status & 0x01)) > + break; > + data = inb(APPLESMC_DATA_PORT); > + } > + if (i) > + pr_warn("flushed %d bytes, last value is: %d\n", i, data); > + > return 0; > } > > _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-07 23:42 ` [lm-sensors] " Guenter Roeck @ 2013-10-07 23:46 ` Chris Murphy -1 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-07 23:46 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On 10/02/2013 10:24 AM, Henrik Rydberg wrote: > >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 >> From: Henrik Rydberg <rydberg@euromail.se> >> Date: Wed, 2 Oct 2013 19:15:03 +0200 >> Subject: [PATCH] applesmc remedy take 2 >> >> Conjectured problem: there are remnant bytes ready on the data line >> which corrupts the read after a failure. >> >> Remedy: assuming bit0 is the read valid line, try to flush it before >> starting a new command. >> >> Tests by Chris suggests reading the status is enough for the problem >> to go away, which is consistent with a change in the SMC interface, >> where the number of bytes to read is no longer specified, but found >> out by reading until end of data. >> >> Tested on a MacBookAir3,1, but the original problem has not been >> reproduced. > > So, what should we do with this patch ? Apply it ? So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. Chris ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-07 23:46 ` Chris Murphy 0 siblings, 0 replies; 86+ messages in thread From: Chris Murphy @ 2013-10-07 23:46 UTC (permalink / raw) To: Guenter Roeck Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On 10/02/2013 10:24 AM, Henrik Rydberg wrote: > >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 >> From: Henrik Rydberg <rydberg@euromail.se> >> Date: Wed, 2 Oct 2013 19:15:03 +0200 >> Subject: [PATCH] applesmc remedy take 2 >> >> Conjectured problem: there are remnant bytes ready on the data line >> which corrupts the read after a failure. >> >> Remedy: assuming bit0 is the read valid line, try to flush it before >> starting a new command. >> >> Tests by Chris suggests reading the status is enough for the problem >> to go away, which is consistent with a change in the SMC interface, >> where the number of bytes to read is no longer specified, but found >> out by reading until end of data. >> >> Tested on a MacBookAir3,1, but the original problem has not been >> reproduced. > > So, what should we do with this patch ? Apply it ? So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. Chris _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-07 23:46 ` [lm-sensors] " Chris Murphy @ 2013-10-08 15:48 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-08 15:48 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Mon, Oct 07, 2013 at 05:46:55PM -0600, Chris Murphy wrote: > > On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On 10/02/2013 10:24 AM, Henrik Rydberg wrote: > > > >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > >> From: Henrik Rydberg <rydberg@euromail.se> > >> Date: Wed, 2 Oct 2013 19:15:03 +0200 > >> Subject: [PATCH] applesmc remedy take 2 > >> > >> Conjectured problem: there are remnant bytes ready on the data line > >> which corrupts the read after a failure. > >> > >> Remedy: assuming bit0 is the read valid line, try to flush it before > >> starting a new command. > >> > >> Tests by Chris suggests reading the status is enough for the problem > >> to go away, which is consistent with a change in the SMC interface, > >> where the number of bytes to read is no longer specified, but found > >> out by reading until end of data. > >> > >> Tested on a MacBookAir3,1, but the original problem has not been > >> reproduced. > > > > So, what should we do with this patch ? Apply it ? > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > Seems to me we should apply it. Henrik, what do you think ? Did you have time for additional testing ? If you think we should apply it, please send me a signed patch. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-08 15:48 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-08 15:48 UTC (permalink / raw) To: Chris Murphy Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Mon, Oct 07, 2013 at 05:46:55PM -0600, Chris Murphy wrote: > > On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On 10/02/2013 10:24 AM, Henrik Rydberg wrote: > > > >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001 > >> From: Henrik Rydberg <rydberg@euromail.se> > >> Date: Wed, 2 Oct 2013 19:15:03 +0200 > >> Subject: [PATCH] applesmc remedy take 2 > >> > >> Conjectured problem: there are remnant bytes ready on the data line > >> which corrupts the read after a failure. > >> > >> Remedy: assuming bit0 is the read valid line, try to flush it before > >> starting a new command. > >> > >> Tests by Chris suggests reading the status is enough for the problem > >> to go away, which is consistent with a change in the SMC interface, > >> where the number of bytes to read is no longer specified, but found > >> out by reading until end of data. > >> > >> Tested on a MacBookAir3,1, but the original problem has not been > >> reproduced. > > > > So, what should we do with this patch ? Apply it ? > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > Seems to me we should apply it. Henrik, what do you think ? Did you have time for additional testing ? If you think we should apply it, please send me a signed patch. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-08 15:48 ` [lm-sensors] " Guenter Roeck @ 2013-10-08 16:29 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-08 16:29 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org Hi Guenter, > > > So, what should we do with this patch ? Apply it ? > > > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > > > > Seems to me we should apply it. > > Henrik, > > what do you think ? Did you have time for additional testing ? > If you think we should apply it, please send me a signed patch. I want to make one more test tomorrow, then I'll send a proper patch. Thanks! Henrik ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-08 16:29 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-08 16:29 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org Hi Guenter, > > > So, what should we do with this patch ? Apply it ? > > > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > > > > Seems to me we should apply it. > > Henrik, > > what do you think ? Did you have time for additional testing ? > If you think we should apply it, please send me a signed patch. I want to make one more test tomorrow, then I'll send a proper patch. Thanks! Henrik _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-08 16:29 ` [lm-sensors] " Henrik Rydberg @ 2013-10-08 16:29 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-08 16:29 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 08, 2013 at 06:29:17PM +0200, Henrik Rydberg wrote: > Hi Guenter, > > > > > So, what should we do with this patch ? Apply it ? > > > > > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > > > > > > > Seems to me we should apply it. > > > > Henrik, > > > > what do you think ? Did you have time for additional testing ? > > If you think we should apply it, please send me a signed patch. > > I want to make one more test tomorrow, then I'll send a proper patch. Thanks! > Ok, great. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-08 16:29 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-08 16:29 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Tue, Oct 08, 2013 at 06:29:17PM +0200, Henrik Rydberg wrote: > Hi Guenter, > > > > > So, what should we do with this patch ? Apply it ? > > > > > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD. > > > > > > > Seems to me we should apply it. > > > > Henrik, > > > > what do you think ? Did you have time for additional testing ? > > If you think we should apply it, please send me a signed patch. > > I want to make one more test tomorrow, then I'll send a proper patch. Thanks! > Ok, great. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-08 16:29 ` [lm-sensors] " Guenter Roeck @ 2013-10-09 8:29 ` Henrik Rydberg -1 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-09 8:29 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org Hi Guenter, > > I want to make one more test tomorrow, then I'll send a proper patch. Thanks! > > > Ok, great. And here it is, with compiler warning fixed. Thanks everybody, Henrik >From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Wed, 2 Oct 2013 19:15:03 +0200 Subject: [PATCH] hwmon: (applesmc) Always read until end of data The crash reported and investigated in commit 5f4513 turned out to be caused by a change to the read interface on newer (2012) SMCs. Tests by Chris show that simply reading the data valid line is enough for the problem to go away. Additional tests show that the newer SMCs no longer wait for the number of requested bytes, but start sending data right away. Apparently the number of bytes to read is no longer specified as before, but instead found out by reading until end of data. Failure to read until end of data confuses the state machine, which eventually causes the crash. As a remedy, assuming bit0 is the read valid line, make sure there is nothing more to read before leaving the read function. Tested to resolve the original problem, and runtested on MBA3,1, MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on machines before 2012. Tested-by: Chris Murphy <chris@cmurf.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> --- drivers/hwmon/applesmc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..3288f13 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -230,6 +230,7 @@ static int send_argument(const char *key) static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) { + u8 status, data = 0; int i; if (send_command(cmd) || send_argument(key)) { @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } + /* This has no effect on newer (2012) SMCs */ if (send_byte(len, APPLESMC_DATA_PORT)) { pr_warn("%.4s: read len fail\n", key); return -EIO; @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) buffer[i] = inb(APPLESMC_DATA_PORT); } + /* Read the data port until bit0 is cleared */ + for (i = 0; i < 16; i++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (i) + pr_warn("flushed %d bytes, last value is: %d\n", i, data); + return 0; } -- 1.8.3.4 ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-09 8:29 ` Henrik Rydberg 0 siblings, 0 replies; 86+ messages in thread From: Henrik Rydberg @ 2013-10-09 8:29 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org Hi Guenter, > > I want to make one more test tomorrow, then I'll send a proper patch. Thanks! > > > Ok, great. And here it is, with compiler warning fixed. Thanks everybody, Henrik From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001 From: Henrik Rydberg <rydberg@euromail.se> Date: Wed, 2 Oct 2013 19:15:03 +0200 Subject: [PATCH] hwmon: (applesmc) Always read until end of data The crash reported and investigated in commit 5f4513 turned out to be caused by a change to the read interface on newer (2012) SMCs. Tests by Chris show that simply reading the data valid line is enough for the problem to go away. Additional tests show that the newer SMCs no longer wait for the number of requested bytes, but start sending data right away. Apparently the number of bytes to read is no longer specified as before, but instead found out by reading until end of data. Failure to read until end of data confuses the state machine, which eventually causes the crash. As a remedy, assuming bit0 is the read valid line, make sure there is nothing more to read before leaving the read function. Tested to resolve the original problem, and runtested on MBA3,1, MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on machines before 2012. Tested-by: Chris Murphy <chris@cmurf.com> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> --- drivers/hwmon/applesmc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c index 98814d1..3288f13 100644 --- a/drivers/hwmon/applesmc.c +++ b/drivers/hwmon/applesmc.c @@ -230,6 +230,7 @@ static int send_argument(const char *key) static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) { + u8 status, data = 0; int i; if (send_command(cmd) || send_argument(key)) { @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) return -EIO; } + /* This has no effect on newer (2012) SMCs */ if (send_byte(len, APPLESMC_DATA_PORT)) { pr_warn("%.4s: read len fail\n", key); return -EIO; @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len) buffer[i] = inb(APPLESMC_DATA_PORT); } + /* Read the data port until bit0 is cleared */ + for (i = 0; i < 16; i++) { + udelay(APPLESMC_MIN_WAIT); + status = inb(APPLESMC_CMD_PORT); + if (!(status & 0x01)) + break; + data = inb(APPLESMC_DATA_PORT); + } + if (i) + pr_warn("flushed %d bytes, last value is: %d\n", i, data); + return 0; } -- 1.8.3.4 _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-10-09 8:29 ` [lm-sensors] " Henrik Rydberg @ 2013-10-09 16:52 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-09 16:52 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 09, 2013 at 10:29:39AM +0200, Henrik Rydberg wrote: > From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] hwmon: (applesmc) Always read until end of data > > The crash reported and investigated in commit 5f4513 turned out to be > caused by a change to the read interface on newer (2012) SMCs. > > Tests by Chris show that simply reading the data valid line is enough > for the problem to go away. Additional tests show that the newer SMCs > no longer wait for the number of requested bytes, but start sending > data right away. Apparently the number of bytes to read is no longer > specified as before, but instead found out by reading until end of > data. Failure to read until end of data confuses the state machine, > which eventually causes the crash. > > As a remedy, assuming bit0 is the read valid line, make sure there is > nothing more to read before leaving the read function. > > Tested to resolve the original problem, and runtested on MBA3,1, > MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on > machines before 2012. > > Tested-by: Chris Murphy <chris@cmurf.com> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Applied. I'll do my usual sanity testing and send it to Linus either tomorrow or Friday. Thanks, Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-10-09 16:52 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-10-09 16:52 UTC (permalink / raw) To: Henrik Rydberg Cc: Chris Murphy, Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org On Wed, Oct 09, 2013 at 10:29:39AM +0200, Henrik Rydberg wrote: > From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001 > From: Henrik Rydberg <rydberg@euromail.se> > Date: Wed, 2 Oct 2013 19:15:03 +0200 > Subject: [PATCH] hwmon: (applesmc) Always read until end of data > > The crash reported and investigated in commit 5f4513 turned out to be > caused by a change to the read interface on newer (2012) SMCs. > > Tests by Chris show that simply reading the data valid line is enough > for the problem to go away. Additional tests show that the newer SMCs > no longer wait for the number of requested bytes, but start sending > data right away. Apparently the number of bytes to read is no longer > specified as before, but instead found out by reading until end of > data. Failure to read until end of data confuses the state machine, > which eventually causes the crash. > > As a remedy, assuming bit0 is the read valid line, make sure there is > nothing more to read before leaving the read function. > > Tested to resolve the original problem, and runtested on MBA3,1, > MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on > machines before 2012. > > Tested-by: Chris Murphy <chris@cmurf.com> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Applied. I'll do my usual sanity testing and send it to Linus either tomorrow or Friday. Thanks, Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: applesmc oops in 3.10/3.11 2013-09-27 16:21 ` [lm-sensors] " Josh Boyer @ 2013-09-27 18:01 ` Guenter Roeck -1 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 18:01 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >> > > This suggests that initialization may be attempted more than once. The key cache > >> > > is allocated only once, but the number of keys is read for each attempt. > >> > > > >> > > No idea if that can happen, but if the number of keys can increase after > >> > > the first initialization attempt you would have an explanation for the crash. > >> > > >> > Good idea, and easy enough to test with the patch below. > >> > > >> Should we apply this patch even though it may not solve the specific problem ? > > > > Yes, why not - it certainly won't hurt. I am running it right now, so > > it is at least run-tested. > > > >> Again, not sure if the key count can change, but the current code is at the very > >> least inconsistent, as it keeps reading the key count without updating or > >> verifying the cache size. > > > > Yes - I agree that the error state is far-fetched, but it is hard to > > see any other logical explanation. There is of course always the > > possibility that the problem is somewhere else completely. > > > > Proper patch attached. > > > > Thanks, > > Henrik > > > > --- > > > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > > From: Henrik Rydberg <rydberg@euromail.se> > > Date: Thu, 26 Sep 2013 08:33:16 +0200 > > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > > Guenter pointed at the initialization problem fixed below. The patch > > has not been verified to fix the crash, but should be applied > > regardless. > > > > Reported-by: <jwboyer@fedoraproject.org> > > Suggested-by: Guenter Roeck <linux@roeck-us.net> > > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > > --- > > drivers/hwmon/applesmc.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > Thanks for the quick reply. I'll get this rolled into our kernels soon. > Patch is now upstream (commit 5f4513864). Guenter ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [lm-sensors] applesmc oops in 3.10/3.11 @ 2013-09-27 18:01 ` Guenter Roeck 0 siblings, 0 replies; 86+ messages in thread From: Guenter Roeck @ 2013-09-27 18:01 UTC (permalink / raw) To: Josh Boyer Cc: Henrik Rydberg, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote: > On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote: > >> > > This suggests that initialization may be attempted more than once. The key cache > >> > > is allocated only once, but the number of keys is read for each attempt. > >> > > > >> > > No idea if that can happen, but if the number of keys can increase after > >> > > the first initialization attempt you would have an explanation for the crash. > >> > > >> > Good idea, and easy enough to test with the patch below. > >> > > >> Should we apply this patch even though it may not solve the specific problem ? > > > > Yes, why not - it certainly won't hurt. I am running it right now, so > > it is at least run-tested. > > > >> Again, not sure if the key count can change, but the current code is at the very > >> least inconsistent, as it keeps reading the key count without updating or > >> verifying the cache size. > > > > Yes - I agree that the error state is far-fetched, but it is hard to > > see any other logical explanation. There is of course always the > > possibility that the problem is somewhere else completely. > > > > Proper patch attached. > > > > Thanks, > > Henrik > > > > --- > > > > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001 > > From: Henrik Rydberg <rydberg@euromail.se> > > Date: Thu, 26 Sep 2013 08:33:16 +0200 > > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding > > > > After reports from Chris and Josh Boyer of a rare crash in applesmc, > > Guenter pointed at the initialization problem fixed below. The patch > > has not been verified to fix the crash, but should be applied > > regardless. > > > > Reported-by: <jwboyer@fedoraproject.org> > > Suggested-by: Guenter Roeck <linux@roeck-us.net> > > Signed-off-by: Henrik Rydberg <rydberg@euromail.se> > > --- > > drivers/hwmon/applesmc.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > Thanks for the quick reply. I'll get this rolled into our kernels soon. > Patch is now upstream (commit 5f4513864). Guenter _______________________________________________ lm-sensors mailing list lm-sensors@lm-sensors.org http://lists.lm-sensors.org/mailman/listinfo/lm-sensors ^ permalink raw reply [flat|nested] 86+ messages in thread
end of thread, other threads:[~2013-10-09 16:53 UTC | newest] Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-09-25 19:06 applesmc oops in 3.10/3.11 Josh Boyer 2013-09-25 19:06 ` [lm-sensors] " Josh Boyer 2013-09-25 19:56 ` Guenter Roeck 2013-09-25 19:56 ` [lm-sensors] " Guenter Roeck 2013-09-25 21:48 ` Henrik Rydberg 2013-09-25 21:48 ` [lm-sensors] " Henrik Rydberg 2013-09-25 22:08 ` Guenter Roeck 2013-09-25 22:08 ` [lm-sensors] " Guenter Roeck 2013-09-26 6:34 ` Henrik Rydberg 2013-09-26 6:34 ` [lm-sensors] " Henrik Rydberg 2013-09-26 10:36 ` Guenter Roeck 2013-09-26 10:36 ` [lm-sensors] " Guenter Roeck 2013-09-26 11:13 ` Henrik Rydberg 2013-09-26 11:13 ` [lm-sensors] " Henrik Rydberg 2013-09-26 10:53 ` Guenter Roeck 2013-09-26 10:53 ` [lm-sensors] " Guenter Roeck 2013-09-26 11:11 ` Henrik Rydberg 2013-09-26 11:11 ` [lm-sensors] " Henrik Rydberg 2013-09-27 16:21 ` Josh Boyer 2013-09-27 16:21 ` [lm-sensors] " Josh Boyer 2013-09-27 17:12 ` Guenter Roeck 2013-09-27 17:12 ` [lm-sensors] " Guenter Roeck 2013-09-27 17:41 ` Chris Murphy 2013-09-27 17:41 ` [lm-sensors] " Chris Murphy 2013-09-27 17:59 ` Guenter Roeck 2013-09-27 17:59 ` [lm-sensors] " Guenter Roeck 2013-09-27 18:03 ` Chris Murphy 2013-09-27 18:03 ` [lm-sensors] " Chris Murphy 2013-09-27 23:33 ` Guenter Roeck 2013-09-27 23:33 ` [lm-sensors] " Guenter Roeck 2013-10-01 1:57 ` Chris Murphy 2013-10-01 1:57 ` [lm-sensors] " Chris Murphy 2013-10-01 3:37 ` Guenter Roeck 2013-10-01 3:37 ` [lm-sensors] " Guenter Roeck 2013-10-01 10:55 ` Henrik Rydberg 2013-10-01 10:55 ` [lm-sensors] " Henrik Rydberg 2013-10-01 15:19 ` Guenter Roeck 2013-10-01 15:19 ` [lm-sensors] " Guenter Roeck 2013-10-01 15:33 ` Chris Murphy 2013-10-01 15:33 ` [lm-sensors] " Chris Murphy 2013-10-01 16:24 ` Guenter Roeck 2013-10-01 16:24 ` [lm-sensors] " Guenter Roeck 2013-10-02 1:09 ` Chris Murphy 2013-10-02 1:09 ` [lm-sensors] " Chris Murphy 2013-10-02 3:51 ` Guenter Roeck 2013-10-02 3:51 ` [lm-sensors] " Guenter Roeck 2013-10-02 3:55 ` Chris Murphy 2013-10-02 3:55 ` [lm-sensors] " Chris Murphy 2013-10-02 4:02 ` Guenter Roeck 2013-10-02 4:02 ` [lm-sensors] " Guenter Roeck 2013-10-02 9:53 ` Henrik Rydberg 2013-10-02 9:53 ` [lm-sensors] " Henrik Rydberg 2013-10-02 13:30 ` Guenter Roeck 2013-10-02 13:30 ` [lm-sensors] " Guenter Roeck 2013-10-02 16:34 ` Henrik Rydberg 2013-10-02 16:34 ` [lm-sensors] " Henrik Rydberg 2013-10-02 16:47 ` Guenter Roeck 2013-10-02 16:47 ` [lm-sensors] " Guenter Roeck 2013-10-02 17:24 ` Henrik Rydberg 2013-10-02 17:24 ` [lm-sensors] " Henrik Rydberg 2013-10-02 18:02 ` Guenter Roeck 2013-10-02 18:02 ` [lm-sensors] " Guenter Roeck 2013-10-02 18:33 ` Chris Murphy 2013-10-02 18:33 ` [lm-sensors] " Chris Murphy 2013-10-02 20:59 ` Guenter Roeck 2013-10-02 20:59 ` [lm-sensors] " Guenter Roeck 2013-10-02 21:34 ` Chris Murphy 2013-10-02 21:34 ` [lm-sensors] " Chris Murphy 2013-10-02 23:32 ` Guenter Roeck 2013-10-02 23:32 ` [lm-sensors] " Guenter Roeck 2013-10-07 23:42 ` Guenter Roeck 2013-10-07 23:42 ` [lm-sensors] " Guenter Roeck 2013-10-07 23:46 ` Chris Murphy 2013-10-07 23:46 ` [lm-sensors] " Chris Murphy 2013-10-08 15:48 ` Guenter Roeck 2013-10-08 15:48 ` [lm-sensors] " Guenter Roeck 2013-10-08 16:29 ` Henrik Rydberg 2013-10-08 16:29 ` [lm-sensors] " Henrik Rydberg 2013-10-08 16:29 ` Guenter Roeck 2013-10-08 16:29 ` [lm-sensors] " Guenter Roeck 2013-10-09 8:29 ` Henrik Rydberg 2013-10-09 8:29 ` [lm-sensors] " Henrik Rydberg 2013-10-09 16:52 ` Guenter Roeck 2013-10-09 16:52 ` [lm-sensors] " Guenter Roeck 2013-09-27 18:01 ` Guenter Roeck 2013-09-27 18:01 ` [lm-sensors] " Guenter Roeck
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.