All of lore.kernel.org
 help / color / mirror / Atom feed
* applesmc oops in 3.10/3.11
@ 2013-09-25 19:06 ` Josh Boyer
  0 siblings, 0 replies; 86+ messages in thread
From: Josh Boyer @ 2013-09-25 19:06 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: khali, Guenter Roeck, lm-sensors, Linux-Kernel@Vger. Kernel. Org,
	bugzilla

Hi All,

Chris has reported[1] an issue with the applesmc driver in the 3.10
and 3.11 kernels.  This seems to be a bit transient, but the oops is
consistent across those releases.  This is with a MacBook Pro 4,1.
The backtrace is below.

Any ideas on this one?

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719

BUG: unable to handle kernel paging request at ffff880322aed005
IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
PGD 1fe5067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
iTCO_vendor_support microcode input_polldev snd_hda_intel
snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
loop
CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
MBP41.88Z.00C1.B03.0802271651 02/27/08
task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
RIP: 0010:[<ffffffffa05dd345>]  [<ffffffffa05dd345>]
applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
RSP: 0018:ffff8801343ffa18  EFLAGS: 00010286
RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
FS:  00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
Stack:
 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
 ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
 ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
Call Trace:
 [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
 [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
 [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
 [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
 [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
 [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
 [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
 [<ffffffff813f1f77>] driver_probe_device+0x87/0x390
 [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
 [<ffffffff813f22bb>] __device_attach+0x3b/0x40
 [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
 [<ffffffff813f1e78>] device_attach+0x88/0xa0
 [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
 [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
 [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
 [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
 [<ffffffffa05e3000>] ? 0xffffffffa05e2fff
 [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
 [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
 [<ffffffff81052483>] ? set_memory_nx+0x43/0x50
 [<ffffffff810cc58d>] load_module+0x1bbd/0x2660
 [<ffffffff810c8890>] ? store_uevent+0x40/0x40
 [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
 [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
RIP  [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
 RSP <ffff8801343ffa18>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-25 19:06 ` Josh Boyer
  0 siblings, 0 replies; 86+ messages in thread
From: Josh Boyer @ 2013-09-25 19:06 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: khali, Guenter Roeck, lm-sensors, Linux-Kernel@Vger. Kernel. Org,
	bugzilla

Hi All,

Chris has reported[1] an issue with the applesmc driver in the 3.10
and 3.11 kernels.  This seems to be a bit transient, but the oops is
consistent across those releases.  This is with a MacBook Pro 4,1.
The backtrace is below.

Any ideas on this one?

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id\x1011719

BUG: unable to handle kernel paging request at ffff880322aed005
IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
PGD 1fe5067 PUD 0
Oops: 0000 [#1] SMP
Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
iTCO_vendor_support microcode input_polldev snd_hda_intel
snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
loop
CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
MBP41.88Z.00C1.B03.0802271651 02/27/08
task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
RIP: 0010:[<ffffffffa05dd345>]  [<ffffffffa05dd345>]
applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
RSP: 0018:ffff8801343ffa18  EFLAGS: 00010286
RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
FS:  00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
Stack:
 000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
 ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
 ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
Call Trace:
 [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
 [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
 [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
 [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
 [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
 [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
 [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
 [<ffffffff813f1f77>] driver_probe_device+0x87/0x390
 [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
 [<ffffffff813f22bb>] __device_attach+0x3b/0x40
 [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
 [<ffffffff813f1e78>] device_attach+0x88/0xa0
 [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
 [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
 [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
 [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
 [<ffffffffa05e3000>] ? 0xffffffffa05e2fff
 [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
 [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
 [<ffffffff81052483>] ? set_memory_nx+0x43/0x50
 [<ffffffff810cc58d>] load_module+0x1bbd/0x2660
 [<ffffffff810c8890>] ? store_uevent+0x40/0x40
 [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
 [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
RIP  [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
 RSP <ffff8801343ffa18>

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-25 19:06 ` [lm-sensors] " Josh Boyer
@ 2013-09-25 19:56   ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-25 19:56 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> Hi All,
> 
> Chris has reported[1] an issue with the applesmc driver in the 3.10
> and 3.11 kernels.  This seems to be a bit transient, but the oops is
> consistent across those releases.  This is with a MacBook Pro 4,1.
> The backtrace is below.
> 
> Any ideas on this one?
> 
Some of the bug reports suggest that there are related messages in at least some
of the kernel logs.

Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail

This suggests that initialization may be attempted more than once. The key cache
is allocated only once, but the number of keys is read for each attempt.

No idea if that can happen, but if the number of keys can increase after
the first initialization attempt you would have an explanation for the crash.

Guenter


> josh
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1011719
> 
> BUG: unable to handle kernel paging request at ffff880322aed005
> IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> PGD 1fe5067 PUD 0
> Oops: 0000 [#1] SMP
> Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
> uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
> iTCO_vendor_support microcode input_polldev snd_hda_intel
> snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
> videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
> snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
> snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
> mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
> xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
> ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
> i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
> loop
> CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
> Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
> MBP41.88Z.00C1.B03.0802271651 02/27/08
> task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
> RIP: 0010:[<ffffffffa05dd345>]  [<ffffffffa05dd345>]
> applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> RSP: 0018:ffff8801343ffa18  EFLAGS: 00010286
> RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
> RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
> RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
> R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
> R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
> FS:  00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
> Stack:
>  000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
>  ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
>  ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
> Call Trace:
>  [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
>  [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
>  [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
>  [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
>  [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
>  [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
>  [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
>  [<ffffffff813f1f77>] driver_probe_device+0x87/0x390
>  [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
>  [<ffffffff813f22bb>] __device_attach+0x3b/0x40
>  [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
>  [<ffffffff813f1e78>] device_attach+0x88/0xa0
>  [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
>  [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
>  [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
>  [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
>  [<ffffffffa05e3000>] ? 0xffffffffa05e2fff
>  [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
>  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
>  [<ffffffff81052483>] ? set_memory_nx+0x43/0x50
>  [<ffffffff810cc58d>] load_module+0x1bbd/0x2660
>  [<ffffffff810c8890>] ? store_uevent+0x40/0x40
>  [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
>  [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
> Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
> e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
> 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
> RIP  [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
>  RSP <ffff8801343ffa18>
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-25 19:56   ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-25 19:56 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> Hi All,
> 
> Chris has reported[1] an issue with the applesmc driver in the 3.10
> and 3.11 kernels.  This seems to be a bit transient, but the oops is
> consistent across those releases.  This is with a MacBook Pro 4,1.
> The backtrace is below.
> 
> Any ideas on this one?
> 
Some of the bug reports suggest that there are related messages in at least some
of the kernel logs.

Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail

This suggests that initialization may be attempted more than once. The key cache
is allocated only once, but the number of keys is read for each attempt.

No idea if that can happen, but if the number of keys can increase after
the first initialization attempt you would have an explanation for the crash.

Guenter


> josh
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id\x1011719
> 
> BUG: unable to handle kernel paging request at ffff880322aed005
> IP: [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> PGD 1fe5067 PUD 0
> Oops: 0000 [#1] SMP
> Modules linked in: b43 coretemp kvm_intel kvm joydev bcma mac80211
> uvcvideo snd_hda_codec_realtek applesmc(+) iTCO_wdt cfg80211
> iTCO_vendor_support microcode input_polldev snd_hda_intel
> snd_hda_codec i2c_i801 videobuf2_vmalloc videobuf2_memops
> videobuf2_core videodev snd_hwdep snd_seq bcm5974 btusb hid_appleir
> snd_seq_device media bluetooth snd_pcm lpc_ich mfd_core rfkill
> snd_page_alloc snd_timer snd apple_bl soundcore shpchp acpi_cpufreq
> mperf nfsd auth_rpcgss nfs_acl lockd isofs squashfs btrfs libcrc32c
> xor zlib_deflate raid6_pq usb_storage nouveau firewire_ohci
> ata_generic pata_acpi firewire_core crc_itu_t sky2 mxm_wmi wmi
> i2c_algo_bit ssb drm_kms_helper mmc_core ttm drm i2c_core video sunrpc
> loop
> CPU: 1 PID: 492 Comm: systemd-udevd Not tainted 3.11.0-300.fc20.x86_64 #1
> Hardware name: Apple Inc. MacBookPro4,1/Mac-F42C89C8, BIOS
> MBP41.88Z.00C1.B03.0802271651 02/27/08
> task: ffff880132a407a0 ti: ffff8801343fe000 task.ti: ffff8801343fe000
> RIP: 0010:[<ffffffffa05dd345>]  [<ffffffffa05dd345>]
> applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
> RSP: 0018:ffff8801343ffa18  EFLAGS: 00010286
> RAX: 0000000026000000 RBX: ffff880322aed000 RCX: ffff8801343ffaf4
> RDX: 0000000072000000 RSI: ffffffffa05df06e RDI: 0000000026000000
> RBP: ffff8801343ffa38 R08: ffff8801343fe000 R09: ffff880132a6c100
> R10: 0000000000000000 R11: 0000000000014180 R12: 0000000026000000
> R13: 000000004c000001 R14: ffffffffa05df06e R15: ffff8801343ffa84
> FS:  00007f6aac863880(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff880322aed005 CR3: 00000001340fa000 CR4: 00000000000007e0
> Stack:
>  000000000004deef ffffffffffffff10 0000000026000000 0000000000000000
>  ffff8801343ffa70 ffffffffa05dd4f1 0000000000000032 0000000000000001
>  ffffffffa05df06e ffff880134be3810 ffffffffa05e0660 ffff8801343ffaa8
> Call Trace:
>  [<ffffffffa05dd4f1>] applesmc_get_lower_bound+0x51/0xa0 [applesmc]
>  [<ffffffffa05dd563>] applesmc_get_entry_by_key+0x23/0xc0 [applesmc]
>  [<ffffffffa05dd9f9>] applesmc_read_key+0x19/0x70 [applesmc]
>  [<ffffffffa05de08f>] applesmc_init_smcreg+0x8f/0x2d0 [applesmc]
>  [<ffffffffa05de2e2>] applesmc_probe+0x12/0x30 [applesmc]
>  [<ffffffff813f41dc>] platform_drv_probe+0x3c/0x70
>  [<ffffffff813f18b2>] ? driver_sysfs_add+0x82/0xb0
>  [<ffffffff813f1f77>] driver_probe_device+0x87/0x390
>  [<ffffffff813f2280>] ? driver_probe_device+0x390/0x390
>  [<ffffffff813f22bb>] __device_attach+0x3b/0x40
>  [<ffffffff813eff73>] bus_for_each_drv+0x63/0xa0
>  [<ffffffff813f1e78>] device_attach+0x88/0xa0
>  [<ffffffff813f11e8>] bus_probe_device+0x98/0xc0
>  [<ffffffff813eeeb4>] device_add+0x4c4/0x7a0
>  [<ffffffff813f3cb1>] platform_device_add+0xd1/0x2d0
>  [<ffffffff813f4430>] platform_device_register_full+0xe0/0x120
>  [<ffffffffa05e3000>] ? 0xffffffffa05e2fff
>  [<ffffffffa05e3098>] applesmc_init+0x98/0x1000 [applesmc]
>  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
>  [<ffffffff81052483>] ? set_memory_nx+0x43/0x50
>  [<ffffffff810cc58d>] load_module+0x1bbd/0x2660
>  [<ffffffff810c8890>] ? store_uevent+0x40/0x40
>  [<ffffffff810cd1a6>] SyS_finit_module+0x86/0xb0
>  [<ffffffff81652cd9>] system_call_fastpath+0x16/0x1b
> Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 63 c7 48 8d 14 40 48 89
> e5 41 54 41 89 fc 53 48 8d 1c 90 48 83 ec 10 48 03 1d 03 33 00 00 <80>
> 7b 05 00 48 89 d8 74 12 48 83 c4 10 5b 41 5c 5d c3 66 0f 1f
> RIP  [<ffffffffa05dd345>] applesmc_get_entry_by_index+0x25/0xf0 [applesmc]
>  RSP <ffff8801343ffa18>
> 

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-25 19:56   ` [lm-sensors] " Guenter Roeck
@ 2013-09-25 21:48     ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-25 21:48 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > Hi All,
> > 
> > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > and 3.11 kernels.  This seems to be a bit transient, but the oops is
> > consistent across those releases.  This is with a MacBook Pro 4,1.
> > The backtrace is below.
> > 
> > Any ideas on this one?
> > 
> Some of the bug reports suggest that there are related messages in at least some
> of the kernel logs.
> 
> Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
> 
> This suggests that initialization may be attempted more than once. The key cache
> is allocated only once, but the number of keys is read for each attempt.
> 
> No idea if that can happen, but if the number of keys can increase after
> the first initialization attempt you would have an explanation for the crash.

Good idea, and easy enough to test with the patch below.

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..d962c4d 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
 {
        struct applesmc_registers *s = &smcreg;
        bool left_light_sensor, right_light_sensor;
+       unsigned int count;
        u8 tmp[1];
        int ret;
 
        if (s->init_complete)
                return 0;
 
-       ret = read_register_count(&s->key_count);
+       ret = read_register_count(&count);
        if (ret)
                return ret;
 
+       if (s->cache && s->key_count != count) {
+               pr_warn("key count changed from %d to %d\n", s->key_count, count);
+               kfree(s->cache);
+               s->cache = NULL;
+       }
+       s->key_count = count;
+
        if (!s->cache)
                s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
        if (!s->cache)
 
Thanks,
Henrik

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-25 21:48     ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-25 21:48 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > Hi All,
> > 
> > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > and 3.11 kernels.  This seems to be a bit transient, but the oops is
> > consistent across those releases.  This is with a MacBook Pro 4,1.
> > The backtrace is below.
> > 
> > Any ideas on this one?
> > 
> Some of the bug reports suggest that there are related messages in at least some
> of the kernel logs.
> 
> Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
> 
> This suggests that initialization may be attempted more than once. The key cache
> is allocated only once, but the number of keys is read for each attempt.
> 
> No idea if that can happen, but if the number of keys can increase after
> the first initialization attempt you would have an explanation for the crash.

Good idea, and easy enough to test with the patch below.

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..d962c4d 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
 {
        struct applesmc_registers *s = &smcreg;
        bool left_light_sensor, right_light_sensor;
+       unsigned int count;
        u8 tmp[1];
        int ret;
 
        if (s->init_complete)
                return 0;
 
-       ret = read_register_count(&s->key_count);
+       ret = read_register_count(&count);
        if (ret)
                return ret;
 
+       if (s->cache && s->key_count != count) {
+               pr_warn("key count changed from %d to %d\n", s->key_count, count);
+               kfree(s->cache);
+               s->cache = NULL;
+       }
+       s->key_count = count;
+
        if (!s->cache)
                s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
        if (!s->cache)
 
Thanks,
Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-25 21:48     ` [lm-sensors] " Henrik Rydberg
@ 2013-09-25 22:08       ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-25 22:08 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 11:48:07PM +0200, Henrik Rydberg wrote:
> On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > > Hi All,
> > > 
> > > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > > and 3.11 kernels.  This seems to be a bit transient, but the oops is
> > > consistent across those releases.  This is with a MacBook Pro 4,1.
> > > The backtrace is below.
> > > 
> > > Any ideas on this one?
> > > 
> > Some of the bug reports suggest that there are related messages in at least some
> > of the kernel logs.
> > 
> > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
> > 
> > This suggests that initialization may be attempted more than once. The key cache
> > is allocated only once, but the number of keys is read for each attempt.
> > 
> > No idea if that can happen, but if the number of keys can increase after
> > the first initialization attempt you would have an explanation for the crash.
> 
> Good idea, and easy enough to test with the patch below.
> 
Should we apply this patch even though it may not solve the specific problem ?

Again, not sure if the key count can change, but the current code is at the very
least inconsistent, as it keeps reading the key count without updating or
verifying the cache size.

Thanks,
Guenter

> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..d962c4d 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
>  {
>         struct applesmc_registers *s = &smcreg;
>         bool left_light_sensor, right_light_sensor;
> +       unsigned int count;
>         u8 tmp[1];
>         int ret;
>  
>         if (s->init_complete)
>                 return 0;
>  
> -       ret = read_register_count(&s->key_count);
> +       ret = read_register_count(&count);
>         if (ret)
>                 return ret;
>  
> +       if (s->cache && s->key_count != count) {
> +               pr_warn("key count changed from %d to %d\n", s->key_count, count);
> +               kfree(s->cache);
> +               s->cache = NULL;
> +       }
> +       s->key_count = count;
> +
>         if (!s->cache)
>                 s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
>         if (!s->cache)
>  
> Thanks,
> Henrik
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-25 22:08       ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-25 22:08 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On Wed, Sep 25, 2013 at 11:48:07PM +0200, Henrik Rydberg wrote:
> On Wed, Sep 25, 2013 at 12:56:28PM -0700, Guenter Roeck wrote:
> > On Wed, Sep 25, 2013 at 03:06:25PM -0400, Josh Boyer wrote:
> > > Hi All,
> > > 
> > > Chris has reported[1] an issue with the applesmc driver in the 3.10
> > > and 3.11 kernels.  This seems to be a bit transient, but the oops is
> > > consistent across those releases.  This is with a MacBook Pro 4,1.
> > > The backtrace is below.
> > > 
> > > Any ideas on this one?
> > > 
> > Some of the bug reports suggest that there are related messages in at least some
> > of the kernel logs.
> > 
> > Jul 02 22:23:22 f19s.local kernel: applesmc: send_byte(0x04, 0x0300) fail: 0x01
> > Jul 02 22:23:22 f19s.local kernel: applesmc: : read len fail
> > 
> > This suggests that initialization may be attempted more than once. The key cache
> > is allocated only once, but the number of keys is read for each attempt.
> > 
> > No idea if that can happen, but if the number of keys can increase after
> > the first initialization attempt you would have an explanation for the crash.
> 
> Good idea, and easy enough to test with the patch below.
> 
Should we apply this patch even though it may not solve the specific problem ?

Again, not sure if the key count can change, but the current code is at the very
least inconsistent, as it keeps reading the key count without updating or
verifying the cache size.

Thanks,
Guenter

> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..d962c4d 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,24 @@ static int applesmc_init_smcreg_try(void)
>  {
>         struct applesmc_registers *s = &smcreg;
>         bool left_light_sensor, right_light_sensor;
> +       unsigned int count;
>         u8 tmp[1];
>         int ret;
>  
>         if (s->init_complete)
>                 return 0;
>  
> -       ret = read_register_count(&s->key_count);
> +       ret = read_register_count(&count);
>         if (ret)
>                 return ret;
>  
> +       if (s->cache && s->key_count != count) {
> +               pr_warn("key count changed from %d to %d\n", s->key_count, count);
> +               kfree(s->cache);
> +               s->cache = NULL;
> +       }
> +       s->key_count = count;
> +
>         if (!s->cache)
>                 s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
>         if (!s->cache)
>  
> Thanks,
> Henrik
> 

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-25 22:08       ` [lm-sensors] " Guenter Roeck
@ 2013-09-26  6:34         ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26  6:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> > > This suggests that initialization may be attempted more than once. The key cache
> > > is allocated only once, but the number of keys is read for each attempt.
> > > 
> > > No idea if that can happen, but if the number of keys can increase after
> > > the first initialization attempt you would have an explanation for the crash.
> > 
> > Good idea, and easy enough to test with the patch below.
> > 
> Should we apply this patch even though it may not solve the specific problem ?

Yes, why not - it certainly won't hurt. I am running it right now, so
it is at least run-tested.

> Again, not sure if the key count can change, but the current code is at the very
> least inconsistent, as it keeps reading the key count without updating or
> verifying the cache size.

Yes - I agree that the error state is far-fetched, but it is hard to
see any other logical explanation. There is of course always the
possibility that the problem is somewhere else completely.

Proper patch attached.

Thanks,
Henrik

---

>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Thu, 26 Sep 2013 08:33:16 +0200
Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding

After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.

Reported-by: <jwboyer@fedoraproject.org>
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
---
 drivers/hwmon/applesmc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..98814d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
 {
 	struct applesmc_registers *s = &smcreg;
 	bool left_light_sensor, right_light_sensor;
+	unsigned int count;
 	u8 tmp[1];
 	int ret;
 
 	if (s->init_complete)
 		return 0;
 
-	ret = read_register_count(&s->key_count);
+	ret = read_register_count(&count);
 	if (ret)
 		return ret;
 
+	if (s->cache && s->key_count != count) {
+		pr_warn("key count changed from %d to %d\n",
+			s->key_count, count);
+		kfree(s->cache);
+		s->cache = NULL;
+	}
+	s->key_count = count;
+
 	if (!s->cache)
 		s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
 	if (!s->cache)
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-26  6:34         ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26  6:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> > > This suggests that initialization may be attempted more than once. The key cache
> > > is allocated only once, but the number of keys is read for each attempt.
> > > 
> > > No idea if that can happen, but if the number of keys can increase after
> > > the first initialization attempt you would have an explanation for the crash.
> > 
> > Good idea, and easy enough to test with the patch below.
> > 
> Should we apply this patch even though it may not solve the specific problem ?

Yes, why not - it certainly won't hurt. I am running it right now, so
it is at least run-tested.

> Again, not sure if the key count can change, but the current code is at the very
> least inconsistent, as it keeps reading the key count without updating or
> verifying the cache size.

Yes - I agree that the error state is far-fetched, but it is hard to
see any other logical explanation. There is of course always the
possibility that the problem is somewhere else completely.

Proper patch attached.

Thanks,
Henrik

---

From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Thu, 26 Sep 2013 08:33:16 +0200
Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding

After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.

Reported-by: <jwboyer@fedoraproject.org>
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
---
 drivers/hwmon/applesmc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 62c2e32..98814d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
 {
 	struct applesmc_registers *s = &smcreg;
 	bool left_light_sensor, right_light_sensor;
+	unsigned int count;
 	u8 tmp[1];
 	int ret;
 
 	if (s->init_complete)
 		return 0;
 
-	ret = read_register_count(&s->key_count);
+	ret = read_register_count(&count);
 	if (ret)
 		return ret;
 
+	if (s->cache && s->key_count != count) {
+		pr_warn("key count changed from %d to %d\n",
+			s->key_count, count);
+		kfree(s->cache);
+		s->cache = NULL;
+	}
+	s->key_count = count;
+
 	if (!s->cache)
 		s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
 	if (!s->cache)
-- 
1.8.3.4


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-26  6:34         ` [lm-sensors] " Henrik Rydberg
@ 2013-09-26 10:36           ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-26 10:36 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
There are also ACPI conflicts in each of the bug reports I looked at.
Can this play a role, or is that "normal" on apple systems ?

Thanks,
Guenter

> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>   drivers/hwmon/applesmc.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..98814d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
>   {
>   	struct applesmc_registers *s = &smcreg;
>   	bool left_light_sensor, right_light_sensor;
> +	unsigned int count;
>   	u8 tmp[1];
>   	int ret;
>
>   	if (s->init_complete)
>   		return 0;
>
> -	ret = read_register_count(&s->key_count);
> +	ret = read_register_count(&count);
>   	if (ret)
>   		return ret;
>
> +	if (s->cache && s->key_count != count) {
> +		pr_warn("key count changed from %d to %d\n",
> +			s->key_count, count);
> +		kfree(s->cache);
> +		s->cache = NULL;
> +	}
> +	s->key_count = count;
> +
>   	if (!s->cache)
>   		s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
>   	if (!s->cache)
>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-26 10:36           ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-26 10:36 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
There are also ACPI conflicts in each of the bug reports I looked at.
Can this play a role, or is that "normal" on apple systems ?

Thanks,
Guenter

> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>   drivers/hwmon/applesmc.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 62c2e32..98814d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -525,16 +525,25 @@ static int applesmc_init_smcreg_try(void)
>   {
>   	struct applesmc_registers *s = &smcreg;
>   	bool left_light_sensor, right_light_sensor;
> +	unsigned int count;
>   	u8 tmp[1];
>   	int ret;
>
>   	if (s->init_complete)
>   		return 0;
>
> -	ret = read_register_count(&s->key_count);
> +	ret = read_register_count(&count);
>   	if (ret)
>   		return ret;
>
> +	if (s->cache && s->key_count != count) {
> +		pr_warn("key count changed from %d to %d\n",
> +			s->key_count, count);
> +		kfree(s->cache);
> +		s->cache = NULL;
> +	}
> +	s->key_count = count;
> +
>   	if (!s->cache)
>   		s->cache = kcalloc(s->key_count, sizeof(*s->cache), GFP_KERNEL);
>   	if (!s->cache)
>


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-26  6:34         ` [lm-sensors] " Henrik Rydberg
@ 2013-09-26 10:53           ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-26 10:53 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>   drivers/hwmon/applesmc.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)

Applied. I'll run my sanity tests before I send it upstream.
I'll also Cc: stable.

Interesting is that this started to happen with 3.10, even though
I did not find any relevant changes in the driver. Is it possible that
changed boot timing (ie reduced boot time) exposes this problem ?

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-26 10:53           ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-26 10:53 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/25/2013 11:34 PM, Henrik Rydberg wrote:
>>>> This suggests that initialization may be attempted more than once. The key cache
>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>
>>>> No idea if that can happen, but if the number of keys can increase after
>>>> the first initialization attempt you would have an explanation for the crash.
>>>
>>> Good idea, and easy enough to test with the patch below.
>>>
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
>>From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>   drivers/hwmon/applesmc.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)

Applied. I'll run my sanity tests before I send it upstream.
I'll also Cc: stable.

Interesting is that this started to happen with 3.10, even though
I did not find any relevant changes in the driver. Is it possible that
changed boot timing (ie reduced boot time) exposes this problem ?

Thanks,
Guenter


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-26 10:53           ` [lm-sensors] " Guenter Roeck
@ 2013-09-26 11:11             ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26 11:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> Applied. I'll run my sanity tests before I send it upstream.
> I'll also Cc: stable.

Great.

> Interesting is that this started to happen with 3.10, even though
> I did not find any relevant changes in the driver. Is it possible that
> changed boot timing (ie reduced boot time) exposes this problem ?

I was thinking the same thing - if the SMC takes some time to boot up,
it could very well be the culprit. Imagine that, the kernel is too
fast. ;-)

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-26 11:11             ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26 11:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> Applied. I'll run my sanity tests before I send it upstream.
> I'll also Cc: stable.

Great.

> Interesting is that this started to happen with 3.10, even though
> I did not find any relevant changes in the driver. Is it possible that
> changed boot timing (ie reduced boot time) exposes this problem ?

I was thinking the same thing - if the SMC takes some time to boot up,
it could very well be the culprit. Imagine that, the kernel is too
fast. ;-)

Thanks,
Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-26 10:36           ` [lm-sensors] " Guenter Roeck
@ 2013-09-26 11:13             ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26 11:13 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> >Yes - I agree that the error state is far-fetched, but it is hard to
> >see any other logical explanation. There is of course always the
> >possibility that the problem is somewhere else completely.
> >
> There are also ACPI conflicts in each of the bug reports I looked at.
> Can this play a role, or is that "normal" on apple systems ?

I honstestly don't know. On my own machine (MBA3,1), I have problems
with the PCIe NIC somehow overwriting the graphics buffer, resulting
in a corrupted screen at boot. After a suspend/resume, that problem is
gone, but there is surely something fishy going on...

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-26 11:13             ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-09-26 11:13 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org, bugzilla

> >Yes - I agree that the error state is far-fetched, but it is hard to
> >see any other logical explanation. There is of course always the
> >possibility that the problem is somewhere else completely.
> >
> There are also ACPI conflicts in each of the bug reports I looked at.
> Can this play a role, or is that "normal" on apple systems ?

I honstestly don't know. On my own machine (MBA3,1), I have problems
with the PCIe NIC somehow overwriting the graphics buffer, resulting
in a corrupted screen at boot. After a suspend/resume, that problem is
gone, but there is surely something fishy going on...

Thanks,
Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-26  6:34         ` [lm-sensors] " Henrik Rydberg
@ 2013-09-27 16:21           ` Josh Boyer
  -1 siblings, 0 replies; 86+ messages in thread
From: Josh Boyer @ 2013-09-27 16:21 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Guenter Roeck, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org,
	bugzilla

On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>> > > This suggests that initialization may be attempted more than once. The key cache
>> > > is allocated only once, but the number of keys is read for each attempt.
>> > >
>> > > No idea if that can happen, but if the number of keys can increase after
>> > > the first initialization attempt you would have an explanation for the crash.
>> >
>> > Good idea, and easy enough to test with the patch below.
>> >
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>  drivers/hwmon/applesmc.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Thanks for the quick reply.  I'll get this rolled into our kernels soon.

josh

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 16:21           ` Josh Boyer
  0 siblings, 0 replies; 86+ messages in thread
From: Josh Boyer @ 2013-09-27 16:21 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Guenter Roeck, khali, lm-sensors, Linux-Kernel@Vger. Kernel. Org,
	bugzilla

On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>> > > This suggests that initialization may be attempted more than once. The key cache
>> > > is allocated only once, but the number of keys is read for each attempt.
>> > >
>> > > No idea if that can happen, but if the number of keys can increase after
>> > > the first initialization attempt you would have an explanation for the crash.
>> >
>> > Good idea, and easy enough to test with the patch below.
>> >
>> Should we apply this patch even though it may not solve the specific problem ?
>
> Yes, why not - it certainly won't hurt. I am running it right now, so
> it is at least run-tested.
>
>> Again, not sure if the key count can change, but the current code is at the very
>> least inconsistent, as it keeps reading the key count without updating or
>> verifying the cache size.
>
> Yes - I agree that the error state is far-fetched, but it is hard to
> see any other logical explanation. There is of course always the
> possibility that the problem is somewhere else completely.
>
> Proper patch attached.
>
> Thanks,
> Henrik
>
> ---
>
> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Thu, 26 Sep 2013 08:33:16 +0200
> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>
> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> Guenter pointed at the initialization problem fixed below. The patch
> has not been verified to fix the crash, but should be applied
> regardless.
>
> Reported-by: <jwboyer@fedoraproject.org>
> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> ---
>  drivers/hwmon/applesmc.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Thanks for the quick reply.  I'll get this rolled into our kernels soon.

josh

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 16:21           ` [lm-sensors] " Josh Boyer
@ 2013-09-27 17:12             ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 17:12 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <rydberg@euromail.se>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <jwboyer@fedoraproject.org>
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> > ---
> >  drivers/hwmon/applesmc.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> 
I sent a pull request to Linus, so you should be able to pull it from
the upstream kernel shortly. Would be great to get feedback if the patch
solves the problem (or doesn't).

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 17:12             ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 17:12 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <rydberg@euromail.se>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <jwboyer@fedoraproject.org>
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> > ---
> >  drivers/hwmon/applesmc.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> 
I sent a pull request to Linus, so you should be able to pull it from
the upstream kernel shortly. Would be great to get feedback if the patch
solves the problem (or doesn't).

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 17:12             ` [lm-sensors] " Guenter Roeck
@ 2013-09-27 17:41               ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-09-27 17:41 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla


On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>> 
>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>> 
>>>>> Good idea, and easy enough to test with the patch below.
>>>>> 
>>>> Should we apply this patch even though it may not solve the specific problem ?
>>> 
>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>> it is at least run-tested.
>>> 
>>>> Again, not sure if the key count can change, but the current code is at the very
>>>> least inconsistent, as it keeps reading the key count without updating or
>>>> verifying the cache size.
>>> 
>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>> see any other logical explanation. There is of course always the
>>> possibility that the problem is somewhere else completely.
>>> 
>>> Proper patch attached.
>>> 
>>> Thanks,
>>> Henrik
>>> 
>>> ---
>>> 
>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>> From: Henrik Rydberg <rydberg@euromail.se>
>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>> 
>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>> Guenter pointed at the initialization problem fixed below. The patch
>>> has not been verified to fix the crash, but should be applied
>>> regardless.
>>> 
>>> Reported-by: <jwboyer@fedoraproject.org>
>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>> ---
>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>> 
>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>> 
> I sent a pull request to Linus, so you should be able to pull it from
> the upstream kernel shortly. Would be great to get feedback if the patch
> solves the problem (or doesn't).

I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.

Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 17:41               ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-09-27 17:41 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla


On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>> 
>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>> 
>>>>> Good idea, and easy enough to test with the patch below.
>>>>> 
>>>> Should we apply this patch even though it may not solve the specific problem ?
>>> 
>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>> it is at least run-tested.
>>> 
>>>> Again, not sure if the key count can change, but the current code is at the very
>>>> least inconsistent, as it keeps reading the key count without updating or
>>>> verifying the cache size.
>>> 
>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>> see any other logical explanation. There is of course always the
>>> possibility that the problem is somewhere else completely.
>>> 
>>> Proper patch attached.
>>> 
>>> Thanks,
>>> Henrik
>>> 
>>> ---
>>> 
>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>> From: Henrik Rydberg <rydberg@euromail.se>
>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>> 
>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>> Guenter pointed at the initialization problem fixed below. The patch
>>> has not been verified to fix the crash, but should be applied
>>> regardless.
>>> 
>>> Reported-by: <jwboyer@fedoraproject.org>
>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>> ---
>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>> 
>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>> 
> I sent a pull request to Linus, so you should be able to pull it from
> the upstream kernel shortly. Would be great to get feedback if the patch
> solves the problem (or doesn't).

I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.

Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 17:41               ` [lm-sensors] " Chris Murphy
@ 2013-09-27 17:59                 ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 17:59 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
> 
> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >>>>>> This suggests that initialization may be attempted more than once. The key cache
> >>>>>> is allocated only once, but the number of keys is read for each attempt.
> >>>>>> 
> >>>>>> No idea if that can happen, but if the number of keys can increase after
> >>>>>> the first initialization attempt you would have an explanation for the crash.
> >>>>> 
> >>>>> Good idea, and easy enough to test with the patch below.
> >>>>> 
> >>>> Should we apply this patch even though it may not solve the specific problem ?
> >>> 
> >>> Yes, why not - it certainly won't hurt. I am running it right now, so
> >>> it is at least run-tested.
> >>> 
> >>>> Again, not sure if the key count can change, but the current code is at the very
> >>>> least inconsistent, as it keeps reading the key count without updating or
> >>>> verifying the cache size.
> >>> 
> >>> Yes - I agree that the error state is far-fetched, but it is hard to
> >>> see any other logical explanation. There is of course always the
> >>> possibility that the problem is somewhere else completely.
> >>> 
> >>> Proper patch attached.
> >>> 
> >>> Thanks,
> >>> Henrik
> >>> 
> >>> ---
> >>> 
> >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> >>> From: Henrik Rydberg <rydberg@euromail.se>
> >>> Date: Thu, 26 Sep 2013 08:33:16 +0200
> >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >>> 
> >>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> >>> Guenter pointed at the initialization problem fixed below. The patch
> >>> has not been verified to fix the crash, but should be applied
> >>> regardless.
> >>> 
> >>> Reported-by: <jwboyer@fedoraproject.org>
> >>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> >>> ---
> >>> drivers/hwmon/applesmc.c | 11 ++++++++++-
> >>> 1 file changed, 10 insertions(+), 1 deletion(-)
> >> 
> >> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> >> 
> > I sent a pull request to Linus, so you should be able to pull it from
> > the upstream kernel shortly. Would be great to get feedback if the patch
> > solves the problem (or doesn't).
> 
> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
> 
When you do, can you possibly trigger an event based on the warning added
with the patch ? This might help us to identify if the problem fixed
with the patch actually happens.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 17:59                 ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 17:59 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
> 
> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> >> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >>>>>> This suggests that initialization may be attempted more than once. The key cache
> >>>>>> is allocated only once, but the number of keys is read for each attempt.
> >>>>>> 
> >>>>>> No idea if that can happen, but if the number of keys can increase after
> >>>>>> the first initialization attempt you would have an explanation for the crash.
> >>>>> 
> >>>>> Good idea, and easy enough to test with the patch below.
> >>>>> 
> >>>> Should we apply this patch even though it may not solve the specific problem ?
> >>> 
> >>> Yes, why not - it certainly won't hurt. I am running it right now, so
> >>> it is at least run-tested.
> >>> 
> >>>> Again, not sure if the key count can change, but the current code is at the very
> >>>> least inconsistent, as it keeps reading the key count without updating or
> >>>> verifying the cache size.
> >>> 
> >>> Yes - I agree that the error state is far-fetched, but it is hard to
> >>> see any other logical explanation. There is of course always the
> >>> possibility that the problem is somewhere else completely.
> >>> 
> >>> Proper patch attached.
> >>> 
> >>> Thanks,
> >>> Henrik
> >>> 
> >>> ---
> >>> 
> >>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> >>> From: Henrik Rydberg <rydberg@euromail.se>
> >>> Date: Thu, 26 Sep 2013 08:33:16 +0200
> >>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >>> 
> >>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
> >>> Guenter pointed at the initialization problem fixed below. The patch
> >>> has not been verified to fix the crash, but should be applied
> >>> regardless.
> >>> 
> >>> Reported-by: <jwboyer@fedoraproject.org>
> >>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
> >>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> >>> ---
> >>> drivers/hwmon/applesmc.c | 11 ++++++++++-
> >>> 1 file changed, 10 insertions(+), 1 deletion(-)
> >> 
> >> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> >> 
> > I sent a pull request to Linus, so you should be able to pull it from
> > the upstream kernel shortly. Would be great to get feedback if the patch
> > solves the problem (or doesn't).
> 
> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
> 
When you do, can you possibly trigger an event based on the warning added
with the patch ? This might help us to identify if the problem fixed
with the patch actually happens.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 16:21           ` [lm-sensors] " Josh Boyer
@ 2013-09-27 18:01             ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 18:01 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <rydberg@euromail.se>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <jwboyer@fedoraproject.org>
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> > ---
> >  drivers/hwmon/applesmc.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> 
Patch is now upstream (commit 5f4513864).

Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 18:01             ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 18:01 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >> > > This suggests that initialization may be attempted more than once. The key cache
> >> > > is allocated only once, but the number of keys is read for each attempt.
> >> > >
> >> > > No idea if that can happen, but if the number of keys can increase after
> >> > > the first initialization attempt you would have an explanation for the crash.
> >> >
> >> > Good idea, and easy enough to test with the patch below.
> >> >
> >> Should we apply this patch even though it may not solve the specific problem ?
> >
> > Yes, why not - it certainly won't hurt. I am running it right now, so
> > it is at least run-tested.
> >
> >> Again, not sure if the key count can change, but the current code is at the very
> >> least inconsistent, as it keeps reading the key count without updating or
> >> verifying the cache size.
> >
> > Yes - I agree that the error state is far-fetched, but it is hard to
> > see any other logical explanation. There is of course always the
> > possibility that the problem is somewhere else completely.
> >
> > Proper patch attached.
> >
> > Thanks,
> > Henrik
> >
> > ---
> >
> > From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
> > From: Henrik Rydberg <rydberg@euromail.se>
> > Date: Thu, 26 Sep 2013 08:33:16 +0200
> > Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
> >
> > After reports from Chris and Josh Boyer of a rare crash in applesmc,
> > Guenter pointed at the initialization problem fixed below. The patch
> > has not been verified to fix the crash, but should be applied
> > regardless.
> >
> > Reported-by: <jwboyer@fedoraproject.org>
> > Suggested-by: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
> > ---
> >  drivers/hwmon/applesmc.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
> 
Patch is now upstream (commit 5f4513864).

Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 17:59                 ` [lm-sensors] " Guenter Roeck
@ 2013-09-27 18:03                   ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-09-27 18:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla


On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>> 
>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>> 
>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>> 
>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>> 
>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>> it is at least run-tested.
>>>>> 
>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>> verifying the cache size.
>>>>> 
>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>> see any other logical explanation. There is of course always the
>>>>> possibility that the problem is somewhere else completely.
>>>>> 
>>>>> Proper patch attached.
>>>>> 
>>>>> Thanks,
>>>>> Henrik
>>>>> 
>>>>> ---
>>>>> 
>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>> 
>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>> has not been verified to fix the crash, but should be applied
>>>>> regardless.
>>>>> 
>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>> ---
>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>> 
>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>> 
>>> I sent a pull request to Linus, so you should be able to pull it from
>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>> solves the problem (or doesn't).
>> 
>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>> 
> When you do, can you possibly trigger an event based on the warning added
> with the patch ? This might help us to identify if the problem fixed
> with the patch actually happens.

I don't understand the question. I'm uncertain how to trigger, and also what event.

Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 18:03                   ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-09-27 18:03 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla


On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>> 
>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>> 
>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>> 
>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>> 
>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>> it is at least run-tested.
>>>>> 
>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>> verifying the cache size.
>>>>> 
>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>> see any other logical explanation. There is of course always the
>>>>> possibility that the problem is somewhere else completely.
>>>>> 
>>>>> Proper patch attached.
>>>>> 
>>>>> Thanks,
>>>>> Henrik
>>>>> 
>>>>> ---
>>>>> 
>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>> 
>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>> has not been verified to fix the crash, but should be applied
>>>>> regardless.
>>>>> 
>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>> ---
>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>> 
>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>> 
>>> I sent a pull request to Linus, so you should be able to pull it from
>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>> solves the problem (or doesn't).
>> 
>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>> 
> When you do, can you possibly trigger an event based on the warning added
> with the patch ? This might help us to identify if the problem fixed
> with the patch actually happens.

I don't understand the question. I'm uncertain how to trigger, and also what event.

Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 18:03                   ` [lm-sensors] " Chris Murphy
@ 2013-09-27 23:33                     ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 23:33 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/27/2013 11:03 AM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>
>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>
>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>
>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>
>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>> it is at least run-tested.
>>>>>>
>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>> verifying the cache size.
>>>>>>
>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>> see any other logical explanation. There is of course always the
>>>>>> possibility that the problem is somewhere else completely.
>>>>>>
>>>>>> Proper patch attached.
>>>>>>
>>>>>> Thanks,
>>>>>> Henrik
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>  From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>
>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>> has not been verified to fix the crash, but should be applied
>>>>>> regardless.
>>>>>>
>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>> ---
>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>
>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>> solves the problem (or doesn't).
>>>
>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>
>> When you do, can you possibly trigger an event based on the warning added
>> with the patch ? This might help us to identify if the problem fixed
>> with the patch actually happens.
>
> I don't understand the question. I'm uncertain how to trigger, and also what event.
>

The patch includes a new warning message.

	pr_warn("key count changed from %d to %d\n",
                         s->key_count, count);

It would be great if there would be a means to detect if this message is seen
in a kernel log, because it would show that the potential crash condition
fixed with the patch was actually encountered. This would help us to determine
if we actually fixed the problem or not.

Of course, we'll know if is wasn't fixed if the system still crashes.

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-09-27 23:33                     ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-09-27 23:33 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org, bugzilla

On 09/27/2013 11:03 AM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>
>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>
>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>
>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>
>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>> it is at least run-tested.
>>>>>>
>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>> verifying the cache size.
>>>>>>
>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>> see any other logical explanation. There is of course always the
>>>>>> possibility that the problem is somewhere else completely.
>>>>>>
>>>>>> Proper patch attached.
>>>>>>
>>>>>> Thanks,
>>>>>> Henrik
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>  From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>
>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>> has not been verified to fix the crash, but should be applied
>>>>>> regardless.
>>>>>>
>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>> ---
>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>
>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>> solves the problem (or doesn't).
>>>
>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>
>> When you do, can you possibly trigger an event based on the warning added
>> with the patch ? This might help us to identify if the problem fixed
>> with the patch actually happens.
>
> I don't understand the question. I'm uncertain how to trigger, and also what event.
>

The patch includes a new warning message.

	pr_warn("key count changed from %d to %d\n",
                         s->key_count, count);

It would be great if there would be a means to detect if this message is seen
in a kernel log, because it would show that the potential crash condition
fixed with the patch was actually encountered. This would help us to determine
if we actually fixed the problem or not.

Of course, we'll know if is wasn't fixed if the system still crashes.

Thanks,
Guenter


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-09-27 23:33                     ` [lm-sensors] " Guenter Roeck
@ 2013-10-01  1:57                       ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-01  1:57 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> 
>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>> 
>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>> 
>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>> 
>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>> 
>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>> it is at least run-tested.
>>>>>>> 
>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>> verifying the cache size.
>>>>>>> 
>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>> see any other logical explanation. There is of course always the
>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>> 
>>>>>>> Proper patch attached.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Henrik
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>> 
>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>> regardless.
>>>>>>> 
>>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>>> ---
>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>> 
>>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>> 
>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>> solves the problem (or doesn't).
>>>> 
>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>> 
>>> When you do, can you possibly trigger an event based on the warning added
>>> with the patch ? This might help us to identify if the problem fixed
>>> with the patch actually happens.
>> 
>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>> 
> 
> The patch includes a new warning message.
> 
> 	pr_warn("key count changed from %d to %d\n",
>                        s->key_count, count);
> 
> It would be great if there would be a means to detect if this message is seen
> in a kernel log, because it would show that the potential crash condition
> fixed with the patch was actually encountered. This would help us to determine
> if we actually fixed the problem or not.
> 
> Of course, we'll know if is wasn't fixed if the system still crashes.

Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. 

[   10.886016] applesmc: key count changed from 261 to 1174405121

Attaching new full dmesg to the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11

Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01  1:57                       ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-01  1:57 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>> 
>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> 
>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>> 
>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>> 
>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>> 
>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>> 
>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>> it is at least run-tested.
>>>>>>> 
>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>> verifying the cache size.
>>>>>>> 
>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>> see any other logical explanation. There is of course always the
>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>> 
>>>>>>> Proper patch attached.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Henrik
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>> 
>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>> regardless.
>>>>>>> 
>>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>>> ---
>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>> 
>>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>> 
>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>> solves the problem (or doesn't).
>>>> 
>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>> 
>>> When you do, can you possibly trigger an event based on the warning added
>>> with the patch ? This might help us to identify if the problem fixed
>>> with the patch actually happens.
>> 
>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>> 
> 
> The patch includes a new warning message.
> 
> 	pr_warn("key count changed from %d to %d\n",
>                        s->key_count, count);
> 
> It would be great if there would be a means to detect if this message is seen
> in a kernel log, because it would show that the potential crash condition
> fixed with the patch was actually encountered. This would help us to determine
> if we actually fixed the problem or not.
> 
> Of course, we'll know if is wasn't fixed if the system still crashes.

Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64. 

[   10.886016] applesmc: key count changed from 261 to 1174405121

Attaching new full dmesg to the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id\x1011719#c11

Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01  1:57                       ` [lm-sensors] " Chris Murphy
@ 2013-10-01  3:37                         ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01  3:37 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 09/30/2013 06:57 PM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>
>>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>>>
>>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>>>
>>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>>>
>>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>>>
>>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>>> it is at least run-tested.
>>>>>>>>
>>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>>> verifying the cache size.
>>>>>>>>
>>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>>> see any other logical explanation. There is of course always the
>>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>>>
>>>>>>>> Proper patch attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Henrik
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>  From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>>>
>>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>>> regardless.
>>>>>>>>
>>>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>>>> ---
>>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>>>
>>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>>> solves the problem (or doesn't).
>>>>>
>>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>>>
>>>> When you do, can you possibly trigger an event based on the warning added
>>>> with the patch ? This might help us to identify if the problem fixed
>>>> with the patch actually happens.
>>>
>>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>>>
>>
>> The patch includes a new warning message.
>>
>> 	pr_warn("key count changed from %d to %d\n",
>>                         s->key_count, count);
>>
>> It would be great if there would be a means to detect if this message is seen
>> in a kernel log, because it would show that the potential crash condition
>> fixed with the patch was actually encountered. This would help us to determine
>> if we actually fixed the problem or not.
>>
>> Of course, we'll know if is wasn't fixed if the system still crashes.
>
> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>
> [   10.886016] applesmc: key count changed from 261 to 1174405121
>

Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
Which I guess explains the subsequent memory allocation error in the log.

Henrik, any idea what might be going on ? Is it possible that the previous
command failure leaves some state machine in a bad state ?

Guenter

> Attaching new full dmesg to the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1011719#c11
>
> Chris
>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01  3:37                         ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01  3:37 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Josh Boyer, Henrik Rydberg, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 09/30/2013 06:57 PM, Chris Murphy wrote:
>
> On Sep 27, 2013, at 5:33 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On 09/27/2013 11:03 AM, Chris Murphy wrote:
>>>
>>> On Sep 27, 2013, at 11:59 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Sep 27, 2013, at 11:12 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>
>>>>>> On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:
>>>>>>> On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>>>>>>>>>> This suggests that initialization may be attempted more than once. The key cache
>>>>>>>>>>> is allocated only once, but the number of keys is read for each attempt.
>>>>>>>>>>>
>>>>>>>>>>> No idea if that can happen, but if the number of keys can increase after
>>>>>>>>>>> the first initialization attempt you would have an explanation for the crash.
>>>>>>>>>>
>>>>>>>>>> Good idea, and easy enough to test with the patch below.
>>>>>>>>>>
>>>>>>>>> Should we apply this patch even though it may not solve the specific problem ?
>>>>>>>>
>>>>>>>> Yes, why not - it certainly won't hurt. I am running it right now, so
>>>>>>>> it is at least run-tested.
>>>>>>>>
>>>>>>>>> Again, not sure if the key count can change, but the current code is at the very
>>>>>>>>> least inconsistent, as it keeps reading the key count without updating or
>>>>>>>>> verifying the cache size.
>>>>>>>>
>>>>>>>> Yes - I agree that the error state is far-fetched, but it is hard to
>>>>>>>> see any other logical explanation. There is of course always the
>>>>>>>> possibility that the problem is somewhere else completely.
>>>>>>>>
>>>>>>>> Proper patch attached.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Henrik
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>>  From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
>>>>>>>> From: Henrik Rydberg <rydberg@euromail.se>
>>>>>>>> Date: Thu, 26 Sep 2013 08:33:16 +0200
>>>>>>>> Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding
>>>>>>>>
>>>>>>>> After reports from Chris and Josh Boyer of a rare crash in applesmc,
>>>>>>>> Guenter pointed at the initialization problem fixed below. The patch
>>>>>>>> has not been verified to fix the crash, but should be applied
>>>>>>>> regardless.
>>>>>>>>
>>>>>>>> Reported-by: <jwboyer@fedoraproject.org>
>>>>>>>> Suggested-by: Guenter Roeck <linux@roeck-us.net>
>>>>>>>> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
>>>>>>>> ---
>>>>>>>> drivers/hwmon/applesmc.c | 11 ++++++++++-
>>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> Thanks for the quick reply.  I'll get this rolled into our kernels soon.
>>>>>>>
>>>>>> I sent a pull request to Linus, so you should be able to pull it from
>>>>>> the upstream kernel shortly. Would be great to get feedback if the patch
>>>>>> solves the problem (or doesn't).
>>>>>
>>>>> I'll start running it when it appears in koji. It's very transient, maybe one oops per week with lots of (other) testing. I'm not even sure if it happens on warm or cold boots or both.
>>>>>
>>>> When you do, can you possibly trigger an event based on the warning added
>>>> with the patch ? This might help us to identify if the problem fixed
>>>> with the patch actually happens.
>>>
>>> I don't understand the question. I'm uncertain how to trigger, and also what event.
>>>
>>
>> The patch includes a new warning message.
>>
>> 	pr_warn("key count changed from %d to %d\n",
>>                         s->key_count, count);
>>
>> It would be great if there would be a means to detect if this message is seen
>> in a kernel log, because it would show that the potential crash condition
>> fixed with the patch was actually encountered. This would help us to determine
>> if we actually fixed the problem or not.
>>
>> Of course, we'll know if is wasn't fixed if the system still crashes.
>
> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>
> [   10.886016] applesmc: key count changed from 261 to 1174405121
>

Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
Which I guess explains the subsequent memory allocation error in the log.

Henrik, any idea what might be going on ? Is it possible that the previous
command failure leaves some state machine in a bad state ?

Guenter

> Attaching new full dmesg to the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id\x1011719#c11
>
> Chris
>


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01  3:37                         ` [lm-sensors] " Guenter Roeck
@ 2013-10-01 10:55                           ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-01 10:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >
> >[   10.886016] applesmc: key count changed from 261 to 1174405121
> >
> 
> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> Which I guess explains the subsequent memory allocation error in the log.
> 
> Henrik, any idea what might be going on ? Is it possible that the previous
> command failure leaves some state machine in a bad state ?

I seem to recall a report on another similar state problem on newer
machines, so maybe, yes. Older machines seem fine, I have never
encountered the problem myself. Here is a patch to test that
theory. It has been tested to be pretty harmless on two different
generations.

I really really do not want to add an 'if (value is insane)' check ;-)

Thanks,
Henrik

>From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Tue, 1 Oct 2013 12:16:09 +0200
Subject: [PATCH] applesmc remedy take 1

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Exception: the write-number-of-bytes-to-read command seems to differ
between models (it may not be needed on the newest), so do not try to
flush the data at that particular point.

Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
has not been reproduced, so the actual effect of this patch is unknown.
---
 drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..f6eaf6d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -186,10 +186,23 @@ static int wait_read(void)
  * send_byte - Write to SMC port, retrying when necessary. Callers
  * must hold applesmc_lock.
  */
-static int send_byte(u8 cmd, u16 port)
+static int send_byte(u8 cmd, u16 port, bool flush)
 {
-	u8 status;
-	int us;
+	u8 status, data;
+	int us, nskip;
+
+	if (flush) {
+		/* read the data port until bit0 is cleared */
+		for (nskip = 0; nskip < 16; nskip++) {
+			udelay(APPLESMC_MIN_WAIT);
+			status = inb(APPLESMC_CMD_PORT);
+			if (!(status & 0x01))
+				break;
+			data = inb(APPLESMC_DATA_PORT);
+		}
+		if (nskip)
+			pr_warn("flushed %d bytes\n", nskip);
+	}
 
 	outb(cmd, port);
 	for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
@@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)
 
 static int send_command(u8 cmd)
 {
-	return send_byte(cmd, APPLESMC_CMD_PORT);
+	return send_byte(cmd, APPLESMC_CMD_PORT, true);
 }
 
 static int send_argument(const char *key)
@@ -223,7 +236,7 @@ static int send_argument(const char *key)
 	int i;
 
 	for (i = 0; i < 4; i++)
-		if (send_byte(key[i], APPLESMC_DATA_PORT))
+		if (send_byte(key[i], APPLESMC_DATA_PORT, true))
 			return -EIO;
 	return 0;
 }
@@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
-	if (send_byte(len, APPLESMC_DATA_PORT)) {
+	if (send_byte(len, APPLESMC_DATA_PORT, false)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
 	}
@@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
 		return -EIO;
 	}
 
-	if (send_byte(len, APPLESMC_DATA_PORT)) {
+	if (send_byte(len, APPLESMC_DATA_PORT, true)) {
 		pr_warn("%.4s: write len fail\n", key);
 		return -EIO;
 	}
 
 	for (i = 0; i < len; i++) {
-		if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
+		if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
 			pr_warn("%s: write data fail\n", key);
 			return -EIO;
 		}
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01 10:55                           ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-01 10:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >
> >[   10.886016] applesmc: key count changed from 261 to 1174405121
> >
> 
> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> Which I guess explains the subsequent memory allocation error in the log.
> 
> Henrik, any idea what might be going on ? Is it possible that the previous
> command failure leaves some state machine in a bad state ?

I seem to recall a report on another similar state problem on newer
machines, so maybe, yes. Older machines seem fine, I have never
encountered the problem myself. Here is a patch to test that
theory. It has been tested to be pretty harmless on two different
generations.

I really really do not want to add an 'if (value is insane)' check ;-)

Thanks,
Henrik

From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Tue, 1 Oct 2013 12:16:09 +0200
Subject: [PATCH] applesmc remedy take 1

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Exception: the write-number-of-bytes-to-read command seems to differ
between models (it may not be needed on the newest), so do not try to
flush the data at that particular point.

Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
has not been reproduced, so the actual effect of this patch is unknown.
---
 drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..f6eaf6d1 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -186,10 +186,23 @@ static int wait_read(void)
  * send_byte - Write to SMC port, retrying when necessary. Callers
  * must hold applesmc_lock.
  */
-static int send_byte(u8 cmd, u16 port)
+static int send_byte(u8 cmd, u16 port, bool flush)
 {
-	u8 status;
-	int us;
+	u8 status, data;
+	int us, nskip;
+
+	if (flush) {
+		/* read the data port until bit0 is cleared */
+		for (nskip = 0; nskip < 16; nskip++) {
+			udelay(APPLESMC_MIN_WAIT);
+			status = inb(APPLESMC_CMD_PORT);
+			if (!(status & 0x01))
+				break;
+			data = inb(APPLESMC_DATA_PORT);
+		}
+		if (nskip)
+			pr_warn("flushed %d bytes\n", nskip);
+	}
 
 	outb(cmd, port);
 	for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
@@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)
 
 static int send_command(u8 cmd)
 {
-	return send_byte(cmd, APPLESMC_CMD_PORT);
+	return send_byte(cmd, APPLESMC_CMD_PORT, true);
 }
 
 static int send_argument(const char *key)
@@ -223,7 +236,7 @@ static int send_argument(const char *key)
 	int i;
 
 	for (i = 0; i < 4; i++)
-		if (send_byte(key[i], APPLESMC_DATA_PORT))
+		if (send_byte(key[i], APPLESMC_DATA_PORT, true))
 			return -EIO;
 	return 0;
 }
@@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
-	if (send_byte(len, APPLESMC_DATA_PORT)) {
+	if (send_byte(len, APPLESMC_DATA_PORT, false)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
 	}
@@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
 		return -EIO;
 	}
 
-	if (send_byte(len, APPLESMC_DATA_PORT)) {
+	if (send_byte(len, APPLESMC_DATA_PORT, true)) {
 		pr_warn("%.4s: write len fail\n", key);
 		return -EIO;
 	}
 
 	for (i = 0; i < len; i++) {
-		if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
+		if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
 			pr_warn("%s: write data fail\n", key);
 			return -EIO;
 		}
-- 
1.8.3.4


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01 10:55                           ` [lm-sensors] " Henrik Rydberg
@ 2013-10-01 15:19                             ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01 15:19 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> > >
> > >[   10.886016] applesmc: key count changed from 261 to 1174405121
> > >
> > 
> > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> > Which I guess explains the subsequent memory allocation error in the log.
> > 
> > Henrik, any idea what might be going on ? Is it possible that the previous
> > command failure leaves some state machine in a bad state ?
> 
> I seem to recall a report on another similar state problem on newer
> machines, so maybe, yes. Older machines seem fine, I have never
> encountered the problem myself. Here is a patch to test that
> theory. It has been tested to be pretty harmless on two different
> generations.
> 
> I really really do not want to add an 'if (value is insane)' check ;-)
> 
Chris,

any chance you can load this patch on an affected machine so we can get
test feedback ? This one is too experimental to submit upstream without
knowing that it really fixes the problem.

Thanks,
Guenter

> Thanks,
> Henrik
> 
> From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Tue, 1 Oct 2013 12:16:09 +0200
> Subject: [PATCH] applesmc remedy take 1
> 
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
> 
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
> 
> Exception: the write-number-of-bytes-to-read command seems to differ
> between models (it may not be needed on the newest), so do not try to
> flush the data at that particular point.
> 
> Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
> has not been reproduced, so the actual effect of this patch is unknown.
> ---
>  drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..f6eaf6d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -186,10 +186,23 @@ static int wait_read(void)
>   * send_byte - Write to SMC port, retrying when necessary. Callers
>   * must hold applesmc_lock.
>   */
> -static int send_byte(u8 cmd, u16 port)
> +static int send_byte(u8 cmd, u16 port, bool flush)
>  {
> -	u8 status;
> -	int us;
> +	u8 status, data;
> +	int us, nskip;
> +
> +	if (flush) {
> +		/* read the data port until bit0 is cleared */
> +		for (nskip = 0; nskip < 16; nskip++) {
> +			udelay(APPLESMC_MIN_WAIT);
> +			status = inb(APPLESMC_CMD_PORT);
> +			if (!(status & 0x01))
> +				break;
> +			data = inb(APPLESMC_DATA_PORT);
> +		}
> +		if (nskip)
> +			pr_warn("flushed %d bytes\n", nskip);
> +	}
>  
>  	outb(cmd, port);
>  	for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
> @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)
>  
>  static int send_command(u8 cmd)
>  {
> -	return send_byte(cmd, APPLESMC_CMD_PORT);
> +	return send_byte(cmd, APPLESMC_CMD_PORT, true);
>  }
>  
>  static int send_argument(const char *key)
> @@ -223,7 +236,7 @@ static int send_argument(const char *key)
>  	int i;
>  
>  	for (i = 0; i < 4; i++)
> -		if (send_byte(key[i], APPLESMC_DATA_PORT))
> +		if (send_byte(key[i], APPLESMC_DATA_PORT, true))
>  			return -EIO;
>  	return 0;
>  }
> @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> -	if (send_byte(len, APPLESMC_DATA_PORT)) {
> +	if (send_byte(len, APPLESMC_DATA_PORT, false)) {
>  		pr_warn("%.4s: read len fail\n", key);
>  		return -EIO;
>  	}
> @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> -	if (send_byte(len, APPLESMC_DATA_PORT)) {
> +	if (send_byte(len, APPLESMC_DATA_PORT, true)) {
>  		pr_warn("%.4s: write len fail\n", key);
>  		return -EIO;
>  	}
>  
>  	for (i = 0; i < len; i++) {
> -		if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
> +		if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
>  			pr_warn("%s: write data fail\n", key);
>  			return -EIO;
>  		}
> -- 
> 1.8.3.4
> 
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01 15:19                             ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01 15:19 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> > >Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> > >
> > >[   10.886016] applesmc: key count changed from 261 to 1174405121
> > >
> > 
> > Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> > Which I guess explains the subsequent memory allocation error in the log.
> > 
> > Henrik, any idea what might be going on ? Is it possible that the previous
> > command failure leaves some state machine in a bad state ?
> 
> I seem to recall a report on another similar state problem on newer
> machines, so maybe, yes. Older machines seem fine, I have never
> encountered the problem myself. Here is a patch to test that
> theory. It has been tested to be pretty harmless on two different
> generations.
> 
> I really really do not want to add an 'if (value is insane)' check ;-)
> 
Chris,

any chance you can load this patch on an affected machine so we can get
test feedback ? This one is too experimental to submit upstream without
knowing that it really fixes the problem.

Thanks,
Guenter

> Thanks,
> Henrik
> 
> From d48a9e4e6e45dcd9c7e7ad88df714b92772a62d6 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Tue, 1 Oct 2013 12:16:09 +0200
> Subject: [PATCH] applesmc remedy take 1
> 
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
> 
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
> 
> Exception: the write-number-of-bytes-to-read command seems to differ
> between models (it may not be needed on the newest), so do not try to
> flush the data at that particular point.
> 
> Tested on a MacBookPro10,1 and a MacBookAir3,1, but the original problem
> has not been reproduced, so the actual effect of this patch is unknown.
> ---
>  drivers/hwmon/applesmc.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..f6eaf6d1 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -186,10 +186,23 @@ static int wait_read(void)
>   * send_byte - Write to SMC port, retrying when necessary. Callers
>   * must hold applesmc_lock.
>   */
> -static int send_byte(u8 cmd, u16 port)
> +static int send_byte(u8 cmd, u16 port, bool flush)
>  {
> -	u8 status;
> -	int us;
> +	u8 status, data;
> +	int us, nskip;
> +
> +	if (flush) {
> +		/* read the data port until bit0 is cleared */
> +		for (nskip = 0; nskip < 16; nskip++) {
> +			udelay(APPLESMC_MIN_WAIT);
> +			status = inb(APPLESMC_CMD_PORT);
> +			if (!(status & 0x01))
> +				break;
> +			data = inb(APPLESMC_DATA_PORT);
> +		}
> +		if (nskip)
> +			pr_warn("flushed %d bytes\n", nskip);
> +	}
>  
>  	outb(cmd, port);
>  	for (us = APPLESMC_MIN_WAIT; us < APPLESMC_MAX_WAIT; us <<= 1) {
> @@ -215,7 +228,7 @@ static int send_byte(u8 cmd, u16 port)
>  
>  static int send_command(u8 cmd)
>  {
> -	return send_byte(cmd, APPLESMC_CMD_PORT);
> +	return send_byte(cmd, APPLESMC_CMD_PORT, true);
>  }
>  
>  static int send_argument(const char *key)
> @@ -223,7 +236,7 @@ static int send_argument(const char *key)
>  	int i;
>  
>  	for (i = 0; i < 4; i++)
> -		if (send_byte(key[i], APPLESMC_DATA_PORT))
> +		if (send_byte(key[i], APPLESMC_DATA_PORT, true))
>  			return -EIO;
>  	return 0;
>  }
> @@ -237,7 +250,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> -	if (send_byte(len, APPLESMC_DATA_PORT)) {
> +	if (send_byte(len, APPLESMC_DATA_PORT, false)) {
>  		pr_warn("%.4s: read len fail\n", key);
>  		return -EIO;
>  	}
> @@ -262,13 +275,13 @@ static int write_smc(u8 cmd, const char *key, const u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> -	if (send_byte(len, APPLESMC_DATA_PORT)) {
> +	if (send_byte(len, APPLESMC_DATA_PORT, true)) {
>  		pr_warn("%.4s: write len fail\n", key);
>  		return -EIO;
>  	}
>  
>  	for (i = 0; i < len; i++) {
> -		if (send_byte(buffer[i], APPLESMC_DATA_PORT)) {
> +		if (send_byte(buffer[i], APPLESMC_DATA_PORT, true)) {
>  			pr_warn("%s: write data fail\n", key);
>  			return -EIO;
>  		}
> -- 
> 1.8.3.4
> 
> 

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01 15:19                             ` [lm-sensors] " Guenter Roeck
@ 2013-10-01 15:33                               ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-01 15:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>> 
>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>> 
>>> 
>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>> Which I guess explains the subsequent memory allocation error in the log.
>>> 
>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>> command failure leaves some state machine in a bad state ?
>> 
>> I seem to recall a report on another similar state problem on newer
>> machines, so maybe, yes. Older machines seem fine, I have never
>> encountered the problem myself. Here is a patch to test that
>> theory. It has been tested to be pretty harmless on two different
>> generations.
>> 
>> I really really do not want to add an 'if (value is insane)' check ;-)
>> 
> Chris,
> 
> any chance you can load this patch on an affected machine so we can get
> test feedback ? This one is too experimental to submit upstream without
> knowing that it really fixes the problem.

Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).

Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01 15:33                               ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-01 15:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>> 
>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>> 
>>> 
>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>> Which I guess explains the subsequent memory allocation error in the log.
>>> 
>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>> command failure leaves some state machine in a bad state ?
>> 
>> I seem to recall a report on another similar state problem on newer
>> machines, so maybe, yes. Older machines seem fine, I have never
>> encountered the problem myself. Here is a patch to test that
>> theory. It has been tested to be pretty harmless on two different
>> generations.
>> 
>> I really really do not want to add an 'if (value is insane)' check ;-)
>> 
> Chris,
> 
> any chance you can load this patch on an affected machine so we can get
> test feedback ? This one is too experimental to submit upstream without
> knowing that it really fixes the problem.

Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).

Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01 15:33                               ` [lm-sensors] " Chris Murphy
@ 2013-10-01 16:24                                 ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01 16:24 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
> 
> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>> 
> >>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
> >>>> 
> >>> 
> >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>> Which I guess explains the subsequent memory allocation error in the log.
> >>> 
> >>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>> command failure leaves some state machine in a bad state ?
> >> 
> >> I seem to recall a report on another similar state problem on newer
> >> machines, so maybe, yes. Older machines seem fine, I have never
> >> encountered the problem myself. Here is a patch to test that
> >> theory. It has been tested to be pretty harmless on two different
> >> generations.
> >> 
> >> I really really do not want to add an 'if (value is insane)' check ;-)
> >> 
> > Chris,
> > 
> > any chance you can load this patch on an affected machine so we can get
> > test feedback ? This one is too experimental to submit upstream without
> > knowing that it really fixes the problem.
> 
> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
> 
Only requirement is that it also includes the previous patch, so it would be
optimal if you can apply it on top of the previous image.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-01 16:24                                 ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-01 16:24 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
> 
> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>> 
> >>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
> >>>> 
> >>> 
> >>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>> Which I guess explains the subsequent memory allocation error in the log.
> >>> 
> >>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>> command failure leaves some state machine in a bad state ?
> >> 
> >> I seem to recall a report on another similar state problem on newer
> >> machines, so maybe, yes. Older machines seem fine, I have never
> >> encountered the problem myself. Here is a patch to test that
> >> theory. It has been tested to be pretty harmless on two different
> >> generations.
> >> 
> >> I really really do not want to add an 'if (value is insane)' check ;-)
> >> 
> > Chris,
> > 
> > any chance you can load this patch on an affected machine so we can get
> > test feedback ? This one is too experimental to submit upstream without
> > knowing that it really fixes the problem.
> 
> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
> 
Only requirement is that it also includes the previous patch, so it would be
optimal if you can apply it on top of the previous image.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-01 16:24                                 ` [lm-sensors] " Guenter Roeck
@ 2013-10-02  1:09                                   ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02  1:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>> 
>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>> 
>>>>> 
>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>> 
>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>> command failure leaves some state machine in a bad state ?
>>>> 
>>>> I seem to recall a report on another similar state problem on newer
>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>> encountered the problem myself. Here is a patch to test that
>>>> theory. It has been tested to be pretty harmless on two different
>>>> generations.
>>>> 
>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>> 
>>> Chris,
>>> 
>>> any chance you can load this patch on an affected machine so we can get
>>> test feedback ? This one is too experimental to submit upstream without
>>> knowing that it really fixes the problem.
>> 
>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>> 
> Only requirement is that it also includes the previous patch, so it would be
> optimal if you can apply it on top of the previous image.

Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:

[   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
[   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10


Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02  1:09                                   ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02  1:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>> 
>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>> 
>>>>> 
>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>> 
>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>> command failure leaves some state machine in a bad state ?
>>>> 
>>>> I seem to recall a report on another similar state problem on newer
>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>> encountered the problem myself. Here is a patch to test that
>>>> theory. It has been tested to be pretty harmless on two different
>>>> generations.
>>>> 
>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>> 
>>> Chris,
>>> 
>>> any chance you can load this patch on an affected machine so we can get
>>> test feedback ? This one is too experimental to submit upstream without
>>> knowing that it really fixes the problem.
>> 
>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>> 
> Only requirement is that it also includes the previous patch, so it would be
> optimal if you can apply it on top of the previous image.

Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:

[   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
[   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10


Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02  1:09                                   ` [lm-sensors] " Chris Murphy
@ 2013-10-02  3:51                                     ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02  3:51 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
> 
> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
> >> 
> >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> >> 
> >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>>>> 
> >>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
> >>>>>> 
> >>>>> 
> >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>>>> Which I guess explains the subsequent memory allocation error in the log.
> >>>>> 
> >>>>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>>>> command failure leaves some state machine in a bad state ?
> >>>> 
> >>>> I seem to recall a report on another similar state problem on newer
> >>>> machines, so maybe, yes. Older machines seem fine, I have never
> >>>> encountered the problem myself. Here is a patch to test that
> >>>> theory. It has been tested to be pretty harmless on two different
> >>>> generations.
> >>>> 
> >>>> I really really do not want to add an 'if (value is insane)' check ;-)
> >>>> 
> >>> Chris,
> >>> 
> >>> any chance you can load this patch on an affected machine so we can get
> >>> test feedback ? This one is too experimental to submit upstream without
> >>> knowing that it really fixes the problem.
> >> 
> >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
> >> 
> > Only requirement is that it also includes the previous patch, so it would be
> > optimal if you can apply it on top of the previous image.
> 
> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
> 
> [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
> 

Hi Chris,

That only means that you did not hit the problem. There may be some secondary
trigger (cold boot ? coffee on the cpu ?).

One thing I have seen in all logs is the earlier "send_byte fail" message, so
I think that is a pre-requisite.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02  3:51                                     ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02  3:51 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
> 
> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
> >> 
> >> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
> >> 
> >>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
> >>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
> >>>>>> 
> >>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
> >>>>>> 
> >>>>> 
> >>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
> >>>>> Which I guess explains the subsequent memory allocation error in the log.
> >>>>> 
> >>>>> Henrik, any idea what might be going on ? Is it possible that the previous
> >>>>> command failure leaves some state machine in a bad state ?
> >>>> 
> >>>> I seem to recall a report on another similar state problem on newer
> >>>> machines, so maybe, yes. Older machines seem fine, I have never
> >>>> encountered the problem myself. Here is a patch to test that
> >>>> theory. It has been tested to be pretty harmless on two different
> >>>> generations.
> >>>> 
> >>>> I really really do not want to add an 'if (value is insane)' check ;-)
> >>>> 
> >>> Chris,
> >>> 
> >>> any chance you can load this patch on an affected machine so we can get
> >>> test feedback ? This one is too experimental to submit upstream without
> >>> knowing that it really fixes the problem.
> >> 
> >> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
> >> 
> > Only requirement is that it also includes the previous patch, so it would be
> > optimal if you can apply it on top of the previous image.
> 
> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
> 
> [   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
> 

Hi Chris,

That only means that you did not hit the problem. There may be some secondary
trigger (cold boot ? coffee on the cpu ?).

One thing I have seen in all logs is the earlier "send_byte fail" message, so
I think that is a pre-requisite.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02  3:51                                     ` [lm-sensors] " Guenter Roeck
@ 2013-10-02  3:55                                       ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02  3:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> 
>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>> 
>>>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>> 
>>>>>>> 
>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>> 
>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>> 
>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>> generations.
>>>>>> 
>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>> 
>>>>> Chris,
>>>>> 
>>>>> any chance you can load this patch on an affected machine so we can get
>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>> knowing that it really fixes the problem.
>>>> 
>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>> 
>>> Only requirement is that it also includes the previous patch, so it would be
>>> optimal if you can apply it on top of the previous image.
>> 
>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>> 
>> [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>> 
> 
> Hi Chris,
> 
> That only means that you did not hit the problem. There may be some secondary
> trigger (cold boot ? coffee on the cpu ?).
> 
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.


Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02  3:55                                       ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02  3:55 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>> 
>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>> 
>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>> 
>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>> 
>>>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>> 
>>>>>>> 
>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>> 
>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>> 
>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>> generations.
>>>>>> 
>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>> 
>>>>> Chris,
>>>>> 
>>>>> any chance you can load this patch on an affected machine so we can get
>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>> knowing that it really fixes the problem.
>>>> 
>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>> 
>>> Only requirement is that it also includes the previous patch, so it would be
>>> optimal if you can apply it on top of the previous image.
>> 
>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>> 
>> [   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>> 
> 
> Hi Chris,
> 
> That only means that you did not hit the problem. There may be some secondary
> trigger (cold boot ? coffee on the cpu ?).
> 
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.


Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02  3:55                                       ` [lm-sensors] " Chris Murphy
@ 2013-10-02  4:02                                         ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02  4:02 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/01/2013 08:55 PM, Chris Murphy wrote:
>
> On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>>>
>>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>
>>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>>>
>>>>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>>>
>>>>>>>>
>>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>>>
>>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>>>
>>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>>> generations.
>>>>>>>
>>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>>>
>>>>>> Chris,
>>>>>>
>>>>>> any chance you can load this patch on an affected machine so we can get
>>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>>> knowing that it really fixes the problem.
>>>>>
>>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>>>
>>>> Only requirement is that it also includes the previous patch, so it would be
>>>> optimal if you can apply it on top of the previous image.
>>>
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> Hi Chris,
>>
>> That only means that you did not hit the problem. There may be some secondary
>> trigger (cold boot ? coffee on the cpu ?).
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.
>

I am sure you didn't pour coffee onto the CPU yet :)

Basic rule of testing: You'll only hit the problem again after you are convinced
that it was magically fixed by a completely unrelated change.

Cheers,
Guenter


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02  4:02                                         ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02  4:02 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/01/2013 08:55 PM, Chris Murphy wrote:
>
> On Oct 1, 2013, at 9:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>
>> On Tue, Oct 01, 2013 at 07:09:26PM -0600, Chris Murphy wrote:
>>>
>>> On Oct 1, 2013, at 10:24 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>>> On Tue, Oct 01, 2013 at 09:33:13AM -0600, Chris Murphy wrote:
>>>>>
>>>>> On Oct 1, 2013, at 9:19 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>
>>>>>> On Tue, Oct 01, 2013 at 12:55:26PM +0200, Henrik Rydberg wrote:
>>>>>>>>> Warning message triggered with 3.12.0-0.rc3.git0.1.fc21.x86_64.
>>>>>>>>>
>>>>>>>>> [   10.886016] applesmc: key count changed from 261 to 1174405121
>>>>>>>>>
>>>>>>>>
>>>>>>>> Explains the crash, but the new key count is very wrong. 1174405121 = 0x46000001.
>>>>>>>> Which I guess explains the subsequent memory allocation error in the log.
>>>>>>>>
>>>>>>>> Henrik, any idea what might be going on ? Is it possible that the previous
>>>>>>>> command failure leaves some state machine in a bad state ?
>>>>>>>
>>>>>>> I seem to recall a report on another similar state problem on newer
>>>>>>> machines, so maybe, yes. Older machines seem fine, I have never
>>>>>>> encountered the problem myself. Here is a patch to test that
>>>>>>> theory. It has been tested to be pretty harmless on two different
>>>>>>> generations.
>>>>>>>
>>>>>>> I really really do not want to add an 'if (value is insane)' check ;-)
>>>>>>>
>>>>>> Chris,
>>>>>>
>>>>>> any chance you can load this patch on an affected machine so we can get
>>>>>> test feedback ? This one is too experimental to submit upstream without
>>>>>> knowing that it really fixes the problem.
>>>>>
>>>>> Yes. What kernel.org source version should I apply it against? I'd use the non-debug config file from an equivalent version Fedora kernel, unless asked otherwise. And also should I test it on other vintages? I have here MBP4,1(2008); MBP8,2(2011), and MBP10,2(2012).
>>>>>
>>>> Only requirement is that it also includes the previous patch, so it would be
>>>> optimal if you can apply it on top of the previous image.
>>>
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
>>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> Hi Chris,
>>
>> That only means that you did not hit the problem. There may be some secondary
>> trigger (cold boot ? coffee on the cpu ?).
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> I have no idea how to trigger it. I have tried cold and warm boots. Boots between linux and OS X to linux. *shrug* I'll keep trying as I'm doing other testing, maybe I'll stumble onto it.
>

I am sure you didn't pour coffee onto the CPU yet :)

Basic rule of testing: You'll only hit the problem again after you are convinced
that it was magically fixed by a completely unrelated change.

Cheers,
Guenter


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02  3:51                                     ` [lm-sensors] " Guenter Roeck
@ 2013-10-02  9:53                                       ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02  9:53 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
> > 
> > [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
> > [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
> > 
> 
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

Not necessarily - it could be that the patch actually fixes the root
cause. One possible scenario is that on recent SMCs, some of the
commands produce more data than we actually read. This would
eventually lead to both data corruption and overflow somwhere in the
SMC internals.  If the original SMC error is interpreted as a read
buffer overflow, then that problem should be fixed with this patch.

Even so, the patch does require more testing. It would be great to get
coverage on all models we can find.

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02  9:53                                       ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02  9:53 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> > Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
> > 
> > [   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
> > [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
> > 
> 
> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> I think that is a pre-requisite.

Not necessarily - it could be that the patch actually fixes the root
cause. One possible scenario is that on recent SMCs, some of the
commands produce more data than we actually read. This would
eventually lead to both data corruption and overflow somwhere in the
SMC internals.  If the original SMC error is interpreted as a read
buffer overflow, then that problem should be fixed with this patch.

Even so, the patch does require more testing. It would be great to get
coverage on all models we can find.

Thanks,
Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02  9:53                                       ` [lm-sensors] " Henrik Rydberg
@ 2013-10-02 13:30                                         ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 13:30 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/02/2013 02:53 AM, Henrik Rydberg wrote:
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [   13.799819] applesmc: key=261 fan=2 temp=14 index=14 acc=1 lux=2 kbd=1
>>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> Not necessarily - it could be that the patch actually fixes the root
> cause. One possible scenario is that on recent SMCs, some of the
> commands produce more data than we actually read. This would
> eventually lead to both data corruption and overflow somwhere in the
> SMC internals.  If the original SMC error is interpreted as a read
> buffer overflow, then that problem should be fixed with this patch.
>

Good point.

But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 13:30                                         ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 13:30 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/02/2013 02:53 AM, Henrik Rydberg wrote:
>>> Patch added on top of 3.12.0-0.rc3.git0.1.fc20.x86_64 and built. But after ~dozen reboots, I'm not triggering the problem. The only items in dmesg with smc in it:
>>>
>>> [   13.799819] applesmc: key&1 fan=2 temp\x14 index\x14 acc=1 lux=2 kbd=1
>>> [   13.833402] input: applesmc as /devices/platform/applesmc.768/input/input10
>>>
>>
>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>> I think that is a pre-requisite.
>
> Not necessarily - it could be that the patch actually fixes the root
> cause. One possible scenario is that on recent SMCs, some of the
> commands produce more data than we actually read. This would
> eventually lead to both data corruption and overflow somwhere in the
> SMC internals.  If the original SMC error is interpreted as a read
> buffer overflow, then that problem should be fixed with this patch.
>

Good point.

But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

Thanks,
Guenter


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 13:30                                         ` [lm-sensors] " Guenter Roeck
@ 2013-10-02 16:34                                           ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02 16:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>I think that is a pre-requisite.
> >
> >Not necessarily - it could be that the patch actually fixes the root
> >cause. One possible scenario is that on recent SMCs, some of the
> >commands produce more data than we actually read. This would
> >eventually lead to both data corruption and overflow somwhere in the
> >SMC internals.  If the original SMC error is interpreted as a read
> >buffer overflow, then that problem should be fixed with this patch.
> >
> 
> Good point.
> 
> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

The explanation I have there is that the (newer) SMC needs the
application to read the 'no more bytes' or it will get confused. It
makes sense, if the number of bytes to read is no longer specified.

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 16:34                                           ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02 16:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

> >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>I think that is a pre-requisite.
> >
> >Not necessarily - it could be that the patch actually fixes the root
> >cause. One possible scenario is that on recent SMCs, some of the
> >commands produce more data than we actually read. This would
> >eventually lead to both data corruption and overflow somwhere in the
> >SMC internals.  If the original SMC error is interpreted as a read
> >buffer overflow, then that problem should be fixed with this patch.
> >
> 
> Good point.
> 
> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?

The explanation I have there is that the (newer) SMC needs the
application to read the 'no more bytes' or it will get confused. It
makes sense, if the number of bytes to read is no longer specified.

Thanks,
Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 16:34                                           ` [lm-sensors] " Henrik Rydberg
@ 2013-10-02 16:47                                             ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 16:47 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > >>I think that is a pre-requisite.
> > >
> > >Not necessarily - it could be that the patch actually fixes the root
> > >cause. One possible scenario is that on recent SMCs, some of the
> > >commands produce more data than we actually read. This would
> > >eventually lead to both data corruption and overflow somwhere in the
> > >SMC internals.  If the original SMC error is interpreted as a read
> > >buffer overflow, then that problem should be fixed with this patch.
> > >
> > 
> > Good point.
> > 
> > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> 
> The explanation I have there is that the (newer) SMC needs the
> application to read the 'no more bytes' or it will get confused. It
> makes sense, if the number of bytes to read is no longer specified.
> 
You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
That might make sense.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 16:47                                             ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 16:47 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > >>I think that is a pre-requisite.
> > >
> > >Not necessarily - it could be that the patch actually fixes the root
> > >cause. One possible scenario is that on recent SMCs, some of the
> > >commands produce more data than we actually read. This would
> > >eventually lead to both data corruption and overflow somwhere in the
> > >SMC internals.  If the original SMC error is interpreted as a read
> > >buffer overflow, then that problem should be fixed with this patch.
> > >
> > 
> > Good point.
> > 
> > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> 
> The explanation I have there is that the (newer) SMC needs the
> application to read the 'no more bytes' or it will get confused. It
> makes sense, if the number of bytes to read is no longer specified.
> 
You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
That might make sense.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 16:47                                             ` [lm-sensors] " Guenter Roeck
@ 2013-10-02 17:24                                               ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02 17:24 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > >>I think that is a pre-requisite.
> > > >
> > > >Not necessarily - it could be that the patch actually fixes the root
> > > >cause. One possible scenario is that on recent SMCs, some of the
> > > >commands produce more data than we actually read. This would
> > > >eventually lead to both data corruption and overflow somwhere in the
> > > >SMC internals.  If the original SMC error is interpreted as a read
> > > >buffer overflow, then that problem should be fixed with this patch.
> > > >
> > > 
> > > Good point.
> > > 
> > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> > 
> > The explanation I have there is that the (newer) SMC needs the
> > application to read the 'no more bytes' or it will get confused. It
> > makes sense, if the number of bytes to read is no longer specified.
> > 
> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> That might make sense.

It also points at the possibility of a smaller patch to test, but I
have not had the time to check this very deeply myself:

Thanks,
Henrik

>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] applesmc remedy take 2

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Tests by Chris suggests reading the status is enough for the problem
to go away, which is consistent with a change in the SMC interface,
where the number of bytes to read is no longer specified, but found
out by reading until end of data.

Tested on a MacBookAir3,1, but the original problem has not been
reproduced.
---
 drivers/hwmon/applesmc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..c0ff350 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)
 
 static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 {
+	u8 status, data;
 	int i;
 
 	if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
+	/* This has no effect on newer (2012) SMCs */
 	if (send_byte(len, APPLESMC_DATA_PORT)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		buffer[i] = inb(APPLESMC_DATA_PORT);
 	}
 
+	/* Read the data port until bit0 is cleared */
+	for (i = 0; i < 16; i++) {
+		udelay(APPLESMC_MIN_WAIT);
+		status = inb(APPLESMC_CMD_PORT);
+		if (!(status & 0x01))
+			break;
+		data = inb(APPLESMC_DATA_PORT);
+	}
+	if (i)
+		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
 	return 0;
 }
 
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 17:24                                               ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-02 17:24 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > >>I think that is a pre-requisite.
> > > >
> > > >Not necessarily - it could be that the patch actually fixes the root
> > > >cause. One possible scenario is that on recent SMCs, some of the
> > > >commands produce more data than we actually read. This would
> > > >eventually lead to both data corruption and overflow somwhere in the
> > > >SMC internals.  If the original SMC error is interpreted as a read
> > > >buffer overflow, then that problem should be fixed with this patch.
> > > >
> > > 
> > > Good point.
> > > 
> > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> > 
> > The explanation I have there is that the (newer) SMC needs the
> > application to read the 'no more bytes' or it will get confused. It
> > makes sense, if the number of bytes to read is no longer specified.
> > 
> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> That might make sense.

It also points at the possibility of a smaller patch to test, but I
have not had the time to check this very deeply myself:

Thanks,
Henrik

From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] applesmc remedy take 2

Conjectured problem: there are remnant bytes ready on the data line
which corrupts the read after a failure.

Remedy: assuming bit0 is the read valid line, try to flush it before
starting a new command.

Tests by Chris suggests reading the status is enough for the problem
to go away, which is consistent with a change in the SMC interface,
where the number of bytes to read is no longer specified, but found
out by reading until end of data.

Tested on a MacBookAir3,1, but the original problem has not been
reproduced.
---
 drivers/hwmon/applesmc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..c0ff350 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)
 
 static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 {
+	u8 status, data;
 	int i;
 
 	if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
+	/* This has no effect on newer (2012) SMCs */
 	if (send_byte(len, APPLESMC_DATA_PORT)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		buffer[i] = inb(APPLESMC_DATA_PORT);
 	}
 
+	/* Read the data port until bit0 is cleared */
+	for (i = 0; i < 16; i++) {
+		udelay(APPLESMC_MIN_WAIT);
+		status = inb(APPLESMC_CMD_PORT);
+		if (!(status & 0x01))
+			break;
+		data = inb(APPLESMC_DATA_PORT);
+	}
+	if (i)
+		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
 	return 0;
 }
 
-- 
1.8.3.4


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 17:24                                               ` [lm-sensors] " Henrik Rydberg
@ 2013-10-02 18:02                                                 ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 18:02 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > > >>I think that is a pre-requisite.
> > > > >
> > > > >Not necessarily - it could be that the patch actually fixes the root
> > > > >cause. One possible scenario is that on recent SMCs, some of the
> > > > >commands produce more data than we actually read. This would
> > > > >eventually lead to both data corruption and overflow somwhere in the
> > > > >SMC internals.  If the original SMC error is interpreted as a read
> > > > >buffer overflow, then that problem should be fixed with this patch.
> > > > >
> > > > 
> > > > Good point.
> > > > 
> > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> > > 
> > > The explanation I have there is that the (newer) SMC needs the
> > > application to read the 'no more bytes' or it will get confused. It
> > > makes sense, if the number of bytes to read is no longer specified.
> > > 
> > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> > That might make sense.
> 
> It also points at the possibility of a smaller patch to test, but I
> have not had the time to check this very deeply myself:
> 
I like this patch much more than the previous patch. Chris, can you test it ?

Thanks,
Guenter

> Thanks,
> Henrik
> 
> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
> 
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
> 
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
> 
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
> 
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.
> ---
>  drivers/hwmon/applesmc.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>  
>  static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  {
> +	u8 status, data;
>  	int i;
>  
>  	if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> +	/* This has no effect on newer (2012) SMCs */
>  	if (send_byte(len, APPLESMC_DATA_PORT)) {
>  		pr_warn("%.4s: read len fail\n", key);
>  		return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		buffer[i] = inb(APPLESMC_DATA_PORT);
>  	}
>  
> +	/* Read the data port until bit0 is cleared */
> +	for (i = 0; i < 16; i++) {
> +		udelay(APPLESMC_MIN_WAIT);
> +		status = inb(APPLESMC_CMD_PORT);
> +		if (!(status & 0x01))
> +			break;
> +		data = inb(APPLESMC_DATA_PORT);
> +	}
> +	if (i)
> +		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
>  	return 0;
>  }
>  
> -- 
> 1.8.3.4
> 
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 18:02                                                 ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 18:02 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> > On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> > > > >>One thing I have seen in all logs is the earlier "send_byte fail" message, so
> > > > >>I think that is a pre-requisite.
> > > > >
> > > > >Not necessarily - it could be that the patch actually fixes the root
> > > > >cause. One possible scenario is that on recent SMCs, some of the
> > > > >commands produce more data than we actually read. This would
> > > > >eventually lead to both data corruption and overflow somwhere in the
> > > > >SMC internals.  If the original SMC error is interpreted as a read
> > > > >buffer overflow, then that problem should be fixed with this patch.
> > > > >
> > > > 
> > > > Good point.
> > > > 
> > > > But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> > > 
> > > The explanation I have there is that the (newer) SMC needs the
> > > application to read the 'no more bytes' or it will get confused. It
> > > makes sense, if the number of bytes to read is no longer specified.
> > > 
> > You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> > That might make sense.
> 
> It also points at the possibility of a smaller patch to test, but I
> have not had the time to check this very deeply myself:
> 
I like this patch much more than the previous patch. Chris, can you test it ?

Thanks,
Guenter

> Thanks,
> Henrik
> 
> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
> 
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
> 
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
> 
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
> 
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.
> ---
>  drivers/hwmon/applesmc.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>  
>  static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  {
> +	u8 status, data;
>  	int i;
>  
>  	if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		return -EIO;
>  	}
>  
> +	/* This has no effect on newer (2012) SMCs */
>  	if (send_byte(len, APPLESMC_DATA_PORT)) {
>  		pr_warn("%.4s: read len fail\n", key);
>  		return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>  		buffer[i] = inb(APPLESMC_DATA_PORT);
>  	}
>  
> +	/* Read the data port until bit0 is cleared */
> +	for (i = 0; i < 16; i++) {
> +		udelay(APPLESMC_MIN_WAIT);
> +		status = inb(APPLESMC_CMD_PORT);
> +		if (!(status & 0x01))
> +			break;
> +		data = inb(APPLESMC_DATA_PORT);
> +	}
> +	if (i)
> +		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
>  	return 0;
>  }
>  
> -- 
> 1.8.3.4
> 
> 

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 18:02                                                 ` [lm-sensors] " Guenter Roeck
@ 2013-10-02 18:33                                                   ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02 18:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>> I think that is a pre-requisite.
>>>>>> 
>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>> commands produce more data than we actually read. This would
>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>> 
>>>>> 
>>>>> Good point.
>>>>> 
>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>> 
>>>> The explanation I have there is that the (newer) SMC needs the
>>>> application to read the 'no more bytes' or it will get confused. It
>>>> makes sense, if the number of bytes to read is no longer specified.
>>>> 
>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>> That might make sense.
>> 
>> It also points at the possibility of a smaller patch to test, but I
>> have not had the time to check this very deeply myself:
>> 
> I like this patch much more than the previous patch. Chris, can you test it ?

Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.

Chris Murphy

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 18:33                                                   ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02 18:33 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>> I think that is a pre-requisite.
>>>>>> 
>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>> commands produce more data than we actually read. This would
>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>> 
>>>>> 
>>>>> Good point.
>>>>> 
>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>> 
>>>> The explanation I have there is that the (newer) SMC needs the
>>>> application to read the 'no more bytes' or it will get confused. It
>>>> makes sense, if the number of bytes to read is no longer specified.
>>>> 
>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>> That might make sense.
>> 
>> It also points at the possibility of a smaller patch to test, but I
>> have not had the time to check this very deeply myself:
>> 
> I like this patch much more than the previous patch. Chris, can you test it ?

Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.

Chris Murphy
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 18:33                                                   ` [lm-sensors] " Chris Murphy
@ 2013-10-02 20:59                                                     ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 20:59 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
> 
> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>> I think that is a pre-requisite.
> >>>>>> 
> >>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>> commands produce more data than we actually read. This would
> >>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>> SMC internals.  If the original SMC error is interpreted as a read
> >>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>> 
> >>>>> 
> >>>>> Good point.
> >>>>> 
> >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>> 
> >>>> The explanation I have there is that the (newer) SMC needs the
> >>>> application to read the 'no more bytes' or it will get confused. It
> >>>> makes sense, if the number of bytes to read is no longer specified.
> >>>> 
> >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>> That might make sense.
> >> 
> >> It also points at the possibility of a smaller patch to test, but I
> >> have not had the time to check this very deeply myself:
> >> 
> > I like this patch much more than the previous patch. Chris, can you test it ?
> 
> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
> 
None, if I understand correctly and if the patch really fixes the root cause
of the problem.

Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 20:59                                                     ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 20:59 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
> 
> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>> I think that is a pre-requisite.
> >>>>>> 
> >>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>> commands produce more data than we actually read. This would
> >>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>> SMC internals.  If the original SMC error is interpreted as a read
> >>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>> 
> >>>>> 
> >>>>> Good point.
> >>>>> 
> >>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>> 
> >>>> The explanation I have there is that the (newer) SMC needs the
> >>>> application to read the 'no more bytes' or it will get confused. It
> >>>> makes sense, if the number of bytes to read is no longer specified.
> >>>> 
> >>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>> That might make sense.
> >> 
> >> It also points at the possibility of a smaller patch to test, but I
> >> have not had the time to check this very deeply myself:
> >> 
> > I like this patch much more than the previous patch. Chris, can you test it ?
> 
> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
> 
None, if I understand correctly and if the patch really fixes the root cause
of the problem.

Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 20:59                                                     ` [lm-sensors] " Guenter Roeck
@ 2013-10-02 21:34                                                       ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02 21:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
>> 
>> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>>>> I think that is a pre-requisite.
>>>>>>>> 
>>>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>>>> commands produce more data than we actually read. This would
>>>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>>>> 
>>>>>>> 
>>>>>>> Good point.
>>>>>>> 
>>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>>>> 
>>>>>> The explanation I have there is that the (newer) SMC needs the
>>>>>> application to read the 'no more bytes' or it will get confused. It
>>>>>> makes sense, if the number of bytes to read is no longer specified.
>>>>>> 
>>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>>>> That might make sense.
>>>> 
>>>> It also points at the possibility of a smaller patch to test, but I
>>>> have not had the time to check this very deeply myself:
>>>> 
>>> I like this patch much more than the previous patch. Chris, can you test it ?
>> 
>> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
>> 
> None, if I understand correctly and if the patch really fixes the root cause
> of the problem.

A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.


Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 21:34                                                       ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-02 21:34 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
>> 
>> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>> 
>>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
>>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
>>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
>>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
>>>>>>>>> I think that is a pre-requisite.
>>>>>>>> 
>>>>>>>> Not necessarily - it could be that the patch actually fixes the root
>>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
>>>>>>>> commands produce more data than we actually read. This would
>>>>>>>> eventually lead to both data corruption and overflow somwhere in the
>>>>>>>> SMC internals.  If the original SMC error is interpreted as a read
>>>>>>>> buffer overflow, then that problem should be fixed with this patch.
>>>>>>>> 
>>>>>>> 
>>>>>>> Good point.
>>>>>>> 
>>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
>>>>>> 
>>>>>> The explanation I have there is that the (newer) SMC needs the
>>>>>> application to read the 'no more bytes' or it will get confused. It
>>>>>> makes sense, if the number of bytes to read is no longer specified.
>>>>>> 
>>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
>>>>> That might make sense.
>>>> 
>>>> It also points at the possibility of a smaller patch to test, but I
>>>> have not had the time to check this very deeply myself:
>>>> 
>>> I like this patch much more than the previous patch. Chris, can you test it ?
>> 
>> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
>> 
> None, if I understand correctly and if the patch really fixes the root cause
> of the problem.

A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.


Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 21:34                                                       ` [lm-sensors] " Chris Murphy
@ 2013-10-02 23:32                                                         ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 23:32 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 03:34:41PM -0600, Chris Murphy wrote:
> 
> On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
> >> 
> >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> >> 
> >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>>>> I think that is a pre-requisite.
> >>>>>>>> 
> >>>>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>>>> commands produce more data than we actually read. This would
> >>>>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>>>> SMC internals.  If the original SMC error is interpreted as a read
> >>>>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> Good point.
> >>>>>>> 
> >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>>>> 
> >>>>>> The explanation I have there is that the (newer) SMC needs the
> >>>>>> application to read the 'no more bytes' or it will get confused. It
> >>>>>> makes sense, if the number of bytes to read is no longer specified.
> >>>>>> 
> >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>>>> That might make sense.
> >>>> 
> >>>> It also points at the possibility of a smaller patch to test, but I
> >>>> have not had the time to check this very deeply myself:
> >>>> 
> >>> I like this patch much more than the previous patch. Chris, can you test it ?
> >> 
> >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
> >> 
> > None, if I understand correctly and if the patch really fixes the root cause
> > of the problem.
> 
> A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.
> 
Yes, that could be a possible trigger. I thought it might be triggered by faster
boot (as one gets with 3.10 and 3.11), but slow boot is just as likely.

Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-02 23:32                                                         ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-02 23:32 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 02, 2013 at 03:34:41PM -0600, Chris Murphy wrote:
> 
> On Oct 2, 2013, at 2:59 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On Wed, Oct 02, 2013 at 12:33:00PM -0600, Chris Murphy wrote:
> >> 
> >> On Oct 2, 2013, at 12:02 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> >> 
> >>> On Wed, Oct 02, 2013 at 07:24:10PM +0200, Henrik Rydberg wrote:
> >>>> On Wed, Oct 02, 2013 at 09:47:18AM -0700, Guenter Roeck wrote:
> >>>>> On Wed, Oct 02, 2013 at 06:34:18PM +0200, Henrik Rydberg wrote:
> >>>>>>>>> One thing I have seen in all logs is the earlier "send_byte fail" message, so
> >>>>>>>>> I think that is a pre-requisite.
> >>>>>>>> 
> >>>>>>>> Not necessarily - it could be that the patch actually fixes the root
> >>>>>>>> cause. One possible scenario is that on recent SMCs, some of the
> >>>>>>>> commands produce more data than we actually read. This would
> >>>>>>>> eventually lead to both data corruption and overflow somwhere in the
> >>>>>>>> SMC internals.  If the original SMC error is interpreted as a read
> >>>>>>>> buffer overflow, then that problem should be fixed with this patch.
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> Good point.
> >>>>>>> 
> >>>>>>> But shouldn't we at least get the "flushed %d bytes" warning message in this case ?
> >>>>>> 
> >>>>>> The explanation I have there is that the (newer) SMC needs the
> >>>>>> application to read the 'no more bytes' or it will get confused. It
> >>>>>> makes sense, if the number of bytes to read is no longer specified.
> >>>>>> 
> >>>>> You mean that just reading from APPLESMC_CMD_PORT would solve the problem ?
> >>>>> That might make sense.
> >>>> 
> >>>> It also points at the possibility of a smaller patch to test, but I
> >>>> have not had the time to check this very deeply myself:
> >>>> 
> >>> I like this patch much more than the previous patch. Chris, can you test it ?
> >> 
> >> Yes. Building now. What kernel message should I be looking for? At least on 2011 and 2012 laptops I have yet to see an Oops related to smc. The kernel with previous patch at least is not causing problems on them so far, which works well as I can test more on the 2008 model.
> >> 
> > None, if I understand correctly and if the patch really fixes the root cause
> > of the problem.
> 
> A vast majority of the Ooops I've had are when booting from flash media, testing Fedora installs. Is it possible the much slower kernel load and boot time is a trigger? If so, I'll look into modifying the media to accept the custom kernel and requisite fat initramfs.
> 
Yes, that could be a possible trigger. I thought it might be triggered by faster
boot (as one gets with 3.10 and 3.11), but slow boot is just as likely.

Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-02 17:24                                               ` [lm-sensors] " Henrik Rydberg
@ 2013-10-07 23:42                                                 ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-07 23:42 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/02/2013 10:24 AM, Henrik Rydberg wrote:

>>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
>
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
>
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
>
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
>
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.

So, what should we do with this patch ? Apply it ?

Thanks,
Guenter

> ---
>   drivers/hwmon/applesmc.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>
>   static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   {
> +	u8 status, data;
>   	int i;
>
>   	if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   		return -EIO;
>   	}
>
> +	/* This has no effect on newer (2012) SMCs */
>   	if (send_byte(len, APPLESMC_DATA_PORT)) {
>   		pr_warn("%.4s: read len fail\n", key);
>   		return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   		buffer[i] = inb(APPLESMC_DATA_PORT);
>   	}
>
> +	/* Read the data port until bit0 is cleared */
> +	for (i = 0; i < 16; i++) {
> +		udelay(APPLESMC_MIN_WAIT);
> +		status = inb(APPLESMC_CMD_PORT);
> +		if (!(status & 0x01))
> +			break;
> +		data = inb(APPLESMC_DATA_PORT);
> +	}
> +	if (i)
> +		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
>   	return 0;
>   }
>
>


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-07 23:42                                                 ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-07 23:42 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On 10/02/2013 10:24 AM, Henrik Rydberg wrote:

>>From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] applesmc remedy take 2
>
> Conjectured problem: there are remnant bytes ready on the data line
> which corrupts the read after a failure.
>
> Remedy: assuming bit0 is the read valid line, try to flush it before
> starting a new command.
>
> Tests by Chris suggests reading the status is enough for the problem
> to go away, which is consistent with a change in the SMC interface,
> where the number of bytes to read is no longer specified, but found
> out by reading until end of data.
>
> Tested on a MacBookAir3,1, but the original problem has not been
> reproduced.

So, what should we do with this patch ? Apply it ?

Thanks,
Guenter

> ---
>   drivers/hwmon/applesmc.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> index 98814d1..c0ff350 100644
> --- a/drivers/hwmon/applesmc.c
> +++ b/drivers/hwmon/applesmc.c
> @@ -230,6 +230,7 @@ static int send_argument(const char *key)
>
>   static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   {
> +	u8 status, data;
>   	int i;
>
>   	if (send_command(cmd) || send_argument(key)) {
> @@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   		return -EIO;
>   	}
>
> +	/* This has no effect on newer (2012) SMCs */
>   	if (send_byte(len, APPLESMC_DATA_PORT)) {
>   		pr_warn("%.4s: read len fail\n", key);
>   		return -EIO;
> @@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
>   		buffer[i] = inb(APPLESMC_DATA_PORT);
>   	}
>
> +	/* Read the data port until bit0 is cleared */
> +	for (i = 0; i < 16; i++) {
> +		udelay(APPLESMC_MIN_WAIT);
> +		status = inb(APPLESMC_CMD_PORT);
> +		if (!(status & 0x01))
> +			break;
> +		data = inb(APPLESMC_DATA_PORT);
> +	}
> +	if (i)
> +		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
> +
>   	return 0;
>   }
>
>


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-07 23:42                                                 ` [lm-sensors] " Guenter Roeck
@ 2013-10-07 23:46                                                   ` Chris Murphy
  -1 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-07 23:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> 
>>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
>> From: Henrik Rydberg <rydberg@euromail.se>
>> Date: Wed, 2 Oct 2013 19:15:03 +0200
>> Subject: [PATCH] applesmc remedy take 2
>> 
>> Conjectured problem: there are remnant bytes ready on the data line
>> which corrupts the read after a failure.
>> 
>> Remedy: assuming bit0 is the read valid line, try to flush it before
>> starting a new command.
>> 
>> Tests by Chris suggests reading the status is enough for the problem
>> to go away, which is consistent with a change in the SMC interface,
>> where the number of bytes to read is no longer specified, but found
>> out by reading until end of data.
>> 
>> Tested on a MacBookAir3,1, but the original problem has not been
>> reproduced.
> 
> So, what should we do with this patch ? Apply it ?

So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.

Chris

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-07 23:46                                                   ` Chris Murphy
  0 siblings, 0 replies; 86+ messages in thread
From: Chris Murphy @ 2013-10-07 23:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org


On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote:

> On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> 
>>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
>> From: Henrik Rydberg <rydberg@euromail.se>
>> Date: Wed, 2 Oct 2013 19:15:03 +0200
>> Subject: [PATCH] applesmc remedy take 2
>> 
>> Conjectured problem: there are remnant bytes ready on the data line
>> which corrupts the read after a failure.
>> 
>> Remedy: assuming bit0 is the read valid line, try to flush it before
>> starting a new command.
>> 
>> Tests by Chris suggests reading the status is enough for the problem
>> to go away, which is consistent with a change in the SMC interface,
>> where the number of bytes to read is no longer specified, but found
>> out by reading until end of data.
>> 
>> Tested on a MacBookAir3,1, but the original problem has not been
>> reproduced.
> 
> So, what should we do with this patch ? Apply it ?

So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.

Chris
_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-07 23:46                                                   ` [lm-sensors] " Chris Murphy
@ 2013-10-08 15:48                                                     ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-08 15:48 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Mon, Oct 07, 2013 at 05:46:55PM -0600, Chris Murphy wrote:
> 
> On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> > 
> >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> >> From: Henrik Rydberg <rydberg@euromail.se>
> >> Date: Wed, 2 Oct 2013 19:15:03 +0200
> >> Subject: [PATCH] applesmc remedy take 2
> >> 
> >> Conjectured problem: there are remnant bytes ready on the data line
> >> which corrupts the read after a failure.
> >> 
> >> Remedy: assuming bit0 is the read valid line, try to flush it before
> >> starting a new command.
> >> 
> >> Tests by Chris suggests reading the status is enough for the problem
> >> to go away, which is consistent with a change in the SMC interface,
> >> where the number of bytes to read is no longer specified, but found
> >> out by reading until end of data.
> >> 
> >> Tested on a MacBookAir3,1, but the original problem has not been
> >> reproduced.
> > 
> > So, what should we do with this patch ? Apply it ?
> 
> So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> 

Seems to me we should apply it.

Henrik,

what do you think ? Did you have time for additional testing ?
If you think we should apply it, please send me a signed patch.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-08 15:48                                                     ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-08 15:48 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Henrik Rydberg, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Mon, Oct 07, 2013 at 05:46:55PM -0600, Chris Murphy wrote:
> 
> On Oct 7, 2013, at 5:42 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> > On 10/02/2013 10:24 AM, Henrik Rydberg wrote:
> > 
> >>> From 4451da32414080bd0563ee9e061f19bf90463cc5 Mon Sep 17 00:00:00 2001
> >> From: Henrik Rydberg <rydberg@euromail.se>
> >> Date: Wed, 2 Oct 2013 19:15:03 +0200
> >> Subject: [PATCH] applesmc remedy take 2
> >> 
> >> Conjectured problem: there are remnant bytes ready on the data line
> >> which corrupts the read after a failure.
> >> 
> >> Remedy: assuming bit0 is the read valid line, try to flush it before
> >> starting a new command.
> >> 
> >> Tests by Chris suggests reading the status is enough for the problem
> >> to go away, which is consistent with a change in the SMC interface,
> >> where the number of bytes to read is no longer specified, but found
> >> out by reading until end of data.
> >> 
> >> Tested on a MacBookAir3,1, but the original problem has not been
> >> reproduced.
> > 
> > So, what should we do with this patch ? Apply it ?
> 
> So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> 

Seems to me we should apply it.

Henrik,

what do you think ? Did you have time for additional testing ?
If you think we should apply it, please send me a signed patch.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-08 16:29                                                       ` [lm-sensors] " Henrik Rydberg
@ 2013-10-08 16:29                                                         ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-08 16:29 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 08, 2013 at 06:29:17PM +0200, Henrik Rydberg wrote:
> Hi Guenter,
> 
> > > > So, what should we do with this patch ? Apply it ?
> > > 
> > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> > > 
> > 
> > Seems to me we should apply it.
> > 
> > Henrik,
> > 
> > what do you think ? Did you have time for additional testing ?
> > If you think we should apply it, please send me a signed patch.
> 
> I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
> 
Ok, great.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-08 16:29                                                         ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-08 16:29 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Tue, Oct 08, 2013 at 06:29:17PM +0200, Henrik Rydberg wrote:
> Hi Guenter,
> 
> > > > So, what should we do with this patch ? Apply it ?
> > > 
> > > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> > > 
> > 
> > Seems to me we should apply it.
> > 
> > Henrik,
> > 
> > what do you think ? Did you have time for additional testing ?
> > If you think we should apply it, please send me a signed patch.
> 
> I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
> 
Ok, great.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-08 15:48                                                     ` [lm-sensors] " Guenter Roeck
@ 2013-10-08 16:29                                                       ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-08 16:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

Hi Guenter,

> > > So, what should we do with this patch ? Apply it ?
> > 
> > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> > 
> 
> Seems to me we should apply it.
> 
> Henrik,
> 
> what do you think ? Did you have time for additional testing ?
> If you think we should apply it, please send me a signed patch.

I want to make one more test tomorrow, then I'll send a proper patch. Thanks!

Henrik

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-08 16:29                                                       ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-08 16:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

Hi Guenter,

> > > So, what should we do with this patch ? Apply it ?
> > 
> > So far I'm getting nothing on the original machine. As of today it's applied as the last patch on 3.12.0-0.rc4.git0.1.fc20.x86_64. Unfortunately at the moment I'm a bit too dense to figure out how to get a new kernel applied to an existing live package so I can try this on a USB stick. While maybe unrelated, the oops was occurring at least 4x as often booted from USB stick media than HDD.
> > 
> 
> Seems to me we should apply it.
> 
> Henrik,
> 
> what do you think ? Did you have time for additional testing ?
> If you think we should apply it, please send me a signed patch.

I want to make one more test tomorrow, then I'll send a proper patch. Thanks!

Henrik

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-08 16:29                                                         ` [lm-sensors] " Guenter Roeck
@ 2013-10-09  8:29                                                           ` Henrik Rydberg
  -1 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-09  8:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

Hi Guenter,

> > I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
> > 
> Ok, great.

And here it is, with compiler warning fixed.

Thanks everybody,
Henrik

>From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] hwmon: (applesmc) Always read until end of data

The crash reported and investigated in commit 5f4513 turned out to be
caused by a change to the read interface on newer (2012) SMCs.

Tests by Chris show that simply reading the data valid line is enough
for the problem to go away. Additional tests show that the newer SMCs
no longer wait for the number of requested bytes, but start sending
data right away.  Apparently the number of bytes to read is no longer
specified as before, but instead found out by reading until end of
data. Failure to read until end of data confuses the state machine,
which eventually causes the crash.

As a remedy, assuming bit0 is the read valid line, make sure there is
nothing more to read before leaving the read function.

Tested to resolve the original problem, and runtested on MBA3,1,
MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
machines before 2012.

Tested-by: Chris Murphy <chris@cmurf.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
---
 drivers/hwmon/applesmc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..3288f13 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)
 
 static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 {
+	u8 status, data = 0;
 	int i;
 
 	if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
+	/* This has no effect on newer (2012) SMCs */
 	if (send_byte(len, APPLESMC_DATA_PORT)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		buffer[i] = inb(APPLESMC_DATA_PORT);
 	}
 
+	/* Read the data port until bit0 is cleared */
+	for (i = 0; i < 16; i++) {
+		udelay(APPLESMC_MIN_WAIT);
+		status = inb(APPLESMC_CMD_PORT);
+		if (!(status & 0x01))
+			break;
+		data = inb(APPLESMC_DATA_PORT);
+	}
+	if (i)
+		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
 	return 0;
 }
 
-- 
1.8.3.4


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-09  8:29                                                           ` Henrik Rydberg
  0 siblings, 0 replies; 86+ messages in thread
From: Henrik Rydberg @ 2013-10-09  8:29 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

Hi Guenter,

> > I want to make one more test tomorrow, then I'll send a proper patch. Thanks!
> > 
> Ok, great.

And here it is, with compiler warning fixed.

Thanks everybody,
Henrik

From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg <rydberg@euromail.se>
Date: Wed, 2 Oct 2013 19:15:03 +0200
Subject: [PATCH] hwmon: (applesmc) Always read until end of data

The crash reported and investigated in commit 5f4513 turned out to be
caused by a change to the read interface on newer (2012) SMCs.

Tests by Chris show that simply reading the data valid line is enough
for the problem to go away. Additional tests show that the newer SMCs
no longer wait for the number of requested bytes, but start sending
data right away.  Apparently the number of bytes to read is no longer
specified as before, but instead found out by reading until end of
data. Failure to read until end of data confuses the state machine,
which eventually causes the crash.

As a remedy, assuming bit0 is the read valid line, make sure there is
nothing more to read before leaving the read function.

Tested to resolve the original problem, and runtested on MBA3,1,
MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
machines before 2012.

Tested-by: Chris Murphy <chris@cmurf.com>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
---
 drivers/hwmon/applesmc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
index 98814d1..3288f13 100644
--- a/drivers/hwmon/applesmc.c
+++ b/drivers/hwmon/applesmc.c
@@ -230,6 +230,7 @@ static int send_argument(const char *key)
 
 static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 {
+	u8 status, data = 0;
 	int i;
 
 	if (send_command(cmd) || send_argument(key)) {
@@ -237,6 +238,7 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		return -EIO;
 	}
 
+	/* This has no effect on newer (2012) SMCs */
 	if (send_byte(len, APPLESMC_DATA_PORT)) {
 		pr_warn("%.4s: read len fail\n", key);
 		return -EIO;
@@ -250,6 +252,17 @@ static int read_smc(u8 cmd, const char *key, u8 *buffer, u8 len)
 		buffer[i] = inb(APPLESMC_DATA_PORT);
 	}
 
+	/* Read the data port until bit0 is cleared */
+	for (i = 0; i < 16; i++) {
+		udelay(APPLESMC_MIN_WAIT);
+		status = inb(APPLESMC_CMD_PORT);
+		if (!(status & 0x01))
+			break;
+		data = inb(APPLESMC_DATA_PORT);
+	}
+	if (i)
+		pr_warn("flushed %d bytes, last value is: %d\n", i, data);
+
 	return 0;
 }
 
-- 
1.8.3.4


_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: applesmc oops in 3.10/3.11
  2013-10-09  8:29                                                           ` [lm-sensors] " Henrik Rydberg
@ 2013-10-09 16:52                                                             ` Guenter Roeck
  -1 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-09 16:52 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 09, 2013 at 10:29:39AM +0200, Henrik Rydberg wrote:
> From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] hwmon: (applesmc) Always read until end of data
> 
> The crash reported and investigated in commit 5f4513 turned out to be
> caused by a change to the read interface on newer (2012) SMCs.
> 
> Tests by Chris show that simply reading the data valid line is enough
> for the problem to go away. Additional tests show that the newer SMCs
> no longer wait for the number of requested bytes, but start sending
> data right away.  Apparently the number of bytes to read is no longer
> specified as before, but instead found out by reading until end of
> data. Failure to read until end of data confuses the state machine,
> which eventually causes the crash.
> 
> As a remedy, assuming bit0 is the read valid line, make sure there is
> nothing more to read before leaving the read function.
> 
> Tested to resolve the original problem, and runtested on MBA3,1,
> MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
> machines before 2012.
> 
> Tested-by: Chris Murphy <chris@cmurf.com>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>

Applied. I'll do my usual sanity testing and send it to Linus
either tomorrow or Friday.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [lm-sensors] applesmc oops in 3.10/3.11
@ 2013-10-09 16:52                                                             ` Guenter Roeck
  0 siblings, 0 replies; 86+ messages in thread
From: Guenter Roeck @ 2013-10-09 16:52 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Chris Murphy, Josh Boyer, khali, lm-sensors,
	Linux-Kernel@Vger. Kernel. Org

On Wed, Oct 09, 2013 at 10:29:39AM +0200, Henrik Rydberg wrote:
> From 18fa0f55b764ad0fe5fc49f81bae281e5110ed56 Mon Sep 17 00:00:00 2001
> From: Henrik Rydberg <rydberg@euromail.se>
> Date: Wed, 2 Oct 2013 19:15:03 +0200
> Subject: [PATCH] hwmon: (applesmc) Always read until end of data
> 
> The crash reported and investigated in commit 5f4513 turned out to be
> caused by a change to the read interface on newer (2012) SMCs.
> 
> Tests by Chris show that simply reading the data valid line is enough
> for the problem to go away. Additional tests show that the newer SMCs
> no longer wait for the number of requested bytes, but start sending
> data right away.  Apparently the number of bytes to read is no longer
> specified as before, but instead found out by reading until end of
> data. Failure to read until end of data confuses the state machine,
> which eventually causes the crash.
> 
> As a remedy, assuming bit0 is the read valid line, make sure there is
> nothing more to read before leaving the read function.
> 
> Tested to resolve the original problem, and runtested on MBA3,1,
> MBP4,1, MBP8,2, MBP10,1, MBP10,2. The patch seems to have no effect on
> machines before 2012.
> 
> Tested-by: Chris Murphy <chris@cmurf.com>
> Signed-off-by: Henrik Rydberg <rydberg@euromail.se>

Applied. I'll do my usual sanity testing and send it to Linus
either tomorrow or Friday.

Thanks,
Guenter

_______________________________________________
lm-sensors mailing list
lm-sensors@lm-sensors.org
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors

^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2013-10-09 16:53 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-25 19:06 applesmc oops in 3.10/3.11 Josh Boyer
2013-09-25 19:06 ` [lm-sensors] " Josh Boyer
2013-09-25 19:56 ` Guenter Roeck
2013-09-25 19:56   ` [lm-sensors] " Guenter Roeck
2013-09-25 21:48   ` Henrik Rydberg
2013-09-25 21:48     ` [lm-sensors] " Henrik Rydberg
2013-09-25 22:08     ` Guenter Roeck
2013-09-25 22:08       ` [lm-sensors] " Guenter Roeck
2013-09-26  6:34       ` Henrik Rydberg
2013-09-26  6:34         ` [lm-sensors] " Henrik Rydberg
2013-09-26 10:36         ` Guenter Roeck
2013-09-26 10:36           ` [lm-sensors] " Guenter Roeck
2013-09-26 11:13           ` Henrik Rydberg
2013-09-26 11:13             ` [lm-sensors] " Henrik Rydberg
2013-09-26 10:53         ` Guenter Roeck
2013-09-26 10:53           ` [lm-sensors] " Guenter Roeck
2013-09-26 11:11           ` Henrik Rydberg
2013-09-26 11:11             ` [lm-sensors] " Henrik Rydberg
2013-09-27 16:21         ` Josh Boyer
2013-09-27 16:21           ` [lm-sensors] " Josh Boyer
2013-09-27 17:12           ` Guenter Roeck
2013-09-27 17:12             ` [lm-sensors] " Guenter Roeck
2013-09-27 17:41             ` Chris Murphy
2013-09-27 17:41               ` [lm-sensors] " Chris Murphy
2013-09-27 17:59               ` Guenter Roeck
2013-09-27 17:59                 ` [lm-sensors] " Guenter Roeck
2013-09-27 18:03                 ` Chris Murphy
2013-09-27 18:03                   ` [lm-sensors] " Chris Murphy
2013-09-27 23:33                   ` Guenter Roeck
2013-09-27 23:33                     ` [lm-sensors] " Guenter Roeck
2013-10-01  1:57                     ` Chris Murphy
2013-10-01  1:57                       ` [lm-sensors] " Chris Murphy
2013-10-01  3:37                       ` Guenter Roeck
2013-10-01  3:37                         ` [lm-sensors] " Guenter Roeck
2013-10-01 10:55                         ` Henrik Rydberg
2013-10-01 10:55                           ` [lm-sensors] " Henrik Rydberg
2013-10-01 15:19                           ` Guenter Roeck
2013-10-01 15:19                             ` [lm-sensors] " Guenter Roeck
2013-10-01 15:33                             ` Chris Murphy
2013-10-01 15:33                               ` [lm-sensors] " Chris Murphy
2013-10-01 16:24                               ` Guenter Roeck
2013-10-01 16:24                                 ` [lm-sensors] " Guenter Roeck
2013-10-02  1:09                                 ` Chris Murphy
2013-10-02  1:09                                   ` [lm-sensors] " Chris Murphy
2013-10-02  3:51                                   ` Guenter Roeck
2013-10-02  3:51                                     ` [lm-sensors] " Guenter Roeck
2013-10-02  3:55                                     ` Chris Murphy
2013-10-02  3:55                                       ` [lm-sensors] " Chris Murphy
2013-10-02  4:02                                       ` Guenter Roeck
2013-10-02  4:02                                         ` [lm-sensors] " Guenter Roeck
2013-10-02  9:53                                     ` Henrik Rydberg
2013-10-02  9:53                                       ` [lm-sensors] " Henrik Rydberg
2013-10-02 13:30                                       ` Guenter Roeck
2013-10-02 13:30                                         ` [lm-sensors] " Guenter Roeck
2013-10-02 16:34                                         ` Henrik Rydberg
2013-10-02 16:34                                           ` [lm-sensors] " Henrik Rydberg
2013-10-02 16:47                                           ` Guenter Roeck
2013-10-02 16:47                                             ` [lm-sensors] " Guenter Roeck
2013-10-02 17:24                                             ` Henrik Rydberg
2013-10-02 17:24                                               ` [lm-sensors] " Henrik Rydberg
2013-10-02 18:02                                               ` Guenter Roeck
2013-10-02 18:02                                                 ` [lm-sensors] " Guenter Roeck
2013-10-02 18:33                                                 ` Chris Murphy
2013-10-02 18:33                                                   ` [lm-sensors] " Chris Murphy
2013-10-02 20:59                                                   ` Guenter Roeck
2013-10-02 20:59                                                     ` [lm-sensors] " Guenter Roeck
2013-10-02 21:34                                                     ` Chris Murphy
2013-10-02 21:34                                                       ` [lm-sensors] " Chris Murphy
2013-10-02 23:32                                                       ` Guenter Roeck
2013-10-02 23:32                                                         ` [lm-sensors] " Guenter Roeck
2013-10-07 23:42                                               ` Guenter Roeck
2013-10-07 23:42                                                 ` [lm-sensors] " Guenter Roeck
2013-10-07 23:46                                                 ` Chris Murphy
2013-10-07 23:46                                                   ` [lm-sensors] " Chris Murphy
2013-10-08 15:48                                                   ` Guenter Roeck
2013-10-08 15:48                                                     ` [lm-sensors] " Guenter Roeck
2013-10-08 16:29                                                     ` Henrik Rydberg
2013-10-08 16:29                                                       ` [lm-sensors] " Henrik Rydberg
2013-10-08 16:29                                                       ` Guenter Roeck
2013-10-08 16:29                                                         ` [lm-sensors] " Guenter Roeck
2013-10-09  8:29                                                         ` Henrik Rydberg
2013-10-09  8:29                                                           ` [lm-sensors] " Henrik Rydberg
2013-10-09 16:52                                                           ` Guenter Roeck
2013-10-09 16:52                                                             ` [lm-sensors] " Guenter Roeck
2013-09-27 18:01           ` Guenter Roeck
2013-09-27 18:01             ` [lm-sensors] " Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.