linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [2.6.30] Kernel bug with dock driver
@ 2009-06-14 19:00 Joerg Platte
  2009-06-15  2:02 ` Henrique de Moraes Holschuh
  2009-06-17  7:13 ` Zhang Rui
  0 siblings, 2 replies; 11+ messages in thread
From: Joerg Platte @ 2009-06-14 19:00 UTC (permalink / raw)
  To: linux-acpi; +Cc: linux-kernel

Hi,

today I ran into the following bug several times (Kernel 2.6.30, IBM ThinkPad 
T40 with ultrabay HD and CD/DVD):

BUG: unable to handle kernel paging request at 00001000
IP: [<c01df5fa>] strcpy+0xe/0x1b
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/class/power_supply/BAT0/energy_full
Modules linked in: nls_iso8859_1 nls_cp850 vfat fat usb_storage usb_libusual
udf crc_itu_t sg sr_mod cdrom nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
radeon drm sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE
iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT
ipt_LOG xt_limit xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle
ip_tables x_tables nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc
af_packet cpufreq_userspace cpufreq_stats cpufreq_powersave nsc_ircc autofs4
fuse nls_utf8 ntfs nls_base ext2 deadline_iosched as_iosched ircomm_tty
ircomm tun acpi_cpufreq sbs sbshc snd_intel8x0m snd_intel8x0 snd_ac97_codec
ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi
snd_rawmidi snd_seq_midi_event joydev snd_seq snd_timer snd_seq_device
dvb_usb_cinergyT2 thinkpad_acpi irtty_sir dvb_usb yenta_socket
rsrc_nonstatic ipw2200 pcmcia libipw sir_dev rfkill snd video led_class
lib80211 i2c_i801 rng_core soundcore pcmcia_core snd_page_alloc 8250_pci
dvb_core irda parport_pc parport crc_ccitt psmouse pcspkr nvram output
battery ac button processor rtc_cmos 8250_pnp serio_raw rtc_core rtc_lib
8250 serial_core evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic
pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp
agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table
radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt
cfbfillrect fbcon tileblit font bitblit softcursor fb

Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
EIP is at strcpy+0xe/0x1b
EAX: c278a22c EBX: c278a200 ECX: c278a22c EDX: 00001000
ESI: 00001000 EDI: c278a22c EBP: f70a8f34 ESP: f70a8f28
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process kacpi_notify (pid: 52, ti=f70a8000 task=f70c0c40 task.ti=f70a8000)
Stack:
 c278a200 c278a214 c278a244 f70a8f64 c01fc898 030a8f54 00001000 f708a879
 00000002 f6b3ca80 c278a218 00000014 f97fb132 f0a07a80 00000000 f70a8f70
 f97fb14e 00000000 f70a8f7c f97fb023 f7041460 f70a8f8c c020199d f6984ac0
Call Trace:
 [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
 [<f97fb132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
 [<f97fb14e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
 [<f97fb023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
 [<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
 [<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
 [<c012cff6>] ? worker_thread+0x15a/0x1fd
 [<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
 [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
 [<c012ce9c>] ? worker_thread+0x0/0x1fd
 [<c012f8bc>] ? kthread+0x42/0x67
 [<c012f87a>] ? kthread+0x0/0x67
 [<c01030d3>] ? kernel_thread_helper+0x7/0x10
Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8
f2 ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75
fa 5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a8f28
CR2: 0000000000001000
---[ end trace fe729b61cde0024d ]---

The bug message appears with a high probability - but not always - when I 
release the eject lever to exchange the drive in the ultrabay slot. I'm not 
sure if this bug is a regression of 2.6.30, because I did not change the drive 
in the ultrabay for several months. Maybe it was already in 2.6.29 or before. 
How can I help to resolve this bug?

regards,
Jörg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-14 19:00 [2.6.30] Kernel bug with dock driver Joerg Platte
@ 2009-06-15  2:02 ` Henrique de Moraes Holschuh
  2009-06-15  7:36   ` Joerg Platte
  2009-06-17  7:13 ` Zhang Rui
  1 sibling, 1 reply; 11+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-06-15  2:02 UTC (permalink / raw)
  To: Joerg Platte; +Cc: linux-acpi, linux-kernel

On Sun, 14 Jun 2009, Joerg Platte wrote:
> The bug message appears with a high probability - but not always -
> when I release the eject lever to exchange the drive in the ultrabay
> slot. I'm not sure if this bug is a regression of 2.6.30, because I
> did not change the drive in the ultrabay for several months. Maybe it
> was already in 2.6.29 or before.  How can I help to resolve this bug?

Do it in single user mode with udev helpers *killed*, but just release
the lever, _don't_ pull the drive off the bay, unless you do verify that
the bay LED went off.

***NEVER*** remove an Ultrabay device with the Ultrabay LED still lit,
especially if it is a PATA device.

If the bug cannot be reproduced in that condition, it is happening
because userspace is trapping the lever release event (as it should),
and doing something the kernel doesn't like with it (which is a bug
in the kernel in any case).  That might help you track down a better
way to reproduce the issue.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-15  2:02 ` Henrique de Moraes Holschuh
@ 2009-06-15  7:36   ` Joerg Platte
  2009-06-15 20:29     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Platte @ 2009-06-15  7:36 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh; +Cc: linux-acpi, linux-kernel

Am Monday, 15. June 2009 schrieb Henrique de Moraes Holschuh:
> Do it in single user mode with udev helpers *killed*, but just release

Today I tried to reproduce the bug in runlevel 2 (Debian based system) without 
success, even after a suspend to disk. In the past this was one method to 
trigger bugs with the bay.

> the lever, _don't_ pull the drive off the bay, unless you do verify that
> the bay LED went off.
>
> ***NEVER*** remove an Ultrabay device with the Ultrabay LED still lit,
> especially if it is a PATA device.

I did this accidentally some times before and the bay is still working. But 
I'll try to not to do it again :)

> If the bug cannot be reproduced in that condition, it is happening
> because userspace is trapping the lever release event (as it should),
> and doing something the kernel doesn't like with it (which is a bug
> in the kernel in any case).  That might help you track down a better
> way to reproduce the issue.

Userspace scripts are written by me (based on some information from 
thinkwiki). 
The udev rule looks like this:
ATTR{type}=="ata_bay", ATTR{docked}=="1", KERNEL=="dock.*", ACTION=="change", 
SUBSYSTEM=="platform", RUN+="/usr/local/sbin/ultrabay_open"

and ultrabay_open contains the following lines of shell code:
#!/bin/bash
ULTRABAY_SYSDIR='/sys/class/scsi_device/1:0:0:0/device'
shopt -s nullglob

if [ -d $ULTRABAY_SYSDIR ]; then
        sync
        echo 1 > $ULTRABAY_SYSDIR/delete
        logger "removed $ULTRABAY_SYSDIR"
fi
sync
# Turn off power to the UltraBay:
if [ -f /sys/devices/platform/dock.2/undock ]; then
        echo 1 > /sys/devices/platform/dock.2/undock
        logger "/sys/devices/platform/dock.2/undock"
else
        echo eject > /proc/acpi/ibm/bay
fi
# Tell the user we're OK
logger "/proc/acpi/ibm/beep"
echo 12 > /proc/acpi/ibm/beep

What I discovered today after modifying ultrabay_open was that in some cases 
the script is called again directly after inserting a drive to the bay. But 
only, if the execution of this script takes some more time (I added a sleep 2 
after the second sync to be able to see which command might trigger the bug).

So maybe yesterday this script was called twice and the kernel does not like 
to undock a device while it is in the process of being undocked? How should I 
modify the udev rule to prevent another execution each time a drive is 
inserted into the bay?

regards,
Jörg


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-15  7:36   ` Joerg Platte
@ 2009-06-15 20:29     ` Henrique de Moraes Holschuh
  2009-06-16  9:24       ` Joerg Platte
  0 siblings, 1 reply; 11+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-06-15 20:29 UTC (permalink / raw)
  To: Joerg Platte; +Cc: linux-acpi, linux-kernel

On Mon, 15 Jun 2009, Joerg Platte wrote:
> > If the bug cannot be reproduced in that condition, it is happening
> > because userspace is trapping the lever release event (as it should),
> > and doing something the kernel doesn't like with it (which is a bug
> > in the kernel in any case).  That might help you track down a better
> > way to reproduce the issue.
> 
> if [ -d $ULTRABAY_SYSDIR ]; then
>         sync
>         echo 1 > $ULTRABAY_SYSDIR/delete
>         logger "removed $ULTRABAY_SYSDIR"
> fi
> sync
> # Turn off power to the UltraBay:
> if [ -f /sys/devices/platform/dock.2/undock ]; then
>         echo 1 > /sys/devices/platform/dock.2/undock
>         logger "/sys/devices/platform/dock.2/undock"

There might be a race there, as you call undock but you don't really know if
the SCSI device was deleted.

> What I discovered today after modifying ultrabay_open was that in some cases 
> the script is called again directly after inserting a drive to the bay. But 
> only, if the execution of this script takes some more time (I added a sleep 2 
> after the second sync to be able to see which command might trigger the bug).

So, please try to reproduce this by sending a number of delete requests
back-to-back in a row, and also mixing delete requests with undock requests.
When you find out what causes the OOPS (and it _is_ a bug in the kernel if
any of those oops), we can try to direct the bug report to someone who can
fix the problem.

> to undock a device while it is in the process of being undocked? How should I 
> modify the udev rule to prevent another execution each time a drive is 
> inserted into the bay?

You will need locking, unfortunately.  Also, check if there is a device in
the bay.  event+device in bay == hotunplug.  event+no device in bay ==
hotplug.

That is, if the events don't already tell you (i.e. different events for
plug and unplug).  I don't recall right now if the ACPI events are
different, but I do recall the thinkpad BIOS follows the ACPI spec correctly
on this area.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-15 20:29     ` Henrique de Moraes Holschuh
@ 2009-06-16  9:24       ` Joerg Platte
  2009-06-17 12:34         ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Platte @ 2009-06-16  9:24 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh; +Cc: linux-acpi, linux-kernel

Am Monday, 15. June 2009 schrieb Henrique de Moraes Holschuh:
> There might be a race there, as you call undock but you don't really know
> if the SCSI device was deleted.

I tried to undock a device without previously disabling it. Here's the result 
after inserting it again:

ata2.00: disabled                                                                                                                                                      
ACPI: \_SB_.PCI0.IDE0.SCND.MSTR - undocking                                                                                                                            
ata2.00: detaching (SCSI 1:0:0:0)                                                                                                                                      
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 1:0:0:0: [sdb] Stopping disk
sd 1:0:0:0: [sdb] START_STOP FAILED
sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c01df5fa>] strcpy+0xe/0x1b
*pde = 00000000
Oops: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/platform/dock.2/docked
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm 
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit 
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables 
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc aes_i586 cryptomgr aead 
pcompress crypto_blkcipher crypto_hash aes_generic crypto_algapi 
lib80211_crypt_ccmp af_packet cpufreq_userspace cpufreq_stats 
cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base ext2 
deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc 
joydev snd_intel8x0 snd_intel8x0m snd_seq_oss snd_ac97_codec snd_seq_midi 
ac97_bus snd_rawmidi snd_pcm_oss snd_seq_midi_event snd_mixer_oss irtty_sir 
snd_pcm snd_seq dvb_usb_cinergyT2 yenta_socket sir_dev snd_seq_device 
rsrc_nonstatic thinkpad_acpi dvb_usb pcmcia rfkill ipw2200 snd_timer dvb_core 
led_class i2c_i801 rng_core rtc_cmos pcmcia_core video ac libipw parport_pc 
8250_pci parport psmouse 8250_pnp snd lib80211 output soundcore rtc_core nvram 
serio_raw rtc_lib button snd_page_alloc pcspkr 8250 battery processor irda 
serial_core crc_ccitt evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic 
pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp 
agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table 
radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt 
cfbfillrect fbcon tileblit font bitblit softcursor fb

Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
EIP is at strcpy+0xe/0x1b
EAX: f302482c EBX: f3024800 ECX: f302482c EDX: 00000000
ESI: 00000000 EDI: f302482c EBP: f70a4f34 ESP: f70a4f28
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process kacpi_notify (pid: 52, ti=f70a4000 task=f704c980 task.ti=f70a4000)
Stack:
 f3024800 f3024814 f3024844 f70a4f64 c01fc898 010a4f54 00000000 f70c2879
 00000004 f30e83c0 f3024818 00000014 f97c8132 f69b2600 00000000 f70a4f70
 f97c814e 00000000 f70a4f7c f97c8023 f7070460 f70a4f8c c020199d f4e37ee0
Call Trace:
 [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
 [<f97c8132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
 [<f97c814e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
 [<f97c8023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
 [<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
 [<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
 [<c012cff6>] ? worker_thread+0x15a/0x1fd
 [<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
 [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
 [<c012ce9c>] ? worker_thread+0x0/0x1fd
 [<c012f8bc>] ? kthread+0x42/0x67
 [<c012f87a>] ? kthread+0x0/0x67
 [<c01030d3>] ? kernel_thread_helper+0x7/0x10
Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8 f2 
ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75 fa 
5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a4f28
CR2: 0000000000000000
---[ end trace 798a63a30da95ce2 ]---

> So, please try to reproduce this by sending a number of delete requests
> back-to-back in a row, and also mixing delete requests with undock
> requests. When you find out what causes the OOPS (and it _is_ a bug in the
> kernel if any of those oops), we can try to direct the bug report to
> someone who can fix the problem.
>
> > to undock a device while it is in the process of being undocked? How
> > should I modify the udev rule to prevent another execution each time a
> > drive is inserted into the bay?
>
> You will need locking, unfortunately.  Also, check if there is a device in
> the bay.  event+device in bay == hotunplug.  event+no device in bay ==
> hotplug.
>
> That is, if the events don't already tell you (i.e. different events for
> plug and unplug).  I don't recall right now if the ACPI events are
> different, but I do recall the thinkpad BIOS follows the ACPI spec
> correctly on this area.

Since I was not able to reproduce the initial bug yesterday I remembered that 
I deinstalled acpid because of Debian Bug#522756 (acpid terminates each time I 
unplug the mouse). After installing acpid again (and disabling the udev rule) 
the Bug appeared again. However, in my acpid config I call a script for both 
events:

event=ibm/bay device:0f 00000001 00000000
action=/usr/local/sbin/ultrabay_close

This is the content of /usr/local/sbin/ultrabay_close
echo 12 > /proc/acpi/ibm/beep
sync
echo 0 0 0 > /sys/class/scsi_host/host1/scan

event=ibm/bay device:0f 00000003 00000000
action=/usr/local/sbin/ultrabay_open

ultrabay_open is identical to the script posted in my last mail. Maybe there's 
a problem with the scan command, because the kernel tries to scan for devices 
automatically. 

It looks like there are still some bugs in the dock handling when using it 
improperly :)

Best regards,
Jörg



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-14 19:00 [2.6.30] Kernel bug with dock driver Joerg Platte
  2009-06-15  2:02 ` Henrique de Moraes Holschuh
@ 2009-06-17  7:13 ` Zhang Rui
  2009-06-18 21:26   ` Joerg Platte
  1 sibling, 1 reply; 11+ messages in thread
From: Zhang Rui @ 2009-06-17  7:13 UTC (permalink / raw)
  To: jplatte; +Cc: linux-acpi, linux-kernel, Henrique de Moraes Holschuh

this seems like an thinkpad_acpi driver problem.
cc henrique.

jplatte,
it would be great if you can verify if this is a regression or not.
And please file a new bug report at
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
attach the full dmesg after the bug and the acpidump output there.

thanks,
rui



On Mon, 2009-06-15 at 03:00 +0800, Joerg Platte wrote:
> Hi,
> 
> today I ran into the following bug several times (Kernel 2.6.30, IBM ThinkPad 
> T40 with ultrabay HD and CD/DVD):
> 
> BUG: unable to handle kernel paging request at 00001000
> IP: [<c01df5fa>] strcpy+0xe/0x1b
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT
> last sysfs file: /sys/class/power_supply/BAT0/energy_full
> Modules linked in: nls_iso8859_1 nls_cp850 vfat fat usb_storage usb_libusual
> udf crc_itu_t sg sr_mod cdrom nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs
> radeon drm sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE
> iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT
> ipt_LOG xt_limit xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle
> ip_tables x_tables nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc
> af_packet cpufreq_userspace cpufreq_stats cpufreq_powersave nsc_ircc autofs4
> fuse nls_utf8 ntfs nls_base ext2 deadline_iosched as_iosched ircomm_tty
> ircomm tun acpi_cpufreq sbs sbshc snd_intel8x0m snd_intel8x0 snd_ac97_codec
> ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi
> snd_rawmidi snd_seq_midi_event joydev snd_seq snd_timer snd_seq_device
> dvb_usb_cinergyT2 thinkpad_acpi irtty_sir dvb_usb yenta_socket
> rsrc_nonstatic ipw2200 pcmcia libipw sir_dev rfkill snd video led_class
> lib80211 i2c_i801 rng_core soundcore pcmcia_core snd_page_alloc 8250_pci
> dvb_core irda parport_pc parport crc_ccitt psmouse pcspkr nvram output
> battery ac button processor rtc_cmos 8250_pnp serio_raw rtc_core rtc_lib
> 8250 serial_core evdev ext3 jbd mbcache usbhid hid sd_mod ata_generic
> pata_acpi ata_piix uhci_hcd ehci_hcd libata e1000 usbcore scsi_mod intel_agp
> agpgart thermal fan unix cpufreq_conservative cpufreq_ondemand freq_table
> radeonfb fb_ddc backlight i2c_algo_bit cfbcopyarea i2c_core cfbimgblt
> cfbfillrect fbcon tileblit font bitblit softcursor fb
> 
> Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
> EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
> EIP is at strcpy+0xe/0x1b
> EAX: c278a22c EBX: c278a200 ECX: c278a22c EDX: 00001000
> ESI: 00001000 EDI: c278a22c EBP: f70a8f34 ESP: f70a8f28
>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process kacpi_notify (pid: 52, ti=f70a8000 task=f70c0c40 task.ti=f70a8000)
> Stack:
>  c278a200 c278a214 c278a244 f70a8f64 c01fc898 030a8f54 00001000 f708a879
>  00000002 f6b3ca80 c278a218 00000014 f97fb132 f0a07a80 00000000 f70a8f70
>  f97fb14e 00000000 f70a8f7c f97fb023 f7041460 f70a8f8c c020199d f6984ac0
> Call Trace:
>  [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
>  [<f97fb132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]
>  [<f97fb14e>] ? bay_notify+0x1c/0x1f [thinkpad_acpi]
>  [<f97fb023>] ? dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
>  [<c020199d>] ? acpi_ev_notify_dispatch+0x4c/0x57
>  [<c01f4558>] ? acpi_os_execute_deferred+0x20/0x2c
>  [<c012cff6>] ? worker_thread+0x15a/0x1fd
>  [<c01f4538>] ? acpi_os_execute_deferred+0x0/0x2c
>  [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
>  [<c012ce9c>] ? worker_thread+0x0/0x1fd
>  [<c012f8bc>] ? kthread+0x42/0x67
>  [<c012f87a>] ? kthread+0x0/0x67
>  [<c01030d3>] ? kernel_thread_helper+0x7/0x10
> Code: ff ff 21 e3 8b 5b 18 83 eb 07 39 d9 73 08 89 01 89 51 04 31 c0 c3 b8
> f2 ff ff ff c3 90 55 89 c1 89 e5 57 89 c7 56 89 d6 83 ec 04 <ac> aa 84 c0 75
> fa 5a 89 c8 5e 5f 5d c3 55 89 e5 57 89 c7 56 89
> EIP: [<c01df5fa>] strcpy+0xe/0x1b SS:ESP 0068:f70a8f28
> CR2: 0000000000001000
> ---[ end trace fe729b61cde0024d ]---
> 
> The bug message appears with a high probability - but not always - when I 
> release the eject lever to exchange the drive in the ultrabay slot. I'm not 
> sure if this bug is a regression of 2.6.30, because I did not change the drive 
> in the ultrabay for several months. Maybe it was already in 2.6.29 or before. 
> How can I help to resolve this bug?
> 
> regards,
> Jörg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-16  9:24       ` Joerg Platte
@ 2009-06-17 12:34         ` Henrique de Moraes Holschuh
  2009-06-18 21:22           ` Joerg Platte
  0 siblings, 1 reply; 11+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-06-17 12:34 UTC (permalink / raw)
  To: Joerg Platte; +Cc: linux-acpi, linux-kernel

On Tue, 16 Jun 2009, Joerg Platte wrote:
> Pid: 52, comm: kacpi_notify Not tainted (2.6.30 #1) 2373G1G
> EIP: 0060:[<c01df5fa>] EFLAGS: 00010286 CPU: 0
> EIP is at strcpy+0xe/0x1b
> EAX: f302482c EBX: f3024800 ECX: f302482c EDX: 00000000
> ESI: 00000000 EDI: f302482c EBP: f70a4f34 ESP: f70a4f28
>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> Process kacpi_notify (pid: 52, ti=f70a4000 task=f704c980 task.ti=f70a4000)
> Stack:
>  f3024800 f3024814 f3024844 f70a4f64 c01fc898 010a4f54 00000000 f70c2879
>  00000004 f30e83c0 f3024818 00000014 f97c8132 f69b2600 00000000 f70a4f70
>  f97c814e 00000000 f70a4f7c f97c8023 f7070460 f70a4f8c c020199d f4e37ee0
> Call Trace:
>  [<c01fc898>] ? acpi_bus_generate_netlink_event+0x140/0x199
>  [<f97c8132>] ? bay_notify+0x0/0x1f [thinkpad_acpi]

Bay notify does this:

static void bay_notify(struct ibm_struct *ibm, u32 event)
{
	acpi_bus_generate_proc_event(ibm->acpi->device, event, 0);
	acpi_bus_generate_netlink_event(ibm->acpi->device->pnp.device_class,
					  dev_name(&ibm->acpi->device->dev),
					  event, 0);
}

If it causes a NULL derreference, it means someone has changed API and did
NOT change all callsites.  It is that simple.

Joerg, please apply this patch, and report the results...  WARNING: you will
not get bay events from thinkpad-acpi, so be careful to not remove from the
bay a device that is still active...


diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index 912be65..03e4d16 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -4556,6 +4556,14 @@ static int __init bay_init(struct ibm_init_struct *iibm)
 
 static void bay_notify(struct ibm_struct *ibm, u32 event)
 {
+	if (WARN(!ibm, TPACPI_ERR "ibm is NULL\n"))
+		return;
+	if (WARN(!ibm->acpi->device, TPACPI_ERR "ibm->acpi->device is NULL\n"))
+		return;
+	if (WARN(!ibm->acpi->device->pnp.device_class, TPACPI_ERR "ibm->acpi->device->pnp.device_class is NULL\n"))
+		return;
+	if (WARN(!dev_name(&ibm->acpi->device->dev), TPACPI_ERR "dev_name() is returning NULL\n"))
+		return;
 	acpi_bus_generate_proc_event(ibm->acpi->device, event, 0);
 	acpi_bus_generate_netlink_event(ibm->acpi->device->pnp.device_class,
 					  dev_name(&ibm->acpi->device->dev),


-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-17 12:34         ` Henrique de Moraes Holschuh
@ 2009-06-18 21:22           ` Joerg Platte
  2009-06-18 22:10             ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Platte @ 2009-06-18 21:22 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh; +Cc: linux-acpi, linux-kernel

Am Wednesday, 17. June 2009 schrieb Henrique de Moraes Holschuh:
Hi Henrique,

> If it causes a NULL derreference, it means someone has changed API and did
> NOT change all callsites.  It is that simple.
>
> Joerg, please apply this patch, and report the results...  WARNING: you
> will not get bay events from thinkpad-acpi, so be careful to not remove
> from the bay a device that is still active...

This time I got two BUGs while being on battery and without acpid:

------------[ cut here ]------------
WARNING: at drivers/platform/x86/thinkpad_acpi.c:4565 bay_notify+0x35/0x5e 
[thinkpad_acpi]()
Hardware name: 2373G1G
thinkpad_acpi: dev_name() is returning NULL
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm 
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit 
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables 
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc af_packet cpufreq_userspace 
cpufreq_stats cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base 
ext2 deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc 
joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss 
snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event 
snd_seq snd_timer snd_seq_device irtty_sir thinkpad_acpi ipw2200 libipw 
yenta_socket sir_dev rfkill snd rsrc_nonstatic lib80211 soundcore pcmcia 
rtc_cmos psmouse led_class pcmcia_core snd_page_alloc i2c_i801 rng_core 
8250_pci irda pcspkr serio_raw nvram parport_pc 8250_pnp rtc_core parport 
button rtc_lib processor battery ac crc_ccitt 8250 serial_core evdev ext3 jbd 
mbcache sd_mod ata_generic pata_acpi ata_piix libata uhci_hcd ehci_hcd e1000 
scsi_mod usbcore intel_agp agpgart video output thermal fan unix 
cpufreq_conservative cpufreq_ondemand freq_table radeonfb fb_ddc backlight 
i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fbcon tileblit font 
bitblit softcursor fb
Pid: 52, comm: kacpi_notify Not tainted 2.6.30 #1
Call Trace:
 [<c011f839>] warn_slowpath_common+0x60/0x90
 [<f9767132>] ? bay_notify+0x0/0x5e [thinkpad_acpi]
 [<c011f89d>] warn_slowpath_fmt+0x24/0x27
 [<f9767167>] bay_notify+0x35/0x5e [thinkpad_acpi]
 [<f9767023>] dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
 [<c02018cd>] acpi_ev_notify_dispatch+0x4c/0x57
 [<c01f4488>] acpi_os_execute_deferred+0x20/0x2c
 [<c012cff6>] worker_thread+0x15a/0x1fd
 [<c01f4468>] ? acpi_os_execute_deferred+0x0/0x2c
 [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
 [<c012ce9c>] ? worker_thread+0x0/0x1fd
 [<c012f8bc>] kthread+0x42/0x67
 [<c012f87a>] ? kthread+0x0/0x67
 [<c01030d3>] kernel_thread_helper+0x7/0x10
---[ end trace 0469584017b9bddb ]---
sd 1:0:0:0: [sdb] Synchronizing SCSI cache
sd 1:0:0:0: [sdb] Stopping disk
ata2.00: disabled
ACPI: \_SB_.PCI0.IDE0.SCND.MSTR - undocking
ACPI Warning (nspredef-0290): \_SB_.PCI0.LPC_.EC__.BEEP: Excess arguments - 
needs 1, found 2 [20090320]
------------[ cut here ]------------
WARNING: at drivers/platform/x86/thinkpad_acpi.c:4565 bay_notify+0x35/0x5e 
[thinkpad_acpi]()
Hardware name: 2373G1G
thinkpad_acpi: dev_name() is returning NULL
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs radeon drm 
sco bridge stp llc bnep l2cap bluetooth ipt_MASQUERADE iptable_nat nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT ipt_LOG xt_limit 
xt_tcpudp xt_mac xt_multiport iptable_filter iptable_mangle ip_tables x_tables 
nf_conntrack_ftp nf_conntrack vboxdrv binfmt_misc af_packet cpufreq_userspace 
cpufreq_stats cpufreq_powersave autofs4 nsc_ircc fuse nls_utf8 ntfs nls_base 
ext2 deadline_iosched as_iosched ircomm_tty ircomm tun acpi_cpufreq sbs sbshc 
joydev snd_intel8x0m snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss 
snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event 
snd_seq snd_timer snd_seq_device irtty_sir thinkpad_acpi ipw2200 libipw 
yenta_socket sir_dev rfkill snd rsrc_nonstatic lib80211 soundcore pcmcia 
rtc_cmos psmouse led_class pcmcia_core snd_page_alloc i2c_i801 rng_core 
8250_pci irda pcspkr serio_raw nvram parport_pc 8250_pnp rtc_core parport 
button rtc_lib processor battery ac crc_ccitt 8250 serial_core evdev ext3 jbd 
mbcache sd_mod ata_generic pata_acpi ata_piix libata uhci_hcd ehci_hcd e1000 
scsi_mod usbcore intel_agp agpgart video output thermal fan unix 
cpufreq_conservative cpufreq_ondemand freq_table radeonfb fb_ddc backlight 
i2c_algo_bit cfbcopyarea i2c_core cfbimgblt cfbfillrect fbcon tileblit font 
bitblit softcursor fb
Pid: 52, comm: kacpi_notify Tainted: G        W  2.6.30 #1
Call Trace:
 [<c011f839>] warn_slowpath_common+0x60/0x90
 [<f9767132>] ? bay_notify+0x0/0x5e [thinkpad_acpi]
 [<c011f89d>] warn_slowpath_fmt+0x24/0x27
 [<f9767167>] bay_notify+0x35/0x5e [thinkpad_acpi]
 [<f9767023>] dispatch_acpi_notify+0x23/0x26 [thinkpad_acpi]
 [<c02018cd>] acpi_ev_notify_dispatch+0x4c/0x57
 [<c01f4488>] acpi_os_execute_deferred+0x20/0x2c
 [<c012cff6>] worker_thread+0x15a/0x1fd
 [<c01f4468>] ? acpi_os_execute_deferred+0x0/0x2c
 [<c012fc7d>] ? autoremove_wake_function+0x0/0x33
 [<c012ce9c>] ? worker_thread+0x0/0x1fd
 [<c012f8bc>] kthread+0x42/0x67
 [<c012f87a>] ? kthread+0x0/0x67
 [<c01030d3>] kernel_thread_helper+0x7/0x10
---[ end trace 0469584017b9bddc ]---

Hope it helps...

Best regards,
Jörg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-17  7:13 ` Zhang Rui
@ 2009-06-18 21:26   ` Joerg Platte
  2009-06-18 22:12     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Platte @ 2009-06-18 21:26 UTC (permalink / raw)
  To: Zhang Rui; +Cc: linux-acpi, linux-kernel, Henrique de Moraes Holschuh

Am Wednesday, 17. June 2009 schrieb Zhang Rui:

> jplatte,
> it would be great if you can verify if this is a regression or not.

I'm not sure where to start. I'm not using the bay that frequent and the bug 
is not that easy to trigger. Sometimes I can trigger it directly, sometimes it 
takes a day.

> And please file a new bug report at
> http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
> attach the full dmesg after the bug and the acpidump output there.

I'll file a bug if Henrique is unable to find the bug. Maybe his debugging 
output helps to find the problem.

regards,
Jörg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-18 21:22           ` Joerg Platte
@ 2009-06-18 22:10             ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 11+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-06-18 22:10 UTC (permalink / raw)
  To: Joerg Platte; +Cc: linux-acpi, linux-kernel

On Thu, 18 Jun 2009, Joerg Platte wrote:
> Am Wednesday, 17. June 2009 schrieb Henrique de Moraes Holschuh:
> > If it causes a NULL derreference, it means someone has changed API and did
> > NOT change all callsites.  It is that simple.
> >
> thinkpad_acpi: dev_name() is returning NULL

There it is.  "Someone set us up the bomb."

> Hope it helps...

Yes, it does.  Thank you.  Now at least I know what to look for.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [2.6.30] Kernel bug with dock driver
  2009-06-18 21:26   ` Joerg Platte
@ 2009-06-18 22:12     ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 11+ messages in thread
From: Henrique de Moraes Holschuh @ 2009-06-18 22:12 UTC (permalink / raw)
  To: Zhang Rui; +Cc: linux-acpi, linux-kernel, Joerg Platte

On Thu, 18 Jun 2009, Joerg Platte wrote:
> Am Wednesday, 17. June 2009 schrieb Zhang Rui:
> 
> > jplatte,
> > it would be great if you can verify if this is a regression or not.

Something is causing ACPI devices to have NULL for dev_name().  This
broke thinkpad-acpi.  Any ideas?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-06-18 22:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-14 19:00 [2.6.30] Kernel bug with dock driver Joerg Platte
2009-06-15  2:02 ` Henrique de Moraes Holschuh
2009-06-15  7:36   ` Joerg Platte
2009-06-15 20:29     ` Henrique de Moraes Holschuh
2009-06-16  9:24       ` Joerg Platte
2009-06-17 12:34         ` Henrique de Moraes Holschuh
2009-06-18 21:22           ` Joerg Platte
2009-06-18 22:10             ` Henrique de Moraes Holschuh
2009-06-17  7:13 ` Zhang Rui
2009-06-18 21:26   ` Joerg Platte
2009-06-18 22:12     ` Henrique de Moraes Holschuh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).