[Bug 25832] kernel crashes upon resume if usb devices are removed when suspended


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
@ 2011-02-03 17:09 ` bugzilla-daemon
  2011-02-03 18:22 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-03 17:09 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832


Greg Kroah-Hartman <greg@kroah.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|USB                         |ext4
         AssignedTo|greg@kroah.com              |fs_ext4@kernel-bugs.osdl.or
                   |                            |g
            Product|Drivers                     |File System




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
  2011-02-03 17:09 ` [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended bugzilla-daemon
@ 2011-02-03 18:22 ` bugzilla-daemon
  2011-02-04  6:28 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-03 18:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

Theodore Tso <tytso@mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tytso@mit.edu

--- Comment #6 from Theodore Tso <tytso@mit.edu>  2011-02-03 18:22:24 ---
I've looked at the kernel log, and the kernel messages displayed, and
unfortunately there's no stack trace that would help narrow down what might be
going on.

Did you have CONFIG_DETECT_SOFTLOCKUP enabled?   If not, I would suggest
enabling it, and see if you can get a stack trace.   Another useful thing to do
is to try getting some information via sysrq-l and sysrq-d.  Using sysrq-p
multiple times to see where the PC is on the various machines can be useful to
detect livelocks.  Sysrq-t can also be useful, but it's a huge amount of
information.

In general, for most of the sysrq dumps, you're really going to need to use a
serial console attached to a second computer to record all of this output.
Even if the problem is gone in 2.6.38, it would be useful to see what is causing
the problem in 2.6.37 so that perhaps the patches could be backported into the
stable release if the changes aren't too invasive.  So if someone can help us
reproduce this problem and gather this information, it would be really helpful.

-- Ted
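
Checking whether an option such as CONFIG_DETECT_SOFTLOCKUP is enabled can be
sketched as below. This is only an illustration: the helper name
config_option_state and the config file location /boot/config-<release> are
assumptions (many distributions install the build config there, but not all).

```python
import platform
import re


def config_option_state(config_text, option):
    """Return the value assigned to `option` ('y', 'm', ...) or None.

    A line such as '# CONFIG_FOO is not set' counts as not enabled,
    because only 'CONFIG_FOO=<value>' lines are matched.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Assumed location of the running kernel's build config.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            print(config_option_state(f.read(), "CONFIG_DETECT_SOFTLOCKUP"))
    except OSError as err:
        print("could not read", path, err)
```

A result of None means the option is unset or absent in that file, in which
case the kernel would need to be rebuilt with it turned on.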


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
  2011-02-03 17:09 ` [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended bugzilla-daemon
  2011-02-03 18:22 ` bugzilla-daemon
@ 2011-02-04  6:28 ` bugzilla-daemon
  2011-02-04  6:30 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-04  6:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #7 from rocko <rockorequin@hotmail.com>  2011-02-04 06:27:51 ---
Thanks for the feedback. I didn't have CONFIG_DETECT_SOFTLOCKUP enabled, so I
recompiled the kernel with it on. I managed to reproduce the problem but
unfortunately still didn't get anything useful in the logs. The syslog shows
immediately after the wlan reconnected:

Feb  4 14:13:02 pegasus-maverick anacron[6272]: Anacron 2.3 started on
2011-02-04
Feb  4 14:13:02 pegasus-maverick anacron[6272]: Normal exit (0 jobs run)
<hard reset was required here>
Feb  4 14:14:30 pegasus-maverick kernel: imklog 4.2.0, log source = /proc/kmsg
started.


Might the number of USB devices be relevant? I used to have 4 external drives
connected and now only have 3: I managed three suspend/resume cycles without
getting the freeze, but after I attached a fourth drive (NTFS btw, whereas the
others are ext4), the next suspend/resume cycle froze the machine.


Am I right in thinking sysrq requires me to connect over the network? The
problem with that is that I can only reproduce this reliably after a
suspend/resume cycle, and of course the network connection is lost during
suspend and the machine typically locks up so quickly that I don't have time to
establish a new connection.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (2 preceding siblings ...)
  2011-02-04  6:28 ` bugzilla-daemon
@ 2011-02-04  6:30 ` bugzilla-daemon
  2011-02-04 15:31 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-04  6:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #8 from rocko <rockorequin@hotmail.com>  2011-02-04 06:30:52 ---
And of course after I read your comment again I saw the last paragraph about
using a serial console. Unfortunately, I don't have a serial port on this PC.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (3 preceding siblings ...)
  2011-02-04  6:30 ` bugzilla-daemon
@ 2011-02-04 15:31 ` bugzilla-daemon
  2011-02-05  8:31 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-04 15:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #9 from Alan Stern <stern@rowland.harvard.edu>  2011-02-04 15:31:49 ---
In the absence of anything better you can boot with the "no_console_suspend"
option, do

   echo 8 >/proc/sys/kernel/printk

and then initiate the suspend from a VT console.  You may be able to glean some
useful information from what shows up on the screen.
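
Before suspending, it may be worth confirming that both settings took effect.
The sketch below is an illustration only; the helper names are made up here.
It relies on the first field of /proc/sys/kernel/printk being the current
console loglevel and on /proc/cmdline holding the boot command line.

```python
def console_loglevel(printk_text):
    """Return the console loglevel, the first field of /proc/sys/kernel/printk."""
    return int(printk_text.split()[0])


def has_boot_option(cmdline_text, option):
    """True if `option` appears as a whole word on the kernel command line."""
    return option in cmdline_text.split()


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/printk") as f:
            print("console loglevel:", console_loglevel(f.read()))
        with open("/proc/cmdline") as f:
            print("no_console_suspend set:",
                  has_boot_option(f.read(), "no_console_suspend"))
    except OSError as err:
        print("could not read /proc:", err)
```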


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (4 preceding siblings ...)
  2011-02-04 15:31 ` bugzilla-daemon
@ 2011-02-05  8:31 ` bugzilla-daemon
  2011-02-05  8:53 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05  8:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #10 from rocko <rockorequin@hotmail.com>  2011-02-05 08:31:16 ---
Created an attachment (id=46432)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=46432)
inode kernel bug dump

I tried suspending/resuming from a VT console in 2.6.38-rc3 and discovered a
kernel bug - I've attached it in case it's related to this. But the system
didn't freeze up in 2.6.38.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (5 preceding siblings ...)
  2011-02-05  8:31 ` bugzilla-daemon
@ 2011-02-05  8:53 ` bugzilla-daemon
  2011-02-05 19:12 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05  8:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #11 from rocko <rockorequin@hotmail.com>  2011-02-05 08:52:58 ---
I've reproduced the original problem in 2.6.37 suspending/resuming from the
console. The log is quite short and looks cut off:

BUG: unable to handle kernel paging request at ffffffff8291cb60
IP: [<ffffffff8105165d>] task_rq_lock+0x4d/0xa0
PGD 1a05067 PUD 1a09063 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/hwmon/hwmon0/temp1_input
CPU 1
Modules linked in: nls_utf8 udf ses enclosure usb_storage hidp binfmt_misc
rfcomm sco bnep l2cap vboxnetadp vboxnetflt vboxdrv parport_pc ppdev dm_crypt
btrfs zlib_deflate crc32c libcrc32c snd_hda_codec_idt snd_hda_intel nvidia(P)
snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy arc4
snd_seq_oss snd_seq_midi snd_rawmidi iwlagn iwlcore snd_seq_midi_event_

Hopefully I've typed it correctly.
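
For reading the "IP:" line of an oops like the one above, a small parser can
split it into address, symbol, offset and function size. This is a sketch that
assumes the 2.6.37-era line format shown in the excerpt; the function name
parse_oops_ip is hypothetical.

```python
import re

# Matches e.g. "IP: [<ffffffff8105165d>] task_rq_lock+0x4d/0xa0"
OOPS_IP = re.compile(
    r"IP: \[<([0-9a-f]+)>\] ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_oops_ip(line):
    """Return address, symbol, offset and size from an oops IP line, or None."""
    match = OOPS_IP.search(line)
    if match is None:
        return None
    address, symbol, offset, size = match.groups()
    return {
        "address": int(address, 16),  # faulting instruction pointer
        "symbol": symbol,             # function containing it
        "offset": int(offset, 16),    # bytes from the start of the function
        "size": int(size, 16),        # total size of the function in bytes
    }
```

For the line quoted above, the fault is 0x4d bytes into task_rq_lock, a
function 0xa0 bytes long, which puts the start of the function at
ffffffff81051610.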


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (6 preceding siblings ...)
  2011-02-05  8:53 ` bugzilla-daemon
@ 2011-02-05 19:12 ` bugzilla-daemon
  2011-02-05 19:56 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05 19:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #12 from Theodore Tso <tytso@mit.edu>  2011-02-05 19:12:45 ---
Rocko,

can you reproduce the problem reliably at this point?  And if so, can you give
us more details about how you reproduced it, and how someone else might be able to
replicate the problem reliably?

Is FUSE always involved, as it was apparently in comment #10?   Can you
reproduce the problem without FUSE?

You're right, the oops message does look cut off.

-- Ted


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (7 preceding siblings ...)
  2011-02-05 19:12 ` bugzilla-daemon
@ 2011-02-05 19:56 ` bugzilla-daemon
  2011-02-05 19:58 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05 19:56 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #13 from rocko <rockorequin@hotmail.com>  2011-02-05 19:56:52 ---
I can reproduce the freezing problem fairly reliably - it doesn't always occur,
but seems likely to occur over 3-5 suspend/resume cycles. 

I also have seen it occur on another PC (an older Dell 32 bit laptop, fwiw)
using the same external drive setup, ie with four USB drives attached via a
7-in-1 D-Link USB hub. Two are 1TB ext4 drives, one is a 500GB ext4 drive, and
the other drive was either a 1TB ext4 drive or a 500GB ntfs drive. I did try
reproducing the freeze with just a single drive attached but wasn't successful,
so perhaps the number of drives is relevant.

Both PCs that I have seen the freeze on were running Ubuntu 10.10. 

To see the (truncated) debug log I suspended from a console using
"/etc/init.d/sleep.sh force".

The fact that FUSE was involved in comment #10 puzzles me, given that that bug
dump starts off saying it's an EXT4-fs error on sdb1 - isn't ext4 purely a
kernel driver? I assume FUSE would have been loaded because of the ntfs-3g
driver loaded for sdf1? It's quite possible that comment #10 was a different
bug entirely, because the system didn't freeze. If it is related, it might be
relevant that the freezing problem also occurs with only external ext4 drives
attached. FUSE might still have been loaded, though, because at that time the PC
had an internal ntfs partition. Does the reference to gnome-panel reading
directory lblock 0 suggest gnome-panel was trying to read a configuration file?
Those would all be on my ext4 home partition, which is on the internal drive.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (8 preceding siblings ...)
  2011-02-05 19:56 ` bugzilla-daemon
@ 2011-02-05 19:58 ` bugzilla-daemon
  2011-02-05 20:53 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05 19:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #14 from rocko <rockorequin@hotmail.com>  2011-02-05 19:58:34 ---
And of course I meant to say that I suspended from the console with
"/etc/acpi/sleep.sh force". It's quite early over here.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (9 preceding siblings ...)
  2011-02-05 19:58 ` bugzilla-daemon
@ 2011-02-05 20:53 ` bugzilla-daemon
  2011-02-05 23:10 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05 20:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #15 from rocko <rockorequin@hotmail.com>  2011-02-05 20:53:52 ---
Created an attachment (id=46472)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=46472)
kernel paging request bug dump

I reproduced the freezing bug (ie kernel paging request bug) in 2.6.38-rc3 this
time. The bug dump is longer but still looks truncated during the call trace. 

FUSE was not loaded this time (I only had ext3/4 drives).


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (10 preceding siblings ...)
  2011-02-05 20:53 ` bugzilla-daemon
@ 2011-02-05 23:10 ` bugzilla-daemon
  2011-02-06  2:54 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-05 23:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #16 from Theodore Tso <tytso@mit.edu>  2011-02-05 23:10:20 ---
Hmm, can you tell me how your hard drives are arranged?  Based on the comment
in #15, the bug was while the kernel was trying to wake up firefox.  There's
nothing in the (truncated) stack trace that looks like a file system would be
involved, but if there was file system involvement, presumably it would be
related to firefox writing to its dot files in your home directory.  Is your
home directory located on a USB drive?

Here's another thought.  What happens if you try a series of suspend/resumes
where you pull all of your hard drives, but you don't pull the USB hub.  Does
that change the pattern of the crashes/freezes?

Also, is the system largely quiet before the suspend --- were there any
processes downloading torrents, doing compiles, or otherwise using a large
amount of CPU or disk bandwidth before you suspended your system?

Hmm.... this looks like a real puzzler.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (11 preceding siblings ...)
  2011-02-05 23:10 ` bugzilla-daemon
@ 2011-02-06  2:54 ` bugzilla-daemon
  2011-02-06  3:44 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-06  2:54 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #17 from rocko <rockorequin@hotmail.com>  2011-02-06 02:53:57 ---
The home directory is on an internal partition and thus never removed. The
system was pretty quiet before the suspend - all I had done after rebooting was
run Firefox to look at this bug report, and I didn't see any substantial disk
activity from any automated tasks like updatedb.

I tried a few cycles removing only the drives and not the hub but haven't
re-encountered the problem in 2.6.38-rc3 yet. This did reveal that I actually
have five external drives connected, not four. (There's an extra 1TB.) I'll try
later with 2.6.37 since it does seem to be easier to reproduce there.

On a few occasions the PC has frozen when I simply removed the USB cable during
normal operation (not during a resume). This happened once on the other 32 bit
PC as well. It is unfortunately much harder to reproduce. Could this indicate
perhaps that the paging request bug is a symptom of an earlier memory
corruption caused when the file system attempts to access a mounted but removed
partition before realising it has gone?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (12 preceding siblings ...)
  2011-02-06  2:54 ` bugzilla-daemon
@ 2011-02-06  3:44 ` bugzilla-daemon
  2011-02-06  4:26 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-06  3:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #18 from rocko <rockorequin@hotmail.com>  2011-02-06 03:44:22 ---
Created an attachment (id=46552)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=46552)
log of latest kernel panic

The attached shows some more information from a 2.6.37 kernel panic during
resume. The panic followed what I assume was the paging request error (the full
log scrolled off the top of the screen, but it looks similar to the last one).
I didn't copy down all the hex data but I might be able to figure them out from
some (slightly blurry) photos of the screen if they are important.

Note: this time I had removed the drives from the hub individually rather than
removing the hub. So it didn't make any difference doing it that way around.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (13 preceding siblings ...)
  2011-02-06  3:44 ` bugzilla-daemon
@ 2011-02-06  4:26 ` bugzilla-daemon
  2011-02-06  6:43 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-06  4:26 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #19 from Theodore Tso <tytso@mit.edu>  2011-02-06 04:26:25 ---
Hmm.... what if you only remove one drive while suspended?   Can you try
removing each of the drives as separate experiments, and see if you can narrow
it down to a single drive that seems to cause the problem when it is removed?

And then can you see how that drive is being used (if it is being used at all)?

If the system is mostly quiet, then presumably it wouldn't die immediately on
resume, but only when the system actually tried to access the file system.  So if we
can narrow it down to a single file system, and then figure out which processes
had files opened on that file system, maybe that would give us a clue.

You said, "paging request error" --- were you running programs off of any of
the external drives?   What programs if any were accessing the drive?

I'll note that I've tried some simple experiments with removing a quiescent USB
drive from my system while it was suspended, and I haven't been able to
reproduce the problem.   I have often forgotten to unmount an AHCI-attached
hot-swappable SATA drive in my Ultrabay slot, which is idle while
suspending, and it's never caused a hang on resume.  It gets a new sdN drive
letter on resume, so all of the file system mounts are invalidated, but I've
never gotten a crash when that happens --- and I've been using 2.6.37 on my
laptop almost ever since it has been released.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (14 preceding siblings ...)
  2011-02-06  4:26 ` bugzilla-daemon
@ 2011-02-06  6:43 ` bugzilla-daemon
  2011-02-06  9:00 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-06  6:43 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #20 from rocko <rockorequin@hotmail.com>  2011-02-06 06:42:59 ---
It's a *lot* harder to repeat it with just one drive attached. Over 20
suspend/resume cycles I only managed to repeat the freeze once with a single
drive attached (with the 500GB drive attached in this case).

I doubt very much that a particular program is accessing the external
partitions. In my testing I just allow Gnome to mount the partitions and once
it has settled I run the sleep command. A "lsof|grep media" after the
partitions are mounted doesn't show anything accessing the external file
systems.
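
A rough equivalent of the "lsof|grep media" check can be sketched by walking
the file-descriptor links under /proc. This is an approximation only: it looks
at open file descriptors, not at working directories or memory mappings (which
lsof also reports), and the function name and the "/media" prefix are
assumptions for illustration.

```python
import os


def open_files_under(prefix, proc_root="/proc"):
    """List (pid, path) pairs for open file descriptors located under `prefix`."""
    wanted = prefix.rstrip("/") + "/"
    hits = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == prefix or target.startswith(wanted):
                hits.append((int(pid), target))
    return hits


if __name__ == "__main__":
    for pid, path in open_files_under("/media"):
        print(pid, path)
```

Run as root to see other users' processes; an empty result is consistent with
nothing holding files open on the external file systems.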

Is it relevant that when I start the suspend cycle but before suspend occurs
there are some kernel messages saying that the external partitions are being
remounted? I would have thought that they would be unmounted without a remount
before suspend.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (15 preceding siblings ...)
  2011-02-06  6:43 ` bugzilla-daemon
@ 2011-02-06  9:00 ` bugzilla-daemon
  2011-02-07  1:20 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-06  9:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #21 from rocko <rockorequin@hotmail.com>  2011-02-06 09:00:29 ---
Could this possibly be a caching/stale pointer problem? 

In comment #10, FUSE was somehow involved in an ext4-fs operation after the
fuse/ntfs partition was unmounted (and presumably its kernel FS memory freed).
Wouldn't a fuse/ext4 combination only happen if the kernel incorrectly tried to
access the freed ntfs/fuse memory data when subsequently trying to read/write
the ext4 file system (or am I incorrect when I assume that FUSE would never be
involved in ext4-fs operations?).

Two of the other crashes happened when firefox and gnome-panel tried to access
the home partition and failed - but the home partition _was_ present and
therefore should have always been accessible. Could the kernel have got
confused and been accessing freed memory used for an absent external
partition instead of the memory being used to cache data for the home
partition?

This might also explain why the chances of the kernel panic happening are much
higher if there are multiple removed drives involved, ie because there is more
chance of freed memory incorrectly being accessed.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (16 preceding siblings ...)
  2011-02-06  9:00 ` bugzilla-daemon
@ 2011-02-07  1:20 ` bugzilla-daemon
  2011-02-07  3:11 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07  1:20 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #22 from Theodore Tso <tytso@mit.edu>  2011-02-07 01:20:01 ---
Memory doesn't get freed just because a device disappears.   The file system is
still shown as mounted after the system resumes.  Attempts to access the
mounted file system will result in errors, but the data structures don't get
magically freed until you explicitly umount the failed file system.

It's more likely that the kernel is stuck in some loop trying to access the
failed file system, and looping, but in that case, it would be caused by a
specific process trying to access the file system after the system resumed. 
Say, if you were executing a program that was located on the now-failed file
system, or if a file from the now-failed file system was mmap'ed into memory,
and for some reason the kernel was looping forever instead of returning an
error to the program and/or killing the program.

This is why I asked you if you could use the various sysrq commands to try to
figure out what the kernel was doing after it locked up.  In answer to your
previous message, no, sysrq doesn't require access over the network.  It
requires access to the console.  If you have a VT console, sysrq-p can be
triggered by holding down the alt, sysrq and p keys; sysrq-l can be triggered
by alt-sysrq-l, etc.   If you have a serial console, you can send a break
followed by an l to trigger a sysrq-l, and so on.
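
Besides the keyboard and serial-console routes, a sysrq command can also be
issued from a running shell or script by writing the command letter to
/proc/sysrq-trigger. The sketch below assumes a kernel built with
CONFIG_MAGIC_SYSRQ and root privileges; the function name trigger_sysrq is
hypothetical. Like the key combinations, it is of no use once the machine has
stopped running user space.

```python
def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single sysrq command letter (e.g. 'l', 'p', 't') to the trigger file."""
    if len(key) != 1:
        raise ValueError("sysrq command must be a single character")
    with open(trigger_path, "w") as f:
        f.write(key)


if __name__ == "__main__":
    try:
        trigger_sysrq("l")  # same request as alt-sysrq-l; output goes to the kernel log
    except OSError as err:
        print("could not trigger sysrq:", err)
```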


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (17 preceding siblings ...)
  2011-02-07  1:20 ` bugzilla-daemon
@ 2011-02-07  3:11 ` bugzilla-daemon
  2011-02-07  3:32 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07  3:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #23 from rocko <rockorequin@hotmail.com>  2011-02-07 03:11:05 ---
Thanks for the info on how to use the sysrq functions. Those keys work nicely
normally. But unfortunately they don't work once it freezes with this bug - the
freeze is total, which I guess also explains the truncated stack traces.

I wasn't suggesting that memory was freed just because the device disappeared -
I was wondering if it is possible that the kernel could umount several missing
devices and free the memory pertaining to them (all good so far), but get
pointers to the freed memory mixed up, possibly through caching of the freed
memory pointer.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (18 preceding siblings ...)
  2011-02-07  3:11 ` bugzilla-daemon
@ 2011-02-07  3:32 ` bugzilla-daemon
  2011-02-07  3:33 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07  3:32 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #24 from Theodore Tso <tytso@mit.edu>  2011-02-07 03:32:44 ---
If the freeze is total, such that sysrq isn't functioning, then that implies
that we're stuck in some kind of interrupt service routine lockup, which might
imply that it's not an ext4 specific bug.

Hmm.... can you try replicating with some other file system, like vfat or ext3?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (19 preceding siblings ...)
  2011-02-07  3:32 ` bugzilla-daemon
@ 2011-02-07  3:33 ` bugzilla-daemon
  2011-02-07  4:24 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07  3:33 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #25 from Theodore Tso <tytso@mit.edu>  2011-02-07 03:33:09 ---
(i.e., but rather something which is device driver specific)


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (20 preceding siblings ...)
  2011-02-07  3:33 ` bugzilla-daemon
@ 2011-02-07  4:24 ` bugzilla-daemon
  2011-02-07 15:36 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07  4:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #26 from rocko <rockorequin@hotmail.com>  2011-02-07 04:24:49 ---
The kernel panic messages (when kmsg gets that far) do say that a fatal
exception has happened in an interrupt. It seems odd to me though that
kmsg_dump locks up a seemingly random time later while reporting the panic.

I tried again with two vfat USB drives, one ext3 external drive, and an ntfs
external drive attached via the hub. I reproduced the freeze, this time caused
by a null pointer dereference error rather than an unhandled page fault error.
Again it was in task_rq_lock. The stack trace stopped very abruptly this time. 

I'm not positive that this lets ext4 completely off the hook because of the
crash that was caused when firefox tried to access the internal ext4 home
partition. Presumably I'd have to try and reproduce the problem with all the
internal partitions reformatted to ext3 to rule out ext4 completely.

But perhaps ext4's memory was corrupted by the umount operations? Is there
common umount kernel code shared by all file systems?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (21 preceding siblings ...)
  2011-02-07  4:24 ` bugzilla-daemon
@ 2011-02-07 15:36 ` bugzilla-daemon
  2011-02-07 23:49 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07 15:36 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #27 from Theodore Tso <tytso@mit.edu>  2011-02-07 15:35:59 ---
There are shared umount code paths, used by all file systems, in
fs/super.c, in the VFS layer, of course.

But again, unless you or some userspace program has explicitly requested a
umount, a umount does not happen just because a disk drive has disappeared.  So
if you are locking up right after a resume, there wouldn't have been any _time_
for you or some userspace program to request an umount.   And certainly the
standard desktop userspace programs (e.g., the GNOME desktop) do not
automagically look for kernel I/O error messages caused by a disk disappearing,
and then umount the file system automagically.
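
One way to check this on a live system is to look at the mount table after the
drive has gone: if something had already requested the umount, the entry would
be missing. A minimal sketch, assuming the mount table is readable at
/proc/mounts; the helper name is_mounted and the device name in the usage line
are illustrative only:

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE is listed as a mount source in MOUNTS_FILE
# (default /proc/mounts). Hypothetical helper, for illustration only.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, running `is_mounted /dev/sde1 && echo "still mounted"` right after
pulling the drive shows whether anything has requested the umount yet.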


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (22 preceding siblings ...)
  2011-02-07 15:36 ` bugzilla-daemon
@ 2011-02-07 23:49 ` bugzilla-daemon
  2011-02-19 12:36 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-07 23:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #28 from rocko <rockorequin@hotmail.com>  2011-02-07 23:49:14 ---
Actually, the lockup doesn't take place _immediately_ after resume - there is a
random interval of up to several seconds. A couple of times I even managed to
type my password in for the screensaver and get back to the desktop before the
freeze. So there is certainly time for a userspace process to try and access a
missing drive. 

And there is a process that umounts drives when it detects they are missing (I
think it is udev?). They disappear from my Gnome desktop when I pull out the
drive or when I resume after removing the drives.

These (sample) messages after resume show it happening:

Feb  7 10:04:04 pegasus-maverick ntfs-3g[2745]: Unmounting /dev/sdf1 (My Passport)
Feb  7 10:04:04 pegasus-maverick kernel: [  128.908709] JBD: I/O error detected when updating journal superblock for sde1.
Feb  7 10:04:04 pegasus-maverick kernel: [  128.934868] scsi 9:0:0:1: rejecting I/O to dead device
Feb  7 10:04:04 pegasus-maverick kernel: [  128.949322] EXT3-fs (sde1): I/O error while writing superblock


The first message implies that a umount happens at least for missing ntfs
drives. The bug in comment #10 happened during a sys_umount call, which was
triggered by Gnome automatically in response to the missing drive.
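
To see how often this happens across several resume cycles, messages like these
can be counted in a saved log. A rough sketch, assuming the log is a plain text
file (the path /var/log/syslog in the usage line is a guess for Ubuntu-style
systems); count_msgs is a made-up helper name:

```shell
# count_msgs PATTERN FILE
# Print how many lines of FILE contain the fixed string PATTERN.
# grep exits 1 when nothing matches; treat that as success (count 0)
# while still failing on real errors such as an unreadable file.
count_msgs() {
    grep -cF -- "$1" "$2" || [ $? -eq 1 ]
}
```

For example: `count_msgs 'rejecting I/O to dead device' /var/log/syslog`.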


The most confusing thing is how the system might have crashed when firefox was
trying to access a valid mounted ext4 partition. That's why I'm thinking it's
caused by memory corruption or an invalid pointer.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (23 preceding siblings ...)
  2011-02-07 23:49 ` bugzilla-daemon
@ 2011-02-19 12:36 ` bugzilla-daemon
  2011-02-19 15:55 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-19 12:36 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #29 from rocko <rockorequin@hotmail.com>  2011-02-19 12:36:35 ---
2.6.38-rc5 seems particularly prone to this bug - I've crashed it three times
today without even suspending. The first time was when I accidentally powered off
an attached USB drive; the second was when I removed an ext4 USB key; and the
third time was when I removed the USB key again while trying to get a log
output (it took about 10 goes doing this to get the crash). In all cases the
stored syslog only showed the USB disconnect, but in the last case I got a
partial bug dump on the tty console:

BUG: scheduling while atomic: kworker
RIP: 0010 [...] in tick_nohz_restart_sched_tick+0x55/0x180
...
Call trace:
cpu_idle+0xd7/0xf0
start_secondary+0x1bc/0x1c3

So might this be a scheduling problem rather than a file system problem?

In this last case I must have removed and then reattached the USB key almost in
the same instant, because its light kept flashing, i.e. indicating that
something was trying to read from or write to it.

I notice that when a device is removed the kernel immediately re-mounts it in
read-only mode before presumably dismounting it again. So if you are trying to
copy to a device that is removed the first error that Gnome might report is
that the device is read-only, and when you press 'skip', the second error is
that the device has disappeared. Is this normal? It seems odd.

Does anyone have any more suggestions on how to debug this? It is really
annoying when the simple act of removing a USB file system completely crashes
the kernel (and trashes all your unsaved data).


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (24 preceding siblings ...)
  2011-02-19 12:36 ` bugzilla-daemon
@ 2011-02-19 15:55 ` bugzilla-daemon
  2011-02-20  0:16 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-19 15:55 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #30 from Alan Stern <stern@rowland.harvard.edu>  2011-02-19 15:55:19 ---
I don't know about tracking this down, but for testing purposes you can
reproduce these conditions without actually unplugging and replugging the
device.  All you have to do is go into the /sys/bus/usb/drivers/usb_storage
directory.  In there, you'll see a file representing each usb-storage device in
your system, with names like "2-1.2:1.0".

To imitate an unplug, write the corresponding file name to the "unbind" file. 
To imitate a replug, write the file name to the "bind" file.  For example:

   echo -n 2-1.2:1.0 >unbind
   echo -n 2-1.2:1.0 >bind

This can easily be automated in a shell-script loop, which should speed up your
testing.
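
A minimal sketch of such a loop follows. The function name, the DRIVER_DIR
variable and the default delay are assumptions made for illustration, and the
interface name must be replaced with the one shown on your system. On many
systems the driver directory is spelled usb-storage rather than usb_storage,
so DRIVER_DIR is left overridable. The pause between operations is deliberate,
since toggling too quickly may overwhelm the hub driver.

```shell
#!/bin/sh
# Sketch: repeatedly unbind and rebind one usb-storage interface.
# DRIVER_DIR defaults to the path given above; override it if your
# system uses a different directory name.
DRIVER_DIR="${DRIVER_DIR:-/sys/bus/usb/drivers/usb_storage}"

# rebind_loop INTERFACE COUNT [DELAY_SECONDS]
rebind_loop() {
    iface=$1
    count=$2
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Imitate an unplug, then give the system time to settle.
        printf '%s' "$iface" > "$DRIVER_DIR/unbind" || return 1
        sleep "$delay"
        # Imitate a replug.
        printf '%s' "$iface" > "$DRIVER_DIR/bind" || return 1
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example (as root): rebind_loop 2-1.2:1.0 20 5
```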


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (25 preceding siblings ...)
  2011-02-19 15:55 ` bugzilla-daemon
@ 2011-02-20  0:16 ` bugzilla-daemon
  2011-02-21  4:02 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-20  0:16 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #31 from rocko <rockorequin@hotmail.com>  2011-02-20 00:16:33 ---
Thanks, that helped a lot! Incidentally, if I don't sleep between unbind and
bind operations in the script, the kernel becomes unresponsive (ie to
everything including ssh sessions, ctrl-c/d/z and trying to switch to another
tty) and just keeps scrolling the bind/unbind messages up the screen, even long
after I remove the USB device, so presumably bind/unbind would have to fail at
this point.

Anyway, I managed to reproduce the freeze quite quickly with just a single USB
key. The stack trace is slightly longer this time:

<this scrolled off the top of the screen>
wake_up_process
wakeup_timer_fn
run_timer_softirq
? wakeup_timer_fn
__do_softirq
call_softirq
irq_exit
do_IRQ
ret_from_intr
? arch_local_irq_enable
? sched_clock_idle_wakeup_event
acpi_idle_enter_bm
cpuidle_idle_call
cpu_idle
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
...
RIP task_rq_lock
--- end trace ---
Kernel panic - not syncing: fatal exception in interrupt
Pid: 0, comm: swapper
Call trace:
IRQ ? panic
? kmsg_dump
? oops_end
? no_context
? __bad_area_nosemaphore
? bad_area_nosemaphore
? enqueue_task
? resched_task
? try_to_wake_up
? native_sched_clock
? native_sched_clock [yes, twice]
? page_fault
? task_rq_lock
? try_to_wake_up
? wake_up_process
? wake_up_process [twice]
? wakeup_timer
? run_timer_softirq
? wakeup_timer_fn
? __do_softirq
? call_softirq
? irq_exit
? do_IRQ
? ret_from_intr
? arch_local_irq_enable
? sched_clock_idle_wakeup_event
? acpi_idle_enter_bm
? cpuidle_idle_call
? cpu_idle
? rest_init
? x86_64_start_reservations
? x86_64_start_kernel

But does that help any?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (26 preceding siblings ...)
  2011-02-20  0:16 ` bugzilla-daemon
@ 2011-02-21  4:02 ` bugzilla-daemon
  2011-02-21  4:08 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-21  4:02 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #32 from rocko <rockorequin@hotmail.com>  2011-02-21 04:01:54 ---
Created an attachment (id=48502)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48502)
diff between 2.6.37.1 and 2.6.38-rc5 configs

I think this might have had something to do with the config I was building the
kernel with.

I tried the Ubuntu weekly build of 2.6.38-rc5 (which appears to have been
compiled with gcc-4.2) and couldn't repeat the freeze. I then recompiled
2.6.38-rc5 using gcc-4.5.1 but with the Ubuntu config and haven't been able to
reproduce the freeze yet with this combination, so it seems likely to be a
config issue rather than a compiler issue.

The attached diff should show the 'good' config for 2.6.38-rc5 (ie copied from
the Ubuntu weekly build) against the config for the build I did against
2.6.38-rc4, which is very similar to the config I was using with 2.6.38-rc5
that kept crashing.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (27 preceding siblings ...)
  2011-02-21  4:02 ` bugzilla-daemon
@ 2011-02-21  4:08 ` bugzilla-daemon
  2011-02-21  8:59 ` bugzilla-daemon
                   ` (60 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-21  4:08 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #33 from rocko <rockorequin@hotmail.com>  2011-02-21 04:08:10 ---
Created an attachment (id=48512)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=48512)
syslog from VM which is hanging for 10-15 seconds at a time

In case it is useful... I also tried reproducing the freeze running Ubuntu
11.04 (running 2.6.38-rc4) in VirtualBox and repeatedly assigning/unassigning
the test USB key to it (which used to have a good chance of freezing the host,
but not now). I didn't get a total freeze, but I did start getting freezes for
maybe 10-15 seconds. The USB access light started flashing permanently so I
seem to have recreated the situation where some process has crashed trying to
access the drive, or perhaps just crashed the drive itself, but the kernel
doesn't crash completely; it just pauses completely for a while. The key still
shows up in an ls as attached to 1-1:1.0, but I can't bind/unbind it anymore -
it gives write errors (no such device) and "device offlined" errors.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (28 preceding siblings ...)
  2011-02-21  4:08 ` bugzilla-daemon
@ 2011-02-21  8:59 ` bugzilla-daemon
  2011-02-21 16:48 ` bugzilla-daemon
                   ` (59 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-21  8:59 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #34 from rocko <rockorequin@hotmail.com>  2011-02-21 08:59:14 ---
OK, spoke too soon - my host with 2.6.38-rc5 compiled against the new config
just froze up completely after removing the external USB drives and then
resuming. Are there any known issues building the kernel with gcc 4.5.1?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (29 preceding siblings ...)
  2011-02-21  8:59 ` bugzilla-daemon
@ 2011-02-21 16:48 ` bugzilla-daemon
  2011-02-22  7:03 ` bugzilla-daemon
                   ` (58 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-21 16:48 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #35 from Alan Stern <stern@rowland.harvard.edu>  2011-02-21 16:48:34 ---
If you generate plug and unplug events too rapidly, you will overrun the USB
hub driver.  If you run the test while using a VT console instead of X11, and
if you enable CONFIG_USB_DEBUG and do "echo 8 >/proc/sys/kernel/printk" before
starting, you'll be able to see what's going on.
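
As a rough sketch of that preparation: the helper below only checks whether an
option is set to y in a kernel config file. The name config_has and the
/boot/config-$(uname -r) path in the usage comment are assumptions (the latter
is a common distro convention, not something guaranteed to exist).

```shell
# config_has OPTION FILE
# Succeeds if FILE contains a line of the form OPTION=y.
config_has() {
    grep -q "^$1=y\$" "$2"
}

# Example, as root on a VT console:
#   config_has CONFIG_USB_DEBUG "/boot/config-$(uname -r)" ||
#       echo "rebuild with CONFIG_USB_DEBUG=y"
#   echo 8 > /proc/sys/kernel/printk    # show all kernel messages on the console
```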


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (30 preceding siblings ...)
  2011-02-21 16:48 ` bugzilla-daemon
@ 2011-02-22  7:03 ` bugzilla-daemon
  2011-03-03 15:23 ` bugzilla-daemon
                   ` (57 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-02-22  7:03 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #36 from rocko <rockorequin@hotmail.com>  2011-02-22 07:03:33 ---
Thanks for the info - I assume that's to see what is happening if I can
make the USB light flash permanently? Or might it help in figuring out what
causes the full kernel crash?

Btw, this kernel freeze bug is still present in 2.6.38-rc6 - I have reproduced
the "unable to handle paging request" crash shortly after resuming.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (31 preceding siblings ...)
  2011-02-22  7:03 ` bugzilla-daemon
@ 2011-03-03 15:23 ` bugzilla-daemon
  2011-03-06 10:25 ` bugzilla-daemon
                   ` (56 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-03 15:23 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #37 from rocko <rockorequin@hotmail.com>  2011-03-03 15:23:14 ---
The bug is still present in 2.6.38-rc7 (it happened the first time I resumed
after removing the external drives).


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (32 preceding siblings ...)
  2011-03-03 15:23 ` bugzilla-daemon
@ 2011-03-06 10:25 ` bugzilla-daemon
  2011-03-06 15:59 ` bugzilla-daemon
                   ` (55 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-06 10:25 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #38 from rocko <rockorequin@hotmail.com>  2011-03-06 10:25:46 ---
The syslog shows that, for whatever reason, sometimes processes do try and
access files on the drives after they are removed. Messages to this effect
typically appear if the kernel _doesn't_ crash.

Is it possible that there might be a sync/locking problem here that leads to
the crash? eg could the crash occur if a process tries to access the removed
drive while the kernel is in the process of unmounting it? Perhaps this would
explain why the crash is more likely to happen on resume (ie when many
processes are waking up and presumably more likely to try and access files) and
when multiple drives are removed (ie when the kernel is likely to be more busy
trying to unmount the missing drives).


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (33 preceding siblings ...)
  2011-03-06 10:25 ` bugzilla-daemon
@ 2011-03-06 15:59 ` bugzilla-daemon
  2011-03-12  6:03 ` bugzilla-daemon
                   ` (54 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-06 15:59 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #39 from Alan Stern <stern@rowland.harvard.edu>  2011-03-06 15:59:05 ---
It is expected that processes will try to access files on a drive after it has
been removed.  After all, they have no way of knowing that the drive is gone.

I'm not convinced that the timing with respect to your resume is relevant. 
After all, you mentioned that the crashes are subject to random delays,
sometimes happening several seconds after the system resumes.

Also, the kernel does not automatically unmount missing drives.  Unmounting is
a separate operation from removal.  Unmounting takes place only when the user
(or a program) does the equivalent of running the umount command.

You did say that this problem doesn't arise in 2.6.36.  Your best chance of
tracking it down might be to try doing a git bisection between 2.6.36 and
2.6.37.  However, for this to work, you have to be able to detect reliably
whether or not a particular kernel has the bug.  The test you used in comment
#31 might do the job.  If you can narrow this down to an individual kernel
change, that would be a big help.
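
A sketch of how the bisection could be driven, under stated assumptions: the
helper name verdict is made up, and the test command is whatever reproduction
script you trust. Because an intermittent bug can pass by luck, the helper only
reports "good" after every one of N runs succeeds. A real freeze takes the
machine down, so in practice "bad" is usually marked by hand after rebooting.

```shell
# verdict N CMD [ARGS...]
# Run CMD up to N times; print "bad" on the first failure and
# "good" only if all N runs succeed.
verdict() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || { echo bad; return 0; }
        i=$((i + 1))
    done
    echo good
}

# In a kernel git tree:
#   git bisect start
#   git bisect bad v2.6.37
#   git bisect good v2.6.36
# then build and boot each kernel git checks out, and afterwards run:
#   git bisect "$(verdict 20 ./my-usb-test.sh)"   # my-usb-test.sh is hypothetical
```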


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (34 preceding siblings ...)
  2011-03-06 15:59 ` bugzilla-daemon
@ 2011-03-12  6:03 ` bugzilla-daemon
  2011-03-12 12:12 ` bugzilla-daemon
                   ` (53 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-12  6:03 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #40 from rocko <rockorequin@hotmail.com>  2011-03-12 06:03:21 ---
I tried kernel 2.6.36, and it also crashed the first time. I must just not have
noticed it before (or maybe the compiler *is* important). The log was along the
lines of:

Pid: 0, comm: swapper Not tainted 2.6.36-git
...
Call Trace:
? sched_clock_local
tick_nohz_stop_sched_tick
cpu_idle
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel

RIP get_next_timer_interrupt

Kernel panic: not syncing: Attempted to kill the idle task!
Pid: 0, comm: swapper Tainted: G   D  2.6.36-git

Call trace:
panic
do_exit
? kmsg_dump
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
do_page_fault
page_fault
? get_next_timer_interrupt
? get_next_timer_interrupt
? sched_clock_local
tick_nohz_stop_sched_tick
cpu_idle
rest_init
...


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (35 preceding siblings ...)
  2011-03-12  6:03 ` bugzilla-daemon
@ 2011-03-12 12:12 ` bugzilla-daemon
  2011-03-12 15:51 ` bugzilla-daemon
                   ` (52 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-12 12:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832


Rafael J. Wysocki <rjw@sisk.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|21782                       |




--- Comment #41 from Rafael J. Wysocki <rjw@sisk.pl>  2011-03-12 12:12:31 ---
Dropping from the list of post-2.6.36 regressions.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (36 preceding siblings ...)
  2011-03-12 12:12 ` bugzilla-daemon
@ 2011-03-12 15:51 ` bugzilla-daemon
  2011-03-13  0:41 ` bugzilla-daemon
                   ` (51 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-12 15:51 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #42 from Alan Stern <stern@rowland.harvard.edu>  2011-03-12 15:51:28 ---
Can you install an older compiler and build a kernel with it?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (37 preceding siblings ...)
  2011-03-12 15:51 ` bugzilla-daemon
@ 2011-03-13  0:41 ` bugzilla-daemon
  2011-03-13  2:26 ` bugzilla-daemon
                   ` (50 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-13  0:41 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #43 from rocko <rockorequin@hotmail.com>  2011-03-13 00:41:00 ---
I tried both 2.6.38-rc8 built with gcc 4.4.5 and also the Ubuntu weekly build
of 2.6.38-rc8, which says it is built with gcc 4.2.3, and both crashed.

I didn't manage to crash the Ubuntu 2.6.35-28 kernel, so it might be a
regression between 2.6.35 and 36.
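
For reference, the compiler a running kernel was built with can be read from
its version banner. A small sketch, assuming the "gcc version X.Y.Z" wording
that kernels of this era print in /proc/version (newer kernels format the
banner differently); kernel_gcc is a made-up helper name:

```shell
# kernel_gcc [FILE]
# Print the gcc version found in a /proc/version-style banner
# (default file: /proc/version). Prints nothing if no match is found.
kernel_gcc() {
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' "${1:-/proc/version}"
}
```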


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (38 preceding siblings ...)
  2011-03-13  0:41 ` bugzilla-daemon
@ 2011-03-13  2:26 ` bugzilla-daemon
  2011-03-13  2:45 ` bugzilla-daemon
                   ` (49 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-13  2:26 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #44 from rocko <rockorequin@hotmail.com>  2011-03-13 02:26:21 ---
Created an attachment (id=50702)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=50702)
crash log from 2.6.36-rc1

I got a magnificent crash log from 2.6.36-rc1 on a crash during resume -
unfortunately about five pages worth of log scrolled off the console and I only
got the last one (see attached jpg). There was nothing in the syslog saved to
disk.

I'm not sure exactly what caused the crash, but there are some file system and
page fault calls in the stack trace (as well as some wireless calls), so
perhaps it is related to this bug. This kernel was quite flakey, though - it
crashed on two other occasions when I hadn't even removed any USB drives.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (39 preceding siblings ...)
  2011-03-13  2:26 ` bugzilla-daemon
@ 2011-03-13  2:45 ` bugzilla-daemon
  2011-03-13  3:01 ` bugzilla-daemon
                   ` (48 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-13  2:45 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #45 from Alan Stern <stern@rowland.harvard.edu>  2011-03-13 02:45:04 ---
That "invalid_op" entry in the stack trace strongly suggests that memory is
getting corrupted, and I don't mean as a result of a software bug.  Have you
run a memory hardware test recently?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (40 preceding siblings ...)
  2011-03-13  2:45 ` bugzilla-daemon
@ 2011-03-13  3:01 ` bugzilla-daemon
  2011-03-13 21:17 ` bugzilla-daemon
                   ` (47 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-13  3:01 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #46 from rocko <rockorequin@hotmail.com>  2011-03-13 03:01:46 ---
It's always a possibility, but I don't think this is a RAM problem. I have run
memory tests fairly recently to try and rule it out. One of the BIOS RAM tests
is very extensive: I didn't run it to completion because it would have taken
many hours to complete, but did let it go for 20 minutes or so. It tested quite
a few patterns and didn't find any failures. I also occasionally make it run
the 'quick' memory check upon boot (about 5 minutes worth) and it never finds
problems.

There are other indicators that this probably isn't a hardware problem:

* I have seen this bug occur on another PC that was running 2.6.37 at the time
- it crashed when I unplugged the external USB hub without unmounting the
drives first.

* I don't get crashes during normal operation - only after removing USB drives.
And only if they were mounted prior to removing them.

* Just now I have been testing Ubuntu's 2.6.35-28-generic kernel more
extensively and haven't been able to crash it (around 20 suspend/resume cycles
without incident).

* I did manage to crash a VM running Ubuntu 11.04 with kernel 2.6.38-rc8 by
using VirtualBox's USB menu in the status bar to repeatedly attach and remove
an external USB drive. It took about ten goes before it hung. In the past, the
host has usually crashed during this test, but this time I was running 2.6.35
on the host and only the VM crashed.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (41 preceding siblings ...)
  2011-03-13  3:01 ` bugzilla-daemon
@ 2011-03-13 21:17 ` bugzilla-daemon
  2011-03-14  4:10 ` bugzilla-daemon
                   ` (46 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-13 21:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #47 from Alan Stern <stern@rowland.harvard.edu>  2011-03-13 21:17:00 ---
Okay, if you believe this, and if you can reliably detect the bug (which is
often a difficult thing to be sure of), then you can try bisecting the kernel
changes between 2.6.35 and 2.6.36-rc1.

The thing is, the test really does have to be reliable.  If you make a mistake
(tell git that a kernel doesn't have the bug when it actually does) then you're
very unlikely to track down the true cause.  Still, it's worth a try --
especially since nothing else seems to help!

The test you used in comment #31 would be a good approach.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (42 preceding siblings ...)
  2011-03-13 21:17 ` bugzilla-daemon
@ 2011-03-14  4:10 ` bugzilla-daemon
  2011-03-14 14:18 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-14  4:10 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #48 from rocko <rockorequin@hotmail.com>  2011-03-14 04:10:06 ---
Unfortunately there are a couple of problems with bisecting this, quite apart
from the time it takes to build and test each kernel...

The first is that it is unsurprisingly easy to crash an rc1 kernel, which makes
it even harder to reliably detect the bug. For instance, I managed to crash the
first bisect of 35-36.rc1 with the suspend/resume test eventually (it seems
much more robust than 2.6.38 with respect to this bug) but it isn't clear that
the problem occurred as a result of the USB removal - the stack trace looks
like it includes memory allocations and socket receives. Why, the first time I
booted the 36rc1 kernel, it hung completely at the login screen with no human
input whatsoever, clearly as a result of a different bug. I'm not even positive
that the log I posted from the 36rc1 kernel crash above is related to this bug,
as it looks different from the 2.6.37/38 logs.

The second problem is that I can't necessarily compile all the bisects! For
instance commit f6cec0ae58c17522a7bc4e2f39dae19f199ab534 (the second bisect I
tried) fails with this error:

drivers/staging/comedi/drivers/das08_cs.c: In function ‘das08_pcmcia_config_loop’:
drivers/staging/comedi/drivers/das08_cs.c:225:8: error: ‘struct pcmcia_device’ has no member named ‘io

Is there a way to make the kernel handle a bug like this more gracefully? It
seems that there are many great debug tools for extracting information about
process states, but they are all useless here because the crash is so
catastrophic.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (43 preceding siblings ...)
  2011-03-14  4:10 ` bugzilla-daemon
@ 2011-03-14 14:18 ` bugzilla-daemon
  2011-03-24 23:06 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-14 14:18 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #49 from Alan Stern <stern@rowland.harvard.edu>  2011-03-14 14:18:48 ---
It is possible to tell git that you can't test a certain kernel; it will then
choose a different bisection point for testing.
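
As a rough illustration of that skip mechanism (a sketch only: the helper name
verdict_to_bisect and its verdict labels are invented for this example, and the
tags are simply the two releases mentioned in comment #47), a session might
look something like this:

```shell
# Map the outcome of testing one kernel to the matching "git bisect" subcommand.
# verdict_to_bisect is a hypothetical helper, not part of git.
verdict_to_bisect() {
  case "$1" in
    crash)   echo bad ;;    # the bug was reproduced
    ok)      echo good ;;   # the kernel survived the test
    nobuild) echo skip ;;   # could not build or boot: let git pick another commit
    *)       echo "unknown verdict: $1" >&2; return 1 ;;
  esac
}

# Start bisecting between the two releases named in the thread.
# Run from the top of a kernel git tree.
start_bisect() {
  git bisect start &&
  git bisect bad v2.6.36-rc1 &&
  git bisect good v2.6.35
}

# After building and testing the commit that git checks out, for example:
#   git bisect "$(verdict_to_bisect nobuild)"
```

Only mark a kernel "good" or "bad" when the result is trustworthy; anything
that cannot be judged (such as a commit that fails to compile) is safer as a
skip.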

There is no way to make the kernel handle a catastrophic bug gracefully.  There
are ways to capture a complete kernel log when a bug does occur, by using a
serial console or a network console.  There's also the kernel's internal
debugger (kgdb).  Besides, information about process states probably won't
help.
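
For the log capture, a minimal sketch of the setup, assuming a second machine
on the same network (the helper name netconsole_param is made up here, and
every address, port and interface name below is a placeholder to be replaced
with real values):

```shell
# Build a netconsole= module parameter in the form described in the kernel's
# netconsole documentation: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# netconsole_param is a hypothetical helper; it only formats the string.
netconsole_param() {
  printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# On the crashing machine (as root), something like:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, listen for UDP on the target port, for example:
#   nc -u -l 6666        # option syntax differs between netcat variants
# For a serial console, add this to the kernel command line instead:
#   console=ttyS0,115200n8 console=tty0
```

Either way the messages leave the machine as they are printed, so they survive
even if the local tty or an ssh session dies partway through the oops.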


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (44 preceding siblings ...)
  2011-03-14 14:18 ` bugzilla-daemon
@ 2011-03-24 23:06 ` bugzilla-daemon
  2011-03-31  0:44 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-24 23:06 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #50 from rocko <rockorequin@hotmail.com>  2011-03-24 23:06:05 ---
For a time I thought I had tracked the problem down to a single 500 GB ext4
hard drive, because it became possible to reliably and quickly
reproduce the crash in my VM by simply simulating USB plug/unplug - it was
crashing on the third or fourth simulated unplug. Then after I ran fsck on the
drive it was much harder to reproduce the crash (I didn't manage to reproduce
it in ten separate attempts, each with hundreds of plugs/unplugs). 

I removed the drive completely from the system and had no crashes for over a
week, until this morning when 2.6.38.1 crashed upon resume (of course after I
had removed the drives during suspend).

Is it possible that something in the ext4 driver would try to access a removed
drive that has some sort of misconfiguration (i.e. one that is fixed by an
fsck) and cause the crash, seeing as an fsck apparently made the problem
'go away' for a while? It would explain why more drives increase the chances
of a crash.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (45 preceding siblings ...)
  2011-03-24 23:06 ` bugzilla-daemon
@ 2011-03-31  0:44 ` bugzilla-daemon
  2011-03-31  0:49 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-31  0:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #51 from rocko <rockorequin@hotmail.com>  2011-03-31 00:44:24 ---
I've had a report from another user who is seeing the same problem with
2.6.36-generic, which means it isn't confined to my three PCs.

Perhaps this means we'll be seeing more people experiencing it once Ubuntu
11.04 comes out using kernel 2.6.38...


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (46 preceding siblings ...)
  2011-03-31  0:44 ` bugzilla-daemon
@ 2011-03-31  0:49 ` bugzilla-daemon
  2011-04-01 15:09 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-03-31  0:49 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #52 from rocko <rockorequin@hotmail.com>  2011-03-31 00:49:40 ---
Some quite relevant information is that the other user has reported the
external drive he unplugs during suspend mode is an ext3 drive, not an ext4
drive.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (47 preceding siblings ...)
  2011-03-31  0:49 ` bugzilla-daemon
@ 2011-04-01 15:09 ` bugzilla-daemon
  2011-04-01 23:58 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-01 15:09 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #53 from Theodore Tso <tytso@mit.edu>  2011-04-01 15:09:03 ---
Folks who are experiencing this problem might want to try 2.6.38.2.   There was
a fix that was committed to mainline and backported to the stable kernels that
may fix this problem:

commit 95f28604a65b1c40b6c6cd95e58439cd7ded3add
Author: Jens Axboe <jaxboe@fusionio.com>
Date:   Thu Mar 17 11:13:12 2011 +0100

    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away

    We don't have proper reference counting for this yet, so we run into
    cases where the device is pulled and we OOPS on flushing the fs data.
    This happens even though the dirty inodes have already been
    migrated to the default_backing_dev_info.

    Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

Sorry for not responding to this bug sooner, but I've been crazy busy in the
last couple of weeks; troubleshooting and discussion was taking place on LKML,
and I was pretty sure this wasn't an ext4 specific issue.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (48 preceding siblings ...)
  2011-04-01 15:09 ` bugzilla-daemon
@ 2011-04-01 23:58 ` bugzilla-daemon
  2011-04-04  2:32 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-01 23:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #54 from rocko <rockorequin@hotmail.com>  2011-04-01 23:57:59 ---
That patch certainly sounds like it's on the right track for fixing this bug,
but, alas, I just reproduced the oops in 2.6.38.2 on my fifth attempt. Perhaps
there are still other issues along this line?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (49 preceding siblings ...)
  2011-04-01 23:58 ` bugzilla-daemon
@ 2011-04-04  2:32 ` bugzilla-daemon
  2011-04-04 14:11 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-04  2:32 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #55 from rocko <rockorequin@hotmail.com>  2011-04-04 02:32:00 ---
The bug is still present in 2.6.39-rc1.

Is it possible that the memory corruption is caused by changes in the locking
code? Wasn't 2.6.36 where the BKL was removed from the usb subsystem?


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (50 preceding siblings ...)
  2011-04-04  2:32 ` bugzilla-daemon
@ 2011-04-04 14:11 ` bugzilla-daemon
  2011-04-05 15:17 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-04 14:11 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #56 from Alan Stern <stern@rowland.harvard.edu>  2011-04-04 14:11:52 ---
The USB stack never really used the BKL for very much, and the mass-storage
pathways didn't use it at all.  Besides, the problem you're seeing doesn't
involve USB directly.  I'm pretty confident that it would occur with any
hot-unpluggable drive.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (51 preceding siblings ...)
  2011-04-04 14:11 ` bugzilla-daemon
@ 2011-04-05 15:17 ` bugzilla-daemon
  2011-04-06 23:01 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-05 15:17 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #57 from rocko <rockorequin@hotmail.com>  2011-04-05 15:17:02 ---
Ok. I only mentioned it because by chance I saw
https://lkml.org/lkml/2011/4/3/70, which mentioned a patch to usbhid/hiddev.c
at
http://git.kernel.org/?p=linux/kernel/git/jikos/hid.git;a=commitdiff;h=9c9e54a8df0be48aa359744f412377cc55c3b7d2.
The patch says "This obviously lead to memory corruptions at device disconnect
time", which of course looks a lot like the issue I'm getting, so I thought
perhaps there might have been other problems introduced when the BKL was
removed.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (52 preceding siblings ...)
  2011-04-05 15:17 ` bugzilla-daemon
@ 2011-04-06 23:01 ` bugzilla-daemon
  2011-04-22  6:00 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-06 23:01 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #58 from rocko <rockorequin@hotmail.com>  2011-04-06 23:00:52 ---
I reproduced the problem by removing/inserting an SD card in the SD card reader
(SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev
22)), but only if the card has an ext4 file system on it. I couldn't make it
happen at all with a vfat file system on it - I tried 30 times or so with the
vfat card, but it happened on just the third removal of the ext4 file system.

So it does seem to be an ext3/4 issue rather than a USB-related issue.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (53 preceding siblings ...)
  2011-04-06 23:01 ` bugzilla-daemon
@ 2011-04-22  6:00 ` bugzilla-daemon
  2011-04-22 10:13 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22  6:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #59 from rocko <rockorequin@hotmail.com>  2011-04-22 06:00:52 ---
This is still an issue in both 2.6.38.3 and 2.6.39-rc4. Fairly frequently,
complete and catastrophic failure of the kernel occurs when I remove an ext3 or
ext4 USB drive.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (54 preceding siblings ...)
  2011-04-22  6:00 ` bugzilla-daemon
@ 2011-04-22 10:13 ` bugzilla-daemon
  2011-04-22 10:37 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22 10:13 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #60 from Theodore Tso <tytso@mit.edu>  2011-04-22 10:13:05 ---
One of the problems which has made this hard to debug is (a) the kernel
oops/panic messages are inconsistent (they're not all the same, and some of
them don't have anything file system related at all), and (b) many of them are
incomplete.

Note that we need more than just the stack traces, too.  We also need the
beginning of the oops message, complete with the IP/RIP information. 

At least one of these systems looks like it's using some kind of network
console?   Can you set up a simple serial console, and then do some experiments
where you yank out the USB drive, and tell us whether it happens reliably 100%
of the time?   50% of the time?   10% of the time?    If you run sync first and
the system is idle, does that make it more or less likely to happen?   If you
then start writing to the now-disappeared file system with a single command,
does it crash right away?  Does it crash 30 seconds later?

I've done a simple experiment where I've mounted a USB stick, written to it,
typed sync, but then without typing umount, yanked the stick out, and then
tried writing to the file system.  It didn't crash for me.   So some explicit
instructions of what you can do that causes reliable crashes would be very
useful, and then if you can set up a serial console so we can get complete and
reliable crash logs, that would also be very useful.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (55 preceding siblings ...)
  2011-04-22 10:13 ` bugzilla-daemon
@ 2011-04-22 10:37 ` bugzilla-daemon
  2011-04-22 11:58 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22 10:37 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #61 from rocko <rockorequin@hotmail.com>  2011-04-22 10:37:37 ---
I can reliably reproduce it with the following, running in Ubuntu (11.04 at the
moment, but 10.10 works as well):

1. I insert a USB key formatted with ext4.

2. I run this simple script, passing the id that shows up in
/sys/bus/usb/drivers/usb-storage for the USB key as the argument, eg 2-2:1.0:

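#!/bin/bash
# Repeatedly unbind and rebind a usb-storage interface to simulate unplugging
# and replugging the drive. Must be run as root.
# Argument: the interface id, eg 2-2:1.0
SLEEP=3

# Print the name only if the sysfs entry exists (the glob may match nothing).
list() {
  if [ -e "$1" ]; then
    echo "$1"
  fi
}

cd /sys/bus/usb/drivers/usb-storage || exit 1
if [ -z "$1" ] || [ -n "$2" ]; then
  echo "Usage: $0 <usb device>"
  echo "With one of these: "
  for file in 1* ; do list "$file"; done
  for file in 2* ; do list "$file"; done
  exit 1
fi

# Raise the console log level so that all kernel messages are shown.
echo 8 > /proc/sys/kernel/printk
count=1
while true; do
  echo -n "$1" >unbind
  echo "$count"
  sleep "$SLEEP"
  echo -n "$1" >bind
  sleep "$SLEEP"
  ((count=count+1))
done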

I'm making it sleep 3 seconds between unbind/bind events because that gives
Ubuntu plenty of time to mount the drive after the bind.

When it oopses, it typically oopses within a few seconds of the unbind event
(I'd say in less than 10 seconds and often immediately). For me it normally
will oops inside 10 bind/unbinds, but 2.6.39-rc4 took more like 20 to 30 goes
when I tried it this morning.

I'll try again sending the output to another PC, but I'm pretty sure last time
I tried that the ssh session crashed before it could dump the stack trace, just
as the tty console usually crashes before it finishes dumping.

When I get time I'll try some variations as well like making it sync and sleep
before the unbind.


* [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (56 preceding siblings ...)
  2011-04-22 10:37 ` bugzilla-daemon
@ 2011-04-22 11:58 ` bugzilla-daemon
  2011-04-22 13:42 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed bugzilla-daemon
                   ` (31 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22 11:58 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #62 from rocko <rockorequin@hotmail.com>  2011-04-22 11:58:14 ---
So I added in a "sync && sleep 3" before the unbind statement and it crashed on
the very first unbind - before it even got to echo $count. And rebooting took
five minutes while all the inbuilt drives fsck'd themselves.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (57 preceding siblings ...)
  2011-04-22 11:58 ` bugzilla-daemon
@ 2011-04-22 13:42 ` bugzilla-daemon
  2011-04-22 15:00 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22 13:42 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832


rocko <rockorequin@hotmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|kernel crashes upon resume  |kernel crashes when a
                   |if usb devices are removed  |mounted ext3/4 file system
                   |when suspended              |is physically removed




--- Comment #63 from rocko <rockorequin@hotmail.com>  2011-04-22 13:41:45 ---
2.6.39-rc4 is either _much_ harder to crash, or my script isn't as reliable at
crashing the kernel as I thought (until now I've mostly used the suspend/resume
method with multiple drives attached). I've now done over 200 bind/unbind
cycles of this external ext4 USB key without a crash. But I certainly did crash
it once earlier today.

An observation from earlier that might be relevant here: a couple of weeks ago
one of my drives got itself into a state that made it crash the kernel almost
every time I unplugged it, but after I did an fsck on it it became
significantly less likely to cause the crash. And after my last reboot there
was a lot of fsck'ing going on, probably including the external drive.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (58 preceding siblings ...)
  2011-04-22 13:42 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed bugzilla-daemon
@ 2011-04-22 15:00 ` bugzilla-daemon
  2011-04-23  0:32 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-22 15:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #64 from Theodore Tso <tytso@mit.edu>  2011-04-22 15:00:15 ---
I'm going to guess that your script depends on your desktop trying to access
(and possibly write to) your USB stick at the moment your script causes the
unbind to happen?

A more useful test case would be one that works even if no desktop is running
(i.e., you're logged in via SSH, or the VT console, or the serial console), and
the script contains all of the commands which are accessing the USB storage
device.  Otherwise, it might be dependent on what desktop you are running, and
someone who is using fvwm or KDE (for example) might not be able to reproduce
it if you happened to create the test case while using the GNOME desktop (and
then you have to answer the question of which version of the GNOME desktop,
and what desktop packages you might have installed, etc.)

I very much doubt it has to do with when the file system was fsck'ed.  The real
question is what specific I/O pattern happened to be going on at the time when
the USB stick was yanked out.  And that might explain why I don't see it,
because normally I'm not crazy enough to yank out a device while it's actively
being accessed.  (And I don't like desktops that initiate a lot of I/O behind my
back.... since that generally means it's doing this at times when I might not
like it, such as when I'm running on battery and am trying to conserve
power....)
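
To make that concrete, here is a hedged sketch of what a desktop-independent
variant of the bind/unbind test could look like.  Nothing in it comes from the
reporter's setup: the function names, the 16 MB write and the timings are all
assumptions, it expects no automounter to be running, and it has to be run as
root.

```shell
DRV=/sys/bus/usb/drivers/usb-storage

# Accept interface ids of the form used in comment #61, eg 2-2:1.0
# (valid_intf_id is a hypothetical helper).
valid_intf_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+-[0-9]+(\.[0-9]+)*:[0-9]+\.[0-9]+$'
}

# One cycle: mount, start a write, unbind mid-write, clean up, rebind.
# usage: run_cycle <intf-id> <block-dev> <mountpoint>
run_cycle() {
  valid_intf_id "$1" || { echo "bad interface id: $1" >&2; return 1; }
  mount -t ext4 "$2" "$3" || return 1
  # Generate the I/O ourselves instead of relying on a desktop.
  dd if=/dev/zero of="$3/testfile" bs=1M count=16 2>/dev/null &
  sleep 1
  printf '%s' "$1" > "$DRV/unbind"   # simulated unplug while data is in flight
  wait                               # dd is expected to end with an I/O error
  umount -l "$3"                     # lazy unmount of the now-dead file system
  sleep 3
  printf '%s' "$1" > "$DRV/bind"
  sleep 3
}
```

Note that the block device may come back under a different name after the
rebind, so a loop around run_cycle would have to look the device up again each
time.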


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (59 preceding siblings ...)
  2011-04-22 15:00 ` bugzilla-daemon
@ 2011-04-23  0:32 ` bugzilla-daemon
  2011-04-23  4:12 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-23  0:32 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #65 from rocko <rockorequin@hotmail.com>  2011-04-23 00:32:32 ---
Yes, the crash only happens when I'm running a desktop (gdm in this case),
partly because this is what handles the auto-mounting of the USB drives. I
suppose we *could* tell people to just not use a desktop at all :)

I don't think users deliberately yank out mounted USB drives. I think the most
likely real-world scenarios that trigger this crash are (1) suspend, remove
drive, resume [ie how I first noticed this], (2) remove the wrong drive by
accident, (3) a power failure makes the drive suddenly go offline [I've seen
that, too].

Anyway, my test crash case using the script above that was working so reliably
for this ext4 USB key is no longer crashing the kernel in either 2.6.38.3 or
2.6.39-rc4 (I've done over a thousand bind/unbind cycles for each now). My
guess is that the suspend/resume test results in a higher likelihood of I/O
(especially if multiple drives are involved) and is therefore more likely to
trigger the bug.

I'm also still curious whether it's possible for the ext3/4 drive to somehow
get its format into a state that causes this to happen, given that yesterday it
crashed so reliably but not today. If so, couldn't it be possible that such a
state could be corrected by a fsck and therefore reduce the chances of this
mystery I/O pattern happening?


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (60 preceding siblings ...)
  2011-04-23  0:32 ` bugzilla-daemon
@ 2011-04-23  4:12 ` bugzilla-daemon
  2011-04-23 19:31 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-23  4:12 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #66 from rocko <rockorequin@hotmail.com>  2011-04-23 04:12:30 ---
FYI, I reproduced it on the second attempt at suspend/resume with 2.6.38.4. I
was slightly hopeful it might be fixed as some of the patches seemed to be
addressing this kind of situation.

I had a thought about the consistency of the logs: I think the ones related to
this bug might be consistent, it's just that I've posted logs that might be due
to different bugs entirely. The ntfs/fuse one turned out to be another bug, for
instance. It's possible the logs from 2.6.36-rc1 were caused by a number of
other bugs, rc1 being in all likelihood the least robust of any kernel release.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (61 preceding siblings ...)
  2011-04-23  4:12 ` bugzilla-daemon
@ 2011-04-23 19:31 ` bugzilla-daemon
  2011-04-24  1:35 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-23 19:31 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #67 from Theodore Tso <tytso@mit.edu>  2011-04-23 19:31:45 ---
Just because *I* don't like desktops that initiate I/O at random times when I
don't request it doesn't mean that other users shouldn't use them.  It's just
that if we're talking about making a reliable test case, it's much better if it
doesn't depend on random I/O initiated by a desktop.  The test case should do
whatever I/O is needed, so that it is completely reproducible, even by people
who don't necessarily use the same desktop as you.

Thinking about this some more, *very* recently (as in the most recent merge
window) there have been some changes to avoid deadlock in ext3/4 when freezing
and unfreezing file systems for snapshots, and that code path is also used on
suspend/resume.  Those changes came in way after 2.6.36-rc1, so yes, if they
are also causing some issues with 2.6.39-rc2+ systems, it's very likely that
there are different bugs involved.   Which is why I insist on getting full and
accurate OOPS logs, so we can see if they are different crashes that happen to
have apparently the same symptoms caused by the same event (i.e. USB keys
getting rudely yanked out of the system).

By the way, for years and years and years USB disks just didn't work at all
across suspend/resumes.  Which is why I have scripts in my suspend/resume
framework to automatically unmount removable disks at suspend-time....


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (62 preceding siblings ...)
  2011-04-23 19:31 ` bugzilla-daemon
@ 2011-04-24  1:35 ` bugzilla-daemon
  2011-04-25  0:36 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-24  1:35 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #68 from rocko <rockorequin@hotmail.com>  2011-04-24 01:35:15 ---
Created an attachment (id=55282)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=55282)
oops on USB unbind - unable to handle kernel paging request

> Just because *I* don't like desktops that initiate I/O at random times
> when I don't request doesn't mean that other users shouldn't use it.

Honest, I wasn't having a go, I meant it tongue-in-cheek!

I have managed to reproduce the crash on a VM and log the output via
netconsole. An important thing to note is that it made no difference when my
script was set to call sync just prior to the unbind. (In fact, it crashed on
the very first unbind when I did this.)

The VM was doing very little: I booted into the desktop, ran gnome-terminal,
ran the modprobe command to load netconsole, and then ran the unbind/rebind
script. The first crash happened on the fourth unbind.

I've attached the resulting log, which is for an 'unable to handle kernel
paging request' - hopefully it's sufficiently complete, but it doesn't look
much longer than some of the ones I captured manually earlier. Note that I
wasn't doing suspend/resume, just running the unbind/rebind script, and this
one is without sync being called.
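
For anyone wanting to try the same thing, the unbind/rebind loop can be
sketched roughly as below. This is a minimal sketch and not the actual script
used here: the function name `rebind_cycle`, its parameters, and the interface
ID `1-1:1.0` are assumptions for illustration. It relies only on the standard
sysfs `bind` and `unbind` driver attributes (normally found under
/sys/bus/usb/drivers/usb-storage), and it takes the driver directory as an
argument so it can be dry-run against an ordinary directory first.

```python
import os
import time
from pathlib import Path


def rebind_cycle(driver_dir, interface_id, cycles=1, sync_first=False, delay=0.0):
    """Unbind and rebind one USB interface `cycles` times.

    driver_dir   -- directory holding the driver's `bind`/`unbind` attributes
    interface_id -- USB interface name, e.g. "1-1:1.0" (hypothetical here)
    sync_first   -- call sync before each unbind, as in the "with sync" runs
    Returns the ordered list of (action, interface_id) writes performed.
    """
    driver_dir = Path(driver_dir)
    performed = []
    for _ in range(cycles):
        if sync_first:
            os.sync()  # flush dirty data before the device goes away
        for action in ("unbind", "bind"):
            # The driver attributes take the interface ID as plain text.
            (driver_dir / action).write_text(interface_id)
            performed.append((action, interface_id))
            time.sleep(delay)  # give the kernel time to settle between steps
    return performed
```

On a real system this would need root and a call along the lines of
`rebind_cycle("/sys/bus/usb/drivers/usb-storage", "1-1:1.0", cycles=10,
delay=5.0)`; the delay value is a guess and would need tuning.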


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (63 preceding siblings ...)
  2011-04-24  1:35 ` bugzilla-daemon
@ 2011-04-25  0:36 ` bugzilla-daemon
  2011-04-25  0:37 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-25  0:36 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #69 from rocko <rockorequin@hotmail.com>  2011-04-25 00:36:39 ---
Created an attachment (id=55312)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=55312)
oops log for null pointer dereference sync + unbind


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (64 preceding siblings ...)
  2011-04-25  0:36 ` bugzilla-daemon
@ 2011-04-25  0:37 ` bugzilla-daemon
  2011-04-25  0:39 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-25  0:37 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #70 from rocko <rockorequin@hotmail.com>  2011-04-25 00:37:51 ---
Created an attachment (id=55322)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=55322)
another null pointer dereference


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (65 preceding siblings ...)
  2011-04-25  0:37 ` bugzilla-daemon
@ 2011-04-25  0:39 ` bugzilla-daemon
  2011-04-25 20:28 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-25  0:39 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #71 from rocko <rockorequin@hotmail.com>  2011-04-25 00:39:14 ---
Created an attachment (id=55332)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=55332)
and another null pointer dereference

Do these logs have enough information to continue?


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (66 preceding siblings ...)
  2011-04-25  0:39 ` bugzilla-daemon
@ 2011-04-25 20:28 ` bugzilla-daemon
  2011-04-26  0:28 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-25 20:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #72 from Theodore Tso <tytso@mit.edu>  2011-04-25 20:28:11 ---
The thing is, given the stack trace and the fact that it's caused by a null
pointer in __next_timer_interrupt, I'm highly dubious this has anything to do
with ext4.   You could put in a printk in fs/ext4/super.c:ext4_put_super() and
make sure it's not triggered, since that's the only place where we mess with
timers at all --- but del_timer() properly disables interrupts before mucking
with the pointer, so I'm not convinced it was caused by ext4.  (Also, the ext4
timer is only present if the file system has reported any errors, and it only
fires once every 24 hours, so it's highly unlikely it would be on a timer
bucket that the next timer interrupt would trip against right afterwards.  So I
very much doubt it's caused by the ext4 error reporting timer.)

And if you're seeing this on ext3, which doesn't use a timer at all, then it's
definitely not the fault of the file system layer, but probably something in
the usb block device driver...

In all of the stack traces, there's a SCSI disk attach going on right before
the crash:

[ 1255.355192] scsi4 : usb-storage 1-1:1.0
[ 1256.387085] scsi 4:0:0:0: Direct-Access     JetFlash TS2GJF110        0.00 PQ: 0 ANSI: 2
[ 1256.409758] sd 4:0:0:0: Attached scsi generic sg2 type 0
[ 1256.425575] sd 4:0:0:0: [sdb] 4063232 512-byte logical blocks: (2.08 GB/1.93 GiB)
[ 1256.434955] sd 4:0:0:0: [sdb] Write Protect is off
[ 1256.435172] sd 4:0:0:0: [sdb] Mode Sense: 00 00 00 00
[ 1256.447520] sd 4:0:0:0: [sdb] Asking for cache data failed
[ 1256.448031] sd 4:0:0:0: [sdb] Assuming drive cache: write through
[ 1256.484174] sd 4:0:0:0: [sdb] Asking for cache data failed
[ 1256.484174] sd 4:0:0:0: [sdb] Assuming drive cache: write through
[ 1256.493409]  sdb: sdb1
[ 1256.540082] sd 4:0:0:0: [sdb] Asking for cache data failed
[ 1256.540083] sd 4:0:0:0: [sdb] Assuming drive cache: write through
[ 1256.540083] sd 4:0:0:0: [sdb] Attached SCSI removable disk
[ 1257.566980] usb-storage 1-1:1.0: Quirks match for vid 0457 pid 0150: 80
[ 1257.566983] scsi5 : usb-storage 1-1:1.0
[ 1258.400641] BUG: unable to handle kernel NULL pointer dereference at   (null)
[ 1258.400818] IP: [<c105ccf8>] __next_timer_interrupt+0xa8/0x160

Was sdb the device that had been just yanked out?  Or was this some other SCSI
device?
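
To check the same pattern across several captured logs, something like the
following could be used. It is only a sketch under assumptions: the function
name `last_attach_before_bug` and the regular expressions are mine, written
against the dmesg-style lines quoted above (one message per line, with a
leading `[ seconds.micros]` stamp). It reports which sd device was attached
most recently before the first `BUG:` line, and how many seconds earlier.

```python
import re

# "[ 1256.540083] message text" -> (timestamp, message)
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# "sd 4:0:0:0: [sdb] Attached SCSI removable disk" -> device name
ATTACH = re.compile(r"^sd \S+ \[(\w+)\] Attached SCSI")


def last_attach_before_bug(lines):
    """Return (device, seconds_before_bug) for the first BUG: line, or None.

    None is returned when there is no BUG: line, or when no SCSI disk
    attach was logged before it.
    """
    last = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip wrapped or unstamped lines
        when, msg = float(m.group(1)), m.group(2)
        attach = ATTACH.match(msg)
        if attach:
            last = (attach.group(1), when)
        elif msg.startswith("BUG:"):
            if last is None:
                return None
            return last[0], when - last[1]
    return None
```

Lines that were wrapped by the mailer would need to be rejoined first, since
the sketch treats every unstamped line as noise.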


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (67 preceding siblings ...)
  2011-04-25 20:28 ` bugzilla-daemon
@ 2011-04-26  0:28 ` bugzilla-daemon
  2011-04-26  0:44 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26  0:28 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #73 from rocko <rockorequin@hotmail.com>  2011-04-26 00:27:38 ---
Yes, sdb is in this case the external USB key that has just been inserted (or
bound by the script).


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (68 preceding siblings ...)
  2011-04-26  0:28 ` bugzilla-daemon
@ 2011-04-26  0:44 ` bugzilla-daemon
  2011-04-26  1:22 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26  0:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #74 from Theodore Tso <tytso@mit.edu>  2011-04-26 00:44:42 ---
So have you confirmed that if you're not running a desktop, the system doesn't
crash?

Part of the problem is I have absolutely *no* idea what the desktop was doing
at the time of the crash.   The stack trace is completely useless.....


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (69 preceding siblings ...)
  2011-04-26  0:44 ` bugzilla-daemon
@ 2011-04-26  1:22 ` bugzilla-daemon
  2011-04-26  3:29 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26  1:22 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #75 from rocko <rockorequin@hotmail.com>  2011-04-26 01:22:52 ---
Created an attachment (id=55522)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=55522)
null pointer dereference hit while running without a desktop

I just now managed to crash the kernel *without* a desktop running. Running the
VM in recovery mode (ie console with no desktop), I tried two ways:

a) I modified the script to mount the device after binding it. With no other
modifications, the kernel did not crash with around 50 bind/mount/unbind
attempts (which is not conclusive but seems a reasonable number of tests to
try). Note that with this setup, the device kept getting a new drive letter on
each bind, ie /dev/sdb, /dev/sdc, /dev/sdd, etc, whereas with a desktop running
it is assigned each time to /dev/sdb.

b) I modified the script to umount the device immediately after the subsequent
unbind, ie the process is bind, mount on /tmp/usb, unbind, umount /tmp/usb. It
crashed with the null pointer dereference first time (log attached just in
case).

So the umount might be the key to the issue. I should think the desktop's
auto-mounting code would also be trying to umount devices once it realises
they're no longer there.
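
The difference between the two variants is just the final step, which the
sketch below spells out as a dry-run plan. It is an illustration under
assumptions rather than the script itself: the function name `cycle_steps`,
the device node `/dev/sdb1`, and the interface ID `1-1:1.0` are hypothetical,
and nothing is executed; the function only returns the ordered steps.

```python
DRIVER_DIR = "/sys/bus/usb/drivers/usb-storage"


def cycle_steps(interface_id, device, mountpoint="/tmp/usb",
                umount_after_unbind=True):
    """Return the ordered steps for one test cycle, without running them.

    Each step is either ("write", sysfs_path, text) or ("run", argv).
    umount_after_unbind=False corresponds to variant (a) above,
    umount_after_unbind=True to variant (b), the one that crashed.
    """
    steps = [
        ("write", f"{DRIVER_DIR}/bind", interface_id),
        # A real script would have to wait for the device node to appear here.
        ("run", ["mount", device, mountpoint]),
        ("write", f"{DRIVER_DIR}/unbind", interface_id),
    ]
    if umount_after_unbind:
        # The file system is unmounted only after its device is already gone.
        steps.append(("run", ["umount", mountpoint]))
    return steps


if __name__ == "__main__":
    for step in cycle_steps("1-1:1.0", "/dev/sdb1"):
        print(step)
```

Keeping the plan separate from execution makes it easy to confirm the exact
ordering being tested before running it as root on a machine that may crash.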


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (70 preceding siblings ...)
  2011-04-26  1:22 ` bugzilla-daemon
@ 2011-04-26  3:29 ` bugzilla-daemon
  2011-04-26  4:02 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26  3:29 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #76 from Theodore Tso <tytso@mit.edu>  2011-04-26 03:28:39 ---
So for this latest result, which file systems have you been using?  ext3? ext4?
ntfs? vfat?


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (71 preceding siblings ...)
  2011-04-26  3:29 ` bugzilla-daemon
@ 2011-04-26  4:02 ` bugzilla-daemon
  2011-04-26 18:15 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26  4:02 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #77 from rocko <rockorequin@hotmail.com>  2011-04-26 04:02:41 ---
This is with ext4 on both the root file system and the external USB drive.

AFAIK, I have only ever reproduced the crash on ext4, but I haven't done much
experimentation with ext3 as I don't use it. Another user told me he had the
same issue - oops on resume after removing drive - and he was using an ext3
drive. I did try reproducing the crash with vfat and ntfs but couldn't make the
kernel crash.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (72 preceding siblings ...)
  2011-04-26  4:02 ` bugzilla-daemon
@ 2011-04-26 18:15 ` bugzilla-daemon
  2011-05-03  2:19 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-04-26 18:15 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #78 from Alan Stern <stern@rowland.harvard.edu>  2011-04-26 18:15:00 ---
I can report triggering a similar problem -- once.  I attached a USB drive with
an ext4 filesystem, mounted it, read a file from it, unbound usb-storage, and
then unmounted it.  No desktop was running at the time.  About a second later I
got a nasty crash -- an unending stream of log messages scrolling up the
console, with no way to stop it other than powering off the machine.

So then I rebuilt the test kernel to add netconsole support, and I have not
been able to trigger the problem since.

It wouldn't be at all surprising to find out that this bug lies not in the
filesystem code but somewhere lower, such as the block layer or even the SCSI
core.  It seems to have a large random component as well as a delayed impact.
Rocko is able to trigger it a lot more reproducibly.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (73 preceding siblings ...)
  2011-04-26 18:15 ` bugzilla-daemon
@ 2011-05-03  2:19 ` bugzilla-daemon
  2011-05-04  7:36 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-05-03  2:19 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #79 from rocko <rockorequin@hotmail.com>  2011-05-03 02:19:42 ---
Created an attachment (id=56272)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=56272)
oops log for null pointer dereference 2.6.39-rc5

Still present in 2.6.39-rc5 (log attached).

Yes, I can readily reproduce this bug in a VM with my current ext4 USB test key
- the attached log is from a fresh install of Ubuntu 11.04 amd64 in VirtualBox
4.0.6, with the kernel upgraded to 2.6.39-rc5.

FWIW, 2.6.39-rc5 behaved a bit differently from 2.6.38: the system reported
issues with e.g. /dev/sdb1 and I started seeing multiple mountpoints in the
desktop that I couldn't unmount (/media/disk, /media/disk_, /media/disk__).


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (74 preceding siblings ...)
  2011-05-03  2:19 ` bugzilla-daemon
@ 2011-05-04  7:36 ` bugzilla-daemon
  2011-05-10 23:27 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-05-04  7:36 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #80 from rocko <rockorequin@hotmail.com>  2011-05-04 07:36:48 ---
The null pointer dereference still happens in 2.6.39-rc6.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (75 preceding siblings ...)
  2011-05-04  7:36 ` bugzilla-daemon
@ 2011-05-10 23:27 ` bugzilla-daemon
  2011-05-26  6:44 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-05-10 23:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #81 from rocko <rockorequin@hotmail.com>  2011-05-10 23:27:56 ---
Still present in 2.6.39-rc7.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (76 preceding siblings ...)
  2011-05-10 23:27 ` bugzilla-daemon
@ 2011-05-26  6:44 ` bugzilla-daemon
  2011-05-26 14:27 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-05-26  6:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #82 from rocko <rockorequin@hotmail.com>  2011-05-26 06:44:12 ---
I just hit this again in 2.6.39. I was trying to delete a partition from an
external drive, something went wrong and for some reason the kernel dumped all
the other attached USB drives, whereupon the entire machine crashed with an
"unable to handle paging" then "scheduling while atomic" oops. It's a really
annoying bug.


This is what was recorded in the system log (it actually got written this
time):

May 26 14:28:02 hercules kernel: [128626.171297] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:36 hercules kernel: [128720.267229] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:36 hercules kernel: [128720.353640] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:36 hercules kernel: [128720.380617] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:41 hercules kernel: [128725.194501] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:41 hercules kernel: [128725.222284] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:46 hercules kernel: [128729.766858] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:46 hercules kernel: [128729.801933] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:53 hercules kernel: [128736.817629] xhci_hcd 0000:04:00.0: WARN: Stalled endpoint
May 26 14:29:53 hercules kernel: [128736.817863] xhci_hcd 0000:04:00.0: ERROR no room on ep ring
May 26 14:29:53 hercules kernel: [128736.817873] xhci_hcd 0000:04:00.0: ERR: No room for command on command ring
May 26 14:29:53 hercules kernel: [128736.817882] xhci_hcd 0000:04:00.0: FIXME allocate a new ring segment
May 26 14:30:00 hercules kernel: [128743.863985] xhci_hcd 0000:04:00.0: ERROR no room on ep ring
May 26 14:30:00 hercules kernel: [128743.863998] xhci_hcd 0000:04:00.0: ERR: No room for command on command ring
May 26 14:30:05 hercules kernel: [128748.872230] xhci_hcd 0000:04:00.0: xHCI host not responding to stop endpoint command.
May 26 14:30:05 hercules kernel: [128748.872240] xhci_hcd 0000:04:00.0: Assuming host is dying, halting host.
May 26 14:30:05 hercules kernel: [128748.878855] xhci_hcd 0000:04:00.0: HC died; cleaning up
May 26 14:30:05 hercules kernel: [128748.878960] usb 3-1: USB disconnect, device number 12
May 26 14:30:05 hercules kernel: [128748.878971] usb 3-1.1: USB disconnect, device number 14
May 26 14:30:05 hercules kernel: [128748.879684] sd 18:0:0:0: Device offlined - not ready after error recovery
May 26 14:30:05 hercules kernel: [128748.879717] sd 18:0:0:0: rejecting I/O to offline device
May 26 14:30:05 hercules kernel: [128748.879786] sd 18:0:0:0: [sdf] Unhandled error code
May 26 14:30:05 hercules kernel: [128748.879793] sd 18:0:0:0: [sdf]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 26 14:30:05 hercules kernel: [128748.879804] sd 18:0:0:0: [sdf] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
May 26 14:30:05 hercules kernel: [128748.879830] end_request: I/O error, dev sdf, sector 0
May 26 14:30:05 hercules kernel: [128748.879840] Buffer I/O error on device sdf, logical block 0
May 26 14:30:05 hercules kernel: [128748.879915] sd 18:0:0:0: rejecting I/O to offline device
May 26 14:30:05 hercules kernel: [128748.879978] sd 18:0:0:0: rejecting I/O to offline device
May 26 14:30:05 hercules kernel: [128748.963133] usb 3-1.5: USB disconnect, device number 15
May 26 14:30:05 hercules kernel: [128749.116280] JBD2: I/O error detected when updating journal superblock for sdd1-8.
May 26 14:30:05 hercules kernel: [128749.222460] usb 3-1.6: USB disconnect, device number 13
May 26 14:30:05 hercules kernel: [128749.240211] JBD2: I/O error detected when updating journal superblock for sdb1-8.
May 26 14:30:05 hercules kernel: [128749.262545] usb 3-1.7: USB disconnect, device number 16
May 26 14:30:05 hercules kernel: [128749.269556] JBD2: I/O error detected when updating journal superblock for sde1-8.
May 26 14:30:05 hercules kernel: [128749.363281] usb 3-2: USB disconnect, device number 17
May 26 14:30:10 hercules kernel: [128754.270529] BUG: unable to handle kernel paging request at 00000001638ca0f8


The "scheduling while atomic" message was shown on the screen, with a short
stack trace ("cpuidle_idle_call / cpu_idle / start_secondary").


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (77 preceding siblings ...)
  2011-05-26  6:44 ` bugzilla-daemon
@ 2011-05-26 14:27 ` bugzilla-daemon
  2011-07-13  7:52 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-05-26 14:27 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #83 from Alan Stern <stern@rowland.harvard.edu>  2011-05-26 14:27:43 ---
This is a completely different issue from the main problem in this bug report. 
You should report it separately -- perhaps by posting it on the linux-usb
mailing list.


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (78 preceding siblings ...)
  2011-05-26 14:27 ` bugzilla-daemon
@ 2011-07-13  7:52 ` bugzilla-daemon
  2011-08-31  5:00 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-07-13  7:52 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832


Bryce Nesbitt <bryce2@obviously.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bryce2@obviously.com




--- Comment #84 from Bryce Nesbitt <bryce2@obviously.com>  2011-07-13 07:52:10 ---
The bug bit me today, after I pulled out a mounted ext3 USB drive.  A total
kernel hang.  Video still displayed.  No ping.  No mouse.  No keyboard.
Unfortunately I forgot about Magic SysRq
(http://en.wikipedia.org/wiki/Magic_SysRq_key) and did not probe.  Here's all
I got:

# vi /var/log/syslog
Jul 13 00:31:43 ubuntu kernel: [181916.094245] JBD2: I/O error detected when updating journal superblock for sde1-8.
Jul 13 00:40:49 ubuntu kernel: imklog 4.6.4, log source = /proc/kmsg started.

# uname -a
Linux ubuntu 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (79 preceding siblings ...)
  2011-07-13  7:52 ` bugzilla-daemon
@ 2011-08-31  5:00 ` bugzilla-daemon
  2011-08-31  5:07 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-08-31  5:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #85 from rocko <rockorequin@hotmail.com>  2011-08-31 05:00:32 ---
This bug is still an issue in 3.0.4 and 3.1-rc4. Sadly, this is making Linux
look rather unreliable for real-world use. :(


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (80 preceding siblings ...)
  2011-08-31  5:00 ` bugzilla-daemon
@ 2011-08-31  5:07 ` bugzilla-daemon
  2011-08-31 14:36 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-08-31  5:07 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #86 from rocko <rockorequin@hotmail.com>  2011-08-31 05:07:27 ---
@Alan: re your comment #78, how did you compile in netconsole support? I have
the following set in .config for my builds, which is the default in Ubuntu's
kernel:

CONFIG_NETCONSOLE=m
CONFIG_NETCONSOLE_DYNAMIC=y
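
For context, with CONFIG_NETCONSOLE=m the logging target is normally passed as
a module parameter at modprobe time, while a built-in netconsole takes the same
string on the kernel command line. The sketch below builds that string in the
documented `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]` form.
The helper names `netconsole_param` and `modprobe_argv` are mine, and the
addresses in any call are placeholders, not values from this report.

```python
def netconsole_param(target_ip, target_port=6666, dev="",
                     src_ip="", src_port="", target_mac=""):
    """Build the value for netconsole's `netconsole=` parameter.

    Empty fields are left blank so the kernel applies its own defaults;
    only the target IP is required.
    """
    return f"{src_port}@{src_ip}/{dev},{target_port}@{target_ip}/{target_mac}"


def modprobe_argv(param):
    """Return the argv that would load netconsole as a module with `param`."""
    return ["modprobe", "netconsole", f"netconsole={param}"]


if __name__ == "__main__":
    # 10.0.0.2 is a placeholder for the machine collecting the log.
    print(" ".join(modprobe_argv(netconsole_param("10.0.0.2"))))
```

The receiving machine then only needs something listening for UDP on the
chosen target port.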


* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (81 preceding siblings ...)
  2011-08-31  5:07 ` bugzilla-daemon
@ 2011-08-31 14:36 ` bugzilla-daemon
  2011-08-31 23:43 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-08-31 14:36 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #87 from Alan Stern <stern@rowland.harvard.edu>  2011-08-31 14:35:39 ---
I think all I had was  CONFIG_NETCONSOLE=y.

As for fixing the problem...  To be honest, you shouldn't expect it to get
fixed until somebody can identify what's causing it.  Since you seem to be one
of the very few people experiencing it regularly, the situation doesn't look
good until you can provide more information.

The best course of action is to narrow down the variables as much as possible. 
That means not running any extraneous programs (i.e., don't run a graphical
desktop, and indeed, don't run X at all).  It also means coming up with a very
repeatable scenario to trigger the problem.  Something like what you described
in comment #75 would be good.

Speaking of which, you mentioned in that comment that on each loop through the
test, the drive letter would go up by one.  That should not have happened! 
It's another indication of something strange.  Can you verify -- does this
still happen with 3.1-rc4?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (82 preceding siblings ...)
  2011-08-31 14:36 ` bugzilla-daemon
@ 2011-08-31 23:43 ` bugzilla-daemon
  2011-09-01  1:30 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-08-31 23:43 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #88 from rocko <rockorequin@hotmail.com>  2011-08-31 23:43:20 ---
Running my desktop-less usb-on-off-with-mount.sh test script in 3.1-rc4, the
drive stayed at /dev/sdb1 each time. So that seems to be fixed.

The kernel eventually crashed on a umount (not straightaway like last time).
load_balance appears in the limited stack trace that I can see in the VM.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (83 preceding siblings ...)
  2011-08-31 23:43 ` bugzilla-daemon
@ 2011-09-01  1:30 ` bugzilla-daemon
  2011-09-04  3:53 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-09-01  1:30 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832

--- Comment #89 from rocko <rockorequin@hotmail.com>  2011-09-01 01:30:13 ---
Created an attachment (id=71072)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=71072)
usbonoff-with-mount.sh - script to reproduce the crash

For the record, it is very easy to reproduce the bug with the attached script,
which repeatedly mounts, forces an eject, then umounts the USB device until the
kernel eventually crashes. The procedure is:

1. Create a VirtualBox VM running Ubuntu 11.04 64-bit with the kernel being
tested and boot the VM in recovery mode, i.e. so that no desktop is running.

2. Use the VirtualBox USB facility (right click on the USB icon in the status
bar) to attach an ext3/4 drive to the VM.

3. Run the attached script, specifying the target USB device as the argument
(the script lists possible devices if you don't supply an argument). There's a
commented-out option to redirect output to an external machine via netconsole
as well, i.e. to capture the crash log, which you can't fully see in the VM
window.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (84 preceding siblings ...)
  2011-09-01  1:30 ` bugzilla-daemon
@ 2011-09-04  3:53 ` bugzilla-daemon
  2011-09-04 13:55 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-09-04  3:53 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #90 from rocko <rockorequin@hotmail.com>  2011-09-04 03:53:16 ---
BTW, CONFIG_NETCONSOLE=y doesn't make any difference for me (it still crashes
the kernel), although I didn't really expect it to since I had it compiled in
as a module and was specifically loading it prior to running the test that
crashes the kernel.

So I have a reproducible test case (which, even better, works in a VirtualBox
VM), but the error information generated in the crash log isn't sufficient for
tracking down this problem. Where do I go from here?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (85 preceding siblings ...)
  2011-09-04  3:53 ` bugzilla-daemon
@ 2011-09-04 13:55 ` bugzilla-daemon
  2011-09-04 14:00 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-09-04 13:55 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #91 from Alan Stern <stern@rowland.harvard.edu>  2011-09-04 13:55:27 ---
Good news: I have been able to reproduce the same sort of crash regularly,
using a variant of your script.  That will make debugging a lot easier, even
though it will still be difficult.

Incidentally, is CONFIG_EXT4_USE_FOR_EXT23 set in your test kernel
configuration?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (86 preceding siblings ...)
  2011-09-04 13:55 ` bugzilla-daemon
@ 2011-09-04 14:00 ` bugzilla-daemon
  2011-09-05 17:44 ` bugzilla-daemon
  2012-07-02 13:24 ` bugzilla-daemon
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2011-09-04 14:00 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #92 from rocko <rockorequin@hotmail.com>  2011-09-04 14:00:29 ---
Hey, that is good news!

I can't find that setting at all in my config. I only have these for
CONFIG_EXT4_:

CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
# CONFIG_EXT4_DEBUG is not set

What is it supposed to do?

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (87 preceding siblings ...)
  2011-09-04 14:00 ` bugzilla-daemon
@ 2011-09-05 17:44 ` bugzilla-daemon
  2011-09-09 19:13   ` Ted Ts'o
  2012-07-02 13:24 ` bugzilla-daemon
  89 siblings, 1 reply; 111+ messages in thread
From: bugzilla-daemon @ 2011-09-05 17:44 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832





--- Comment #93 from Alan Stern <stern@rowland.harvard.edu>  2011-09-05 17:44:30 ---
It causes the ext4 driver to be used for ext2 and ext3 filesystems, instead of
using the ext2 and ext3 drivers.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-05 17:44 ` bugzilla-daemon
@ 2011-09-09 19:13   ` Ted Ts'o
  2011-09-09 22:10     ` Alan Stern
                       ` (2 more replies)
  0 siblings, 3 replies; 111+ messages in thread
From: Ted Ts'o @ 2011-09-09 19:13 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-ext4, rockorequin, stern

Bugzilla.kernel.org is down, so apologies to people who have
subscribed to this bug but whom I didn't cc explicitly...

Rocko, Alan, could you try this patch and see what happens?  It may be
that we'll crash somewhere else; the problem is that in Linux the
low-level generic hd routines don't have a formal way of informing
the VFS and layers below that the disk has disappeared.  The disk just
gets yanked out from under the file system, and we've been manually
patching around kernel crashes....

     	   	 	   	       	    - Ted

commit 6e478d46e58181ec4814f25a2fd91c6323e16ad4
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Sep 9 15:02:54 2011 -0400

    ext4: add ext4-specific kludge to avoid an oops after the disk disappears
    
    The del_gendisk() function uninitializes the disk-specific data
    structures, including the bdi structure, without telling anyone
    else.  Once this happens, any attempt to call mark_buffer_dirty()
    (for example, by ext4_commit_super), will cause a kernel OOPS.
    
    Fix this for now until we can fix things in an architecturally correct
    way.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ee2f74a..48cb615 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -414,6 +414,22 @@ static void save_error_info(struct super_block *sb, const char *func,
 	ext4_commit_super(sb, 1);
 }
 
+/*
+ * The del_gendisk() function deactivates the inode and deactivates
+ * the bdi without telling the file system.  Once this happens, any
+ * attempt to call mark_buffer_dirty() (for example, by
+ * ext4_commit_super), will cause a kernel OOPS.  This is a kludge to
+ * prevent these oops until we can put in a proper hook in
+ * del_gendisk() to inform the VFS and file system layers.
+ */
+static int block_device_ejected(struct super_block *sb)
+{
+	struct inode *bd_inode = sb->s_bdev->bd_inode;
+	struct backing_dev_info *bdi = bd_inode->i_mapping->backing_dev_info;
+
+	return bdi->dev == NULL;
+}
+
 
 /* Deal with the reporting of failure conditions on a filesystem such as
  * inconsistencies detected or read IO failures.
@@ -4072,7 +4088,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
 	struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
 	int error = 0;
 
-	if (!sbh)
+	if (!sbh || block_device_ejected(sb))
 		return error;
 	if (buffer_write_io_error(sbh)) {
 		/*
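
To make the guard added by the patch above easier to follow, here is a
minimal userspace sketch of the same idea.  It is not kernel code: the
model_* structures and functions are hypothetical stand-ins invented for
illustration, and only the "bdi->dev == NULL" test and the early return
are taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * that the guard looks at are modelled here. */
struct model_device { int id; };

struct model_bdi {
	struct model_device *dev;	/* NULL once the disk is unregistered */
};

struct model_super_block {
	struct model_bdi *bdi;
	int sb_buffer_present;		/* models a non-NULL s_sbh */
	int commits;			/* how often the superblock was "written" */
};

/* Mirrors the patch's block_device_ejected(): the device counts as gone
 * when the bdi no longer points at a registered device. */
static int model_block_device_ejected(const struct model_super_block *sb)
{
	assert(sb != NULL && sb->bdi != NULL);
	return sb->bdi->dev == NULL;
}

/* Models del_gendisk() tearing down the bdi without telling the
 * file system. */
static void model_del_gendisk(struct model_super_block *sb)
{
	sb->bdi->dev = NULL;
}

/* Models the guarded ext4_commit_super(): returns early, doing nothing,
 * when there is no superblock buffer or the device has been ejected. */
static int model_commit_super(struct model_super_block *sb)
{
	if (!sb->sb_buffer_present || model_block_device_ejected(sb))
		return 0;
	sb->commits++;	/* stands in for mark_buffer_dirty() and the write */
	return 0;
}
```

In this model a commit issued after model_del_gendisk() returns without
touching the superblock buffer, which is the behaviour the patch aims for
in ext4_commit_super().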

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-09 19:13   ` Ted Ts'o
@ 2011-09-09 22:10     ` Alan Stern
       [not found]       ` <BAY151-W6176D929049AA9E2BDBAEBA1000@phx.gbl>
  2011-09-10 18:07     ` Alan Stern
  2011-09-12  1:58     ` Alan Stern
  2 siblings, 1 reply; 111+ messages in thread
From: Alan Stern @ 2011-09-09 22:10 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: bugzilla-daemon, linux-ext4, rockorequin

On Fri, 9 Sep 2011, Ted Ts'o wrote:

> Rocko, Alan, could you try this patch and see what happens?  It may be
> that we'll crash somewhere else; the problem is that in Linux the
> low-level generic hd routines don't have a formal way of informing
> the VFS and layers below that the disk has disappeared.  The disk just
> gets yanked out from under the file system, and we've been manually
> patching around kernel crashes....

I confirm that this patch fixes the issue with my test script.  The 
unmounts occurred with no apparent problems.

However you probably should make the same change to the ext3 driver, 
because it has exactly the same issue and some people may still be 
using it.

Alan Stern


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found]       ` <BAY151-W6176D929049AA9E2BDBAEBA1000@phx.gbl>
@ 2011-09-10 14:06         ` Ted Ts'o
  0 siblings, 0 replies; 111+ messages in thread
From: Ted Ts'o @ 2011-09-10 14:06 UTC (permalink / raw)
  To: Rocko Requin; +Cc: stern, bugzilla-daemon, linux-ext4

On Sat, Sep 10, 2011 at 01:49:44AM +0000, Rocko Requin wrote:
> 
> The patch does stop my console-only test script from crashing the
> kernel (thanks for figuring this patch out, Ted!), but if I try it
> from a desktop, the desktop still freezes after two or three
> bind/unbind iterations. So I guess there must be another code path that
> tries to access the missing file system and also needs patching.

Can you get stack traces or register information?  Via sysrq-t /
sysrq-p?  This might require setting up a serial console on your desktop
if you don't have kernel mode switching set up so you can switch away
from the X server.

							- Ted

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-09 19:13   ` Ted Ts'o
  2011-09-09 22:10     ` Alan Stern
@ 2011-09-10 18:07     ` Alan Stern
  2011-09-12  1:58     ` Alan Stern
  2 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-10 18:07 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: bugzilla-daemon, linux-ext4, rockorequin

On Fri, 9 Sep 2011, Ted Ts'o wrote:

> commit 6e478d46e58181ec4814f25a2fd91c6323e16ad4
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Fri Sep 9 15:02:54 2011 -0400
> 
>     ext4: add ext4-specific kludge to avoid an oops after the disk disappears
>     
>     The del_gendisk() function uninitializes the disk-specific data
>     structures, including the bdi structure, without telling anyone
>     else.  Once this happens, any attempt to call mark_buffer_dirty()
>     (for example, by ext4_commit_super), will cause a kernel OOPS.
>     
>     Fix this for now until we can fix things in an architecturally correct
>     way.
>     
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Further testing revealed the following problem.  I changed the test 
script so that after the USB device is unbound, the script tries to 
write a file before unmounting the ext4 filesystem.

There was no drastic failure; the unregistered bdi structure wasn't
accessed.  But lockdep complained.  This is what I got:

[  166.932194] end_request: I/O error, dev uba, sector 136
[  166.940903] EXT4-fs error (device uba): ext4_find_entry:934: inode #2: comm sh: reading directory lblock 0
[  166.949284] end_request: I/O error, dev uba, sector 164
[  166.952084] EXT4-fs error (device uba): ext4_read_inode_bitmap:161: comm sh: Cannot read inode bitmap - block_group = 0, inode_bitmap = 82
[  166.952906] EXT4-fs error (device uba) in ext4_new_inode:1073: IO failure
[  166.953357] 
[  166.953381] =============================================
[  166.953624] [ INFO: possible recursive locking detected ]
[  166.953958] 3.1.0-rc4 #34
[  166.954099] ---------------------------------------------
[  166.954295] sh/819 is trying to acquire lock:
[  166.954613]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c1101290>] ext4_evict_inode+0x17/0x288
[  166.955947] 
[  166.955969] but task is already holding lock:
[  166.956281]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c10aeb45>] do_last+0x165/0x4ff
[  166.956586] 
[  166.956586] other info that might help us debug this:
[  166.956586]  Possible unsafe locking scenario:
[  166.956586] 
[  166.956586]        CPU0
[  166.956586]        ----
[  166.956586]   lock(&sb->s_type->i_mutex_key);
[  166.956586]   lock(&sb->s_type->i_mutex_key);
[  166.956586] 
[  166.956586]  *** DEADLOCK ***
[  166.956586] 
[  166.956586]  May be due to missing lock nesting notation
[  166.956586] 
[  166.956586] 2 locks held by sh/819:
[  166.956586]  #0:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c10aeb45>] do_last+0x165/0x4ff
[  166.956586]  #1:  (jbd2_handle){+.+...}, at: [<c112469f>] start_this_handle+0x3c2/0x41e
[  166.956586] 
[  166.956586] stack backtrace:
[  166.956586] Pid: 819, comm: sh Not tainted 3.1.0-rc4 #34
[  166.956586] Call Trace:
[  166.956586]  [<c135f26e>] ? printk+0xf/0x11
[  166.956586]  [<c105223c>] __lock_acquire+0x875/0xbe7
[  166.956586]  [<c1361600>] ? _raw_spin_unlock_irq+0x2d/0x30
[  166.956586]  [<c105183a>] ? mark_lock+0x26/0x1b3
[  166.956586]  [<c105183a>] ? mark_lock+0x26/0x1b3
[  166.956586]  [<c1052944>] lock_acquire+0x59/0x70
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c13601f9>] __mutex_lock_common+0x38/0x2d4
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c1360573>] mutex_lock_nested+0x32/0x3b
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c1101290>] ext4_evict_inode+0x17/0x288
[  166.956586]  [<c10b5f63>] evict+0x7b/0x11c
[  166.956586]  [<c10b6136>] iput+0x132/0x137
[  166.956586]  [<c10fc467>] ext4_new_inode+0xa53/0xa92
[  166.956586]  [<c1108942>] ? ext4_journal_start_sb+0xdd/0xec
[  166.956586]  [<c10b4afb>] ? d_splice_alias+0xa9/0xb1
[  166.956586]  [<c11045ec>] ext4_create+0xa6/0x10b
[  166.956586]  [<c10ae2d7>] vfs_create+0x61/0x7b
[  166.956586]  [<c10aebd7>] do_last+0x1f7/0x4ff
[  166.956586]  [<c10aefa1>] path_openat+0x9d/0x2b7
[  166.956586]  [<c1052636>] ? lock_release_non_nested+0x88/0x1f7
[  166.956586]  [<c10af1f3>] do_filp_open+0x21/0x5d
[  166.956586]  [<c1361666>] ? _raw_spin_unlock+0x1d/0x2a
[  166.956586]  [<c10b78b1>] ? alloc_fd+0xc0/0xcb
[  166.956586]  [<c10a4207>] do_sys_open+0x54/0xcd
[  166.956586]  [<c10a429e>] sys_open+0x1e/0x26
[  166.956586]  [<c1361820>] syscall_call+0x7/0xb
[  167.175766] end_request: I/O error, dev uba, sector 16534
[  167.177204] Aborting journal on device uba-8.
[  167.179255] end_request: I/O error, dev uba, sector 16516
[  167.179768] Buffer I/O error on device uba, logical block 8258
[  167.179983] lost page write due to I/O error on uba
[  167.180866] JBD2: I/O error detected when updating journal superblock for uba-8.
[  167.181956] journal commit I/O error
[  167.195334] EXT4-fs error (device uba): ext4_put_super:817: Couldn't clean up the journal
[  167.195777] EXT4-fs (uba): Remounting filesystem read-only

It appears to be an unrelated error, but worth looking at.

Alan Stern
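
For readers unfamiliar with the "possible recursive locking detected"
report above: lockdep tracks lock classes rather than individual lock
instances, so it complains when a task that already holds a lock of one
class acquires another lock of the same class without a nesting
annotation.  The following is a toy userspace sketch of that check, not
lockdep itself; the tracker, the class constants and the function names
are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8

/* Hypothetical class identifiers, loosely named after the report. */
enum { CLASS_I_MUTEX = 1, CLASS_JBD2_HANDLE = 2 };

struct held_entry {
	int lock_class;
	int subclass;	/* nesting annotation; 0 when none is given */
};

/* Toy per-task record of the locks currently held. */
struct held_locks {
	struct held_entry entries[MAX_HELD];
	size_t depth;
};

/* Returns 1 if a lock with the same class and subclass is already held. */
static int would_recurse(const struct held_locks *h, int lock_class,
			 int subclass)
{
	for (size_t i = 0; i < h->depth; i++)
		if (h->entries[i].lock_class == lock_class &&
		    h->entries[i].subclass == subclass)
			return 1;
	return 0;
}

/* Records an acquisition.  Returns 0 on success, or -1 (recording
 * nothing) when the acquisition would be flagged as recursive. */
static int toy_acquire(struct held_locks *h, int lock_class, int subclass)
{
	assert(h->depth < MAX_HELD);
	if (would_recurse(h, lock_class, subclass))
		return -1;
	h->entries[h->depth].lock_class = lock_class;
	h->entries[h->depth].subclass = subclass;
	h->depth++;
	return 0;
}

/* Drops the most recently acquired lock. */
static void toy_release(struct held_locks *h)
{
	assert(h->depth > 0);
	h->depth--;
}
```

In this sketch, taking CLASS_I_MUTEX, then CLASS_JBD2_HANDLE, then
CLASS_I_MUTEX again with the same subclass is flagged, while the same
acquisition with a different subclass is accepted.  That is the sense in
which the report says the warning "may be due to missing lock nesting
notation".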


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-09 19:13   ` Ted Ts'o
  2011-09-09 22:10     ` Alan Stern
  2011-09-10 18:07     ` Alan Stern
@ 2011-09-12  1:58     ` Alan Stern
  2 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-12  1:58 UTC (permalink / raw)
  To: Ted Ts'o; +Cc: linux-ext4, rockorequin

Ted:

You also ought to look at this bug report and the follow-up thread.  
The symptoms are similar, although not exactly the same.

	http://marc.info/?l=linux-kernel&m=131504588401397&w=2
	([BUG] D state process after unplug and umount usb disk)

Alan Stern


^ permalink raw reply	[flat|nested] 111+ messages in thread

* [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
                   ` (88 preceding siblings ...)
  2011-09-05 17:44 ` bugzilla-daemon
@ 2012-07-02 13:24 ` bugzilla-daemon
  89 siblings, 0 replies; 111+ messages in thread
From: bugzilla-daemon @ 2012-07-02 13:24 UTC (permalink / raw)
  To: linux-ext4

https://bugzilla.kernel.org/show_bug.cgi?id=25832


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |alan@lxorguk.ukuu.org.uk
         Resolution|                            |CODE_FIX




^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
@ 2011-09-23 15:18   ` Alan Stern
  0 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-23 15:18 UTC (permalink / raw)
  To: Rocko Requin
  Cc: hare, j-nomura, ben, jaxboe, james.bottomley, tytso,
	linux-kernel, linux-scsi

On Thu, 22 Sep 2011, Rocko Requin wrote:

> > Rocko:
> > 
> > Can you try testing this patch instead of all the patches I sent to 
> > you (but keep Ted's patch)?
> > 
> > Alan Stern
> > 
> 
> The simpler patch (in conjunction with Ted's patch) does stop the
> crashes. I get the same results as before: no kernel crashes
> (marvellous!), but the script's attempt to umount fails. I can then
> manually umount afterwards.

That sounds like a problem in the ext4 unmount implementation.  Ted
should be able to help track it down.

What happens if you change your script to try two unmounts in a row?  
In theory, the second should work like your manual unmount.

> Are these patches likely to be backported to the 3.0 kernel?

Yes, I should think so.  The ext4/ext3 patches may be ported even
farther back.

Alan Stern


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
@ 2011-09-22 16:32               ` Hannes Reinecke
  0 siblings, 0 replies; 111+ messages in thread
From: Hannes Reinecke @ 2011-09-22 16:32 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Alan Stern, Rocko Requin, Jun'ichi Nomura, Ben Hutchings,
	jaxboe, James Bottomley, tytso, Kernel development list,
	linux-scsi

On 09/22/2011 06:20 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
>> Rocko:
>>
>> Can you try testing this patch instead of all the patches I sent to 
>> you (but keep Ted's patch)?
>>
>> Alan Stern
>>
>> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>>
>>> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
>>>> On 09/19/11 08:00, Ben Hutchings wrote:
>>> [ .. ]
>>>>>
>>>>> There have been reports of this in Debian going back to 2.6.39:
>>>>>
>>>>> http://bugs.debian.org/631187
>>>>> http://bugs.debian.org/636263
>>>>> http://bugs.debian.org/642043
>>>>>
>>>>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>>>>
>>>>> http://bugs.debian.org/633890
>>>>> http://bugs.debian.org/634681
>>>>> http://bugs.debian.org/636103
>>>>>
>>>>> The former was also reported in Ubuntu since their 2.6.38-10:
>>>>>
>>>>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>>>>
>>>>> The result of the discussion there was that it appeared to be a
>>>>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
>>>>> ("[SCSI] put stricter guards on queue dead checks") which was also
>>>>> included in a stable update for 2.6.38.
>>>>>
>>>>> There was also a report on bugzilla.kernel.org, though no-one can see
>>>>> quite what that says now:
>>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>>>>
>>>>> I also reported most of the above to James Bottomley and linux-scsi
>>>>> nearly 2 months ago, to no response.
>>>>
>>>> I've reported a similar oops related to the above commit:
>>>>   [BUG] Oops when SCSI device under multipath is removed
>>>>   https://lkml.org/lkml/2011/8/10/11
>>>>
>>>> Elevator being removed is the core of the problem.
>>>> And the essential issue seems 2 different models of queue/driver relation
>>>> implied by queue_lock.
>>>>
>>>> If reverting the commit is not an option,
>>>> until somebody comes up to fix the essential issue,
>>>> the patch below should close the regressions introduced by the commit.
>>>>
>>> Why do you have to do it that complicated?
>>> Couldn't we just state that any external lock is being disconnected from
>>> queue_lock after blk_cleanup_queue()?
>>>
>>> Then something like this should suffice here:
>>
>>
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 90e1ffd..a4ac005 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>>         queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>>         mutex_unlock(&q->sysfs_lock);
>>
>> -       if (q->elevator)
>> -               elevator_exit(q->elevator);
>> -
>> -       blk_throtl_exit(q);
>> +       if (q->queue_lock != q->__queue_lock)
>> +               q->queue_lock = q->__queue_lock;
> 
> That should be &q->__queue_lock.
> 
Why, but of course.
It's been fixed in the official patch
(cf. "block: Free queue resources at blk_release_queue()").

Cheers,

Hannes
-- 
Dr. Hannes Reinecke              zSeries & Storage
hare@suse.de                  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
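
The "&" correction discussed above comes down to queue_lock being a
pointer while __queue_lock is a lock embedded in the queue structure.  A
minimal userspace sketch of that relationship follows.  It is an
illustration under assumptions, not the block layer's code: the model_*
types and functions are hypothetical, and embedded_lock plays the role of
__queue_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t. */
typedef struct { int locked; } model_spinlock_t;

struct model_request_queue {
	model_spinlock_t embedded_lock;	/* plays the role of __queue_lock */
	model_spinlock_t *queue_lock;	/* may point at a driver-owned lock */
	int dead;
};

/* A driver may hand in its own lock; otherwise the embedded one is used. */
static void model_init_queue(struct model_request_queue *q,
			     model_spinlock_t *driver_lock)
{
	assert(q != NULL);
	q->embedded_lock.locked = 0;
	q->queue_lock = driver_lock ? driver_lock : &q->embedded_lock;
	q->dead = 0;
}

/* At cleanup, detach from the driver's lock so that later users of the
 * queue never follow a pointer into memory the driver may have freed.
 * Note the address-of operator: queue_lock is a pointer, while
 * embedded_lock is the lock object itself. */
static void model_cleanup_queue(struct model_request_queue *q)
{
	assert(q != NULL);
	q->dead = 1;
	if (q->queue_lock != &q->embedded_lock)
		q->queue_lock = &q->embedded_lock;
}
```

After model_cleanup_queue() the queue refers only to its own lock, which
is the effect Hannes describes as disconnecting any external lock from
queue_lock after blk_cleanup_queue().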

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 15:16           ` Alan Stern
  (?)
@ 2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
  2011-09-22 16:32               ` Hannes Reinecke
  -1 siblings, 1 reply; 111+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-09-22 16:20 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rocko Requin, Hannes Reinecke, Jun'ichi Nomura,
	Ben Hutchings, jaxboe, James Bottomley, tytso,
	Kernel development list, linux-scsi

On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
> Rocko:
> 
> Can you try testing this patch instead of all the patches I sent to 
> you (but keep Ted's patch)?
> 
> Alan Stern
> 
> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
> 
> > On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > > On 09/19/11 08:00, Ben Hutchings wrote:
> > [ .. ]
> > >>
> > >> There have been reports of this in Debian going back to 2.6.39:
> > >>
> > >> http://bugs.debian.org/631187
> > >> http://bugs.debian.org/636263
> > >> http://bugs.debian.org/642043
> > >>
> > >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> > >>
> > >> http://bugs.debian.org/633890
> > >> http://bugs.debian.org/634681
> > >> http://bugs.debian.org/636103
> > >>
> > >> The former was also reported in Ubuntu since their 2.6.38-10:
> > >>
> > >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> > >>
> > >> The result of the discussion there was that it appeared to be a
> > >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
> > >> ("[SCSI] put stricter guards on queue dead checks") which was also
> > >> included in a stable update for 2.6.38.
> > >>
> > >> There was also a report on bugzilla.kernel.org, though no-one can see
> > >> quite what that says now:
> > >>
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> > >>
> > >> I also reported most of the above to James Bottomley and linux-scsi
> > >> nearly 2 months ago, to no response.
> > > 
> > > I've reported a similar oops related to the above commit:
> > >   [BUG] Oops when SCSI device under multipath is removed
> > >   https://lkml.org/lkml/2011/8/10/11
> > > 
> > > Elevator being removed is the core of the problem.
> > > And the essential issue seems 2 different models of queue/driver relation
> > > implied by queue_lock.
> > > 
> > > If reverting the commit is not an option,
> > > until somebody comes up to fix the essential issue,
> > > the patch below should close the regressions introduced by the commit.
> > > 
> > Why do you have to do it that complicated?
> > Couldn't we just state that any external lock is being disconnected from
> > queue_lock after blk_cleanup_queue()?
> > 
> > Then something like this should suffice here:
> 
> 
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>         queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>         mutex_unlock(&q->sysfs_lock);
> 
> -       if (q->elevator)
> -               elevator_exit(q->elevator);
> -
> -       blk_throtl_exit(q);
> +       if (q->queue_lock != q->__queue_lock)
> +               q->queue_lock = q->__queue_lock;

That should be &q->__queue_lock.
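
To illustrate the correction: in struct request_queue, queue_lock is a
pointer to a spinlock, while __queue_lock is the spinlock embedded in the
structure, so both the comparison and the assignment need the address of
the embedded lock. Below is a minimal user-space sketch of that idea. The
names model_lock_t, model_queue and model_detach_external_lock are
stand-ins invented for illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for spinlock_t; only its address matters in this model. */
typedef struct { int locked; } model_lock_t;

/* Reduced model of struct request_queue (hypothetical, illustration only). */
struct model_queue {
	model_lock_t *queue_lock;   /* pointer: may reference a driver's lock */
	model_lock_t __queue_lock;  /* object: lock embedded in the queue */
};

/*
 * Disconnect any external lock from the queue. A pointer is compared
 * with, and assigned from, the address of the embedded lock. Writing
 * "q->__queue_lock" without the '&' would mix a pointer with a struct.
 */
static void model_detach_external_lock(struct model_queue *q)
{
	if (q->queue_lock != &q->__queue_lock)
		q->queue_lock = &q->__queue_lock;
}
```

Calling the helper a second time is harmless in this model, because the
pointer already refers to the embedded lock.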

Regards,
Cascardo.

> 
>         blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
> 
>         blk_sync_queue(q);
> 
> +       if (q->elevator)
> +               elevator_exit(q->elevator);
> +
> +       blk_throtl_exit(q);
> +
>         if (rl->rq_pool)
>                 mempool_destroy(rl->rq_pool);
> 
> 
> > And yeah, I find it pretty annoying, too.
> > 
> > Cheers,
> > 
> > Hannes
> > 
> 


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 12:26         ` Hannes Reinecke
@ 2011-09-22 15:16           ` Alan Stern
  -1 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-22 15:16 UTC (permalink / raw)
  To: Rocko Requin
  Cc: Hannes Reinecke, Jun'ichi Nomura, Ben Hutchings, jaxboe,
	James Bottomley, tytso, Kernel development list, linux-scsi

Rocko:

Can you try testing this patch instead of all the patches I sent to 
you (but keep Ted's patch)?

Alan Stern

On Thu, 22 Sep 2011, Hannes Reinecke wrote:

> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> > 
> > I've reported a similar oops related to the above commit:
> >   [BUG] Oops when SCSI device under multipath is removed
> >   https://lkml.org/lkml/2011/8/10/11
> > 
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> > 
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> > 
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
> 
> Then something like this should suffice here:



diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
        queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
        mutex_unlock(&q->sysfs_lock);

-       if (q->elevator)
-               elevator_exit(q->elevator);
-
-       blk_throtl_exit(q);
+       if (q->queue_lock != q->__queue_lock)
+               q->queue_lock = q->__queue_lock;

        blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)

        blk_sync_queue(q);

+       if (q->elevator)
+               elevator_exit(q->elevator);
+
+       blk_throtl_exit(q);
+
        if (rl->rq_pool)
                mempool_destroy(rl->rq_pool);
 

> And yeah, I find it pretty annoying, too.
> 
> Cheers,
> 
> Hannes
> 


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-22 12:26         ` Hannes Reinecke
  (?)
@ 2011-09-22 12:35         ` James Bottomley
  -1 siblings, 0 replies; 111+ messages in thread
From: James Bottomley @ 2011-09-22 12:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jun'ichi Nomura, Ben Hutchings, jaxboe, Alan Stern,
	Rocko Requin, tytso, Kernel development list, linux-scsi

On Thu, 2011-09-22 at 14:26 +0200, Hannes Reinecke wrote:
> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> > 
> > I've reported a similar oops related to the above commit:
> >   [BUG] Oops when SCSI device under multipath is removed
> >   https://lkml.org/lkml/2011/8/10/11
> > 
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> > 
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> > 
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
> 
> Then something like this should suffice here:
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>         queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>         mutex_unlock(&q->sysfs_lock);
> 
> -       if (q->elevator)
> -               elevator_exit(q->elevator);
> -
> -       blk_throtl_exit(q);
> +       if (q->queue_lock != q->__queue_lock)
> +               q->queue_lock = q->__queue_lock;
> 
>         blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
> 
>         blk_sync_queue(q);
> 
> +       if (q->elevator)
> +               elevator_exit(q->elevator);
> +
> +       blk_throtl_exit(q);
> +

OK, I'll buy this one (when you fix the whitespace issue ... you have
spaces instead of tabs).

The fact that the lock check/replacement doesn't actually need any
locking is probably worthy of a comment.

James



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-20  7:32     ` Jun'ichi Nomura
@ 2011-09-22 12:26         ` Hannes Reinecke
  0 siblings, 0 replies; 111+ messages in thread
From: Hannes Reinecke @ 2011-09-22 12:26 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: Ben Hutchings, jaxboe, Alan Stern, James Bottomley, Rocko Requin,
	tytso, Kernel development list, linux-scsi

On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> On 09/19/11 08:00, Ben Hutchings wrote:
[ .. ]
>>
>> There have been reports of this in Debian going back to 2.6.39:
>>
>> http://bugs.debian.org/631187
>> http://bugs.debian.org/636263
>> http://bugs.debian.org/642043
>>
>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>
>> http://bugs.debian.org/633890
>> http://bugs.debian.org/634681
>> http://bugs.debian.org/636103
>>
>> The former was also reported in Ubuntu since their 2.6.38-10:
>>
>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>
>> The result of the discussion there was that it appeared to be a
>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
>> ("[SCSI] put stricter guards on queue dead checks") which was also
>> included in a stable update for 2.6.38.
>>
>> There was also a report on bugzilla.kernel.org, though no-one can see
>> quite what that says now:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>
>> I also reported most of the above to James Bottomley and linux-scsi
>> nearly 2 months ago, to no response.
> 
> I've reported a similar oops related to the above commit:
>   [BUG] Oops when SCSI device under multipath is removed
>   https://lkml.org/lkml/2011/8/10/11
> 
> Elevator being removed is the core of the problem.
> And the essential issue seems 2 different models of queue/driver relation
> implied by queue_lock.
> 
> If reverting the commit is not an option,
> until somebody comes up to fix the essential issue,
> the patch below should close the regressions introduced by the commit.
> 
Why do you have to do it that complicated?
Couldn't we just state that any external lock is being disconnected from
queue_lock after blk_cleanup_queue()?

Then something like this should suffice here:

diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
        queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
        mutex_unlock(&q->sysfs_lock);

-       if (q->elevator)
-               elevator_exit(q->elevator);
-
-       blk_throtl_exit(q);
+       if (q->queue_lock != q->__queue_lock)
+               q->queue_lock = q->__queue_lock;

        blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)

        blk_sync_queue(q);

+       if (q->elevator)
+               elevator_exit(q->elevator);
+
+       blk_throtl_exit(q);
+
        if (rl->rq_pool)
                mempool_destroy(rl->rq_pool);


And yeah, I find it pretty annoying, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke              zSeries & Storage
hare@suse.de                  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-18 23:00   ` Ben Hutchings
@ 2011-09-20  7:32     ` Jun'ichi Nomura
  2011-09-22 12:26         ` Hannes Reinecke
  0 siblings, 1 reply; 111+ messages in thread
From: Jun'ichi Nomura @ 2011-09-20  7:32 UTC (permalink / raw)
  To: Ben Hutchings, jaxboe
  Cc: Alan Stern, James Bottomley, Rocko Requin, tytso,
	Kernel development list, linux-scsi

On 09/19/11 08:00, Ben Hutchings wrote:
> On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
>> On Sat, 17 Sep 2011, Rocko Requin wrote:
>>
>>>> Why were you using gnome-terminal?  You should be running the tests at
>>>> a console VT, not under X at all.  Ctrl-Alt-F2 or the equivalent...
>>>
>>> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>>
>> That's weird.  Maybe the screen updates change some timing.
>>
>>>> Here's another patch to address the new problem.  You can apply it on 
>>>> top of all the other patches.
>>>
>>> Attached is the crash log I get with the latest patch applied.
>>
>> Okay, more fallout from the same problem.  Here's an updated version of 
>> the previous patch.
> [...]
> 
> There have been reports of this in Debian going back to 2.6.39:
> 
> http://bugs.debian.org/631187
> http://bugs.debian.org/636263
> http://bugs.debian.org/642043
> 
> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> 
> http://bugs.debian.org/633890
> http://bugs.debian.org/634681
> http://bugs.debian.org/636103
> 
> The former was also reported in Ubuntu since their 2.6.38-10:
> 
> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> 
> The result of the discussion there was that it appeared to be a
> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
> ("[SCSI] put stricter guards on queue dead checks") which was also
> included in a stable update for 2.6.38.
> 
> There was also a report on bugzilla.kernel.org, though no-one can see
> quite what that says now:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> 
> I also reported most of the above to James Bottomley and linux-scsi
> nearly 2 months ago, to no response.

I've reported a similar oops related to the above commit:
  [BUG] Oops when SCSI device under multipath is removed
  https://lkml.org/lkml/2011/8/10/11

Elevator being removed is the core of the problem.
And the essential issue seems 2 different models of queue/driver relation
implied by queue_lock.

If reverting the commit is not an option,
until somebody comes up to fix the essential issue,
the patch below should close the regressions introduced by the commit.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


This patch moves elevator_exit() and blk_throtl_exit() from
blk_cleanup_queue() to blk_release_queue() when possible.

elevator_exit() and blk_throtl_exit() were called in blk_cleanup_queue()
because they use queue_lock.

There are 2 types of queue_locks:
  a) supplied by driver (via blk_init_queue)
  b) embedded in struct request_queue (__queue_lock)

When queue_lock is supplied by driver, there is no guarantee that
the pointer is valid after blk_cleanup_queue(), so they have to be
called in blk_cleanup_queue().
In this case, the driver has to make sure nobody is using the queue
before calling blk_cleanup_queue().

On the other hand, if queue_lock is the '__queue_lock' embedded in
request_queue, blk_release_queue() is a better place to free these
structures, because the block layer knows for certain that no
references remain.
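
The two lock models and the resulting choice of teardown point can be
sketched in user space as follows. This is only a model under stated
assumptions: model_lock_t, model_queue, components_live and the model_*
functions are hypothetical names for illustration, and components_live
stands for the elevator and throttling state together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for spinlock_t; only its address matters in this model. */
typedef struct { int locked; } model_lock_t;

/* Reduced model of struct request_queue (hypothetical, illustration only). */
struct model_queue {
	model_lock_t *queue_lock;   /* lock the block layer actually uses */
	model_lock_t __queue_lock;  /* lock embedded in the queue itself */
	bool components_live;       /* models elevator + throttling state */
};

/* Case (a): the driver supplies a lock. Case (b): NULL, use the embedded one. */
static void model_init_queue(struct model_queue *q, model_lock_t *driver_lock)
{
	q->__queue_lock.locked = 0;
	q->queue_lock = driver_lock ? driver_lock : &q->__queue_lock;
	q->components_live = true;
}

static bool model_lock_is_embedded(const struct model_queue *q)
{
	return q->queue_lock == &q->__queue_lock;
}

/* Models the work that needs a valid queue_lock. */
static void model_release_components(struct model_queue *q)
{
	q->components_live = false;
}

/* Driver-initiated cleanup: a driver-supplied lock may vanish afterwards. */
static void model_cleanup_queue(struct model_queue *q)
{
	if (!model_lock_is_embedded(q))
		model_release_components(q);
}

/* Last reference dropped: the embedded lock is still valid here. */
static void model_release_queue(struct model_queue *q)
{
	if (model_lock_is_embedded(q))
		model_release_components(q);
}
```

In this model exactly one of the two teardown paths releases the
components for any given queue, which is the property the patch relies on.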

This patch is ugly but should fix various oopses introduced by this change:
  86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
  [SCSI] put stricter guards on queue dead checks

For example:
  https://lkml.org/lkml/2011/8/10/11

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>

Index: linux-3.1-rc4/block/blk-core.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-core.c	2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk-core.c	2011-09-20 15:53:23.496814819 +0900
@@ -352,6 +352,14 @@
  * unexpectedly as some queue cleanup components like elevator_exit() and
  * blk_throtl_exit() need queue lock.
  */
+void blk_release_queue_components_with_queuelock(struct request_queue *q)
+{
+	if (q->elevator)
+		elevator_exit(q->elevator);
+
+	blk_throtl_exit(q);
+}
+
 void blk_cleanup_queue(struct request_queue *q)
 {
 	/*
@@ -367,10 +375,12 @@
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
+	/*
+	 * A driver supplied the queue lock.
+	 * Cleanup components while the queue lock is valid.
+	 */
+	if (q->queue_lock != &q->__queue_lock)
+		blk_release_queue_components_with_queuelock(q);
 
 	blk_put_queue(q);
 }
Index: linux-3.1-rc4/block/blk-sysfs.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-sysfs.c	2011-09-19 09:38:51.000000000 +0900
+++ linux-3.1-rc4/block/blk-sysfs.c	2011-09-20 15:57:50.358807023 +0900
@@ -477,6 +477,9 @@
 
 	blk_sync_queue(q);
 
+	if (q->queue_lock == &q->__queue_lock)
+		blk_release_queue_components_with_queuelock(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 
Index: linux-3.1-rc4/block/blk.h
===================================================================
--- linux-3.1-rc4.orig/block/blk.h	2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk.h	2011-09-20 15:57:38.306807136 +0900
@@ -25,6 +25,9 @@
 void blk_add_timer(struct request *);
 void __generic_unplug_device(struct request_queue *);
 
+/* Wrapper to release functions to be called while queue_lock is valid */
+void blk_release_queue_components_with_queuelock(struct request_queue *q);
+
 /*
  * Internal atomic flags for request handling
  */

^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
  2011-09-17 17:34 ` Alan Stern
@ 2011-09-18 23:00   ` Ben Hutchings
  2011-09-20  7:32     ` Jun'ichi Nomura
  0 siblings, 1 reply; 111+ messages in thread
From: Ben Hutchings @ 2011-09-18 23:00 UTC (permalink / raw)
  To: Alan Stern, James Bottomley
  Cc: Rocko Requin, tytso, Kernel development list, linux-scsi

On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
> On Sat, 17 Sep 2011, Rocko Requin wrote:
> 
> > > Why were you using gnome-terminal?  You should be running the tests at
> > > a console VT, not under X at all.  Ctrl-Alt-F2 or the equivalent...
> > 
> > Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
> 
> That's weird.  Maybe the screen updates change some timing.
> 
> > > Here's another patch to address the new problem.  You can apply it on 
> > > top of all the other patches.
> > 
> > Attached is the crash log I get with the latest patch applied.
> 
> Okay, more fallout from the same problem.  Here's an updated version of 
> the previous patch.
[...]

There have been reports of this in Debian going back to 2.6.39:

http://bugs.debian.org/631187
http://bugs.debian.org/636263
http://bugs.debian.org/642043

Plus possibly related crashes in elv_put_request after CD-ROM removal:

http://bugs.debian.org/633890
http://bugs.debian.org/634681
http://bugs.debian.org/636103

The former was also reported in Ubuntu since their 2.6.38-10:

https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796

The result of the discussion there was that it appeared to be a
regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b 
("[SCSI] put stricter guards on queue dead checks") which was also
included in a stable update for 2.6.38.

There was also a report on bugzilla.kernel.org, though no-one can see
quite what that says now:

https://bugzilla.kernel.org/show_bug.cgi?id=38842

I also reported most of the above to James Bottomley and linux-scsi
nearly 2 months ago, to no response.

Ben.

-- 
Ben Hutchings
Power corrupts.  Absolute power is kind of neat.
                           - John Lehman, Secretary of the US Navy 1981-1987


^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W234D9A977DF076A732C2AAA1080@phx.gbl>
@ 2011-09-18 14:43 ` Alan Stern
  0 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-18 14:43 UTC (permalink / raw)
  To: Rocko Requin; +Cc: tytso, linux-kernel

On Sun, 18 Sep 2011, Rocko Requin wrote:

> That patch worked, thanks - no more kernel crashes!

That's good news.

> I did see some slightly strange behaviour, though: the umounts issued
> by the script were not working correctly, so the drive was mounting
> successively on /dev/sdb1, /dev/sdc1, /dev/sdd1, etc again. After
> stopping the script, I was able to umount the various mountpoints
> manually, except for one (not the last one in the list, so not the
> most recently mounted one) which reported that the device was busy.
> lsof showed process jbd2/sdj1 three times, once with FD=cwd and
> TYPE=DIR, another with FD=rtd and TYPE=DIR, and lastly with FD=txt
> and TYPE=unknown.

Ted may have some ideas on how to find out what part of the unmount
commands is failing.

Alan Stern


^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
@ 2011-09-17 17:34 ` Alan Stern
  2011-09-18 23:00   ` Ben Hutchings
  0 siblings, 1 reply; 111+ messages in thread
From: Alan Stern @ 2011-09-17 17:34 UTC (permalink / raw)
  To: Rocko Requin; +Cc: tytso, Kernel development list

On Sat, 17 Sep 2011, Rocko Requin wrote:

> > Why were you using gnome-terminal?  You should be running the tests at
> > a console VT, not under X at all.  Ctrl-Alt-F2 or the equivalent...
> 
> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.

That's weird.  Maybe the screen updates change some timing.

> > Here's another patch to address the new problem.  You can apply it on 
> > top of all the other patches.
> 
> Attached is the crash log I get with the latest patch applied.

Okay, more fallout from the same problem.  Here's an updated version of 
the previous patch.

These are really just bandaid-type fixes.  The people who understand
the block layer ought to be involved.  If this keeps up much longer
I'll get in touch with them.

Alan Stern


Index: usb-3.1/block/blk-core.c
===================================================================
--- usb-3.1.orig/block/blk-core.c
+++ usb-3.1/block/blk-core.c
@@ -367,8 +367,10 @@ void blk_cleanup_queue(struct request_qu
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
+	if (q->elevator) {
 		elevator_exit(q->elevator);
+		q->elevator = NULL;
+	}
 
 	blk_throtl_exit(q);
 
Index: usb-3.1/block/elevator.c
===================================================================
--- usb-3.1.orig/block/elevator.c
+++ usb-3.1/block/elevator.c
@@ -769,7 +769,7 @@ void elv_put_request(struct request_queu
 {
 	struct elevator_queue *e = q->elevator;
 
-	if (e->ops->elevator_put_req_fn)
+	if (e && e->ops->elevator_put_req_fn)
 		e->ops->elevator_put_req_fn(rq);
 }
 
@@ -812,7 +812,7 @@ void elv_completed_request(struct reques
 	 */
 	if (blk_account_rq(rq)) {
 		q->in_flight[rq_is_sync(rq)]--;
-		if ((rq->cmd_flags & REQ_SORTED) &&
+		if ((rq->cmd_flags & REQ_SORTED) && e &&
 		    e->ops->elevator_completed_req_fn)
 			e->ops->elevator_completed_req_fn(q, rq);
 	}
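
Editorial illustration, not part of the original message: the patch above follows a common teardown pattern. The cleanup path clears the pointer once the object is released, and late callers check the pointer before dereferencing its ops table. Below is a minimal userspace sketch of that pattern. All names (struct sched, queue_cleanup, queue_put_request, and so on) are hypothetical stand-ins, not the kernel's definitions, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct sched_ops {
	void (*put_req)(int *counter);	/* optional hook, may be NULL */
};

struct sched {
	const struct sched_ops *ops;
};

struct queue {
	struct sched *sched;		/* plays the role of q->elevator */
	int puts_seen;			/* how many times the hook ran */
};

static void count_put(int *counter)
{
	(*counter)++;
}

static const struct sched_ops demo_ops = { .put_req = count_put };
static struct sched demo_sched = { .ops = &demo_ops };

/* Teardown: release the scheduler, then clear the pointer so that
 * later callers can tell it is gone instead of touching stale memory. */
static void queue_cleanup(struct queue *q)
{
	assert(q != NULL);
	if (q->sched) {
		/* a real implementation would free q->sched here */
		q->sched = NULL;
	}
}

/* Late caller: tolerate a queue whose scheduler was already torn down.
 * Returns 1 if the hook ran, 0 if it was skipped. */
static int queue_put_request(struct queue *q)
{
	struct sched *e = q->sched;

	if (e && e->ops->put_req) {
		e->ops->put_req(&q->puts_seen);
		return 1;
	}
	return 0;
}
```

In this sketch, calling queue_put_request() after queue_cleanup() simply skips the hook. As the message says, a guard like this only hides the symptom; it does not explain why requests are still completing after the queue has been cleaned up.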



* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W1224E6C1A20D179965A149A1090@phx.gbl>
@ 2011-09-17 13:21 ` Alan Stern
  0 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-17 13:21 UTC (permalink / raw)
  To: Rocko Requin; +Cc: tytso, linux-ext4

Please don't include an entire 400-line message in your reply if you're
only going to add a few new lines of text.  Use some judicious trimming.

On Sat, 17 Sep 2011, Rocko Requin wrote:

> Here's a log of the latest kernel from git crashing with that patch
> applied (as well as Ted's patch), does it help any?

It does.  It shows a new problem that's completely unrelated to the 
earlier one.

> The gnome-terminal console cursor

Why were you using gnome-terminal?  You should be running the tests at
a console VT, not under X at all.  Ctrl-Alt-F2 or the equivalent...

>  stopped flashing after the last
> 'detaching wakeup' message for a while (it *seemed* to have locked up
> at this point) but then it came back after what looks like 17 seconds
> or so from the log (apport reported something else crashing at this
> point) and then the oops happened and it crashed for good.

Here's another patch to address the new problem.  You can apply it on 
top of all the other patches.

Alan Stern


Index: usb-3.1/block/blk-core.c
===================================================================
--- usb-3.1.orig/block/blk-core.c
+++ usb-3.1/block/blk-core.c
@@ -367,8 +367,10 @@ void blk_cleanup_queue(struct request_qu
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
+	if (q->elevator) {
 		elevator_exit(q->elevator);
+		q->elevator = NULL;
+	}
 
 	blk_throtl_exit(q);
 
Index: usb-3.1/block/elevator.c
===================================================================
--- usb-3.1.orig/block/elevator.c
+++ usb-3.1/block/elevator.c
@@ -812,7 +812,7 @@ void elv_completed_request(struct reques
 	 */
 	if (blk_account_rq(rq)) {
 		q->in_flight[rq_is_sync(rq)]--;
-		if ((rq->cmd_flags & REQ_SORTED) &&
+		if ((rq->cmd_flags & REQ_SORTED) && e &&
 		    e->ops->elevator_completed_req_fn)
 			e->ops->elevator_completed_req_fn(q, rq);
 	}



* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
       [not found] <BAY151-W3498E8491E671BDAE90421A1070@phx.gbl>
@ 2011-09-16 16:28 ` Alan Stern
  0 siblings, 0 replies; 111+ messages in thread
From: Alan Stern @ 2011-09-16 16:28 UTC (permalink / raw)
  To: Rocko Requin; +Cc: tytso, linux-ext4

On Thu, 15 Sep 2011, Rocko Requin wrote:

> Unfortunately the lockup is complete - I can't switch away from the X
> server and sysrq-t/p doesn't work if I'm in a tty console when it
> happens. The stack traces are like the ones I posted earlier in the
> bug, and they didn't contain any useful information.

Try applying the patch below.  It will print out some extra debugging
information during normal operation and especially when the USB drive
is mounted and unmounted.  Oh yes -- and be certain to run the test 
from a tty console so that the messages don't get lost.  Maybe you can 
capture the log messages using a network console.

This may not give any useful information in the end, because it 
concentrates on the BDI interface which Ted's patch should have fixed.  
If something else is causing your crashes, you might not see anything.  
But it's worth a try.

Alan Stern



Index: usb-3.1/kernel/timer.c
===================================================================
--- usb-3.1.orig/kernel/timer.c
+++ usb-3.1/kernel/timer.c
@@ -111,6 +111,143 @@ timer_set_base(struct timer_list *timer,
 				      tbase_get_deferrable(timer->base));
 }
 
+static void check_timer_list(struct list_head *start, char *name)
+{
+	struct timer_list *t, *tnext, *tprev, *nt;
+	struct list_head *h = start;
+
+	nt = NULL;
+	do {
+		if (!h->next || !h->prev) {
+			nt = list_entry(h, struct timer_list, entry);
+			break;
+		}
+		h = h->next;
+	} while (h != start);
+	if (!nt)
+		return;
+	pr_err("%s: Found bad timer at %p\n", name, nt);
+
+	tnext = tprev = list_entry(start, struct timer_list, entry);
+	list_for_each_entry(t, start, entry) {
+		if (!t)
+			break;
+		pr_info(" Entry %p cb %pS list %p\n", t, t->function,
+				t->entry.prev);
+		if (t == nt)
+			break;
+		tprev = t;
+	}
+	pr_info(" -----\n");
+
+	tnext = list_entry(start, struct timer_list, entry);
+	list_for_each_entry_reverse(t, start, entry) {
+		if (!t) {
+			pr_info(" Broken link\n");
+			break;
+		}
+		if (t == nt)
+			break;
+		pr_info(" Entry %p cb %pS list %p\n", t, t->function,
+				t->entry.next);
+		tnext = t;
+	}
+	pr_info(" ----- Fixing\n");
+	nt->entry.prev = &tprev->entry;
+	tprev->entry.next = &nt->entry;
+	nt->entry.next = &tnext->entry;
+	tnext->entry.prev = &nt->entry;
+}
+
+struct timer_list *alantimer;
+int alanok;
+
+#include <linux/perf_event.h>
+#include <linux/hw_breakpoint.h>
+
+struct perf_event * __percpu *alanhbp;
+unsigned long alanunused;
+int alanhbp_enabled;
+struct list_head **alanptr;
+
+extern void *last_bdi_unreg;
+
+static void check_alan(char *type)
+{
+	if (!alanok)
+		return;
+	if (!alantimer->entry.next || !alantimer->entry.prev) {
+		pr_err("ERROR %s: Bad alantimer %p\n", type, alantimer);
+		alanok = 0;
+	}
+}
+
+static void alanhbp_handler(struct perf_event *bp,
+			       struct perf_sample_data *data,
+			       struct pt_regs *regs)
+{
+	pr_info("*alanptr written: %p\n", *alanptr);
+	if (!alanok || !alanhbp_enabled)
+		return;
+	if (alantimer->entry.next)
+		return;
+	dump_stack();
+}
+
+static void set_alan(struct timer_list *timer)
+{
+	if (alantimer)
+		return;
+	alantimer = timer;
+	alanok = 1;
+
+	if (alanhbp)
+		alanhbp_enabled = (alanptr == &alantimer->entry.next);
+}
+
+static void clear_alan(struct timer_list *timer)
+{
+	if (alantimer != timer)
+		return;
+	alanok = 0;
+	alantimer = NULL;
+	alanhbp_enabled = 0;
+}
+
+void init_alan(unsigned long addr)
+{
+	struct perf_event_attr attr;
+
+	if (alanhbp) {
+		unregister_wide_hw_breakpoint(alanhbp);
+		alanhbp = NULL;
+		alanhbp_enabled = 0;
+	}
+
+	if (addr) {
+		hw_breakpoint_init(&attr);
+		attr.bp_addr = addr;
+		attr.bp_len = HW_BREAKPOINT_LEN_4;
+		attr.bp_type = HW_BREAKPOINT_W;
+		alanhbp = register_wide_hw_breakpoint(&attr, alanhbp_handler,
+				NULL);
+		if (IS_ERR((void __force *) alanhbp)) {
+			pr_info("Breakpoint reg failed %ld\n",
+					PTR_ERR((void __force *) alanhbp));
+			alanhbp = NULL;
+		} else if (!alanhbp) {
+			pr_info("alanhbp was not created\n");
+		} else {
+			pr_info("alanhbp created\n");
+		}
+
+		alanptr = (struct list_head **) addr;
+		alanhbp_enabled = (alanok && alanptr == &alantimer->entry.next);
+		pr_info("alanhbp set for %p\n", alanptr);
+	}
+}
+EXPORT_SYMBOL(init_alan);
+
 static unsigned long round_jiffies_common(unsigned long j, int cpu,
 		bool force_up)
 {
@@ -330,6 +467,9 @@ void set_timer_slack(struct timer_list *
 }
 EXPORT_SYMBOL_GPL(set_timer_slack);
 
+extern void wakeup_timer_fn(unsigned long data);
+#include <linux/backing-dev.h>
+
 static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 {
 	unsigned long expires = timer->expires;
@@ -369,7 +509,17 @@ static void internal_add_timer(struct tv
 	/*
 	 * Timers are FIFO:
 	 */
+check_timer_list(vec, "internal_add_1");
+if (timer->function == wakeup_timer_fn) {
+	struct backing_dev_info *bdi = (struct backing_dev_info *) timer->data;
+
+	pr_info("Adding wakeup %p: bdi %p name %s\n", timer, bdi, bdi->name);
+}
 	list_add_tail(&timer->entry, vec);
+if (timer->function == wakeup_timer_fn)
+	set_alan(timer);
+check_timer_list(vec, "internal_add_2");
+check_alan("add");
 }
 
 #ifdef CONFIG_TIMER_STATS
@@ -608,17 +758,24 @@ void init_timer_deferrable_key(struct ti
 }
 EXPORT_SYMBOL(init_timer_deferrable_key);
 
-static inline void detach_timer(struct timer_list *timer,
+static void detach_timer(struct timer_list *timer,
 				int clear_pending)
 {
 	struct list_head *entry = &timer->entry;
 
+check_alan("detach 1");
+if (timer->function == wakeup_timer_fn) {
+	pr_info("Detaching wakeup %p\n", timer);
+	clear_alan(timer);
+}
+
 	debug_deactivate(timer);
 
 	__list_del(entry->prev, entry->next);
 	if (clear_pending)
 		entry->next = NULL;
 	entry->prev = LIST_POISON2;
+check_alan("detach 2");
 }
 
 /*
@@ -1026,6 +1183,7 @@ static int cascade(struct tvec_base *bas
 	struct list_head tv_list;
 
 	list_replace_init(tv->vec + index, &tv_list);
+check_alan("cascade 1");
 
 	/*
 	 * We are removing _all_ timers from the list, so we
@@ -1033,7 +1191,10 @@ static int cascade(struct tvec_base *bas
 	 */
 	list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
 		BUG_ON(tbase_get_base(timer->base) != base);
+if (timer->function == wakeup_timer_fn)
+	pr_info("Cascading wakeup_timer %p\n", timer);
 		internal_add_timer(base, timer);
+check_alan("cascade 2");
 	}
 
 	return index;
@@ -1109,6 +1270,7 @@ static inline void __run_timers(struct t
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
 		list_replace_init(base->tv1.vec + index, &work_list);
+check_alan("run 1");
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
@@ -1148,6 +1310,7 @@ static unsigned long __next_timer_interr
 	/* Look for timer events in tv1. */
 	index = slot = timer_jiffies & TVR_MASK;
 	do {
+check_timer_list(base->tv1.vec + slot, "next_timer_1");
 		list_for_each_entry(nte, base->tv1.vec + slot, entry) {
 			if (tbase_get_deferrable(nte->base))
 				continue;
@@ -1179,6 +1342,7 @@ cascade:
 
 		index = slot = timer_jiffies & TVN_MASK;
 		do {
+check_timer_list(varp->vec + slot, "next_timer_2");
 			list_for_each_entry(nte, varp->vec + slot, entry) {
 				if (tbase_get_deferrable(nte->base))
 					continue;
Index: usb-3.1/mm/backing-dev.c
===================================================================
--- usb-3.1.orig/mm/backing-dev.c
+++ usb-3.1/mm/backing-dev.c
@@ -308,7 +308,7 @@ static void sync_supers_timer_fn(unsigne
 	bdi_arm_supers_timer();
 }
 
-static void wakeup_timer_fn(unsigned long data)
+void wakeup_timer_fn(unsigned long data)
 {
 	struct backing_dev_info *bdi = (struct backing_dev_info *)data;
 
@@ -328,6 +328,8 @@ static void wakeup_timer_fn(unsigned lon
 	spin_unlock_bh(&bdi->wb_lock);
 }
 
+void *last_bdi_unreg;
+
 /*
  * This function is used when the first inode for this bdi is marked dirty. It
  * wakes-up the corresponding bdi thread which should then take care of the
@@ -345,6 +347,8 @@ void bdi_wakeup_thread_delayed(struct ba
 
 	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
 	mod_timer(&bdi->wb.wakeup_timer, jiffies + timeout);
+if (bdi == last_bdi_unreg)
+	dump_stack();
 }
 
 /*
@@ -547,6 +551,7 @@ int bdi_register(struct backing_dev_info
 			return PTR_ERR(wb->task);
 	}
 
+pr_info("bdi register %s %p\n", dev_name(dev), bdi);
 	bdi_debug_register(bdi, dev_name(dev));
 	set_bit(BDI_registered, &bdi->state);
 
@@ -617,6 +622,8 @@ void bdi_unregister(struct backing_dev_i
 		bdi_set_min_ratio(bdi, 0);
 		trace_writeback_bdi_unregister(bdi);
 		bdi_prune_sb(bdi);
+pr_info("bdi_unreg: wb %p bdi %p\n", &bdi->wb, bdi);
+last_bdi_unreg = bdi;
 		del_timer_sync(&bdi->wb.wakeup_timer);
 
 		if (!bdi_cap_flush_forker(bdi))
@@ -632,6 +639,8 @@ static void bdi_wb_init(struct bdi_write
 {
 	memset(wb, 0, sizeof(*wb));
 
+pr_info("bdi_wb_init: wb %p bdi %p\n", wb, bdi);
+last_bdi_unreg = NULL;
 	wb->bdi = bdi;
 	wb->last_old_flush = jiffies;
 	INIT_LIST_HEAD(&wb->b_dirty);
Index: usb-3.1/drivers/usb/core/usb.c
===================================================================
--- usb-3.1.orig/drivers/usb/core/usb.c
+++ usb-3.1/drivers/usb/core/usb.c
@@ -974,6 +974,29 @@ struct dentry *usb_debug_root;
 EXPORT_SYMBOL_GPL(usb_debug_root);
 
 static struct dentry *usb_debug_devices;
+static struct dentry *alan_dentry;
+
+static ssize_t alan_write(struct file *fd, const char __user *buf,
+		size_t len, loff_t *ptr)
+{
+	unsigned long addr;
+	char buf2[16];
+	void init_alan(unsigned long);
+
+	if (len >= 16)
+		return -EINVAL;
+	buf2[len] = 0;
+	if (copy_from_user(buf2, buf, len))
+		return -EFAULT;
+
+	addr = simple_strtoul(buf2, NULL, 16);
+	init_alan(addr);
+	return len;
+}
+
+static const struct file_operations alan_fops = {
+	.write = alan_write,
+};
 
 static int usb_debugfs_init(void)
 {
@@ -990,11 +1013,17 @@ static int usb_debugfs_init(void)
 		return -ENOENT;
 	}
 
+	alan_dentry = debugfs_create_file("alan", 0200,
+				usb_debug_root, NULL, &alan_fops);
+	if (!alan_dentry)
+		pr_err("Unable to register alan\n");
+
 	return 0;
 }
 
 static void usb_debugfs_cleanup(void)
 {
+	debugfs_remove(alan_dentry);
 	debugfs_remove(usb_debug_devices);
 	debugfs_remove(usb_debug_root);
 }
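
Editorial illustration, not part of the original message: the first loop of check_timer_list() in the debugging patch above walks a circular doubly linked list and stops at the first entry whose next or prev pointer is NULL. The userspace sketch below shows only that detection step, under the assumption of a simplified node type. struct node, node_init, node_add_tail and find_bad_node are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct node {
	struct node *next, *prev;
};

/* An empty circular list is a head that points at itself. */
static void node_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link n in just before head, i.e. at the tail of the list. */
static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk forward from head and return the first node with a NULL link,
 * or NULL if the walk gets back to head without finding one. */
static struct node *find_bad_node(struct node *head)
{
	struct node *h = head;

	assert(head != NULL);
	do {
		if (!h->next || !h->prev)
			return h;
		h = h->next;
	} while (h != head);
	return NULL;
}
```

Like the loop in the patch, this walk only terminates if the forward links either reach a NULL pointer or lead back to the head. A list corrupted into a cycle that bypasses the head would make it spin forever, which is tolerable for a throwaway debugging aid but not for production code.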



end of thread, other threads:[~2012-07-02 13:25 UTC | newest]

Thread overview: 111+ messages
     [not found] <bug-25832-13602@https.bugzilla.kernel.org/>
2011-02-03 17:09 ` [Bug 25832] kernel crashes upon resume if usb devices are removed when suspended bugzilla-daemon
2011-02-03 18:22 ` bugzilla-daemon
2011-02-04  6:28 ` bugzilla-daemon
2011-02-04  6:30 ` bugzilla-daemon
2011-02-04 15:31 ` bugzilla-daemon
2011-02-05  8:31 ` bugzilla-daemon
2011-02-05  8:53 ` bugzilla-daemon
2011-02-05 19:12 ` bugzilla-daemon
2011-02-05 19:56 ` bugzilla-daemon
2011-02-05 19:58 ` bugzilla-daemon
2011-02-05 20:53 ` bugzilla-daemon
2011-02-05 23:10 ` bugzilla-daemon
2011-02-06  2:54 ` bugzilla-daemon
2011-02-06  3:44 ` bugzilla-daemon
2011-02-06  4:26 ` bugzilla-daemon
2011-02-06  6:43 ` bugzilla-daemon
2011-02-06  9:00 ` bugzilla-daemon
2011-02-07  1:20 ` bugzilla-daemon
2011-02-07  3:11 ` bugzilla-daemon
2011-02-07  3:32 ` bugzilla-daemon
2011-02-07  3:33 ` bugzilla-daemon
2011-02-07  4:24 ` bugzilla-daemon
2011-02-07 15:36 ` bugzilla-daemon
2011-02-07 23:49 ` bugzilla-daemon
2011-02-19 12:36 ` bugzilla-daemon
2011-02-19 15:55 ` bugzilla-daemon
2011-02-20  0:16 ` bugzilla-daemon
2011-02-21  4:02 ` bugzilla-daemon
2011-02-21  4:08 ` bugzilla-daemon
2011-02-21  8:59 ` bugzilla-daemon
2011-02-21 16:48 ` bugzilla-daemon
2011-02-22  7:03 ` bugzilla-daemon
2011-03-03 15:23 ` bugzilla-daemon
2011-03-06 10:25 ` bugzilla-daemon
2011-03-06 15:59 ` bugzilla-daemon
2011-03-12  6:03 ` bugzilla-daemon
2011-03-12 12:12 ` bugzilla-daemon
2011-03-12 15:51 ` bugzilla-daemon
2011-03-13  0:41 ` bugzilla-daemon
2011-03-13  2:26 ` bugzilla-daemon
2011-03-13  2:45 ` bugzilla-daemon
2011-03-13  3:01 ` bugzilla-daemon
2011-03-13 21:17 ` bugzilla-daemon
2011-03-14  4:10 ` bugzilla-daemon
2011-03-14 14:18 ` bugzilla-daemon
2011-03-24 23:06 ` bugzilla-daemon
2011-03-31  0:44 ` bugzilla-daemon
2011-03-31  0:49 ` bugzilla-daemon
2011-04-01 15:09 ` bugzilla-daemon
2011-04-01 23:58 ` bugzilla-daemon
2011-04-04  2:32 ` bugzilla-daemon
2011-04-04 14:11 ` bugzilla-daemon
2011-04-05 15:17 ` bugzilla-daemon
2011-04-06 23:01 ` bugzilla-daemon
2011-04-22  6:00 ` bugzilla-daemon
2011-04-22 10:13 ` bugzilla-daemon
2011-04-22 10:37 ` bugzilla-daemon
2011-04-22 11:58 ` bugzilla-daemon
2011-04-22 13:42 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed bugzilla-daemon
2011-04-22 15:00 ` bugzilla-daemon
2011-04-23  0:32 ` bugzilla-daemon
2011-04-23  4:12 ` bugzilla-daemon
2011-04-23 19:31 ` bugzilla-daemon
2011-04-24  1:35 ` bugzilla-daemon
2011-04-25  0:36 ` bugzilla-daemon
2011-04-25  0:37 ` bugzilla-daemon
2011-04-25  0:39 ` bugzilla-daemon
2011-04-25 20:28 ` bugzilla-daemon
2011-04-26  0:28 ` bugzilla-daemon
2011-04-26  0:44 ` bugzilla-daemon
2011-04-26  1:22 ` bugzilla-daemon
2011-04-26  3:29 ` bugzilla-daemon
2011-04-26  4:02 ` bugzilla-daemon
2011-04-26 18:15 ` bugzilla-daemon
2011-05-03  2:19 ` bugzilla-daemon
2011-05-04  7:36 ` bugzilla-daemon
2011-05-10 23:27 ` bugzilla-daemon
2011-05-26  6:44 ` bugzilla-daemon
2011-05-26 14:27 ` bugzilla-daemon
2011-07-13  7:52 ` bugzilla-daemon
2011-08-31  5:00 ` bugzilla-daemon
2011-08-31  5:07 ` bugzilla-daemon
2011-08-31 14:36 ` bugzilla-daemon
2011-08-31 23:43 ` bugzilla-daemon
2011-09-01  1:30 ` bugzilla-daemon
2011-09-04  3:53 ` bugzilla-daemon
2011-09-04 13:55 ` bugzilla-daemon
2011-09-04 14:00 ` bugzilla-daemon
2011-09-05 17:44 ` bugzilla-daemon
2011-09-09 19:13   ` Ted Ts'o
2011-09-09 22:10     ` Alan Stern
     [not found]       ` <BAY151-W6176D929049AA9E2BDBAEBA1000@phx.gbl>
2011-09-10 14:06         ` Ted Ts'o
2011-09-10 18:07     ` Alan Stern
2011-09-12  1:58     ` Alan Stern
2012-07-02 13:24 ` bugzilla-daemon
     [not found] <BAY151-W3498E8491E671BDAE90421A1070@phx.gbl>
2011-09-16 16:28 ` Alan Stern
     [not found] <BAY151-W1224E6C1A20D179965A149A1090@phx.gbl>
2011-09-17 13:21 ` Alan Stern
     [not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
2011-09-17 17:34 ` Alan Stern
2011-09-18 23:00   ` Ben Hutchings
2011-09-20  7:32     ` Jun'ichi Nomura
2011-09-22 12:26       ` Hannes Reinecke
2011-09-22 12:26         ` Hannes Reinecke
2011-09-22 12:35         ` James Bottomley
2011-09-22 15:16         ` Alan Stern
2011-09-22 15:16           ` Alan Stern
2011-09-22 16:20           ` Thadeu Lima de Souza Cascardo
2011-09-22 16:32             ` Hannes Reinecke
2011-09-22 16:32               ` Hannes Reinecke
     [not found] <BAY151-W234D9A977DF076A732C2AAA1080@phx.gbl>
2011-09-18 14:43 ` Alan Stern
     [not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
2011-09-23 15:18 ` Alan Stern
2011-09-23 15:18   ` Alan Stern
