linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs: Fix mod_timer crash when removing USB sticks
@ 2012-01-12 21:15 Paul Taysom
  2012-01-12 21:38 ` Greg KH
  2012-01-13  5:42 ` Josh Boyer
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Taysom @ 2012-01-12 21:15 UTC (permalink / raw)
  Cc: Paul Taysom, Paul Taysom, Mandeep Baines, Olof Johansson,
	Greg KH, Jens Axboe, Theodore Tso, Andrew Morton, linux-usb,
	linux-kernel, Alexander Viro, linux-fsdevel

From: Paul Taysom <taysom@google.com>

A USB stick with a ext file system on it, would occasionally crash
when the stick was pulled.

The problem was a timer was being set on the Backing Device Interface,
bdi, after the USB device had been removed and the bdi had been
unregistered. The bdi would then be later reinitialized by zeroing
the timer without removing from the timer from the timer queue.
This would eventually result in a kernel crash (NULL ptr dereference).

When the bdi is unregistered, the dev field is set to NULL. This
indication is used by bdi_unregister to only unregister the device
once.

Fix: When the backing device is invalidated, the mapping backing_dev_info
should be redirected to the default_backing_dev_info.

Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
file systems on it. Inserted and removed USB sticks many times in random
order. With out the bug fix, the kernel would soon crash. With the fix,
it did not. Ran on both stumpy and amd64-generic.

Change-Id: Icdd06cf3ced555dcd9994cfcc9478a9071a802f1
Signed-off-by: Paul Taysom <taysom@chromium.org>
Downstream-bug-report: http://crosbug.com/24165
Cc: Mandeep Baines <msb@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Tso <tytso@google.com>
Cc: Andrew Morton <akpm@google.com>
Cc: <linux-usb@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
---
 fs/block_dev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index afe74dd..322cd05 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
 	 * But, for the strange corners, lets be cautious
 	 */
 	cleancache_flush_inode(mapping);
+	mapping->backing_dev_info = &default_backing_dev_info;
 }
 EXPORT_SYMBOL(invalidate_bdev);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:15 [PATCH] fs: Fix mod_timer crash when removing USB sticks Paul Taysom
@ 2012-01-12 21:38 ` Greg KH
  2012-01-12 21:53   ` Mandeep Singh Baines
  2012-01-13  5:42 ` Josh Boyer
  1 sibling, 1 reply; 25+ messages in thread
From: Greg KH @ 2012-01-12 21:38 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Paul Taysom, Mandeep Baines, Olof Johansson, Jens Axboe,
	Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

On Thu, Jan 12, 2012 at 01:15:35PM -0800, Paul Taysom wrote:
> From: Paul Taysom <taysom@google.com>
> 
> A USB stick with a ext file system on it, would occasionally crash
> when the stick was pulled.
> 
> The problem was a timer was being set on the Backing Device Interface,
> bdi, after the USB device had been removed and the bdi had been
> unregistered. The bdi would then be later reinitialized by zeroing
> the timer without removing from the timer from the timer queue.
> This would eventually result in a kernel crash (NULL ptr dereference).
> 
> When the bdi is unregistered, the dev field is set to NULL. This
> indication is used by bdi_unregister to only unregister the device
> once.
> 
> Fix: When the backing device is invalidated, the mapping backing_dev_info
> should be redirected to the default_backing_dev_info.
> 
> Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> file systems on it. Inserted and removed USB sticks many times in random
> order. With out the bug fix, the kernel would soon crash. With the fix,
> it did not. Ran on both stumpy and amd64-generic.
> 
> Change-Id: Icdd06cf3ced555dcd9994cfcc9478a9071a802f1

What is this field for?  It makes no sense for a kernel patch
submission.

> Signed-off-by: Paul Taysom <taysom@chromium.org>
> Downstream-bug-report: http://crosbug.com/24165

Is that a regular field that we now use?

And shouldn't this go to the stable kernel releases as well?

Third time's a charm?

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:38 ` Greg KH
@ 2012-01-12 21:53   ` Mandeep Singh Baines
  2012-01-12 22:02     ` Greg KH
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Mandeep Singh Baines @ 2012-01-12 21:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Taysom, Paul Taysom, Mandeep Baines, Olof Johansson,
	Jens Axboe, Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

Greg KH (greg@kroah.com) wrote:
> On Thu, Jan 12, 2012 at 01:15:35PM -0800, Paul Taysom wrote:
> > From: Paul Taysom <taysom@google.com>
> > 
> > A USB stick with a ext file system on it, would occasionally crash
> > when the stick was pulled.
> > 
> > The problem was a timer was being set on the Backing Device Interface,
> > bdi, after the USB device had been removed and the bdi had been
> > unregistered. The bdi would then be later reinitialized by zeroing
> > the timer without removing from the timer from the timer queue.
> > This would eventually result in a kernel crash (NULL ptr dereference).
> > 
> > When the bdi is unregistered, the dev field is set to NULL. This
> > indication is used by bdi_unregister to only unregister the device
> > once.
> > 
> > Fix: When the backing device is invalidated, the mapping backing_dev_info
> > should be redirected to the default_backing_dev_info.
> > 
> > Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> > file systems on it. Inserted and removed USB sticks many times in random
> > order. With out the bug fix, the kernel would soon crash. With the fix,
> > it did not. Ran on both stumpy and amd64-generic.
> > 
> > Change-Id: Icdd06cf3ced555dcd9994cfcc9478a9071a802f1
> 
> What is this field for?  It makes no sense for a kernel patch
> submission.
> 
> > Signed-off-by: Paul Taysom <taysom@chromium.org>
> > Downstream-bug-report: http://crosbug.com/24165
> 
> Is that a regular field that we now use?
> 

Hi Greg,

What is the conventional way of doing this? There is a lot of good
data in the bug report which might be useful to reviewers. We
couldn't find a de-facto way of referencing the downstream bug database
so we just made up a new field. Sorry. We'll use the correct
field name next time.

So what is the correct field name?

Regards,
Mandeep

> And shouldn't this go to the stable kernel releases as well?
> 
> Third time's a charm?
> 
> greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:53   ` Mandeep Singh Baines
@ 2012-01-12 22:02     ` Greg KH
  2012-01-12 22:11     ` Theodore Tso
       [not found]     ` <CAGagf4dk4KsZSkaWTO9Yegi=_wRJsYBPgfyks1z=wMZJV8gX0w@mail.gmail.com>
  2 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2012-01-12 22:02 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Paul Taysom, Paul Taysom, Olof Johansson, Jens Axboe,
	Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

On Thu, Jan 12, 2012 at 01:53:31PM -0800, Mandeep Singh Baines wrote:
> > > Downstream-bug-report: http://crosbug.com/24165
> > 
> > Is that a regular field that we now use?
> > 
> 
> Hi Greg,
> 
> What is the conventional way of doing this? There is a lot of good
> data in the bug report which might be useful to reviewers. We
> couldn't find a de-facto way of referencing the downstream bug database
> so we just made up a new field. Sorry. We'll use the correct
> field name next time.
> 
> So what is the correct field name?

I don't know, I was asking if this is what we are now using.  I usually
just put it in the body of the changelog, something like:

	More details about this can be found at: http://crosbug.com/24165

Making sure that no relevant info is only at that link and not in the
changelog comment, as bug reporting sites have a tendancy to disappear
at times.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:53   ` Mandeep Singh Baines
  2012-01-12 22:02     ` Greg KH
@ 2012-01-12 22:11     ` Theodore Tso
       [not found]     ` <CAGagf4dk4KsZSkaWTO9Yegi=_wRJsYBPgfyks1z=wMZJV8gX0w@mail.gmail.com>
  2 siblings, 0 replies; 25+ messages in thread
From: Theodore Tso @ 2012-01-12 22:11 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Greg KH, Paul Taysom, Paul Taysom, Olof Johansson, Jens Axboe,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel

On Thu, Jan 12, 2012 at 4:53 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
>
> What is the conventional way of doing this? There is a lot of good
> data in the bug report which might be useful to reviewers. We
> couldn't find a de-facto way of referencing the downstream bug database
> so we just made up a new field. Sorry. We'll use the correct
> field name next time.

There isn't a "correct field name", since it hasn't been standardized.
I can tell you as an the ext4 maintainer, I've put things like:

Addresses-Redhat-Bugzilla: <Bugzilla #>

or

Addresses-Debian-Bug: <debian-bug-number>

etc. in commit messages.  I usually put them before the signed-off-block,
i.e, like this:

--------
ext4: a simple commit

This is the longer commit description, blah, blah, blah.

Addresses-Redhat-Bugzilla: 12345

Signed-off-by: Eric Sandeen <....@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-----------

There is some disagreement about whether it's useful to do this
for bugzilla entries which are not publically available (i.e., protected
Enterprise Linux bugzillas which have customer information and
thus are restricted access, or internal Company bug numbers).

My personal take on this matter is that if an engineer from a
particular company has taken the time and effort to contribute
a bug that fixes a bug that they had seen internally, and they
include an internal bug number, it's presumably to make it easier
to track when a particular commit fixing a bug that they really
care about has hit upstream.  It costs me very little to keep
the bug tracker reference, even if I don't have personal access
to it, and given the code contributor has helped me by submitting
a bug fix, it's nice to be able to do a little something nice in
return.  The extra line doesn't really bloat the "git log" output
by all that much.

Others, including Greg K-H, think it's a horrible thing to do,
and will reject commits on that basis.

So it all depends on which subsystem maintainer handles
your bug.....  at the end, it's up to the maintainer.  I've never
had Linus complain to me because the commits that I ask him
to pull might include "Google-Bug-Id: 12345" lines in the commit
description.

-- Ted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
       [not found]     ` <CAGagf4dk4KsZSkaWTO9Yegi=_wRJsYBPgfyks1z=wMZJV8gX0w@mail.gmail.com>
@ 2012-01-12 22:35       ` Mandeep Singh Baines
  2012-01-12 23:23         ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Mandeep Singh Baines @ 2012-01-12 22:35 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Mandeep Singh Baines, Greg KH, Paul Taysom, Paul Taysom,
	Olof Johansson, Jens Axboe, Andrew Morton, linux-usb,
	linux-kernel, Alexander Viro, linux-fsdevel

Theodore Tso (tytso@google.com) wrote:
> On Thu, Jan 12, 2012 at 4:53 PM, Mandeep Singh Baines <msb@chromium.org>wrote:
> 
> >
> > Hi Greg,
> >
> > What is the conventional way of doing this? There is a lot of good
> > data in the bug report which might be useful to reviewers. We
> > couldn't find a de-facto way of referencing the downstream bug database
> > so we just made up a new field. Sorry. We'll use the correct
> > field name next time.
> >
> 
> There isn't a "correct field name", since it hasn't been standardized.  I
> can tell you as an the ext4 maintainer, I've put things like
> 
> Addresses-Redhat-Bugzilla: <Bugzilla #>
> 
> or
> 
> Addresses-Debian-Bug: <debian-bug-number>
> 

This seems like a good convention. Its also nice in that it scales to
the case where the bug is reported by multiple distros.

For future, we'll use:

Addresses-ChromiumOS-Bug: http://crosbug.com/<Bug #>

I included the URL because its small and folks might not know where
our bug database is.

Regards,
Mandeep

> etc. in commit messages.  I usually put them before the signed-off-block,
> i.e, like this:
> 
> --------
> ext4: a simple commit
> 
> This is the longer commit description, blah, blah, blah.
> 
> Addresses-Redhat-Bugzilla: 12345
> 
> Signed-off-by: Eric Sandeen <....@redhat.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> -----------
> 
> There is some disagreement about whether it's useful to do this for
> bugzilla entries which are not publically available (i.e., protected
> Enterprise Linux bugzillas which have customer information and thus are
> restricted access, or internal Company bug numbers).
> 
> My personal take on this matter is that if an engineer from a particular
> company has taken the time and effort to contribute a bug that fixes a bug
> that they had seen internally, and they include an internal bug number,
> it's presumably to make it easier to track when a particular commit fixing
> a bug that they really care about has hit upstream, and it's in my interest
> to keep the code contributions coming, and it's very little effort to
> include an internal bug tracker reference, even if I don't have personal
> access to said bug tracker;  it's a nice thing I can do that doesn't cost
> much, and may be useful to the original patch submitter.
> 
> Others, including Greg K-H, think it's a horrible thing to do, and will
> reject commits on that basis.
> 
> So it all depends on which subsystem maintainer handles your bug.....  at
> the end, it's up to the maintainer.  I've never had Linus complain to me
> because the commits that I ask him to pull might include "Google-Bug-Id:
> 12345" lines in the commit description.
> 
> -- Ted
> 
> 
> 
> 
> 
> 
> >
> > So what is the correct field name?
> >
> > Regards,
> > Mandeep
> >
> > > And shouldn't this go to the stable kernel releases as well?
> > >
> > > Third time's a charm?
> > >
> > > greg k-h
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 22:35       ` Mandeep Singh Baines
@ 2012-01-12 23:23         ` Andrew Morton
  0 siblings, 0 replies; 25+ messages in thread
From: Andrew Morton @ 2012-01-12 23:23 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Theodore Tso, Greg KH, Paul Taysom, Paul Taysom, Olof Johansson,
	Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

On Thu, 12 Jan 2012 14:35:44 -0800
Mandeep Singh Baines <msb@chromium.org> wrote:

> Theodore Tso (tytso@google.com) wrote:
> > On Thu, Jan 12, 2012 at 4:53 PM, Mandeep Singh Baines <msb@chromium.org>wrote:
> > 
> > >
> > > Hi Greg,
> > >
> > > What is the conventional way of doing this? There is a lot of good
> > > data in the bug report which might be useful to reviewers. We
> > > couldn't find a de-facto way of referencing the downstream bug database
> > > so we just made up a new field. Sorry. We'll use the correct
> > > field name next time.
> > >
> > 
> > There isn't a "correct field name", since it hasn't been standardized.  I
> > can tell you as an the ext4 maintainer, I've put things like
> > 
> > Addresses-Redhat-Bugzilla: <Bugzilla #>
> > 
> > or
> > 
> > Addresses-Debian-Bug: <debian-bug-number>
> > 
> 
> This seems like a good convention. Its also nice in that it scales to
> the case where the bug is reported by multiple distros.
> 
> For future, we'll use:
> 
> Addresses-ChromiumOS-Bug: http://crosbug.com/<Bug #>
> 
> I included the URL because its small and folks might not know where
> our bug database is.

I put "Addresses <some URL>" into the changelog.  Just in the text body,
not as one of the tag:value pairs at the end of the changelog.

z:/usr/src/git26> git log | grep "Addresses http" | wc -l
121


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:15 [PATCH] fs: Fix mod_timer crash when removing USB sticks Paul Taysom
  2012-01-12 21:38 ` Greg KH
@ 2012-01-13  5:42 ` Josh Boyer
  2012-01-13 15:39   ` Paul Taysom
  1 sibling, 1 reply; 25+ messages in thread
From: Josh Boyer @ 2012-01-13  5:42 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Paul Taysom, Mandeep Baines, Olof Johansson, Greg KH, Jens Axboe,
	Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

On Thu, Jan 12, 2012 at 4:15 PM, Paul Taysom <taysom@chromium.org> wrote:
> From: Paul Taysom <taysom@google.com>
>
> A USB stick with a ext file system on it, would occasionally crash
> when the stick was pulled.
>
> The problem was a timer was being set on the Backing Device Interface,
> bdi, after the USB device had been removed and the bdi had been
> unregistered. The bdi would then be later reinitialized by zeroing
> the timer without removing from the timer from the timer queue.
> This would eventually result in a kernel crash (NULL ptr dereference).

What backtrace did you see when it crashed?  There have been a number of
reports where sd_revalidate_disk is at the top of the backtrace and this sounds
like it might be related.

josh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-13  5:42 ` Josh Boyer
@ 2012-01-13 15:39   ` Paul Taysom
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Taysom @ 2012-01-13 15:39 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Paul Taysom, Mandeep Baines, Olof Johansson, Greg KH, Jens Axboe,
	Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

On Thu, Jan 12, 2012 at 9:42 PM, Josh Boyer <jwboyer@gmail.com> wrote:
> On Thu, Jan 12, 2012 at 4:15 PM, Paul Taysom <taysom@chromium.org> wrote:
>> From: Paul Taysom <taysom@google.com>
>>
>> A USB stick with a ext file system on it, would occasionally crash
>> when the stick was pulled.
>>
>> The problem was a timer was being set on the Backing Device Interface,
>> bdi, after the USB device had been removed and the bdi had been
>> unregistered. The bdi would then be later reinitialized by zeroing
>> the timer without removing from the timer from the timer queue.
>> This would eventually result in a kernel crash (NULL ptr dereference).
>
> What backtrace did you see when it crashed?  There have been a number of
> reports where sd_revalidate_disk is at the top of the backtrace and this sounds
> like it might be related.
>
> josh

I had several different crash traces some can be seen in
http://crosbug.com/24165

The reason that there are different crash traces is that the code path
to crash would just be the victim of the memory corruption caused
earlier. However, most of them had to do with setting or clearing
timers. There are additional problems related to lost I/O requests
that I'm still looking at.
-Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-20  2:02                   ` Alan Stern
@ 2012-03-22 16:15                     ` Paul Taysom
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Taysom @ 2012-03-22 16:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mandeep Singh Baines, Ted Ts'o, Theodore Tso, Greg KH,
	Paul Taysom, Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Mon, Mar 19, 2012 at 7:02 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 19 Mar 2012, Paul Taysom wrote:
>
>> I have rerun my tests without my change on the 3.2.7 kernel and I was
>> not able to get it to crash. I even put some code in to do the early
>> detection so I didn't have to wait for another thread to stumble
>> across the corruption. The way I test is with several flash drivers
>> with ext2, ext3, ext4, FAT, and HPFS file systems and just repeatedly
>> plug and unplug them. When a flash drive is plugged in with a file
>> system, it is automatically mounted.
>
> That's not the right way to test a race like this.  The right way is to
> insert ssleep() calls at some appropriate spots, so that you can force
> the race to come out the way you want every time.
>
> What about the question of resetting the bdi pointer at the same time
> as some other thread is using it?  Have you tested whether Mandeep's
> locking suggestion will prevent problems?
>
> Alan Stern
>

The checks I added to timer.c seems to the defect every time. With
3.2.7, it never sees the defect.
Paul Taysom

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-20  0:24                 ` Paul Taysom
@ 2012-03-20  2:02                   ` Alan Stern
  2012-03-22 16:15                     ` Paul Taysom
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-03-20  2:02 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Mandeep Singh Baines, Ted Ts'o, Theodore Tso, Greg KH,
	Paul Taysom, Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Mon, 19 Mar 2012, Paul Taysom wrote:

> I have rerun my tests without my change on the 3.2.7 kernel and I was
> not able to get it to crash. I even put some code in to do the early
> detection so I didn't have to wait for another thread to stumble
> across the corruption. The way I test is with several flash drivers
> with ext2, ext3, ext4, FAT, and HPFS file systems and just repeatedly
> plug and unplug them. When a flash drive is plugged in with a file
> system, it is automatically mounted.

That's not the right way to test a race like this.  The right way is to 
insert ssleep() calls at some appropriate spots, so that you can force 
the race to come out the way you want every time.

What about the question of resetting the bdi pointer at the same time 
as some other thread is using it?  Have you tested whether Mandeep's 
locking suggestion will prevent problems?

Alan Stern


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-18 22:25               ` Mandeep Singh Baines
@ 2012-03-20  0:24                 ` Paul Taysom
  2012-03-20  2:02                   ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Taysom @ 2012-03-20  0:24 UTC (permalink / raw)
  To: Mandeep Singh Baines
  Cc: Alan Stern, Ted Ts'o, Theodore Tso, Greg KH, Paul Taysom,
	Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Sun, Mar 18, 2012 at 3:25 PM, Mandeep Singh Baines <msb@chromium.org> wrote:
> On Sun, Mar 18, 2012 at 1:23 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> On Sat, 17 Mar 2012, Ted Ts'o wrote:
>>
>>> I can't help thinking that the fact that we're constantly playing
>>> whack-a-mole trying to fix various random crashes when devices
>>> disappear that perhaps we should consider if there's a better way to
>>> do things.
>>
>> Indeed, as Jens's patch mentions, proper reference counting for the BDI
>> stuff hasn't been implemented yet.  Obviously it will require somebody
>> who really does know the code (i.e., not me).
>>
>> For example, when Paul's patch assigns &default_backing_dev_info, is
>> the assignment synchronized by any sort of lock?  I can't tell -- but
>> if it isn't then the possibility of a race will still exist.
>>
>
> I think its safe without a lock (assuming the assignment is atomic) but it
> wouldn't hurt to add an i_lock. That would also give you a barrier which
> is needed to propagate the assignment to other CPUs.
>
> This is not a perfect fix but its pretty safe and is nice in that it works
> independent of filesystem or bus-type.
>
> Regards,
> Mandeep
>
>>> The fact that at the file system layer I have **no** idea that a
>>> device has disappeared, and just blindly going on trying to write to a
>>> device which is gone just seems a little crazy to me...  why shouldn't
>>> block layer inform the upper layers about something as fundamental as,
>>> "the device is gone and is never coming back"?
>>
>> Playing devil's advocate...  What would you do differently if you did
>> know the device was gone?  All I/O operations will fail regardless, and
>> presumably with an error code like -ENODEV.  Pretty much all you could
>> do would be to fail them a little earlier.
>>
>>> > I suspect Paul's patch is the right thing to do.  It might even make
>>> > the ext4 fix unnecessary, although I don't understand the details well
>>> > enough to verify it.  Maybe Paul can check -- the commit I'm referring
>>> > to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
>>> > kludge to avoid an oops after the disk disappears).
>>>
>>> I have no idea either, because it's not obvious to me what data
>>> structures can be relied upon, and what can't, and when things are
>>> supposed to get freed on sudden device disconnects.  The fact that
>>> none of us are sure is part of what makes me think that the current
>>> scheme is, perhaps, non-optimal...
>>
>> That's why someone like Jens or Al needs to take a close look at this
>> (hint, hint).
>>
>> Alan Stern
>>

I have rerun my tests without my change on the 3.2.7 kernel and I was
not able to get it to crash. I even put some code in to do the early
detection so I didn't have to wait for another thread to stumble
across the corruption. The way I test is with several flash drivers
with ext2, ext3, ext4, FAT, and HPFS file systems and just repeatedly
plug and unplug them. When a flash drive is plugged in with a file
system, it is automatically mounted.

Paul Taysom

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-18 20:23             ` Alan Stern
@ 2012-03-18 22:25               ` Mandeep Singh Baines
  2012-03-20  0:24                 ` Paul Taysom
  0 siblings, 1 reply; 25+ messages in thread
From: Mandeep Singh Baines @ 2012-03-18 22:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ted Ts'o, Theodore Tso, Greg KH, Paul Taysom, Paul Taysom,
	Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Sun, Mar 18, 2012 at 1:23 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 17 Mar 2012, Ted Ts'o wrote:
>
>> I can't help thinking that the fact that we're constantly playing
>> whack-a-mole trying to fix various random crashes when devices
>> disappear that perhaps we should consider if there's a better way to
>> do things.
>
> Indeed, as Jens's patch mentions, proper reference counting for the BDI
> stuff hasn't been implemented yet.  Obviously it will require somebody
> who really does know the code (i.e., not me).
>
> For example, when Paul's patch assigns &default_backing_dev_info, is
> the assignment synchronized by any sort of lock?  I can't tell -- but
> if it isn't then the possibility of a race will still exist.
>

I think its safe without a lock (assuming the assignment is atomic) but it
wouldn't hurt to add an i_lock. That would also give you a barrier which
is needed to propagate the assignment to other CPUs.

This is not a perfect fix but its pretty safe and is nice in that it works
independent of filesystem or bus-type.

Regards,
Mandeep

>> The fact that at the file system layer I have **no** idea that a
>> device has disappeared, and just blindly going on trying to write to a
>> device which is gone just seems a little crazy to me...  why shouldn't
>> block layer inform the upper layers about something as fundamental as,
>> "the device is gone and is never coming back"?
>
> Playing devil's advocate...  What would you do differently if you did
> know the device was gone?  All I/O operations will fail regardless, and
> presumably with an error code like -ENODEV.  Pretty much all you could
> do would be to fail them a little earlier.
>
>> > I suspect Paul's patch is the right thing to do.  It might even make
>> > the ext4 fix unnecessary, although I don't understand the details well
>> > enough to verify it.  Maybe Paul can check -- the commit I'm referring
>> > to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
>> > kludge to avoid an oops after the disk disappears).
>>
>> I have no idea either, because it's not obvious to me what data
>> structures can be relied upon, and what can't, and when things are
>> supposed to get freed on sudden device disconnects.  The fact that
>> none of us are sure is part of what makes me think that the current
>> scheme is, perhaps, non-optimal...
>
> That's why someone like Jens or Al needs to take a close look at this
> (hint, hint).
>
> Alan Stern
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-18  3:44           ` Ted Ts'o
@ 2012-03-18 20:23             ` Alan Stern
  2012-03-18 22:25               ` Mandeep Singh Baines
  0 siblings, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-03-18 20:23 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Theodore Tso, Greg KH, Paul Taysom, Paul Taysom, Mandeep Baines,
	Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Sat, 17 Mar 2012, Ted Ts'o wrote:

> I can't help thinking that the fact that we're constantly playing
> whack-a-mole trying to fix various random crashes when devices
> disappear that perhaps we should consider if there's a better way to
> do things.

Indeed, as Jens's patch mentions, proper reference counting for the BDI 
stuff hasn't been implemented yet.  Obviously it will require somebody 
who really does know the code (i.e., not me).

For example, when Paul's patch assigns &default_backing_dev_info, is 
the assignment synchronized by any sort of lock?  I can't tell -- but 
if it isn't then the possibility of a race will still exist.

> The fact that at the file system layer I have **no** idea that a
> device has disappeared, and just blindly going on trying to write to a
> device which is gone just seems a little crazy to me...  why shouldn't
> block layer inform the upper layers about something as fundamental as,
> "the device is gone and is never coming back"?

Playing devil's advocate...  What would you do differently if you did
know the device was gone?  All I/O operations will fail regardless, and
presumably with an error code like -ENODEV.  Pretty much all you could
do would be to fail them a little earlier.

> > I suspect Paul's patch is the right thing to do.  It might even make
> > the ext4 fix unnecessary, although I don't understand the details well
> > enough to verify it.  Maybe Paul can check -- the commit I'm referring
> > to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> > kludge to avoid an oops after the disk disappears).
> 
> I have no idea either, because it's not obvious to me what data
> structures can be relied upon, and what can't, and when things are
> supposed to get freed on sudden device disconnects.  The fact that
> none of us are sure is part of what makes me think that the current
> scheme is, perhaps, non-optimal...

That's why someone like Jens or Al needs to take a close look at this
(hint, hint).

Alan Stern


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-17 14:21         ` Alan Stern
@ 2012-03-18  3:44           ` Ted Ts'o
  2012-03-18 20:23             ` Alan Stern
  0 siblings, 1 reply; 25+ messages in thread
From: Ted Ts'o @ 2012-03-18  3:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Theodore Tso, Greg KH, Paul Taysom, Paul Taysom, Mandeep Baines,
	Jens Axboe, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel, stable

On Sat, Mar 17, 2012 at 10:21:43AM -0400, Alan Stern wrote:
> On Fri, 16 Mar 2012, Theodore Tso wrote:
> 
> > I thought another fix at the USB layer also went in that attempted to
> > fix this problem for 3.2, and so with two separate band-aid patches, I
> > think we had thought the problem had been addressed.
> 
> I can't recall any USB fix like that.  The only thing I remember is 
> your change to ext4 (leaving the problem still present in ext3).

I could have sworn there was another patch that was tried to fix this
at another layer, because I distinctly remember thinking, oh well,
maybe I didn't need to merge my patch, when I saw another patch go by
that tried to fix it somewhere else.

I can't find it though, so maybe my memory is playing tricks on me....

> > The real problem is that all of the patches which I've seen to date
> > are band-aids, in that we aren't properly sending a "device as
> > disappeared" notification to the file system layer, but instead we are
> > trying to keep enough of the pointers valid (while also freeing other
> > data structures), such that the file system can blindly write into a
> > partially dismantled block device, and hopefully not oops.
> 
> That's not a band-aid approach; it's the way reference counting is 
> _intended_ to work.  The whole idea of refcounting is that instead of 
> synchronizing every single operation, you keep data structures around 
> so long as anyone might be using them.

Yeah, maybe.  It's just that when I went looking in the git logs
trying to find the other patch which I was sure had gone by on LKML, I
can across this:

commit 95f28604a65b1c40b6c6cd95e58439cd7ded3add
Author: Jens Axboe <jaxboe@fusionio.com>
Date:   Thu Mar 17 11:13:12 2011 +0100

    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    
    We don't have proper reference counting for this yet, so we run into
    cases where the device is pulled and we OOPS on flushing the fs data.
    This happens even though the dirty inodes have already been
    migrated to the default_backing_dev_info.
    
    Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

I can't help thinking that the fact that we're constantly playing
whack-a-mole trying to fix various random crashes when devices
disappear that perhaps we should consider if there's a better way to
do things.

The fact that at the file system layer I have **no** idea that a
device has disappeared, and just blindly going on trying to write to a
device which is gone just seems a little crazy to me...  why shouldn't
block layer inform the upper layers about something as fundamental as,
"the device is gone and is never coming back"?

> I suspect Paul's patch is the right thing to do.  It might even make
> the ext4 fix unnecessary, although I don't understand the details well
> enough to verify it.  Maybe Paul can check -- the commit I'm referring
> to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
> kludge to avoid an oops after the disk disappears).

I have no idea either, because it's not obvious to me what data
structures can be relied upon, and what can't, and when things are
supposed to get freed on sudden device disconnects.  The fact that
none of us are sure is part of what makes me think that the current
scheme is, perhaps, non-optimal...

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-16 21:10       ` Theodore Tso
  2012-03-17  0:06         ` Greg KH
@ 2012-03-17 14:21         ` Alan Stern
  2012-03-18  3:44           ` Ted Ts'o
  1 sibling, 1 reply; 25+ messages in thread
From: Alan Stern @ 2012-03-17 14:21 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Greg KH, Paul Taysom, Paul Taysom, Mandeep Baines, Jens Axboe,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

On Fri, 16 Mar 2012, Theodore Tso wrote:

> I thought another fix at the USB layer also went in that attempted to
> fix this problem for 3.2, and so with two separate band-aid patches, I
> think we had thought the problem had been addressed.

I can't recall any USB fix like that.  The only thing I remember is 
your change to ext4 (leaving the problem still present in ext3).

> The real problem is that all of the patches which I've seen to date
> are band-aids, in that we aren't properly sending a "device as
> disappeared" notification to the file system layer, but instead we are
> trying to keep enough of the pointers valid (while also freeing other
> data structures), such that the file system can blindly write into a
> partially dismantled block device, and hopefully not oops.

That's not a band-aid approach; it's the way reference counting is 
_intended_ to work.  The whole idea of refcounting is that instead of 
synchronizing every single operation, you keep data structures around 
so long as anyone might be using them.

> Some have argued that my suggested approach of having an explicit
> super_ops revoke() function, which tells the file system that the
> block device is gone, etc., isn't necessary because this can be solved
> in userspace somehow.  Personally I think that's nuts, since we'll
> continue to play whack-a-mole, but I haven't had time to work up
> patches addressing this --- since this is really only a problem for
> naive users who pull USB sticks without unmounting them first (and so
> it never happens to me :-), and I've got a lot of other fish to
> try.....

I suspect Paul's patch is the right thing to do.  It might even make
the ext4 fix unnecessary, although I don't understand the details well
enough to verify it.  Maybe Paul can check -- the commit I'm referring
to is 7c2e70879fc0949b4220ee61b7c4553f6976a94d (ext4: add ext4-specific
kludge to avoid an oops after the disk disappears).

Alan Stern


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-16 21:10       ` Theodore Tso
@ 2012-03-17  0:06         ` Greg KH
  2012-03-17 14:21         ` Alan Stern
  1 sibling, 0 replies; 25+ messages in thread
From: Greg KH @ 2012-03-17  0:06 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Paul Taysom, Paul Taysom, Mandeep Baines, Jens Axboe,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

On Fri, Mar 16, 2012 at 05:10:21PM -0400, Theodore Tso wrote:
> I thought another fix at the USB layer also went in that attempted to
> fix this problem for 3.2, and so with two separate band-aid patches, I
> think we had thought the problem had been addressed.

Ok, that's good to hear.  Paul, can you verify this, as it looks like
you already had the test devices that could reproduce this easily.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-16 19:43     ` Greg KH
@ 2012-03-16 21:10       ` Theodore Tso
  2012-03-17  0:06         ` Greg KH
  2012-03-17 14:21         ` Alan Stern
  0 siblings, 2 replies; 25+ messages in thread
From: Theodore Tso @ 2012-03-16 21:10 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Taysom, Paul Taysom, Mandeep Baines, Jens Axboe,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

I thought another fix at the USB layer also went in that attempted to
fix this problem for 3.2, and so with two separate band-aid patches, I
think we had thought the problem had been addressed.

The real problem is that all of the patches which I've seen to date
are band-aids, in that we aren't properly sending a "device as
disappeared" notification to the file system layer, but instead we are
trying to keep enough of the pointers valid (while also freeing other
data structures), such that the file system can blindly write into a
partially dismantled block device, and hopefully not oops.

Some have argued that my suggested approach of having an explicit
super_ops revoke() function, which tells the file system that the
block device is gone, etc., isn't necessary because this can be solved
in userspace somehow.  Personally I think that's nuts, since we'll
continue to play whack-a-mole, but I haven't had time to work up
patches addressing this --- since this is really only a problem for
naive users who pull USB sticks without unmounting them first (and so
it never happens to me :-), and I've got a lot of other fish to
try.....

-- Ted

On Fri, Mar 16, 2012 at 3:43 PM, Greg KH <greg@kroah.com> wrote:
>
> On Fri, Mar 16, 2012 at 12:29:15PM -0700, Paul Taysom wrote:
> > On Fri, Mar 16, 2012 at 10:36 AM, Greg KH <greg@kroah.com> wrote:
> > >
> > > On Thu, Jan 12, 2012 at 01:57:11PM -0800, Paul Taysom wrote:
> > > > A USB stick with a ext file system on it, would occasionally crash
> > > > when the stick was pulled.
> > > >
> > > > The problem was a timer was being set on the Backing Device
> > > > Interface,
> > > > bdi, after the USB device had been removed and the bdi had been
> > > > unregistered. The bdi would then be later reinitialized by zeroing
> > > > the timer without removing from the timer from the timer queue.
> > > > This would eventually result in a kernel crash (NULL ptr
> > > > dereference).
> > > >
> > > > When the bdi is unregistered, the dev field is set to NULL. This
> > > > indication is used by bdi_unregister to only unregister the device
> > > > once.
> > > >
> > > > Fix: When the backing device is invalidated, the mapping
> > > > backing_dev_info
> > > > should be redirected to the default_backing_dev_info.
> > > >
> > > > Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> > > > file systems on it. Inserted and removed USB sticks many times in
> > > > random
> > > > order. With out the bug fix, the kernel would soon crash. With the
> > > > fix,
> > > > it did not. Ran on both stumpy and amd64-generic.
> > > >
> > > > Signed-off-by: Paul Taysom <taysom@chromium.org>
> > > > Cc: Mandeep Baines <msb@chromium.org>
> > > > Cc: Greg KH <greg@kroah.com>
> > > > Cc: Jens Axboe <axboe@kernel.dk>
> > > > Cc: Theodore Tso <tytso@google.com>
> > > > Cc: Andrew Morton <akpm@google.com>
> > > > Cc: <linux-usb@vger.kernel.org>
> > > > Cc: <linux-kernel@vger.kernel.org>
> > > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > > Cc: <linux-fsdevel@vger.kernel.org>
> > > > Cc: <stable@kernel.org>
> > > > ---
> > > >  fs/block_dev.c |    1 +
> > > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > > index afe74dd..322cd05 100644
> > > > --- a/fs/block_dev.c
> > > > +++ b/fs/block_dev.c
> > > > @@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
> > > >        * But, for the strange corners, lets be cautious
> > > >        */
> > > >       cleancache_flush_inode(mapping);
> > > > +     mapping->backing_dev_info = &default_backing_dev_info;
> > > >  }
> > > >  EXPORT_SYMBOL(invalidate_bdev);
> > >
> > > What ever happened to this patch?  Is it still needed?  Can you still
> > > reproduce the problem on Linus's tree and older kernels?
> > >
> >
> >
> > Never heard anything back.  Ted supplied a partial fix in 3.2.6 (I
> > believe) for just the ext4 file system. Who should I follow up with?
>
> If the fix went into the 3.2-stable tree, then it's in Linus's tree
> already, which is good.
>
> But, what about all of the other filesystems you hit this on, do we need
> to make the same change to all of them?  If so, that kind of implies
> your original patch is the correct one :)
>
> As for who to poke, Ted, Al, Jens, what should we do here?
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-16 19:29   ` Paul Taysom
@ 2012-03-16 19:43     ` Greg KH
  2012-03-16 21:10       ` Theodore Tso
  0 siblings, 1 reply; 25+ messages in thread
From: Greg KH @ 2012-03-16 19:43 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Paul Taysom, Mandeep Baines, Jens Axboe, Theodore Tso,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

On Fri, Mar 16, 2012 at 12:29:15PM -0700, Paul Taysom wrote:
> On Fri, Mar 16, 2012 at 10:36 AM, Greg KH <greg@kroah.com> wrote:
> >
> > On Thu, Jan 12, 2012 at 01:57:11PM -0800, Paul Taysom wrote:
> > > A USB stick with a ext file system on it, would occasionally crash
> > > when the stick was pulled.
> > >
> > > The problem was a timer was being set on the Backing Device Interface,
> > > bdi, after the USB device had been removed and the bdi had been
> > > unregistered. The bdi would then be later reinitialized by zeroing
> > > the timer without removing from the timer from the timer queue.
> > > This would eventually result in a kernel crash (NULL ptr dereference).
> > >
> > > When the bdi is unregistered, the dev field is set to NULL. This
> > > indication is used by bdi_unregister to only unregister the device
> > > once.
> > >
> > > Fix: When the backing device is invalidated, the mapping
> > > backing_dev_info
> > > should be redirected to the default_backing_dev_info.
> > >
> > > Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> > > file systems on it. Inserted and removed USB sticks many times in random
> > > order. With out the bug fix, the kernel would soon crash. With the fix,
> > > it did not. Ran on both stumpy and amd64-generic.
> > >
> > > Signed-off-by: Paul Taysom <taysom@chromium.org>
> > > Cc: Mandeep Baines <msb@chromium.org>
> > > Cc: Greg KH <greg@kroah.com>
> > > Cc: Jens Axboe <axboe@kernel.dk>
> > > Cc: Theodore Tso <tytso@google.com>
> > > Cc: Andrew Morton <akpm@google.com>
> > > Cc: <linux-usb@vger.kernel.org>
> > > Cc: <linux-kernel@vger.kernel.org>
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: <linux-fsdevel@vger.kernel.org>
> > > Cc: <stable@kernel.org>
> > > ---
> > >  fs/block_dev.c |    1 +
> > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > index afe74dd..322cd05 100644
> > > --- a/fs/block_dev.c
> > > +++ b/fs/block_dev.c
> > > @@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
> > >        * But, for the strange corners, lets be cautious
> > >        */
> > >       cleancache_flush_inode(mapping);
> > > +     mapping->backing_dev_info = &default_backing_dev_info;
> > >  }
> > >  EXPORT_SYMBOL(invalidate_bdev);
> >
> > What ever happened to this patch?  Is it still needed?  Can you still
> > reproduce the problem on Linus's tree and older kernels?
> >
> 
> 
> Never heard anything back.  Ted supplied a partial fix in 3.2.6 (I
> believe) for just the ext4 file system. Who should I follow up with?

If the fix went into the 3.2-stable tree, then it's in Linus's tree
already, which is good.

But, what about all of the other filesystems you hit this on, do we need
to make the same change to all of them?  If so, that kind of implies
your original patch is the correct one :)

As for who to poke, Ted, Al, Jens, what should we do here?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-03-16 17:36 ` Greg KH
@ 2012-03-16 19:29   ` Paul Taysom
  2012-03-16 19:43     ` Greg KH
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Taysom @ 2012-03-16 19:29 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul Taysom, Mandeep Baines, Jens Axboe, Theodore Tso,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

On Fri, Mar 16, 2012 at 10:36 AM, Greg KH <greg@kroah.com> wrote:
>
> On Thu, Jan 12, 2012 at 01:57:11PM -0800, Paul Taysom wrote:
> > A USB stick with a ext file system on it, would occasionally crash
> > when the stick was pulled.
> >
> > The problem was a timer was being set on the Backing Device Interface,
> > bdi, after the USB device had been removed and the bdi had been
> > unregistered. The bdi would then be later reinitialized by zeroing
> > the timer without removing from the timer from the timer queue.
> > This would eventually result in a kernel crash (NULL ptr dereference).
> >
> > When the bdi is unregistered, the dev field is set to NULL. This
> > indication is used by bdi_unregister to only unregister the device
> > once.
> >
> > Fix: When the backing device is invalidated, the mapping
> > backing_dev_info
> > should be redirected to the default_backing_dev_info.
> >
> > Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> > file systems on it. Inserted and removed USB sticks many times in random
> > order. With out the bug fix, the kernel would soon crash. With the fix,
> > it did not. Ran on both stumpy and amd64-generic.
> >
> > Signed-off-by: Paul Taysom <taysom@chromium.org>
> > Cc: Mandeep Baines <msb@chromium.org>
> > Cc: Greg KH <greg@kroah.com>
> > Cc: Jens Axboe <axboe@kernel.dk>
> > Cc: Theodore Tso <tytso@google.com>
> > Cc: Andrew Morton <akpm@google.com>
> > Cc: <linux-usb@vger.kernel.org>
> > Cc: <linux-kernel@vger.kernel.org>
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: <linux-fsdevel@vger.kernel.org>
> > Cc: <stable@kernel.org>
> > ---
> >  fs/block_dev.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index afe74dd..322cd05 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
> >        * But, for the strange corners, lets be cautious
> >        */
> >       cleancache_flush_inode(mapping);
> > +     mapping->backing_dev_info = &default_backing_dev_info;
> >  }
> >  EXPORT_SYMBOL(invalidate_bdev);
>
> What ever happened to this patch?  Is it still needed?  Can you still
> reproduce the problem on Linus's tree and older kernels?
>


Never heard anything back.  Ted supplied a partial fix in 3.2.6 (I
believe) for just the ext4 file system. Who should I follow up with?
Thanks,
Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 21:57 Paul Taysom
@ 2012-03-16 17:36 ` Greg KH
  2012-03-16 19:29   ` Paul Taysom
  0 siblings, 1 reply; 25+ messages in thread
From: Greg KH @ 2012-03-16 17:36 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Mandeep Baines, Jens Axboe, Theodore Tso, Andrew Morton,
	linux-usb, linux-kernel, Alexander Viro, linux-fsdevel, stable

On Thu, Jan 12, 2012 at 01:57:11PM -0800, Paul Taysom wrote:
> A USB stick with a ext file system on it, would occasionally crash
> when the stick was pulled.
> 
> The problem was a timer was being set on the Backing Device Interface,
> bdi, after the USB device had been removed and the bdi had been
> unregistered. The bdi would then be later reinitialized by zeroing
> the timer without removing from the timer from the timer queue.
> This would eventually result in a kernel crash (NULL ptr dereference).
> 
> When the bdi is unregistered, the dev field is set to NULL. This
> indication is used by bdi_unregister to only unregister the device
> once.
> 
> Fix: When the backing device is invalidated, the mapping backing_dev_info
> should be redirected to the default_backing_dev_info.
> 
> Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> file systems on it. Inserted and removed USB sticks many times in random
> order. With out the bug fix, the kernel would soon crash. With the fix,
> it did not. Ran on both stumpy and amd64-generic.
> 
> Signed-off-by: Paul Taysom <taysom@chromium.org>
> Cc: Mandeep Baines <msb@chromium.org>
> Cc: Greg KH <greg@kroah.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Theodore Tso <tytso@google.com>
> Cc: Andrew Morton <akpm@google.com>
> Cc: <linux-usb@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: <linux-fsdevel@vger.kernel.org>
> Cc: <stable@kernel.org>
> ---
>  fs/block_dev.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index afe74dd..322cd05 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
>  	 * But, for the strange corners, lets be cautious
>  	 */
>  	cleancache_flush_inode(mapping);
> +	mapping->backing_dev_info = &default_backing_dev_info;
>  }
>  EXPORT_SYMBOL(invalidate_bdev);

What ever happened to this patch?  Is it still needed?  Can you still
reproduce the problem on Linus's tree and older kernels?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 19:57 Paul Taysom
  2012-01-12 20:15 ` Greg KH
@ 2012-01-13 11:28 ` Sergei Shtylyov
  1 sibling, 0 replies; 25+ messages in thread
From: Sergei Shtylyov @ 2012-01-13 11:28 UTC (permalink / raw)
  To: Paul Taysom
  Cc: ; Paul Taysom, Mandeep Baines, Greg KH, Jens Axboe, Theodore Tso,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel

Hello.

On 12-01-2012 23:57, Paul Taysom wrote:

> From: Paul Taysom<taysom@google.com>

> A USB stick with a ext file system on it, would occasionally crash
> when the stick was pulled.

> The problem was a timer was being set on the Backing Device Interface,
> bdi, after the USB device had been removed and the bdi had been
> unregistered. The bdi would then be later reinitialized by zeroing
> the timer without removing from the timer from the timer queue.
> This would eventually result in a kernel crash (NULL ptr dereference).

> When the bdi is unregistered, the dev field is set to NULL. This
> indication is used by bdi_unregister to only unregister the device
> once.

> Fix: When the backing device is invalidated, the mapping backing_dev_info
> should be redirected to the default_backing_dev_info.

> Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> file systems on it. Inserted and removed USB sticks many times in random
> order. With out the bug fix, the kernel would soon crash. With the fix,
> it did not. Ran on both stumpy and amd64-generic.

> Signed-off-by: Paul Taysom <taysom@chromium.org>
> Downstream-bug-report: http://crosbug.com/24165
> Cc: Mandeep Baines <msb@chromium.org>
> Cc: Greg KH <greg@kroah.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Theodore Tso <tytso@google.com>
> Cc: Andrew Morton <akpm@google.com>
> Cc: <linux-usb@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: <linux-fsdevel@vger.kernel.org>
> ---
>   fs/block_dev.c |    3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)

> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index afe74dd..9f9b617 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1,4 +1,4 @@
> -/*
> +nvalid/*

    Huh?

>    *  linux/fs/block_dev.c
>    *
>    *  Copyright (C) 1991, 1992  Linus Torvalds

WBR, Sergei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] fs: Fix mod_timer crash when removing USB sticks
@ 2012-01-12 21:57 Paul Taysom
  2012-03-16 17:36 ` Greg KH
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Taysom @ 2012-01-12 21:57 UTC (permalink / raw)
  Cc: Paul Taysom, Mandeep Baines, Greg KH, Jens Axboe, Theodore Tso,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel, stable

A USB stick with a ext file system on it, would occasionally crash
when the stick was pulled.

The problem was a timer was being set on the Backing Device Interface,
bdi, after the USB device had been removed and the bdi had been
unregistered. The bdi would then be later reinitialized by zeroing
the timer without removing from the timer from the timer queue.
This would eventually result in a kernel crash (NULL ptr dereference).

When the bdi is unregistered, the dev field is set to NULL. This
indication is used by bdi_unregister to only unregister the device
once.

Fix: When the backing device is invalidated, the mapping backing_dev_info
should be redirected to the default_backing_dev_info.

Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
file systems on it. Inserted and removed USB sticks many times in random
order. With out the bug fix, the kernel would soon crash. With the fix,
it did not. Ran on both stumpy and amd64-generic.

Signed-off-by: Paul Taysom <taysom@chromium.org>
Cc: Mandeep Baines <msb@chromium.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Tso <tytso@google.com>
Cc: Andrew Morton <akpm@google.com>
Cc: <linux-usb@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: <stable@kernel.org>
---
 fs/block_dev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index afe74dd..322cd05 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
 	 * But, for the strange corners, lets be cautious
 	 */
 	cleancache_flush_inode(mapping);
+	mapping->backing_dev_info = &default_backing_dev_info;
 }
 EXPORT_SYMBOL(invalidate_bdev);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] fs: Fix mod_timer crash when removing USB sticks
  2012-01-12 19:57 Paul Taysom
@ 2012-01-12 20:15 ` Greg KH
  2012-01-13 11:28 ` Sergei Shtylyov
  1 sibling, 0 replies; 25+ messages in thread
From: Greg KH @ 2012-01-12 20:15 UTC (permalink / raw)
  To: Paul Taysom
  Cc: Paul Taysom, Mandeep Baines, Jens Axboe, Theodore Tso,
	Andrew Morton, linux-usb, linux-kernel, Alexander Viro,
	linux-fsdevel

On Thu, Jan 12, 2012 at 11:57:42AM -0800, Paul Taysom wrote:
> From: Paul Taysom <taysom@google.com>
> 
> A USB stick with a ext file system on it, would occasionally crash
> when the stick was pulled.
> 
> The problem was a timer was being set on the Backing Device Interface,
> bdi, after the USB device had been removed and the bdi had been
> unregistered. The bdi would then be later reinitialized by zeroing
> the timer without removing from the timer from the timer queue.
> This would eventually result in a kernel crash (NULL ptr dereference).
> 
> When the bdi is unregistered, the dev field is set to NULL. This
> indication is used by bdi_unregister to only unregister the device
> once.
> 
> Fix: When the backing device is invalidated, the mapping backing_dev_info
> should be redirected to the default_backing_dev_info.
> 
> Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
> file systems on it. Inserted and removed USB sticks many times in random
> order. With out the bug fix, the kernel would soon crash. With the fix,
> it did not. Ran on both stumpy and amd64-generic.
> 
> Signed-off-by: Paul Taysom <taysom@chromium.org>
> Downstream-bug-report: http://crosbug.com/24165
> Cc: Mandeep Baines <msb@chromium.org>
> Cc: Greg KH <greg@kroah.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Theodore Tso <tytso@google.com>
> Cc: Andrew Morton <akpm@google.com>
> Cc: <linux-usb@vger.kernel.org>
> Cc: <linux-kernel@vger.kernel.org>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: <linux-fsdevel@vger.kernel.org>
> ---
>  fs/block_dev.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index afe74dd..9f9b617 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1,4 +1,4 @@
> -/*
> +nvalid/*
>   *  linux/fs/block_dev.c
>   *
>   *  Copyright (C) 1991, 1992  Linus Torvalds

Minor nit, I don't think you ment the first line of the file to be
changed this way....

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] fs: Fix mod_timer crash when removing USB sticks
@ 2012-01-12 19:57 Paul Taysom
  2012-01-12 20:15 ` Greg KH
  2012-01-13 11:28 ` Sergei Shtylyov
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Taysom @ 2012-01-12 19:57 UTC (permalink / raw)
  Cc: Paul Taysom, Paul Taysom, Mandeep Baines, Greg KH, Jens Axboe,
	Theodore Tso, Andrew Morton, linux-usb, linux-kernel,
	Alexander Viro, linux-fsdevel

From: Paul Taysom <taysom@google.com>

A USB stick with a ext file system on it, would occasionally crash
when the stick was pulled.

The problem was a timer was being set on the Backing Device Interface,
bdi, after the USB device had been removed and the bdi had been
unregistered. The bdi would then be later reinitialized by zeroing
the timer without removing from the timer from the timer queue.
This would eventually result in a kernel crash (NULL ptr dereference).

When the bdi is unregistered, the dev field is set to NULL. This
indication is used by bdi_unregister to only unregister the device
once.

Fix: When the backing device is invalidated, the mapping backing_dev_info
should be redirected to the default_backing_dev_info.

Created 3 USB sticks with ext2, ext4 and one with both apple and DOS
file systems on it. Inserted and removed USB sticks many times in random
order. With out the bug fix, the kernel would soon crash. With the fix,
it did not. Ran on both stumpy and amd64-generic.

Signed-off-by: Paul Taysom <taysom@chromium.org>
Downstream-bug-report: http://crosbug.com/24165
Cc: Mandeep Baines <msb@chromium.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Theodore Tso <tytso@google.com>
Cc: Andrew Morton <akpm@google.com>
Cc: <linux-usb@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <linux-fsdevel@vger.kernel.org>
---
 fs/block_dev.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index afe74dd..9f9b617 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1,4 +1,4 @@
-/*
+nvalid/*
  *  linux/fs/block_dev.c
  *
  *  Copyright (C) 1991, 1992  Linus Torvalds
@@ -110,6 +110,7 @@ void invalidate_bdev(struct block_device *bdev)
 	 * But, for the strange corners, lets be cautious
 	 */
 	cleancache_flush_inode(mapping);
+	mapping->backing_dev_info = &default_backing_dev_info;
 }
 EXPORT_SYMBOL(invalidate_bdev);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2012-03-22 16:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-12 21:15 [PATCH] fs: Fix mod_timer crash when removing USB sticks Paul Taysom
2012-01-12 21:38 ` Greg KH
2012-01-12 21:53   ` Mandeep Singh Baines
2012-01-12 22:02     ` Greg KH
2012-01-12 22:11     ` Theodore Tso
     [not found]     ` <CAGagf4dk4KsZSkaWTO9Yegi=_wRJsYBPgfyks1z=wMZJV8gX0w@mail.gmail.com>
2012-01-12 22:35       ` Mandeep Singh Baines
2012-01-12 23:23         ` Andrew Morton
2012-01-13  5:42 ` Josh Boyer
2012-01-13 15:39   ` Paul Taysom
  -- strict thread matches above, loose matches on Subject: below --
2012-01-12 21:57 Paul Taysom
2012-03-16 17:36 ` Greg KH
2012-03-16 19:29   ` Paul Taysom
2012-03-16 19:43     ` Greg KH
2012-03-16 21:10       ` Theodore Tso
2012-03-17  0:06         ` Greg KH
2012-03-17 14:21         ` Alan Stern
2012-03-18  3:44           ` Ted Ts'o
2012-03-18 20:23             ` Alan Stern
2012-03-18 22:25               ` Mandeep Singh Baines
2012-03-20  0:24                 ` Paul Taysom
2012-03-20  2:02                   ` Alan Stern
2012-03-22 16:15                     ` Paul Taysom
2012-01-12 19:57 Paul Taysom
2012-01-12 20:15 ` Greg KH
2012-01-13 11:28 ` Sergei Shtylyov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).