* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
[not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
@ 2011-09-17 17:34 ` Alan Stern
2011-09-18 23:00 ` Ben Hutchings
0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2011-09-17 17:34 UTC (permalink / raw)
To: Rocko Requin; +Cc: tytso, Kernel development list
On Sat, 17 Sep 2011, Rocko Requin wrote:
> > Why were you using gnome-terminal? You should be running the tests at
> > a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
>
> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
That's weird. Maybe the screen updates change some timing.
> > Here's another patch to address the new problem. You can apply it on
> > top of all the other patches.
>
> Attached is the crash log I get with the latest patch applied.
Okay, more fallout from the same problem. Here's an updated version of
the previous patch.
These are really just bandaid-type fixes. The people who understand
the block layer ought to be involved. If this keeps up much longer
I'll get in touch with them.
Alan Stern
Index: usb-3.1/block/blk-core.c
===================================================================
--- usb-3.1.orig/block/blk-core.c
+++ usb-3.1/block/blk-core.c
@@ -367,8 +367,10 @@ void blk_cleanup_queue(struct request_qu
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);
- if (q->elevator)
+ if (q->elevator) {
elevator_exit(q->elevator);
+ q->elevator = NULL;
+ }
blk_throtl_exit(q);
Index: usb-3.1/block/elevator.c
===================================================================
--- usb-3.1.orig/block/elevator.c
+++ usb-3.1/block/elevator.c
@@ -769,7 +769,7 @@ void elv_put_request(struct request_queu
{
struct elevator_queue *e = q->elevator;
- if (e->ops->elevator_put_req_fn)
+ if (e && e->ops->elevator_put_req_fn)
e->ops->elevator_put_req_fn(rq);
}
@@ -812,7 +812,7 @@ void elv_completed_request(struct reques
*/
if (blk_account_rq(rq)) {
q->in_flight[rq_is_sync(rq)]--;
- if ((rq->cmd_flags & REQ_SORTED) &&
+ if ((rq->cmd_flags & REQ_SORTED) && e &&
e->ops->elevator_completed_req_fn)
e->ops->elevator_completed_req_fn(q, rq);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-17 17:34 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed Alan Stern
@ 2011-09-18 23:00 ` Ben Hutchings
2011-09-20 7:32 ` Jun'ichi Nomura
0 siblings, 1 reply; 10+ messages in thread
From: Ben Hutchings @ 2011-09-18 23:00 UTC (permalink / raw)
To: Alan Stern, James Bottomley
Cc: Rocko Requin, tytso, Kernel development list, linux-scsi
[-- Attachment #1: Type: text/plain, Size: 1864 bytes --]
On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
> On Sat, 17 Sep 2011, Rocko Requin wrote:
>
> > > Why were you using gnome-terminal? You should be running the tests at
> > > a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
> >
> > Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>
> That's weird. Maybe the screen updates change some timing.
>
> > > Here's another patch to address the new problem. You can apply it on
> > > top of all the other patches.
> >
> > Attached is the crash log I get with the latest patch applied.
>
> Okay, more fallout from the same problem. Here's an updated version of
> the previous patch.
[...]
There have been reports of this in Debian going back to 2.6.39:
http://bugs.debian.org/631187
http://bugs.debian.org/636263
http://bugs.debian.org/642043
Plus possibly related crashes in elv_put_request after CD-ROM removal:
http://bugs.debian.org/633890
http://bugs.debian.org/634681
http://bugs.debian.org/636103
The former was also reported in Ubuntu since their 2.6.38-10:
https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
The result of the discussion there was that it appeared to be a
regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
("[SCSI] put stricter guards on queue dead checks") which was also
included in a stable update for 2.6.38.
There was also a report on bugzilla.kernel.org, though no-one can see
quite what that says now:
https://bugzilla.kernel.org/show_bug.cgi?id=38842
I also reported most of the above to James Bottomley and linux-scsi
nearly 2 months ago, to no response.
Ben.
--
Ben Hutchings
Power corrupts. Absolute power is kind of neat.
- John Lehman, Secretary of the US Navy 1981-1987
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-18 23:00 ` Ben Hutchings
@ 2011-09-20 7:32 ` Jun'ichi Nomura
2011-09-22 12:26 ` Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2011-09-20 7:32 UTC (permalink / raw)
To: Ben Hutchings, jaxboe
Cc: Alan Stern, James Bottomley, Rocko Requin, tytso,
Kernel development list, linux-scsi
On 09/19/11 08:00, Ben Hutchings wrote:
> On Sat, 2011-09-17 at 13:34 -0400, Alan Stern wrote:
>> On Sat, 17 Sep 2011, Rocko Requin wrote:
>>
>>>> Why were you using gnome-terminal? You should be running the tests at
>>>> a console VT, not under X at all. Ctrl-Alt-F2 or the equivalent...
>>>
>>> Because with Ted's patch it doesn't crash when run from a console VT, even with an X server running.
>>
>> That's weird. Maybe the screen updates change some timing.
>>
>>>> Here's another patch to address the new problem. You can apply it on
>>>> top of all the other patches.
>>>
>>> Attached is the crash log I get with the latest patch applied.
>>
>> Okay, more fallout from the same problem. Here's an updated version of
>> the previous patch.
> [...]
>
> There have been reports of this in Debian going back to 2.6.39:
>
> http://bugs.debian.org/631187
> http://bugs.debian.org/636263
> http://bugs.debian.org/642043
>
> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>
> http://bugs.debian.org/633890
> http://bugs.debian.org/634681
> http://bugs.debian.org/636103
>
> The former was also reported in Ubuntu since their 2.6.38-10:
>
> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>
> The result of the discussion there was that it appeared to be a
> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> ("[SCSI] put stricter guards on queue dead checks") which was also
> included in a stable update for 2.6.38.
>
> There was also a report on bugzilla.kernel.org, though no-one can see
> quite what that says now:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>
> I also reported most of the above to James Bottomley and linux-scsi
> nearly 2 months ago, to no response.
I've reported a similar oops related to the above commit:
[BUG] Oops when SCSI device under multipath is removed
https://lkml.org/lkml/2011/8/10/11
Elevator being removed is the core of the problem.
And the essential issue seems 2 different models of queue/driver relation
implied by queue_lock.
If reverting the commit is not an option,
until somebody comes up to fix the essential issue,
the patch below should close the regressions introduced by the commit.
Thanks,
--
Jun'ichi Nomura, NEC Corporation
This patch moves elevator_exit() and blk_throtl_exit() from
blk_cleanup_queue() to blk_release_queue() when it is possible.
elevator_exit() and blk_throtl_exit() were called in blk_cleanup_queue()
because they use queue_lock.
There are 2 types of queue_locks:
a) supplied by driver (via blk_init_queue)
b) embedded in struct request_queue (__queue_lock)
When queue_lock is supplied by driver, there is no guarantee that
the pointer is valid after blk_cleanup_queue(), so they have to be
called in blk_cleanup_queue().
In this case, the driver has to make sure nobody is using the queue
before calling blk_cleanup_queue().
However, OTOH, if queue_lock is '__queue_lock' in request_queue,
blk_release_queue() is better place for freeing structures
because the block layer knows for sure there is no reference.
This patch is ugly but should fix various oopses introduced by this change:
86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
[SCSI] put stricter guards on queue dead checks
For example:
https://lkml.org/lkml/2011/8/10/11
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Index: linux-3.1-rc4/block/blk-core.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-core.c 2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk-core.c 2011-09-20 15:53:23.496814819 +0900
@@ -352,6 +352,14 @@
* unexpectedly as some queue cleanup components like elevator_exit() and
* blk_throtl_exit() need queue lock.
*/
+void blk_release_queue_components_with_queuelock(struct request_queue *q)
+{
+ if (q->elevator)
+ elevator_exit(q->elevator);
+
+ blk_throtl_exit(q);
+}
+
void blk_cleanup_queue(struct request_queue *q)
{
/*
@@ -367,10 +375,12 @@
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);
- if (q->elevator)
- elevator_exit(q->elevator);
-
- blk_throtl_exit(q);
+ /*
+ * A driver supplied the queue lock.
+ * Cleanup components while the queue lock is valid.
+ */
+ if (q->queue_lock != &q->__queue_lock)
+ blk_release_queue_components_with_queuelock(q);
blk_put_queue(q);
}
Index: linux-3.1-rc4/block/blk-sysfs.c
===================================================================
--- linux-3.1-rc4.orig/block/blk-sysfs.c 2011-09-19 09:38:51.000000000 +0900
+++ linux-3.1-rc4/block/blk-sysfs.c 2011-09-20 15:57:50.358807023 +0900
@@ -477,6 +477,9 @@
blk_sync_queue(q);
+ if (q->queue_lock == &q->__queue_lock)
+ blk_release_queue_components_with_queuelock(q);
+
if (rl->rq_pool)
mempool_destroy(rl->rq_pool);
Index: linux-3.1-rc4/block/blk.h
===================================================================
--- linux-3.1-rc4.orig/block/blk.h 2011-08-29 13:16:01.000000000 +0900
+++ linux-3.1-rc4/block/blk.h 2011-09-20 15:57:38.306807136 +0900
@@ -25,6 +25,9 @@
void blk_add_timer(struct request *);
void __generic_unplug_device(struct request_queue *);
+/* Wrapper to release functions to be called while queue_lock is valid */
+void blk_release_queue_components_with_queuelock(struct request_queue *q);
+
/*
* Internal atomic flags for request handling
*/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-20 7:32 ` Jun'ichi Nomura
@ 2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:35 ` James Bottomley
2011-09-22 15:16 ` Alan Stern
0 siblings, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2011-09-22 12:26 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Ben Hutchings, jaxboe, Alan Stern, James Bottomley, Rocko Requin,
tytso, Kernel development list, linux-scsi
On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> On 09/19/11 08:00, Ben Hutchings wrote:
[ .. ]
>>
>> There have been reports of this in Debian going back to 2.6.39:
>>
>> http://bugs.debian.org/631187
>> http://bugs.debian.org/636263
>> http://bugs.debian.org/642043
>>
>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>
>> http://bugs.debian.org/633890
>> http://bugs.debian.org/634681
>> http://bugs.debian.org/636103
>>
>> The former was also reported in Ubuntu since their 2.6.38-10:
>>
>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>
>> The result of the discussion there was that it appeared to be a
>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
>> ("[SCSI] put stricter guards on queue dead checks") which was also
>> included in a stable update for 2.6.38.
>>
>> There was also a report on bugzilla.kernel.org, though no-one can see
>> quite what that says now:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>
>> I also reported most of the above to James Bottomley and linux-scsi
>> nearly 2 months ago, to no response.
>
> I've reported a similar oops related to the above commit:
> [BUG] Oops when SCSI device under multipath is removed
> https://lkml.org/lkml/2011/8/10/11
>
> Elevator being removed is the core of the problem.
> And the essential issue seems 2 different models of queue/driver relation
> implied by queue_lock.
>
> If reverting the commit is not an option,
> until somebody comes up to fix the essential issue,
> the patch below should close the regressions introduced by the commit.
>
Why do you have to do it that complicated?
Couldn't we just state that any external lock is being disconnected from
queue_lock after blk_cleanup_queue()?
Then something like this should suffice here:
diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);
- if (q->elevator)
- elevator_exit(q->elevator);
-
- blk_throtl_exit(q);
+ if (q->queue_lock != q->__queue_lock)
+ q->queue_lock = q->__queue_lock;
blk_put_queue(q);
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
blk_sync_queue(q);
+ if (q->elevator)
+ elevator_exit(q->elevator);
+
+ blk_throtl_exit(q);
+
if (rl->rq_pool)
mempool_destroy(rl->rq_pool);
And yeah, I find it pretty annoying, too.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-22 12:26 ` Hannes Reinecke
@ 2011-09-22 12:35 ` James Bottomley
2011-09-22 15:16 ` Alan Stern
1 sibling, 0 replies; 10+ messages in thread
From: James Bottomley @ 2011-09-22 12:35 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jun'ichi Nomura, Ben Hutchings, jaxboe, Alan Stern,
Rocko Requin, tytso, Kernel development list, linux-scsi
On Thu, 2011-09-22 at 14:26 +0200, Hannes Reinecke wrote:
> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> >
> > I've reported a similar oops related to the above commit:
> > [BUG] Oops when SCSI device under multipath is removed
> > https://lkml.org/lkml/2011/8/10/11
> >
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> >
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> >
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
>
> Then something like this should suffice here:
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
> queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
> mutex_unlock(&q->sysfs_lock);
>
> - if (q->elevator)
> - elevator_exit(q->elevator);
> -
> - blk_throtl_exit(q);
> + if (q->queue_lock != q->__queue_lock)
> + q->queue_lock = q->__queue_lock;
>
> blk_put_queue(q);
> }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
>
> blk_sync_queue(q);
>
> + if (q->elevator)
> + elevator_exit(q->elevator);
> +
> + blk_throtl_exit(q);
> +
OK, I'll buy this one (when you fix the whitespace issue ... you have
spaces instead of tabs).
The fact that the lock check/replacement doesn't actually need any
locking is probably worthy of a comment.
James
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:35 ` James Bottomley
@ 2011-09-22 15:16 ` Alan Stern
2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
1 sibling, 1 reply; 10+ messages in thread
From: Alan Stern @ 2011-09-22 15:16 UTC (permalink / raw)
To: Rocko Requin
Cc: Hannes Reinecke, Jun'ichi Nomura, Ben Hutchings, jaxboe,
James Bottomley, tytso, Kernel development list, linux-scsi
Rocko:
Can you try testing this patch instead of all the patches I sent to
you (but keep Ted's patch)?
Alan Stern
On Thu, 22 Sep 2011, Hannes Reinecke wrote:
> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > On 09/19/11 08:00, Ben Hutchings wrote:
> [ .. ]
> >>
> >> There have been reports of this in Debian going back to 2.6.39:
> >>
> >> http://bugs.debian.org/631187
> >> http://bugs.debian.org/636263
> >> http://bugs.debian.org/642043
> >>
> >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> >>
> >> http://bugs.debian.org/633890
> >> http://bugs.debian.org/634681
> >> http://bugs.debian.org/636103
> >>
> >> The former was also reported in Ubuntu since their 2.6.38-10:
> >>
> >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> >>
> >> The result of the discussion there was that it appeared to be a
> >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> >> ("[SCSI] put stricter guards on queue dead checks") which was also
> >> included in a stable update for 2.6.38.
> >>
> >> There was also a report on bugzilla.kernel.org, though no-one can see
> >> quite what that says now:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> >>
> >> I also reported most of the above to James Bottomley and linux-scsi
> >> nearly 2 months ago, to no response.
> >
> > I've reported a similar oops related to the above commit:
> > [BUG] Oops when SCSI device under multipath is removed
> > https://lkml.org/lkml/2011/8/10/11
> >
> > Elevator being removed is the core of the problem.
> > And the essential issue seems 2 different models of queue/driver relation
> > implied by queue_lock.
> >
> > If reverting the commit is not an option,
> > until somebody comes up to fix the essential issue,
> > the patch below should close the regressions introduced by the commit.
> >
> Why do you have to do it that complicated?
> Couldn't we just state that any external lock is being disconnected from
> queue_lock after blk_cleanup_queue()?
>
> Then something like this should suffice here:
diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..a4ac005 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);
- if (q->elevator)
- elevator_exit(q->elevator);
-
- blk_throtl_exit(q);
+ if (q->queue_lock != q->__queue_lock)
+ q->queue_lock = q->__queue_lock;
blk_put_queue(q);
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
blk_sync_queue(q);
+ if (q->elevator)
+ elevator_exit(q->elevator);
+
+ blk_throtl_exit(q);
+
if (rl->rq_pool)
mempool_destroy(rl->rq_pool);
> And yeah, I find it pretty annoying, too.
>
> Cheers,
>
> Hannes
>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-22 15:16 ` Alan Stern
@ 2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
2011-09-22 16:32 ` Hannes Reinecke
0 siblings, 1 reply; 10+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-09-22 16:20 UTC (permalink / raw)
To: Alan Stern
Cc: Rocko Requin, Hannes Reinecke, Jun'ichi Nomura,
Ben Hutchings, jaxboe, James Bottomley, tytso,
Kernel development list, linux-scsi
On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
> Rocko:
>
> Can you try testing this patch instead of all the patches I sent to
> you (but keep Ted's patch)?
>
> Alan Stern
>
> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>
> > On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
> > > On 09/19/11 08:00, Ben Hutchings wrote:
> > [ .. ]
> > >>
> > >> There have been reports of this in Debian going back to 2.6.39:
> > >>
> > >> http://bugs.debian.org/631187
> > >> http://bugs.debian.org/636263
> > >> http://bugs.debian.org/642043
> > >>
> > >> Plus possibly related crashes in elv_put_request after CD-ROM removal:
> > >>
> > >> http://bugs.debian.org/633890
> > >> http://bugs.debian.org/634681
> > >> http://bugs.debian.org/636103
> > >>
> > >> The former was also reported in Ubuntu since their 2.6.38-10:
> > >>
> > >> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
> > >>
> > >> The result of the discussion there was that it appeared to be a
> > >> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> > >> ("[SCSI] put stricter guards on queue dead checks") which was also
> > >> included in a stable update for 2.6.38.
> > >>
> > >> There was also a report on bugzilla.kernel.org, though no-one can see
> > >> quite what that says now:
> > >>
> > >> https://bugzilla.kernel.org/show_bug.cgi?id=38842
> > >>
> > >> I also reported most of the above to James Bottomley and linux-scsi
> > >> nearly 2 months ago, to no response.
> > >
> > > I've reported a similar oops related to the above commit:
> > > [BUG] Oops when SCSI device under multipath is removed
> > > https://lkml.org/lkml/2011/8/10/11
> > >
> > > Elevator being removed is the core of the problem.
> > > And the essential issue seems 2 different models of queue/driver relation
> > > implied by queue_lock.
> > >
> > > If reverting the commit is not an option,
> > > until somebody comes up to fix the essential issue,
> > > the patch below should close the regressions introduced by the commit.
> > >
> > Why do you have to do it that complicated?
> > Couldn't we just state that any external lock is being disconnected from
> > queue_lock after blk_cleanup_queue()?
> >
> > Then something like this should suffice here:
>
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..a4ac005 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
> queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
> mutex_unlock(&q->sysfs_lock);
>
> - if (q->elevator)
> - elevator_exit(q->elevator);
> -
> - blk_throtl_exit(q);
> + if (q->queue_lock != q->__queue_lock)
> + q->queue_lock = q->__queue_lock;
That should be &q->__queue_lock.
Regards,
Cascardo.
>
> blk_put_queue(q);
> }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
>
> blk_sync_queue(q);
>
> + if (q->elevator)
> + elevator_exit(q->elevator);
> +
> + blk_throtl_exit(q);
> +
> if (rl->rq_pool)
> mempool_destroy(rl->rq_pool);
>
>
> > And yeah, I find it pretty annoying, too.
> >
> > Cheers,
> >
> > Hannes
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
@ 2011-09-22 16:32 ` Hannes Reinecke
0 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2011-09-22 16:32 UTC (permalink / raw)
To: Thadeu Lima de Souza Cascardo
Cc: Alan Stern, Rocko Requin, Jun'ichi Nomura, Ben Hutchings,
jaxboe, James Bottomley, tytso, Kernel development list,
linux-scsi
On 09/22/2011 06:20 PM, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Sep 22, 2011 at 11:16:30AM -0400, Alan Stern wrote:
>> Rocko:
>>
>> Can you try testing this patch instead of all the patches I sent to
>> you (but keep Ted's patch)?
>>
>> Alan Stern
>>
>> On Thu, 22 Sep 2011, Hannes Reinecke wrote:
>>
>>> On 09/20/2011 09:32 AM, Jun'ichi Nomura wrote:
>>>> On 09/19/11 08:00, Ben Hutchings wrote:
>>> [ .. ]
>>>>>
>>>>> There have been reports of this in Debian going back to 2.6.39:
>>>>>
>>>>> http://bugs.debian.org/631187
>>>>> http://bugs.debian.org/636263
>>>>> http://bugs.debian.org/642043
>>>>>
>>>>> Plus possibly related crashes in elv_put_request after CD-ROM removal:
>>>>>
>>>>> http://bugs.debian.org/633890
>>>>> http://bugs.debian.org/634681
>>>>> http://bugs.debian.org/636103
>>>>>
>>>>> The former was also reported in Ubuntu since their 2.6.38-10:
>>>>>
>>>>> https://bugs.launchpad.net/debian/+source/linux-2.6/+bug/793796
>>>>>
>>>>> The result of the discussion there was that it appeared to be a
>>>>> regression due to commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
>>>>> ("[SCSI] put stricter guards on queue dead checks") which was also
>>>>> included in a stable update for 2.6.38.
>>>>>
>>>>> There was also a report on bugzilla.kernel.org, though no-one can see
>>>>> quite what that says now:
>>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=38842
>>>>>
>>>>> I also reported most of the above to James Bottomley and linux-scsi
>>>>> nearly 2 months ago, to no response.
>>>>
>>>> I've reported a similar oops related to the above commit:
>>>> [BUG] Oops when SCSI device under multipath is removed
>>>> https://lkml.org/lkml/2011/8/10/11
>>>>
>>>> Elevator being removed is the core of the problem.
>>>> And the essential issue seems 2 different models of queue/driver relation
>>>> implied by queue_lock.
>>>>
>>>> If reverting the commit is not an option,
>>>> until somebody comes up to fix the essential issue,
>>>> the patch below should close the regressions introduced by the commit.
>>>>
>>> Why do you have to do it that complicated?
>>> Couldn't we just state that any external lock is being disconnected from
>>> queue_lock after blk_cleanup_queue()?
>>>
>>> Then something like this should suffice here:
>>
>>
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 90e1ffd..a4ac005 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -367,10 +367,8 @@ void blk_cleanup_queue(struct request_queue *q)
>> queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>> mutex_unlock(&q->sysfs_lock);
>>
>> - if (q->elevator)
>> - elevator_exit(q->elevator);
>> -
>> - blk_throtl_exit(q);
>> + if (q->queue_lock != q->__queue_lock)
>> + q->queue_lock = q->__queue_lock;
>
> That should be &q->__queue_lock.
>
Why, but of course.
It's been fixed with the official patch
(cf block: Free queue resources at blk_release_queue())
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
[not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
@ 2011-09-23 15:18 ` Alan Stern
0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2011-09-23 15:18 UTC (permalink / raw)
To: Rocko Requin
Cc: hare, j-nomura, ben, jaxboe, james.bottomley, tytso,
linux-kernel, linux-scsi
On Thu, 22 Sep 2011, Rocko Requin wrote:
> > Rocko:
> >
> > Can you try testing this patch instead of all the patches I sent to
> > you (but keep Ted's patch)?
> >
> > Alan Stern
> >
>
> The simpler patch (in conjunction with Ted's patch) does stop the
> crashes. I get the same results as before: no kernel crashes
> (marvellous!), but the script's attempt to umount fails. I can then
> manually umount afterwards.
That sounds like a problem in the ext4 unmount implementation. Ted
should be able to help track it down.
What happens if you change your script to try two unmounts in a row?
In theory, the second should work like your manual unmount.
> Are these patches likely to be backported to the 3.0 kernel?
Yes, I should think so. The ext4/ext3 patches may be ported even
farther back.
Alan Stern
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed
[not found] <BAY151-W234D9A977DF076A732C2AAA1080@phx.gbl>
@ 2011-09-18 14:43 ` Alan Stern
0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2011-09-18 14:43 UTC (permalink / raw)
To: Rocko Requin; +Cc: tytso, linux-kernel
On Sun, 18 Sep 2011, Rocko Requin wrote:
> That patch worked, thanks - no more kernel crashes!
That's good news.
> I did see some slightly strange behaviour, though: the umounts issued
> by the script were not working correctly, so the drive was mounting
> successively on /dev/sdb1, /dev/sdc1, /dev/sdd1, etc again. After
> stopping the script, I was able to umount the various mountpoints
> manually, except for one (not the last one in the list, so not the
> most recently mounted one) which reported that the device was busy.
> lsof showed process jbd2/sdj1 three times, once with FD=cwd and
> TYPE=DIR, another with FD=rtd and TYPE=DIR, and lastly with FD=txt
> and TYPE=unknown.
Ted may have some ideas on how to find out what part of the unmount
commands is failing.
Alan Stern
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-09-23 15:18 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <BAY151-W32DCB4BAFEC97DD4913A12A1090@phx.gbl>
2011-09-17 17:34 ` [Bug 25832] kernel crashes when a mounted ext3/4 file system is physically removed Alan Stern
2011-09-18 23:00 ` Ben Hutchings
2011-09-20 7:32 ` Jun'ichi Nomura
2011-09-22 12:26 ` Hannes Reinecke
2011-09-22 12:35 ` James Bottomley
2011-09-22 15:16 ` Alan Stern
2011-09-22 16:20 ` Thadeu Lima de Souza Cascardo
2011-09-22 16:32 ` Hannes Reinecke
[not found] <BAY151-W234D9A977DF076A732C2AAA1080@phx.gbl>
2011-09-18 14:43 ` Alan Stern
[not found] <BAY151-W13DDCCEFEB7B68EE506214A10C0@phx.gbl>
2011-09-23 15:18 ` Alan Stern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).