* [PATCH] block: fix intermittent dm timeout based oops
From: Hannes Reinecke @ 2009-03-24  7:17 UTC
  To: James Bottomley; +Cc: linux-kernel, linux-scsi


Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame.  This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that blk_abort_request() sometimes does not delete the
timer: there is a small race window in which the request is already
marked as complete but its timer has not yet been removed.  Fix this
by splicing the usually empty local list back onto q->timeout_list.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-timeout.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index bbbdc4b..6213123 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -224,6 +224,12 @@ void blk_abort_queue(struct request_queue *q)
 	list_for_each_entry_safe(rq, tmp, &list, timeout_list)
 		blk_abort_request(rq);
 
+	/*
+	 * Occasionally, blk_abort_request() will return without
+	 * deleting the element from the list
+	 */
+	list_splice(&list, &q->timeout_list);
+
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
 }
-- 
1.5.3.2
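
The failure mode described in the patch can be reproduced in miniature
outside the kernel.  The following is a userspace sketch only, not the
kernel code: the list helpers are simplified stand-ins for the kernel's
list_head API, struct request is reduced to two fields, and
abort_request(), abort_queue(), setup_queue() and references_head() are
hypothetical names chosen for illustration.  It models the race by
letting abort_request() skip any request already marked complete, then
reports whether a request is still linked to the on-stack list head
when abort_queue() is about to return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Link all entries of 'from' in after 'to'; 'from' itself is not reset. */
static void list_splice(struct list_head *from, struct list_head *to)
{
	struct list_head *first = from->next, *last = from->prev;
	struct list_head *at = to->next;

	if (list_empty(from))
		return;
	first->prev = to;
	to->next = first;
	last->next = at;
	at->prev = last;
}

/* Hypothetical two-field model of a block request. */
struct request {
	struct list_head timeout_list;
	bool complete;	/* marked complete, timer not yet removed */
};

/* Models the race: a request already marked complete stays on the list. */
static void abort_request(struct request *rq)
{
	if (!rq->complete)
		list_del_init(&rq->timeout_list);
}

/* True if any request is still linked directly to 'head'. */
static bool references_head(const struct request *rqs, size_t n,
			    const struct list_head *head)
{
	for (size_t i = 0; i < n; i++)
		if (rqs[i].timeout_list.next == head ||
		    rqs[i].timeout_list.prev == head)
			return true;
	return false;
}

/* Queue n requests on q; the one at index 'completed' is marked complete. */
void setup_queue(struct list_head *q, struct request *rqs, size_t n,
		 size_t completed)
{
	init_list_head(q);
	for (size_t i = 0; i < n; i++) {
		rqs[i].complete = (i == completed);
		list_add_tail(&rqs[i].timeout_list, q);
	}
}

/*
 * Returns true if a request would be left pointing at the on-stack
 * list head once this function returns.
 */
bool abort_queue(struct list_head *q, struct request *rqs, size_t n,
		 bool splice_back)
{
	struct list_head list, *pos, *tmp;

	init_list_head(&list);
	list_splice(q, &list);	/* move everything to the local list */
	init_list_head(q);

	for (pos = list.next, tmp = pos->next; pos != &list;
	     pos = tmp, tmp = pos->next)
		abort_request(container_of(pos, struct request, timeout_list));

	if (splice_back)
		list_splice(&list, q);	/* the step added by the patch */

	return references_head(rqs, n, &list);
}
```

With splice_back set to false, the skipped request still points at the
local list head, which would dangle as soon as the function returns.
With splice_back set to true, the leftover entry is linked back onto
the queue's own list, which is the effect the patch aims for.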



* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Christof Schmitt @ 2009-04-03 14:32 UTC
  To: Jens Axboe; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

I just noticed that this fix is not upstream yet and I have seen test
cases hitting this problem.

Jens, are you going to include this patch, or should this go through
the SCSI tree?

--
Christof Schmitt


* Re: [PATCH] block: fix intermittent dm timeout based oops
From: James Bottomley @ 2009-04-03 14:35 UTC
  To: Christof Schmitt; +Cc: Jens Axboe, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, 2009-04-03 at 16:32 +0200, Christof Schmitt wrote:
> I just noticed that this fix is not upstream yet and I have seen test
> cases hitting this problem.
> 
> Jens, are you going to include this patch, or should this go through
> the SCSI tree?

It's a block patch, so it goes through the block tree ... it also needs
backporting to stable.

James




* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Jens Axboe @ 2009-04-03 18:01 UTC
  To: Christof Schmitt
  Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, Apr 03 2009, Christof Schmitt wrote:
> I just noticed that this fix is not upstream yet and I have seen test
> cases hitting this problem.
> 
> Jens, are you going to include this patch, or should this go through
> the SCSI tree?

I will include it, and CC stable as well.

-- 
Jens Axboe



* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Christof Schmitt @ 2009-04-23  8:21 UTC
  To: Jens Axboe; +Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Fri, Apr 03, 2009 at 08:01:06PM +0200, Jens Axboe wrote:
> On Fri, Apr 03 2009, Christof Schmitt wrote:
> > I just noticed that this fix is not upstream yet and I have seen test
> > cases hitting this problem.
> > 
> > Jens, are you going to include this patch, or should this go through
> > the SCSI tree?
> 
> I will include it, and CC stable as well.

Any update on this? 2.6.30-rc3 does not have the patch.

--
Christof Schmitt


* Re: [PATCH] block: fix intermittent dm timeout based oops
From: Jens Axboe @ 2009-04-23  8:31 UTC
  To: Christof Schmitt
  Cc: James Bottomley, linux-kernel, linux-scsi, Hannes Reinecke

On Thu, Apr 23 2009, Christof Schmitt wrote:
> On Fri, Apr 03, 2009 at 08:01:06PM +0200, Jens Axboe wrote:
> > On Fri, Apr 03 2009, Christof Schmitt wrote:
> > > I just noticed that this fix is not upstream yet and I have seen test
> > > cases hitting this problem.
> > > 
> > > Jens, are you going to include this patch, or should this go through
> > > the SCSI tree?
> > 
> > I will include it, and CC stable as well.
> 
> Any update on this? 2.6.30-rc3 does not have the patch.

I'll be sure to include it today; I need to fix one more thing before
sending a new pull request.

-- 
Jens Axboe

