All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] as i/o hang with aacraid driver 2.6.0-test1
@ 2003-07-15 23:02 Mark Haverkamp
  2003-07-16  1:40 ` Nick Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-15 23:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Cliff White, linux-scsi

Daniel McNeil and I have been debugging a hang with the aacraid driver
using the as I/O scheduler.  We found that scsi_request_fn would
de-queue a request and later re-queued it.  This left the
as_data->nr_dispatched variable in an inconsistent state (it was never
being decremented back to zero).  We added a call to
elv_completed_request to clean up the state before re-adding the
request.  This has fixed our hang problem.  The linux-scsi list is being
copied for review of the scsi_lib.c change.

===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Tue Jul 15 15:47:45 2003
@@ -1215,6 +1215,7 @@
 	spin_lock_irq(q->queue_lock);
 	if (blk_rq_tagged(req))
 		blk_queue_end_tag(q, req);
+	elv_completed_request(q, req);
 	__elv_add_request(q, req, 0, 0);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-15 23:02 [PATCH] as i/o hang with aacraid driver 2.6.0-test1 Mark Haverkamp
@ 2003-07-16  1:40 ` Nick Piggin
  2003-07-16  5:53   ` Jens Axboe
  2003-07-16 12:41 ` James Bottomley
  2003-07-16 13:06 ` Alan Cox
  2 siblings, 1 reply; 57+ messages in thread
From: Nick Piggin @ 2003-07-16  1:40 UTC (permalink / raw)
  To: Mark Haverkamp; +Cc: Andrew Morton, Cliff White, linux-scsi

Thanks Mark, great. It looks fine.


>Daniel McNeil and I have been debugging a hang with the aacraid driver
>using the as I/O scheduler.  We found that scsi_request_fn would
>de-queue a request and later re-queued it.  This left the
>as_data->nr_dispatched variable in an inconsistent state (it was never
>being decremented back to zero).  We added a call to
>elv_completed_request to clean up the state before re-adding the
>request.  This has fixed our hang problem.  The linux-scsi list is being
>copied for review of the scsi_lib.c change.
>
>===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
>--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
>+++ edited/drivers/scsi/scsi_lib.c	Tue Jul 15 15:47:45 2003
>@@ -1215,6 +1215,7 @@
> 	spin_lock_irq(q->queue_lock);
> 	if (blk_rq_tagged(req))
> 		blk_queue_end_tag(q, req);
>+	elv_completed_request(q, req);
> 	__elv_add_request(q, req, 0, 0);
> 	sdev->device_busy--;
> 	if(sdev->device_busy == 0)
>
>  
>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16  1:40 ` Nick Piggin
@ 2003-07-16  5:53   ` Jens Axboe
  0 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2003-07-16  5:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mark Haverkamp, Andrew Morton, Cliff White, linux-scsi

On Wed, Jul 16 2003, Nick Piggin wrote:
> Thanks Mark, great. It looks fine.
> 
> 
> >Daniel McNeil and I have been debugging a hang with the aacraid driver
> >using the as I/O scheduler.  We found that scsi_request_fn would
> >de-queue a request and later re-queued it.  This left the
> >as_data->nr_dispatched variable in an inconsistent state (it was never
> >being decremented back to zero).  We added a call to
> >elv_completed_request to clean up the state before re-adding the
> >request.  This has fixed our hang problem.  The linux-scsi list is being
> >copied for review of the scsi_lib.c change.
> >
> >===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
> >--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
> >+++ edited/drivers/scsi/scsi_lib.c	Tue Jul 15 15:47:45 2003
> >@@ -1215,6 +1215,7 @@
> >	spin_lock_irq(q->queue_lock);
> >	if (blk_rq_tagged(req))
> >		blk_queue_end_tag(q, req);
> >+	elv_completed_request(q, req);
> >	__elv_add_request(q, req, 0, 0);
> >	sdev->device_busy--;
> >	if(sdev->device_busy == 0)

Ehm no not really. The request isn't completed here, it looks more like
a hack to keep AS happy. We ought to make a more generic solution to
this problem, other drivers could run into it as well.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-15 23:02 [PATCH] as i/o hang with aacraid driver 2.6.0-test1 Mark Haverkamp
  2003-07-16  1:40 ` Nick Piggin
@ 2003-07-16 12:41 ` James Bottomley
  2003-07-16 12:45   ` Jens Axboe
  2003-07-16 13:06 ` Alan Cox
  2 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2003-07-16 12:41 UTC (permalink / raw)
  To: Mark Haverkamp
  Cc: Nick Piggin, Andrew Morton, Cliff White, linux-scsi, Jens Axboe

On Tue, 2003-07-15 at 19:02, Mark Haverkamp wrote:
> Daniel McNeil and I have been debugging a hang with the aacraid driver
> using the as I/O scheduler.  We found that scsi_request_fn would
> de-queue a request and later re-queued it.  This left the
> as_data->nr_dispatched variable in an inconsistent state (it was never
> being decremented back to zero).  We added a call to
> elv_completed_request to clean up the state before re-adding the
> request.  This has fixed our hang problem.  The linux-scsi list is being
> copied for review of the scsi_lib.c change.
> 
> ===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
> --- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
> +++ edited/drivers/scsi/scsi_lib.c	Tue Jul 15 15:47:45 2003
> @@ -1215,6 +1215,7 @@
>  	spin_lock_irq(q->queue_lock);
>  	if (blk_rq_tagged(req))
>  		blk_queue_end_tag(q, req);
> +	elv_completed_request(q, req);
>  	__elv_add_request(q, req, 0, 0);

This doen't look right to me.

SCSI expects to be able to push back uncompleted requests onto the
request queue.  The fact that you seem to be calling a completion
function for an uncompleted request is what's causing me heartburn.

This code used to work with the old scheduler (we extensively tested it
around the 2.5.6x timeframe because of other changes), so what I'd
really like to know is what changed in the scheduler assumptions to
necessitate this?

If this change is suddenly required, there are several other places in
our queueing functions that will need similar modifications.

Could I have a definitive statement from the I/O scheduler people about
the procedure for pushing back uncompleted I/O on the block queue just
so we all get back on the same page?

Thanks,

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 12:41 ` James Bottomley
@ 2003-07-16 12:45   ` Jens Axboe
  2003-07-16 12:56     ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-16 12:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, Jul 16 2003, James Bottomley wrote:
> On Tue, 2003-07-15 at 19:02, Mark Haverkamp wrote:
> > Daniel McNeil and I have been debugging a hang with the aacraid driver
> > using the as I/O scheduler.  We found that scsi_request_fn would
> > de-queue a request and later re-queued it.  This left the
> > as_data->nr_dispatched variable in an inconsistent state (it was never
> > being decremented back to zero).  We added a call to
> > elv_completed_request to clean up the state before re-adding the
> > request.  This has fixed our hang problem.  The linux-scsi list is being
> > copied for review of the scsi_lib.c change.
> > 
> > ===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
> > --- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
> > +++ edited/drivers/scsi/scsi_lib.c	Tue Jul 15 15:47:45 2003
> > @@ -1215,6 +1215,7 @@
> >  	spin_lock_irq(q->queue_lock);
> >  	if (blk_rq_tagged(req))
> >  		blk_queue_end_tag(q, req);
> > +	elv_completed_request(q, req);
> >  	__elv_add_request(q, req, 0, 0);
> 
> This doen't look right to me.
> 
> SCSI expects to be able to push back uncompleted requests onto the
> request queue.  The fact that you seem to be calling a completion
> function for an uncompleted request is what's causing me heartburn.

Completely agree, see my earlier main on the subject.

> This code used to work with the old scheduler (we extensively tested it
> around the 2.5.6x timeframe because of other changes), so what I'd
> really like to know is what changed in the scheduler assumptions to
> necessitate this?

AS scheduler needs paired started/completed calls, the old scheduler
didn't. In fact elv_completed_request() was introduced for that
purpose.

> If this change is suddenly required, there are several other places in
> our queueing functions that will need similar modifications.

Indeed

> Could I have a definitive statement from the I/O scheduler people about
> the procedure for pushing back uncompleted I/O on the block queue just
> so we all get back on the same page?

SCSI does it right, don't worry. It's a block layer problem that we need
to find out how to correctly make sure that we deal it. Basically
elv_next_request() does an implicit elv_started_request(), at least that
is what AS assumes. So request re-adds should notify the scheduler of
that fact.

The hack posted here works for them, but isn't correct.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 12:45   ` Jens Axboe
@ 2003-07-16 12:56     ` James Bottomley
  2003-07-16 13:20       ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2003-07-16 12:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, 2003-07-16 at 08:45, Jens Axboe wrote:
> Completely agree, see my earlier main on the subject.

Yes, saw that later...processed mails in the order seen.

> SCSI does it right, don't worry. It's a block layer problem that we need
> to find out how to correctly make sure that we deal it. Basically
> elv_next_request() does an implicit elv_started_request(), at least that
> is what AS assumes. So request re-adds should notify the scheduler of
> that fact.

What about exporting a more generic requeue function (perhaps one that
would take care of ending the tag)?

Our current requeue code looks like

	if (blk_rq_tagged(req))
		blk_queue_end_tag(q, req);
	__elv_add_request(q, req, 0, 0);

which looks slightly fragile given that we have to remember to end the
tag and then call a __ function (which are usually private).

If we called an explicit request requeue hook, schedulers that need to
adjust for requeueing would have a single place they could plug into?

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-15 23:02 [PATCH] as i/o hang with aacraid driver 2.6.0-test1 Mark Haverkamp
  2003-07-16  1:40 ` Nick Piggin
  2003-07-16 12:41 ` James Bottomley
@ 2003-07-16 13:06 ` Alan Cox
  2 siblings, 0 replies; 57+ messages in thread
From: Alan Cox @ 2003-07-16 13:06 UTC (permalink / raw)
  To: Mark Haverkamp; +Cc: Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Mer, 2003-07-16 at 00:02, Mark Haverkamp wrote:
> Daniel McNeil and I have been debugging a hang with the aacraid driver
> using the as I/O scheduler.  We found that scsi_request_fn would
> de-queue a request and later re-queued it.  This left the
> as_data->nr_dispatched variable in an inconsistent state (it was never
> being decremented back to zero).  We added a call to
> elv_completed_request to clean up the state before re-adding the
> request.  This has fixed our hang problem.  The linux-scsi list is being
> copied for review of the scsi_lib.c change.

Your aacraid is now terminally obsolete btw, the 2.4 one adds new cards
and fixes some bugs including one or two "under high load it dies" cases


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 12:56     ` James Bottomley
@ 2003-07-16 13:20       ` Jens Axboe
  2003-07-16 14:07         ` James Bottomley
  2003-07-16 22:45         ` Mark Haverkamp
  0 siblings, 2 replies; 57+ messages in thread
From: Jens Axboe @ 2003-07-16 13:20 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, Jul 16 2003, James Bottomley wrote:
> > SCSI does it right, don't worry. It's a block layer problem that we need
> > to find out how to correctly make sure that we deal it. Basically
> > elv_next_request() does an implicit elv_started_request(), at least that
> > is what AS assumes. So request re-adds should notify the scheduler of
> > that fact.
> 
> What about exporting a more generic requeue function (perhaps one that
> would take care of ending the tag)?
> 
> Our current requeue code looks like
> 
> 	if (blk_rq_tagged(req))
> 		blk_queue_end_tag(q, req);
> 	__elv_add_request(q, req, 0, 0);
> 
> which looks slightly fragile given that we have to remember to end the
> tag and then call a __ function (which are usually private).

Not usually private, but lockless.

> If we called an explicit request requeue hook, schedulers that need to
> adjust for requeueing would have a single place they could plug into?

blk_insert_request() is close, but it does a little more (it's useful
another place in the scsi queueing, though). How about something like
this? Then I can add the 'back on queue but not completed' stuff after
that.

===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 14:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Wed Jul 16 15:17:23 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	__elv_add_request(q, rq, 0, 0);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Mon Jun 30 03:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Wed Jul 16 15:17:52 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Sat Jul  5 08:52:51 2003
+++ edited/include/linux/blkdev.h	Wed Jul 16 15:18:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 13:20       ` Jens Axboe
@ 2003-07-16 14:07         ` James Bottomley
  2003-07-16 17:04           ` Jens Axboe
  2003-07-16 22:45         ` Mark Haverkamp
  1 sibling, 1 reply; 57+ messages in thread
From: James Bottomley @ 2003-07-16 14:07 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, 2003-07-16 at 09:20, Jens Axboe wrote:
> blk_insert_request() is close, but it does a little more (it's useful
> another place in the scsi queueing, though). How about something like
> this? Then I can add the 'back on queue but not completed' stuff after
> that.

Looks fine well except that it needs an EXPORT_SYMBOL.  How do you want
to submit?  I'm getting close to attempting a SCSI merge, so I can send
it in with that?

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 14:07         ` James Bottomley
@ 2003-07-16 17:04           ` Jens Axboe
  2003-07-17  8:57             ` Andrew Morton
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-16 17:04 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, Jul 16 2003, James Bottomley wrote:
> On Wed, 2003-07-16 at 09:20, Jens Axboe wrote:
> > blk_insert_request() is close, but it does a little more (it's useful
> > another place in the scsi queueing, though). How about something like
> > this? Then I can add the 'back on queue but not completed' stuff after
> > that.
> 
> Looks fine well except that it needs an EXPORT_SYMBOL.  How do you want
> to submit?  I'm getting close to attempting a SCSI merge, so I can send
> it in with that?

Sure go ahead. I'll add the missing bits when it is in.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 13:20       ` Jens Axboe
  2003-07-16 14:07         ` James Bottomley
@ 2003-07-16 22:45         ` Mark Haverkamp
  1 sibling, 0 replies; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-16 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James Bottomley, Nick Piggin, Andrew Morton, Cliff White, linux-scsi

On Wed, 2003-07-16 at 06:20, Jens Axboe wrote:
> On Wed, Jul 16 2003, James Bottomley wrote:
> > > SCSI does it right, don't worry. It's a block layer problem that we need
> > > to find out how to correctly make sure that we deal it. Basically
> > > elv_next_request() does an implicit elv_started_request(), at least that
> > > is what AS assumes. So request re-adds should notify the scheduler of
> > > that fact.
> > 
> > What about exporting a more generic requeue function (perhaps one that
> > would take care of ending the tag)?
> > 
> > Our current requeue code looks like
> > 
> > 	if (blk_rq_tagged(req))
> > 		blk_queue_end_tag(q, req);
> > 	__elv_add_request(q, req, 0, 0);
> > 
> > which looks slightly fragile given that we have to remember to end the
> > tag and then call a __ function (which are usually private).
> 
> Not usually private, but lockless.
> 
> > If we called an explicit request requeue hook, schedulers that need to
> > adjust for requeueing would have a single place they could plug into?
> 
> blk_insert_request() is close, but it does a little more (it's useful
> another place in the scsi queueing, though). How about something like
> this? Then I can add the 'back on queue but not completed' stuff after
> that.
> 
> ===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
> --- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 14:15:43 2003
> +++ edited/drivers/block/ll_rw_blk.c	Wed Jul 16 15:17:23 2003
> @@ -1494,6 +1494,23 @@
>  
>  	return rq;
>  }
> +/**
> + * blk_requeue_request - put a request back on queue
> + * @q:		request queue where request should be inserted
> + * @rq:		request to be inserted
> + *
> + * Description:
> + *    Drivers often keep queueing requests until the hardware cannot accept
> + *    more, when that condition happens we need to put the request back
> + *    on the queue. Must be called with queue lock held.
> + */
> +void blk_requeue_request(request_queue_t *q, struct request *rq)
> +{
> +	if (blk_rq_tagged(rq))
> +		blk_queue_end_tag(q, rq);
> +
> +	__elv_add_request(q, rq, 0, 0);
> +}
>  
>  /**
>   * blk_insert_request - insert a special request in to a request queue
> ===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
> --- 1.99/drivers/scsi/scsi_lib.c	Mon Jun 30 03:14:44 2003
> +++ edited/drivers/scsi/scsi_lib.c	Wed Jul 16 15:17:52 2003
> @@ -444,22 +444,8 @@
>   */
>  static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
>  {
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(q->queue_lock, flags);
> -	cmd->request->special = cmd;
> -	if (blk_rq_tagged(cmd->request))
> -		blk_queue_end_tag(q, cmd->request);
> -
> -	/*
> -	 * set REQ_SPECIAL - we have a command
> -	 * clear REQ_DONTPREP - we assume the sg table has been 
> -	 *	nuked so we need to set it up again.
> -	 */
> -	cmd->request->flags |= REQ_SPECIAL;
>  	cmd->request->flags &= ~REQ_DONTPREP;
> -	__elv_add_request(q, cmd->request, 0, 0);
> -	spin_unlock_irqrestore(q->queue_lock, flags);
> +	blk_insert_request(cmd->request, 1, cmd);

blk_insert_request should have the q pointer as the first arg shouldn't
it?

Mark.


>  
>  	scsi_run_queue(q);
>  }
> @@ -1213,9 +1199,7 @@
>  	 * later time.
>  	 */
>  	spin_lock_irq(q->queue_lock);
> -	if (blk_rq_tagged(req))
> -		blk_queue_end_tag(q, req);
> -	__elv_add_request(q, req, 0, 0);
> +	blk_requeue_request(q, req);
>  	sdev->device_busy--;
>  	if(sdev->device_busy == 0)
>  		blk_plug_device(q);
> ===== include/linux/blkdev.h 1.116 vs edited =====
> --- 1.116/include/linux/blkdev.h	Sat Jul  5 08:52:51 2003
> +++ edited/include/linux/blkdev.h	Wed Jul 16 15:18:06 2003
> @@ -491,6 +491,7 @@
>  extern struct request *blk_get_request(request_queue_t *, int, int);
>  extern void blk_put_request(struct request *);
>  extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
> +extern void blk_requeue_request(request_queue_t *, struct request *);
>  extern void blk_plug_device(request_queue_t *);
>  extern int blk_remove_plug(request_queue_t *);
>  extern void blk_recount_segments(request_queue_t *, struct bio *);
-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-16 17:04           ` Jens Axboe
@ 2003-07-17  8:57             ` Andrew Morton
  2003-07-17  8:59               ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Morton @ 2003-07-17  8:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James.Bottomley, markh, piggin, cliffw, linux-scsi


So this is what I ended up with.  Could we please have confirmation that it
fixes the aacraid hang?


 25-akpm/drivers/block/ll_rw_blk.c |   18 ++++++++++++++++++
 25-akpm/drivers/scsi/scsi_lib.c   |   20 ++------------------
 25-akpm/include/linux/blkdev.h    |    1 +
 3 files changed, 21 insertions(+), 18 deletions(-)

diff -puN drivers/block/ll_rw_blk.c~blk_requeue_request drivers/block/ll_rw_blk.c
--- 25/drivers/block/ll_rw_blk.c~blk_requeue_request	Wed Jul 16 16:02:39 2003
+++ 25-akpm/drivers/block/ll_rw_blk.c	Wed Jul 16 16:02:39 2003
@@ -1508,6 +1508,23 @@ struct request *blk_get_request(request_
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	__elv_add_request(q, rq, 0, 0);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2745,6 +2762,7 @@ EXPORT_SYMBOL(blk_hw_contig_segment);
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
diff -puN drivers/scsi/scsi_lib.c~blk_requeue_request drivers/scsi/scsi_lib.c
--- 25/drivers/scsi/scsi_lib.c~blk_requeue_request	Wed Jul 16 16:02:39 2003
+++ 25-akpm/drivers/scsi/scsi_lib.c	Wed Jul 16 16:03:28 2003
@@ -444,22 +444,8 @@ static void scsi_run_queue(struct reques
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@ static void scsi_request_fn(struct reque
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
diff -puN include/linux/blkdev.h~blk_requeue_request include/linux/blkdev.h
--- 25/include/linux/blkdev.h~blk_requeue_request	Wed Jul 16 16:02:39 2003
+++ 25-akpm/include/linux/blkdev.h	Wed Jul 16 16:02:39 2003
@@ -491,6 +491,7 @@ extern void __blk_attempt_remerge(reques
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);

_


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17  8:57             ` Andrew Morton
@ 2003-07-17  8:59               ` Jens Axboe
  2003-07-17  9:56                 ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17  8:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James.Bottomley, markh, piggin, cliffw, linux-scsi

On Thu, Jul 17 2003, Andrew Morton wrote:
> 
> So this is what I ended up with.  Could we please have confirmation that it
> fixes the aacraid hang?

It doesn't, it's just a pre-requisite to fixing the bug :-)

Nick should chime in with how we wants it to be handled from
blk_requeue_request(), he needs to decrease dispatched from there. We
could always add some hook for it of course, but...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17  8:59               ` Jens Axboe
@ 2003-07-17  9:56                 ` Nick Piggin
  2003-07-17 10:29                   ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: Nick Piggin @ 2003-07-17  9:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Andrew Morton wrote:
>
>>So this is what I ended up with.  Could we please have confirmation that it
>>fixes the aacraid hang?
>>
>
>It doesn't, it's just a pre-requisite to fixing the bug :-)
>
>Nick should chime in with how we wants it to be handled from
>blk_requeue_request(), he needs to decrease dispatched from there. We
>could always add some hook for it of course, but...
>

Well you could just put an elv_completed_request in there, but
I suppose it really wants an elv_requeue_request - which would
just default to elv_add_request for other schedulers.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17  9:56                 ` Nick Piggin
@ 2003-07-17 10:29                   ` Jens Axboe
  2003-07-17 10:51                     ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 10:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Andrew Morton wrote:
> >
> >>So this is what I ended up with.  Could we please have confirmation that 
> >>it
> >>fixes the aacraid hang?
> >>
> >
> >It doesn't, it's just a pre-requisite to fixing the bug :-)
> >
> >Nick should chime in with how we wants it to be handled from
> >blk_requeue_request(), he needs to decrease dispatched from there. We
> >could always add some hook for it of course, but...
> >
> 
> Well you could just put an elv_completed_request in there, but

Yeah, I told Andrew to add that for now.

> I suppose it really wants an elv_requeue_request - which would
> just default to elv_add_request for other schedulers.

I'd rather keep it seperate, ie just a requeue notifier. How's this?

===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Sat Jul  5 08:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 12:27:19 2003
@@ -214,6 +214,12 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 14:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 12:23:09 2003
@@ -1494,6 +1494,24 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+	__elv_add_request(q, rq, 0, 0);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Mon Jun 30 03:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 12:22:46 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Sat Jul  5 08:52:51 2003
+++ edited/include/linux/blkdev.h	Wed Jul 16 15:18:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Sat Jul  5 08:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 12:24:29 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 10:29                   ` Jens Axboe
@ 2003-07-17 10:51                     ` Nick Piggin
  2003-07-17 10:56                       ` Jens Axboe
  2003-07-17 10:57                       ` Jens Axboe
  0 siblings, 2 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 10:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>
>>
>>Jens Axboe wrote:
>>
>>
>>>On Thu, Jul 17 2003, Andrew Morton wrote:
>>>
>>>
>>>>So this is what I ended up with.  Could we please have confirmation that 
>>>>it
>>>>fixes the aacraid hang?
>>>>
>>>>
>>>It doesn't, it's just a pre-requisite to fixing the bug :-)
>>>
>>>Nick should chime in with how we wants it to be handled from
>>>blk_requeue_request(), he needs to decrease dispatched from there. We
>>>could always add some hook for it of course, but...
>>>
>>>
>>Well you could just put an elv_completed_request in there, but
>>
>
>Yeah, I told Andrew to add that for now.
>
>
>>I suppose it really wants an elv_requeue_request - which would
>>just default to elv_add_request for other schedulers.
>>
>
>I'd rather keep it seperate, ie just a requeue notifier. How's this?
>

Well it would be much nicer for AS if it were seperate. Basically
AS wants the requeue implemented as as_add_request but without
the accounting updates: there has been no request completed, and
no really new request.

If the requeue were seperate to the add, it would simply be a call
to as_completed_request in as-iosched.c.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 10:51                     ` Nick Piggin
@ 2003-07-17 10:56                       ` Jens Axboe
  2003-07-17 11:09                         ` Nick Piggin
  2003-07-17 10:57                       ` Jens Axboe
  1 sibling, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 10:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> >>
> >>Jens Axboe wrote:
> >>
> >>
> >>>On Thu, Jul 17 2003, Andrew Morton wrote:
> >>>
> >>>
> >>>>So this is what I ended up with.  Could we please have confirmation 
> >>>>that it
> >>>>fixes the aacraid hang?
> >>>>
> >>>>
> >>>It doesn't, it's just a pre-requisite to fixing the bug :-)
> >>>
> >>>Nick should chime in with how we wants it to be handled from
> >>>blk_requeue_request(), he needs to decrease dispatched from there. We
> >>>could always add some hook for it of course, but...
> >>>
> >>>
> >>Well you could just put an elv_completed_request in there, but
> >>
> >
> >Yeah, I told Andrew to add that for now.
> >
> >
> >>I suppose it really wants an elv_requeue_request - which would
> >>just default to elv_add_request for other schedulers.
> >>
> >
> >I'd rather keep it seperate, ie just a requeue notifier. How's this?
> >
> 
> Well it would be much nicer for AS if it were seperate. Basically
> AS wants the requeue implemented as as_add_request but without
> the accounting updates: there has been no request completed, and
> no really new request.
> 
> If the requeue were seperate to the add, it would simply be a call
> to as_completed_request in as-iosched.c.

Like this then? Nicer semantics, too.

===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Sat Jul  5 08:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 12:55:51 2003
@@ -214,6 +214,18 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	/*
+	 * if iosched has an explicit requeue hook, then use that. otherwise
+	 * just put the request at the front of the queue
+	 */
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+	else
+		__elv_add_request(q, rq, 0, 0);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
@@ -417,6 +429,7 @@
 
 EXPORT_SYMBOL(elv_add_request);
 EXPORT_SYMBOL(__elv_add_request);
+EXPORT_SYMBOL(elv_requeue_request);
 EXPORT_SYMBOL(elv_next_request);
 EXPORT_SYMBOL(elv_remove_request);
 EXPORT_SYMBOL(elv_queue_empty);
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 14:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 12:56:00 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2730,6 +2747,7 @@
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Mon Jun 30 03:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 12:22:46 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Sat Jul  5 08:52:51 2003
+++ edited/include/linux/blkdev.h	Wed Jul 16 15:18:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Sat Jul  5 08:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 12:24:29 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 10:51                     ` Nick Piggin
  2003-07-17 10:56                       ` Jens Axboe
@ 2003-07-17 10:57                       ` Jens Axboe
  2003-07-17 11:08                         ` Nick Piggin
  1 sibling, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 10:57 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> If the requeue were seperate to the add, it would simply be a call
> to as_completed_request in as-iosched.c.

Out of laziness, or what? Just the accounting, not the actual completion
bits.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 10:57                       ` Jens Axboe
@ 2003-07-17 11:08                         ` Nick Piggin
  2003-07-17 11:10                           ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 11:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>
>>If the requeue were seperate to the add, it would simply be a call
>>to as_completed_request in as-iosched.c.
>>
>
>Out of laziness, or what? Just the accounting, not the actual completion
>bits.
>

Well no it would have to because it has nothing to associate
the re-add with a re-add and not a new add, if you know what
I mean.

So it would have to do all the cleanup stuff that
as_completed_request does. Its possible I guess, that we could
set a flag in the as_rq's state field, to say that the request
is a re-add, but thats not very clean IMO.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 10:56                       ` Jens Axboe
@ 2003-07-17 11:09                         ` Nick Piggin
  2003-07-17 11:11                           ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 11:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>
>>
>>Jens Axboe wrote:
>>
>>
>>>On Thu, Jul 17 2003, Nick Piggin wrote:
>>>
>>>
>>>>Jens Axboe wrote:
>>>>
>>>>
>>>>
>>>>>On Thu, Jul 17 2003, Andrew Morton wrote:
>>>>>
>>>>>
>>>>>
>>>>>>So this is what I ended up with.  Could we please have confirmation 
>>>>>>that it
>>>>>>fixes the aacraid hang?
>>>>>>
>>>>>>
>>>>>>
>>>>>It doesn't, it's just a pre-requisite to fixing the bug :-)
>>>>>
>>>>>Nick should chime in with how we wants it to be handled from
>>>>>blk_requeue_request(), he needs to decrease dispatched from there. We
>>>>>could always add some hook for it of course, but...
>>>>>
>>>>>
>>>>>
>>>>Well you could just put an elv_completed_request in there, but
>>>>
>>>>
>>>Yeah, I told Andrew to add that for now.
>>>
>>>
>>>
>>>>I suppose it really wants an elv_requeue_request - which would
>>>>just default to elv_add_request for other schedulers.
>>>>
>>>>
>>>I'd rather keep it seperate, ie just a requeue notifier. How's this?
>>>
>>>
>>Well it would be much nicer for AS if it were seperate. Basically
>>AS wants the requeue implemented as as_add_request but without
>>the accounting updates: there has been no request completed, and
>>no really new request.
>>
>>If the requeue were seperate to the add, it would simply be a call
>>to as_completed_request in as-iosched.c.
>>
>
>Like this then? Nicer semantics, too.
>

Jens, this is exactly right. Would you be OK with this? Everyone else?


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:08                         ` Nick Piggin
@ 2003-07-17 11:10                           ` Jens Axboe
  2003-07-17 11:21                             ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 11:10 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> >>If the requeue were seperate to the add, it would simply be a call
> >>to as_completed_request in as-iosched.c.
> >>
> >
> >Out of laziness, or what? Just the accounting, not the actual completion
> >bits.
> >
> 
> Well no it would have to because it has nothing to associate
> the re-add with a re-add and not a new add, if you know what
> I mean.
> 
> So it would have to do all the cleanup stuff that
> as_completed_request does. Its possible I guess, that we could
> set a flag in the as_rq's state field, to say that the request
> is a re-add, but thats not very clean IMO.

Wont that possibly cause it to make bad decisions? From AS pov, it looks
like an immediate complete of a request. Which it isn't.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:09                         ` Nick Piggin
@ 2003-07-17 11:11                           ` Jens Axboe
  2003-07-17 11:28                             ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 11:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> >>
> >>Jens Axboe wrote:
> >>
> >>
> >>>On Thu, Jul 17 2003, Nick Piggin wrote:
> >>>
> >>>
> >>>>Jens Axboe wrote:
> >>>>
> >>>>
> >>>>
> >>>>>On Thu, Jul 17 2003, Andrew Morton wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>So this is what I ended up with.  Could we please have confirmation 
> >>>>>>that it
> >>>>>>fixes the aacraid hang?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>It doesn't, it's just a pre-requisite to fixing the bug :-)
> >>>>>
> >>>>>Nick should chime in with how we wants it to be handled from
> >>>>>blk_requeue_request(), he needs to decrease dispatched from there. We
> >>>>>could always add some hook for it of course, but...
> >>>>>
> >>>>>
> >>>>>
> >>>>Well you could just put an elv_completed_request in there, but
> >>>>
> >>>>
> >>>Yeah, I told Andrew to add that for now.
> >>>
> >>>
> >>>
> >>>>I suppose it really wants an elv_requeue_request - which would
> >>>>just default to elv_add_request for other schedulers.
> >>>>
> >>>>
> >>>I'd rather keep it seperate, ie just a requeue notifier. How's this?
> >>>
> >>>
> >>Well it would be much nicer for AS if it were seperate. Basically
> >>AS wants the requeue implemented as as_add_request but without
> >>the accounting updates: there has been no request completed, and
> >>no really new request.
> >>
> >>If the requeue were seperate to the add, it would simply be a call
> >>to as_completed_request in as-iosched.c.
> >>
> >
> >Like this then? Nicer semantics, too.
> >
> 
> Jens, this is exactly right. Would you be OK with this? Everyone else?

I'm happy with it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:10                           ` Jens Axboe
@ 2003-07-17 11:21                             ` Nick Piggin
  2003-07-17 11:23                               ` Jens Axboe
  0 siblings, 1 reply; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 11:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>
>>
>>Jens Axboe wrote:
>>
>>
>>>On Thu, Jul 17 2003, Nick Piggin wrote:
>>>
>>>
>>>>If the requeue were seperate to the add, it would simply be a call
>>>>to as_completed_request in as-iosched.c.
>>>>
>>>>
>>>Out of laziness, or what? Just the accounting, not the actual completion
>>>bits.
>>>
>>>
>>Well no it would have to because it has nothing to associate
>>the re-add with a re-add and not a new add, if you know what
>>I mean.
>>
>>So it would have to do all the cleanup stuff that
>>as_completed_request does. Its possible I guess, that we could
>>set a flag in the as_rq's state field, to say that the request
>>is a re-add, but thats not very clean IMO.
>>
>
>Wont that possibly cause it to make bad decisions? From AS pov, it looks
>like an immediate complete of a request. Which it isn't.
>

Yeah thats correct - AS assumes elv_add_request is an entirely new
request from the process point of view, so yeah the quick fix wasn't
entirely optimal for AS.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:21                             ` Nick Piggin
@ 2003-07-17 11:23                               ` Jens Axboe
  2003-07-17 11:29                                 ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 11:23 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> >>
> >>Jens Axboe wrote:
> >>
> >>
> >>>On Thu, Jul 17 2003, Nick Piggin wrote:
> >>>
> >>>
> >>>>If the requeue were seperate to the add, it would simply be a call
> >>>>to as_completed_request in as-iosched.c.
> >>>>
> >>>>
> >>>Out of laziness, or what? Just the accounting, not the actual completion
> >>>bits.
> >>>
> >>>
> >>Well no it would have to because it has nothing to associate
> >>the re-add with a re-add and not a new add, if you know what
> >>I mean.
> >>
> >>So it would have to do all the cleanup stuff that
> >>as_completed_request does. Its possible I guess, that we could
> >>set a flag in the as_rq's state field, to say that the request
> >>is a re-add, but thats not very clean IMO.
> >>
> >
> >Wont that possibly cause it to make bad decisions? From AS pov, it looks
> >like an immediate complete of a request. Which it isn't.
> >
> 
> Yeah thats correct - AS assumes elv_add_request is an entirely new
> request from the process point of view, so yeah the quick fix wasn't
> entirely optimal for AS.

But elv_requeue_req_fn gives you the possibility to do it right, just
wondering since you said you'd just make that basically:

as_requeue_req_fn()
{
	as_completed_req();
	as_add_req();
}

which could done a lot better.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:11                           ` Jens Axboe
@ 2003-07-17 11:28                             ` Nick Piggin
  2003-07-17 11:29                               ` Jens Axboe
                                                 ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 11:28 UTC (permalink / raw)
  To: Jens Axboe, markh; +Cc: Andrew Morton, James.Bottomley, cliffw, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 334 bytes --]



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>

snip

>>
>>Jens, this is exactly right. Would you be OK with this? Everyone else?
>>
>
>I'm happy with it.
>

OK, Mark, please try the following patch in combination with Jens'
previous. Untested due to not being able to exercise the requeue
path, so it might blow up.

[-- Attachment #2: as-requeue --]
[-- Type: text/plain, Size: 2265 bytes --]

--- linux-2.6/drivers/block/as-iosched.c.orig	2003-07-17 21:13:55.000000000 +1000
+++ linux-2.6/drivers/block/as-iosched.c	2003-07-17 21:24:07.000000000 +1000
@@ -1307,6 +1307,66 @@ static void as_add_request(struct as_dat
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+	int data_dir;
+
+	if (unlikely(rq->flags & REQ_HARDBARRIER)) {
+		AS_INVALIDATE_HASH(ad);
+		q->last_merge = NULL;
+
+		while (ad->next_arq[REQ_SYNC])
+			as_move_to_dispatch(ad, ad->next_arq[REQ_SYNC]);
+
+		while (ad->next_arq[REQ_ASYNC])
+			as_move_to_dispatch(ad, ad->next_arq[REQ_ASYNC]);
+	}
+
+	if (unlikely(!blk_fs_request(rq))) {
+		list_add(&rq->queuelist, ad->dispatch->prev);
+
+		/* Stop anticipating - let this request get through */
+		if (!list_empty(ad->dispatch)
+			&& (ad->antic_status == ANTIC_WAIT_REQ
+				|| ad->antic_status == ANTIC_WAIT_NEXT))
+			as_antic_stop(ad);
+
+		return;
+	}
+
+	if (rq_mergeable(rq)) {
+		as_add_arq_hash(ad, arq);
+
+		if (!q->last_merge)
+			q->last_merge = &rq->queuelist;
+	}
+
+
+	if (rq_data_dir(arq->request) == READ
+			|| current->flags&PF_SYNCWRITE)
+		arq->is_sync = 1;
+	else
+		arq->is_sync = 0;
+	data_dir = arq->is_sync;
+
+	as_add_arq_rb(ad, arq);
+
+	/*
+	 * set expire time (only used for reads) and add to fifo list
+	 */
+	arq->expires = jiffies + ad->fifo_expire[data_dir];
+	list_add_tail(&arq->fifo, &ad->fifo_list[data_dir]);
+	arq->state = AS_RQ_QUEUED;
+	as_update_arq(ad, arq); /* keep state machine up to date */
+
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1822,6 +1882,7 @@ elevator_t iosched_as = {
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:28                             ` Nick Piggin
@ 2003-07-17 11:29                               ` Jens Axboe
  2003-07-17 14:44                               ` Mark Haverkamp
  2003-07-17 20:46                               ` Mark Haverkamp
  2 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2003-07-17 11:29 UTC (permalink / raw)
  To: Nick Piggin; +Cc: markh, Andrew Morton, James.Bottomley, cliffw, linux-scsi

On Thu, Jul 17 2003, Nick Piggin wrote:
> 
> 
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> 
> snip
> 
> >>
> >>Jens, this is exactly right. Would you be OK with this? Everyone else?
> >>
> >
> >I'm happy with it.
> >
> 
> OK, Mark, please try the following patch in combination with Jens'
> previous. Untested due to not being able to exercise the requeue
> path, so it might blow up.

Looks good Nick! You can discard my comments from the previous mail.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:23                               ` Jens Axboe
@ 2003-07-17 11:29                                 ` Nick Piggin
  0 siblings, 0 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-17 11:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, James.Bottomley, markh, cliffw, linux-scsi



Jens Axboe wrote:

>On Thu, Jul 17 2003, Nick Piggin wrote:
>
>>
>>Jens Axboe wrote:
>>
>>
>>>On Thu, Jul 17 2003, Nick Piggin wrote:
>>>
>>>
>>>>Jens Axboe wrote:
>>>>
>>>>
>>>>
>>>>>On Thu, Jul 17 2003, Nick Piggin wrote:
>>>>>
>>>>>
>>>>>
>>>>>>If the requeue were seperate to the add, it would simply be a call
>>>>>>to as_completed_request in as-iosched.c.
>>>>>>
>>>>>>
>>>>>>
>>>>>Out of laziness, or what? Just the accounting, not the actual completion
>>>>>bits.
>>>>>
>>>>>
>>>>>
>>>>Well no it would have to because it has nothing to associate
>>>>the re-add with a re-add and not a new add, if you know what
>>>>I mean.
>>>>
>>>>So it would have to do all the cleanup stuff that
>>>>as_completed_request does. Its possible I guess, that we could
>>>>set a flag in the as_rq's state field, to say that the request
>>>>is a re-add, but thats not very clean IMO.
>>>>
>>>>
>>>Wont that possibly cause it to make bad decisions? From AS pov, it looks
>>>like an immediate complete of a request. Which it isn't.
>>>
>>>
>>Yeah thats correct - AS assumes elv_add_request is an entirely new
>>request from the process point of view, so yeah the quick fix wasn't
>>entirely optimal for AS.
>>
>
>But elv_requeue_req_fn gives you the possibility to do it right, just
>wondering since you said you'd just make that basically:
>
>as_requeue_req_fn()
>{
>	as_completed_req();
>	as_add_req();
>}
>
>which could done a lot better.
>

Yeah - if its the requeue operation that is responsible for the re add,
then everything should be fine. See my recent patch in the co-thread.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:28                             ` Nick Piggin
  2003-07-17 11:29                               ` Jens Axboe
@ 2003-07-17 14:44                               ` Mark Haverkamp
  2003-07-17 15:43                                 ` James Bottomley
  2003-07-17 20:46                               ` Mark Haverkamp
  2 siblings, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-17 14:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jens Axboe, Andrew Morton, James Bottomley, Cliff White, linux-scsi

On Thu, 2003-07-17 at 04:28, Nick Piggin wrote:
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> 
> snip
> 
> >>
> >>Jens, this is exactly right. Would you be OK with this? Everyone else?
> >>
> >
> >I'm happy with it.
> >
> 
> OK, Mark, please try the following patch in combination with Jens'
> previous. Untested due to not being able to exercise the requeue
> path, so it might blow up.
> 
> ______________________________________________________________________
I added Jens' and your patch and got this when I did a mkfs on an
aacraid disk:


Script started on Thu 17 Jul 2003 07:40:20 AM PDT
[Enter `^Ec?' for help]

[root@dev8-002 root]# ------------[ cut here ]------------
kernel BUG at drivers/block/as-iosched.c:984!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c02e1c2b>]    Not tainted
EFLAGS: 00010046
EIP is at as_remove_queued_request+0xdb/0x120
eax: 00000000   ebx: f6b3ad7c   ecx: 00000000   edx: f1f7234c
esi: edeb3004   edi: f0ed8004   ebp: c3f9fdc0   esp: c3f9fda0
ds: 007b   es: 007b   ss: 0068
Process ksoftirqd/0 (pid: 3, threadinfo=c3f9e000 task=c3fda000)
Stack: c02e118b edeb3004 00000000 00000000 00000000 edeb3004 f6b3ad7c 00000000 
       c3f9fde4 c02e1ef2 f0ed8004 edb87004 f13bc240 00000003 edeb3004 00000000 
       f6b3ad7c c3f9fe08 c02e20e8 edeb3004 f6b3ad7c 00000001 f8a3234a edeb3004 
Call Trace:
 [<c02e118b>] as_find_next_arq+0x4b/0x80
 [<c02e1ef2>] as_move_to_dispatch+0x92/0x140
 [<c02e20e8>] as_dispatch_request+0x148/0x280
 [<f8a3234a>] aac_queuecommand+0x1a/0x30 [aacraid]
 [<c02e2258>] as_next_request+0x38/0x50
 [<c02d86a6>] elv_next_request+0x16/0x110
 [<c031afbe>] scsi_request_fn+0x3e/0x430
 [<c02da35e>] blk_run_queue+0x3e/0xa0
 [<c031a3ca>] scsi_run_queue+0xca/0x1a0
 [<c031a631>] scsi_end_request+0xc1/0x110
 [<c031a98b>] scsi_io_completion+0x14b/0x470
 [<c03cf0b5>] ip_rcv+0x365/0x4a8
 [<c032b979>] sd_rw_intr+0x59/0x240
 [<c0315e94>] scsi_finish_command+0x74/0xb0
 [<c0315db8>] scsi_softirq+0xb8/0xe0
 [<c0127dd7>] do_softirq+0xe7/0xf0
 [<c0128375>] ksoftirqd+0xd5/0x100
 [<c01282a0>] ksoftirqd+0x0/0x100
 [<c01074e9>] kernel_thread_helper+0x5/0xc

Code: 0f 0b d8 03 cb 60 46 c0 e9 56 ff ff ff c7 44 24 0c d5 03 00 
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

Script done on Thu 17 Jul 2003 07:41:16 AM PDT

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 14:44                               ` Mark Haverkamp
@ 2003-07-17 15:43                                 ` James Bottomley
  0 siblings, 0 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-17 15:43 UTC (permalink / raw)
  To: Mark Haverkamp
  Cc: Nick Piggin, Jens Axboe, Andrew Morton, Cliff White, linux-scsi

On Thu, 2003-07-17 at 10:44, Mark Haverkamp wrote:
> I added Jens' and your patch and got this when I did a mkfs on an
> aacraid disk:

I don't get the crash, but it still hangs for me.

Just as a matter of interest, you should be able to simulate the problem
with almost any SCSI driver by adding artificial HOST/DEVICE busy
returns in queuecommand.

This is what I did in 53c700.c:

	static int host_busy_cnt = 0;
	static int device_busy_cnt = 0;

	if((host_busy_cnt++)%1000 == 434) {
		printk("53c700 RETURNING HOST BUSY, depth %d\n",
		       NCR_700_get_depth(SCp->device));
		return SCSI_MLQUEUE_HOST_BUSY;
	}

	if((device_busy_cnt++) % 544 == 232) {
		printk("53c700 RETURNING DEVICE BUSY, depth %d\n",
		       NCR_700_get_depth(SCp->device));
		return SCSI_MLQUEUE_DEVICE_BUSY;
	}

This is my usual method of testing our requeuing code (the depth is
important, because we take different actions depending on whether the
depth is zero or not---exercises the starvation code).

I believe this should work with scsi_debug even if you don't actually
have a SCSI setup.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 11:28                             ` Nick Piggin
  2003-07-17 11:29                               ` Jens Axboe
  2003-07-17 14:44                               ` Mark Haverkamp
@ 2003-07-17 20:46                               ` Mark Haverkamp
       [not found]                                 ` <1058481553 .19508.5.camel@markh1.pdx.osdl.net>
  2003-07-17 22:39                                 ` Mark Haverkamp
  2 siblings, 2 replies; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-17 20:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jens Axboe, Andrew Morton, James Bottomley, Cliff White,
	linux-scsi, Daniel McNeil

On Thu, 2003-07-17 at 04:28, Nick Piggin wrote:
> Jens Axboe wrote:
> 
> >On Thu, Jul 17 2003, Nick Piggin wrote:
> >
> 
> snip
> 
> >>
> >>Jens, this is exactly right. Would you be OK with this? Everyone else?
> >>
> >
> >I'm happy with it.
> >
> 
> OK, Mark, please try the following patch in combination with Jens'
> previous. Untested due to not being able to exercise the requeue
> path, so it might blow up.
> 
> ______________________________________________________________________
Nick,

Looking at the as_requeue_request function, the BUG that I sent you
earlier was caused from aic->nr_queued not being incremented.  When we
tried to increment this element in the requeue function, we found that
the ad->nr_dispatched variable was out of sync again.  The enclosed
patch contains Jens' patch and an updated patch to the
as_requeue_request code that just puts the request back on the
dispatched queue and increments aic->nr_dispatched.  We have run this
code for a while and it seems to work.  Let me know if we've over
simplified things here.

Thanks,

Mark and Daniel.

===== drivers/block/as-iosched.c 1.5 vs edited =====
--- 1.5/drivers/block/as-iosched.c	Sat Jul 12 04:28:01 2003
+++ edited/drivers/block/as-iosched.c	Thu Jul 17 13:35:03 2003
@@ -1306,6 +1306,30 @@
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+	int data_dir;
+
+	if (arq->io_context && arq->io_context->aic)
+		atomic_inc(&arq->io_context->aic->nr_dispatched);
+
+	list_add(&rq->queuelist, ad->dispatch->prev);
+
+	/* Stop anticipating - let this request get through */
+	if (!list_empty(ad->dispatch)
+		&& (ad->antic_status == ANTIC_WAIT_REQ
+			|| ad->antic_status == ANTIC_WAIT_NEXT))
+		as_antic_stop(ad);
+
+	return;
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1821,6 +1845,7 @@
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,
===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Fri Jul  4 23:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 07:30:06 2003
@@ -214,6 +214,18 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	/*
+	 * if iosched has an explicit requeue hook, then use that. otherwise
+	 * just put the request at the front of the queue
+	 */
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+	else
+		__elv_add_request(q, rq, 0, 0);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
@@ -417,6 +429,7 @@
 
 EXPORT_SYMBOL(elv_add_request);
 EXPORT_SYMBOL(__elv_add_request);
+EXPORT_SYMBOL(elv_requeue_request);
 EXPORT_SYMBOL(elv_next_request);
 EXPORT_SYMBOL(elv_remove_request);
 EXPORT_SYMBOL(elv_queue_empty);
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 05:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 07:30:06 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2730,6 +2747,7 @@
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 07:30:06 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Fri Jul  4 23:52:51 2003
+++ edited/include/linux/blkdev.h	Thu Jul 17 07:30:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Fri Jul  4 23:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 07:30:06 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 20:46                               ` Mark Haverkamp
       [not found]                                 ` <1058481553 .19508.5.camel@markh1.pdx.osdl.net>
@ 2003-07-17 22:39                                 ` Mark Haverkamp
  2003-07-17 23:47                                   ` Daniel McNeil
  1 sibling, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-17 22:39 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jens Axboe, Andrew Morton, James Bottomley, Cliff White,
	linux-scsi, Daniel McNeil

On Thu, 2003-07-17 at 13:46, Mark Haverkamp wrote: 
> On Thu, 2003-07-17 at 04:28, Nick Piggin wrote:
> > Jens Axboe wrote:
> > 
> > >On Thu, Jul 17 2003, Nick Piggin wrote:
> > >
> > 
> > snip
> > 
> > >>
> > >>Jens, this is exactly right. Would you be OK with this? Everyone else?
> > >>
> > >
> > >I'm happy with it.
> > >
> > 
> > OK, Mark, please try the following patch in combination with Jens'
> > previous. Untested due to not being able to exercise the requeue
> > path, so it might blow up.
> > 
> > ______________________________________________________________________
> Nick,
> 
> Looking at the as_requeue_request function, the BUG that I sent you
> earlier was caused from aic->nr_queued not being incremented.  When we
> tried to increment this element in the requeue function, we found that
> the ad->nr_dispatched variable was out of sync again.  The enclosed
> patch contains Jens' patch and an updated patch to the
> as_requeue_request code that just puts the request back on the
> dispatched queue and increments aic->nr_dispatched.  We have run this
> code for a while and it seems to work.  Let me know if we've over
> simplified things here.
> 
> Thanks,
> 
> Mark and Daniel.

I made a small change to the patch I previously sent.  When I did some
more testing with multiple disks, I found that the arq pointer from the
request was NULL sometimes. This patch checks for a non-NULL arq pointer
before dereferencing.  This allows me to do mkfs on multiple disks at
the same time.

Mark.

===== drivers/block/as-iosched.c 1.5 vs edited =====
--- 1.5/drivers/block/as-iosched.c	Sat Jul 12 04:28:01 2003
+++ edited/drivers/block/as-iosched.c	Thu Jul 17 15:31:27 2003
@@ -1306,6 +1306,29 @@
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+
+	if (arq && arq->io_context && arq->io_context->aic)
+		atomic_inc(&arq->io_context->aic->nr_dispatched);
+
+	list_add(&rq->queuelist, ad->dispatch->prev);
+
+	/* Stop anticipating - let this request get through */
+	if (!list_empty(ad->dispatch)
+		&& (ad->antic_status == ANTIC_WAIT_REQ
+			|| ad->antic_status == ANTIC_WAIT_NEXT))
+		as_antic_stop(ad);
+
+	return;
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1821,6 +1844,7 @@
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,
===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Fri Jul  4 23:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 07:30:06 2003
@@ -214,6 +214,18 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	/*
+	 * if iosched has an explicit requeue hook, then use that. otherwise
+	 * just put the request at the front of the queue
+	 */
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+	else
+		__elv_add_request(q, rq, 0, 0);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
@@ -417,6 +429,7 @@
 
 EXPORT_SYMBOL(elv_add_request);
 EXPORT_SYMBOL(__elv_add_request);
+EXPORT_SYMBOL(elv_requeue_request);
 EXPORT_SYMBOL(elv_next_request);
 EXPORT_SYMBOL(elv_remove_request);
 EXPORT_SYMBOL(elv_queue_empty);
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 05:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 07:30:06 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2730,6 +2747,7 @@
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 07:30:06 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Fri Jul  4 23:52:51 2003
+++ edited/include/linux/blkdev.h	Thu Jul 17 07:30:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Fri Jul  4 23:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 07:30:06 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);


-- 

Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 22:39                                 ` Mark Haverkamp
@ 2003-07-17 23:47                                   ` Daniel McNeil
  2003-07-18  0:00                                     ` Andrew Morton
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel McNeil @ 2003-07-17 23:47 UTC (permalink / raw)
  To: Mark Haverkamp
  Cc: Nick Piggin, Jens Axboe, Andrew Morton, James Bottomley,
	Cliff White, linux-scsi

On Thu, 2003-07-17 at 15:39, Mark Haverkamp wrote:
> 
> ===== drivers/block/as-iosched.c 1.5 vs edited =====
> --- 1.5/drivers/block/as-iosched.c	Sat Jul 12 04:28:01 2003
> +++ edited/drivers/block/as-iosched.c	Thu Jul 17 15:31:27 2003
> @@ -1306,6 +1306,29 @@
>  	as_update_arq(ad, arq); /* keep state machine up to date */
>  }
>  
> +/*
> + * requeue the request. The request has not been completed, nor is it a
> + * new request, so don't touch accounting.
> + */
> +static void as_requeue_request(request_queue_t *q, struct request *rq)
> +{
> +	struct as_data *ad = q->elevator.elevator_data;
> +	struct as_rq *arq = RQ_DATA(rq);
> +
> +	if (arq && arq->io_context && arq->io_context->aic)
> +		atomic_inc(&arq->io_context->aic->nr_dispatched);
> +
> +	list_add(&rq->queuelist, ad->dispatch->prev);
> +
> +	/* Stop anticipating - let this request get through */
> +	if (!list_empty(ad->dispatch)
> +		&& (ad->antic_status == ANTIC_WAIT_REQ
> +			|| ad->antic_status == ANTIC_WAIT_NEXT))
> +		as_antic_stop(ad);
> +
> +	return;
> +}
> +

2 questions:

Should	
	list_add(&rq->queuelist, ad->dispatch->prev);
be
	list_add_tail(&rq->queuelist, ad->dispatch);

It is equivalent, but makes the code clearer that we're adding to the
end.

Also, isn't the: !list_empty(ad->dispatch) unnecessary since a
list_add was just done, so the list cannot be empty?

Daniel <daniel@osdl.org>




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-17 23:47                                   ` Daniel McNeil
@ 2003-07-18  0:00                                     ` Andrew Morton
  2003-07-18  5:14                                       ` Nick Piggin
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Morton @ 2003-07-18  0:00 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: markh, piggin, axboe, James.Bottomley, cliffw, linux-scsi

Daniel McNeil <daniel@osdl.org> wrote:
>
> Should	
> 	list_add(&rq->queuelist, ad->dispatch->prev);
> be
> 	list_add_tail(&rq->queuelist, ad->dispatch);
> 
> It is equivalent, but makes the code clearer that we're adding to the
> end.

yup.

> Also, isn't the: !list_empty(ad->dispatch) unnecessary since a
> list_add was just done, so the list cannot be empty?

more yup.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  0:00                                     ` Andrew Morton
@ 2003-07-18  5:14                                       ` Nick Piggin
  2003-07-18  5:25                                         ` Andrew Morton
  2003-07-18 15:03                                         ` Mark Haverkamp
  0 siblings, 2 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-18  5:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel McNeil, markh, axboe, James.Bottomley, cliffw, linux-scsi



Andrew Morton wrote:

>Daniel McNeil <daniel@osdl.org> wrote:
>
>>Should	
>>	list_add(&rq->queuelist, ad->dispatch->prev);
>>be
>>	list_add_tail(&rq->queuelist, ad->dispatch);
>>
>>It is equivalent, but makes the code clearer that we're adding to the
>>end.
>>
>
>yup.
>
>
>>Also, isn't the: !list_empty(ad->dispatch) unnecessary since a
>>list_add was just done, so the list cannot be empty?
>>
>
>more yup.
>

Oh, so the request should go on the dispatch list?
OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
just in case.

And I think the arq == NULL case should be checked for with
rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
I think if arq is still NULL then its a bug.

Thanks
Nick


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:14                                       ` Nick Piggin
@ 2003-07-18  5:25                                         ` Andrew Morton
  2003-07-18  5:30                                           ` Nick Piggin
  2003-07-18 14:00                                           ` James Bottomley
  2003-07-18 15:03                                         ` Mark Haverkamp
  1 sibling, 2 replies; 57+ messages in thread
From: Andrew Morton @ 2003-07-18  5:25 UTC (permalink / raw)
  To: Nick Piggin; +Cc: daniel, markh, axboe, James.Bottomley, cliffw, linux-scsi

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> 
> Oh, so the request should go on the dispatch list?
> OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
> just in case.
> 
> And I think the arq == NULL case should be checked for with
> rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
> I think if arq is still NULL then its a bug.

Guys, it's been three days and I'm getting a dribble of sad little messages
from various people whose boxes are stuck with everything in D state.

Can we please get this thing wrapped up within the next 12 hours so we
don't spend all of next week with Linus's tree in such a blatantly buggy
state?

Thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:25                                         ` Andrew Morton
@ 2003-07-18  5:30                                           ` Nick Piggin
  2003-07-18  5:35                                             ` Nick Piggin
  2003-07-18 14:16                                             ` James Bottomley
  2003-07-18 14:00                                           ` James Bottomley
  1 sibling, 2 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-18  5:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, markh, axboe, James.Bottomley, cliffw, linux-scsi



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>Oh, so the request should go on the dispatch list?
>>OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
>>just in case.
>>
>>And I think the arq == NULL case should be checked for with
>>rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
>>I think if arq is still NULL then its a bug.
>>
>
>Guys, it's been three days and I'm getting a dribble of sad little messages
>from various people whose boxes are stuck with everything in D state.
>
>Can we please get this thing wrapped up within the next 12 hours so we
>don't spend all of next week with Linus's tree in such a blatantly buggy
>state?
>

No, I think Mark's last patch with Daniel's and my suggestions
should be fine. It fixes their problems. Its unfortunate that
this bug never got caught until now but it was inevitable to be
one or two there.

I have been pretty worried about all the D state messages, but
between this and the ReiserFS bug, I _think_ they have been
found. If not, please forward me the reports because I haven't
seen them. Thanks.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:30                                           ` Nick Piggin
@ 2003-07-18  5:35                                             ` Nick Piggin
  2003-07-18 14:16                                             ` James Bottomley
  1 sibling, 0 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-18  5:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, markh, axboe, James.Bottomley, cliffw, linux-scsi



Nick Piggin wrote:

>
>
> Andrew Morton wrote:
>
>> Nick Piggin <piggin@cyberone.com.au> wrote:
>>
>>>
>>> Oh, so the request should go on the dispatch list?
>>> OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
>>> just in case.
>>>
>>> And I think the arq == NULL case should be checked for with
>>> rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
>>> I think if arq is still NULL then its a bug.
>>>
>>
>> Guys, it's been three days and I'm getting a dribble of sad little 
>> messages
>> from various people whose boxes are stuck with everything in D state.
>>
>> Can we please get this thing wrapped up within the next 12 hours so we
>> don't spend all of next week with Linus's tree in such a blatantly buggy
>> state?
>>
>
> No, I think Mark's last patch with Daniel's and my suggestions
> should be fine. It fixes their problems. Its unfortunate that
> this bug never got caught until now but it was inevitable to be
> one or two there.
>
> I have been pretty worried about all the D state messages, but
> between this and the ReiserFS bug, I _think_ they have been
> found. If not, please forward me the reports because I haven't
> seen them. Thanks.
>

Andrew send the one line quick fix in to Linus if you like
until the complete fix is sorted out.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:25                                         ` Andrew Morton
  2003-07-18  5:30                                           ` Nick Piggin
@ 2003-07-18 14:00                                           ` James Bottomley
  1 sibling, 0 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-18 14:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, daniel, markh, Jens Axboe, cliffw, SCSI Mailing List

On Fri, 2003-07-18 at 00:25, Andrew Morton wrote:
> Guys, it's been three days and I'm getting a dribble of sad little messages
> from various people whose boxes are stuck with everything in D state.
> 
> Can we please get this thing wrapped up within the next 12 hours so we
> don't spend all of next week with Linus's tree in such a blatantly buggy
> state?

I'm ready to ship the SCSI tree with Jens' enabler now.  However, can I
have more time to test the proposed fix?  The SCSI code has four other
queue cases that the aacraid problem didn't hit but which my test
harness does.  I'm worried that yesterday I saw the hang where Mark got
a crash.

The workaround is simply to boot the kernel with elevator=deadline.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:30                                           ` Nick Piggin
  2003-07-18  5:35                                             ` Nick Piggin
@ 2003-07-18 14:16                                             ` James Bottomley
  2003-07-18 16:30                                               ` Andrew Morton
  1 sibling, 1 reply; 57+ messages in thread
From: James Bottomley @ 2003-07-18 14:16 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, daniel, markh, Jens Axboe, cliffw, SCSI Mailing List

On Fri, 2003-07-18 at 00:30, Nick Piggin wrote:
> No, I think Mark's last patch with Daniel's and my suggestions
> should be fine. It fixes their problems. Its unfortunate that
> this bug never got caught until now but it was inevitable to be
> one or two there.

Could you roll up all your AS changes and send them to this list as a
single patch so I can test them out?

Thanks,

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18  5:14                                       ` Nick Piggin
  2003-07-18  5:25                                         ` Andrew Morton
@ 2003-07-18 15:03                                         ` Mark Haverkamp
  2003-07-18 16:28                                           ` Mark Haverkamp
  1 sibling, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-18 15:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Daniel McNeil, Jens Axboe, James Bottomley,
	Cliff White, linux-scsi

On Thu, 2003-07-17 at 22:14, Nick Piggin wrote:
> Andrew Morton wrote:
> 
> >Daniel McNeil <daniel@osdl.org> wrote:
> >
> >>Should	
> >>	list_add(&rq->queuelist, ad->dispatch->prev);
> >>be
> >>	list_add_tail(&rq->queuelist, ad->dispatch);
> >>
> >>It is equivalent, but makes the code clearer that we're adding to the
> >>end.
> >>
> >
> >yup.
> >
> >
> >>Also, isn't the: !list_empty(ad->dispatch) unnecessary since a
> >>list_add was just done, so the list cannot be empty?
> >>
> >
> >more yup.
> >
> 
> Oh, so the request should go on the dispatch list?
> OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
> just in case.
> 
> And I think the arq == NULL case should be checked for with
> rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
> I think if arq is still NULL then its a bug.
> 
> Thanks
> Nick

Nick,

I tried what you suggested and I'm pretty sure that aic->nr_dispatched
is not getting incremented when it should be.  I get:

Badness in as_remove_dispatched_request at drivers/block/as-iosched.c:1019

Which says that aic->nr_dispatched is wrong. I have attached the
as_requeue_request that I was using.  Is this what you had in mind?


Thanks,
Mark.

---------------

/*
 * requeue the request. The request has not been completed, nor is it a
 * new request, so don't touch accounting.
 */
static void as_requeue_request(request_queue_t *q, struct request *rq)
{
	struct as_data *ad = q->elevator.elevator_data;
	struct as_rq *arq = RQ_DATA(rq);

	if (unlikely(rq->flags & REQ_HARDBARRIER) || 
			unlikely(!blk_fs_request(rq))) {
		BUG_ON(!arq);
		arq->state = AS_RQ_DISPATCHED;
		if (arq->io_context && arq->io_context->aic)
			atomic_inc(&arq->io_context->aic->nr_dispatched);
	}

	list_add_tail(&rq->queuelist, ad->dispatch);

	/* Stop anticipating - let this request get through */
	if (ad->antic_status == ANTIC_WAIT_REQ
			|| ad->antic_status == ANTIC_WAIT_NEXT)
		as_antic_stop(ad);

	return;
}



-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 15:03                                         ` Mark Haverkamp
@ 2003-07-18 16:28                                           ` Mark Haverkamp
  2003-07-18 16:56                                             ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-18 16:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Daniel McNeil, Jens Axboe, James Bottomley,
	Cliff White, linux-scsi

On Fri, 2003-07-18 at 08:03, Mark Haverkamp wrote:
> On Thu, 2003-07-17 at 22:14, Nick Piggin wrote:
> > Andrew Morton wrote:
> > 
> > >Daniel McNeil <daniel@osdl.org> wrote:
> > >
> > >>Should	
> > >>	list_add(&rq->queuelist, ad->dispatch->prev);
> > >>be
> > >>	list_add_tail(&rq->queuelist, ad->dispatch);
> > >>
> > >>It is equivalent, but makes the code clearer that we're adding to the
> > >>end.
> > >>
> > >
> > >yup.
> > >
> > >
> > >>Also, isn't the: !list_empty(ad->dispatch) unnecessary since a
> > >>list_add was just done, so the list cannot be empty?
> > >>
> > >
> > >more yup.
> > >
> > 
> > Oh, so the request should go on the dispatch list?
> > OK that makes sense. Maybe also set arq->state = AS_RQ_DISPATCHED
> > just in case.
> > 
> > And I think the arq == NULL case should be checked for with
> > rq->flags & REQ_HARDBARRIER || !blk_fs_request(rq) for consistency.
> > I think if arq is still NULL then its a bug.
> > 
> > Thanks
> > Nick
> 
> Nick,
> 
> I tried what you suggested and I'm pretty sure that aic->nr_dispatched
> is not getting incremented when it should be.  I get:
> 
> Badness in as_remove_dispatched_request at drivers/block/as-iosched.c:1019
> 
> Which says that aic->nr_dispatched is wrong. I have attached the
> as_requeue_request that I was using.  Is this what you had in mind?
> 

Nick,

I thought more about what you were saying about checking arq == NULL so
I have enclosed a patch (compilation of everything) which works for us
here on our hardware.  I have created file systems on multiple disks,
did copies and also writes to a raw device as well and have had no
problems yet.

Mark.
-----------

===== drivers/block/as-iosched.c 1.5 vs edited =====
--- 1.5/drivers/block/as-iosched.c	Sat Jul 12 04:28:01 2003
+++ edited/drivers/block/as-iosched.c	Fri Jul 18 09:12:40 2003
@@ -1306,6 +1306,33 @@
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+
+	if (arq) 
+		if (arq->io_context && arq->io_context->aic) {
+			arq->state = AS_RQ_DISPATCHED;
+			atomic_inc(&arq->io_context->aic->nr_dispatched);
+		}
+	else
+		WARN_ON(!(rq->flags & REQ_HARDBARRIER) && blk_fs_request(rq));
+
+	list_add_tail(&rq->queuelist, ad->dispatch);
+
+	/* Stop anticipating - let this request get through */
+	if (ad->antic_status == ANTIC_WAIT_REQ
+			|| ad->antic_status == ANTIC_WAIT_NEXT)
+		as_antic_stop(ad);
+
+	return;
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1821,6 +1848,7 @@
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,
===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Fri Jul  4 23:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 07:30:06 2003
@@ -214,6 +214,18 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	/*
+	 * if iosched has an explicit requeue hook, then use that. otherwise
+	 * just put the request at the front of the queue
+	 */
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+	else
+		__elv_add_request(q, rq, 0, 0);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
@@ -417,6 +429,7 @@
 
 EXPORT_SYMBOL(elv_add_request);
 EXPORT_SYMBOL(__elv_add_request);
+EXPORT_SYMBOL(elv_requeue_request);
 EXPORT_SYMBOL(elv_next_request);
 EXPORT_SYMBOL(elv_remove_request);
 EXPORT_SYMBOL(elv_queue_empty);
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 05:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 07:30:06 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2730,6 +2747,7 @@
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 07:30:06 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Fri Jul  4 23:52:51 2003
+++ edited/include/linux/blkdev.h	Thu Jul 17 07:30:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Fri Jul  4 23:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 07:30:06 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 14:16                                             ` James Bottomley
@ 2003-07-18 16:30                                               ` Andrew Morton
  2003-07-18 16:41                                                 ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Morton @ 2003-07-18 16:30 UTC (permalink / raw)
  To: James Bottomley; +Cc: piggin, daniel, markh, axboe, cliffw, linux-scsi

James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> On Fri, 2003-07-18 at 00:30, Nick Piggin wrote:
> > No, I think Mark's last patch with Daniel's and my suggestions
> > should be fine. It fixes their problems. Its unfortunate that
> > this bug never got caught until now but it was inevitable to be
> > one or two there.
> 
> Could you roll up all your AS changes and send them to this list as a
> single patch so I can test them out?

Nick is probably on a plane by now.  I sent Mark's original one-liner to
Linus last night.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 16:30                                               ` Andrew Morton
@ 2003-07-18 16:41                                                 ` James Bottomley
  2003-07-18 17:25                                                   ` Alan Cox
  2003-07-18 17:45                                                   ` Andrew Morton
  0 siblings, 2 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-18 16:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: piggin, daniel, markh, Jens Axboe, cliffw, SCSI Mailing List

On Fri, 2003-07-18 at 11:30, Andrew Morton wrote:
> Nick is probably on a plane by now.  I sent Mark's original one-liner to
> Linus last night.

That's not a fix, it only catches one out of the five requeue cases in
SCSI.

why don't we just use the deadline elevator until this can be sorted out
properly?

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 16:28                                           ` Mark Haverkamp
@ 2003-07-18 16:56                                             ` James Bottomley
  2003-07-18 17:46                                               ` Mark Haverkamp
  0 siblings, 1 reply; 57+ messages in thread
From: James Bottomley @ 2003-07-18 16:56 UTC (permalink / raw)
  To: Mark Haverkamp
  Cc: Nick Piggin, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

On Fri, 2003-07-18 at 11:28, Mark Haverkamp wrote:
> I thought more about what you were saying about checking arq == NULL so
> I have enclosed a patch (compilation of everything) which works for us
> here on our hardware.  I have created file systems on multiple disks,
> did copies and also writes to a raw device as well and have had no
> problems yet.

This one still hangs for me a few I/O operations after my test harness
throws its first device busy.

This:

> +	if (arq) 
> +		if (arq->io_context && arq->io_context->aic) {
> +			arq->state = AS_RQ_DISPATCHED;
> +			atomic_inc(&arq->io_context->aic->nr_dispatched);
> +		}
> +	else
> +		WARN_ON(!(rq->flags & REQ_HARDBARRIER) && blk_fs_request(rq));

Needs braces around the first if otherwise gcc may get where the else
belongs wrong.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 16:41                                                 ` James Bottomley
@ 2003-07-18 17:25                                                   ` Alan Cox
  2003-07-31  7:40                                                     ` Nick Piggin
  2003-07-18 17:45                                                   ` Andrew Morton
  1 sibling, 1 reply; 57+ messages in thread
From: Alan Cox @ 2003-07-18 17:25 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, piggin, daniel, markh, Jens Axboe, cliffw,
	SCSI Mailing List

On Gwe, 2003-07-18 at 17:41, James Bottomley wrote:
> On Fri, 2003-07-18 at 11:30, Andrew Morton wrote:
> > Nick is probably on a plane by now.  I sent Mark's original one-liner to
> > Linus last night.
> 
> That's not a fix, it only catches one out of the five requeue cases in
> SCSI.
> 
> why don't we just use the deadline elevator until this can be sorted out
> properly?

For aacraid you can just throw commands at the card, its got a 233Mhz
ARM and a mind of its own anyway


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 16:41                                                 ` James Bottomley
  2003-07-18 17:25                                                   ` Alan Cox
@ 2003-07-18 17:45                                                   ` Andrew Morton
  2003-07-18 18:34                                                     ` James Bottomley
  1 sibling, 1 reply; 57+ messages in thread
From: Andrew Morton @ 2003-07-18 17:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: piggin, daniel, markh, axboe, cliffw, linux-scsi

James Bottomley <James.Bottomley@steeleye.com> wrote:
>
> > Nick is probably on a plane by now.  I sent Mark's original one-liner to
> > Linus last night.
> 
> That's not a fix, it only catches one out of the five requeue cases in
> SCSI.

Well yes, it's a bandaid.  Anton was having hangs (sym2 I guess) and
if fixed them, too.

> why don't we just use the deadline elevator until this can be sorted out
> properly?

Could do I guess.  Or people who continue to have problems can boot with
elevator=deadline.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 16:56                                             ` James Bottomley
@ 2003-07-18 17:46                                               ` Mark Haverkamp
  2003-07-18 20:21                                                 ` James Bottomley
  0 siblings, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-18 17:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nick Piggin, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

On Fri, 2003-07-18 at 09:56, James Bottomley wrote:
> On Fri, 2003-07-18 at 11:28, Mark Haverkamp wrote:
> > I thought more about what you were saying about checking arq == NULL so
> > I have enclosed a patch (compilation of everything) which works for us
> > here on our hardware.  I have created file systems on multiple disks,
> > did copies and also writes to a raw device as well and have had no
> > problems yet.
> 
> This one still hangs for me a few I/O operations after my test harness
> throws its first device busy.

I'll try out your test harness on our hardware and see what happens.

> 
> This:
> 
> > +	if (arq) 
> > +		if (arq->io_context && arq->io_context->aic) {
> > +			arq->state = AS_RQ_DISPATCHED;
> > +			atomic_inc(&arq->io_context->aic->nr_dispatched);
> > +		}
> > +	else
> > +		WARN_ON(!(rq->flags & REQ_HARDBARRIER) && blk_fs_request(rq));
> 
> Needs braces around the first if otherwise gcc may get where the else
> belongs wrong.
> 
> James

oops, you are right.

===== drivers/block/as-iosched.c 1.5 vs edited =====
--- 1.5/drivers/block/as-iosched.c	Sat Jul 12 04:28:01 2003
+++ edited/drivers/block/as-iosched.c	Fri Jul 18 10:32:22 2003
@@ -1306,6 +1306,33 @@
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+
+	if (arq) {
+		if (arq->io_context && arq->io_context->aic) {
+			arq->state = AS_RQ_DISPATCHED;
+			atomic_inc(&arq->io_context->aic->nr_dispatched);
+		}
+	} else
+		WARN_ON(!(rq->flags & REQ_HARDBARRIER) && blk_fs_request(rq));
+
+	list_add_tail(&rq->queuelist, ad->dispatch);
+
+	/* Stop anticipating - let this request get through */
+	if (ad->antic_status == ANTIC_WAIT_REQ
+			|| ad->antic_status == ANTIC_WAIT_NEXT)
+		as_antic_stop(ad);
+
+	return;
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1821,6 +1848,7 @@
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,
===== drivers/block/elevator.c 1.46 vs edited =====
--- 1.46/drivers/block/elevator.c	Fri Jul  4 23:52:45 2003
+++ edited/drivers/block/elevator.c	Thu Jul 17 07:30:06 2003
@@ -214,6 +214,18 @@
 		e->elevator_merge_req_fn(q, rq, next);
 }
 
+void elv_requeue_request(request_queue_t *q, struct request *rq)
+{
+	/*
+	 * if iosched has an explicit requeue hook, then use that. otherwise
+	 * just put the request at the front of the queue
+	 */
+	if (q->elevator.elevator_requeue_req_fn)
+		q->elevator.elevator_requeue_req_fn(q, rq);
+	else
+		__elv_add_request(q, rq, 0, 0);
+}
+
 void __elv_add_request(request_queue_t *q, struct request *rq, int at_end,
 		       int plug)
 {
@@ -417,6 +429,7 @@
 
 EXPORT_SYMBOL(elv_add_request);
 EXPORT_SYMBOL(__elv_add_request);
+EXPORT_SYMBOL(elv_requeue_request);
 EXPORT_SYMBOL(elv_next_request);
 EXPORT_SYMBOL(elv_remove_request);
 EXPORT_SYMBOL(elv_queue_empty);
===== drivers/block/ll_rw_blk.c 1.192 vs edited =====
--- 1.192/drivers/block/ll_rw_blk.c	Sun Jul 13 05:15:43 2003
+++ edited/drivers/block/ll_rw_blk.c	Thu Jul 17 07:30:06 2003
@@ -1494,6 +1494,23 @@
 
 	return rq;
 }
+/**
+ * blk_requeue_request - put a request back on queue
+ * @q:		request queue where request should be inserted
+ * @rq:		request to be inserted
+ *
+ * Description:
+ *    Drivers often keep queueing requests until the hardware cannot accept
+ *    more, when that condition happens we need to put the request back
+ *    on the queue. Must be called with queue lock held.
+ */
+void blk_requeue_request(request_queue_t *q, struct request *rq)
+{
+	if (blk_rq_tagged(rq))
+		blk_queue_end_tag(q, rq);
+
+	elv_requeue_request(q, rq);
+}
 
 /**
  * blk_insert_request - insert a special request in to a request queue
@@ -2730,6 +2747,7 @@
 EXPORT_SYMBOL(blk_get_request);
 EXPORT_SYMBOL(blk_put_request);
 EXPORT_SYMBOL(blk_insert_request);
+EXPORT_SYMBOL(blk_requeue_request);
 
 EXPORT_SYMBOL(blk_queue_prep_rq);
 EXPORT_SYMBOL(blk_queue_merge_bvec);
===== drivers/scsi/scsi_lib.c 1.99 vs edited =====
--- 1.99/drivers/scsi/scsi_lib.c	Sun Jun 29 18:14:44 2003
+++ edited/drivers/scsi/scsi_lib.c	Thu Jul 17 07:30:06 2003
@@ -444,22 +444,8 @@
  */
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(q->queue_lock, flags);
-	cmd->request->special = cmd;
-	if (blk_rq_tagged(cmd->request))
-		blk_queue_end_tag(q, cmd->request);
-
-	/*
-	 * set REQ_SPECIAL - we have a command
-	 * clear REQ_DONTPREP - we assume the sg table has been 
-	 *	nuked so we need to set it up again.
-	 */
-	cmd->request->flags |= REQ_SPECIAL;
 	cmd->request->flags &= ~REQ_DONTPREP;
-	__elv_add_request(q, cmd->request, 0, 0);
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	blk_insert_request(q, cmd->request, 1, cmd);
 
 	scsi_run_queue(q);
 }
@@ -1213,9 +1199,7 @@
 	 * later time.
 	 */
 	spin_lock_irq(q->queue_lock);
-	if (blk_rq_tagged(req))
-		blk_queue_end_tag(q, req);
-	__elv_add_request(q, req, 0, 0);
+	blk_requeue_request(q, req);
 	sdev->device_busy--;
 	if(sdev->device_busy == 0)
 		blk_plug_device(q);
===== include/linux/blkdev.h 1.116 vs edited =====
--- 1.116/include/linux/blkdev.h	Fri Jul  4 23:52:51 2003
+++ edited/include/linux/blkdev.h	Thu Jul 17 07:30:06 2003
@@ -491,6 +491,7 @@
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
 extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);
 extern void blk_recount_segments(request_queue_t *, struct bio *);
===== include/linux/elevator.h 1.25 vs edited =====
--- 1.25/include/linux/elevator.h	Fri Jul  4 23:52:40 2003
+++ edited/include/linux/elevator.h	Thu Jul 17 07:30:06 2003
@@ -13,6 +13,7 @@
 typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
 typedef int (elevator_queue_empty_fn) (request_queue_t *);
 typedef void (elevator_remove_req_fn) (request_queue_t *, struct request *);
+typedef void (elevator_requeue_req_fn) (request_queue_t *, struct request *);
 typedef struct request *(elevator_request_list_fn) (request_queue_t *, struct request *);
 typedef struct list_head *(elevator_get_sort_head_fn) (request_queue_t *, struct request *);
 typedef void (elevator_completed_req_fn) (request_queue_t *, struct request *);
@@ -33,6 +34,7 @@
 	elevator_next_req_fn *elevator_next_req_fn;
 	elevator_add_req_fn *elevator_add_req_fn;
 	elevator_remove_req_fn *elevator_remove_req_fn;
+	elevator_requeue_req_fn *elevator_requeue_req_fn;
 
 	elevator_queue_empty_fn *elevator_queue_empty_fn;
 	elevator_completed_req_fn *elevator_completed_req_fn;
@@ -64,6 +66,7 @@
 			       struct request *);
 extern void elv_merged_request(request_queue_t *, struct request *);
 extern void elv_remove_request(request_queue_t *, struct request *);
+extern void elv_requeue_request(request_queue_t *, struct request *);
 extern int elv_queue_empty(request_queue_t *);
 extern struct request *elv_next_request(struct request_queue *q);
 extern struct request *elv_former_request(request_queue_t *, struct request *);

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 17:45                                                   ` Andrew Morton
@ 2003-07-18 18:34                                                     ` James Bottomley
  0 siblings, 0 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-18 18:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: piggin, daniel, markh, Jens Axboe, cliffw, SCSI Mailing List

On Fri, 2003-07-18 at 12:45, Andrew Morton wrote:
> James Bottomley <James.Bottomley@steeleye.com> wrote:
> >
> > > Nick is probably on a plane by now.  I sent Mark's original one-liner to
> > > Linus last night.
> > 
> > That's not a fix, it only catches one out of the five requeue cases in
> > SCSI.
> 
> Well yes, it's a bandaid.  Anton was having hangs (sym2 I guess) and
> if fixed them, too.

Well, OK, but how about the attached bandaid which at least covers all
the queuing cases and confines the fix to the as code:

===== drivers/block/as-iosched.c 1.7 vs edited =====
--- 1.7/drivers/block/as-iosched.c	Fri Jul 18 00:31:01 2003
+++ edited/drivers/block/as-iosched.c	Fri Jul 18 13:28:21 2003
@@ -1306,6 +1306,15 @@
 	as_update_arq(ad, arq); /* keep state machine up to date */
 }
 
+/*
+ * FIXME: HACK for AS requeue problems
+ */
+static void as_requeue_request(request_queue_t *q, struct request *rq)
+{
+	elv_completed_request(q, rq);
+	__elv_add_request(q, rq, 0, 0);
+}
+
 static void
 as_insert_request(request_queue_t *q, struct request *rq,
 			struct list_head *insert_here)
@@ -1821,6 +1830,7 @@
 	.elevator_next_req_fn =		as_next_request,
 	.elevator_add_req_fn =		as_insert_request,
 	.elevator_remove_req_fn =	as_remove_request,
+	.elevator_requeue_req_fn = 	as_requeue_request,
 	.elevator_queue_empty_fn =	as_queue_empty,
 	.elevator_completed_req_fn =	as_completed_request,
 	.elevator_former_req_fn =	as_former_request,

I'll send this one in, if everyone is OK with it.

It still fails in my test harness, but that's a much more severe test
than most things people will come across in the field.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 17:46                                               ` Mark Haverkamp
@ 2003-07-18 20:21                                                 ` James Bottomley
  2003-07-18 20:39                                                   ` Christoph Hellwig
                                                                     ` (3 more replies)
  0 siblings, 4 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-18 20:21 UTC (permalink / raw)
  To: Mark Haverkamp
  Cc: Nick Piggin, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 703 bytes --]

On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
> I'll try out your test harness on our hardware and see what happens.

OK, I think I found the problem.

Parts of the SCSI and block code don't distinguish between queueing and
requeueing, so they trip over the exact same error.

The (fairly invasive) fix is to add an extra parameter to
blk_insert_request to tell it if this is a reinsertion or a new request
(patch attached--against Jens' previous one).

With this, I now get the AS ioscheduler to survive my tests.

Incidentally, I'm not sure whether scsi_requeue_command() counts as a
reinsertion.  It's used to redo I/O after end_that_request_first() but
before end_that_request_last().

James



[-- Attachment #2: tmp.diff --]
[-- Type: text/plain, Size: 2981 bytes --]

===== drivers/block/ll_rw_blk.c 1.195 vs edited =====
--- 1.195/drivers/block/ll_rw_blk.c	Fri Jul 18 12:36:20 2003
+++ edited/drivers/block/ll_rw_blk.c	Fri Jul 18 14:54:34 2003
@@ -1518,6 +1518,7 @@
  * @rq:		request to be inserted
  * @at_head:	insert request at head or tail of queue
  * @data:	private data
+ * @reinsert:	true if request it a reinsertion of previously processed one
  *
  * Description:
  *    Many block devices need to execute commands asynchronously, so they don't
@@ -1532,7 +1533,7 @@
  *    host that is unable to accept a particular command.
  */
 void blk_insert_request(request_queue_t *q, struct request *rq,
-			int at_head, void *data)
+			int at_head, void *data, int reinsert)
 {
 	unsigned long flags;
 
@@ -1550,11 +1551,15 @@
 	/*
 	 * If command is tagged, release the tag
 	 */
-	if (blk_rq_tagged(rq))
-		blk_queue_end_tag(q, rq);
+	if(reinsert) {
+		blk_requeue_request(q, rq);
+	} else {
+		if (blk_rq_tagged(rq))
+			blk_queue_end_tag(q, rq);
 
-	drive_stat_acct(rq, rq->nr_sectors, 1);
-	__elv_add_request(q, rq, !at_head, 0);
+		drive_stat_acct(rq, rq->nr_sectors, 1);
+		__elv_add_request(q, rq, !at_head, 0);
+	}
 	q->request_fn(q);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
===== drivers/scsi/scsi_lib.c 1.104 vs edited =====
--- 1.104/drivers/scsi/scsi_lib.c	Fri Jul 18 12:36:25 2003
+++ edited/drivers/scsi/scsi_lib.c	Fri Jul 18 14:54:35 2003
@@ -69,7 +69,7 @@
 	 */
 	sreq->sr_request->flags &= ~REQ_DONTPREP;
 	blk_insert_request(sreq->sr_device->request_queue, sreq->sr_request,
-		       	   at_head, sreq);
+		       	   at_head, sreq, 0);
 	return 0;
 }
 
@@ -147,7 +147,7 @@
 	 * function.  The SCSI request function detects the blocked condition
 	 * and plugs the queue appropriately.
 	 */
-	blk_insert_request(device->request_queue, cmd->request, 1, cmd);
+	blk_insert_request(device->request_queue, cmd->request, 1, cmd, 1);
 	return 0;
 }
 
@@ -445,7 +445,7 @@
 static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 {
 	cmd->request->flags &= ~REQ_DONTPREP;
-	blk_insert_request(q, cmd->request, 1, cmd);
+	blk_insert_request(q, cmd->request, 1, cmd, 1);
 
 	scsi_run_queue(q);
 }
===== include/linux/blkdev.h 1.118 vs edited =====
--- 1.118/include/linux/blkdev.h	Fri Jul 18 12:36:34 2003
+++ edited/include/linux/blkdev.h	Fri Jul 18 14:54:35 2003
@@ -490,7 +490,7 @@
 extern void __blk_attempt_remerge(request_queue_t *, struct request *);
 extern struct request *blk_get_request(request_queue_t *, int, int);
 extern void blk_put_request(struct request *);
-extern void blk_insert_request(request_queue_t *, struct request *, int, void *);
+extern void blk_insert_request(request_queue_t *, struct request *, int, void *, int);
 extern void blk_requeue_request(request_queue_t *, struct request *);
 extern void blk_plug_device(request_queue_t *);
 extern int blk_remove_plug(request_queue_t *);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 20:21                                                 ` James Bottomley
@ 2003-07-18 20:39                                                   ` Christoph Hellwig
  2003-07-18 20:45                                                   ` Mark Haverkamp
                                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2003-07-18 20:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Daniel McNeil,
	Jens Axboe, Cliff White, linux-scsi

On Fri, Jul 18, 2003 at 03:21:56PM -0500, James Bottomley wrote:
> +++ edited/drivers/block/ll_rw_blk.c	Fri Jul 18 14:54:34 2003
> @@ -1518,6 +1518,7 @@
>   * @rq:		request to be inserted
>   * @at_head:	insert request at head or tail of queue
>   * @data:	private data
> + * @reinsert:	true if request it a reinsertion of previously processed one
>   *
>   * Description:
>   *    Many block devices need to execute commands asynchronously, so they don't
> @@ -1532,7 +1533,7 @@
>   *    host that is unable to accept a particular command.
>   */
>  void blk_insert_request(request_queue_t *q, struct request *rq,
> -			int at_head, void *data)
> +			int at_head, void *data, int reinsert)

Shouldn't the parameter be after at_head for consistency?


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 20:21                                                 ` James Bottomley
  2003-07-18 20:39                                                   ` Christoph Hellwig
@ 2003-07-18 20:45                                                   ` Mark Haverkamp
  2003-07-19  8:26                                                   ` Jens Axboe
  2003-07-31  7:16                                                   ` Nick Piggin
  3 siblings, 0 replies; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-18 20:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nick Piggin, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

On Fri, 2003-07-18 at 13:21, James Bottomley wrote:
> On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
> > I'll try out your test harness on our hardware and see what happens.
> 
> OK, I think I found the problem.
> 
> Parts of the SCSI and block code don't distinguish between queueing and
> requeueing, so they trip over the exact same error.
> 
> The (fairly invasive) fix is to add an extra parameter to
> blk_insert_request to tell it if this is a reinsertion or a new request
> (patch attached--against Jens' previous one).
> 
> With this, I now get the AS ioscheduler to survive my tests.
> 
> Incidentally, I'm not sure whether scsi_requeue_command() counts as a
> reinsertion.  It's used to redo I/O after end_that_request_first() but
> before end_that_request_last().
> 
> James

James,

I had just set my machine up with your test harness in the aacraid
driver and was able to get hangs after a short time.  I applied your
patch and now the hangs seem to be gone for me too.


Mark.
-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 20:21                                                 ` James Bottomley
  2003-07-18 20:39                                                   ` Christoph Hellwig
  2003-07-18 20:45                                                   ` Mark Haverkamp
@ 2003-07-19  8:26                                                   ` Jens Axboe
  2003-07-31  7:16                                                   ` Nick Piggin
  3 siblings, 0 replies; 57+ messages in thread
From: Jens Axboe @ 2003-07-19  8:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Nick Piggin, Andrew Morton, Daniel McNeil,
	Cliff White, linux-scsi

On Fri, Jul 18 2003, James Bottomley wrote:
> On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
> > I'll try out your test harness on our hardware and see what happens.
> 
> OK, I think I found the problem.
> 
> Parts of the SCSI and block code don't distinguish between queueing and
> requeueing, so they trip over the exact same error.
> 
> The (fairly invasive) fix is to add an extra parameter to
> blk_insert_request to tell it if this is a reinsertion or a new request
> (patch attached--against Jens' previous one).
> 
> With this, I now get the AS ioscheduler to survive my tests.
> 
> Incidentally, I'm not sure whether scsi_requeue_command() counts as a
> reinsertion.  It's used to redo I/O after end_that_request_first() but
> before end_that_request_last().
> 
  
> @@ -1550,11 +1551,15 @@
>  	/*
>  	 * If command is tagged, release the tag
>  	 */
> -	if (blk_rq_tagged(rq))
> -		blk_queue_end_tag(q, rq);
> +	if(reinsert) {
> +		blk_requeue_request(q, rq);
> +	} else {
> +		if (blk_rq_tagged(rq))
> +			blk_queue_end_tag(q, rq);
>  
> -	drive_stat_acct(rq, rq->nr_sectors, 1);
> -	__elv_add_request(q, rq, !at_head, 0);
> +		drive_stat_acct(rq, rq->nr_sectors, 1);
> +		__elv_add_request(q, rq, !at_head, 0);
> +	}
>  	q->request_fn(q);
>  	spin_unlock_irqrestore(q->queue_lock, flags);

much better, looks good to me. Andrew, could you pull your hack and
apply mine + this one? If James doesn't beat me to it, I can make a
complete patch this afternoon.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 20:21                                                 ` James Bottomley
                                                                     ` (2 preceding siblings ...)
  2003-07-19  8:26                                                   ` Jens Axboe
@ 2003-07-31  7:16                                                   ` Nick Piggin
  2003-07-31 14:28                                                     ` James Bottomley
  2003-07-31 14:40                                                     ` Mark Haverkamp
  3 siblings, 2 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-31  7:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Haverkamp, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi



James Bottomley wrote:

>On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
>
>>I'll try out your test harness on our hardware and see what happens.
>>

Sorry for coming in a few weeks late here. Thanks everyone who
has been taking time on this.

What is the status of this stuff? I see Andrew's latest tree still
has your "HACK: ***" in it, James. Who is putting together a complete
patch for Andrew?

>
>OK, I think I found the problem.
>

Yes, the problem with AS just the I-don't-know-about-requeue thing.
As you have found there are other non AS problems there.

Thanks again everyone.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-18 17:25                                                   ` Alan Cox
@ 2003-07-31  7:40                                                     ` Nick Piggin
  0 siblings, 0 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-31  7:40 UTC (permalink / raw)
  To: Alan Cox
  Cc: James Bottomley, Andrew Morton, daniel, markh, Jens Axboe,
	cliffw, SCSI Mailing List



Alan Cox wrote:

>On Gwe, 2003-07-18 at 17:41, James Bottomley wrote:
>
>>On Fri, 2003-07-18 at 11:30, Andrew Morton wrote:
>>
>>>Nick is probably on a plane by now.  I sent Mark's original one-liner to
>>>Linus last night.
>>>
>>That's not a fix, it only catches one out of the five requeue cases in
>>SCSI.
>>
>>why don't we just use the deadline elevator until this can be sorted out
>>properly?
>>
>
>For aacraid you can just throw commands at the card, its got a 233Mhz
>ARM and a mind of its own anyway
>
>

The problem that AS solves isn't that a card/disk doesn't know what
to do with the requests it has, its that they don't know what requests
they are going to get (likely to get).

On the other hand high end SCSI devices running database stuff often
take some performance hit when using AS vs deadline.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-31  7:16                                                   ` Nick Piggin
@ 2003-07-31 14:28                                                     ` James Bottomley
  2003-07-31 14:40                                                     ` Mark Haverkamp
  1 sibling, 0 replies; 57+ messages in thread
From: James Bottomley @ 2003-07-31 14:28 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mark Haverkamp, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

On Thu, 2003-07-31 at 02:16, Nick Piggin wrote:
> What is the status of this stuff? I see Andrew's latest tree still
> has your "HACK: ***" in it, James. Who is putting together a complete
> patch for Andrew?

The status of this stuff in the -test2 kernel is:

Jens' requeue hook is in along with my block layer requeue fixes and the
SCSI layer changes.

This should fix all of the SCSI requeue problems and provide the general
framework.  However, in your as-iosched.c requeue hook I put this:

/*
 * FIXME: HACK for AS requeue problems
 */
static void as_requeue_request(request_queue_t *q, struct request *rq)
{
	elv_completed_request(q, rq);
	__elv_add_request(q, rq, 0, 0);
}

Which was the original hack (except that it now works for all the scsi
queueing cases).  I've tested this in all of the SCSI requeue cases with
my test harness and it seems to work.  You should now be able to
substitute the correct code in here.

I also had a brief conversation with Paul Larson where he said he'd do
LTP SCSI queueing tests if I put the ability to trigger the requeueing
errors in scsi_debug, so hopefully we should get quick warning of any
issues like this in future.

The only outstanding thing is I didn't know whether AS wants a requeue
between end_that_request_first and end_that_request_last treated as a
requeue or a new request.  Following the default logic, it currently
gets treated as a requeue, but it strikes me that this may not strictly
be true, since SCSI will treat the thing as a new and separate command.

James



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-31  7:16                                                   ` Nick Piggin
  2003-07-31 14:28                                                     ` James Bottomley
@ 2003-07-31 14:40                                                     ` Mark Haverkamp
  2003-07-31 22:48                                                       ` Nick Piggin
  1 sibling, 1 reply; 57+ messages in thread
From: Mark Haverkamp @ 2003-07-31 14:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: James Bottomley, Andrew Morton, Daniel McNeil, Jens Axboe,
	Cliff White, linux-scsi

On Thu, 2003-07-31 at 00:16, Nick Piggin wrote:
> James Bottomley wrote:
> 
> >On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
> >
> >>I'll try out your test harness on our hardware and see what happens.
> >>
> 
> Sorry for coming in a few weeks late here. Thanks everyone who
> has been taking time on this.
> 
> What is the status of this stuff? I see Andrew's latest tree still
> has your "HACK: ***" in it, James. Who is putting together a complete
> patch for Andrew?
> 
> >
> >OK, I think I found the problem.
> >
> 
> Yes, the problem with AS just the I-don't-know-about-requeue thing.
> As you have found there are other non AS problems there.
> 
> Thanks again everyone.

Nick,

Here is the requeue function I have been using for a while.  It is based
on your original requeue function.  It has been working for me.

Mark.

===== drivers/block/as-iosched.c 1.9 vs edited =====
--- 1.9/drivers/block/as-iosched.c	Fri Jul 25 11:19:44 2003
+++ edited/drivers/block/as-iosched.c	Thu Jul 31 07:30:04 2003
@@ -1307,12 +1307,30 @@
 }
 
 /*
- * FIXME: HACK for AS requeue problems
+ * requeue the request. The request has not been completed, nor is it a
+ * new request, so don't touch accounting.
  */
 static void as_requeue_request(request_queue_t *q, struct request *rq)
 {
-	elv_completed_request(q, rq);
-	__elv_add_request(q, rq, 0, 0);
+	struct as_data *ad = q->elevator.elevator_data;
+	struct as_rq *arq = RQ_DATA(rq);
+
+	if (arq) {
+		if (arq->io_context && arq->io_context->aic) {
+			arq->state = AS_RQ_DISPATCHED;
+			atomic_inc(&arq->io_context->aic->nr_dispatched);
+		}
+	} else
+		WARN_ON(!(rq->flags & REQ_HARDBARRIER) && blk_fs_request(rq));
+
+	list_add_tail(&rq->queuelist, ad->dispatch);
+
+	/* Stop anticipating - let this request get through */
+	if (ad->antic_status == ANTIC_WAIT_REQ
+			|| ad->antic_status == ANTIC_WAIT_NEXT)
+		as_antic_stop(ad);
+
+	return;
 }
 
 static void

-- 
Mark Haverkamp <markh@osdl.org>


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] as i/o hang with aacraid driver 2.6.0-test1
  2003-07-31 14:40                                                     ` Mark Haverkamp
@ 2003-07-31 22:48                                                       ` Nick Piggin
  0 siblings, 0 replies; 57+ messages in thread
From: Nick Piggin @ 2003-07-31 22:48 UTC (permalink / raw)
  To: Mark Haverkamp, James Bottomley
  Cc: Andrew Morton, Daniel McNeil, Jens Axboe, Cliff White, linux-scsi



Mark Haverkamp wrote:

>On Thu, 2003-07-31 at 00:16, Nick Piggin wrote:
>
>>James Bottomley wrote:
>>
>>
>>>On Fri, 2003-07-18 at 12:46, Mark Haverkamp wrote:
>>>
>>>
>>>>I'll try out your test harness on our hardware and see what happens.
>>>>
>>>>
>>Sorry for coming in a few weeks late here. Thanks everyone who
>>has been taking time on this.
>>
>>What is the status of this stuff? I see Andrew's latest tree still
>>has your "HACK: ***" in it, James. Who is putting together a complete
>>patch for Andrew?
>>
>>
>>>OK, I think I found the problem.
>>>
>>>
>>Yes, the problem with AS just the I-don't-know-about-requeue thing.
>>As you have found there are other non AS problems there.
>>
>>Thanks again everyone.
>>
>
>Nick,
>
>Here is the requeue function I have been using for a while.  It is based
>on your original requeue function.  It has been working for me.
>

Yeah, the patch looks good. James, can you test when you get time
please? Then Mark can send to Andrew. Thanks.



^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2003-07-31 22:49 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-15 23:02 [PATCH] as i/o hang with aacraid driver 2.6.0-test1 Mark Haverkamp
2003-07-16  1:40 ` Nick Piggin
2003-07-16  5:53   ` Jens Axboe
2003-07-16 12:41 ` James Bottomley
2003-07-16 12:45   ` Jens Axboe
2003-07-16 12:56     ` James Bottomley
2003-07-16 13:20       ` Jens Axboe
2003-07-16 14:07         ` James Bottomley
2003-07-16 17:04           ` Jens Axboe
2003-07-17  8:57             ` Andrew Morton
2003-07-17  8:59               ` Jens Axboe
2003-07-17  9:56                 ` Nick Piggin
2003-07-17 10:29                   ` Jens Axboe
2003-07-17 10:51                     ` Nick Piggin
2003-07-17 10:56                       ` Jens Axboe
2003-07-17 11:09                         ` Nick Piggin
2003-07-17 11:11                           ` Jens Axboe
2003-07-17 11:28                             ` Nick Piggin
2003-07-17 11:29                               ` Jens Axboe
2003-07-17 14:44                               ` Mark Haverkamp
2003-07-17 15:43                                 ` James Bottomley
2003-07-17 20:46                               ` Mark Haverkamp
     [not found]                                 ` <1058481553 .19508.5.camel@markh1.pdx.osdl.net>
2003-07-17 22:39                                 ` Mark Haverkamp
2003-07-17 23:47                                   ` Daniel McNeil
2003-07-18  0:00                                     ` Andrew Morton
2003-07-18  5:14                                       ` Nick Piggin
2003-07-18  5:25                                         ` Andrew Morton
2003-07-18  5:30                                           ` Nick Piggin
2003-07-18  5:35                                             ` Nick Piggin
2003-07-18 14:16                                             ` James Bottomley
2003-07-18 16:30                                               ` Andrew Morton
2003-07-18 16:41                                                 ` James Bottomley
2003-07-18 17:25                                                   ` Alan Cox
2003-07-31  7:40                                                     ` Nick Piggin
2003-07-18 17:45                                                   ` Andrew Morton
2003-07-18 18:34                                                     ` James Bottomley
2003-07-18 14:00                                           ` James Bottomley
2003-07-18 15:03                                         ` Mark Haverkamp
2003-07-18 16:28                                           ` Mark Haverkamp
2003-07-18 16:56                                             ` James Bottomley
2003-07-18 17:46                                               ` Mark Haverkamp
2003-07-18 20:21                                                 ` James Bottomley
2003-07-18 20:39                                                   ` Christoph Hellwig
2003-07-18 20:45                                                   ` Mark Haverkamp
2003-07-19  8:26                                                   ` Jens Axboe
2003-07-31  7:16                                                   ` Nick Piggin
2003-07-31 14:28                                                     ` James Bottomley
2003-07-31 14:40                                                     ` Mark Haverkamp
2003-07-31 22:48                                                       ` Nick Piggin
2003-07-17 10:57                       ` Jens Axboe
2003-07-17 11:08                         ` Nick Piggin
2003-07-17 11:10                           ` Jens Axboe
2003-07-17 11:21                             ` Nick Piggin
2003-07-17 11:23                               ` Jens Axboe
2003-07-17 11:29                                 ` Nick Piggin
2003-07-16 22:45         ` Mark Haverkamp
2003-07-16 13:06 ` Alan Cox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.