linux-kernel.vger.kernel.org archive mirror
* Block: Prevent busy looping
@ 2008-04-16 15:37 Elias Oltmanns
  2008-04-16 16:31 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Elias Oltmanns @ 2008-04-16 15:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, stable

blk_run_queue() as well as blk_start_queue() plug the device on reentry
and schedule blk_unplug_work() right afterwards. However,
blk_plug_device() takes care of that already and makes sure that there is
a short delay before blk_unplug_work() is scheduled. This is important
to prevent busy looping and possible system lockups as observed here:
<http://permalink.gmane.org/gmane.linux.ide/28351>.
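
For reference, blk_plug_device() already defers the unplug work through
a timer; a condensed sketch of the 2.6.25-era function (paraphrased
from block/blk-core.c, not a verbatim copy):

void blk_plug_device(struct request_queue *q)
{
	WARN_ON(!irqs_disabled());	/* caller holds q->queue_lock */

	/* a stopped queue must not be plugged */
	if (blk_queue_stopped(q))
		return;

	if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
		/* re-run the queue only after a short delay
		 * (q->unplug_delay, 3 ms by default) rather than
		 * immediately; this delay is what breaks the busy loop */
		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
}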

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
Cc: <stable@kernel.org>
---

 block/blk-core.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2a438a9..e88a6f2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -344,7 +344,6 @@ void blk_start_queue(struct request_queue *q)
 		clear_bit(QUEUE_FLAG_REENTER, &q->queue_flags);
 	} else {
 		blk_plug_device(q);
-		kblockd_schedule_work(&q->unplug_work);
 	}
 }
 EXPORT_SYMBOL(blk_start_queue);
@@ -412,7 +411,6 @@ void blk_run_queue(struct request_queue *q)
 			clear_bit(QUEUE_FLAG_REENTER, &q->queue_flags);
 		} else {
 			blk_plug_device(q);
-			kblockd_schedule_work(&q->unplug_work);
 		}
 	}
 




* Re: Block: Prevent busy looping
  2008-04-16 15:37 Block: Prevent busy looping Elias Oltmanns
@ 2008-04-16 16:31 ` Jens Axboe
  2008-04-16 16:42   ` Jens Axboe
  2008-04-16 22:24   ` Elias Oltmanns
  0 siblings, 2 replies; 13+ messages in thread
From: Jens Axboe @ 2008-04-16 16:31 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-kernel, stable

On Wed, Apr 16 2008, Elias Oltmanns wrote:
> blk_run_queue() as well as blk_start_queue() plug the device on reentry
> and schedule blk_unplug_work() right afterwards. However,
> blk_plug_device() takes care of that already and makes sure that there is
> a short delay before blk_unplug_work() is scheduled. This is important
> to prevent busy looping and possible system lockups as observed here:
> <http://permalink.gmane.org/gmane.linux.ide/28351>.

If you call blk_start_queue() and blk_run_queue(), you better mean it.
There should be no delay. The only reason it does blk_plug_device() is
so that the work queue function will actually do some work. In the newer
kernels we just do:

        set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
        kblockd_schedule_work(q, &q->unplug_work);

instead, which is much better.
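
What kblockd runs there is the unplug work; roughly, paraphrasing the
2.6.25-era block/blk-core.c:

static void blk_unplug_work(struct work_struct *work)
{
	struct request_queue *q =
		container_of(work, struct request_queue, unplug_work);

	/* q->unplug_fn is normally generic_unplug_device(), which is a
	 * no-op unless the queue is plugged: it clears the PLUGGED bit
	 * and calls q->request_fn(q).  That's why the plugged bit has to
	 * be set before the work is scheduled. */
	q->unplug_fn(q);
}

With the bit set by hand and no timer armed, the queue is re-run as
soon as kblockd gets scheduled, without the 3 ms plug delay.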

-- 
Jens Axboe



* Re: Block: Prevent busy looping
  2008-04-16 16:31 ` Jens Axboe
@ 2008-04-16 16:42   ` Jens Axboe
  2008-04-16 22:24   ` Elias Oltmanns
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2008-04-16 16:42 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-kernel, stable

On Wed, Apr 16 2008, Jens Axboe wrote:
> On Wed, Apr 16 2008, Elias Oltmanns wrote:
> > blk_run_queue() as well as blk_start_queue() plug the device on reentry
> > and schedule blk_unplug_work() right afterwards. However,
> > blk_plug_device() takes care of that already and makes sure that there is
> > a short delay before blk_unplug_work() is scheduled. This is important
> > to prevent busy looping and possible system lockups as observed here:
> > <http://permalink.gmane.org/gmane.linux.ide/28351>.
> 
> If you call blk_start_queue() and blk_run_queue(), you better mean it.
> There should be no delay. The only reason it does blk_plug_device() is
> so that the work queue function will actually do some work. In the newer
> kernels we just do:
> 
>         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
>         kblockd_schedule_work(q, &q->unplug_work);
> 
> instead, which is much better.

Actually, that's only in my devel tree, not in mainline yet (which
still does blk_plug_device() instead of just setting the plugged bit).

-- 
Jens Axboe



* Re: Block: Prevent busy looping
  2008-04-16 16:31 ` Jens Axboe
  2008-04-16 16:42   ` Jens Axboe
@ 2008-04-16 22:24   ` Elias Oltmanns
  2008-04-17  7:13     ` Jens Axboe
  1 sibling, 1 reply; 13+ messages in thread
From: Elias Oltmanns @ 2008-04-16 22:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, stable

Jens Axboe <jens.axboe@oracle.com> wrote:
> On Wed, Apr 16 2008, Elias Oltmanns wrote:
>> blk_run_queue() as well as blk_start_queue() plug the device on reentry
>> and schedule blk_unplug_work() right afterwards. However,
>> blk_plug_device() takes care of that already and makes sure that there is
>> a short delay before blk_unplug_work() is scheduled. This is important
>> to prevent busy looping and possible system lockups as observed here:
>> <http://permalink.gmane.org/gmane.linux.ide/28351>.
>
> If you call blk_start_queue() and blk_run_queue(), you better mean it.
> There should be no delay. The only reason it does blk_plug_device() is
> so that the work queue function will actually do some work.

Well, I'm mainly concerned with blk_run_queue(). A comment there says
it should recurse only once so as not to overrun the stack. On my
machine, however, immediate rescheduling may have consequences exactly
as disastrous as a stack overrun, since the system locks up completely.
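
The code and comment in question, condensed from 2.6.25
block/blk-core.c:

void blk_run_queue(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	blk_remove_plug(q);

	/*
	 * Only recurse once to avoid overrunning the stack, let the unplug
	 * handling reinvoke the handler shortly if we already got there.
	 */
	if (!elv_queue_empty(q)) {
		if (!test_and_set_bit(QUEUE_FLAG_REENTER, &q->queue_flags)) {
			q->request_fn(q);
			clear_bit(QUEUE_FLAG_REENTER, &q->queue_flags);
		} else {
			/* reentered: plug and punt to kblockd; this is the
			 * path my patch changes by dropping the immediate
			 * kblockd_schedule_work() call */
			blk_plug_device(q);
			kblockd_schedule_work(&q->unplug_work);
		}
	}

	spin_unlock_irqrestore(q->queue_lock, flags);
}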

Just to get this straight: Are low level drivers allowed to rely on
blk_run_queue() not to loop, or do they have to make sure that this
function is not called from the request_fn() of the same queue?

> In the newer kernels we just do:
>
>         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
>         kblockd_schedule_work(q, &q->unplug_work);
>
> instead, which is much better.

Only as long as it doesn't get called from the request_fn() of the same
queue. Otherwise, there may be no chance for other threads to clear the
condition that caused blk_run_queue() to be called in the first place.
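
Schematically, the loop I'm worried about, assuming the unplug work is
scheduled with no timer delay:

	q->request_fn()
	    driver can't issue, calls blk_run_queue(q)
	        QUEUE_FLAG_REENTER is already set
	        -> plugged bit set + kblockd_schedule_work()
	            kblockd runs the unplug work immediately
	                -> q->request_fn() again, condition unchanged,
	                   and round it goes, busy looping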

Regards,

Elias


* Re: Block: Prevent busy looping
  2008-04-16 22:24   ` Elias Oltmanns
@ 2008-04-17  7:13     ` Jens Axboe
  2008-04-17  8:50       ` Elias Oltmanns
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2008-04-17  7:13 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-kernel, stable

On Thu, Apr 17 2008, Elias Oltmanns wrote:
> Jens Axboe <jens.axboe@oracle.com> wrote:
> > On Wed, Apr 16 2008, Elias Oltmanns wrote:
> >> blk_run_queue() as well as blk_start_queue() plug the device on reentry
> >> and schedule blk_unplug_work() right afterwards. However,
> >> blk_plug_device() takes care of that already and makes sure that there is
> >> a short delay before blk_unplug_work() is scheduled. This is important
> >> to prevent busy looping and possible system lockups as observed here:
> >> <http://permalink.gmane.org/gmane.linux.ide/28351>.
> >
> > If you call blk_start_queue() and blk_run_queue(), you better mean it.
> > There should be no delay. The only reason it does blk_plug_device() is
> > so that the work queue function will actually do some work.
> 
> Well, I'm mainly concerned with blk_run_queue(). A comment there says
> it should recurse only once so as not to overrun the stack. On my
> machine, however, immediate rescheduling may have consequences exactly
> as disastrous as a stack overrun, since the system locks up completely.
> 
> Just to get this straight: Are low level drivers allowed to rely on
> blk_run_queue() not to loop, or do they have to make sure that this
> function is not called from the request_fn() of the same queue?

It's not really designed to be called recursively. That isn't the
problem imo; the problem is SCSI apparently being dumb and calling
blk_run_queue() all the time. blk_run_queue() must run the queue NOW. If
SCSI wants something like 'run the queue in a bit', it should use
blk_plug_device() instead.
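
Sketched out, with blk_plug_device() expecting q->queue_lock held and
interrupts disabled, names as in 2.6.25:

	/* run the queue NOW, synchronously: */
	blk_run_queue(q);	/* ends up in ->request_fn() directly */

	/* "run the queue in a bit", i.e. after the ~3 ms unplug delay: */
	spin_lock_irqsave(q->queue_lock, flags);
	blk_plug_device(q);	/* just arms the unplug timer */
	spin_unlock_irqrestore(q->queue_lock, flags);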

> > In the newer kernels we just do:
> >
> >         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
> >         kblockd_schedule_work(q, &q->unplug_work);
> >
> > instead, which is much better.
> 
> Only as long as it doesn't get called from the request_fn() of the same
> queue. Otherwise, there may be no chance for other threads to clear the
> condition that caused blk_run_queue() to be called in the first place.

Broken usage.

-- 
Jens Axboe



* Re: Prevent busy looping
  2008-04-17  7:13     ` Jens Axboe
@ 2008-04-17  8:50       ` Elias Oltmanns
  2008-06-11  7:11         ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Elias Oltmanns @ 2008-04-17  8:50 UTC (permalink / raw)
  To: Tejun Heo, James Bottomley, Jens Axboe
  Cc: linux-ide, linux-scsi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2388 bytes --]

Jens Axboe <jens.axboe@oracle.com> wrote:
> On Thu, Apr 17 2008, Elias Oltmanns wrote:
>> Jens Axboe <jens.axboe@oracle.com> wrote:
>> > On Wed, Apr 16 2008, Elias Oltmanns wrote:
>> >> blk_run_queue() as well as blk_start_queue() plug the device on reentry
>> >> and schedule blk_unplug_work() right afterwards. However,
>> >> blk_plug_device() takes care of that already and makes sure that there is
>> >> a short delay before blk_unplug_work() is scheduled. This is important
>> >> to prevent busy looping and possible system lockups as observed here:
>> >> <http://permalink.gmane.org/gmane.linux.ide/28351>.
>> >
>> > If you call blk_start_queue() and blk_run_queue(), you better mean it.
>> > There should be no delay. The only reason it does blk_plug_device() is
>> > so that the work queue function will actually do some work.
>> 
>> Well, I'm mainly concerned with blk_run_queue(). A comment there says
>> it should recurse only once so as not to overrun the stack. On my
>> machine, however, immediate rescheduling may have consequences exactly
>> as disastrous as a stack overrun, since the system locks up completely.
>> 
>> Just to get this straight: Are low level drivers allowed to rely on
>> blk_run_queue() not to loop, or do they have to make sure that this
>> function is not called from the request_fn() of the same queue?
>
> It's not really designed to be called recursively. That isn't the
> problem imo; the problem is SCSI apparently being dumb and calling
> blk_run_queue() all the time. blk_run_queue() must run the queue NOW. If
> SCSI wants something like 'run the queue in a bit', it should use
> blk_plug_device() instead.

James would probably argue that this is alright as long as
max_device_blocked and max_host_blocked are bigger than one.
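
For context, the midlayer counts those limits down on each queue run.
A condensed sketch of the 2.6.25-era scsi_dev_queue_ready() logic from
drivers/scsi/scsi_lib.c, paraphrased:

	if (sdev->device_busy == 0 && sdev->device_blocked) {
		/* device_blocked was set to max_device_blocked on defer */
		if (--sdev->device_blocked != 0) {
			/* still blocked: plug the queue and retry only
			 * after the unplug delay */
			blk_plug_device(q);
			return 0;
		}
		/* counted down to zero: retry dispatch right away */
	}

With the limit at 1, the counter hits zero on the very first retry, so
there is no delay; with 2, the first retry waits one plug delay.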

>
>> > In the newer kernels we just do:
>> >
>> >         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
>> >         kblockd_schedule_work(q, &q->unplug_work);
>> >
>> > instead, which is much better.
>> 
>> Only as long as it doesn't get called from the request_fn() of the same
>> queue. Otherwise, there may be no chance for other threads to clear the
>> condition that caused blk_run_queue() to be called in the first place.
>
> Broken usage.

Right. Tejun, would it be possible to apply the patch below (2.6.25) or
do you see any alternative?

Regards,

Elias


[-- Attachment #2: adjust-blocked-counters.patch --]
[-- Type: text/x-patch, Size: 821 bytes --]

---

 drivers/ata/libata-scsi.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 1579539..ce865e9 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -831,7 +831,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
 	 * prevent SCSI midlayer from automatically deferring
 	 * requests.
 	 */
-	sdev->max_device_blocked = 1;
+	sdev->max_device_blocked = 2;
 }
 
 /**
@@ -3206,7 +3206,7 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
 		 * Set host_blocked to 1 to prevent SCSI midlayer from
 		 * automatically deferring requests.
 		 */
-		shost->max_host_blocked = 1;
+		shost->max_host_blocked = 2;
 
 		rc = scsi_add_host(ap->scsi_host, ap->host->dev);
 		if (rc)


* Re: Prevent busy looping
  2008-06-11  7:11         ` Tejun Heo
@ 2008-06-11  7:05           ` Alan Cox
  2008-06-11  8:03             ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2008-06-11  7:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Tejun Heo, James Bottomley, Jens Axboe, linux-ide, linux-scsi,
	linux-kernel

> Elias's synthetic test case triggered an infinite loop because it wasn't
> a proper ->qc_defer().  ->qc_defer() should never defer commands when
> the target is idle.

Target or host? We *do* defer commands in the case of an idle channel
when dealing with certain simplex controllers that can only issue one
command per host, not one per cable (and in fact in the general case we
can defer commands due to activity on the other drive on the cable).

Alan


* Re: Prevent busy looping
  2008-04-17  8:50       ` Elias Oltmanns
@ 2008-06-11  7:11         ` Tejun Heo
  2008-06-11  7:05           ` Alan Cox
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-06-11  7:11 UTC (permalink / raw)
  To: Tejun Heo, James Bottomley, Jens Axboe, linux-ide, linux-scsi,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3940 bytes --]

Picking up a dropped ball.

Elias Oltmanns wrote:
> Jens Axboe <jens.axboe@oracle.com> wrote:
>> On Thu, Apr 17 2008, Elias Oltmanns wrote:
>>> Jens Axboe <jens.axboe@oracle.com> wrote:
>>>> On Wed, Apr 16 2008, Elias Oltmanns wrote:
>>>>> blk_run_queue() as well as blk_start_queue() plug the device on reentry
>>>>> and schedule blk_unplug_work() right afterwards. However,
>>>>> blk_plug_device() takes care of that already and makes sure that there is
>>>>> a short delay before blk_unplug_work() is scheduled. This is important
>>>>> to prevent busy looping and possible system lockups as observed here:
>>>>> <http://permalink.gmane.org/gmane.linux.ide/28351>.
>>>> If you call blk_start_queue() and blk_run_queue(), you better mean it.
>>>> There should be no delay. The only reason it does blk_plug_device() is
>>>> so that the work queue function will actually do some work.
>>> Well, I'm mainly concerned with blk_run_queue(). A comment there says
>>> it should recurse only once so as not to overrun the stack. On my
>>> machine, however, immediate rescheduling may have consequences exactly
>>> as disastrous as a stack overrun, since the system locks up completely.
>>>
>>> Just to get this straight: Are low level drivers allowed to rely on
>>> blk_run_queue() not to loop, or do they have to make sure that this
>>> function is not called from the request_fn() of the same queue?
>> It's not really designed to be called recursively. That isn't the
>> problem imo; the problem is SCSI apparently being dumb and calling
>> blk_run_queue() all the time. blk_run_queue() must run the queue NOW. If
>> SCSI wants something like 'run the queue in a bit', it should use
>> blk_plug_device() instead.
> 
> James would probably argue that this is alright as long as
> max_device_blocked and max_host_blocked are bigger than one.
> 
>>>> In the newer kernels we just do:
>>>>
>>>>         set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags);
>>>>         kblockd_schedule_work(q, &q->unplug_work);
>>>>
>>>> instead, which is much better.
>>> Only as long as it doesn't get called from the request_fn() of the same
>>> queue. Otherwise, there may be no chance for other threads to clear the
>>> condition that caused blk_run_queue() to be called in the first place.
>> Broken usage.
> 
> Right. Tejun, would it be possible to apply the patch below (2.6.25) or
> do you see any alternative?

Okay, I (finally) looked into this.  The meaning of blocked counts is:
wait (count - 1) * plug delay before retrying if the target (be it
device or host) is idle.  libata uses deferring to implement command
scheduling and, as such, there shouldn't be any delay if the target is
not busy.

Elias's synthetic test case triggered an infinite loop because it wasn't
a proper ->qc_defer().  ->qc_defer() should never defer commands when
the target is idle.

Attached is a debug patch to monitor libata command deferring.  It will
whine if a command is retried 10 times or more, or if ->qc_defer() is
called in rapid succession.  I couldn't find anything wrong with it.
When IDENTIFY is queued while NCQ commands are in flight, it waited
several hundred milliseconds for the NCQ commands to drain, with each
->qc_defer() call spaced by several milliseconds as determined by
in-flight NCQ command completion.

So, blocked counts of 1 are just fine as long as ->qc_defer() doesn't
try to defer a command when the target is idle.  That said, there's no
harm in increasing the blocked count to two, or even in leaving it at
the default: the blocked counters are reset to 0 whenever a command
completes, and by the same logic that makes blocked counts of 1 okay,
every deferred command is guaranteed to have matching command
completions to clear its blocked counts.
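
Spelled out with the default 3 ms plug delay, as a sketch of the
arithmetic rather than measured numbers:

	max_device_blocked == 1:  defer sets device_blocked = 1;
	                          the next queue run decrements it to 0
	                          and retries immediately, no delay.

	max_device_blocked == 2:  defer sets device_blocked = 2;
	                          the next run decrements it to 1, plugs,
	                          and retries after ~3 ms (one plug delay).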

As the current code has been working well for quite some time now, I'm
more inclined to leave it as it is.

Thanks.

-- 
tejun

[-- Attachment #2: defer-debug.patch --]
[-- Type: text/x-patch, Size: 2155 bytes --]

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3ce4392..8eb050e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1612,6 +1612,11 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
 			goto defer;
 	}
 
+	if (cmd->ata_deferred_cnt >= 10)
+		ata_dev_printk(dev, KERN_INFO, "XXX: cmd %02x deferred %d times taking %u msecs\n",
+			       qc->tf.command, cmd->ata_deferred_cnt,
+			       jiffies_to_msecs(jiffies - cmd->ata_first_deferred));
+
 	/* select device, send command to hardware */
 	ata_qc_issue(qc);
 
@@ -1633,6 +1638,18 @@ err_mem:
 	return 0;
 
 defer:
+	if (!cmd->ata_deferred_cnt++) {
+		cmd->ata_first_deferred = cmd->ata_last_deferred = jiffies;
+	} else {
+		unsigned long now = jiffies;
+
+		if (jiffies_to_msecs(now - cmd->ata_last_deferred) < 3)
+			ata_dev_printk(dev, KERN_INFO, "XXX: cmd %02x deferred in %d msecs, cnt=%d\n",
+				       qc->tf.command,
+				       jiffies_to_msecs(now - cmd->ata_last_deferred),
+				       cmd->ata_deferred_cnt);
+		cmd->ata_last_deferred = now;
+	}
 	ata_qc_free(qc);
 	DPRINTK("EXIT - defer\n");
 	if (rc == ATA_DEFER_LINK)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 110e776..aadee36 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -265,6 +265,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask)
 		list_add_tail(&cmd->list, &dev->cmd_list);
 		spin_unlock_irqrestore(&dev->list_lock, flags);
 		cmd->jiffies_at_alloc = jiffies;
+		cmd->ata_deferred_cnt = 0;
 	} else
 		put_device(&dev->sdev_gendev);
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 3e46dfa..0000971 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -127,6 +127,10 @@ struct scsi_cmnd {
 	int result;		/* Status code from lower level driver */
 
 	unsigned char tag;	/* SCSI-II queued command tag */
+
+	int ata_deferred_cnt;
+	unsigned long ata_first_deferred;
+	unsigned long ata_last_deferred;
 };
 
 extern struct scsi_cmnd *scsi_get_command(struct scsi_device *, gfp_t);


* Re: Prevent busy looping
  2008-06-11  7:05           ` Alan Cox
@ 2008-06-11  8:03             ` Tejun Heo
  2008-06-12  3:06               ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-06-11  8:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: James Bottomley, Jens Axboe, linux-ide, linux-scsi, linux-kernel

Alan Cox wrote:
>> Elias's synthetic test case triggered an infinite loop because it wasn't
>> a proper ->qc_defer().  ->qc_defer() should never defer commands when
>> the target is idle.
> 
> Target or host? We *do* defer commands in the case of an idle channel
> when dealing with certain simplex controllers that can only issue one
> command per host, not one per cable (and in fact in the general case we
> can defer commands due to activity on the other drive on the cable).

The term was confusing.  I used target to mean both device
(ATA_DEFER_LINK) and host (ATA_DEFER_PORT).  Hmmm... in the simplex case,
yeah, blocked counters need to be > 1.  We'll need to increase blocked
counts after all.  I'll test blocked counts of 2 w/ PMP and make sure it
doesn't incur unnecessary delays and post the patch.

Thanks.

-- 
tejun


* Re: Prevent busy looping
  2008-06-11  8:03             ` Tejun Heo
@ 2008-06-12  3:06               ` Tejun Heo
  2008-06-12 11:32                 ` Elias Oltmanns
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-06-12  3:06 UTC (permalink / raw)
  To: Alan Cox; +Cc: James Bottomley, Jens Axboe, linux-ide, linux-scsi, linux-kernel

Tejun Heo wrote:
> Alan Cox wrote:
>>> Elias's synthetic test case triggered an infinite loop because it wasn't
>>> a proper ->qc_defer().  ->qc_defer() should never defer commands when
>>> the target is idle.
>> Target or host? We *do* defer commands in the case of an idle channel
>> when dealing with certain simplex controllers that can only issue one
>> command per host, not one per cable (and in fact in the general case we
>> can defer commands due to activity on the other drive on the cable).
> 
> The term was confusing.  I used target to mean both device
>> (ATA_DEFER_LINK) and host (ATA_DEFER_PORT).  Hmmm... in the simplex case,
> yeah, blocked counters need to be > 1.  We'll need to increase blocked
> counts after all.  I'll test blocked counts of 2 w/ PMP and make sure it
> doesn't incur unnecessary delays and post the patch.

Setting blocked counts to 2 makes simplex scheduling starve one of the
drives.  When a drive loses the competition, it retries only after the
plug delay, and of course it then loses most of the time.  For now, it
seems we'll have to live with busy loops (which don't lock up the
machine) for simplex controllers.  Ewww... :-(

-- 
tejun


* Re: Prevent busy looping
  2008-06-12  3:06               ` Tejun Heo
@ 2008-06-12 11:32                 ` Elias Oltmanns
  2008-06-12 13:43                   ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Elias Oltmanns @ 2008-06-12 11:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, James Bottomley, Jens Axboe, linux-ide, linux-scsi,
	linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Tejun Heo wrote:
>> Alan Cox wrote:
>
>>>> Elias's synthetic test case triggered an infinite loop because it wasn't
>>>> a proper ->qc_defer().  ->qc_defer() should never defer commands when
>>>> the target is idle.
>>> Target or host? We *do* defer commands in the case of an idle channel
>>> when dealing with certain simplex controllers that can only issue one
>>> command per host, not one per cable (and in fact in the general case we
>>> can defer commands due to activity on the other drive on the cable).
>> 
>> The term was confusing.  I used target to mean both device
>> (ATA_DEFER_LINK) and host (ATA_DEFER_PORT).  Hmmm... in the simplex case,
>> yeah, blocked counters need to be > 1.  We'll need to increase blocked
>> counts after all.  I'll test blocked counts of 2 w/ PMP and make sure it
>> doesn't incur unnecessary delays and post the patch.
>
> Setting blocked counts to 2 makes simplex scheduling starve one of the
> drives.  When a drive loses the competition, it retries only after the
> plug delay, and of course it then loses most of the time.  For now, it
> seems we'll have to live with busy loops (which don't lock up the
> machine) for simplex controllers.  Ewww... :-(

Since I'm a little confused by your comment, please explain again. Do
you mean to say that busy looping doesn't lock up the machine in general
or merely in the case of a simplex configuration?

The reason why I'm asking is this: The whole point of my synthetic
->qc_defer() function was to prove that command deferral could (under
certain conditions) lead to busy looping which *did* lock up my machine.
Lock up in this context means that there was no response whatsoever to
key presses and even timers didn't fire anymore. I can see your point
that my ->qc_defer() function doesn't reflect reality very well because
the device is idle at the time and therefore no interrupts can be
expected from there. However, I still think that interrupts won't even
be processed once busy looping has started (in some configurations at
least).

You can find a slightly modified version of my synthetic ->qc_defer()
function below (apply to 2.6.26-rc5) which demonstrates that at least
soft interrupts don't get serviced anymore once the busy looping has
started. Considering this, how can I be sure that an interrupt of the
target would be processed, even if it was not idle?

Regards,

Elias

 drivers/ata/ata_piix.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)


diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 81b7ae3..9816daa 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -167,6 +167,7 @@ static int ich_pata_cable_detect(struct ata_port *ap);
 static u8 piix_vmw_bmdma_status(struct ata_port *ap);
 static int piix_sidpr_scr_read(struct ata_port *ap, unsigned int reg, u32 *val);
 static int piix_sidpr_scr_write(struct ata_port *ap, unsigned int reg, u32 val);
+static int piix_qc_defer(struct ata_queued_cmd *qc);
 #ifdef CONFIG_PM
 static int piix_pci_device_suspend(struct pci_dev *pdev, pm_message_t mesg);
 static int piix_pci_device_resume(struct pci_dev *pdev);
@@ -299,6 +300,7 @@ static struct ata_port_operations piix_pata_ops = {
 	.set_piomode		= piix_set_piomode,
 	.set_dmamode		= piix_set_dmamode,
 	.prereset		= piix_pata_prereset,
+	.qc_defer		= piix_qc_defer,
 };
 
 static struct ata_port_operations piix_vmw_ops = {
@@ -314,6 +316,7 @@ static struct ata_port_operations ich_pata_ops = {
 
 static struct ata_port_operations piix_sata_ops = {
 	.inherits		= &ata_bmdma_port_ops,
+	.qc_defer		= piix_qc_defer,
 };
 
 static struct ata_port_operations piix_sidpr_sata_ops = {
@@ -323,6 +326,40 @@ static struct ata_port_operations piix_sidpr_sata_ops = {
 	.scr_write		= piix_sidpr_scr_write,
 };
 
+static unsigned int defer_count = 0;
+static struct timer_list defer_timer;
+
+static void piix_defer_timeout(unsigned long data)
+{
+	struct ata_port *ap = (struct ata_port *)data;
+
+	spin_lock_bh(ap->lock);
+	defer_count = 0;
+	spin_unlock_bh(ap->lock);
+}
+
+static int piix_qc_defer(struct ata_queued_cmd *qc)
+{
+	static struct ata_port *ap = NULL;
+#define PIIX_QC_DEFER_THRESHOLD 2000
+
+	if (!ap) {
+		ap = qc->ap;
+		defer_timer.data = (unsigned long)ap;
+		defer_timer.function = piix_defer_timeout;
+		init_timer(&defer_timer);
+	} else if (ap != qc->ap)
+		return 0;
+
+	defer_count++;
+	if (defer_count < PIIX_QC_DEFER_THRESHOLD)
+		return 0;
+
+	if (defer_count == PIIX_QC_DEFER_THRESHOLD)
+		mod_timer(&defer_timer, jiffies + msecs_to_jiffies(5));
+	return ATA_DEFER_LINK;
+}
+
 static const struct piix_map_db ich5_map_db = {
 	.mask = 0x7,
 	.port_enable = 0x3,


* Re: Prevent busy looping
  2008-06-12 11:32                 ` Elias Oltmanns
@ 2008-06-12 13:43                   ` Tejun Heo
  2008-06-12 14:18                     ` James Bottomley
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2008-06-12 13:43 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, James Bottomley, Jens Axboe, linux-ide, linux-scsi,
	linux-kernel

Elias Oltmanns wrote:
> Since I'm a little confused by your comment, please explain again. Do
> you mean to say that busy looping doesn't lock up the machine in general
> or merely in the case of a simplex configuration?

It busy loops, but it won't lock up: command completion is the loop
breaker, completion comes via IRQ, and the busy looping doesn't happen
solely in IRQ context.  Still needs to be fixed, though.  Anyway, this
is limited to ->qc_defer for simplex, and the reason there's a busy
loop is that we're trying to schedule two independent hosts and the
SCSI midlayer (of course) has no notion of cross-host deferring.

Thanks.

-- 
tejun


* Re: Prevent busy looping
  2008-06-12 13:43                   ` Tejun Heo
@ 2008-06-12 14:18                     ` James Bottomley
  0 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2008-06-12 14:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Elias Oltmanns, Alan Cox, Jens Axboe, linux-ide, linux-scsi,
	linux-kernel

On Thu, 2008-06-12 at 22:43 +0900, Tejun Heo wrote:
> It busy loops, but it won't lock up: command completion is the loop
> breaker, completion comes via IRQ, and the busy looping doesn't happen
> solely in IRQ context.  Still needs to be fixed, though.  Anyway, this
> is limited to ->qc_defer for simplex, and the reason there's a busy
> loop is that we're trying to schedule two independent hosts and the
> SCSI midlayer (of course) has no notion of cross-host deferring.

It would if the host were at the right level.  We have the whole concept
of starved list processing for blocked queues that was supposed to be
designed for this (well, for a corresponding SCSI situation).

James



