blk_stop_queue/blk_start_queue confusion, problem, or bug???

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* blk_stop_queue/blk_start_queue confusion, problem, or bug???
@ 2003-07-27 18:24 Lou Langholtz
  2003-07-28  6:43 ` Lou Langholtz
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Lou Langholtz @ 2003-07-27 18:24 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, Andrea Arcangeli, NeilBrown; +Cc: Steven Whitehouse

I've been trying to use the blk_start_queue and blk_stop_queue functions 
in the network block device driver branch I'm working on. The stop works 
as expected, but the start doesn't. Processes that have tried to read or 
write to the device (after the queue was stopped) stay blocked in 
io_schedule instead of getting woken up (after blk_start_queue was 
called). Do I need to follow the call to blk_start_queue() with a call 
to wake_up() on the correct wait queues? Why not have that functionality 
be part of blk_start_queue()? Or was this an oversight/bug?

The reason I'm using blk_stop_queue and blk_start_queue is to stop the 
request handling function (installed from blk_init_queue), from being 
re-invoked and to return when the network block device server goes down. 
That way, the driver doesn't need to block indefinately within the 
request handling function - which seems like it'd likely block other 
block drivers if it did this - and doesn't need to be handled by 
yet-another seperate kernel thread. Anyways... the stop is called from 
either the request handling function context or from an ioctl call 
context. If then a process tries to read or write to the device it 
blocks - just as I'd like (more like NFS behavior that way). When my 
code detects that the server has come back up again from the ioctl call 
context it calls blk_start_queue(). But the I/O blocked process stays 
blocked.

Am I using these calls incorrectly or is something else going on? 
Insights, examples, very much appreciated.

BTW: LKML has had a related thread on this some years ago in discussing 
how the block layer system handles request functions that must drop the 
spinlock and may block indefinately. That never seemed to get resolved 
though and makes me believe that's why Steven Whitehouse opted to use a 
multi-threaded approach to the NBD driver at one point.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_stop_queue/blk_start_queue confusion, problem, or bug???
  2003-07-27 18:24 blk_stop_queue/blk_start_queue confusion, problem, or bug??? Lou Langholtz
@ 2003-07-28  6:43 ` Lou Langholtz
  2003-07-28  7:00 ` [PATCH 2.6.0-test2] fix broken blk_start_queue behavior Lou Langholtz
  2003-07-28  7:01 ` blk_stop_queue/blk_start_queue confusion, problem, or bug??? Jens Axboe
  2 siblings, 0 replies; 7+ messages in thread
From: Lou Langholtz @ 2003-07-28  6:43 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe; +Cc: Andrea Arcangeli, NeilBrown, Andrew Morton

Lou Langholtz wrote:

> I've been trying to use the blk_start_queue and blk_stop_queue 
> functions in the network block device driver branch I'm working on. 
> The stop works as expected, but the start doesn't. Processes that have 
> tried to read or write to the device (after the queue was stopped) 
> stay blocked in io_schedule instead of getting woken up (after 
> blk_start_queue was called). Do I need to follow the call to 
> blk_start_queue() with a call to wake_up() on the correct wait queues? 
> Why not have that functionality be part of blk_start_queue()? Or was 
> this an oversight/bug? . . .

I'm gonna call this a bug and submit the patch for this since the small 
change I just tried fixed the behavior. Seems like email with subjects 
beginning [PATCH] get a lot more attention so I'm sending the patch in 
another message (titled as such). Hang on... ;-)



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2.6.0-test2] fix broken blk_start_queue behavior
  2003-07-27 18:24 blk_stop_queue/blk_start_queue confusion, problem, or bug??? Lou Langholtz
  2003-07-28  6:43 ` Lou Langholtz
@ 2003-07-28  7:00 ` Lou Langholtz
  2003-07-28  7:12   ` Jens Axboe
  2003-07-28  7:01 ` blk_stop_queue/blk_start_queue confusion, problem, or bug??? Jens Axboe
  2 siblings, 1 reply; 7+ messages in thread
From: Lou Langholtz @ 2003-07-28  7:00 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton; +Cc: Jens Axboe, Andrea Arcangeli, NeilBrown

[-- Attachment #1: Type: text/plain, Size: 828 bytes --]

This patch changes the behavior of blk_start_queue() so that request 
queues really do start up again after blk_start_queue() is called (on 
queues that were previously stopped via blk_stop_queue). The patch 
applies against 2.6.0-test2. I have tested this patch with the use of 
blk_stop_queue and blk_start_queue in my branch of the nbd block device 
driver (not yet released). blk_start_queue is also used in 
./drivers/{block/cciss.c,ide/ide-io.c} which should see things function 
as intended now w.r.t. stopping and starting the request queue (but I do 
not know if anybody noticed that they weren't working correctly before). 
ide-io.c uses queue stop and start for power management handling. Please 
let me know if you've seen a problem before with that, especially if 
this patch fixes it - that will be happy news ;-)

[-- Attachment #2: patch-2.6.0-test2-unplug --]
[-- Type: text/plain, Size: 593 bytes --]

diff -urN linux-2.6.0-test2/drivers/block/ll_rw_blk.c linux-2.6.0-test2-unplug/drivers/block/ll_rw_blk.c
--- linux-2.6.0-test2/drivers/block/ll_rw_blk.c	2003-07-27 19:02:48.000000000 -0600
+++ linux-2.6.0-test2-unplug/drivers/block/ll_rw_blk.c	2003-07-28 00:36:35.366537142 -0600
@@ -1027,10 +1027,10 @@
  */
 static inline void __generic_unplug_device(request_queue_t *q)
 {
-	if (!blk_remove_plug(q))
+	if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
 		return;

-	if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
+	if (!blk_remove_plug(q))
 		return;

 	del_timer(&q->unplug_timer);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_stop_queue/blk_start_queue confusion, problem, or bug???
  2003-07-27 18:24 blk_stop_queue/blk_start_queue confusion, problem, or bug??? Lou Langholtz
  2003-07-28  6:43 ` Lou Langholtz
  2003-07-28  7:00 ` [PATCH 2.6.0-test2] fix broken blk_start_queue behavior Lou Langholtz
@ 2003-07-28  7:01 ` Jens Axboe
  2003-07-28  7:51   ` Lou Langholtz
  2 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2003-07-28  7:01 UTC (permalink / raw)
  To: Lou Langholtz
  Cc: linux-kernel, Andrea Arcangeli, NeilBrown, Steven Whitehouse

On Sun, Jul 27 2003, Lou Langholtz wrote:
> I've been trying to use the blk_start_queue and blk_stop_queue functions 
> in the network block device driver branch I'm working on. The stop works 
> as expected, but the start doesn't. Processes that have tried to read or 
> write to the device (after the queue was stopped) stay blocked in 
> io_schedule instead of getting woken up (after blk_start_queue was 
> called). Do I need to follow the call to blk_start_queue() with a call 
> to wake_up() on the correct wait queues? Why not have that functionality 
> be part of blk_start_queue()? Or was this an oversight/bug?

blk_start_queue() should be enough. What kind of behaviour are you
seeing? Is the request_fn() never called again?

> The reason I'm using blk_stop_queue and blk_start_queue is to stop the 
> request handling function (installed from blk_init_queue), from being 
> re-invoked and to return when the network block device server goes down. 
> That way, the driver doesn't need to block indefinately within the 
> request handling function - which seems like it'd likely block other 
> block drivers if it did this - and doesn't need to be handled by 

It will, you should never block in your request function/

> yet-another seperate kernel thread. Anyways... the stop is called from 
> either the request handling function context or from an ioctl call 
> context. If then a process tries to read or write to the device it 
> blocks - just as I'd like (more like NFS behavior that way). When my 
> code detects that the server has come back up again from the ioctl call 
> context it calls blk_start_queue(). But the I/O blocked process stays 
> blocked.

aaaaand what happens? You are not giving a lot of info. What kernel?
It's pretty trivial to put printks in stop/start_queue and start doing
some tracking, since none of the core drivers use it yet.

> Am I using these calls incorrectly or is something else going on? 
> Insights, examples, very much appreciated.

Hard to say, as you didn't post the code. But it sounds correct.

> BTW: LKML has had a related thread on this some years ago in discussing 
> how the block layer system handles request functions that must drop the 
> spinlock and may block indefinately. That never seemed to get resolved 
> though and makes me believe that's why Steven Whitehouse opted to use a 
> multi-threaded approach to the NBD driver at one point.

That has never really been allowed, in that it is a Bad Thing to do
something like that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2.6.0-test2] fix broken blk_start_queue behavior
  2003-07-28  7:00 ` [PATCH 2.6.0-test2] fix broken blk_start_queue behavior Lou Langholtz
@ 2003-07-28  7:12   ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2003-07-28  7:12 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: linux-kernel, Andrew Morton, Andrea Arcangeli, NeilBrown

On Mon, Jul 28 2003, Lou Langholtz wrote:
> This patch changes the behavior of blk_start_queue() so that request 
> queues really do start up again after blk_start_queue() is called (on 
> queues that were previously stopped via blk_stop_queue). The patch 
> applies against 2.6.0-test2. I have tested this patch with the use of 
> blk_stop_queue and blk_start_queue in my branch of the nbd block device 
> driver (not yet released). blk_start_queue is also used in 
> ./drivers/{block/cciss.c,ide/ide-io.c} which should see things function 
> as intended now w.r.t. stopping and starting the request queue (but I do 
> not know if anybody noticed that they weren't working correctly before). 
> ide-io.c uses queue stop and start for power management handling. Please 
> let me know if you've seen a problem before with that, especially if 
> this patch fixes it - that will be happy news ;-)

> diff -urN linux-2.6.0-test2/drivers/block/ll_rw_blk.c linux-2.6.0-test2-unplug/drivers/block/ll_rw_blk.c
> --- linux-2.6.0-test2/drivers/block/ll_rw_blk.c	2003-07-27 19:02:48.000000000 -0600
> +++ linux-2.6.0-test2-unplug/drivers/block/ll_rw_blk.c	2003-07-28 00:36:35.366537142 -0600
> @@ -1027,10 +1027,10 @@
>   */
>  static inline void __generic_unplug_device(request_queue_t *q)
>  {
> -	if (!blk_remove_plug(q))
> +	if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
>  		return;
>  
> -	if (test_bit(QUEUE_FLAG_STOPPED, &q->queue_flags))
> +	if (!blk_remove_plug(q))
>  		return;
>  
>  	del_timer(&q->unplug_timer);

Patch looks correct, thanks! Guess our mails crossed in mid-air :)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_stop_queue/blk_start_queue confusion, problem, or bug???
  2003-07-28  7:01 ` blk_stop_queue/blk_start_queue confusion, problem, or bug??? Jens Axboe
@ 2003-07-28  7:51   ` Lou Langholtz
  2003-08-07 10:51     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Lou Langholtz @ 2003-07-28  7:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, Andrea Arcangeli, NeilBrown, Steven Whitehouse

Jens Axboe wrote:

>On Sun, Jul 27 2003, Lou Langholtz wrote:
>  
>
>>I've been trying to use the blk_start_queue and blk_stop_queue functions 
>>in the network block device driver branch I'm working on. The stop works 
>>as expected, but the start doesn't. Processes that have tried to read or 
>>write to the device (after the queue was stopped) stay blocked in 
>>io_schedule instead of getting woken up (after blk_start_queue was 
>>called). Do I need to follow the call to blk_start_queue() with a call 
>>to wake_up() on the correct wait queues? Why not have that functionality 
>>be part of blk_start_queue()? Or was this an oversight/bug?
>>    
>>
>
>blk_start_queue() should be enough. What kind of behaviour are you
>seeing? Is the request_fn() never called again?
>
Sorry. I've been so burried in this problem, I forgot others probably 
can't read my mind ;-) The behavior I was seeing was that processes 
blocked on I/O and in io_schedule, don't get woken up. After tracking 
the problem down, I realized that once the queue was stopped (using 
blk_stop_queue) any I/O requests against an empty request queue would 
plug the device. After the short timeout, generic_unplug would get 
called and would first try removing the plug then if it succeeded check 
QUEUE_FLAG_STOPPED. In my case QUEUE_FLAG_STOPPED hadn't gotten cleared 
by the time generic_unplug had gotten invoked. So the queue was left in 
a state where it wasn't plugged any more but the request_fn wasn't 
running either and things hung that way (locked in io_schedule). 
Hopefully the patch I just sent out will make sense if my explanation 
doesn't again this time. ;-)

>>The reason I'm using blk_stop_queue and blk_start_queue is to stop the 
>>request handling function (installed from blk_init_queue), from being 
>>re-invoked and to return when the network block device server goes down. 
>>That way, the driver doesn't need to block indefinately within the 
>>request handling function - which seems like it'd likely block other 
>>block drivers if it did this - and doesn't need to be handled by 
>>    
>>
>
>It will, you should never block in your request function/
>  
>
With the network block device driver, the only way to ensure the request 
function *never* blocks is to have a seperate dedicated kernel thread 
handling the actual network I/O. At best otherwise, I can use 
MSG_DONTWAIT coupled with the blk_start_queue and blk_stop_queue 
functions however the code must still drop the spin lock to make the 
socket calls (since they still may sleep). At least when I try to call 
sock_sendmsg/sendpage with the spin lock still held (and I'm using 
CONFIG_DEBUG_SPINLOCK_SLEEP) I get "sleeping function called from 
illegal context" messages. Is there another way? What's the way you 
would suggest?

>>. . . BTW: LKML has had a related thread on this some years ago in discussing 
>>how the block layer system handles request functions that must drop the 
>>spinlock and may block indefinately. That never seemed to get resolved 
>>though and makes me believe that's why Steven Whitehouse opted to use a 
>>multi-threaded approach to the NBD driver at one point.
>>    
>>
>
>That has never really been allowed, in that it is a Bad Thing to do
>something like that.
>  
>
Want to make sure I don't misunderstand... you mean that dropping the 
queue spin lock is a Bad Thing correct? Is it bad enough to warrant 
using a seperate kernel thread for handling network sends to avoid this 
then? This would have to be a seperate thread per network block device 
then to ensure the devices don't impede each other.

Thanks!!!!!


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: blk_stop_queue/blk_start_queue confusion, problem, or bug???
  2003-07-28  7:51   ` Lou Langholtz
@ 2003-08-07 10:51     ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2003-08-07 10:51 UTC (permalink / raw)
  To: Lou Langholtz
  Cc: linux-kernel, Andrea Arcangeli, NeilBrown, Steven Whitehouse

On Mon, Jul 28 2003, Lou Langholtz wrote:
> Jens Axboe wrote:
> 
> >On Sun, Jul 27 2003, Lou Langholtz wrote:
> > 
> >
> >>I've been trying to use the blk_start_queue and blk_stop_queue functions 
> >>in the network block device driver branch I'm working on. The stop works 
> >>as expected, but the start doesn't. Processes that have tried to read or 
> >>write to the device (after the queue was stopped) stay blocked in 
> >>io_schedule instead of getting woken up (after blk_start_queue was 
> >>called). Do I need to follow the call to blk_start_queue() with a call 
> >>to wake_up() on the correct wait queues? Why not have that functionality 
> >>be part of blk_start_queue()? Or was this an oversight/bug?
> >>   
> >>
> >
> >blk_start_queue() should be enough. What kind of behaviour are you
> >seeing? Is the request_fn() never called again?
> >
> Sorry. I've been so burried in this problem, I forgot others probably 
> can't read my mind ;-) The behavior I was seeing was that processes 
> blocked on I/O and in io_schedule, don't get woken up. After tracking 
> the problem down, I realized that once the queue was stopped (using 
> blk_stop_queue) any I/O requests against an empty request queue would 
> plug the device. After the short timeout, generic_unplug would get 
> called and would first try removing the plug then if it succeeded check 
> QUEUE_FLAG_STOPPED. In my case QUEUE_FLAG_STOPPED hadn't gotten cleared 
> by the time generic_unplug had gotten invoked. So the queue was left in 
> a state where it wasn't plugged any more but the request_fn wasn't 
> running either and things hung that way (locked in io_schedule). 
> Hopefully the patch I just sent out will make sense if my explanation 
> doesn't again this time. ;-)

Your diagnosis is correct and the patch you sent was too.

> >>The reason I'm using blk_stop_queue and blk_start_queue is to stop the 
> >>request handling function (installed from blk_init_queue), from being 
> >>re-invoked and to return when the network block device server goes down. 
> >>That way, the driver doesn't need to block indefinately within the 
> >>request handling function - which seems like it'd likely block other 
> >>block drivers if it did this - and doesn't need to be handled by 
> >>   
> >>
> >
> >It will, you should never block in your request function/
> > 
> >
> With the network block device driver, the only way to ensure the request 
> function *never* blocks is to have a seperate dedicated kernel thread 
> handling the actual network I/O. At best otherwise, I can use 

Correct

> >That has never really been allowed, in that it is a Bad Thing to do
> >something like that.
> > 
> >
> Want to make sure I don't misunderstand... you mean that dropping the 
> queue spin lock is a Bad Thing correct? Is it bad enough to warrant 
> using a seperate kernel thread for handling network sends to avoid this 
> then? This would have to be a seperate thread per network block device 
> then to ensure the devices don't impede each other.

I snipped your paragraph above, since this is essentially the same. It
doesn't matter that you drop the lock, that just makes it legal from a
general kernel perspective (must not schedule with spinlock held, that's
a hard rule). However, you are still pinning plugged queues "behind" you
so they can't run until you finish.

So yes, typically the correct thing is to offload such work from the
request_fn and do it in some sort of thread. Your request_fn then
becomes something ala

void my_rfn(request_queue_t *q)
{
	struct my_dev *dev = q->queuedata;

	wake_up(&dev->request_wqueue);
}

or something like that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-08-07 10:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-27 18:24 blk_stop_queue/blk_start_queue confusion, problem, or bug??? Lou Langholtz
2003-07-28  6:43 ` Lou Langholtz
2003-07-28  7:00 ` [PATCH 2.6.0-test2] fix broken blk_start_queue behavior Lou Langholtz
2003-07-28  7:12   ` Jens Axboe
2003-07-28  7:01 ` blk_stop_queue/blk_start_queue confusion, problem, or bug??? Jens Axboe
2003-07-28  7:51   ` Lou Langholtz
2003-08-07 10:51     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).