Counter-kludge for 2.5.x hanging when writing to block device

* Counter-kludge for 2.5.x hanging when writing to block device
@ 2003-06-03  8:48 Adam J. Richter
  2003-06-03  9:10 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Adam J. Richter @ 2003-06-03  8:48 UTC (permalink / raw)
  To: linux-kernel

	For at least the past few months, the Linux 2.5 kernels have
hung when I try to write a large amount of data to a block device.
I most commonly notice this when trying to clear a disk with a command
like "dd if=/dev/zero of=/dev/discs/disc1/disc".  Sometimes doing
an mkfs on a big file system is enough to cause the hang.
I wrote a little program to repeatedly write a 4kB block of zeroes
to the kernel so I could track how far it got before hanging, and it
would write 210-215MB of zeroes to the disk on a computer that had
512MB of RAM before hanging.  When these hangs occur, other processes
continue to run fine, and I can do syncs, which return, but the
hung process never resumes.  In the past, I've verified with a
printk that it is looping in balance_dirty_pages, repeatedly
calling blk_congestion_wait, and never leaving the loop.
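
A minimal userspace sketch of such a test program follows. The function name, the report interval and the use of plain write(2) calls are assumptions for illustration; this is not the actual program. It writes 4kB blocks of zeroes to a file descriptor and prints a running total, so the point where a hang sets in is visible on the terminal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_BLOCK_SIZE 4096 /* the 4kB block size described above */

/*
 * Hypothetical helper: write nblocks blocks of zeroes to fd and print the
 * running total to stderr every report_mb megabytes (0 disables reports).
 * Returns the number of bytes written, or -1 on a write error.
 */
long long write_zero_blocks(int fd, long long nblocks, long long report_mb)
{
    static const char zeroes[ZERO_BLOCK_SIZE]; /* static, so all zero */
    const long long mb = 1024 * 1024;
    long long total = 0;
    long long next_report = report_mb * mb;

    assert(fd >= 0);
    for (long long i = 0; i < nblocks; i++) {
        size_t done = 0;

        /* write() may return short counts, so finish the whole block */
        while (done < ZERO_BLOCK_SIZE) {
            ssize_t n = write(fd, zeroes + done, ZERO_BLOCK_SIZE - done);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {
                perror("write");
                return -1;
            }
            done += (size_t)n;
        }
        total += ZERO_BLOCK_SIZE;

        /* progress report, so the last total before a hang is visible */
        if (report_mb > 0 && total >= next_report) {
            fprintf(stderr, "%lld MB written\n", total / mb);
            next_report += report_mb * mb;
        }
    }
    return total;
}
```

Pointing it at a regular file is a safe way to try it; pointing it at a block device overwrites that device.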

	Here is a counter-kludge that seems to stop the problem.
This is certainly not the "right" fix.  It just illustrates a way
to stop the problem.

	By the way, I say "counter-kludge", because I get the impression
that blk_congestion_wait is itself a kludge, since it calls
blk_run_queues and waits a fixed amount of time, 100ms in this case,
potentially a big waste of time, rather than awaiting some more
accurate criterion.
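
Since one jiffy is 1/HZ seconds, a timeout of HZ/10 jiffies is 100ms whenever HZ is a multiple of 10. A small worked example of that arithmetic, with hz as a parameter standing in for the kernel's compile-time HZ (the function name is hypothetical):

```c
#include <assert.h>

/*
 * Convert a timeout of HZ/10 jiffies into milliseconds.
 * One jiffy is 1/hz seconds, so ms = (hz / 10) * 1000 / hz.
 */
unsigned long congestion_timeout_ms(unsigned long hz)
{
    assert(hz > 0);
    unsigned long jiffies = hz / 10;  /* the HZ/10 argument */
    return jiffies * 1000 / hz;       /* integer ms, rounded down */
}
```

For HZ of 100 or 1000 it returns 100. For a value such as 1024, which is not a multiple of 10, integer division gives 102 jiffies, or 99ms after rounding down.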

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."


--- linux-2.5.70-bk7/mm/page-writeback.c	2003-06-02 14:02:39.000000000 -0700
+++ linux/mm/page-writeback.c	2003-06-02 13:59:31.000000000 -0700
@@ -177,7 +177,12 @@
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
 		}
+#if 0				/* AJR */
 		blk_congestion_wait(WRITE, HZ/10);
+#else
+		blk_run_queues();
+		break;
+#endif
 	}
 
 	if (nr_reclaimable + ps.nr_writeback <= dirty_thresh)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03  8:48 Counter-kludge for 2.5.x hanging when writing to block device Adam J. Richter
@ 2003-06-03  9:10 ` Jens Axboe
  2003-06-03 10:00   ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2003-06-03  9:10 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: linux-kernel

On Tue, Jun 03 2003, Adam J. Richter wrote:
> 	For at least the past few months, the Linux 2.5 kernels have
> hung when I try to write a large amount of data to a block device.
> I most commonly notice this when trying to clear a disk with a command
> like "dd if=/dev/zero of=/dev/discs/disc1/disc".  Sometimes doing
> an mkfs on a big file system is enough to cause the hang.
> I wrote a little program to repeatedly write a 4kB block of zeroes
> to the kernel so I could track how far it got before hanging, and it
> would write 210-215MB of zeroes to the disk on a computer that had
> 512MB of RAM before hanging.  When these hangs occur, other processes
> continue to run fine, and I can do syncs, which return, but the
> hung process never resumes.  In the past, I've verified with a
> printk that it is looping in balance_dirty_pages, repeatedly
> calling blk_congestion_wait, and never leaving the loop.
> 
> 	Here is a counter-kludge that seems to stop the problem.
> This is certainly not the "right" fix.  It just illustrates a way
> to stop the problem.
> 
> 	By the way, I say "counter-kludge", because I get the impression
> that blk_congestion_wait is itself a kludge, since it calls
> blk_run_queues and waits a fixed amount of time, 100ms in this case,
> potentially a big waste of time, rather than awaiting some more
> accurate criterion.

Does something like this work? Andrew, what's the point of doing the
wait if the queue isn't congested?! I haven't even checked if this gets
the job done, I think it would be cleaner to pass in the backing dev
info to blk_congestion_wait so we can make the decision in there.

===== mm/page-writeback.c 1.66 vs edited =====
--- 1.66/mm/page-writeback.c	Sun Jun  1 23:12:47 2003
+++ edited/mm/page-writeback.c	Tue Jun  3 11:09:13 2003
@@ -152,6 +152,7 @@
 			.sync_mode	= WB_SYNC_NONE,
 			.older_than_this = NULL,
 			.nr_to_write	= write_chunk,
+			.encountered_congestion = 0,
 		};
 
 		get_dirty_limits(&ps, &background_thresh, &dirty_thresh);
@@ -178,7 +179,8 @@
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
 		}
-		blk_congestion_wait(WRITE, HZ/10);
+		if (wbc.encountered_congestion)
+			blk_congestion_wait(WRITE, HZ/10);
 	}
 
 	if (nr_reclaimable + ps.nr_writeback <= dirty_thresh)

-- 
Jens Axboe


* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03  9:10 ` Jens Axboe
@ 2003-06-03 10:00   ` Andrew Morton
  2003-06-03 10:02     ` Jens Axboe
  2003-06-03 10:21     ` Michael Frank
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2003-06-03 10:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: adam, linux-kernel

Jens Axboe <axboe@suse.de> wrote:
>
> On Tue, Jun 03 2003, Adam J. Richter wrote:
> > 	For at least the past few months, the Linux 2.5 kernels have
> > hung when I try to write a large amount of data to a block device.

Well ytf is this the first time I've heard about it?

> > I most commonly notice this when trying to clear a disk with a command
> > like "dd if=/dev/zero of=/dev/discs/disc1/disc".  Sometimes doing
> > an mkfs on a big file system is enough to cause the hang.
> > I wrote a little program to repeatedly write a 4kB block of zeroes
> > to the kernel so I could track how far it got before hanging, and it
> > would write 210-215MB of zeroes to the disk on a computer that had
> > 512MB of RAM before hanging.  When these hangs occur, other processes
> > continue to run fine, and I can do syncs, which return, but the
> > hung process never resumes.  In the past, I've verified with a
> > printk that it is looping in balance_dirty_pages, repeatedly
> > calling blk_congestion_wait, and never leaving the loop.
> > 

Please debug it further.  Something may have gone wrong with the arithmetic
in balance_dirty_pages().
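
One way to check that arithmetic by hand is to recompute the threshold. The sketch below is a simplified model that assumes the limit is a plain percentage of RAM; the function name is hypothetical, the 40% ratio in the usage note is an assumed example value, and any further adjustments the kernel may apply are ignored.

```c
#include <assert.h>

/*
 * Simplified, hypothetical model: the dirty limit in pages is
 * ratio_percent of all pages in RAM. Computed in long long so the
 * product cannot overflow a 32-bit long.
 */
long dirty_limit_pages(long long ram_bytes, long page_size, int ratio_percent)
{
    assert(page_size > 0);
    assert(ratio_percent >= 0 && ratio_percent <= 100);

    long long total_pages = ram_bytes / page_size;
    return (long)(total_pages * ratio_percent / 100);
}
```

With 512MB of RAM and 4kB pages there are 131072 pages, so dirty_limit_pages(512LL << 20, 4096, 40) gives 52428 pages. Doing the multiplication in long long matters on 32-bit machines with many pages, where the product of a page count and a percentage can exceed the range of a 32-bit long.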

> > 	Here is a counter-kludge that seems to stop the problem.
> > This is certainly not the "right" fix.  It just illustrates a way
> > to stop the problem.
> > 
> > 	By the way, I say "counter-kludge", because I get the impression
> > that blk_congestion_wait is itself a kludge, since it calls
> > blk_run_queues and waits a fixed amount of time, 100ms in this case,
> > potentially a big waste of time, rather than awaiting some more
> > accurate criterion.

The sleep in blk_congestion_wait() terminates when a request is returned to
the queue.  The timeout is only really there for non-request-based backing
devices.

> Does something like this work? Andrew, what's the point of doing the
> wait if the queue isn't congested?!

We need to wait until the amount of dirty memory in the machine is below
the designated limits.  This is unrelated to queue congestion.  The way the
logic is now, we can have 256 megs worth of requests queued on a 32M machine
and everything throttles and clamps as intended.
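
A toy userspace model of that rule may help. It is not kernel code: the struct, the function name and the page counts are hypothetical, and it only mirrors the loop shape visible in the patches in this thread (the dirty_thresh test, the write_chunk "done our duty" exit, and a sleep during which some writeback completes). Starting writeback moves pages from dirty to writeback without lowering their sum, so only completions during the sleep can release the caller.

```c
#include <assert.h>

/* Hypothetical page counts for a toy model of the throttling loop. */
struct vm_model {
    long nr_dirty;      /* dirty pages not yet submitted for I/O  */
    long nr_writeback;  /* pages submitted for I/O, not completed */
};

static long min_long(long a, long b) { return a < b ? a : b; }

/*
 * Returns the number of sleeps taken before the caller is released,
 * or -1 once max_sleeps is used up (the caller would sleep forever).
 */
long model_throttle(struct vm_model *vm, long dirty_thresh, long write_chunk,
                    long completions_per_sleep, long max_sleeps)
{
    long pages_written = 0;
    long sleeps = 0;

    assert(write_chunk > 0 && max_sleeps >= 0);
    for (;;) {
        if (vm->nr_dirty + vm->nr_writeback <= dirty_thresh)
            break;                          /* under the limit */

        /* start writeback on up to write_chunk dirty pages */
        long n = min_long(vm->nr_dirty, write_chunk);
        vm->nr_dirty -= n;
        vm->nr_writeback += n;
        pages_written += n;

        if (vm->nr_dirty + vm->nr_writeback <= dirty_thresh)
            break;
        if (pages_written >= write_chunk)
            break;                          /* "We've done our duty" */

        if (sleeps == max_sleeps)
            return -1;
        /* the sleep: some in-flight writeback completes meanwhile */
        vm->nr_writeback -= min_long(vm->nr_writeback, completions_per_sleep);
        sleeps++;
    }
    return sleeps;
}
```

With nr_dirty = 0, nr_writeback = 1000, dirty_thresh = 500 and 100 completions per sleep, the caller is released after 5 sleeps. With 0 completions per sleep it never gets under the limit and the call returns -1 once max_sleeps is used up, which matches the symptom reported at the start of the thread: the task keeps sleeping and never leaves the loop.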

There are several things wrong with blk_congestion_wait(), including:

a) it should be called throttle_on_io()

b) it should check that there are still requests in flight after parking
   itself on the waitqueue rather than relying on the timeout.

c) for memory reclaim we should terminate the sleep on a certain number
   of pages becoming reclaimable, not on write requests being returned or
   timeout.

d) network filesystems should be delivering wakeups to throttled
   processes rather than relying on the timeout.
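
The wakeup behaviour described above, together with point b), can be sketched in userspace with a condition variable. This is an analogue, not the kernel implementation, and every name in it is hypothetical. A waiter sleeps until any request completes or the timeout expires, and returns at once if nothing is in flight. Because the in-flight check and the completion path take the same mutex, checking before the timed wait cannot miss a wakeup, which is the property point b) asks for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct io_throttle {
    pthread_mutex_t lock;
    pthread_cond_t  done;         /* broadcast when any request finishes */
    unsigned long   completions;  /* total requests finished so far */
    long            in_flight;    /* submitted but not yet finished */
};

enum wait_result { NOTHING_IN_FLIGHT, WOKEN_BY_COMPLETION, TIMED_OUT };

void io_throttle_init(struct io_throttle *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->done, NULL);
    t->completions = 0;
    t->in_flight = 0;
}

void throttle_submit(struct io_throttle *t)
{
    pthread_mutex_lock(&t->lock);
    t->in_flight++;
    pthread_mutex_unlock(&t->lock);
}

void throttle_complete(struct io_throttle *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->in_flight > 0);
    t->in_flight--;
    t->completions++;
    pthread_cond_broadcast(&t->done);  /* wake every throttled waiter */
    pthread_mutex_unlock(&t->lock);
}

enum wait_result io_throttle_wait(struct io_throttle *t, long timeout_ms)
{
    struct timespec deadline;
    enum wait_result res = WOKEN_BY_COMPLETION;

    /* absolute deadline for pthread_cond_timedwait (CLOCK_REALTIME) */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&t->lock);
    /* point b): with no I/O outstanding nobody will wake us, so don't sleep */
    if (t->in_flight == 0) {
        pthread_mutex_unlock(&t->lock);
        return NOTHING_IN_FLIGHT;
    }
    unsigned long seen = t->completions;
    while (t->completions == seen) {
        if (pthread_cond_timedwait(&t->done, &t->lock, &deadline) == ETIMEDOUT) {
            if (t->completions == seen)
                res = TIMED_OUT;  /* fallback timeout, no completion seen */
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return res;
}
```

Completions broadcast rather than signal, so every throttled waiter wakes and re-evaluates, regardless of which device its own writes went to.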

But none of these have proven sufficiently problematic to justify futzing
with it.  I expect d) will eventually need to be implemented.

As for Adam's hang: dunno.  I and many others have run mkfs and dd an
unbelievable number of times.  He needs to debug it more.

* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03 10:00   ` Andrew Morton
@ 2003-06-03 10:02     ` Jens Axboe
  2003-06-03 10:20       ` Andrew Morton
  2003-06-03 10:21     ` Michael Frank
  1 sibling, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2003-06-03 10:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: adam, linux-kernel

On Tue, Jun 03 2003, Andrew Morton wrote:
> > Does something like this work? Andrew, what's the point of doing the
> > wait if the queue isn't congested?!
> 
> We need to wait until the amount of dirty memory in the machine is below
> the designated limits.  This is unrelated to queue congestion.  The way the
> logic is now, we can have 256 megs worth of requests queued on a 32M machine
> and everything throttles and clamps as intended.
> 
> 
> There are several things wrong with blk_congestion_wait(), including:
> 
> a) it should be called throttle_on_io()

Well...

> b) it should check that there are still requests in flight after parking
>    itself on the waitqueue rather than relying on the timeout.

This is important, would be much nicer to pass in the backing dev. This
is a big problem, imho. It's broken right now.

> As for Adam's hang: dunno.  I and many others have run mkfs and dd an
> unbelievable number of times.  He needs to debug it more.

Agree

-- 
Jens Axboe


* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03 10:02     ` Jens Axboe
@ 2003-06-03 10:20       ` Andrew Morton
  2003-06-03 14:42         ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2003-06-03 10:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: adam, linux-kernel

Jens Axboe <axboe@suse.de> wrote:
>
>  > b) it should check that there are still requests in flight after parking
>  >    itself on the waitqueue rather than relying on the timeout.
> 
>  This is important, would be much nicer to pass in the backing dev. This
>  is a big problem, imho. It's broken right now.

The throttling is not really a per-device concept.  It is a "global"
concept.

If a process has written to a really slow device and has encountered
throttling due to exceeded dirty memory limits, we _do_ want to wake that
process up (to reevaluate the system state) if a bunch of writes terminate
against a fast device.

There is a fixed amount of system memory which the administrator has
dedicated to buffering of dirty-and-writeback data and I believe that not
discriminating between different bandwidth devices will give the overall
lowest latency.  This may be wrong, and maybe we do want to throttle tasks
which write to slow devices more heavily.

Or place the device's nominal bandwidth in the backing_dev_info, account
for dirty memory on a per-queue basis and limit the permissible amount of
dirty memory against slower devices.  That's probably not too hard to do
but I'm not sure that the combination of slow and fast devices both under
heavy writeout at the same time is common enough to justify it.
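
The per-queue alternative in the last paragraph could look roughly like this: split the global dirty threshold between devices in proportion to a nominal bandwidth. This is a hypothetical sketch; the function and its bandwidth inputs are assumptions for illustration, not an existing interface.

```c
#include <assert.h>

/*
 * Hypothetical: give device i a dirty limit proportional to its nominal
 * bandwidth. Falls back to an equal split if no bandwidth is known.
 */
void split_dirty_limit(long global_thresh, const long *bandwidth,
                       long *limit, int ndev)
{
    long long total_bw = 0;

    for (int i = 0; i < ndev; i++) {
        assert(bandwidth[i] >= 0);
        total_bw += bandwidth[i];
    }
    for (int i = 0; i < ndev; i++) {
        if (total_bw == 0)
            limit[i] = global_thresh / ndev;
        else  /* rounds down, so the limits never sum past the global one */
            limit[i] = (long)((long long)global_thresh * bandwidth[i] / total_bw);
    }
}
```

With a threshold of 1000 pages and devices rated 100 and 300, the limits come out as 250 and 750 pages. Rounding down keeps the sum of the per-device limits at or below the global threshold.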

* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03 10:00   ` Andrew Morton
  2003-06-03 10:02     ` Jens Axboe
@ 2003-06-03 10:21     ` Michael Frank
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Frank @ 2003-06-03 10:21 UTC (permalink / raw)
  To: Adam J. Richter; +Cc: linux-kernel, Andrew Morton, Jens Axboe

On Tuesday 03 June 2003 18:00, Andrew Morton wrote:
> Jens Axboe <axboe@suse.de> wrote:
> > On Tue, Jun 03 2003, Adam J. Richter wrote:
> > > 	For at least the past few months, the Linux 2.5 kernels have
> > > hung when I try to write a large amount of data to a block device.
>
> Well ytf is this the first time I've heard about it?
>

Lots of people are using 2.5 in many configurations. A problem like this
would have shown up long ago.

I suspect a driver- or hardware-specific issue.

More info on the hardware is needed.

Regards
Michael


* Re: Counter-kludge for 2.5.x hanging when writing to block device
  2003-06-03 10:20       ` Andrew Morton
@ 2003-06-03 14:42         ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2003-06-03 14:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: adam, linux-kernel

On Tue, Jun 03 2003, Andrew Morton wrote:
> Jens Axboe <axboe@suse.de> wrote:
> >
> >  > b) it should check that there are still requests in flight after parking
> >  >    itself on the waitqueue rather than relying on the timeout.
> > 
> >  This is important, would be much nicer to pass in the backing dev. This
> >  is a big problem, imho. It's broken right now.
> 
> The throttling is not really a per-device concept.  It is a "global"
> concept.
> 
> If a process has written to a really slow device and has encountered
> throttling due to exceeded dirty memory limits, we _do_ want to wake that
> process up (to reevaluate the system state) if a bunch of writes terminate
> against a fast device.
> 
> There is a fixed amount of system memory which the administrator has
> dedicated to buffering of dirty-and-writeback data and I believe that not
> discriminating between different bandwidth devices will give the overall
> lowest latency.  This may be wrong, and maybe we do want to throttle tasks
> which write to slow devices more heavily.
> 
> Or place the device's nominal bandwidth in the backing_dev_info, account
> for dirty memory on a per-queue basis and limit the permissible amount of
> dirty memory against slower devices.  That's probably not too hard to do
> but I'm not sure that the combination of slow and fast devices both under
> heavy writeout at the same time is common enough to justify it.

Per process slow vs fast device is probably not common enough to justify
any changes, as long as we deal correctly with fast vs slow globally.

But your mail explains it nicely, thanks.

-- 
Jens Axboe

