* Comment on patch to remove nr_async_pages limit
@ 2001-06-05  1:04 Marcelo Tosatti
  2001-06-05  7:38 ` Mike Galbraith
  2001-06-05 15:56 ` Zlatko Calusic
  0 siblings, 2 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2001-06-05  1:04 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: lkml, linux-mm



Zlatko, 

I've read your patch to remove nr_async_pages limit while reading an
archive on the web. (I have to figure out why lkml is not being delivered
correctly to me...)

Quoting your message: 

"That artificial limit hurts both swap out and swap in path as it
introduces synchronization points (and/or weakens swapin readahead),
which I think are not necessary."

If we are under low memory, we cannot simply write out a whole bunch of
swap data. Remember that the writeout operations will potentially allocate
buffer_heads for the swapcache pages before doing real IO, which takes
_more memory_: OOM deadlock.
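
To make that concrete, here is a toy userspace model of the hazard (this is
not the 2.4 swap code; the 96-byte bookkeeping size and the function names
are invented for illustration). Freeing a page by swapping it out first
_costs_ a small allocation, so with zero headroom the writeout itself
cannot proceed:

#include <stdio.h>

/* Pretend pool of free memory, in bytes. */
static long free_bytes;

/* Stand-in for allocating a buffer_head before any IO can be submitted. */
static int alloc_bookkeeping(long size)
{
        if (free_bytes < size)
                return -1;              /* we need memory in order to free memory */
        free_bytes -= size;
        return 0;
}

/* Freeing a swapcache page costs an allocation up front. */
static int writeout_swap_page(void)
{
        if (alloc_bookkeeping(96))      /* size is illustrative only */
                return -1;
        free_bytes += 4096;             /* the page becomes reclaimable after IO */
        free_bytes += 96;               /* bookkeeping is released on completion */
        return 0;
}

int main(void)
{
        free_bytes = 0;                 /* true OOM: even 96 bytes is too much */
        printf("writeout under OOM: %s\n",
               writeout_swap_page() ? "stuck (the deadlock)" : "ok");
        return 0;
}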




* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  7:38 ` Mike Galbraith
@ 2001-06-05  6:18   ` Marcelo Tosatti
  2001-06-05 10:32     ` Mike Galbraith
  2001-06-05 16:05     ` Comment on patch to remove nr_async_pages limit Zlatko Calusic
  2001-06-05 15:57   ` Zlatko Calusic
  1 sibling, 2 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2001-06-05  6:18 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Zlatko Calusic, lkml, linux-mm


On Tue, 5 Jun 2001, Mike Galbraith wrote:

> On Mon, 4 Jun 2001, Marcelo Tosatti wrote:
> 
> > Zlatko,
> >
> > I've read your patch to remove nr_async_pages limit while reading an
> > archive on the web. (I have to figure out why lkml is not being delivered
> > correctly to me...)
> >
> > Quoting your message:
> >
> > "That artificial limit hurts both swap out and swap in path as it
> > introduces synchronization points (and/or weakens swapin readahead),
> > which I think are not necessary."
> >
> > If we are under low memory, we cannot simply write out a whole bunch of
> > swap data. Remember that the writeout operations will potentially allocate
> > buffer_heads for the swapcache pages before doing real IO, which takes
> > _more memory_: OOM deadlock.
> 
> What's the point of creating swapcache pages, and then avoiding doing
> the IO until it becomes _dangerous_ to do so?  

It's not dangerous to do the IO. Now it _is_ dangerous to do the IO without
having any sane limit on the amount of data being written out at the same
time.

> That's what we're doing right now.  This is a problem because we
> guarantee it will become one.

It's not really about swapcache pages --- it's about anonymous memory. 

If your memory is full of anonymous data, you have to push some of this
data to disk. (conceptually it does not really matter if it's swapcache or
not, think about anonymous memory)

> We guarantee that the pagecache will become almost pure swapcache by
> delaying the writeout so long that everything else is consumed.

Exactly. And when we reach a low watermark of memory, we start writing
out the anonymous memory.

> In experiments, speeding swapcache pages on their way helps.  Special
> handling (swapcache bean counting) also helps. (was _really ugly_ code..
> putting them on a separate list would be a lot easier on the stomach:)

I agree that the current way of limiting in-flight swapout can be changed
to perform better. 

Removing the limit on the amount of data being written to disk when we
have a memory shortage is not nice.
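
Purely as illustration, here is a standalone toy model of what a cap like
nr_async_pages buys (the ceiling of 32 and all the names are invented, not
the kernel's actual accounting): swapout IO is queued only while the number
of pages already in flight is below a fixed ceiling, so a shortage never
turns into an unbounded burst of simultaneous writeouts.

#include <stdio.h>

#define ASYNC_LIMIT 32                  /* invented ceiling */

static int nr_async;                    /* pages currently being written out */

static int try_queue_swapout(void)
{
        if (nr_async >= ASYNC_LIMIT)
                return 0;               /* throttled: caller waits or goes sync */
        nr_async++;
        return 1;
}

static void swapout_io_done(void)       /* called when a write completes */
{
        nr_async--;
}

int main(void)
{
        int queued = 0, i;

        for (i = 0; i < 100; i++) {     /* shortage: 100 dirty anonymous pages */
                if (try_queue_swapout())
                        queued++;
                if (i % 8 == 7)
                        swapout_io_done();      /* a little IO completes meanwhile */
        }
        printf("queued %d of 100 right away, the rest were throttled\n", queued);
        return 0;
}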



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  1:04 Comment on patch to remove nr_async_pages limit Marcelo Tosatti
@ 2001-06-05  7:38 ` Mike Galbraith
  2001-06-05  6:18   ` Marcelo Tosatti
  2001-06-05 15:57   ` Zlatko Calusic
  2001-06-05 15:56 ` Zlatko Calusic
  1 sibling, 2 replies; 14+ messages in thread
From: Mike Galbraith @ 2001-06-05  7:38 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Zlatko Calusic, lkml, linux-mm

On Mon, 4 Jun 2001, Marcelo Tosatti wrote:

> Zlatko,
>
> I've read your patch to remove nr_async_pages limit while reading an
> archive on the web. (I have to figure out why lkml is not being delivered
> correctly to me...)
>
> Quoting your message:
>
> "That artificial limit hurts both swap out and swap in path as it
> introduces synchronization points (and/or weakens swapin readahead),
> which I think are not necessary."
>
> If we are under low memory, we cannot simply write out a whole bunch of
> swap data. Remember that the writeout operations will potentially allocate
> buffer_heads for the swapcache pages before doing real IO, which takes
> _more memory_: OOM deadlock.

What's the point of creating swapcache pages, and then avoiding doing
the IO until it becomes _dangerous_ to do so?  That's what we're doing
right now.  This is a problem because we guarantee it will become one.
We guarantee that the pagecache will become almost pure swapcache by
delaying the writeout so long that everything else is consumed.

In experiments, speeding swapcache pages on their way helps.  Special
handling (swapcache bean counting) also helps. (was _really ugly_ code..
putting them on a separate list would be a lot easier on the stomach:)

	$.02

	-Mike



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  6:18   ` Marcelo Tosatti
@ 2001-06-05 10:32     ` Mike Galbraith
  2001-06-05 11:42       ` Ed Tomlinson
  2001-06-05 19:21       ` Benjamin C.R. LaHaise
  2001-06-05 16:05     ` Comment on patch to remove nr_async_pages limit Zlatko Calusic
  1 sibling, 2 replies; 14+ messages in thread
From: Mike Galbraith @ 2001-06-05 10:32 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Zlatko Calusic, lkml, linux-mm

On Tue, 5 Jun 2001, Marcelo Tosatti wrote:

> On Tue, 5 Jun 2001, Mike Galbraith wrote:
>
> > On Mon, 4 Jun 2001, Marcelo Tosatti wrote:
> >
> > > Zlatko,
> > >
> > > I've read your patch to remove nr_async_pages limit while reading an
> > > archive on the web. (I have to figure out why lkml is not being delivered
> > > correctly to me...)
> > >
> > > Quoting your message:
> > >
> > > "That artificial limit hurts both swap out and swap in path as it
> > > introduces synchronization points (and/or weakens swapin readahead),
> > > which I think are not necessary."
> > >
> > > If we are under low memory, we cannot simply write out a whole bunch of
> > > swap data. Remember that the writeout operations will potentially allocate
> > > buffer_heads for the swapcache pages before doing real IO, which takes
> > > _more memory_: OOM deadlock.
> >
> > What's the point of creating swapcache pages, and then avoiding doing
> > the IO until it becomes _dangerous_ to do so?
>
> It's not dangerous to do the IO. Now it _is_ dangerous to do the IO without
> having any sane limit on the amount of data being written out at the same
> time.

Yes.  If we start writing out sooner, we aren't stuck with pushing a
ton of IO all at once and can use prudent limits.  Not only because of
potential allocation problems, but because our situation is changing
rapidly, so small corrections done often are more precise than whopping
big ones can be.

> > That's what we're doing right now.  This is a problem because we
> > guarantee it will become one.
>
> It's not really about swapcache pages --- it's about anonymous memory.

(swapcache is the biggest pain in the butt for the portion of the spectrum
I'm hammering on though)

> If your memory is full of anonymous data, you have to push some of this
> data to disk. (conceptually it does not really matter if it's swapcache or
> not, think about anonymous memory)
>
> > We guarantee that the pagecache will become almost pure swapcache by
> > delaying the writeout so long that everything else is consumed.
>
> Exactly. And when we reach a low watermark of memory, we start writing
> out the anonymous memory.
>
> > In experiments, speeding swapcache pages on their way helps.  Special
> > handling (swapcache bean counting) also helps. (was _really ugly_ code..
> > putting them on a separate list would be a lot easier on the stomach:)
>
> I agree that the current way of limiting in-flight swapout can be changed
> to perform better.
>
> Removing the limit on the amount of data being written to disk when we
> have a memory shortage is not nice.

Here, that doesn't make any real difference.  We can have too many pages
completing IO too late or too few.. the problem is that they start coming
out of the pipe too late.  I'd rather see my poor disk saturated than
partly idle when my box is choking on dirt clods ;-)

	-Mike



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 10:32     ` Mike Galbraith
@ 2001-06-05 11:42       ` Ed Tomlinson
  2001-06-05 16:08         ` Zlatko Calusic
  2001-06-05 19:21       ` Benjamin C.R. LaHaise
  1 sibling, 1 reply; 14+ messages in thread
From: Ed Tomlinson @ 2001-06-05 11:42 UTC (permalink / raw)
  To: Mike Galbraith, Marcelo Tosatti; +Cc: Zlatko Calusic, lkml, linux-mm

Hi,

To paraphrase Mike:

We defer doing IO until we are short of storage.  Doing IO uses storage.
So delaying IO as much as we do forces us to impose limits.  If we did the IO
earlier, we would not need this limit often, if at all.

Does this make any sense?

Maybe we can have the best of both worlds.  Is it possible to allocate the BH
early and then defer the IO?  The idea being to make IO possible without having
to allocate.  This would let us remove the async page limit but would ensure
we could still free.
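
A minimal sketch of that idea in plain C (the names are invented; this is
not an existing kernel interface): fill a small reserve of bookkeeping
structures while memory is plentiful, and let the swapout path take from
the reserve instead of allocating at IO time.

#include <stdio.h>
#include <stdlib.h>

struct bh {                             /* stand-in for struct buffer_head */
        struct bh *next;
};

static struct bh *reserve;              /* private reserve, filled up front */

static int fill_reserve(int n)          /* done once, while memory is easy */
{
        while (n--) {
                struct bh *b = malloc(sizeof(*b));

                if (!b)
                        return -1;
                b->next = reserve;
                reserve = b;
        }
        return 0;
}

static struct bh *get_reserved_bh(void) /* swapout path: no allocation here */
{
        struct bh *b = reserve;

        if (b)
                reserve = b->next;
        return b;
}

static void put_reserved_bh(struct bh *b)       /* on IO completion */
{
        b->next = reserve;
        reserve = b;
}

int main(void)
{
        struct bh *b;

        if (fill_reserve(64))
                return 1;
        b = get_reserved_bh();
        printf("bh for swapout IO: %s\n", b ? "taken from the reserve" : "none");
        put_reserved_bh(b);
        return 0;
}

The reserve would of course have to be replenished and sized against the
worst case, but the point is that the IO path itself never allocates.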

Thoughts?
Ed Tomlinson


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  1:04 Comment on patch to remove nr_async_pages limit Marcelo Tosatti
  2001-06-05  7:38 ` Mike Galbraith
@ 2001-06-05 15:56 ` Zlatko Calusic
  1 sibling, 0 replies; 14+ messages in thread
From: Zlatko Calusic @ 2001-06-05 15:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: lkml, linux-mm

Marcelo Tosatti <marcelo@conectiva.com.br> writes:

> Zlatko, 
> 
> I've read your patch to remove nr_async_pages limit while reading an
> archive on the web. (I have to figure out why lkml is not being delivered
> correctly to me...)
> 
> Quoting your message: 
> 
> "That artificial limit hurts both swap out and swap in path as it
> introduces synchronization points (and/or weakens swapin readahead),
> which I think are not necessary."
> 
> If we are under low memory, we cannot simply write out a whole bunch of
> swap data. Remember that the writeout operations will potentially allocate
> buffer_heads for the swapcache pages before doing real IO, which takes
> _more memory_: OOM deadlock. 
> 

My question is: if we defer writing and in a way "lose" those 4096
bytes of memory (because we decide to keep the page in memory for
some more time), how can a much smaller buffer_head be a problem?

I think we could always keep a bigger reserve of buffer heads just for
this purpose, to make swapout more robust, and then not impose any
limit on the number of outstanding async IO pages in flight.
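
(Rough numbers, purely for illustration -- the exact size of a buffer_head
varies by kernel version: if it costs on the order of a hundred bytes, then
even a handful of them per 4096-byte page is only a few percent of the
memory the writeout will eventually give back.)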

Does this make any sense?

-- 
Zlatko


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  7:38 ` Mike Galbraith
  2001-06-05  6:18   ` Marcelo Tosatti
@ 2001-06-05 15:57   ` Zlatko Calusic
  1 sibling, 0 replies; 14+ messages in thread
From: Zlatko Calusic @ 2001-06-05 15:57 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Marcelo Tosatti, lkml, linux-mm

Mike Galbraith <mikeg@wen-online.de> writes:

> On Mon, 4 Jun 2001, Marcelo Tosatti wrote:
> 
> > Zlatko,
> >
> > I've read your patch to remove nr_async_pages limit while reading an
> > archive on the web. (I have to figure out why lkml is not being delivered
> > correctly to me...)
> >
> > Quoting your message:
> >
> > "That artificial limit hurts both swap out and swap in path as it
> > introduces synchronization points (and/or weakens swapin readahead),
> > which I think are not necessary."
> >
> > If we are under low memory, we cannot simply write out a whole bunch of
> > swap data. Remember that the writeout operations will potentially allocate
> > buffer_heads for the swapcache pages before doing real IO, which takes
> > _more memory_: OOM deadlock.
> 
> What's the point of creating swapcache pages, and then avoiding doing
> the IO until it becomes _dangerous_ to do so?  That's what we're doing
> right now.  This is a problem because we guarantee it will become one.
> We guarantee that the pagecache will become almost pure swapcache by
> delaying the writeout so long that everything else is consumed.
> 

Huh, this looks just like my argument, just put in different words. I
should have read this sooner. :)
-- 
Zlatko


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05  6:18   ` Marcelo Tosatti
  2001-06-05 10:32     ` Mike Galbraith
@ 2001-06-05 16:05     ` Zlatko Calusic
  2001-06-09  3:09       ` Rik van Riel
  1 sibling, 1 reply; 14+ messages in thread
From: Zlatko Calusic @ 2001-06-05 16:05 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Mike Galbraith, lkml, linux-mm

Marcelo Tosatti <marcelo@conectiva.com.br> writes:

[snip]
> Exactly. And when we reach a low watermark of memory, we start writing
> out the anonymous memory.
>

Hm, my observations are a little bit different. I find that writeouts
happen sooner than the moment we reach low watermark, and many times
just in time to interact badly with some read I/O workload that made a
virtual shortage of memory in the first place. The net effect is poor
performance and too much stuff in swap.

> > In experiments, speeding swapcache pages on their way helps.  Special
> > handling (swapcache bean counting) also helps. (was _really ugly_ code..
> > putting them on a separate list would be a lot easier on the stomach:)
> 
> I agree that the current way of limiting in-flight swapout can be changed
> to perform better. 
> 
> Removing the limit on the amount of data being written to disk when we
> have a memory shortage is not nice. 
> 

OK, then we basically agree that there is room for improvement, and
you also agree that we must be careful while trying to achieve that.

I'll admit that my patch is mostly experimental, and its best effect
is this discussion, which I enjoy very much. :)
-- 
Zlatko


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 11:42       ` Ed Tomlinson
@ 2001-06-05 16:08         ` Zlatko Calusic
  0 siblings, 0 replies; 14+ messages in thread
From: Zlatko Calusic @ 2001-06-05 16:08 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: Mike Galbraith, Marcelo Tosatti, lkml, linux-mm

Ed Tomlinson <tomlins@cam.org> writes:

[snip]
> Maybe we can have the best of both worlds.  Is it possible to allocate the BH
> early and then defer the IO?  The idea being to make IO possible without having
> to allocate.  This would let us remove the async page limit but would ensure
> we could still free.
> 

Yes, this is a good idea if you ask me. Basically, remove as many
limits as we can, and also protect ourselves from deadlocks. With just
a few pages of extra memory for the reserved buffer heads, I think
it's a fair trade. Still, pending further analysis...
-- 
Zlatko


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 10:32     ` Mike Galbraith
  2001-06-05 11:42       ` Ed Tomlinson
@ 2001-06-05 19:21       ` Benjamin C.R. LaHaise
  2001-06-05 21:00         ` Comment on patch to remove nr_async_pages limit Mike Galbraith
  1 sibling, 1 reply; 14+ messages in thread
From: Benjamin C.R. LaHaise @ 2001-06-05 19:21 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Marcelo Tosatti, Zlatko Calusic, lkml, linux-mm

On Tue, 5 Jun 2001, Mike Galbraith wrote:

> Yes.  If we start writing out sooner, we aren't stuck with pushing a
> ton of IO all at once and can use prudent limits.  Not only because of
> potential allocation problems, but because our situation is changing
> rapidly, so small corrections done often are more precise than whopping
> big ones can be.

Hold on there big boy, writing out sooner is not better.  What if the
memory shortage is because real data is being written out to disk?
Swapping early causes many more problems than swapping late as extraneous
seeks to the swap partition severely degrade performance.

		-ben



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 19:21       ` Benjamin C.R. LaHaise
@ 2001-06-05 21:00         ` Mike Galbraith
  2001-06-05 22:21           ` Daniel Phillips
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2001-06-05 21:00 UTC (permalink / raw)
  To: Benjamin C.R. LaHaise; +Cc: Marcelo Tosatti, Zlatko Calusic, lkml, linux-mm

On Tue, 5 Jun 2001, Benjamin C.R. LaHaise wrote:

> On Tue, 5 Jun 2001, Mike Galbraith wrote:
>
> > Yes.  If we start writing out sooner, we aren't stuck with pushing a
> > ton of IO all at once and can use prudent limits.  Not only because of
> > potential allocation problems, but because our situation is changing
> > rapidly, so small corrections done often are more precise than whopping
> > big ones can be.
>
> Hold on there big boy, writing out sooner is not better.  What if the

(Do definitely beat my thoughts up, but please don't use condescending terms.)

In some cases, it definitely is.  I can routinely improve throughput
by writing more.. that is a measurable and reproducible fact.  I know
also from measurement that it is not _always_ the right thing to do.

> memory shortage is because real data is being written out to disk?

(I would hope that we're doing our best to always be writing real data
to disk.  I also know that this isn't always the case.)

> Swapping early causes many more problems than swapping late as extraneous
> seeks to the swap partiton severely degrade performance.

That is not the case here at the spot in the performance curve I'm
looking at (transition to throughput).

Does this mean the block layer and/or elevator is having problems?  Why
would using available disk bandwidth vs letting it lie dormant be a
generically bad thing?.. this I just can't understand.  The elevator
deals with seeks; the vm is flat out not equipped to do so.. it contains
no such concept.

Avoiding a write is great; delaying a write is not at _all_ great.

	-Mike



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 21:00         ` Comment on patch to remove nr_async_pages limit Mike Galbraith
@ 2001-06-05 22:21           ` Daniel Phillips
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Phillips @ 2001-06-05 22:21 UTC (permalink / raw)
  To: Mike Galbraith, Benjamin C.R. LaHaise
  Cc: Marcelo Tosatti, Zlatko Calusic, lkml, linux-mm

On Tuesday 05 June 2001 23:00, Mike Galbraith wrote:
> On Tue, 5 Jun 2001, Benjamin C.R. LaHaise wrote:
> > Swapping early causes many more problems than swapping late as
> > extraneous seeks to the swap partition severely degrade performance.
>
> That is not the case here at the spot in the performance curve I'm
> looking at (transition to throughput).
>
> Does this mean the block layer and/or elevator is having problems? 
> Why would using available disk bandwidth vs letting it lie dormant be
> a generically bad thing?.. this I just can't understand.  The
> elevator deals with seeks; the vm is flat out not equipped to do so..
> it contains no such concept.

Clearly, if the spindle a dirty file page belongs to is idle, we have 
goofed.

With process data the situation is a little different because the 
natural home of the data is not the swap device but main memory.  The 
following gets pretty close to the truth: when there is memory 
pressure, if the spindle a dirty process page belongs to is idle, we 
have goofed.

Well, as soon as I wrote those obvious truths I started thinking of 
exceptions, but they are silly exceptions such as:

  - read disk block 0
  - dirty last block of disk
  - dirty 1,000 blocks starting at block 0.

For good measure, delete the file the last block of the disk belongs 
to.  We have just sent the head off on a wild goose chase, but we had 
to work at it.  To handle such a set of events without requiring 
prescience we need to be able to cancel disk writes, but just ignoring 
such oddball situations is the next best thing.

That's all by way of saying I agree with you.

--
Daniel


* Re: Comment on patch to remove nr_async_pages limit
  2001-06-05 16:05     ` Comment on patch to remove nr_async_pages limit Zlatko Calusic
@ 2001-06-09  3:09       ` Rik van Riel
  2001-06-09  6:07         ` Mike Galbraith
  0 siblings, 1 reply; 14+ messages in thread
From: Rik van Riel @ 2001-06-09  3:09 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: Marcelo Tosatti, Mike Galbraith, lkml, linux-mm

On 5 Jun 2001, Zlatko Calusic wrote:
> Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> 
> [snip]
> > Exactly. And when we reach a low watermark of memory, we start writing
> > out the anonymous memory.
> 
> Hm, my observations are a little bit different. I find that writeouts
> happen sooner than the moment we reach low watermark, and many times
> just in time to interact badly with some read I/O workload that made a
> virtual shortage of memory in the first place.

I have a patch that tries to address this by not reordering
the inactive list whenever we scan through it. I'll post it
right now ...
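
The gist, as I read it, in a minimal standalone sketch (this is not the
actual patch; the struct and function names are invented): pages we merely
scan past keep their place in the list instead of being rotated to the
other end.

#include <stdio.h>

struct page {
        int dirty;
        struct page *next;
};

/*
 * Walk the inactive list looking for work, but leave the entries where
 * they are: no list_del()/list_add_tail() for pages we only inspect.
 */
static struct page *scan_inactive(struct page *head)
{
        struct page *p;

        for (p = head; p; p = p->next)
                if (p->dirty)
                        return p;       /* candidate for writeout */
        return NULL;
}

int main(void)
{
        struct page c = { 1, NULL };
        struct page b = { 0, &c };
        struct page a = { 0, &b };

        printf("dirty page found: %s\n", scan_inactive(&a) ? "yes" : "no");
        return 0;
}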

(yes, I've done some recreational patching while on holidays)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Comment on patch to remove nr_async_pages limit
  2001-06-09  3:09       ` Rik van Riel
@ 2001-06-09  6:07         ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2001-06-09  6:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Zlatko Calusic, Marcelo Tosatti, lkml, linux-mm

On Sat, 9 Jun 2001, Rik van Riel wrote:

> On 5 Jun 2001, Zlatko Calusic wrote:
> > Marcelo Tosatti <marcelo@conectiva.com.br> writes:
> >
> > [snip]
> > > Exactly. And when we reach a low watermark of memory, we start writing
> > > out the anonymous memory.
> >
> > Hm, my observations are a little bit different. I find that writeouts
> > happen sooner than the moment we reach low watermark, and many times
> > just in time to interact badly with some read I/O workload that made a
> > virtual shortage of memory in the first place.
>
> I have a patch that tries to address this by not reordering
> the inactive list whenever we scan through it. I'll post it
> right now ...

Excellent.  I've done some of that (crude but effective) and have had
nice encouraging results.  If the dirty list is long enough, this
most definitely improves behavior under heavy load.

	-Mike


