linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 5/18] mark swapout pages PageWriteback()
@ 2002-05-26 20:40 Andrew Morton
  2002-05-27  0:57 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-05-26 20:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: lkml



Pages which are under writeout to swap are locked, and not
PageWriteback().  So page allocators do not throttle against them in
shrink_caches().

This causes enormous list scans and general coma under really heavy
swapout loads.

One fix would be to teach shrink_cache() to wait on PG_locked for swap
pages.  The other approach is to set both PG_locked and PG_writeback
for swap pages so they can be handled in the same manner as file-backed
pages in shrink_cache().

This patch takes the latter approach.

=====================================

--- 2.5.18/fs/buffer.c~swap-writeback	Sat May 25 23:26:46 2002
+++ 2.5.18-akpm/fs/buffer.c	Sun May 26 00:50:20 2002
@@ -544,6 +544,14 @@ static void end_buffer_async_read(struct
 	 */
 	if (page_uptodate && !PageError(page))
 		SetPageUptodate(page);
+
+	/*
+	 * swap page handling is a bit hacky.  A standalone completion handler
+	 * for swapout pages would fix that up.  swapin can use this function.
+	 */
+	if (PageSwapCache(page) && PageWriteback(page))
+		end_page_writeback(page);
+
 	unlock_page(page);
 	return;
 
@@ -2271,6 +2279,9 @@ int brw_kiovec(int rw, int nr, struct ki
  * calls block_flushpage() under spinlock and hits a locked buffer, and
  * schedules under spinlock.   Another approach would be to teach
  * find_trylock_page() to also trylock the page's writeback flags.
+ *
+ * Swap pages are also marked PageWriteback when they are being written
+ * so that memory allocators will throttle on them.
  */
 int brw_page(int rw, struct page *page,
 		struct block_device *bdev, sector_t b[], int size)
@@ -2301,6 +2312,11 @@ int brw_page(int rw, struct page *page,
 		bh = bh->b_this_page;
 	} while (bh != head);
 
+	if (rw == WRITE) {
+		BUG_ON(PageWriteback(page));
+		SetPageWriteback(page);
+	}
+
 	/* Stage 2: start the IO */
 	do {
 		struct buffer_head *next = bh->b_this_page;
--- 2.5.18/mm/swap_state.c~swap-writeback	Sat May 25 23:26:46 2002
+++ 2.5.18-akpm/mm/swap_state.c	Sun May 26 00:50:19 2002
@@ -36,10 +36,8 @@ static int swap_writepage(struct page *p
  * swapper_space doesn't have a real inode, so it gets a special vm_writeback()
  * so we don't need swap special cases in generic_vm_writeback().
  *
- * FIXME: swap pages are locked, but not PageWriteback while under writeout.
- * This will confuse throttling in shrink_cache().  It may be advantageous to
- * set PG_writeback against swap pages while they're also locked.  Either that,
- * or special-case swap pages in shrink_cache().
+ * Swap pages are PageLocked and PageWriteback while under writeout so that
+ * memory allocators will throttle against them.
  */
 static int swap_vm_writeback(struct page *page, int *nr_to_write)
 {


-

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-26 20:40 [patch 5/18] mark swapout pages PageWriteback() Andrew Morton
@ 2002-05-27  0:57 ` Linus Torvalds
  2002-05-27  2:19   ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2002-05-27  0:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml



On Sun, 26 May 2002, Andrew Morton wrote:
>
> One fix would be to teach shrink_cache() to wait on PG_locked for swap
> pages.  The other approach is to set both PG_locked and PG_writeback
> for swap pages so they can be handled in the same manner as file-backed
> pages in shrink_cache().

How about making them do exactly what normal writeout does, namely drop
the locked bit too. Is there any advantage to holding it any more? The
difference between swap writeout and normal writeout seems to be fairly
arbitrary at this point.

			Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-27  0:57 ` Linus Torvalds
@ 2002-05-27  2:19   ` Andrew Morton
  2002-05-27  3:38     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-05-27  2:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: lkml

Linus Torvalds wrote:
> 
> On Sun, 26 May 2002, Andrew Morton wrote:
> >
> > One fix would be to teach shrink_cache() to wait on PG_locked for swap
> > pages.  The other approach is to set both PG_locked and PG_writeback
> > for swap pages so they can be handled in the same manner as file-backed
> > pages in shrink_cache().
> 
> How about making them do exactly what normal writeout does, namely drop
> the locked bit too. Is there any advantage to holding it any more? The
> difference between swap writeout and normal writeout seems to be fairly
> arbitrary at this point.
> 

That leads to block_flushpage() being called under spinlock against a
page which has locked buffers, so it schedules on the buffer lock and
the box deadlocks.

So... can do, but I'll have to sort out the block_flushpage-under-spinlock
problem in the process.

But I recall you saying that there was advantage in keeping swapout pages
locked so that aggressive memory users would throttle against their
own swapout.  What's the story there?

Generally, there are a number of irritating swap special-cases popping up
and yes, it would be nice to give swap a proper inode, superblock (maybe)
and a get_block so it can become more regular.  One obstacle there is
the PAGE_SIZE versus PAGE_CACHE_SIZE thing.  Would have to add a new
address_space.page_size for that, which would penalise other address_spaces
slightly (in terms of memory usage and code size).

I've been trying to not look at the swap code ;) But there will be some
benefit it teaching swap to go direct to BIO to avoid the extra buffer
allocations when things are squeezy.  So I do need to stick my nose in 
there.

-

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-27  2:19   ` Andrew Morton
@ 2002-05-27  3:38     ` Linus Torvalds
  2002-05-31 20:58       ` Andrea Arcangeli
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2002-05-27  3:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml



On Sun, 26 May 2002, Andrew Morton wrote:
>
> But I recall you saying that there was advantage in keeping swapout pages
> locked so that aggressive memory users would throttle against their
> own swapout.  What's the story there?

The advantage is not the lock itself, as much as having people who page in
swap pages be delayed on them - which ends up slowing down processes that
swap a lot.

BUT: that could equally well be done by doing a "wait_on_writeback()" or
similar, and it could also be a tunable thing (ie wait on writeback only
when we actually need to slow them down). In particular, _not_ slowing
them down does improve throughput, it just makes it really really nasty
from an interactive standpoint under some loads.

I don't know. I have this feeling that it would be good to try to share
all the semantics between swap pages and shared file mappings, but at the
same time I also have to admit to believing that swap _is_ special in some
ways, so if we don't ever really unify them I won't be shedding any huge
tears.

		Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-27  3:38     ` Linus Torvalds
@ 2002-05-31 20:58       ` Andrea Arcangeli
  2002-05-31 23:02         ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2002-05-31 20:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, lkml

On Sun, May 26, 2002 at 08:38:26PM -0700, Linus Torvalds wrote:
> 
> 
> On Sun, 26 May 2002, Andrew Morton wrote:
> >
> > But I recall you saying that there was advantage in keeping swapout pages
> > locked so that aggressive memory users would throttle against their
> > own swapout.  What's the story there?
> 
> The advantage is not the lock itself, as much as having people who page in
> swap pages be delayed on them - which ends up slowing down processes that
> swap a lot.
> 
> BUT: that could equally well be done by doing a "wait_on_writeback()" or
> similar, and it could also be a tunable thing (ie wait on writeback only
> when we actually need to slow them down). In particular, _not_ slowing
> them down does improve throughput, it just makes it really really nasty
> from an interactive standpoint under some loads.
> 
> I don't know. I have this feeling that it would be good to try to share
> all the semantics between swap pages and shared file mappings, but at the

that is definitely a good idea, that's why I resurrected in my current vm
patches your optimizations that allows a minor fault to be resolved even
if the swapcache is under async writeout.

The fact is that there is no difference at all between MAP_SHARED and
MAP_ANONYMOUS in terms of memory pressure, the only difference is that
with MAP_SHARED we are guaranteed that we won't run out of "swap" space.
No other differnece.

In short the same way MAP_ANON must pageout correctly, also MAP_SHARED
must swapout correctly with very vm intensive conditions.

I suggest not making differences in the algorithm (besides having to use
two different code paths, but in terms of mem pressure and minor faults
the actions should be the same, if not then one of the two is either
buggy or inferior).


> same time I also have to admit to believing that swap _is_ special in some
> ways, so if we don't ever really unify them I won't be shedding any huge
> tears.
> 
> 		Linus
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Andrea

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-31 20:58       ` Andrea Arcangeli
@ 2002-05-31 23:02         ` Linus Torvalds
  2002-06-01  6:17           ` Hugh Dickins
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2002-05-31 23:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Andrew Morton, lkml


On Fri, 31 May 2002, Andrea Arcangeli wrote:
> 
> In short the same way MAP_ANON must pageout correctly, also MAP_SHARED
> must swapout correctly with very vm intensive conditions.

That is true, but it is ignoring the fact that there _are_ real technical 
differences between swap cache mappings and regular shared mappings.

One major difference is the approach to the last user: a last use of a 
shared mapping still needs to write out dirty state, while the last use of 
a swap page is better off noticing that it should just optimize away the 
write, and we can just turn the page back into a dirty anonymous page.

But the differences are much smaller than many of the similarities.

		Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-05-31 23:02         ` Linus Torvalds
@ 2002-06-01  6:17           ` Hugh Dickins
  2002-06-05 11:22             ` Pavel Machek
  0 siblings, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2002-06-01  6:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Andrew Morton, lkml

On Fri, 31 May 2002, Linus Torvalds wrote:
> On Fri, 31 May 2002, Andrea Arcangeli wrote:
> > 
> > In short the same way MAP_ANON must pageout correctly, also MAP_SHARED
> > must swapout correctly with very vm intensive conditions.

I strongly agree with Andrea on that.  Swap fault without page lock
was a step forward, reinstating it in the light of observed degradation
in one workload was right decision for that particular tail of release,
but it's a shame to have stayed that way.

> That is true, but it is ignoring the fact that there _are_ real technical 
> differences between swap cache mappings and regular shared mappings.
> 
> One major difference is the approach to the last user: a last use of a 
> shared mapping still needs to write out dirty state, while the last use of 
> a swap page is better off noticing that it should just optimize away the 
> write, and we can just turn the page back into a dirty anonymous page.

Poor example: isn't last use of swap page just like last use of
shared mapping of an _unlinked_ file?

Offhand, I think most of the differences between swap and filesystem
just come from our wish not to waste any time on that filesystem:
we _expect_ the object will be unlinked before it needs to hit disk.
Plus, it's important that going to disk doesn't need more memory.

Perhaps those are just attributes of a particular filesystem.
Just as anon pages graduated to being put on LRU a few months ago,
maybe they could graduate to being hashed pages of tmpfs objects?
Probably yes, but without wasting time and memory - more doubtful.

Hugh


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 5/18] mark swapout pages PageWriteback()
  2002-06-01  6:17           ` Hugh Dickins
@ 2002-06-05 11:22             ` Pavel Machek
  0 siblings, 0 replies; 8+ messages in thread
From: Pavel Machek @ 2002-06-05 11:22 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Andrea Arcangeli, Andrew Morton, lkml

Hi!

> > > In short the same way MAP_ANON must pageout correctly, also MAP_SHARED
> > > must swapout correctly with very vm intensive conditions.
> 
> I strongly agree with Andrea on that.  Swap fault without page lock
> was a step forward, reinstating it in the light of observed degradation
> in one workload was right decision for that particular tail of release,
> but it's a shame to have stayed that way.
> 
> > That is true, but it is ignoring the fact that there _are_ real technical 
> > differences between swap cache mappings and regular shared mappings.
> > 
> > One major difference is the approach to the last user: a last use of a 
> > shared mapping still needs to write out dirty state, while the last use of 
> > a swap page is better off noticing that it should just optimize away the 
> > write, and we can just turn the page back into a dirty anonymous page.
> 
> Poor example: isn't last use of swap page just like last use of
> shared mapping of an _unlinked_ file?

Eh? Suppose I have two different shared mappings of (unlinked:3). It
may be last use of swap page in first mapping, but what prevents me
from seeking and trying to read what I wrote?
									Pavel

-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-06-05 20:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-26 20:40 [patch 5/18] mark swapout pages PageWriteback() Andrew Morton
2002-05-27  0:57 ` Linus Torvalds
2002-05-27  2:19   ` Andrew Morton
2002-05-27  3:38     ` Linus Torvalds
2002-05-31 20:58       ` Andrea Arcangeli
2002-05-31 23:02         ` Linus Torvalds
2002-06-01  6:17           ` Hugh Dickins
2002-06-05 11:22             ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).