* swapcache size oddness
@ 2012-04-27 20:27 Dan Magenheimer
  2012-04-28  3:58 ` Hugh Dickins
  0 siblings, 1 reply; 3+ messages in thread
From: Dan Magenheimer @ 2012-04-27 20:27 UTC
  To: linux-mm

In continuing digging through the swap code (with the
overall objective of improving zcache policy), I was
looking at the size of the swapcache.

My understanding was that the swapcache is simply a
buffer cache for pages that are actively in the process
of being swapped in or swapped out.  And keeping pages
around in the swapcache is inefficient because every
process access to a page in the swapcache causes a
minor page fault.

So I was surprised to see that, under a memory intensive
workload, the swapcache can grow quite large.  I have
seen it grow to almost half of the size of RAM.
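
For anyone wanting to reproduce the observation, here is a minimal sketch. It assumes the numbers being compared are the SwapCached and MemTotal fields of /proc/meminfo; the post does not say how the size was measured. The helper names meminfo_kb() and read_meminfo() are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of the field `key` (e.g. "SwapCached") found in
 * `text`, which is expected to be in /proc/meminfo format.
 * Returns -1 if the field is not present or has no number after it.
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* A field matches only at the start of a line, followed by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;     /* step past the newline */
    }
    return -1;
}

/* Read /proc/meminfo into buf (NUL-terminated); 0 on success, -1 on error. */
int read_meminfo(char *buf, size_t len)
{
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (!f || len == 0) {
        if (f)
            fclose(f);
        return -1;
    }
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

Calling read_meminfo() into a buffer of a few KB and then dividing meminfo_kb(buf, "SwapCached") by meminfo_kb(buf, "MemTotal") gives the fraction of RAM held as swapcache at that instant.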

Digging into this oddity, I re-discovered the definition
for "vm_swap_full()" which, in scan_swap_map() is a
pre-condition for calling __try_to_reclaim_swap().
But vm_swap_full() compares how much free swap space
there is "on disk", with the total swap space available
"on disk" with no regard to how much RAM there is.
So on my system, which is running with 1GB RAM and
10GB swap, I think this is the reason that swapcache
is growing so large.
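
To make the arithmetic concrete, here is a user-space sketch of the test. It assumes the macro is (nr_swap_pages*2 < total_swap_pages), which is how I recall it from include/linux/swap.h of this period, and it assumes 4 KiB pages. The struct and the function names are stand-ins, not kernel code.

```c
#include <stdbool.h>

/* User-space stand-ins for the kernel counters of the same names (in pages). */
struct swap_counters {
    long nr_swap_pages;     /* free swap pages */
    long total_swap_pages;  /* total swap pages */
};

/* True once more than half of the swap space is in use (assumed form). */
bool vm_swap_full_model(const struct swap_counters *c)
{
    return c->nr_swap_pages * 2 < c->total_swap_pages;
}

/* Smallest number of in-use swap pages for which the test above is true. */
long pages_used_before_full(long total_swap_pages)
{
    return total_swap_pages / 2 + 1;
}
```

With 10GB of swap (2621440 pages) the test first becomes true at 1310721 pages in use, which is about five times the 262144 pages of a 1GB machine. RAM size does not appear anywhere in the expression.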

Am I misunderstanding something?  Or is this code
making some (possibly false) assumptions about how
swap is/should be sized relative to RAM?  Or maybe the
size of swapcache is harmless as long as it doesn't
approach total "on disk" size?

(Sorry if this is a silly question again...)

Thanks,
Dan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: swapcache size oddness
  2012-04-27 20:27 swapcache size oddness Dan Magenheimer
@ 2012-04-28  3:58 ` Hugh Dickins
  2012-04-28 16:48   ` Dan Magenheimer
  0 siblings, 1 reply; 3+ messages in thread
From: Hugh Dickins @ 2012-04-28  3:58 UTC
  To: Dan Magenheimer; +Cc: linux-mm

On Fri, 27 Apr 2012, Dan Magenheimer wrote:

> In continuing digging through the swap code (with the
> overall objective of improving zcache policy), I was
> looking at the size of the swapcache.
> 
> My understanding was that the swapcache is simply a
> buffer cache for pages that are actively in the process
> of being swapped in or swapped out.

It's that part of the pagecache for pages on swap.

Once written out, as with other pagecache pages written out under
reclaim, we do expect to reclaim them fairly soon (they're moved to
the bottom of the inactive list).  But when read back in, we read a
cluster at a time, hoping to pick up some more useful pages while the
disk head is there (though of course it may be a headless disk).  We
don't disassociate those from swap until they're dirtied (or swap
looks fullish), why should we?
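
A rough sketch of which swap offsets one such cluster read covers. It assumes the aligned-window form that I believe swapin_readahead() uses in kernels of this period, with page_cluster being the value of the vm.page-cluster sysctl. The struct and function names here are illustrative only.

```c
/* Inclusive range of swap offsets covered by one readahead. */
struct ra_window {
    unsigned long start;
    unsigned long end;
};

/*
 * Compute the window of 2^page_cluster slots that contains `offset`.
 * The faulting page is somewhere inside the window; every other slot
 * in it is read speculatively and lands in the swapcache.
 */
struct ra_window swapin_window(unsigned long offset, unsigned int page_cluster)
{
    unsigned long mask = (1UL << page_cluster) - 1;
    struct ra_window w;

    w.start = offset & ~mask;   /* round down to the cluster boundary */
    w.end = offset | mask;      /* last slot of the same cluster */
    if (w.start == 0)
        w.start = 1;            /* slot 0 holds the swap header, skip it */
    return w;
}
```

For a fault on offset 13 with page_cluster 3, the window is offsets 8 through 15: one page was asked for, up to seven more come along and stay associated with their swap slots.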

> And keeping pages
> around in the swapcache is inefficient because every
> process access to a page in the swapcache causes a
> minor page fault.

What's inefficient about that?  A minor fault is much less
costly than the major fault of reading them back from disk.

> 
> So I was surprised to see that, under a memory intensive
> workload, the swapcache can grow quite large.  I have
> seen it grow to almost half of the size of RAM.

Nothing wrong with that, so long as they can be freed and
used for better purpose when needed.

> 
> Digging into this oddity, I re-discovered the definition
> for "vm_swap_full()" which, in scan_swap_map() is a
> pre-condition for calling __try_to_reclaim_swap().
> But vm_swap_full() compares how much free swap space
> there is "on disk", with the total swap space available
> "on disk" with no regard to how much RAM there is.
> So on my system, which is running with 1GB RAM and
> 10GB swap, I think this is the reason that swapcache
> is growing so large.
> 
> Am I misunderstanding something?  Or is this code
> making some (possibly false) assumptions about how
> swap is/should be sized relative to RAM?  Or maybe the
> size of swapcache is harmless as long as it doesn't
> approach total "on disk" size?

The size of swapcache is harmless: we break those pages' association
with swap once a better use for the page comes up.  But the size of
swapcache does (of course) represent a duplication of what's on swap.

As swap becomes full, that duplication becomes wasteful: we may need
some of the swap already in memory for saving other pages; so break
the association, freeing the swap for reuse but keeping the page
(but now it's no longer swapcache).

That's what the vm_swap_full() tests are about: choosing to free swap
when it's duplicated in memory, once it's becoming a scarce resource.
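
As a toy model of that choice on the swap-in path (a sketch only: the names are hypothetical, and the real code has further conditions such as writeback and mlock that are left out here):

```c
#include <stdbool.h>

/* Toy stand-in for a page that was just read back from swap. */
struct toy_page {
    bool in_swapcache;  /* page still owns a slot on swap */
    int  swap_count;    /* other users still referring to that slot */
    bool dirty;         /* RAM copy differs from (or has no) copy on swap */
};

/*
 * Break the page's association with swap if nobody else needs the slot.
 * The page itself stays in memory. Returns true if the slot was freed.
 */
bool toy_try_to_free_swap(struct toy_page *p)
{
    if (!p->in_swapcache || p->swap_count > 0)
        return false;
    p->in_swapcache = false;    /* slot on swap is released for reuse */
    p->dirty = true;            /* RAM now holds the only copy */
    return true;
}

/* After a swap-in: keep the duplicate unless swap is becoming scarce. */
bool toy_after_swapin(struct toy_page *p, long nr_free, long total)
{
    bool swap_full = nr_free * 2 < total;   /* assumed vm_swap_full() form */

    if (swap_full)
        return toy_try_to_free_swap(p);
    return false;
}
```

While swap is plentiful the page remains swapcache and can be dropped later without any write; once swap is more than half used, the slot is given back and the page becomes an ordinary dirty anonymous page.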

Hugh



* RE: swapcache size oddness
  2012-04-28  3:58 ` Hugh Dickins
@ 2012-04-28 16:48   ` Dan Magenheimer
  0 siblings, 0 replies; 3+ messages in thread
From: Dan Magenheimer @ 2012-04-28 16:48 UTC
  To: Hugh Dickins; +Cc: linux-mm

> From: Hugh Dickins [mailto:hughd@google.com]
> Subject: Re: swapcache size oddness

Hi Hugh --

Thanks, as usual, for your quick and thorough response!

> On Fri, 27 Apr 2012, Dan Magenheimer wrote:
> 
> > In continuing digging through the swap code (with the
> > overall objective of improving zcache policy), I was
> > looking at the size of the swapcache.
> >
> > My understanding was that the swapcache is simply a
> > buffer cache for pages that are actively in the process
> > of being swapped in or swapped out.
> 
> It's that part of the pagecache for pages on swap.
> 
> Once written out, as with other pagecache pages written out under
> reclaim, we do expect to reclaim them fairly soon (they're moved to
> the bottom of the inactive list).  But when read back in, we read a
> cluster at a time, hoping to pick up some more useful pages while the
> disk head is there (though of course it may be a headless disk).  We
> don't disassociate those from swap until they're dirtied (or swap
> looks fullish), why should we?

OK.  Yes, I forgot about the pages that are swapped in
"speculatively" rather than on demand.  This will certainly
result in an increase in the size of the swapcache (especially
with Rik's recent change that increases the average effective
cluster size).

> > And keeping pages
> > around in the swapcache is inefficient because every
> > process access to a page in the swapcache causes a
> > minor page fault.
> 
> What's inefficient about that?  A minor fault is much less
> costly than the major fault of reading them back from disk.

Yes, but a minor fault is much more costly than a plain
memory read/write to an already-mapped page.
I guess I was under the mistaken assumption that a page in
the swapcache can never be directly accessed because the
page table would always have it marked as non-present,
in order to avoid races due to multiple process accesses
and I/O.  But I think I see how that is avoided now (at
least for non-shared-memory pages).

> > So I was surprised to see that, under a memory intensive
> > workload, the swapcache can grow quite large.  I have
> > seen it grow to almost half of the size of RAM.
> 
> Nothing wrong with that, so long as they can be freed and
> used for better purpose when needed.

Due to my mistaken assumption above, I thought a page
in the swap cache was "worse" than a normal anonymous
page (i.e. for system performance).

So really the primary difference between an anonymous page
that is NOT in the swap cache, and an anonymous page
that IS in the swap cache, is that the latter already has
a slot reserved on the swap disk.  (Flags and mapping
differences too of course.)

> > Digging into this oddity, I re-discovered the definition
> > for "vm_swap_full()" which, in scan_swap_map() is a
> > pre-condition for calling __try_to_reclaim_swap().
> > But vm_swap_full() compares how much free swap space
> > there is "on disk", with the total swap space available
> > "on disk" with no regard to how much RAM there is.
> > So on my system, which is running with 1GB RAM and
> > 10GB swap, I think this is the reason that swapcache
> > is growing so large.
> >
> > Am I misunderstanding something?  Or is this code
> > making some (possibly false) assumptions about how
> > swap is/should be sized relative to RAM?  Or maybe the
> > size of swapcache is harmless as long as it doesn't
> > approach total "on disk" size?
> 
> The size of swapcache is harmless: we break those pages' association
> with swap once a better use for the page comes up.  But the size of
> swapcache does (of course) represent a duplication of what's on swap.
> 
> As swap becomes full, that duplication becomes wasteful: we may need
> some of the swap already in memory for saving other pages; so break
> the association, freeing the swap for reuse but keeping the page
> (but now it's no longer swapcache).
> 
> That's what the vm_swap_full() tests are about: choosing to free swap
> when it's duplicated in memory, once it's becoming a scarce resource.

Got it.  Thanks!

Dan

