From: Rik van Riel <riel@redhat.com>
To: Linux Memory Management List <linux-mm@kvack.org>
Subject: [LSF/Collab] swap cache redesign idea
Date: Sun, 10 Apr 2011 20:50:01 -0400
Message-ID: <4DA25039.3020700@redhat.com>

On Thursday after LSF, Hugh, Minchan, Mel, Johannes and I were
sitting in the hallway talking about yet more VM things.

During that discussion, we came up with a way to redesign the
swap cache.  During my flight home, I came up with ideas on how
to use that redesign that may make the changes worthwhile.

Currently, the page table entries that have swapped out pages
associated with them contain a swap entry, pointing directly
at the swap device and swap slot containing the data. Meanwhile,
the swap count lives in a separate array.

The redesign we are considering moves the swap entry into the
page cache radix tree for the swapper_space, so that the pte
contains only the offset into the swapper_space.  The swap count
info can also fit inside the swapper_space page cache radix
tree (at least on 64 bits - on 32 bits we may need to get
creative or accept a smaller max amount of swap space).
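
To make the indirection concrete, here is a minimal user-space sketch.
It is not kernel code: every name in it (vswap_slot, vswap_space,
vswap_mark_on_device, ...) is hypothetical, and a flat array stands in
for the swapper_space radix tree.  The point it tries to show is that
the pte-side index never changes while the backing location does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; a flat array stands in for the
 * swapper_space radix tree. */
enum vswap_state { VSWAP_FREE, VSWAP_ON_DEVICE, VSWAP_IN_MEMORY };

struct vswap_slot {
    enum vswap_state state;
    uint32_t dev;       /* swap device index, valid when ON_DEVICE */
    uint64_t dev_slot;  /* slot on that device, valid when ON_DEVICE */
    void *page;         /* resident page, valid when IN_MEMORY */
    uint32_t count;     /* number of ptes referring to this slot */
};

#define VSWAP_NR_SLOTS 64

struct vswap_space {
    struct vswap_slot slots[VSWAP_NR_SLOTS];
};

/* The pte stores only this index, never the device location. */
typedef uint64_t vswap_index_t;

struct vswap_slot *vswap_lookup(struct vswap_space *space, vswap_index_t idx)
{
    assert(idx < VSWAP_NR_SLOTS);
    return &space->slots[idx];
}

/* Swap-out: record where the data went; ptes keep the same index. */
void vswap_mark_on_device(struct vswap_space *space, vswap_index_t idx,
                          uint32_t dev, uint64_t dev_slot)
{
    struct vswap_slot *slot = vswap_lookup(space, idx);

    slot->state = VSWAP_ON_DEVICE;
    slot->dev = dev;
    slot->dev_slot = dev_slot;
    slot->page = NULL;
}

/* Swap-in: the data is resident again, so the device slot can be
 * released right away without touching any pte. */
void vswap_mark_in_memory(struct vswap_space *space, vswap_index_t idx,
                          void *page)
{
    struct vswap_slot *slot = vswap_lookup(space, idx);

    slot->state = VSWAP_IN_MEMORY;
    slot->page = page;
}
```

In this sketch, a swapoff would walk the slots that point at one
device and call vswap_mark_in_memory() on each, with no virtual
address scanning involved.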

This extra layer of indirection allows us to do several things:

1) get rid of the virtual address scanning swapoff; instead
    we just swap the data in and mark the pages as present in
    the swapper_space radix tree

2) free swap entries as they are read in, without waiting for
    the process to fault them in - this may be useful for memory
    types that have a large erase block

3) together with the defragmentation from (2), we can always
    do writes in large aligned blocks - the extra indirection
    will make it relatively easy to have special backend code
    for different kinds of swap space, since all the state can
    now live in just one place

4) skip writeout of zero-filled pages - this can be a big help
    for KVM virtual machines running Windows, since Windows zeroes
    out free pages; simply discarding a zero-filled page is not
    at all simple in the current VM, where we would have to iterate
    over all the ptes to free the swap entry before being able to
    free the swap cache page (I am not sure how that locking would
    even work)

    with the extra layer of indirection, the locking for this scheme
    can be trivial - either the faulting process gets the old page,
    or it gets a new one, either way it'll be zero filled

5) skip writeout of pages the guest has marked as free - same as
    above, with the same easier locking
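
As an illustration of (4), the writeout-time decision could look
roughly like the sketch below.  This is plain user-space C under
stated assumptions: SKETCH_PAGE_SIZE, page_is_zero_filled() and
choose_writeout() are made-up names, the page size is assumed to be
4096 bytes, and a real implementation would scan word-at-a-time
rather than byte-at-a-time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

enum writeout_action {
    WRITEOUT_TO_DEVICE,   /* page has data: allocate a slot and write it */
    WRITEOUT_SKIP_ZERO    /* page is all zeroes: record that, write nothing */
};

/* Return true if every byte of the page is zero. */
bool page_is_zero_filled(const void *page)
{
    const unsigned char *p = page;
    size_t i;

    assert(page != NULL);
    for (i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (p[i] != 0)
            return false;
    }
    return true;
}

/* Decide what to do with a page that is about to be swapped out. */
enum writeout_action choose_writeout(const void *page)
{
    return page_is_zero_filled(page) ? WRITEOUT_SKIP_ZERO
                                     : WRITEOUT_TO_DEVICE;
}
```

With the indirection in place, WRITEOUT_SKIP_ZERO would only need to
update the one swapper_space entry; a later fault can hand out any
zeroed page.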

Only one real question remains - how do we handle the swap count
in the new scheme?  On 64 bit systems we have enough space in the
radix tree, on 32 bit systems maybe we'll have to start overflowing
into the "swap_count_continued" logic a little sooner than we are
now and reduce the maximum swap size a little?
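
To put rough numbers on the space question, here is a sketch of one
possible packing.  The layout is purely an assumption for
illustration: 8 bits of count and 5 bits of swap type in the low
bits, offset in the rest, and no bits reserved for radix tree
tagging (a real layout would need some).  The SK_* and sk_* names are
made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, low bits first: [ count:8 | type:5 | offset:rest ] */
#define SK_COUNT_BITS 8u
#define SK_TYPE_BITS  5u
#define SK_META_BITS  (SK_COUNT_BITS + SK_TYPE_BITS)

/* Pack offset, swap type and count into one 64-bit slot value. */
uint64_t sk_pack(uint64_t offset, uint32_t type, uint32_t count)
{
    assert(count < (1u << SK_COUNT_BITS));
    assert(type < (1u << SK_TYPE_BITS));
    assert(offset < (UINT64_C(1) << (64 - SK_META_BITS)));

    return (offset << SK_META_BITS) |
           ((uint64_t)type << SK_COUNT_BITS) |
           (uint64_t)count;
}

uint32_t sk_count(uint64_t v)
{
    return (uint32_t)(v & ((1u << SK_COUNT_BITS) - 1));
}

uint32_t sk_type(uint64_t v)
{
    return (uint32_t)((v >> SK_COUNT_BITS) & ((1u << SK_TYPE_BITS) - 1));
}

uint64_t sk_offset(uint64_t v)
{
    return v >> SK_META_BITS;
}

/* Offset bits left over in a slot word of the given width. */
unsigned sk_offset_bits(unsigned word_bits)
{
    assert(word_bits > SK_META_BITS);
    return word_bits - SK_META_BITS;
}
```

Under these assumptions a 64-bit word leaves 51 offset bits, while a
32-bit word leaves only 19, which with 4 KiB pages is 2 GiB per swap
device.  Shrinking the in-slot count and overflowing into a
continuation scheme sooner would buy back offset bits.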

-- 
All rights reversed

