linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH 00/14] Small step toward KSM for file back page.
Date: Thu, 8 Oct 2020 11:30:28 -0400	[thread overview]
Message-ID: <20201008153028.GA3508856@redhat.com> (raw)
In-Reply-To: <20201007220916.GX20115@casper.infradead.org>

On Wed, Oct 07, 2020 at 11:09:16PM +0100, Matthew Wilcox wrote:
> On Wed, Oct 07, 2020 at 01:54:19PM -0400, Jerome Glisse wrote:
> > > For other things (NUMA distribution), we can point to something which
> > > isn't a struct page and can be distiguished from a real struct page by a
> > > bit somewhere (I have ideas for at least three bits in struct page that
> > > could be used for this).  Then use a pointer in that data structure to
> > > point to the real page.  Or do NUMA distribution at the inode level.
> > > Have a way to get from (inode, node) to an address_space which contains
> > > just regular pages.
> > 
> > How do you find all the copies ? KSM maintains a list for a reasons.
> > Same would be needed here because if you want to break the write prot
> > you need to find all the copy first. If you intend to walk page table
> > then how do you synchronize to avoid more copy to spawn while you
> > walk reverse mapping, we could lock the struct page i guess. Also how
> > do you walk device page table which are completely hidden from core mm.
> 
> So ... why don't you put a PageKsm page in the page cache?  That way you
> can share code with the current KSM implementation.  You'd need
> something like this:

I do just that but there is no need to change anything in page cache.
So below code is not necessary. What you need is a way to find all
the copies so if you have a write fault (or any write access) then
from that fault you get the mapping and offset and you use that to
lookup the fs specific informations and de-duplicate the page with
new page and the fs specific informations. Hence the filesystem code
do not need to know anything it all happens in generic common code.

So flow is:

  Same as before:
    1 - write fault (address, vma)
    2 - regular write fault handler -> find page in page cache

  New to common page fault code:
    3 - ksm check in write fault common code (same as ksm today
        for anonymous page fault code path).
    4 - break ksm (address, vma) -> (file offset, mapping)
        4.a - use mapping and file offset to lookup the proper
              fs specific information that were save when the
              page was made ksm.
        4.b - allocate new page and initialize it with that
              information (and page content), update page cache
              and mappings ie all the pte who where pointing to
              the ksm for that mapping at that offset to now use
              the new page (like KSM for anonymous page today).

  Resume regular code path:
        mkwrite /|| set pte ...

Roughly the same for write ioctl (other cases goes through GUP
which itself goes through page fault code path). There is no
need to change page cache in anyway. Just common code path that
enable write to file back page.

The fs specific information is page->private, some of the flags
(page->flags) and page->indexi (file offset). Everytime a page
is deduplicated a copy of that information is save in an alias
struct which you can get to from the the share KSM page (page->
mapping is a pointer to ksm root struct which has a pointer to
list of all aliases).

> 
> +++ b/mm/filemap.c
> @@ -1622,6 +1622,9 @@ struct page *find_lock_entry(struct address_space *mapping
> , pgoff_t index)
>                 lock_page(page);
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page)) {
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
> @@ -1655,6 +1658,7 @@ struct page *find_lock_entry(struct address_space *mapping, pgoff_t index)
>   * * %FGP_WRITE - The page will be written
>   * * %FGP_NOFS - __GFP_FS will get cleared in gfp mask
>   * * %FGP_NOWAIT - Don't get blocked by page lock
> + * * %FGP_KSM - Return KSM pages
>   *
>   * If %FGP_LOCK or %FGP_CREAT are specified then the function may sleep even
>   * if the %GFP flags specified for %FGP_CREAT are atomic.
> @@ -1687,6 +1691,11 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
>  
>                 /* Has the page been truncated? */
>                 if (unlikely(page->mapping != mapping)) {
> +                       if (PageKsm(page) {
> +                               if (fgp_flags & FGP_KSM)
> +                                       return page;
> +                               ...
> +                       }
>                         unlock_page(page);
>                         put_page(page);
>                         goto repeat;
> 
> I don't know what you want to do when you find a KSM page, so I just left
> an ellipsis.
> 


  reply	other threads:[~2020-10-08 15:30 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07  1:05 [PATCH 00/14] Small step toward KSM for file back page jglisse
2020-10-07  1:05 ` [PATCH 01/14] mm/pxa: page exclusive access add header file for all helpers jglisse
2020-10-07  1:05 ` [PATCH 02/14] fs: define filler_t as a function pointer type jglisse
2020-10-07  1:05 ` [PATCH 03/14] fs: directly use a_ops->freepage() instead of a local copy of it jglisse
2020-10-07  1:05 ` [PATCH 04/14] mm: add struct address_space to readpage() callback jglisse
2020-10-07  1:05 ` [PATCH 05/14] mm: add struct address_space to writepage() callback jglisse
2020-10-07  1:05 ` [PATCH 06/14] mm: add struct address_space to set_page_dirty() callback jglisse
2020-10-07  1:05 ` [PATCH 07/14] mm: add struct address_space to invalidatepage() callback jglisse
2020-10-07  1:05 ` [PATCH 08/14] mm: add struct address_space to releasepage() callback jglisse
2020-10-07  1:05 ` [PATCH 09/14] mm: add struct address_space to freepage() callback jglisse
2020-10-07  1:05 ` [PATCH 10/14] mm: add struct address_space to putback_page() callback jglisse
2020-10-07  1:06 ` [PATCH 11/14] mm: add struct address_space to launder_page() callback jglisse
2020-10-07  1:06 ` [PATCH 12/14] mm: add struct address_space to is_partially_uptodate() callback jglisse
2020-10-07  1:06 ` [PATCH 13/14] mm: add struct address_space to isolate_page() callback jglisse
2020-10-07  1:06 ` [PATCH 14/14] mm: add struct address_space to is_dirty_writeback() callback jglisse
2020-10-07  3:20 ` [PATCH 00/14] Small step toward KSM for file back page Matthew Wilcox
2020-10-07 14:48   ` Jerome Glisse
2020-10-07 17:05     ` Matthew Wilcox
2020-10-07 17:54       ` Jerome Glisse
2020-10-07 18:33         ` Matthew Wilcox
2020-10-07 21:45           ` Jerome Glisse
2020-10-07 22:09         ` Matthew Wilcox
2020-10-08 15:30           ` Jerome Glisse [this message]
2020-10-08 15:43             ` Matthew Wilcox
2020-10-08 18:48               ` Jerome Glisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201008153028.GA3508856@redhat.com \
    --to=jglisse@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=jbacik@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).