* [LSF/MM TOPIC] Sharing file backed pages
@ 2019-01-23  8:48 ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2019-01-23  8:48 UTC (permalink / raw)
  To: lsf-pc
  Cc: Al Viro, Darrick J. Wong, Dave Chinner, Jan Kara, Matthew Wilcox,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM

Hi,

In his session about "reflink" at LSF/MM 2016 [1], Darrick Wong brought
up the subject of sharing pages between cloned files, and the general vibe
in the room was that it could be done.

In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
that Matthew Wilcox was "working on that problem".

I have started working on a new overlayfs address space implementation
that could also benefit from being able to share pages even for filesystems
that do not support clones (for copy up anticipation state).

To simplify the problem, we can start with sharing only uptodate clean
pages that map the same offset in their respective files. While the
same-offset requirement somewhat limits the use cases that benefit from
shared file pages, it still covers the vast majority of use cases (e.g.
cloning a full image), where sharing pages at the same offset will bring
a lot of benefit.

At first glance, this requires dropping the assumption that for an uptodate
clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
Is there really such an assumption in common vfs/mm code?
And what will it take to drop it?
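
For readers less familiar with the structures involved, the assumption
can be spelled out as a small user-space sketch. The structs below are
toy stand-ins that carry only the fields named in the expression above,
and fault_inode_owns_page() is a hypothetical helper name, not a kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields that appear
 * in the expression above are modelled. */
struct inode;
struct address_space { struct inode *host; };
struct inode { struct address_space i_data; };
struct file { struct inode *f_inode; };
struct vm_area_struct { struct file *vm_file; };
struct vm_fault { struct vm_area_struct *vma; };
struct page { struct address_space *mapping; };

/* Hypothetical helper: true when the inode behind the faulting VMA is
 * the inode that owns the page cache the page sits in. */
static bool fault_inode_owns_page(const struct vm_fault *vmf,
                                  const struct page *page)
{
    assert(vmf && vmf->vma && vmf->vma->vm_file);
    assert(page && page->mapping);
    return vmf->vma->vm_file->f_inode == page->mapping->host;
}
```

With a page shared between two files, the check holds for a fault through
the file whose inode owns the page and fails for a fault through the other
file, which is exactly the case the proposal has to make legal.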

I would like to discuss where we stand on this effort and what steps
we need to take to move it forward, as well as to coordinate the
efforts of the interested parties (e.g. xfs, btrfs, overlayfs, anyone?).

Thanks,
Amir.

[1] https://lwn.net/Articles/684826/
[2] https://lwn.net/Articles/747633/

* Re: [LSF/MM TOPIC] Sharing file backed pages
  2019-01-23  8:48 ` Amir Goldstein
  (?)
@ 2019-01-23 14:54 ` Jan Kara
  2019-01-23 15:12   ` Jerome Glisse
                     ` (2 more replies)
  -1 siblings, 3 replies; 13+ messages in thread
From: Jan Kara @ 2019-01-23 14:54 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner, Jan Kara,
	Matthew Wilcox, Chris Mason, Miklos Szeredi, linux-fsdevel,
	Linux MM, Jerome Glisse

On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> up the subject of sharing pages between cloned files and the general vibe
> in room was that it could be done.
> 
> In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> that Matthew Wilcox was "working on that problem".
> 
> I have started working on a new overlayfs address space implementation
> that could also benefit from being able to share pages even for filesystems
> that do not support clones (for copy up anticipation state).
> 
> To simplify the problem, we can start with sharing only uptodate clean
> pages that map the same offset in respective files. While the same offset
> requirement somewhat limits the use cases that benefit from shared file
> pages, there is still a vast majority of use cases (i.e. clone full
> image), where sharing pages of similar offset will bring a lot of
> benefit.
> 
> At first glance, this requires dropping the assumption that for an
> uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> Is there really such an assumption in common vfs/mm code?  and what will
> it take to drop it?

There definitely is such an assumption. Take page reclaim, for example, as
one place that will be non-trivial to deal with. You need to remove the
page from the page cache of all inodes that contain it without having any
file context whatsoever. So you will need to create some way for this
page -> page caches lookup to happen. Jerome in his talk at LSF/MM last
year [1] actually nicely summarized what it would take to get rid of
page->mapping dereferences. He even had some preliminary patches. To sum
it up, it's a lot of intrusive work but in principle it is possible.
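
The reclaim problem can be illustrated with a small user-space sketch.
Everything here is a toy model: the sharers[] reverse map is hypothetical
(struct page has no such field), and the limits and function names are
made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARERS 4   /* arbitrary limit for this toy model */
#define CACHE_SLOTS 8   /* toy page cache: one slot per file offset */

struct page;

struct address_space { struct page *slots[CACHE_SLOTS]; };

/* A shared page records every page cache that currently holds it. */
struct page {
    size_t index;                                /* offset in each file */
    struct address_space *sharers[MAX_SHARERS];  /* hypothetical reverse map */
    size_t nr_sharers;
};

/* Insert the page at its offset in one more page cache. */
static int cache_add(struct address_space *m, struct page *p)
{
    if (p->nr_sharers == MAX_SHARERS || p->index >= CACHE_SLOTS ||
        m->slots[p->index] != NULL)
        return -1;
    m->slots[p->index] = p;
    p->sharers[p->nr_sharers++] = m;
    return 0;
}

/* Reclaim starts from the page alone, with no file context, so it has
 * to walk the reverse map to find every cache to remove the page from. */
static size_t reclaim_page(struct page *p)
{
    size_t removed = 0;

    while (p->nr_sharers > 0) {
        struct address_space *m = p->sharers[--p->nr_sharers];

        assert(m->slots[p->index] == p);
        m->slots[p->index] = NULL;
        removed++;
    }
    return removed;
}
```

A fixed array is used only to keep the sketch short; the point is that
some page -> page caches lookup has to exist before reclaim can drop a
shared page.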

[1] https://lwn.net/Articles/752564/

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [LSF/MM TOPIC] Sharing file backed pages
  2019-01-23 14:54 ` Jan Kara
@ 2019-01-23 15:12   ` Jerome Glisse
  2019-01-23 15:26     ` Jerome Glisse
  2019-01-23 17:57     ` Amir Goldstein
  2019-01-24 10:39   ` Kirill A. Shutemov
  2 siblings, 1 reply; 13+ messages in thread
From: Jerome Glisse @ 2019-01-23 15:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner,
	Matthew Wilcox, Chris Mason, Miklos Szeredi, linux-fsdevel,
	Linux MM

On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > up the subject of sharing pages between cloned files and the general vibe
> > in room was that it could be done.
> > 
> > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > that Matthew Wilcox was "working on that problem".
> > 
> > I have started working on a new overlayfs address space implementation
> > that could also benefit from being able to share pages even for filesystems
> > that do not support clones (for copy up anticipation state).
> > 
> > To simplify the problem, we can start with sharing only uptodate clean
> > pages that map the same offset in respective files. While the same offset
> > requirement somewhat limits the use cases that benefit from shared file
> > pages, there is still a vast majority of use cases (i.e. clone full
> > image), where sharing pages of similar offset will bring a lot of
> > benefit.
> > 
> > At first glance, this requires dropping the assumption that for an
> > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > Is there really such an assumption in common vfs/mm code?  and what will
> > it take to drop it?
> 
> There definitely is such assumption. Take for example page reclaim as one
> such place that will be non-trivial to deal with. You need to remove the
> page from page cache of all inodes that contain it without having any file
> context whatsoever. So you will need to create some way for this page->page
> caches mapping to happen. Jerome in his talk at LSF/MM last year [1] actually
> nicely summarized what it would take to get rid of page->mapping
> dereferences. He even had some preliminary patches. To sum it up, it's a
> lot of intrusive work but in principle it is possible.
> 
> [1] https://lwn.net/Articles/752564/
> 

I intend to post a v2 of my patchset doing that sometime soon. For
various reasons this has been pushed to the bottom of my todo list since
last year. It is now almost at the top and it will stay at the top,
so I will be resuming work on that.

I wanted to propose this topic again as a joint session with mm so
here is my proposal:


I would like to discuss removing the dependency on the page mapping field
in most kernel code paths so that we can overload that field for generic
page write protection (KSM) for file-backed pages. The whole idea behind
this is that we almost always have the mapping a page belongs to within
the call stack: any function that operates on a file or on a vma does
have it:
    - syscall/kernel on a file (file -> inode -> mapping)
    - syscall/kernel on virtual address (vma -> file -> mapping)
    - write back for a given mapping

Note that the plan is not to free up the mapping field in struct page
but to reduce the number of places that need the mapping corresponding
to a page to as few as possible. The few exceptions are:
    - page reclaim
    - memory compaction
    - set_page_dirty() on GUPed (get_user_pages*()) pages

For page reclaim and memory compaction we do not care about the mapping
exactly but about being able to unmap/migrate a page. So any
overloading of mapping needs to keep providing helpers to handle those
cases.

For set_page_dirty() on GUPed pages we can take a slow path if the
page has an overloaded mapping field.
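
As a rough user-space illustration of the two ideas above, here is a
sketch of a mapping field overloaded with a tag bit, next to a helper
that takes the mapping from its caller instead of reading it from the
page. TOY_MAPPING_SHARED and all function names are hypothetical and are
not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct address_space { int id; };

/* Hypothetical tag: when the low bit is set, the field no longer holds
 * a plain address_space pointer. */
#define TOY_MAPPING_SHARED ((uintptr_t)1)

struct page { uintptr_t mapping; };

static bool page_mapping_overloaded(const struct page *page)
{
    return (page->mapping & TOY_MAPPING_SHARED) != 0;
}

/* Old style: derive the mapping from the page. This stops working once
 * the field is overloaded, so NULL is returned in that case. */
static struct address_space *mapping_from_page(const struct page *page)
{
    if (page_mapping_overloaded(page))
        return NULL;
    return (struct address_space *)page->mapping;
}

/* New style: the caller already walked file -> inode -> mapping (or
 * vma -> file -> mapping) and passes the mapping down, so the helper
 * never dereferences page->mapping. */
static int toy_mapping_id(const struct address_space *mapping,
                          const struct page *page)
{
    assert(mapping != NULL && page != NULL);
    return mapping->id;
}
```

The slow path mentioned for set_page_dirty() on GUPed pages would be the
one caller that still has to start from the page and resolve the
overloaded field.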


Previous patchset:
https://lore.kernel.org/lkml/20180404191831.5378-1-jglisse@redhat.com/

Cheers,
Jérôme

* Re: [LSF/MM TOPIC] Sharing file backed pages
  2019-01-23 15:12   ` Jerome Glisse
@ 2019-01-23 15:26     ` Jerome Glisse
  0 siblings, 0 replies; 13+ messages in thread
From: Jerome Glisse @ 2019-01-23 15:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner,
	Matthew Wilcox, Chris Mason, Miklos Szeredi, linux-fsdevel,
	Linux MM

On Wed, Jan 23, 2019 at 10:12:29AM -0500, Jerome Glisse wrote:
> On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> > On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > > up the subject of sharing pages between cloned files and the general vibe
> > > in room was that it could be done.
> > > 
> > > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > > that Matthew Wilcox was "working on that problem".
> > > 
> > > I have started working on a new overlayfs address space implementation
> > > that could also benefit from being able to share pages even for filesystems
> > > that do not support clones (for copy up anticipation state).
> > > 
> > > To simplify the problem, we can start with sharing only uptodate clean
> > > pages that map the same offset in respective files. While the same offset
> > > requirement somewhat limits the use cases that benefit from shared file
> > > pages, there is still a vast majority of use cases (i.e. clone full
> > > image), where sharing pages of similar offset will bring a lot of
> > > benefit.
> > > 
> > > At first glance, this requires dropping the assumption that for an
> > > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > > Is there really such an assumption in common vfs/mm code?  and what will
> > > it take to drop it?
> > 
> > There definitely is such assumption. Take for example page reclaim as one
> > such place that will be non-trivial to deal with. You need to remove the
> > page from page cache of all inodes that contain it without having any file
> > context whatsoever. So you will need to create some way for this page->page
> > caches mapping to happen. Jerome in his talk at LSF/MM last year [1] actually
> > nicely summarized what it would take to get rid of page->mapping
> > dereferences. He even had some preliminary patches. To sum it up, it's a
> > lot of intrusive work but in principle it is possible.
> > 
> > [1] https://lwn.net/Articles/752564/
> > 
> 
> I intend to post a v2 of my patchset doing that sometime soon. For
> various reasons this has been pushed to the bottom of my todo list since
> last year. It is now almost at the top and it will stay at the top,
> so I will be resuming work on that.
> 
> I wanted to propose this topic again as a joint session with mm so
> here is my proposal:
> 
> 
> I would like to discuss removing the dependency on the page mapping field
> in most kernel code paths so that we can overload that field for generic
> page write protection (KSM) for file-backed pages. The whole idea behind
> this is that we almost always have the mapping a page belongs to within
> the call stack: any function that operates on a file or on a vma does
> have it:
>     - syscall/kernel on a file (file -> inode -> mapping)
>     - syscall/kernel on virtual address (vma -> file -> mapping)
>     - write back for a given mapping
> 
> Note that the plan is not to free up the mapping field in struct page
> but to reduce the number of places that need the mapping corresponding
> to a page to as few as possible. The few exceptions are:
>     - page reclaim
>     - memory compaction
>     - set_page_dirty() on GUPed (get_user_pages*()) pages
> 
> For page reclaim and memory compaction we do not care about the mapping
> exactly but about being able to unmap/migrate a page. So any
> overloading of mapping needs to keep providing helpers to handle those
> cases.
> 
> For set_page_dirty() on GUPed pages we can take a slow path if the
> page has an overloaded mapping field.
> 
> 
> Previous patchset:
> https://lore.kernel.org/lkml/20180404191831.5378-1-jglisse@redhat.com/

I forgot to say what I want to talk about during the LSF/MM
session:
    - a very quick overview of the patchset, then taking questions on it
    - gathering people's feelings/opinions (does it look good?)
    - merging strategy, on which I have some thoughts and would like to
      gather feedback

I expect to post a v2 long before LSF/MM, probably in February, so
people will have some time to look at it. A warning: it is almost
entirely done with coccinelle, as the changes are almost purely
mechanical.

Cheers,
Jérôme

* Re: [LSF/MM TOPIC] Sharing file backed pages
@ 2019-01-23 17:06   ` James Bottomley
  0 siblings, 0 replies; 13+ messages in thread
From: James Bottomley @ 2019-01-23 17:06 UTC (permalink / raw)
  To: Amir Goldstein, lsf-pc
  Cc: Al Viro, Darrick J. Wong, Dave Chinner, Jan Kara, Matthew Wilcox,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM

On Wed, 2019-01-23 at 10:48 +0200, Amir Goldstein wrote:
> Hi,
> 
> In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong
> brought up the subject of sharing pages between cloned files and the
> general vibe in room was that it could be done.

This subject has been around for a while.  We talked about cache
sharing for containers in LSF/MM 2013, although it was as a discussion
within a session rather than a session about it.  At that time,
Parallels already had an out-of-tree implementation of a daemon that
forced this sharing, and Docker was complaining about the dual caching
problem of their graph drivers.

So, what we need in addition to reflink for container images is
something like ksm for containers which can force read only sharing of
pages that have the same content even though they're apparently from
different files.  This is because most cloud container systems run
multiple copies of the same container image even if the overlays don't
necessarily reflect the origin.  Essentially it's the same reason why
reflink doesn't solve the sharing problem entirely for VMs.
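
The content-based sharing described above can be sketched in user space
as a pool that hands out one read-only page per distinct content. This is
only a toy: the linear memcmp() scan, the tiny page size and the names
are assumptions made for the illustration, and the sketch says nothing
about how such a mechanism would be wired into the page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny "page" to keep the example readable */
#define POOL_MAX 8         /* arbitrary pool capacity */

struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    unsigned refs;         /* number of files sharing this content */
};

struct toy_pool {
    struct toy_page pages[POOL_MAX];
    size_t used;
};

/* Return a shared page holding `content`: reuse an existing page when
 * the bytes are identical, whichever file they came from, otherwise
 * store a new copy. Returns NULL when the pool is full. */
static struct toy_page *pool_get(struct toy_pool *pool,
                                 const unsigned char *content)
{
    assert(pool != NULL && content != NULL);

    for (size_t i = 0; i < pool->used; i++) {
        if (memcmp(pool->pages[i].data, content, TOY_PAGE_SIZE) == 0) {
            pool->pages[i].refs++;
            return &pool->pages[i];
        }
    }
    if (pool->used == POOL_MAX)
        return NULL;

    struct toy_page *page = &pool->pages[pool->used++];
    memcpy(page->data, content, TOY_PAGE_SIZE);
    page->refs = 1;
    return page;
}
```

Two container images that carry byte-identical copies of the same file
would end up holding the same toy_page here, even though no reflink
relates the files.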

James


* Re: [LSF/MM TOPIC] Sharing file backed pages
@ 2019-01-23 17:57     ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2019-01-23 17:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner, Matthew Wilcox,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM,
	Jerome Glisse

On Wed, Jan 23, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
...
> >
> > At first glance, this requires dropping the assumption that for an
> > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > Is there really such an assumption in common vfs/mm code?  and what will
> > it take to drop it?
>
> There definitely is such assumption. Take for example page reclaim as one
> such place that will be non-trivial to deal with. You need to remove the
> page from page cache of all inodes that contain it without having any file
> context whatsoever. So you will need to create some way for this page->page
> caches mapping to happen. Jerome in his talk at LSF/MM last year [1] actually
> nicely summarized what it would take to get rid of page->mapping
> dereferences. He even had some preliminary patches. To sum it up, it's a
> lot of intrusive work but in principle it is possible.
>
> [1] https://lwn.net/Articles/752564/
>

It would be really nice if that work makes progress.
However, for the sake of discussion, for the narrow case of overlayfs page
sharing, if page->mapping is the overlay mapping, then it already has
references to the underlying inode/mapping, and overlayfs mapping ops
can do the right thing for reclaim and migrate.

So the fact that there is a lot of code referencing page->mapping (I know that)
doesn't really answer my question of how hard it is to drop from common code
the assumption that vmf->vma->vm_file->f_inode == page->mapping->host for
read protected uptodate pages.
Because if overlayfs (or any other arbitrator) makes sure that dirty pages
and non-uptodate pages abide by existing page->mapping semantics, then
block layer code (for example) can still safely dereference page->mapping.

In any case, I'd really love to see the first part of Jerome's work merged, with
mapping propagated to all common helpers, even if the fs-specific patches
and KSM patches will take longer to land.

Thanks,
Amir.

* Re: [LSF/MM TOPIC] Sharing file backed pages
  2019-01-23  8:48 ` Amir Goldstein
                   ` (2 preceding siblings ...)
  (?)
@ 2019-01-23 19:10 ` Matthew Wilcox
  -1 siblings, 0 replies; 13+ messages in thread
From: Matthew Wilcox @ 2019-01-23 19:10 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner, Jan Kara,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM

On Wed, Jan 23, 2019 at 10:48:58AM +0200, Amir Goldstein wrote:
> Hi,
> 
> In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> up the subject of sharing pages between cloned files and the general vibe
> in room was that it could be done.
> 
> In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> that Matthew Wilcox was "working on that problem".

My solution is to move the DAX hacks into the page cache proper.  For a
reflinked file, the filesystem would create a canonical address_space
to own the pages, and this is what ->mapping and ->index would refer to.

Instances of that reflinked file would each have their own address_space,
just as they have their own inode.  The i_pages array would contain only
PFN entries (until the COWs start).
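
One way to picture this layout is the user-space sketch below. It is an
interpretation for illustration only: the struct and function names are
made up, and a NULL slot in the instance stands in for an entry that
still refers to the canonical page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny "page" to keep the example readable */
#define NR_SLOTS 4

struct toy_page { char data[TOY_PAGE_SIZE]; };

/* Canonical address_space: the single owner of the shared pages. */
struct canonical_space { struct toy_page pages[NR_SLOTS]; };

/* Per-file instance: cowed[i] stays NULL while slot i still refers to
 * the canonical page and points into cow_store[] after a COW. */
struct instance_space {
    struct canonical_space *canon;
    struct toy_page *cowed[NR_SLOTS];
    struct toy_page cow_store[NR_SLOTS];
};

/* Reads resolve to the canonical page until the slot has been COWed. */
static const struct toy_page *instance_read(const struct instance_space *inst,
                                            size_t idx)
{
    assert(idx < NR_SLOTS);
    return inst->cowed[idx] ? inst->cowed[idx] : &inst->canon->pages[idx];
}

/* The first write to a slot copies the canonical page into the instance. */
static struct toy_page *instance_write(struct instance_space *inst, size_t idx)
{
    assert(idx < NR_SLOTS);
    if (inst->cowed[idx] == NULL) {
        memcpy(&inst->cow_store[idx], &inst->canon->pages[idx],
               sizeof(struct toy_page));
        inst->cowed[idx] = &inst->cow_store[idx];
    }
    return inst->cowed[idx];
}
```

Until a write happens, every instance resolves a given offset to the same
canonical page; after a write, only the writing instance sees its private
copy.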

I'm currently at LCA; please excuse me for not participating more fully
right now.

* Re: [LSF/MM TOPIC] Sharing file backed pages
  2019-01-23 14:54 ` Jan Kara
  2019-01-23 15:12   ` Jerome Glisse
  2019-01-23 17:57     ` Amir Goldstein
@ 2019-01-24 10:39   ` Kirill A. Shutemov
  2019-01-25  8:39       ` Amir Goldstein
  2 siblings, 1 reply; 13+ messages in thread
From: Kirill A. Shutemov @ 2019-01-24 10:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner,
	Matthew Wilcox, Chris Mason, Miklos Szeredi, linux-fsdevel,
	Linux MM, Jerome Glisse

On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > up the subject of sharing pages between cloned files and the general vibe
> > in room was that it could be done.
> > 
> > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > that Matthew Wilcox was "working on that problem".
> > 
> > I have started working on a new overlayfs address space implementation
> > that could also benefit from being able to share pages even for filesystems
> > that do not support clones (for copy up anticipation state).
> > 
> > To simplify the problem, we can start with sharing only uptodate clean
> > pages that map the same offset in respective files. While the same offset
> > requirement somewhat limits the use cases that benefit from shared file
> > pages, there is still a vast majority of use cases (i.e. clone full
> > image), where sharing pages of similar offset will bring a lot of
> > benefit.
> > 
> > At first glance, this requires dropping the assumption that for an
> > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > Is there really such an assumption in common vfs/mm code?  and what will
> > it take to drop it?
> 
> There definitely is such assumption. Take for example page reclaim as one
> such place that will be non-trivial to deal with. You need to remove the
> page from page cache of all inodes that contain it without having any file
> context whatsoever. So you will need to create some way for this page->page
> caches mapping to happen.

We have this solved for anon pages, where we need to find all VMAs the page
might be mapped into. I think we should look into adopting the anon_vma
approach [1] for files too. At first look the problem space looks very
similar.

[1] https://lwn.net/Articles/383162/

-- 
 Kirill A. Shutemov

* Re: [LSF/MM TOPIC] Sharing file backed pages
@ 2019-01-25  8:39       ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2019-01-25  8:39 UTC (permalink / raw)
  To: Kirill A. Shutemov, Jerome Glisse, Jan Kara
  Cc: lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner, Matthew Wilcox,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM

On Thu, Jan 24, 2019 at 12:39 PM Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> > On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > > up the subject of sharing pages between cloned files and the general vibe
> > > in room was that it could be done.
> > >
> > > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > > that Matthew Wilcox was "working on that problem".
> > >
> > > I have started working on a new overlayfs address space implementation
> > > that could also benefit from being able to share pages even for filesystems
> > > that do not support clones (for copy up anticipation state).
> > >
> > > To simplify the problem, we can start with sharing only uptodate clean
> > > pages that map the same offset in respective files. While the same offset
> > > requirement somewhat limits the use cases that benefit from shared file
> > > pages, there is still a vast majority of use cases (i.e. clone full
> > > image), where sharing pages of similar offset will bring a lot of
> > > benefit.
> > >
> > > At first glance, this requires dropping the assumption that for an
> > > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > > Is there really such an assumption in common vfs/mm code?  and what will
> > > it take to drop it?
> >
> > There definitely is such assumption. Take for example page reclaim as one
> > such place that will be non-trivial to deal with. You need to remove the
> > page from page cache of all inodes that contain it without having any file
> > context whatsoever. So you will need to create some way for this page->page
> > caches mapping to happen.
>
> We have it solved for anon pages where we need to find all VMA the page
> might be mapped to. I think we should look into adopting anon_vma
> approach[1] for files too. From the first look the problemspace looks very
> similar.
>

Yes, there are many similarities and we should definitely adopt existing
solutions for shared anon pages. There are also differences and we need
to make sure we cover them in the design.

For example, reclaiming a multiply shared page may prove to be more
expensive than reclaiming a non-shared page. Depending on how the page
ended up being shared (perhaps by KSM or by a special copy_file_range()
mode on an fs that doesn't support clone_file_range), the next time
the instances of the shared page are faulted in, they may not be shared
anymore and may consume more cache space.

I'd also like to discuss what control the filesystem gets over
unsharing a page.
Will the fs have a say before a page is COWed? In which order of VMAs?
I think most people currently view the shared pages concept as symmetric for
all VMAs that share the page, but for overlayfs, a "master-slave" or "stacked"
model might be a better fit, so that, for example, the "master" can make a
call to notify the "slave" about a page being dirty instead of breaking the
sharing.

Jerome,

Do you think we will have time to cover these issues in the joint session?
Perhaps we should tentatively plan a filesystem track session for
filesystem follow-up issues?

Some issues I can think of are:
- What control the filesystem gets over the new functionality (see above)
- Common code to help share pages, e.g. for generic vfs interfaces
  like clone/dedupe/copy_range
- Can/should blockdev pages (of the same block) be shared with file
  pages of the filesystem on that blockdev by common mpage_ helpers?
- A common use case is that filesystem images are cloned and loop mounted.
  How can we propagate, through the loop device, the knowledge that file
  data on loop mounted filesystems originates from the same underlying
  block? (*)

(*) The loop device is just a simple example; the same can apply to other
storage stacks where the block layer has dedupe.

Thanks,
Amir.

* Re: [LSF/MM TOPIC] Sharing file backed pages
@ 2019-01-25  8:39       ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2019-01-25  8:39 UTC (permalink / raw)
  To: Kirill A. Shutemov, Jerome Glisse, Jan Kara
  Cc: lsf-pc, Al Viro, Darrick J. Wong, Dave Chinner, Matthew Wilcox,
	Chris Mason, Miklos Szeredi, linux-fsdevel, Linux MM

On Thu, Jan 24, 2019 at 12:39 PM Kirill A. Shutemov
<kirill@shutemov.name> wrote:
>
> On Wed, Jan 23, 2019 at 03:54:34PM +0100, Jan Kara wrote:
> > On Wed 23-01-19 10:48:58, Amir Goldstein wrote:
> > > In his session about "reflink" in LSF/MM 2016 [1], Darrick Wong brought
> > > up the subject of sharing pages between cloned files and the general vibe
> > > in room was that it could be done.
> > >
> > > In his talk about XFS subvolumes and snapshots [2], Dave Chinner said
> > > that Matthew Willcox was "working on that problem".
> > >
> > > I have started working on a new overlayfs address space implementation
> > > that could also benefit from being able to share pages even for filesystems
> > > that do not support clones (for copy up anticipation state).
> > >
> > > To simplify the problem, we can start with sharing only uptodate clean
> > > pages that map the same offset in respective files. While the same offset
> > > requirement somewhat limits the use cases that benefit from shared file
> > > pages, there is still a vast majority of use cases (i.e. clone full
> > > image), where sharing pages of similar offset will bring a lot of
> > > benefit.
> > >
> > > At first glance, this requires dropping the assumption that, for an
> > > uptodate clean page, vmf->vma->vm_file->f_inode == page->mapping->host.
> > > Is there really such an assumption in common vfs/mm code?  and what will
> > > it take to drop it?
> >
> > There definitely is such an assumption. Take for example page reclaim as one
> > such place that will be non-trivial to deal with. You need to remove the
> > page from page cache of all inodes that contain it without having any file
> > context whatsoever. So you will need to create some way for this page->page
> > caches mapping to happen.
>
> We have it solved for anon pages where we need to find all VMAs the page
> might be mapped to. I think we should look into adopting anon_vma
> approach[1] for files too. From the first look the problem space looks very
> similar.
>

Yes, there are many similarities and we should definitely adopt existing
solutions for shared anon pages. There are also differences and we need
to make sure we cover them in the design.
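
To make the reclaim problem Jan describes concrete, here is a rough
user-space sketch of a page -> page caches reverse map, loosely in the
spirit of the anon_vma idea Kirill mentions. It is a toy model, not kernel
code: every name in it (toy_mapping, toy_owner, toy_page, toy_share_page,
toy_reclaim_page) is hypothetical, and it assumes the "same offset"
simplification, so a shared page has one index and a list of owning
mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 16	/* toy page cache size per inode */

struct toy_page;

/* Hypothetical stand-in for an address_space: one slot per file offset. */
struct toy_mapping {
	struct toy_page *slots[TOY_CACHE_SLOTS];
};

/* One owning mapping of a shared page (the reverse-map link). */
struct toy_owner {
	struct toy_mapping *mapping;
	struct toy_owner *next;
};

/* A clean, uptodate page shared at the same offset by several mappings. */
struct toy_page {
	size_t index;			/* common offset in all owners */
	struct toy_owner *owners;	/* every page cache holding the page */
	int nr_owners;
};

/*
 * Insert the page into a mapping and record the mapping as an owner.
 * Returns 0 on success, -1 if the slot is taken or allocation fails.
 */
int toy_share_page(struct toy_page *page, struct toy_mapping *mapping)
{
	struct toy_owner *owner;

	if (page->index >= TOY_CACHE_SLOTS || mapping->slots[page->index])
		return -1;

	owner = malloc(sizeof(*owner));
	if (!owner)
		return -1;

	owner->mapping = mapping;
	owner->next = page->owners;
	page->owners = owner;
	page->nr_owners++;
	mapping->slots[page->index] = page;
	return 0;
}

/*
 * "Reclaim" the page: starting from the page alone, with no file context,
 * remove it from every page cache that holds it.
 * Returns the number of mappings the page was removed from.
 */
int toy_reclaim_page(struct toy_page *page)
{
	int removed = 0;

	while (page->owners) {
		struct toy_owner *owner = page->owners;

		assert(owner->mapping->slots[page->index] == page);
		owner->mapping->slots[page->index] = NULL;
		page->owners = owner->next;
		free(owner);
		removed++;
	}
	page->nr_owners = 0;
	return removed;
}
```

The sketch ignores locking, refcounting and dirty state entirely; it only
shows that reclaim cost grows with the number of owners, which is one of
the differences from the non-shared case.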

For example, reclaiming a multiply shared page may prove to be more
expensive than reclaiming a non shared page. Depending on how the page
has ended up being shared (perhaps by KSM or by a special copy_file_range()
mode on an fs that doesn't support clone_file_range), the next time the
instances of the shared page are faulted in, they may not be shared anymore
and may consume more cache space.

I'd also like to discuss what control the filesystem gets over unsharing
a page. Will the fs have a say before a page is COWed? In which order of
VMAs? I think most people currently view the shared pages concept as
symmetric for all VMAs that share the page, but for overlayfs, a
"master-slave" or "stacked" model might be a better fit, so that, for
example, "master" can make a call to notify the "slave" about the page
being dirty instead of breaking the sharing.

Jerome,

Do you think we will have time to cover these issues in the joint session?
Perhaps we should tentatively plan for a filesystem track session for
filesystem followup issues?

Some issues I can think of are:
- What control the filesystem gets over the new functionality (see above)
- Common code to help share pages, e.g. for generic vfs interfaces
  like clone/dedupe/copy_range
- Can/should blockdev pages (of same block) be shared with file
  pages of the filesystem on that blockdev by common mpage_ helpers?
- A common use case is that filesystem images are cloned and loop mounted.
  How can we propagate, through the loop device, the knowledge that file
  data on loop mounted filesystems originates from the same underlying
  block? (*)

(*) The loop device is just a simple example, but the same can apply to
other storage stacks where the block layer has dedupe.

Thanks,
Amir.


end of thread, other threads:[~2019-01-25  8:39 UTC | newest]

Thread overview: 13+ messages
2019-01-23  8:48 [LSF/MM TOPIC] Sharing file backed pages Amir Goldstein
2019-01-23  8:48 ` Amir Goldstein
2019-01-23 14:54 ` Jan Kara
2019-01-23 15:12   ` Jerome Glisse
2019-01-23 15:26     ` Jerome Glisse
2019-01-23 17:57   ` Amir Goldstein
2019-01-23 17:57     ` Amir Goldstein
2019-01-24 10:39   ` Kirill A. Shutemov
2019-01-25  8:39     ` Amir Goldstein
2019-01-25  8:39       ` Amir Goldstein
2019-01-23 17:06 ` James Bottomley
2019-01-23 17:06   ` James Bottomley
2019-01-23 19:10 ` Matthew Wilcox
