linux-kernel.vger.kernel.org archive mirror
* sharing page cache pages between multiple mappings
@ 2016-05-19  8:20 Miklos Szeredi
  2016-05-19  9:05 ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Miklos Szeredi @ 2016-05-19  8:20 UTC
  To: linux-mm, linux-fsdevel, linux-kernel, linux-btrfs

Has anyone thought about sharing pages between multiple files?

The obvious application is for COW filesystems where there are
logically distinct files that physically share data and could easily
share the cache as well if there was infrastructure for it.

Thanks,
Miklos


* Re: sharing page cache pages between multiple mappings
  2016-05-19  8:20 sharing page cache pages between multiple mappings Miklos Szeredi
@ 2016-05-19  9:05 ` Michal Hocko
  2016-05-19 10:17   ` Miklos Szeredi
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2016-05-19  9:05 UTC
  To: Miklos Szeredi; +Cc: linux-mm, linux-fsdevel, linux-kernel, linux-btrfs

On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
> Has anyone thought about sharing pages between multiple files?
> 
> The obvious application is for COW filesystems where there are
> logically distinct files that physically share data and could easily
> share the cache as well if there was infrastructure for it.

FYI this has been discussed at LSFMM this year[1]. I wasn't at the
session so cannot tell you any details but the LWN article covers it at
least briefly.

[1] https://lwn.net/Articles/684826/
-- 
Michal Hocko
SUSE Labs


* Re: sharing page cache pages between multiple mappings
  2016-05-19  9:05 ` Michal Hocko
@ 2016-05-19 10:17   ` Miklos Szeredi
  2016-05-19 10:53     ` Michal Hocko
  2016-05-19 23:48     ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Miklos Szeredi @ 2016-05-19 10:17 UTC
  To: Michal Hocko
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-btrfs, Darrick J. Wong

On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
>> Has anyone thought about sharing pages between multiple files?
>>
>> The obvious application is for COW filesystems where there are
>> logically distinct files that physically share data and could easily
>> share the cache as well if there was infrastructure for it.
>
> FYI this has been discussed at LSFMM this year[1]. I wasn't at the
> session so cannot tell you any details but the LWN article covers it at
> least briefly.

Cool, so it's not such a crazy idea.

Darrick, would you mind briefly sharing your ideas regarding this?

The use case I have is fixing a weird overlayfs behavior. The following
may result in "buf" not matching "data":

    int fr = open("foo", O_RDONLY);
    int fw = open("foo", O_RDWR);
    write(fw, data, sizeof(data));
    read(fr, buf, sizeof(data));

The reason is that "foo" is on a read-only layer, and opening it for
read-write triggers copy-up into a read-write layer.  However the old,
read-only open still refers to the unmodified file.
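A self-contained version of the check above, as a sketch: the helper name reads_see_write() is hypothetical, and it assumes "path" names an existing regular file. It opens the file read-only first and read-write second, in the same order as the snippet, then reports whether the read-only descriptor sees the new data. On an ordinary filesystem it should return 1; on an overlayfs mount where the file exists only in the lower layer, the copy-up behavior described above could make it return 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper reproducing the sequence from the mail:
 * returns 1 if the read-only fd sees the data written through the
 * read-write fd, 0 if it sees something else (stale data), -1 on error.
 */
int reads_see_write(const char *path, const void *data, size_t len)
{
    char buf[4096];
    int fr, fw;
    ssize_t n;
    int result = -1;

    assert(path != NULL && data != NULL);
    if (len > sizeof(buf))
        return -1;

    fr = open(path, O_RDONLY);          /* read-only open first */
    if (fr < 0)
        return -1;
    fw = open(path, O_RDWR);            /* on overlayfs this may trigger copy-up */
    if (fw < 0) {
        close(fr);
        return -1;
    }

    /* Both descriptors start at offset 0, so the read covers the written range. */
    if (write(fw, data, len) == (ssize_t)len) {
        n = read(fr, buf, len);
        if (n >= 0)
            result = (n == (ssize_t)len && memcmp(buf, data, len) == 0);
    }

    close(fw);
    close(fr);
    return result;
}
```

Calling reads_see_write("foo", "new", 3) from inside the overlay mount exercises the same open/open/write/read sequence as the snippet.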

Fixing this properly requires that when opening a file, we don't
delegate operations fully to the underlying file, but rather allow
sharing of pages from the underlying file until the file is copied up.  At
that point we switch to sharing pages with the read-write copy.
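The proposed fix can be pictured with a toy user-space model (all names are hypothetical; this is not overlayfs code and it ignores the page cache entirely): an open file refers to the overlay inode, and each access resolves the inode's current backing data, lower before copy-up and upper afterwards, instead of binding to one real file at open time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_SIZE 16

/* Hypothetical model of one overlay inode with a lower and an upper copy. */
struct model_inode {
    char lower[MODEL_SIZE];   /* read-only layer contents */
    char upper[MODEL_SIZE];   /* read-write layer contents, valid after copy-up */
    bool copied_up;
};

/* An open file points at the overlay inode, not at a real file. */
struct model_file {
    struct model_inode *inode;
};

/* Resolve the backing data at access time rather than at open time. */
char *model_backing(struct model_inode *ino)
{
    return ino->copied_up ? ino->upper : ino->lower;
}

void model_copy_up(struct model_inode *ino)
{
    if (!ino->copied_up) {
        memcpy(ino->upper, ino->lower, MODEL_SIZE);
        ino->copied_up = true;
    }
}

struct model_file model_open(struct model_inode *ino, bool for_write)
{
    if (for_write)
        model_copy_up(ino);   /* a read-write open triggers copy-up */
    return (struct model_file){ .inode = ino };
}

void model_write(struct model_file *f, size_t off, const char *src, size_t len)
{
    assert(f->inode->copied_up && off + len <= MODEL_SIZE);
    memcpy(model_backing(f->inode) + off, src, len);
}

void model_read(struct model_file *f, size_t off, char *dst, size_t len)
{
    assert(off + len <= MODEL_SIZE);
    memcpy(dst, model_backing(f->inode) + off, len);
}
```

Had model_open() captured a pointer to the lower data directly, a handle opened read-only before the copy-up would keep returning the old contents, which is the behavior described above.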

Another use case is direct access in fuse:  people often want I/O
operations on a fuse file to go directly to an underlying file.  Doing
this properly requires sharing pages between the real, underlying file
and the fuse file.

Thanks,
Miklos


* Re: sharing page cache pages between multiple mappings
  2016-05-19 10:17   ` Miklos Szeredi
@ 2016-05-19 10:53     ` Michal Hocko
  2016-05-19 23:48     ` Dave Chinner
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2016-05-19 10:53 UTC
  To: Miklos Szeredi
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-btrfs, Darrick J. Wong

On Thu 19-05-16 12:17:14, Miklos Szeredi wrote:
> On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
> >> Has anyone thought about sharing pages between multiple files?
> >>
> >> The obvious application is for COW filesystems where there are
> >> logically distinct files that physically share data and could easily
> >> share the cache as well if there was infrastructure for it.
> >
> > FYI this has been discussed at LSFMM this year[1]. I wasn't at the
> > session so cannot tell you any details but the LWN article covers it at
> > least briefly.
> 
> Cool, so it's not such a crazy idea.

FWIW it is ;)
-- 
Michal Hocko
SUSE Labs


* Re: sharing page cache pages between multiple mappings
  2016-05-19 10:17   ` Miklos Szeredi
  2016-05-19 10:53     ` Michal Hocko
@ 2016-05-19 23:48     ` Dave Chinner
  2016-05-20 10:37       ` Miklos Szeredi
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2016-05-19 23:48 UTC
  To: Miklos Szeredi
  Cc: Michal Hocko, linux-mm, linux-fsdevel, linux-kernel, linux-btrfs,
	Darrick J. Wong

On Thu, May 19, 2016 at 12:17:14PM +0200, Miklos Szeredi wrote:
> On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
> >> Has anyone thought about sharing pages between multiple files?
> >>
> >> The obvious application is for COW filesystems where there are
> >> logically distinct files that physically share data and could easily
> >> share the cache as well if there was infrastructure for it.
> >
> > FYI this has been discussed at LSFMM this year[1]. I wasn't at the
> > session so cannot tell you any details but the LWN article covers it at
> > least briefly.
> 
> Cool, so it's not such a crazy idea.

Oh, it most certainly is crazy. :P

> Darrick, would you mind briefly sharing your ideas regarding this?

The current line of thought is that we'll only attempt this in XFS on
inodes that are known to share underlying physical extents. i.e.
files that have blocks that have been reflinked or deduped.  That
way we can overload the breaking of reflink blocks (via copy on
write) with unsharing the pages in the page cache for that inode.
i.e. shared pages can propagate upwards in overlay if it uses
reflink for copy-up and writes will then break the sharing with the
underlying source without overlay having to do anything special.
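A toy model of that mechanism, assuming page-granular sharing (hypothetical names; not XFS code): after a "reflink", two files' page slots point at the same page, and a write to a page whose reference count is above one first makes a private copy. Unsharing thus piggybacks on copy-on-write, and a file can end up holding a mix of shared and private pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 8
#define NPAGES 4

struct toy_page {
    char data[PAGE_BYTES];
    int refcount;              /* number of file slots pointing at this page */
};

struct toy_file {
    struct toy_page *pages[NPAGES];
};

struct toy_page *toy_page_new(const char *init)
{
    struct toy_page *p = calloc(1, sizeof(*p));
    if (!p)
        abort();
    strncpy(p->data, init, PAGE_BYTES);
    p->refcount = 1;
    return p;
}

/* "Reflink": dst shares every page of src instead of copying data. */
void toy_reflink(struct toy_file *dst, const struct toy_file *src)
{
    for (int i = 0; i < NPAGES; i++) {
        dst->pages[i] = src->pages[i];
        dst->pages[i]->refcount++;
    }
}

/* Write to one page; break sharing first if the page is shared (COW). */
void toy_write(struct toy_file *f, int idx, const char *src, size_t len)
{
    assert(idx >= 0 && idx < NPAGES && len <= PAGE_BYTES);
    struct toy_page *p = f->pages[idx];
    if (p->refcount > 1) {
        struct toy_page *copy = toy_page_new("");
        memcpy(copy->data, p->data, PAGE_BYTES);
        p->refcount--;             /* the other file keeps the original */
        f->pages[idx] = copy;
        p = copy;
    }
    memcpy(p->data, src, len);
}

/* Drop this file's references, freeing pages nobody else uses. */
void toy_release(struct toy_file *f)
{
    for (int i = 0; i < NPAGES; i++) {
        if (--f->pages[i]->refcount == 0)
            free(f->pages[i]);
        f->pages[i] = NULL;
    }
}
```

After toy_reflink(&b, &a) and a single toy_write(&b, 1, ...), page 1 is private to each file while the other pages stay shared.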

Right now I'm not sure what mechanism we will use - we want to
support files that have a mix of private and shared pages, so that
implies we are not going to be sharing mappings but sharing pages
instead.  However, we've been looking at this as being completely
encapsulated within the filesystem because it's tightly linked to
changes in the physical layout of the filesystem, not as general
"share this mapping between two unrelated inodes" infrastructure.
That may change as we dig deeper into it...

> The use case I have is fixing a weird overlayfs behavior. The following
> may result in "buf" not matching "data":
> 
>     int fr = open("foo", O_RDONLY);
>     int fw = open("foo", O_RDWR);
>     write(fw, data, sizeof(data));
>     read(fr, buf, sizeof(data));
> 
> The reason is that "foo" is on a read-only layer, and opening it for
> read-write triggers copy-up into a read-write layer.  However the old,
> read-only open still refers to the unmodified file.
>
> Fixing this properly requires that when opening a file, we don't
> delegate operations fully to the underlying file, but rather allow
> sharing of pages from the underlying file until the file is copied up.  At
> that point we switch to sharing pages with the read-write copy.

Unless I'm missing something here (quite possible!), I'm not sure
we can fix that problem with page cache sharing or reflink. It
implies we are sharing pages in a downwards direction - private
overlay pages/mappings from multiple inodes would need to be shared
with a single underlying shared read-only inode, and I lack the
imagination to see how that works...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: sharing page cache pages between multiple mappings
  2016-05-19 23:48     ` Dave Chinner
@ 2016-05-20 10:37       ` Miklos Szeredi
  0 siblings, 0 replies; 6+ messages in thread
From: Miklos Szeredi @ 2016-05-20 10:37 UTC
  To: Dave Chinner
  Cc: Michal Hocko, linux-mm, linux-fsdevel, linux-kernel, linux-btrfs,
	Darrick J. Wong

On Fri, May 20, 2016 at 1:48 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, May 19, 2016 at 12:17:14PM +0200, Miklos Szeredi wrote:
>> On Thu, May 19, 2016 at 11:05 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Thu 19-05-16 10:20:13, Miklos Szeredi wrote:
>> >> Has anyone thought about sharing pages between multiple files?
>> >>
>> >> The obvious application is for COW filesytems where there are
>> >> logically distinct files that physically share data and could easily
>> >> share the cache as well if there was infrastructure for it.
>> >
>> > FYI this has been discussed at LSFMM this year[1]. I wasn't at the
>> > session so cannot tell you any details but the LWN article covers it at
>> > least briefly.
>>
>> Cool, so it's not such a crazy idea.
>
> Oh, it most certainly is crazy. :P
>
>> Darrick, would you mind briefly sharing your ideas regarding this?
>
> The current line of thought is that we'll only attempt this in XFS on
> inodes that are known to share underlying physical extents. i.e.
> files that have blocks that have been reflinked or deduped.  That
> way we can overload the breaking of reflink blocks (via copy on
> write) with unsharing the pages in the page cache for that inode.
> i.e. shared pages can propagate upwards in overlay if it uses
> reflink for copy-up and writes will then break the sharing with the
> underlying source without overlay having to do anything special.
>
> Right now I'm not sure what mechanism we will use - we want to
> support files that have a mix of private and shared pages, so that
> implies we are not going to be sharing mappings but sharing pages
> instead.  However, we've been looking at this as being completely
> encapsulated within the filesystem because it's tightly linked to
> changes in the physical layout of the filesystem, not as general
> "share this mapping between two unrelated inodes" infrastructure.
> That may change as we dig deeper into it...
>
>> The use case I have is fixing a weird overlayfs behavior. The following
>> may result in "buf" not matching "data":
>>
>>     int fr = open("foo", O_RDONLY);
>>     int fw = open("foo", O_RDWR);
>>     write(fw, data, sizeof(data));
>>     read(fr, buf, sizeof(data));
>>
>> The reason is that "foo" is on a read-only layer, and opening it for
>> read-write triggers copy-up into a read-write layer.  However the old,
>> read-only open still refers to the unmodified file.
>>
>> Fixing this properly requires that when opening a file, we don't
>> delegate operations fully to the underlying file, but rather allow
>> sharing of pages from the underlying file until the file is copied up.  At
>> that point we switch to sharing pages with the read-write copy.
>
> Unless I'm missing something here (quite possible!), I'm not sure
> we can fix that problem with page cache sharing or reflink. It
> implies we are sharing pages in a downwards direction - private
> overlay pages/mappings from multiple inodes would need to be shared
> with a single underlying shared read-only inode, and I lack the
> imagination to see how that works...

Indeed, reflink doesn't make this work.

We could reflink-up on any open (or on lookup), not just on write;
it's a trivial change in overlayfs.  The drawback is a slower first
open/lookup and the space used by duplicate trees even when nothing is
modified in the overlay.  Not sure if that's a problem in
practice.
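As a user-space analogue of reflink-up (not overlayfs code; clone_or_copy() is a hypothetical helper), the Linux FICLONE ioctl asks the filesystem to make the destination share the source's extents. It succeeds only on reflink-capable filesystems such as Btrfs, or XFS with reflink enabled; this sketch falls back to a plain data copy on any FICLONE failure, and only the clone path gives the cheap, space-sharing copy-up discussed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>      /* FICLONE (Linux 4.5+ headers) */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read/write copy of the whole file; used when cloning is unsupported. */
static int copy_data(int in, int out)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(in, buf, sizeof(buf))) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: create "dst" as a copy of "src", sharing extents via
 * FICLONE when the filesystem supports it.
 * Returns 1 if cloned, 0 if copied the slow way, -1 on error.
 */
int clone_or_copy(const char *src, const char *dst)
{
    int ret = -1;

    assert(src != NULL && dst != NULL);
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

#ifdef FICLONE
    if (ioctl(out, FICLONE, in) == 0)
        ret = 1;    /* extents are now shared; no data was copied */
#endif
    if (ret != 1)   /* e.g. EOPNOTSUPP or EXDEV: fall back to copying */
        ret = (copy_data(in, out) == 0) ? 0 : -1;

    close(out);
    close(in);
    return ret;
}
```

Either way the destination ends up with the same contents; the return value only tells the caller whether the space-sharing path was taken.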

I'll think about the generic downwards sharing.  For overlayfs it
doesn't need to be per-page, so that might make it a somewhat simpler
problem.

Thanks,
Miklos
