linux-kernel.vger.kernel.org archive mirror
* XFS/mmap reflink file question
@ 2021-08-17 12:19 Laurent Stacul
  2021-08-17 22:12 ` Vito Caputo
  0 siblings, 1 reply; 4+ messages in thread
From: Laurent Stacul @ 2021-08-17 12:19 UTC (permalink / raw)
  To: linux-kernel

Hello,

I have spent a lot of time digging into the mmap mechanism, and I still don't
have a clear view on whether mmap'ing a file and a reflink of that file results
in the data being mapped twice in memory (this only applies when the filesystem
supports the reflink feature, like XFS).

For my tests, I generate a file stored on an XFS partition and create a
reflink of it:

     % dd if=/dev/zero of=./output.dat bs=1M count=24
     % cp --reflink -v output.dat output2.dat
     % xfs_bmap -v output.dat
     output.dat:
      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL FLAGS
        0: [0..49151]:      3756776..3805927  0 (3756776..3805927) 49152 100000
     % xfs_bmap -v output2.dat
     output2.dat:
      EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL FLAGS
        0: [0..49151]:      3756776..3805927  0 (3756776..3805927) 49152 100000
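
The identical BLOCK-RANGE confirms that both files now share the same extents
on disk. Note that the filesystem must have been created with reflink support
(the default with recent xfsprogs); as a quick sanity check, assuming the
partition is mounted on /mnt (the path is illustrative):

     % xfs_info /mnt | grep -o 'reflink=[01]'
     reflink=1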

Then I mmap the first file twice using the vmtouch tool:

     % vmtouch -l output.dat&
     [1] 15870
     LOCKED 6144 pages (24M)
     % vmtouch -l output.dat&
     [2] 15872
     LOCKED 6144 pages (24M)
     % pmap -X 15872 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
          Address Perm Device      Inode        Rss        Pss
     7fcbb9eb9000 r--s  fc:10    3755268      24576      12288

As we can see, the Proportional Set Size is, as expected, half of the Resident
Set Size because the memory is shared between the two processes. Now I mmap the
reflink `output2.dat' of `output.dat':

     % vmtouch -l output2.dat&
     [3] 15892
     LOCKED 6144 pages (24M)
     % pmap -X 15872 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
          Address Perm Device      Inode        Rss        Pss
     7fcbb9eb9000 r--s  fc:10    3755268      24576      12288

The Pss of the file mmap'ed by the first process has not decreased (I expected
a value of Rss / 3 because I hoped the memory would be shared between the 3
processes). Looking at the process map of the last process, we can see that a
new memory area was allocated and locked:

     % pmap -X 15892 | grep -e 'Pss' -e 'output' | awk '{if(NR>1)printf("%16s %4s %6s %10s %10s %10s\n", $1, $2, $4, $5, $7, $8)}'
           Address Perm Device      Inode        Rss        Pss
      7f5adc53f000 r--s  fc:10    3755269      24576      24576
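
For reference, here is roughly what `vmtouch -l' does under the hood, as far
as I understand it (a minimal sketch, not the actual vmtouch code, with all
error handling omitted):

     #include <fcntl.h>
     #include <sys/mman.h>
     #include <sys/stat.h>
     #include <unistd.h>

     int main(int argc, char **argv)
     {
             struct stat st;
             int fd = open(argv[1], O_RDONLY);

             fstat(fd, &st);
             /* Shared read-only file mapping: the pages come from the
              * page cache attached to this file's inode. */
             void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
             /* Fault every page in and pin it, as vmtouch -l does. */
             mlock(p, st.st_size);
             pause();        /* keep the mapping (and the lock) alive */
             return 0;
     }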

So my questions:
- Why can't we benefit from memory sharing when reflinked files are mmap'ed?
   It would be great because, in the context of containers, one application
   would be the possibility of sharing some read-only areas between containers
   built from layer diffs that are reproducible between images. We can imagine
   a layer that brings some shared libraries into an image from a reproducible
   FS diff, so that containers would not load the same library several times.
- I can think of many tricky cases with the behavior I was expecting
   (especially if a process has write access to the mapped area), but if you
   know a way, an option, or anything else to achieve what I am trying to do,
   I would be glad to hear it.
- Conversely, don't hesitate to tell me my expectation is just crazy.

Anyway, I always look forward to hearing valuable specialist insights.
Thanks in advance,

stac

PS: Please add me in CC if this message deserves an answer.



* Re: XFS/mmap reflink file question
  2021-08-17 12:19 XFS/mmap reflink file question Laurent Stacul
@ 2021-08-17 22:12 ` Vito Caputo
  2021-08-19  8:59   ` Laurent Stacul
  2021-08-19 10:04   ` Laurent Stacul
  0 siblings, 2 replies; 4+ messages in thread
From: Vito Caputo @ 2021-08-17 22:12 UTC (permalink / raw)
  To: Laurent Stacul; +Cc: linux-kernel

On Tue, Aug 17, 2021 at 02:19:12PM +0200, Laurent Stacul wrote:
> [... original question trimmed, see the full message above ...]
>

This is one of the major features overlayfs brings to the table over
reflink's current implementation.

With reflink copies you get distinct inodes and the data sharing
occurs further down in the fs at the extent level, below the struct
address_space instances.
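
The consequence is easy to demonstrate from userspace. A minimal sketch
(assuming the output.dat/output2.dat pair from the test above, error handling
omitted): fault in the original, then ask mincore(2) how many pages of each
file are resident. Despite the shared extents, the reflink copy starts cold:

     #include <fcntl.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <sys/mman.h>
     #include <sys/stat.h>
     #include <unistd.h>

     /* Count how many pages of `path' are in that file's page cache. */
     static long resident(const char *path)
     {
             int fd = open(path, O_RDONLY);
             long pg = sysconf(_SC_PAGESIZE), n = 0;
             struct stat st;

             fstat(fd, &st);
             long pages = (st.st_size + pg - 1) / pg;
             void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
             unsigned char *vec = malloc(pages);

             mincore(p, st.st_size, vec);    /* per-page residency map */
             for (long i = 0; i < pages; i++)
                     n += vec[i] & 1;
             munmap(p, st.st_size);
             free(vec);
             close(fd);
             return n;
     }

     int main(void)
     {
             system("cat output.dat > /dev/null");   /* fault original in */
             printf("output.dat : %ld resident pages\n", resident("output.dat"));
             printf("output2.dat: %ld resident pages\n", resident("output2.dat"));
             return 0;
     }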

If memory serves, Dave Chinner has given the issue some thought, but I
haven't noticed or heard anything in terms of progress there.  Maybe
he'll see this and chime in...

Regards,
Vito Caputo


* Re: XFS/mmap reflink file question
  2021-08-17 22:12 ` Vito Caputo
@ 2021-08-19  8:59   ` Laurent Stacul
  2021-08-19 10:04   ` Laurent Stacul
  1 sibling, 0 replies; 4+ messages in thread
From: Laurent Stacul @ 2021-08-19  8:59 UTC (permalink / raw)
  To: Vito Caputo, linux-kernel

On Tue, Aug 17, 2021 at 03:12:58PM -0700, Vito Caputo wrote:
>On Tue, Aug 17, 2021 at 02:19:12PM +0200, Laurent Stacul wrote:
>> [... original question trimmed, see the full message above ...]
>
>This is one of the major features overlayfs brings to the table over
>reflink's current implementation.
>
>With reflink copies you get distinct inodes and the data sharing
>occurs further down in the fs at the extent level, below the struct
>address_space instances.
>
>If memory serves, Dave Chinner has given the issue some thought, but I
>haven't noticed or heard anything in terms of progress there.  Maybe
>he'll see this and chime in...
>
>Regards,
>Vito Caputo

Thanks for your answer. If I understand correctly, the reflink feature cannot
be used in the scenario I propose because reflinks are an optimization
occurring below the VFS. This makes sense to me, and I was not really
confident this had a chance to work.

As you suggest, I will turn my efforts to overlayfs.
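
For the record, a minimal overlayfs setup looks something like this (the
paths are illustrative):

     % mkdir lower upper work merged
     % sudo mount -t overlay overlay \
           -o lowerdir=./lower,upperdir=./upper,workdir=./work ./merged

Since every mount sharing the same lowerdir reads unmodified files through
the single lower inode, all containers built on that layer should share one
copy of a given library in the page cache.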

Regards,
stac


* Re: XFS/mmap reflink file question
  2021-08-17 22:12 ` Vito Caputo
  2021-08-19  8:59   ` Laurent Stacul
@ 2021-08-19 10:04   ` Laurent Stacul
  1 sibling, 0 replies; 4+ messages in thread
From: Laurent Stacul @ 2021-08-19 10:04 UTC (permalink / raw)
  To: Vito Caputo; +Cc: Laurent Stacul, linux-kernel

On Tue, Aug 17, 2021 at 03:12:58PM -0700, Vito Caputo wrote:
>On Tue, Aug 17, 2021 at 02:19:12PM +0200, Laurent Stacul wrote:
>> [... original question trimmed, see the full message above ...]
>
>This is one of the major features overlayfs brings to the table over
>reflink's current implementation.
>
>With reflink copies you get distinct inodes and the data sharing
>occurs further down in the fs at the extent level, below the struct
>address_space instances.
>
>If memory serves, Dave Chinner has given the issue some thought, but I
>haven't noticed or heard anything in terms of progress there.  Maybe
>he'll see this and chime in...
>
>Regards,
>Vito Caputo

Thanks for your answer. If I understand correctly, the reflink feature cannot
be used in the scenario I propose because reflinks are an optimization
occurring below the VFS. This makes sense to me, and I was not really
confident this had a chance to work.

As you suggest, I will turn my efforts to overlayfs.

Regards,
stac



end of thread

Thread overview: 4+ messages
2021-08-17 12:19 XFS/mmap reflink file question Laurent Stacul
2021-08-17 22:12 ` Vito Caputo
2021-08-19  8:59   ` Laurent Stacul
2021-08-19 10:04   ` Laurent Stacul
