* [LSF/MM/BPF TOPIC] Generic page write protection
@ 2020-01-22  2:32 jglisse
  2020-01-22  4:28 ` Gao Xiang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: jglisse @ 2020-01-22  2:32 UTC (permalink / raw)
  To: lsf-pc
  Cc: Jérôme Glisse, Andrea Arcangeli, linux-fsdevel,
	linux-block, linux-mm

From: Jérôme Glisse <jglisse@redhat.com>


Provide a generic way to write protect pages (à la KSM) to enable new mm
optimizations:
    - KSM (kernel samepage merging) to deduplicate pages (for
      file-backed pages too, not only anonymous memory as today)
    - NUMA page duplication (read-only duplication) across multiple
      physical pages. For instance, shared library code could have a
      copy on each NUMA node. Or, in cases like GPU/FPGA, memory could
      be duplicated read-only inside the local device memory.
    ...

Note that this write protection is intended to be breakable at any time,
within a reasonable delay (like KSM today), so that we never block
anything that needs to write to the page for longer than necessary.


The goal is to provide a mechanism that works for both anonymous and
file-backed memory. For this we need a pointer inside struct page.
For anonymous memory KSM uses the anon_vma field, which corresponds
to the mapping field for file-backed pages.

So to allow generic write protection for file-backed pages we need to
avoid relying on the struct page mapping field in the various kernel
code paths that use it today.

The page->mapping field is used in 5 different ways:
 [1]- Functions operating on a file: we can get the mapping from the
      file (the issue here is that we might need to pass the file down
      the call stack).

 [2]- Core/arch mm functions: those do not care about the file (if they
      do, then they are vma related and we can get the mapping from the
      vma). Those functions only want to be able to walk all the ptes
      pointing to the page (for instance memory compaction, memory
      reclaim, ...). We can provide the exact same functionality for
      write protected pages (like KSM does today).

 [3]- Block layer when I/O fails: this depends on the fs. For instance,
      for a fs which uses buffer_head we can update buffer_head to store
      the mapping instead of the block_device, as we can get the
      block_device from the mapping but not the mapping from the
      block_device.

      So solving this is mostly filesystem specific, but I have not seen
      any fs that could not be updated properly so that the block layer
      can report I/O failures without relying on page->mapping.

 [4]- Debugging (mostly procfs/sysfs files to dump memory states): those
      do not need the mapping per se, we just need to report page states
      (and thus write protection information if the page is write
      protected).

 [5]- GUP (get user pages): if something calls GUP in write mode then we
      need to break write protection (like KSM today). A GUPed page
      should not be write protected, as we do not know what the GUPer is
      doing with the page.
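
To make case [5] concrete, here is a minimal userspace sketch of breaking
write protection when GUP is called in write mode. Everything in it is
an assumption made for illustration: struct page_sketch, gup_sketch() and
break_write_protection() are hypothetical names, the pte is modelled as a
plain pointer slot, and the break is modelled as a KSM-style private copy.
Whether read-mode GUP must also break protection is not settled by the
text above, so the sketch only handles the write case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 64   /* tiny mock page, not the real PAGE_SIZE */

/* Hypothetical stand-in for struct page plus its contents. */
struct page_sketch {
    bool write_protected;     /* is this a shared read-only copy? */
    unsigned char data[SKETCH_PAGE_SIZE];
};

/* Break protection KSM-style: give the writer a private, writable copy. */
static struct page_sketch *break_write_protection(const struct page_sketch *shared)
{
    struct page_sketch *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    memcpy(copy->data, shared->data, sizeof(copy->data));
    copy->write_protected = false;
    return copy;
}

/*
 * Mock GUP: *slot models the pte. A write-mode caller never gets a
 * write protected page back; a read-mode caller keeps the shared page.
 */
static struct page_sketch *gup_sketch(struct page_sketch **slot, bool write)
{
    assert(slot && *slot);

    if (write && (*slot)->write_protected) {
        struct page_sketch *copy = break_write_protection(*slot);

        if (!copy)
            return NULL;
        *slot = copy;         /* the "pte" now points at the private copy */
    }
    return *slot;
}
```

The shared page itself stays write protected for its other users; only
the slot of the write-mode caller is redirected.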


Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly
trivial).

For [1] we only need to pass down the mapping to all fs and vfs callback
functions (this is mostly achieved with coccinelle). Roughly speaking,
the patches are generated with the following pseudo code:

add_mapping_parameter(func)
{
    function_add_parameter(func, mapping);

    for_each_function_calling (caller, func) {
        calling_add_parameter(caller, func, mapping);

        if (function_parameters_contains(caller, mapping|file))
            continue;

        add_mapping_parameter(caller);
    }
}

passdown_mapping()
{
    for_each_function_in_fs (func, fs_functions) {
        if (!function_body_contains(func, page->mapping))
            continue;

        if (function_parameters_contains(func, mapping|file))
            continue;

        add_mapping_parameter(func);
    }
}
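
As an illustration of what the pseudo code above produces, here is a
minimal userspace sketch with mock types. The functions leaf_before(),
leaf_after() and caller_after() and the mock structs are hypothetical
stand-ins, not code from the patchset. Only the shape of the change is
taken from the pseudo code: the leaf gains a mapping parameter, and the
recursion stops at the first caller that already has a file or mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types (hypothetical, userspace only). */
struct address_space { int id; };
struct page { struct address_space *mapping; };
struct file { struct address_space *f_mapping; };

/* Before: the leaf function dereferences page->mapping directly. */
static int leaf_before(struct page *page)
{
    return page->mapping->id;
}

/*
 * After: add_mapping_parameter() gave the leaf a mapping argument.
 * This sketch shows only the signature change; in the proposal the
 * parameter is consulted only for write protected pages.
 */
static int leaf_after(struct page *page, struct address_space *mapping)
{
    (void)page;   /* the page is no longer the source of the mapping */
    assert(mapping != NULL);
    return mapping->id;
}

/*
 * The caller already has a file among its parameters, so the recursion
 * stops here: it passes file->f_mapping down and keeps its signature.
 */
static int caller_after(struct file *file, struct page *page)
{
    return leaf_after(page, file->f_mapping);
}
```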

For [2] KSM is generalized and extended so that both anonymous and
file-backed pages can be handled by a common write protected page case.

For [3] it depends on the filesystem (fs which use buffer_head are
easily handled by storing the mapping in the buffer_head struct).
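
The buffer_head idea for [3] can be sketched in userspace as follows. The
structs are simplified mocks and the b_mapping field is a hypothetical
name; host, i_sb and s_bdev mirror the kernel field names. The sketch
assumes a regular file mapping on a filesystem backed by a single block
device, which is the only case where this particular pointer walk is
meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mocks; field names mirror the kernel's, layout is simplified. */
struct block_device { int dev; };
struct super_block { struct block_device *s_bdev; };
struct inode { struct super_block *i_sb; };
struct address_space { struct inode *host; };

/* Hypothetical: b_mapping takes the place of the block_device pointer. */
struct buffer_head_sketch {
    struct address_space *b_mapping;
};

/* mapping -> inode -> super_block -> block_device is a one-way walk. */
static struct block_device *bh_bdev(const struct buffer_head_sketch *bh)
{
    assert(bh && bh->b_mapping);
    return bh->b_mapping->host->i_sb->s_bdev;
}

/* On I/O failure the mapping is available without reading page->mapping. */
static struct address_space *bh_error_mapping(const struct buffer_head_sketch *bh)
{
    return bh->b_mapping;
}
```

Nothing is lost by the swap in this model: the block device is still
reachable from the buffer head, and the mapping becomes reachable too.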


To avoid any regression risk the page->mapping field is left intact, as
today, for non write protected pages. This means that if you do not use
the page write protection mechanism then it can not regress. This is
achieved by using a helper function that takes the mapping from the
context (the current function's parameter, see above on how functions
are updated) and the struct page. If the page is not write protected then
it uses the mapping from the struct page (just like today). The only
difference between before and after the patchset is that all fs functions
that need the mapping for a page now also get it as a parameter, but only
use the parameter mapping pointer if the page is write protected.

Note also that, even once confidence is high that we always pass the
correct mapping down each call stack, I do not believe we will be able
to get rid of the struct page mapping field.

I posted a patchset before [*1] and I intend to post an updated patchset
before LSF/MM/BPF. I also talked about this at LSF/MM 2018. I still
believe this is a topic that warrants a discussion with FS/MM and
block device folks.


[*1] https://lwn.net/Articles/751050/
     https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
[*2] https://lwn.net/Articles/752564/


To: lsf-pc@lists.linux-foundation.org
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  2:32 [LSF/MM/BPF TOPIC] Generic page write protection jglisse
@ 2020-01-22  4:28 ` Gao Xiang
  2020-01-22  5:21   ` Jerome Glisse
  2020-01-22  4:41 ` John Hubbard
  2020-01-22 18:27 ` [Lsf-pc][LSF/MM/BPF " John Hubbard
  2 siblings, 1 reply; 8+ messages in thread
From: Gao Xiang @ 2020-01-22  4:28 UTC (permalink / raw)
  To: jglisse; +Cc: lsf-pc, Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

Hi Jérôme,

On Tue, Jan 21, 2020 at 06:32:22PM -0800, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> 

<snip>

> 
> To avoid any regression risk the page->mapping field is left intact, as
> today, for non write protected pages. This means that if you do not use
> the page write protection mechanism then it can not regress. This is
> achieved by using a helper function that takes the mapping from the
> context (the current function's parameter, see above on how functions
> are updated) and the struct page. If the page is not write protected then
> it uses the mapping from the struct page (just like today). The only
> difference between before and after the patchset is that all fs functions
> that need the mapping for a page now also get it as a parameter, but only
> use the parameter mapping pointer if the page is write protected.
>
> Note also that, even once confidence is high that we always pass the
> correct mapping down each call stack, I do not believe we will be able
> to get rid of the struct page mapping field.

This feature is awesome, and I might have some premature comments here...

In short, are you suggesting completely getting rid of every way to access
the mapping directly from struct page (other than via page->private or
something else like a call trace)?

I'm not sure all cases can be handled easily (or efficiently) without
page->mapping, since the mapping field can also be used to detect
truncated pages or some other filesystem-specific states (okay, I think
there could be some replacement, but it seems like a huge project...)

Currently, page->private is a per-page user-defined field, yet I don't
think it can always be used as a pointer to some structure. It can also
simply be used to store an unsigned long value for some kinds of
filesystem pages...

It might be inefficient to convert such usage to individual per-page
structure pointers, in terms of cacheline or extra memory overhead...

So I think there should at least be some other way to get a page's
content source (at inode or sub-inode granularity, a reverse lookup)
efficiently, through some field in struct page, directly or indirectly...

I agree that the usage of the page->mapping field is complicated for now.
I'm looking forward to some uniform way to mark the page type for a
filesystem to use (inode or fs-internal special pages), or even to extend
it to anonymous pages [1]. However, it seems like a huge project to do
without regressions...

I'm interested in the related work and its conclusions, and I saw the
article about LSF/MM 2018, although my English isn't good...

If something is wrong, please kindly point it out...

[1] https://lore.kernel.org/r/20191030172234.GA7018@hsiangkao-HP-ZHAN-66-Pro-G1

Thanks,
Gao Xiang



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  2:32 [LSF/MM/BPF TOPIC] Generic page write protection jglisse
  2020-01-22  4:28 ` Gao Xiang
@ 2020-01-22  4:41 ` John Hubbard
  2020-01-22 18:27 ` [Lsf-pc][LSF/MM/BPF " John Hubbard
  2 siblings, 0 replies; 8+ messages in thread
From: John Hubbard @ 2020-01-22  4:41 UTC (permalink / raw)
  To: jglisse, lsf-pc; +Cc: Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

On 1/21/20 6:32 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> 
> Provide a generic way to write protect pages (à la KSM) to enable new mm
> optimizations:

Hi Jerome, 

I am very interested in this feature and discussion. Thanks for posting
this topic.


>     - KSM (kernel samepage merging) to deduplicate pages (for
>       file-backed pages too, not only anonymous memory as today)
>     - NUMA page duplication (read-only duplication) across multiple
>       physical pages. For instance, shared library code could have a
>       copy on each NUMA node. Or, in cases like GPU/FPGA, memory could
>       be duplicated read-only inside the local device memory.


And also, for the benefit of non-GPU-centric folks, let me add that
something like this is required in order to do GPU atomic operations
to system memory, in support of OpenCL Compute (as opposed to Graphics)
atomic ops.

GPUs can use both read duplication and atomics to great effect. It's 
something we've wanted for a while now.

A bit more below:


>     ...
> 
> Note that this write protection is intended to be breakable at any time,
> within a reasonable delay (like KSM today), so that we never block
> anything that needs to write to the page for longer than necessary.
>
>
> The goal is to provide a mechanism that works for both anonymous and
> file-backed memory. For this we need a pointer inside struct page.
> For anonymous memory KSM uses the anon_vma field, which corresponds
> to the mapping field for file-backed pages.
>
> So to allow generic write protection for file-backed pages we need to
> avoid relying on the struct page mapping field in the various kernel
> code paths that use it today.
>
> The page->mapping field is used in 5 different ways:
>  [1]- Functions operating on a file: we can get the mapping from the
>       file (the issue here is that we might need to pass the file down
>       the call stack).
>
>  [2]- Core/arch mm functions: those do not care about the file (if they
>       do, then they are vma related and we can get the mapping from the
>       vma). Those functions only want to be able to walk all the ptes
>       pointing to the page (for instance memory compaction, memory
>       reclaim, ...). We can provide the exact same functionality for
>       write protected pages (like KSM does today).
>
>  [3]- Block layer when I/O fails: this depends on the fs. For instance,
>       for a fs which uses buffer_head we can update buffer_head to store
>       the mapping instead of the block_device, as we can get the
>       block_device from the mapping but not the mapping from the
>       block_device.
>
>       So solving this is mostly filesystem specific, but I have not seen
>       any fs that could not be updated properly so that the block layer
>       can report I/O failures without relying on page->mapping.
>
>  [4]- Debugging (mostly procfs/sysfs files to dump memory states): those
>       do not need the mapping per se, we just need to report page states
>       (and thus write protection information if the page is write
>       protected).
>
>  [5]- GUP (get user pages): if something calls GUP in write mode then we
>       need to break write protection (like KSM today). A GUPed page
>       should not be write protected, as we do not know what the GUPer is
>       doing with the page.
> 

Yes, this is a reasonable constraint. It's a lot harder to make the page
globally write-protected against *everything* (physically-addressed pages
from a non-CPU device included), and providing write protection at the
virtual address level is not quite as difficult. And it will still provide
most of what we'd want.

If a programmer sets up memory to get gup-pinned, and also wants to do
OpenCL atomics to it, we're going to have to say that's just not supported
this year. But it's still a major new capability and the constraint is
not hard to explain.


thanks,
-- 
John Hubbard
NVIDIA



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  4:28 ` Gao Xiang
@ 2020-01-22  5:21   ` Jerome Glisse
  2020-01-22  5:52     ` Gao Xiang
  0 siblings, 1 reply; 8+ messages in thread
From: Jerome Glisse @ 2020-01-22  5:21 UTC (permalink / raw)
  To: Gao Xiang; +Cc: lsf-pc, Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

On Wed, Jan 22, 2020 at 12:28:39PM +0800, Gao Xiang wrote:
> Hi Jérôme,
> 
> On Tue, Jan 21, 2020 at 06:32:22PM -0800, jglisse@redhat.com wrote:
> > From: Jérôme Glisse <jglisse@redhat.com>
> > 
> > 
> 
> <snip>
> 
> > 
> > To avoid any regression risk the page->mapping field is left intact, as
> > today, for non write protected pages. This means that if you do not use
> > the page write protection mechanism then it can not regress. This is
> > achieved by using a helper function that takes the mapping from the
> > context (the current function's parameter, see above on how functions
> > are updated) and the struct page. If the page is not write protected then
> > it uses the mapping from the struct page (just like today). The only
> > difference between before and after the patchset is that all fs functions
> > that need the mapping for a page now also get it as a parameter, but only
> > use the parameter mapping pointer if the page is write protected.
> >
> > Note also that, even once confidence is high that we always pass the
> > correct mapping down each call stack, I do not believe we will be able
> > to get rid of the struct page mapping field.
>
> This feature is awesome, and I might have some premature comments here...
>
> In short, are you suggesting completely getting rid of every way to access
> the mapping directly from struct page (other than via page->private or
> something else like a call trace)?

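No, all accesses to page->mapping are replaced by:
    struct address_space *fs_page_mapping(struct page *page,
                                          struct address_space *mapping)
    {
        /* Common case: not write protected, page->mapping is valid. */
        if (likely(!PageIsWriteProtected(page)))
            return page->mapping;
        /* Write protected: trust the mapping passed by the caller. */
        return mapping;
    }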

All functions that were doing a direct dereference are updated to use this
helper. If the function already has the mapping in its context then it is
easy (there are a lot of places like that, because a file, inode or
mapping is available from the function context).

If the function does not have a file, inode or mapping in its context then
a new mapping parameter is added to that function and all call sites are
updated (and this does recurse, i.e. if a call site does not have a file,
inode or mapping then a mapping parameter is added to it too ...).
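
The helper and a converted call site can be exercised in userspace with
mock types, as in the minimal sketch below. The write_protected field,
page_is_write_protected() and fs_mapping_id() are hypothetical stand-ins
introduced for illustration; only fs_page_mapping() follows the helper
shown above, without the branch hint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types (hypothetical, userspace only). */
struct address_space { int id; };
struct page {
    struct address_space *mapping;   /* untouched for ordinary pages */
    bool write_protected;            /* stand-in for the page flag test */
};

static bool page_is_write_protected(const struct page *page)
{
    return page->write_protected;
}

/* Same logic as the helper above, minus the branch hint. */
static struct address_space *fs_page_mapping(struct page *page,
                                             struct address_space *mapping)
{
    assert(page != NULL);
    if (!page_is_write_protected(page))
        return page->mapping;
    return mapping;
}

/* A converted fs function: the mapping comes from the caller's context. */
static int fs_mapping_id(struct page *page, struct address_space *mapping)
{
    return fs_page_mapping(page, mapping)->id;
}
```

For an ordinary page the parameter is ignored, so a caller that passed a
wrong mapping would still see today's behaviour; the parameter only
matters once the page is write protected.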

This takes care of all fs code. The mm code is split between code that
deals with vmas, where we can get the mapping from the vma, and mm code
that just wants to walk all the CPU ptes pointing to the page. In this
latter case we just need to provide CPU pte walkers for write protected
pages (like KSM does today).

The block device code only needs the mapping on I/O error, and there are
different strategies depending on the individual fs. fs using buffer_head
can easily be updated. For the others there are different solutions, and
they can be updated one at a time with tailored solutions.


> I'm not sure all cases can be handled easily (or efficiently) without
> page->mapping, since the mapping field can also be used to detect
> truncated pages or some other filesystem-specific states (okay, I think
> there could be some replacement, but it seems like a huge project...)

I forgot to talk about truncate; all places that test for truncation are
updated to:
    bool fs_page_is_truncated(struct page *page,
                              struct address_space *mapping)
    {
        /* Common case: not write protected, use page->mapping as today. */
        if (likely(!PageIsWriteProtected(page)))
            return !page->mapping || mapping != page->mapping;
        /* Write protected: defer to the common write protect code. */
        return wp_page_is_protected(page, mapping);
    }

Where wp_page_is_protected() will use the common write protect mm code
(look at mm/ksm.c, as it will be mostly that) to determine if the page
has been truncated. Also, code doing truncation will have to special-case
write protected pages, but that's easy enough.


> Currently, page->private is a per-page user-defined field, yet I don't
> think it can always be used as a pointer to some structure. It can also
> simply be used to store an unsigned long value for some kinds of
> filesystem pages...

For fs that use buffer_head I change the buffer_head struct to store the
mapping and not the block_device. For other fs it will depend on the
individual fs, but I am not changing page->private; I might only change
the struct that page->private points to for that specific fs.

> 
> It might be inefficient to convert such usage to individual per-page
> structure pointers, in terms of cacheline or extra memory overhead...
>
> So I think there should at least be some other way to get a page's
> content source (at inode or sub-inode granularity, a reverse lookup)
> efficiently, through some field in struct page, directly or indirectly...
>
> I agree that the usage of the page->mapping field is complicated for now.
> I'm looking forward to some uniform way to mark the page type for a
> filesystem to use (inode or fs-internal special pages), or even to extend
> it to anonymous pages [1]. However, it seems like a huge project to do
> without regressions...

Note that page->mapping stays _untouched_ if the page is not write
protected, so there is no memory lookup overhead; the only overhead is the
extra branch to test whether the page is write protected or not.

So if you do not use the write protection feature then you can not
regress, i.e. page->mapping is untouched and that is what gets used, like
it is today. So it can not regress unless I make a stupid mistake, but
that's what review is for ;)).

> 
> I'm interested in the related work and its conclusions, and I saw the
> article about LSF/MM 2018, although my English isn't good...
>
> If something is wrong, please kindly point it out...
> 
> [1] https://lore.kernel.org/r/20191030172234.GA7018@hsiangkao-HP-ZHAN-66-Pro-G1

Missed that thread, thank you for the pointer, I have some reading to do :)

Cheers,
Jérôme Glisse



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  5:21   ` Jerome Glisse
@ 2020-01-22  5:52     ` Gao Xiang
  2020-01-22  6:09       ` Jerome Glisse
  0 siblings, 1 reply; 8+ messages in thread
From: Gao Xiang @ 2020-01-22  5:52 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: lsf-pc, Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

On Tue, Jan 21, 2020 at 09:21:18PM -0800, Jerome Glisse wrote:

<snip>

> 
> The block device code only needs the mapping on I/O error, and there are
> different strategies depending on the individual fs. fs using buffer_head
> can easily be updated. For the others there are different solutions, and
> they can be updated one at a time with tailored solutions.

If I didn't misunderstand, what about fs code that does post-processing
without buffer_head and uses page->private as something other than a
pointer? (Yes, some alternatives exist, such as hacking struct
bio_vec...)

I wonder what the community's final plan on this will be; I will learn
the new rules and adapt my code anyway. But in my opinion, such a reverse
way (like page->mapping) is helpful in many respects, and I'm not sure
we could elegantly handle all cases without it...

Thank you...

Thanks,
Gao Xiang



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  5:52     ` Gao Xiang
@ 2020-01-22  6:09       ` Jerome Glisse
  2020-01-22  6:21         ` Gao Xiang
  0 siblings, 1 reply; 8+ messages in thread
From: Jerome Glisse @ 2020-01-22  6:09 UTC (permalink / raw)
  To: Gao Xiang; +Cc: lsf-pc, Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

On Wed, Jan 22, 2020 at 01:52:26PM +0800, Gao Xiang wrote:
> On Tue, Jan 21, 2020 at 09:21:18PM -0800, Jerome Glisse wrote:
> 
> <snip>
> 
> > 
> > The block device code only needs the mapping on I/O error, and there are
> > different strategies depending on the individual fs. fs using buffer_head
> > can easily be updated. For the others there are different solutions, and
> > they can be updated one at a time with tailored solutions.
>
> If I didn't misunderstand, what about fs code that does post-processing
> without buffer_head and uses page->private as something other than a
> pointer? (Yes, some alternatives exist, such as hacking struct
> bio_vec...)

The ultimate answer is that page write protection will not be allowed
for some filesystems (that is in fact how the patchset is designed, so
that things can be merged piecemeal). But there are many ways to solve
the I/O error reporting, and that is one of the things I would like to
get input on.

> 
> I wonder what the community's final plan on this will be; I will learn
> the new rules and adapt my code anyway. But in my opinion, such a reverse
> way (like page->mapping) is helpful in many respects, and I'm not sure
> we could elegantly handle all cases without it...

I still need to go read what it is you are trying to achieve. But I
do not see any reason to remove page->mapping.

Cheers,
Jérôme



* Re: [LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  6:09       ` Jerome Glisse
@ 2020-01-22  6:21         ` Gao Xiang
  0 siblings, 0 replies; 8+ messages in thread
From: Gao Xiang @ 2020-01-22  6:21 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: lsf-pc, Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm

On Tue, Jan 21, 2020 at 10:09:51PM -0800, Jerome Glisse wrote:
> On Wed, Jan 22, 2020 at 01:52:26PM +0800, Gao Xiang wrote:
> > On Tue, Jan 21, 2020 at 09:21:18PM -0800, Jerome Glisse wrote:
> > 
> > <snip>
> > 
> > > 
> > > The block device code only needs the mapping on I/O error, and there are
> > > different strategies depending on the individual fs. fs using buffer_head
> > > can easily be updated. For the others there are different solutions, and
> > > they can be updated one at a time with tailored solutions.
> >
> > If I didn't misunderstand, what about fs code that does post-processing
> > without buffer_head and uses page->private as something other than a
> > pointer? (Yes, some alternatives exist, such as hacking struct
> > bio_vec...)
> 
> The ultimate answer is that page write protection will not be allowed
> for some filesystems (that is in fact how the patchset is designed, so
> that things can be merged piecemeal). But there are many ways to solve
> the I/O error reporting, and that is one of the things I would like to
> get input on.
> 
> > 
> > I wonder what the community's final plan on this will be; I will learn
> > the new rules and adapt my code anyway. But in my opinion, such a reverse
> > way (like page->mapping) is helpful in many respects, and I'm not sure
> > we could elegantly handle all cases without it...
> 
> I still need to go read what it is you are trying to achieve. But I
> do not see any reason to remove page->mapping.

I would say it's a huge project :) and I mean there may be some other
options to "insert a pointer directly or indirectly into struct page".

However, I agree that the current page->mapping rules are too complicated
to describe in words and to make full use of :)

Thanks,
Gao Xiang



* Re: [Lsf-pc][LSF/MM/BPF TOPIC] Generic page write protection
  2020-01-22  2:32 [LSF/MM/BPF TOPIC] Generic page write protection jglisse
  2020-01-22  4:28 ` Gao Xiang
  2020-01-22  4:41 ` John Hubbard
@ 2020-01-22 18:27 ` John Hubbard
  2 siblings, 0 replies; 8+ messages in thread
From: John Hubbard @ 2020-01-22 18:27 UTC (permalink / raw)
  To: jglisse, lsf-pc
  Cc: Andrea Arcangeli, linux-fsdevel, linux-block, linux-mm, lsf-pc

Adding: lsf-pc

thanks,
-- 
John Hubbard
NVIDIA
