From: Jerome Glisse <jglisse@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Theodore Ts'o <tytso@mit.edu>, Tejun Heo <tj@kernel.org>,
	Josef Bacik <jbacik@fb.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Jeff Layton <jlayton@redhat.com>
Subject: Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue
Date: Thu, 19 Apr 2018 10:52:19 -0400
Message-ID: <20180419145219.GB3519@redhat.com>
In-Reply-To: <20180419103250.qvusqkjq6hlz3ch6@quack2.suse.cz>

On Thu, Apr 19, 2018 at 12:32:50PM +0200, Jan Kara wrote:
> On Wed 18-04-18 11:54:30, Jerome Glisse wrote:

[...]

> > I am afraid truly generic write protection for metadata pages is a bit
> > out of scope of what I am doing. However the mechanism I am proposing
> > can be extended for that too. The issue is that all places that want to
> > write to those pages need to be converted to something where the write
> > happens inside a write_begin/write_end section (mmap and CPU pte give
> > this implicitly through page fault, so does the write syscall). Basically
> > there is a need to make sure that write and write protection can be
> > ordered against one another without complex locking.
> 
> I understand metadata pages are not interesting for your use case. However
> from mm point of view these are page cache pages as any other. So maybe my
> question should have been: How do we make sure this mechanism will not be
> used for pages for which it cannot work?

Oh that one is easy, the API takes vma + addr, or rather mm struct + addr
(i.e. somewhat like KSM today). I will change the wording in v1 to "almost
generic write protection" :) or "process page write protection" (but this
would not work for special pfn/vma, so it is not generic there either).
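
As a rough illustration of why an (mm, addr) entry point limits what
the mechanism can reach, here is a minimal userspace sketch. It is not
code from the patchset: every toy_* name and TOY_* flag is a
hypothetical stand-in, and the real kernel types and flag values are
deliberately not reproduced. The point is only that a page is reachable
solely through a process mapping, and that special vmas are refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags marking "special" vmas; not the kernel's values. */
#define TOY_VM_PFNMAP 0x1UL /* raw pfn mapping, no struct page behind it */
#define TOY_VM_IO     0x2UL /* device I/O mapping */

/* Toy stand-in for a vma: an address range plus flags. */
struct toy_vma {
    unsigned long start; /* inclusive */
    unsigned long end;   /* exclusive */
    unsigned long flags;
};

/* Toy stand-in for an mm: just an array of vmas. */
struct toy_mm {
    const struct toy_vma *vmas;
    size_t nr_vmas;
};

/* Return the vma covering addr, or NULL if addr is not mapped. */
const struct toy_vma *toy_find_vma(const struct toy_mm *mm, unsigned long addr)
{
    assert(mm != NULL);
    for (size_t i = 0; i < mm->nr_vmas; i++) {
        const struct toy_vma *vma = &mm->vmas[i];

        if (addr >= vma->start && addr < vma->end)
            return vma;
    }
    return NULL;
}

/*
 * Hypothetical entry check: a page can only be named through (mm, addr),
 * so anything not mapped in a process (e.g. fs metadata) is unreachable,
 * and special vmas are rejected explicitly.
 */
bool toy_can_write_protect(const struct toy_mm *mm, unsigned long addr)
{
    const struct toy_vma *vma = toy_find_vma(mm, addr);

    if (vma == NULL)
        return false; /* nothing mapped at addr */
    if (vma->flags & (TOY_VM_PFNMAP | TOY_VM_IO))
        return false; /* special pfn/vma: not supported */
    return true;
}
```

A caller would build a toy_mm from a few toy_vma ranges and ask
toy_can_write_protect() for a given address; unmapped addresses and
addresses inside a flagged vma both come back false.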

> > > > A write protected page has page->mapping pointing to a structure like
> > > > struct rmap_item for KSM. So this structure has a list for each unique
> > > > combination:
> > > >     struct write_protect {
> > > >         struct list_head *mappings; /* write_protect_mapping list */
> > > >         ...
> > > >     };
> > > > 
> > > >     struct write_protect_mapping {
> > > >         struct list_head list;
> > > >         struct address_space *mapping;
> > > >         unsigned long offset;
> > > >         unsigned long private;
> > > >         ...
> > > >     };
> > > 
> > > Ouch, the fact that we could share a page as data storage for several
> > > inode+offset combinations that are not sharing underlying storage just
> > > looks viciously twisted ;) But is it really that useful to warrant
> > > complications? In particular I'm afraid that filesystems expect consistency
> > > between their internal state (attached to page->private) and page state
> > > (e.g. page->flags) and when there are multiple internal states attached to
> > > the same page this could go easily wrong...
> > 
> > So at first I want to limit this to write protection (not KSM), thus
> > page->flags will stay consistent (i.e. a page is only ever associated
> > with a single mapping). For KSM, yes, page->flags can be problematic;
> > however, here we can assume that the page is clean (and uptodate) and
> > not under writeback. So the problematic flags for KSM are:
> >   - private (page_has_buffers() or PagePrivate (nfs, metadata, ...))
> >   - private_2 (FsCache)
> >   - mappedtodisk
> >   - swapcache
> >   - error
> > 
> > The idea again would be a PageFlagsWithMapping(page, mapping) helper, so
> > that for non KSM write protected pages you test the usual page->flags and
> > for write protected pages you find the flag value using the mapping as
> > lookup index. Usually those flags are seldom changed or accessed. Again
> > the overhead (ignoring code size) would only be for pages which are KSM.
> > So maybe KSM will not make sense because of the perf overhead it adds to
> > page->flags access (I don't think so, but I haven't tested this).
> 
> Yeah, sure, page->flags could be dealt with in a similar way but at this
> point I don't think it's worth it. And without page->flags I don't think
> abstracting page->private makes much sense - or am I missing something why
> you need page->private to depend on the mapping? So what I wanted to suggest
> is that we leave page->private as is currently and just concentrate on
> page->mapping hacks...

Well, I wanted to go up to KSM, or at least as close as possible to KSM,
for file-backed pages. But I can focus on page->mapping first, do write
protection with that, and also do the per-page wait queue for the page
lock, which I believe are both nice features. This will also make the
patchset smaller and easier to review (less scary).

KSM can be done on top of that later and I will be happy to help. I
have a bunch of coccinelle patches for page->private and page->index, and
I can do some for page->flags.
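
For readers following along, the per-mapping lookup discussed in the
quoted text (struct write_protect, struct write_protect_mapping and the
PageFlagsWithMapping() idea) can be modeled in userspace roughly as
below. This is only a sketch under stated assumptions: all toy_* names
are hypothetical, a plain singly linked list replaces struct list_head,
the per-entry flags field is an assumption (the quoted struct only has
mapping, offset and private), and a separate wp pointer stands in for
whatever encoding page->mapping would really use.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for struct address_space. */
struct toy_address_space { int id; };

/* One entry per (mapping, offset) that shares the write protected page. */
struct toy_wp_mapping {
    struct toy_wp_mapping *next;             /* replaces struct list_head */
    const struct toy_address_space *mapping;
    unsigned long offset;
    unsigned long private;                   /* per-mapping page->private */
    unsigned long flags;                     /* assumed per-mapping flags */
};

/* What page->mapping would point to for a write protected page. */
struct toy_write_protect {
    struct toy_wp_mapping *mappings;
};

/* Toy page: wp is non-NULL only while the page is write protected. */
struct toy_page {
    const struct toy_write_protect *wp;
    unsigned long private;
    unsigned long flags;
};

/* Link a new (mapping, offset) entry at the head of the list. */
void toy_wp_add(struct toy_write_protect *wp, struct toy_wp_mapping *entry)
{
    assert(wp != NULL && entry != NULL);
    entry->next = wp->mappings;
    wp->mappings = entry;
}

/* Find the entry for a mapping, or NULL if the page is not in it. */
const struct toy_wp_mapping *
toy_wp_lookup(const struct toy_write_protect *wp,
              const struct toy_address_space *mapping)
{
    assert(wp != NULL);
    for (const struct toy_wp_mapping *e = wp->mappings; e; e = e->next)
        if (e->mapping == mapping)
            return e;
    return NULL;
}

/* Fast path reads the page field; slow path uses mapping as lookup key. */
unsigned long toy_page_private(const struct toy_page *page,
                               const struct toy_address_space *mapping)
{
    const struct toy_wp_mapping *e;

    if (page->wp == NULL)
        return page->private;
    e = toy_wp_lookup(page->wp, mapping);
    return e ? e->private : 0;
}

/* Same pattern for flags, in the spirit of PageFlagsWithMapping(). */
unsigned long toy_page_flags(const struct toy_page *page,
                             const struct toy_address_space *mapping)
{
    const struct toy_wp_mapping *e;

    if (page->wp == NULL)
        return page->flags;
    e = toy_wp_lookup(page->wp, mapping);
    return e ? e->flags : 0;
}
```

The list walk only happens for pages that are actually shared, which
matches the expectation in the quoted text that the extra cost would be
limited to such pages. Whether that cost is acceptable in practice is
not something this sketch can show.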

Cheers,
Jérôme
