All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-xfs@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	linux-doc@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Christoph Hellwig <hch@lst.de>,
	linux-mm@kvack.org, Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-nvdimm@lists.01.org
Subject: Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads
Date: Wed, 19 Jul 2017 10:26:45 -0600	[thread overview]
Message-ID: <20170719162645.GA26445@linux.intel.com> (raw)
In-Reply-To: <20170719153314.GC15908@quack2.suse.cz>

On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > flow, and instead rely on the page fault itself to make the PTE dirty and
> > writeable.  The following description from the patch adding the
> > vm_insert_mixed_mkwrite() call explains this a little more:
> > 
> > ***
> >   To be able to use the common 4k zero page in DAX we need to have our PTE
> >   fault path look more like our PMD fault path where a PTE entry can be
> >   marked as dirty and writeable as it is first inserted, rather than
> >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > 
> >   Right now we can rely on having a dax_pfn_mkwrite() call because we can
> >   distinguish between these two cases in do_wp_page():
> > 
> >   	case 1: 4k zero page => writable DAX storage
> >   	case 2: read-only DAX storage => writeable DAX storage
> > 
> >   This distinction is made by via vm_normal_page().  vm_normal_page()
> >   returns false for the common 4k zero page, though, just as it does for
> >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we will
> >   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > 
> >   This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> >   insert_pfn() will do the work that was previously done by wp_page_reuse()
> >   as part of the dax_pfn_mkwrite() call path.
> > ***
> 
> Hum, thinking about this in context of this patch... So what if we have
> allocated storage, a process faults it read-only, we map it to page tables
> writeprotected. Then the process writes through mmap to the area - the code
> in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.

Yep.

> Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> writeable but radix tree entry stays clean - bug. Am I missing something?

I don't think we ever end up with a writeable PTE but with a clean radix tree
entry.  When we get the write fault we do a full fault through
dax_iomap_pte_fault() and dax_insert_mapping().

dax_insert_mapping() sets up the dirty radix tree entry via
dax_insert_mapping_entry() before it does anything with the page tables via
vm_insert_mixed_mkwrite().

So, this mkwrite fault path is exactly the path we would have taken if the
initial read to real storage hadn't happened, and we end up in the same end
state - with a dirty DAX radix tree entry and a writeable PTE.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads
Date: Wed, 19 Jul 2017 10:26:45 -0600	[thread overview]
Message-ID: <20170719162645.GA26445@linux.intel.com> (raw)
In-Reply-To: <20170719153314.GC15908@quack2.suse.cz>

On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > flow, and instead rely on the page fault itself to make the PTE dirty and
> > writeable.  The following description from the patch adding the
> > vm_insert_mixed_mkwrite() call explains this a little more:
> > 
> > ***
> >   To be able to use the common 4k zero page in DAX we need to have our PTE
> >   fault path look more like our PMD fault path where a PTE entry can be
> >   marked as dirty and writeable as it is first inserted, rather than
> >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > 
> >   Right now we can rely on having a dax_pfn_mkwrite() call because we can
> >   distinguish between these two cases in do_wp_page():
> > 
> >   	case 1: 4k zero page => writable DAX storage
> >   	case 2: read-only DAX storage => writeable DAX storage
> > 
> >   This distinction is made by via vm_normal_page().  vm_normal_page()
> >   returns false for the common 4k zero page, though, just as it does for
> >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we will
> >   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > 
> >   This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> >   insert_pfn() will do the work that was previously done by wp_page_reuse()
> >   as part of the dax_pfn_mkwrite() call path.
> > ***
> 
> Hum, thinking about this in context of this patch... So what if we have
> allocated storage, a process faults it read-only, we map it to page tables
> writeprotected. Then the process writes through mmap to the area - the code
> in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.

Yep.

> Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> writeable but radix tree entry stays clean - bug. Am I missing something?

I don't think we ever end up with a writeable PTE but with a clean radix tree
entry.  When we get the write fault we do a full fault through
dax_iomap_pte_fault() and dax_insert_mapping().

dax_insert_mapping() sets up the dirty radix tree entry via
dax_insert_mapping_entry() before it does anything with the page tables via
vm_insert_mixed_mkwrite().

So, this mkwrite fault path is exactly the path we would have taken if the
initial read to real storage hadn't happened, and we end up in the same end
state - with a dirty DAX radix tree entry and a writeable PTE.

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads
Date: Wed, 19 Jul 2017 10:26:45 -0600	[thread overview]
Message-ID: <20170719162645.GA26445@linux.intel.com> (raw)
In-Reply-To: <20170719153314.GC15908@quack2.suse.cz>

On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > flow, and instead rely on the page fault itself to make the PTE dirty and
> > writeable.  The following description from the patch adding the
> > vm_insert_mixed_mkwrite() call explains this a little more:
> > 
> > ***
> >   To be able to use the common 4k zero page in DAX we need to have our PTE
> >   fault path look more like our PMD fault path where a PTE entry can be
> >   marked as dirty and writeable as it is first inserted, rather than
> >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > 
> >   Right now we can rely on having a dax_pfn_mkwrite() call because we can
> >   distinguish between these two cases in do_wp_page():
> > 
> >   	case 1: 4k zero page => writable DAX storage
> >   	case 2: read-only DAX storage => writeable DAX storage
> > 
> >   This distinction is made by via vm_normal_page().  vm_normal_page()
> >   returns false for the common 4k zero page, though, just as it does for
> >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we will
> >   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > 
> >   This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> >   insert_pfn() will do the work that was previously done by wp_page_reuse()
> >   as part of the dax_pfn_mkwrite() call path.
> > ***
> 
> Hum, thinking about this in context of this patch... So what if we have
> allocated storage, a process faults it read-only, we map it to page tables
> writeprotected. Then the process writes through mmap to the area - the code
> in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.

Yep.

> Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> writeable but radix tree entry stays clean - bug. Am I missing something?

I don't think we ever end up with a writeable PTE but with a clean radix tree
entry.  When we get the write fault we do a full fault through
dax_iomap_pte_fault() and dax_insert_mapping().

dax_insert_mapping() sets up the dirty radix tree entry via
dax_insert_mapping_entry() before it does anything with the page tables via
vm_insert_mixed_mkwrite().

So, this mkwrite fault path is exactly the path we would have taken if the
initial read to real storage hadn't happened, and we end up in the same end
state - with a dirty DAX radix tree entry and a writeable PTE.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Cc: linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
	Matthew Wilcox <mawilcox-0li6OtcxBFHby3iVrkZq2A@public.gmane.org>,
	linux-doc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Darrick J. Wong"
	<darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	Dave Hansen <dave.hansen-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Andreas Dilger
	<adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>,
	Alexander Viro
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org
Subject: Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads
Date: Wed, 19 Jul 2017 10:26:45 -0600	[thread overview]
Message-ID: <20170719162645.GA26445@linux.intel.com> (raw)
In-Reply-To: <20170719153314.GC15908-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>

On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > flow, and instead rely on the page fault itself to make the PTE dirty and
> > writeable.  The following description from the patch adding the
> > vm_insert_mixed_mkwrite() call explains this a little more:
> > 
> > ***
> >   To be able to use the common 4k zero page in DAX we need to have our PTE
> >   fault path look more like our PMD fault path where a PTE entry can be
> >   marked as dirty and writeable as it is first inserted, rather than
> >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > 
> >   Right now we can rely on having a dax_pfn_mkwrite() call because we can
> >   distinguish between these two cases in do_wp_page():
> > 
> >   	case 1: 4k zero page => writable DAX storage
> >   	case 2: read-only DAX storage => writeable DAX storage
> > 
> >   This distinction is made by via vm_normal_page().  vm_normal_page()
> >   returns false for the common 4k zero page, though, just as it does for
> >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we will
> >   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > 
> >   This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> >   insert_pfn() will do the work that was previously done by wp_page_reuse()
> >   as part of the dax_pfn_mkwrite() call path.
> > ***
> 
> Hum, thinking about this in context of this patch... So what if we have
> allocated storage, a process faults it read-only, we map it to page tables
> writeprotected. Then the process writes through mmap to the area - the code
> in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.

Yep.

> Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> writeable but radix tree entry stays clean - bug. Am I missing something?

I don't think we ever end up with a writeable PTE but with a clean radix tree
entry.  When we get the write fault we do a full fault through
dax_iomap_pte_fault() and dax_insert_mapping().

dax_insert_mapping() sets up the dirty radix tree entry via
dax_insert_mapping_entry() before it does anything with the page tables via
vm_insert_mixed_mkwrite().

So, this mkwrite fault path is exactly the path we would have taken if the
initial read to real storage hadn't happened, and we end up in the same end
state - with a dirty DAX radix tree entry and a writeable PTE.

  reply	other threads:[~2017-07-19 16:24 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28 22:01 [PATCH v3 0/5] DAX common 4k zero page Ross Zwisler
2017-06-28 22:01 ` Ross Zwisler
2017-06-28 22:01 ` Ross Zwisler
2017-06-28 22:01 ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite() Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-07-19 14:16   ` Jan Kara
2017-07-19 14:16     ` Jan Kara
2017-07-19 17:51     ` Ross Zwisler
2017-07-19 17:51       ` Ross Zwisler
2017-07-19 17:51       ` Ross Zwisler
2017-07-19 17:51       ` Ross Zwisler
2017-07-19 21:58       ` Ross Zwisler
2017-07-19 21:58         ` Ross Zwisler
2017-07-19 21:58         ` Ross Zwisler
2017-07-19 21:58         ` Ross Zwisler
2017-07-21 17:44         ` Ross Zwisler
2017-07-21 17:44           ` Ross Zwisler
2017-07-21 17:44           ` Ross Zwisler
2017-07-24 11:20           ` Jan Kara
2017-07-24 11:20             ` Jan Kara
2017-07-24 11:20             ` Jan Kara
2017-07-20 15:26   ` Vivek Goyal
2017-07-20 15:26     ` Vivek Goyal
2017-07-20 15:26     ` Vivek Goyal
2017-07-20 15:26     ` Vivek Goyal
2017-07-20 15:59     ` Ross Zwisler
2017-07-20 15:59       ` Ross Zwisler
2017-07-20 15:59       ` Ross Zwisler
2017-07-21 18:02       ` Ross Zwisler
2017-07-21 18:02         ` Ross Zwisler
2017-07-21 18:02         ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 2/5] dax: relocate some dax functions Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-07-19 15:33   ` Jan Kara
2017-07-19 15:33     ` Jan Kara
2017-07-19 16:26     ` Ross Zwisler [this message]
2017-07-19 16:26       ` Ross Zwisler
2017-07-19 16:26       ` Ross Zwisler
2017-07-19 16:26       ` Ross Zwisler
2017-07-20 10:27       ` Jan Kara
2017-07-20 10:27         ` Jan Kara
2017-07-20 10:27         ` Jan Kara
2017-07-20 10:27         ` Jan Kara
2017-07-20 14:28         ` Ross Zwisler
2017-07-20 14:28           ` Ross Zwisler
2017-07-20 14:28           ` Ross Zwisler
2017-07-20 14:28           ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 4/5] dax: remove DAX code from page_cache_tree_insert() Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 5/5] dax: move all DAX radix tree defs to fs/dax.c Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-28 22:01   ` Ross Zwisler
2017-06-30 19:05 ` [PATCH v3 0/5] DAX common 4k zero page Ross Zwisler
2017-06-30 19:05   ` Ross Zwisler
2017-06-30 19:05   ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170719162645.GA26445@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=darrick.wong@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.