linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Ingo Molnar <mingo@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite()
Date: Fri, 21 Jul 2017 11:44:05 -0600	[thread overview]
Message-ID: <20170721174405.GA18697@linux.intel.com> (raw)
In-Reply-To: <20170719215831.GC10923@linux.intel.com>

On Wed, Jul 19, 2017 at 03:58:31PM -0600, Ross Zwisler wrote:
> On Wed, Jul 19, 2017 at 11:51:12AM -0600, Ross Zwisler wrote:
> > On Wed, Jul 19, 2017 at 04:16:59PM +0200, Jan Kara wrote:
> > > On Wed 28-06-17 16:01:48, Ross Zwisler wrote:
> > > > To be able to use the common 4k zero page in DAX we need to have our PTE
> > > > fault path look more like our PMD fault path where a PTE entry can be
> > > > marked as dirty and writeable as it is first inserted, rather than waiting
> > > > for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > > > 
> > > > Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > > > distinguish between these two cases in do_wp_page():
> > > > 
> > > > 	case 1: 4k zero page => writable DAX storage
> > > > 	case 2: read-only DAX storage => writeable DAX storage
> > > > 
> > > > This distinction is made by via vm_normal_page().  vm_normal_page() returns
> > > > false for the common 4k zero page, though, just as it does for DAX ptes.
> > > > Instead of special casing the DAX + 4k zero page case, we will simplify our
> > > > DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
> > > > get rid of dax_pfn_mkwrite() completely.
> > > > 
> > > > This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> > > > and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set insert_pfn()
> > > > will do the work that was previously done by wp_page_reuse() as part of the
> > > > dax_pfn_mkwrite() call path.
> > > > 
> > > > Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> > > 
> > > Just one small comment below.
> > > 
> > > > @@ -1658,14 +1658,26 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> > > >  	if (!pte)
> > > >  		goto out;
> > > >  	retval = -EBUSY;
> > > > -	if (!pte_none(*pte))
> > > > -		goto out_unlock;
> > > > +	if (!pte_none(*pte)) {
> > > > +		if (mkwrite) {
> > > > +			entry = *pte;
> > > > +			goto out_mkwrite;
> > > 
> > > Can we maybe check here that (pte_pfn(*pte) == pfn_t_to_pfn(pfn)) and
> > > return -EBUSY otherwise? That way we are sure insert_pfn() isn't doing
> > > anything we don't expect 
> > 
> > Sure, that's fine.  I'll add it as a WARN_ON_ONCE() so it's a very loud
> > failure.  If the pfns don't match I think we're insane (and would have been
> > insane prior to this patch series as well) because we are getting a page fault
> > and somehow have a different PFN already mapped at that location.
> 
> Umm...well, I added the warning, and during my regression testing hit a case
> where the PFNs didn't match.  (generic/437 with both ext4 & XFS)
> 
> I've verified that this behavior happens with vanilla v4.12, so it's not a new
> condition introduced by my patch.
> 
> I'm off tracking that down - there's a bug lurking somewhere, I think.

Actually, I think we're fine.  What was happening was that two faults were
racing for a private mapping.  One was installing a RW PTE for the COW page
cache page via wp_page_copy(), and the second was trying to install a
read-only PTE in insert_pfn().  The PFNs don't match because the two faults
are trying to map very different PTEs - one for DAX storage, one for a page
cache page.

This collision is handled by insert_pfn() by just returning -EBUSY, which will
bail out of the fault and either re-fault if necessary, or use the PTE that
the other thread installed.  For the case I described above I think both
faults will just happily use the page cache page, and the RO DAX fault won't
be retried.

I think this is fine, and I'll preserve this behavior as you suggest in the
mkwrite case by validating that the PTE is what we think it should be after we
grab the PTL.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-21 17:44 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28 22:01 [PATCH v3 0/5] DAX common 4k zero page Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite() Ross Zwisler
2017-07-19 14:16   ` Jan Kara
2017-07-19 17:51     ` Ross Zwisler
2017-07-19 21:58       ` Ross Zwisler
2017-07-21 17:44         ` Ross Zwisler [this message]
2017-07-24 11:20           ` Jan Kara
2017-07-20 15:26   ` Vivek Goyal
2017-07-20 15:59     ` Ross Zwisler
2017-07-21 18:02       ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 2/5] dax: relocate some dax functions Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads Ross Zwisler
2017-07-19 15:33   ` Jan Kara
2017-07-19 16:26     ` Ross Zwisler
2017-07-20 10:27       ` Jan Kara
2017-07-20 14:28         ` Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 4/5] dax: remove DAX code from page_cache_tree_insert() Ross Zwisler
2017-06-28 22:01 ` [PATCH v3 5/5] dax: move all DAX radix tree defs to fs/dax.c Ross Zwisler
2017-06-30 19:05 ` [PATCH v3 0/5] DAX common 4k zero page Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170721174405.GA18697@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).