All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: Seema Pandit <seema.pandit@intel.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	Robert Barror <robert.barror@intel.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] filesystem-dax: Disable PMD support
Date: Mon, 1 Jul 2019 14:11:19 +0200	[thread overview]
Message-ID: <20190701121119.GE31621@quack2.suse.cz> (raw)
In-Reply-To: <20190630152324.GA15900@bombadil.infradead.org>

On Sun 30-06-19 08:23:24, Matthew Wilcox wrote:
> On Sun, Jun 30, 2019 at 01:01:04AM -0700, Dan Williams wrote:
> > @@ -215,7 +216,7 @@ static wait_queue_head_t
> > *dax_entry_waitqueue(struct xa_state *xas,
> >          * queue to the start of that PMD.  This ensures that all offsets in
> >          * the range covered by the PMD map to the same bit lock.
> >          */
> > -       if (dax_is_pmd_entry(entry))
> > +       //if (dax_is_pmd_entry(entry))
> >                 index &= ~PG_PMD_COLOUR;
> >         key->xa = xas->xa;
> >         key->entry_start = index;
> 
> Hah, that's a great naive fix!  Thanks for trying that out.
> 
> I think my theory was slightly mistaken, but your fix has the effect of
> fixing the actual problem too.
> 
> The xas->xa_index for a PMD is going to be PMD-aligned (ie a multiple of
> 512), but xas_find_conflict() does _not_ adjust xa_index (... which I
> really should have mentioned in the documentation).  So we go to sleep
> on the PMD-aligned index instead of the index of the PTE.  Your patch
> fixes this by using the PMD-aligned index for PTEs too.
> 
> I'm trying to come up with a clean fix for this.  Clearly we
> shouldn't wait for a PTE entry if we're looking for a PMD entry.
> But what should get_unlocked_entry() return if it detects that case?
> We could have it return an error code encoded as an internal entry,
> like grab_mapping_entry() does.  Or we could have it return the _locked_
> PTE entry, and have callers interpret that.
> 
> At least get_unlocked_entry() is static, but it's got quite a few callers.
> Trying to discern which ones might ask for a PMD entry is a bit tricky.
> So this seems like a large patch which might have bugs.

Yeah. So get_unlocked_entry() is used in several cases:

1) Case where we already have entry at given index but it is locked and we
need it unlocked so that we can do our thing `(dax_writeback_one(),
dax_layout_busy_page()).

2) Case where we want any entry covering given index (in
__dax_invalidate_entry()). This is essentially the same as case 1) since we
have already looked up the entry (just didn't propagate that information
from mm/truncate.c) - we want any unlocked entry covering given index.

3) Cases where we really want entry at given index and we have some entry
order constraints (dax_insert_pfn_mkwrite(), grab_mapping_entry()).

Honestly I'd make the rule that get_unlocked_entry() returns entry of any
order that is covering given index. I agree it may be unnecessarily waiting
for PTE entry lock for the case where in case 3) we are really looking only
for PMD entry but that seems like a relatively small cost for the
simplicity of the interface.

BTW, looking into the xarray code, I think I found another difference
between the old radix tree code and the new xarray code that could cause
issues. In the old radix tree code if we tried to insert PMD entry but
there was some PTE entry in the covered range, we'd get EEXIST error back
and the DAX fault code relies on this. I don't see how similar behavior is
achieved by xas_store()...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Seema Pandit <seema.pandit@intel.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	Robert Barror <robert.barror@intel.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] filesystem-dax: Disable PMD support
Date: Mon, 1 Jul 2019 14:11:19 +0200	[thread overview]
Message-ID: <20190701121119.GE31621@quack2.suse.cz> (raw)
In-Reply-To: <20190630152324.GA15900@bombadil.infradead.org>

On Sun 30-06-19 08:23:24, Matthew Wilcox wrote:
> On Sun, Jun 30, 2019 at 01:01:04AM -0700, Dan Williams wrote:
> > @@ -215,7 +216,7 @@ static wait_queue_head_t
> > *dax_entry_waitqueue(struct xa_state *xas,
> >          * queue to the start of that PMD.  This ensures that all offsets in
> >          * the range covered by the PMD map to the same bit lock.
> >          */
> > -       if (dax_is_pmd_entry(entry))
> > +       //if (dax_is_pmd_entry(entry))
> >                 index &= ~PG_PMD_COLOUR;
> >         key->xa = xas->xa;
> >         key->entry_start = index;
> 
> Hah, that's a great naive fix!  Thanks for trying that out.
> 
> I think my theory was slightly mistaken, but your fix has the effect of
> fixing the actual problem too.
> 
> The xas->xa_index for a PMD is going to be PMD-aligned (ie a multiple of
> 512), but xas_find_conflict() does _not_ adjust xa_index (... which I
> really should have mentioned in the documentation).  So we go to sleep
> on the PMD-aligned index instead of the index of the PTE.  Your patch
> fixes this by using the PMD-aligned index for PTEs too.
> 
> I'm trying to come up with a clean fix for this.  Clearly we
> shouldn't wait for a PTE entry if we're looking for a PMD entry.
> But what should get_unlocked_entry() return if it detects that case?
> We could have it return an error code encoded as an internal entry,
> like grab_mapping_entry() does.  Or we could have it return the _locked_
> PTE entry, and have callers interpret that.
> 
> At least get_unlocked_entry() is static, but it's got quite a few callers.
> Trying to discern which ones might ask for a PMD entry is a bit tricky.
> So this seems like a large patch which might have bugs.

Yeah. So get_unlocked_entry() is used in several cases:

1) Case where we already have entry at given index but it is locked and we
need it unlocked so that we can do our thing `(dax_writeback_one(),
dax_layout_busy_page()).

2) Case where we want any entry covering given index (in
__dax_invalidate_entry()). This is essentially the same as case 1) since we
have already looked up the entry (just didn't propagate that information
from mm/truncate.c) - we want any unlocked entry covering given index.

3) Cases where we really want entry at given index and we have some entry
order constraints (dax_insert_pfn_mkwrite(), grab_mapping_entry()).

Honestly I'd make the rule that get_unlocked_entry() returns entry of any
order that is covering given index. I agree it may be unnecessarily waiting
for PTE entry lock for the case where in case 3) we are really looking only
for PMD entry but that seems like a relatively small cost for the
simplicity of the interface.

BTW, looking into the xarray code, I think I found another difference
between the old radix tree code and the new xarray code that could cause
issues. In the old radix tree code if we tried to insert PMD entry but
there was some PTE entry in the covered range, we'd get EEXIST error back
and the DAX fault code relies on this. I don't see how similar behavior is
achieved by xas_store()...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2019-07-03  9:47 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-27  0:15 [PATCH] filesystem-dax: Disable PMD support Dan Williams
2019-06-27  0:15 ` Dan Williams
2019-06-27 12:34 ` Matthew Wilcox
2019-06-27 12:34   ` Matthew Wilcox
2019-06-27 16:06   ` Dan Williams
2019-06-27 16:06     ` Dan Williams
2019-06-27 18:29     ` Dan Williams
2019-06-27 18:29       ` Dan Williams
2019-06-27 18:58       ` Dan Williams
2019-06-27 18:58         ` Dan Williams
2019-06-27 19:09         ` Dan Williams
2019-06-27 19:09           ` Dan Williams
2019-06-27 19:59           ` Matthew Wilcox
2019-06-27 19:59             ` Matthew Wilcox
2019-06-28  2:39             ` Dan Williams
2019-06-28  2:39               ` Dan Williams
2019-06-28 16:37               ` Matthew Wilcox
2019-06-28 16:37                 ` Matthew Wilcox
2019-06-28 16:39                 ` Dan Williams
2019-06-28 16:39                   ` Dan Williams
2019-06-28 16:54                   ` Matthew Wilcox
2019-06-28 16:54                     ` Matthew Wilcox
2019-06-29 16:03               ` Matthew Wilcox
2019-06-29 16:03                 ` Matthew Wilcox
2019-06-30  7:27                 ` Dan Williams
2019-06-30  7:27                   ` Dan Williams
2019-06-30  8:01                   ` Dan Williams
2019-06-30  8:01                     ` Dan Williams
2019-06-30 15:23                     ` Matthew Wilcox
2019-06-30 15:23                       ` Matthew Wilcox
2019-06-30 21:37                       ` Dan Williams
2019-06-30 21:37                         ` Dan Williams
2019-07-02  3:34                         ` Matthew Wilcox
2019-07-02  3:34                           ` Matthew Wilcox
2019-07-02 15:37                           ` Dan Williams
2019-07-02 15:37                             ` Dan Williams
2019-07-03  0:22                             ` Boaz Harrosh
2019-07-03  0:22                               ` Boaz Harrosh
2019-07-03  0:42                               ` Dan Williams
2019-07-03  0:42                                 ` Dan Williams
2019-07-03  1:39                                 ` Boaz Harrosh
2019-07-03  1:39                                   ` Boaz Harrosh
2019-07-01 12:11                       ` Jan Kara [this message]
2019-07-01 12:11                         ` Jan Kara
2019-07-03 15:47                         ` Matthew Wilcox
2019-07-03 15:47                           ` Matthew Wilcox
2019-07-04 16:40                           ` Jan Kara
2019-07-04 16:40                             ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190701121119.GE31621@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=robert.barror@intel.com \
    --cc=seema.pandit@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.