From: Dan Williams <dan.j.williams@intel.com> To: Matthew Wilcox <willy@infradead.org> Cc: Seema Pandit <seema.pandit@intel.com>, linux-nvdimm <linux-nvdimm@lists.01.org>, Boaz Harrosh <openosd@gmail.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, stable <stable@vger.kernel.org>, Robert Barror <robert.barror@intel.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, Jan Kara <jack@suse.cz> Subject: Re: [PATCH] dax: Fix missed PMD wakeups Date: Wed, 3 Jul 2019 14:28:41 -0700 [thread overview] Message-ID: <CAPcyv4iPNz=oJyc_EoE-mC11=gyBzwMKbmj1ZY_Yna54=cC=Mg@mail.gmail.com> (raw) In-Reply-To: <20190703195302.GJ1729@bombadil.infradead.org> On Wed, Jul 3, 2019 at 12:53 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Jul 03, 2019 at 10:01:37AM -0700, Dan Williams wrote: > > On Wed, Jul 3, 2019 at 5:17 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Wed, Jul 03, 2019 at 12:24:54AM -0700, Dan Williams wrote: > > > > This fix may increase waitqueue contention, but a fix for that is saved > > > > for a larger rework. In the meantime this fix is suitable for -stable > > > > backports. > > > > > > I think this is too big for what it is; just the two-line patch to stop > > > incorporating the low bits of the PTE would be more appropriate. > > > > Sufficient, yes, "appropriate", not so sure. All those comments about > > pmd entry size are stale after this change. > > But then they'll have to be put back in again. This seems to be working > for me, although I doubt I'm actually hitting the edge case that rocksdb > hits: Seems to be holding up under testing here, a couple comments... > > diff --git a/fs/dax.c b/fs/dax.c > index 2e48c7ebb973..e77bd6aef10c 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -198,6 +198,10 @@ static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all) > * if it did. > * > * Must be called with the i_pages lock held. > + * > + * If the xa_state refers to a larger entry, then it may return a locked > + * smaller entry (eg a PTE entry) without waiting for the smaller entry > + * to be unlocked. > */ > static void *get_unlocked_entry(struct xa_state *xas) > { > @@ -211,7 +215,8 @@ static void *get_unlocked_entry(struct xa_state *xas) > for (;;) { > entry = xas_find_conflict(xas); > if (!entry || WARN_ON_ONCE(!xa_is_value(entry)) || > - !dax_is_locked(entry)) > + !dax_is_locked(entry) || > + dax_entry_order(entry) < xas_get_order(xas)) Doesn't this potentially allow a locked entry to be returned for a caller that expects all value entries are unlocked? > return entry; > > wq = dax_entry_waitqueue(xas, entry, &ewait.key); > @@ -253,8 +258,12 @@ static void wait_entry_unlocked(struct xa_state *xas, void *entry) > > static void put_unlocked_entry(struct xa_state *xas, void *entry) > { > - /* If we were the only waiter woken, wake the next one */ > - if (entry) > + /* > + * If we were the only waiter woken, wake the next one. > + * Do not wake anybody if the entry is locked; that indicates > + * we weren't woken. > + */ > + if (entry && !dax_is_locked(entry)) > dax_wake_entry(xas, entry, false); > } > > diff --git a/include/linux/xarray.h b/include/linux/xarray.h > index 052e06ff4c36..b17289d92af4 100644 > --- a/include/linux/xarray.h > +++ b/include/linux/xarray.h > @@ -1529,6 +1529,27 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index, > #endif > } > > +/** > + * xas_get_order() - Get the order of the entry being operated on. > + * @xas: XArray operation state. > + * > + * Return: The order of the entry. > + */ > +static inline unsigned int xas_get_order(const struct xa_state *xas) > +{ > + unsigned int order = xas->xa_shift; > + > +#ifdef CONFIG_XARRAY_MULTI > + unsigned int sibs = xas->xa_sibs; > + > + while (sibs) { > + order++; > + sibs /= 2; > + } Use ilog2() here? _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com> To: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>, Jan Kara <jack@suse.cz>, Boaz Harrosh <openosd@gmail.com>, stable <stable@vger.kernel.org>, Robert Barror <robert.barror@intel.com>, Seema Pandit <seema.pandit@intel.com>, linux-nvdimm <linux-nvdimm@lists.01.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Subject: Re: [PATCH] dax: Fix missed PMD wakeups Date: Wed, 3 Jul 2019 14:28:41 -0700 [thread overview] Message-ID: <CAPcyv4iPNz=oJyc_EoE-mC11=gyBzwMKbmj1ZY_Yna54=cC=Mg@mail.gmail.com> (raw) In-Reply-To: <20190703195302.GJ1729@bombadil.infradead.org> On Wed, Jul 3, 2019 at 12:53 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Jul 03, 2019 at 10:01:37AM -0700, Dan Williams wrote: > > On Wed, Jul 3, 2019 at 5:17 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Wed, Jul 03, 2019 at 12:24:54AM -0700, Dan Williams wrote: > > > > This fix may increase waitqueue contention, but a fix for that is saved > > > > for a larger rework. In the meantime this fix is suitable for -stable > > > > backports. > > > > > > I think this is too big for what it is; just the two-line patch to stop > > > incorporating the low bits of the PTE would be more appropriate. > > > > Sufficient, yes, "appropriate", not so sure. All those comments about > > pmd entry size are stale after this change. > > But then they'll have to be put back in again. This seems to be working > for me, although I doubt I'm actually hitting the edge case that rocksdb > hits: Seems to be holding up under testing here, a couple comments... > > diff --git a/fs/dax.c b/fs/dax.c > index 2e48c7ebb973..e77bd6aef10c 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -198,6 +198,10 @@ static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all) > * if it did. > * > * Must be called with the i_pages lock held. > + * > + * If the xa_state refers to a larger entry, then it may return a locked > + * smaller entry (eg a PTE entry) without waiting for the smaller entry > + * to be unlocked. > */ > static void *get_unlocked_entry(struct xa_state *xas) > { > @@ -211,7 +215,8 @@ static void *get_unlocked_entry(struct xa_state *xas) > for (;;) { > entry = xas_find_conflict(xas); > if (!entry || WARN_ON_ONCE(!xa_is_value(entry)) || > - !dax_is_locked(entry)) > + !dax_is_locked(entry) || > + dax_entry_order(entry) < xas_get_order(xas)) Doesn't this potentially allow a locked entry to be returned for a caller that expects all value entries are unlocked? > return entry; > > wq = dax_entry_waitqueue(xas, entry, &ewait.key); > @@ -253,8 +258,12 @@ static void wait_entry_unlocked(struct xa_state *xas, void *entry) > > static void put_unlocked_entry(struct xa_state *xas, void *entry) > { > - /* If we were the only waiter woken, wake the next one */ > - if (entry) > + /* > + * If we were the only waiter woken, wake the next one. > + * Do not wake anybody if the entry is locked; that indicates > + * we weren't woken. > + */ > + if (entry && !dax_is_locked(entry)) > dax_wake_entry(xas, entry, false); > } > > diff --git a/include/linux/xarray.h b/include/linux/xarray.h > index 052e06ff4c36..b17289d92af4 100644 > --- a/include/linux/xarray.h > +++ b/include/linux/xarray.h > @@ -1529,6 +1529,27 @@ static inline void xas_set_order(struct xa_state *xas, unsigned long index, > #endif > } > > +/** > + * xas_get_order() - Get the order of the entry being operated on. > + * @xas: XArray operation state. > + * > + * Return: The order of the entry. > + */ > +static inline unsigned int xas_get_order(const struct xa_state *xas) > +{ > + unsigned int order = xas->xa_shift; > + > +#ifdef CONFIG_XARRAY_MULTI > + unsigned int sibs = xas->xa_sibs; > + > + while (sibs) { > + order++; > + sibs /= 2; > + } Use ilog2() here?
next prev parent reply other threads:[~2019-07-03 21:28 UTC|newest] Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-07-03 7:24 [PATCH] dax: Fix missed PMD wakeups Dan Williams 2019-07-03 7:24 ` Dan Williams 2019-07-03 12:17 ` Matthew Wilcox 2019-07-03 12:17 ` Matthew Wilcox 2019-07-03 17:01 ` Dan Williams 2019-07-03 17:01 ` Dan Williams 2019-07-03 19:53 ` Matthew Wilcox 2019-07-03 19:53 ` Matthew Wilcox 2019-07-03 21:28 ` Dan Williams [this message] 2019-07-03 21:28 ` Dan Williams 2019-07-04 3:27 ` Matthew Wilcox 2019-07-04 3:27 ` Matthew Wilcox 2019-07-04 13:00 ` Boaz Harrosh 2019-07-04 13:00 ` Boaz Harrosh 2019-07-04 13:58 ` Matthew Wilcox 2019-07-04 13:58 ` Matthew Wilcox 2019-07-04 14:32 ` Boaz Harrosh 2019-07-04 14:32 ` Boaz Harrosh 2019-07-04 16:54 ` Jan Kara 2019-07-04 16:54 ` Jan Kara 2019-07-04 19:14 ` Matthew Wilcox 2019-07-04 19:14 ` Matthew Wilcox 2019-07-04 23:27 ` Dan Williams 2019-07-04 23:27 ` Dan Williams 2019-07-05 19:10 ` Matthew Wilcox 2019-07-05 19:10 ` Matthew Wilcox 2019-07-05 20:47 ` Dan Williams 2019-07-05 20:47 ` Dan Williams 2019-07-10 19:02 ` Jan Kara 2019-07-10 19:02 ` Jan Kara 2019-07-10 20:15 ` Matthew Wilcox 2019-07-10 20:15 ` Matthew Wilcox 2019-07-10 20:26 ` Jan Kara 2019-07-10 20:26 ` Jan Kara 2019-07-11 14:13 ` Matthew Wilcox 2019-07-11 14:13 ` Matthew Wilcox 2019-07-11 15:25 ` Matthew Wilcox 2019-07-11 15:25 ` Matthew Wilcox 2019-07-11 15:41 ` Jan Kara 2019-07-11 15:41 ` Jan Kara 2019-07-17 3:39 ` Dan Williams 2019-07-17 3:39 ` Dan Williams 2019-07-29 12:02 ` Jan Kara 2019-07-29 12:02 ` Jan Kara 2019-07-29 15:18 ` Dan Williams 2019-07-29 15:18 ` Dan Williams 2019-07-11 3:08 ` Matthew Wilcox 2019-07-11 3:08 ` Matthew Wilcox 2019-07-11 7:48 ` Jan Kara 2019-07-11 7:48 ` Jan Kara 2019-07-11 13:28 ` Matthew Wilcox 2019-07-11 13:28 ` Matthew Wilcox 2019-07-11 3:35 ` Matthew Wilcox 2019-07-11 3:35 ` Matthew Wilcox 2019-07-11 8:06 ` Jan Kara 2019-07-11 8:06 ` Jan Kara
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAPcyv4iPNz=oJyc_EoE-mC11=gyBzwMKbmj1ZY_Yna54=cC=Mg@mail.gmail.com' \ --to=dan.j.williams@intel.com \ --cc=jack@suse.cz \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nvdimm@lists.01.org \ --cc=openosd@gmail.com \ --cc=robert.barror@intel.com \ --cc=seema.pandit@intel.com \ --cc=stable@vger.kernel.org \ --cc=willy@infradead.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.