linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry()
@ 2021-04-28 19:03 Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 1/3] dax: Add an enum for specifying dax wakup mode Vivek Goyal
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Vivek Goyal @ 2021-04-28 19:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-nvdimm, virtio-fs, dan.j.williams
  Cc: willy, jack, groug

Hi,

This is V6. Only change since V5 is that I changed order of WAKE_NEXT
and WAKE_ALL in comments too.

Vivek

Vivek Goyal (3):
  dax: Add an enum for specifying dax wakup mode
  dax: Add a wakeup mode parameter to put_unlocked_entry()
  dax: Wake up all waiters after invalidating dax entry

 fs/dax.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

-- 
2.25.4
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v6 1/3] dax: Add an enum for specifying dax wakup mode
  2021-04-28 19:03 [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Vivek Goyal
@ 2021-04-28 19:03 ` Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 2/3] dax: Add a wakeup mode parameter to put_unlocked_entry() Vivek Goyal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Vivek Goyal @ 2021-04-28 19:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-nvdimm, virtio-fs, dan.j.williams
  Cc: willy, jack, groug

Dan mentioned that he is not very fond of passing around a boolean true/false
to specify if only next waiter should be woken up or all waiters should be
woken up. He instead prefers that we introduce an enum and make it very
explicity at the callsite itself. Easier to read code.

This patch should not introduce any change of behavior.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index b3d27fdc6775..5ecee51c44ee 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -144,6 +144,16 @@ struct wait_exceptional_entry_queue {
 	struct exceptional_entry_key key;
 };
 
+/**
+ * enum dax_wake_mode: waitqueue wakeup behaviour
+ * @WAKE_ALL: wake all waiters in the waitqueue
+ * @WAKE_NEXT: wake only the first waiter in the waitqueue
+ */
+enum dax_wake_mode {
+	WAKE_ALL,
+	WAKE_NEXT,
+};
+
 static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
 		void *entry, struct exceptional_entry_key *key)
 {
@@ -182,7 +192,8 @@ static int wake_exceptional_entry_func(wait_queue_entry_t *wait,
  * The important information it's conveying is whether the entry at
  * this index used to be a PMD entry.
  */
-static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
+static void dax_wake_entry(struct xa_state *xas, void *entry,
+			   enum dax_wake_mode mode)
 {
 	struct exceptional_entry_key key;
 	wait_queue_head_t *wq;
@@ -196,7 +207,7 @@ static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
 	 * must be in the waitqueue and the following check will see them.
 	 */
 	if (waitqueue_active(wq))
-		__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
+		__wake_up(wq, TASK_NORMAL, mode == WAKE_ALL ? 0 : 1, &key);
 }
 
 /*
@@ -268,7 +279,7 @@ static void put_unlocked_entry(struct xa_state *xas, void *entry)
 {
 	/* If we were the only waiter woken, wake the next one */
 	if (entry && !dax_is_conflict(entry))
-		dax_wake_entry(xas, entry, false);
+		dax_wake_entry(xas, entry, WAKE_NEXT);
 }
 
 /*
@@ -286,7 +297,7 @@ static void dax_unlock_entry(struct xa_state *xas, void *entry)
 	old = xas_store(xas, entry);
 	xas_unlock_irq(xas);
 	BUG_ON(!dax_is_locked(old));
-	dax_wake_entry(xas, entry, false);
+	dax_wake_entry(xas, entry, WAKE_NEXT);
 }
 
 /*
@@ -524,7 +535,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 
 		dax_disassociate_entry(entry, mapping, false);
 		xas_store(xas, NULL);	/* undo the PMD join */
-		dax_wake_entry(xas, entry, true);
+		dax_wake_entry(xas, entry, WAKE_ALL);
 		mapping->nrexceptional--;
 		entry = NULL;
 		xas_set(xas, index);
@@ -937,7 +948,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 	xas_lock_irq(xas);
 	xas_store(xas, entry);
 	xas_clear_mark(xas, PAGECACHE_TAG_DIRTY);
-	dax_wake_entry(xas, entry, false);
+	dax_wake_entry(xas, entry, WAKE_NEXT);
 
 	trace_dax_writeback_one(mapping->host, index, count);
 	return ret;
-- 
2.25.4
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6 2/3] dax: Add a wakeup mode parameter to put_unlocked_entry()
  2021-04-28 19:03 [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 1/3] dax: Add an enum for specifying dax wakup mode Vivek Goyal
@ 2021-04-28 19:03 ` Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 3/3] dax: Wake up all waiters after invalidating dax entry Vivek Goyal
  2021-04-28 21:08 ` [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Dan Williams
  3 siblings, 0 replies; 5+ messages in thread
From: Vivek Goyal @ 2021-04-28 19:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-nvdimm, virtio-fs, dan.j.williams
  Cc: willy, jack, groug

As of now put_unlocked_entry() always wakes up next waiter. In next
patches we want to wake up all waiters at one callsite. Hence, add a
parameter to the function.

This patch does not introduce any change of behavior.

Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5ecee51c44ee..56eb1c759ca5 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -275,11 +275,11 @@ static void wait_entry_unlocked(struct xa_state *xas, void *entry)
 	finish_wait(wq, &ewait.wait);
 }
 
-static void put_unlocked_entry(struct xa_state *xas, void *entry)
+static void put_unlocked_entry(struct xa_state *xas, void *entry,
+			       enum dax_wake_mode mode)
 {
-	/* If we were the only waiter woken, wake the next one */
 	if (entry && !dax_is_conflict(entry))
-		dax_wake_entry(xas, entry, WAKE_NEXT);
+		dax_wake_entry(xas, entry, mode);
 }
 
 /*
@@ -633,7 +633,7 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping,
 			entry = get_unlocked_entry(&xas, 0);
 		if (entry)
 			page = dax_busy_page(entry);
-		put_unlocked_entry(&xas, entry);
+		put_unlocked_entry(&xas, entry, WAKE_NEXT);
 		if (page)
 			break;
 		if (++scanned % XA_CHECK_SCHED)
@@ -675,7 +675,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	mapping->nrexceptional--;
 	ret = 1;
 out:
-	put_unlocked_entry(&xas, entry);
+	put_unlocked_entry(&xas, entry, WAKE_NEXT);
 	xas_unlock_irq(&xas);
 	return ret;
 }
@@ -954,7 +954,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
 	return ret;
 
  put_unlocked:
-	put_unlocked_entry(xas, entry);
+	put_unlocked_entry(xas, entry, WAKE_NEXT);
 	return ret;
 }
 
@@ -1695,7 +1695,7 @@ dax_insert_pfn_mkwrite(struct vm_fault *vmf, pfn_t pfn, unsigned int order)
 	/* Did we race with someone splitting entry or so? */
 	if (!entry || dax_is_conflict(entry) ||
 	    (order == 0 && !dax_is_pte_entry(entry))) {
-		put_unlocked_entry(&xas, entry);
+		put_unlocked_entry(&xas, entry, WAKE_NEXT);
 		xas_unlock_irq(&xas);
 		trace_dax_insert_pfn_mkwrite_no_entry(mapping->host, vmf,
 						      VM_FAULT_NOPAGE);
-- 
2.25.4
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v6 3/3] dax: Wake up all waiters after invalidating dax entry
  2021-04-28 19:03 [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 1/3] dax: Add an enum for specifying dax wakup mode Vivek Goyal
  2021-04-28 19:03 ` [PATCH v6 2/3] dax: Add a wakeup mode parameter to put_unlocked_entry() Vivek Goyal
@ 2021-04-28 19:03 ` Vivek Goyal
  2021-04-28 21:08 ` [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Dan Williams
  3 siblings, 0 replies; 5+ messages in thread
From: Vivek Goyal @ 2021-04-28 19:03 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, linux-nvdimm, virtio-fs, dan.j.williams
  Cc: willy, jack, groug, Sergio Lopez

I am seeing missed wakeups which ultimately lead to a deadlock when I am
using virtiofs with DAX enabled and running "make -j". I had to mount
virtiofs as rootfs and also reduce to dax window size to 256M to reproduce
the problem consistently.

So here is the problem. put_unlocked_entry() wakes up waiters only
if entry is not null as well as !dax_is_conflict(entry). But if I
call multiple instances of invalidate_inode_pages2() in parallel,
then I can run into a situation where there are waiters on
this index but nobody will wake these waiters.

invalidate_inode_pages2()
  invalidate_inode_pages2_range()
    invalidate_exceptional_entry2()
      dax_invalidate_mapping_entry_sync()
        __dax_invalidate_entry() {
                xas_lock_irq(&xas);
                entry = get_unlocked_entry(&xas, 0);
                ...
                ...
                dax_disassociate_entry(entry, mapping, trunc);
                xas_store(&xas, NULL);
                ...
                ...
                put_unlocked_entry(&xas, entry);
                xas_unlock_irq(&xas);
        }

Say a fault in in progress and it has locked entry at offset say "0x1c".
Now say three instances of invalidate_inode_pages2() are in progress
(A, B, C) and they all try to invalidate entry at offset "0x1c". Given
dax entry is locked, all tree instances A, B, C will wait in wait queue.

When dax fault finishes, say A is woken up. It will store NULL entry
at index "0x1c" and wake up B. When B comes along it will find "entry=0"
at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And
this means put_unlocked_entry() will not wake up next waiter, given
the current code. And that means C continues to wait and is not woken
up.

This patch fixes the issue by waking up all waiters when a dax entry
has been invalidated. This seems to fix the deadlock I am facing
and I can make forward progress.

Reported-by: Sergio Lopez <slp@redhat.com>
Fixes: ac401cc78242 ("dax: New fault locking")
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 56eb1c759ca5..df5485b4bddf 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -675,7 +675,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	mapping->nrexceptional--;
 	ret = 1;
 out:
-	put_unlocked_entry(&xas, entry, WAKE_NEXT);
+	put_unlocked_entry(&xas, entry, WAKE_ALL);
 	xas_unlock_irq(&xas);
 	return ret;
 }
-- 
2.25.4
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry()
  2021-04-28 19:03 [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Vivek Goyal
                   ` (2 preceding siblings ...)
  2021-04-28 19:03 ` [PATCH v6 3/3] dax: Wake up all waiters after invalidating dax entry Vivek Goyal
@ 2021-04-28 21:08 ` Dan Williams
  3 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2021-04-28 21:08 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-fsdevel, linux-kernel, linux-nvdimm, virtio-fs, willy, jack, groug

On Wed, Apr 28, 2021 at 12:03 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> Hi,
>
> This is V6. Only change since V5 is that I changed order of WAKE_NEXT
> and WAKE_ALL in comments too.
>
> Vivek
>
> Vivek Goyal (3):
>   dax: Add an enum for specifying dax wakup mode
>   dax: Add a wakeup mode parameter to put_unlocked_entry()
>   dax: Wake up all waiters after invalidating dax entry

Thanks Vivek, looks good.
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-28 21:08 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 19:03 [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Vivek Goyal
2021-04-28 19:03 ` [PATCH v6 1/3] dax: Add an enum for specifying dax wakup mode Vivek Goyal
2021-04-28 19:03 ` [PATCH v6 2/3] dax: Add a wakeup mode parameter to put_unlocked_entry() Vivek Goyal
2021-04-28 19:03 ` [PATCH v6 3/3] dax: Wake up all waiters after invalidating dax entry Vivek Goyal
2021-04-28 21:08 ` [PATCH v6 0/3] dax: Fix missed wakeup in put_unlocked_entry() Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).