dax_lock_mapping_entry was never safe

From: Matthew Wilcox @ 2018-11-26 16:12 UTC
  To: Dan Williams; +Cc: linux-fsdevel, Jan Kara, linux-nvdimm


I noticed this path while I was doing the 4.19 backport of
"dax: Avoid losing wakeup in dax_lock_mapping_entry":

                xa_unlock_irq(&mapping->i_pages);
                revalidate = wait_fn();
                finish_wait(wq, &ewait.wait);
                xa_lock_irq(&mapping->i_pages);

It's not safe to call xa_lock_irq() if mapping may have been freed while
we slept, because the lock lives inside the mapping.  We'll probably get
away with it; most filesystems use a dedicated slab for their inodes, so
you'll likely get either a freed inode or an inode which is now the wrong
inode.  But if the slab page has been freed back to the page allocator,
that pointer could now be pointing at anything.
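
To illustrate the rule "register on the wait queue before dropping the lock, then never touch the object again", here is a rough user-space analogy using pthreads. It is a sketch, not kernel code: struct waitq, struct object, wait_for_unlock() and demo_wait_without_touching_object() are all hypothetical names invented for this example, and a generation counter on a global queue stands in for the kernel's hashed wait table.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical global wait queue; it outlives every object, like a hashed wait table. */
struct waitq {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	unsigned long gen;	/* bumped on every wakeup */
};

static struct waitq global_wq = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Hypothetical object whose lock lives inside it, like mapping->i_pages. */
struct object {
	pthread_mutex_t lock;
	bool entry_locked;
};

/* Called with obj->lock held.  Returns with no locks held. */
static void wait_for_unlock(struct object *obj, struct waitq *wq)
{
	unsigned long seen;

	/* Register on the queue before dropping the object lock (no lost wakeup). */
	pthread_mutex_lock(&wq->mu);
	seen = wq->gen;
	pthread_mutex_unlock(&obj->lock);
	/* obj may be freed from here on; it must not be dereferenced again. */
	while (wq->gen == seen)
		pthread_cond_wait(&wq->cv, &wq->mu);
	pthread_mutex_unlock(&wq->mu);
}

/* Unlocks the entry, frees the object, and only then wakes the waiter. */
static void *owner_thread(void *arg)
{
	struct object *obj = arg;

	pthread_mutex_lock(&obj->lock);
	obj->entry_locked = false;
	pthread_mutex_unlock(&obj->lock);
	pthread_mutex_destroy(&obj->lock);
	free(obj);

	pthread_mutex_lock(&global_wq.mu);
	global_wq.gen++;
	pthread_cond_broadcast(&global_wq.cv);
	pthread_mutex_unlock(&global_wq.mu);
	return NULL;
}

/* Returns 0 if the waiter completed without touching the freed object. */
int demo_wait_without_touching_object(void)
{
	struct object *obj = malloc(sizeof(*obj));
	pthread_t owner;

	if (!obj)
		return -1;
	pthread_mutex_init(&obj->lock, NULL);
	obj->entry_locked = true;

	pthread_mutex_lock(&obj->lock);
	if (pthread_create(&owner, NULL, owner_thread, obj)) {
		pthread_mutex_unlock(&obj->lock);
		pthread_mutex_destroy(&obj->lock);
		free(obj);
		return -1;
	}
	if (obj->entry_locked)
		wait_for_unlock(obj, &global_wq);
	else
		pthread_mutex_unlock(&obj->lock);
	pthread_join(owner, NULL);
	return 0;
}
```

Calling demo_wait_without_touching_object() from main() and building with -pthread exercises the worst case on purpose: the object is already freed by the time the waiter wakes. A caller that still needs the object would have to look it up again from scratch rather than reuse the stale pointer.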

Fixing this in the current codebase is no easier than fixing it in the
4.19 codebase.  This is the best I've come up with.  Could we do better
by not using the _exclusive form of prepare_to_wait()?  I'm not familiar
with all the things that need to be considered when using this family
of interfaces.

diff --git a/fs/dax.c b/fs/dax.c
index 9bcce89ea18e..154b592b18eb 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -232,6 +232,24 @@ static void *get_unlocked_entry(struct xa_state *xas)
 	}
 }
 
+static void wait_unlocked_entry(struct xa_state *xas, void *entry)
+{
+	struct wait_exceptional_entry_queue ewait;
+	wait_queue_head_t *wq;
+
+	init_wait(&ewait.wait);
+	ewait.wait.func = wake_exceptional_entry_func;
+
+	wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+	prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE);
+	xas_unlock_irq(xas);
+	/* We can no longer look at xas */
+	schedule();
+	finish_wait(wq, &ewait.wait);
+	if (waitqueue_active(wq))
+		__wake_up(wq, TASK_NORMAL, 1, &ewait.key);
+}
+
 static void put_unlocked_entry(struct xa_state *xas, void *entry)
 {
 	/* If we were the only waiter woken, wake the next one */
@@ -389,9 +407,7 @@ bool dax_lock_mapping_entry(struct page *page)
 		entry = xas_load(&xas);
 		if (dax_is_locked(entry)) {
 			rcu_read_unlock();
-			entry = get_unlocked_entry(&xas);
-			xas_unlock_irq(&xas);
-			put_unlocked_entry(&xas, entry);
+			wait_unlocked_entry(&xas, entry);
 			rcu_read_lock();
 			continue;
 		}