From: NeilBrown <neilb@suse.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH 3/3] radix-tree: support locking of individual exception entries.
Date: Sun, 28 Feb 2016 16:09:29 +1100	[thread overview]
Message-ID: <145663616983.3865.11911049648442320016.stgit@notabene> (raw)
In-Reply-To: <145663588892.3865.9987439671424028216.stgit@notabene>

The least significant bit of an exception entry is used as a lock flag.
A caller can:
 - create a locked entry by simply adding an entry with this flag set
 - lock an existing entry with radix_tree_lookup_lock().  This may return
    NULL if the entry doesn't exist, or was deleted while waiting for
    the lock.  It may return a non-exception entry if that is what is
    found.  If it returns a locked entry then the caller has exclusive rights
    to delete the entry.
 - unlock an entry that is already locked.  This will wake any waiters.
 - delete an entry that is locked.  This will wake waiters so that they
   return NULL without looking at the slot in the radix tree.

These must all be called with the radix tree locked (i.e. a spinlock held).
That spinlock is passed to radix_tree_lookup_lock() so that it can drop
the lock while waiting.

This is a "demonstration of concept": I haven't actually tested it, only compiled it.
A possible use case is for the exception entries used by DAX.

It is possible that some of the lookups could be optimised away in some
cases by storing a slot pointer.  I wanted to keep it reasonably
simple until it is determined whether it might be useful.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/radix-tree.h |    8 ++
 lib/radix-tree.c           |  158 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 450c12b546b7..8f579f66574b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -308,6 +308,14 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag);
 unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item);
 
+void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq,
+			     unsigned long index, spinlock_t *lock);
+void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq,
+		       unsigned long index);
+void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq,
+			      unsigned long index);
+
+
 static inline void radix_tree_preload_end(void)
 {
 	preempt_enable();
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 37d4643ab5c0..a24ea002f3eb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1500,3 +1500,161 @@ void __init radix_tree_init(void)
 	radix_tree_init_maxindex();
 	hotcpu_notifier(radix_tree_callback, 0);
 }
+
+/* Exception entry locking.
+ * The least significant bit of an exception entry can be used as a
+ * "locked" flag.  Supported locking operations are:
+ * radix_tree_lookup_lock() - if the indexed entry exists, lock it and
+ *         return the value, else return NULL.  If the indexed entry is not
+ *         exceptional it is returned without locking.
+ * radix_tree_unlock() - release the lock on the indexed entry
+ * radix_tree_delete_unlock() - the entry must be locked.  It will be atomically
+ *     unlocked and removed.  Any threads sleeping in lookup_lock() will return.
+ * Each of these takes a radix_tree_root, a wait_queue_head_t, and an index.
+ * The '*lock' function also takes a spinlock_t which must be held when any
+ * of the functions is called.  *lock will drop the spinlock while waiting for
+ * the entry lock.
+ *
+ * As delete_unlock could free the radix_tree_node, waiters must not touch it
+ * when woken.  We provide a wake function for the waitq which records when the
+ * item has been deleted.
+ *
+ * The wait_queue_head passed should be one that is used for bit_wait, such
+ * as zone->wait_table.  We re-use the 'flags' and 'timeout' fields of the
+ * wait_bit_key to store the root and index that we are waiting for.
+ * __wake_up may only be called on one of these keys while the radix tree
+ * is locked.  The wakeup function will take the lock itself if appropriate, or
+ * may record that the radix tree entry has been deleted.  In either case
+ * the waiting function just looks at the status reported by the wakeup function
+ * and doesn't look at the radix tree itself.
+ *
+ * There is no function for locking an entry while inserting it.  Simply
+ * insert an entry that is already marked as 'locked' - lsb set.
+ *
+ */
+
+struct wait_slot_queue {
+	struct radix_tree_root	*root;
+	unsigned long		index;
+	wait_queue_t		wait;
+	enum {SLOT_WAITING, SLOT_LOCKED, SLOT_GONE} state;
+	void			*ret;
+};
+
+static inline int slot_locked(void **v)
+{
+	unsigned long l = *(unsigned long *)v;
+	return l & 1;
+}
+
+static inline void *lock_slot(void **v)
+{
+	unsigned long *l = (unsigned long *)v;
+	return (void *)(*l |= 1);
+}
+
+static inline void *unlock_slot(void **v)
+{
+	unsigned long *l = (unsigned long *)v;
+	return (void *)(*l &= ~1UL);
+}
+
+static int wake_slot_function(wait_queue_t *wait, unsigned mode, int sync,
+			      void *arg)
+{
+	struct wait_bit_key *key = arg;
+	struct wait_slot_queue *wait_slot =
+		container_of(wait, struct wait_slot_queue, wait);
+	void **slot;
+
+	if (wait_slot->root != key->flags ||
+	    wait_slot->index != key->timeout)
+		/* Not waking this waiter */
+		return 0;
+	if (wait_slot->state != SLOT_WAITING)
+		/* Should be impossible.... */
+		return 1;
+	if (key->bit_nr == -3)
+		/* Was just deleted, no point in doing a lookup */
+		wait_slot->ret = NULL;
+	else
+		wait_slot->ret = __radix_tree_lookup(
+			wait_slot->root, wait_slot->index, NULL, &slot);
+	if (!wait_slot->ret || !radix_tree_exceptional_entry(wait_slot->ret)) {
+		wait_slot->state = SLOT_GONE;
+		return 1;
+	}
+	if (slot_locked(slot))
+		/* still locked */
+		return 0;
+	wait_slot->ret = lock_slot(slot);
+	wait_slot->state = SLOT_LOCKED;
+	return 1;
+}
+
+void *radix_tree_lookup_lock(struct radix_tree_root *root, wait_queue_head_t *wq,
+			     unsigned long index, spinlock_t *lock)
+{
+	void *ret, **slot;
+	struct wait_slot_queue wait;
+
+	ret = __radix_tree_lookup(root, index, NULL, &slot);
+	if (!ret || !radix_tree_exceptional_entry(ret))
+		return ret;
+	if (!slot_locked(slot))
+		return lock_slot(slot);
+
+	wait.wait.private = current;
+	wait.wait.func = wake_slot_function;
+	INIT_LIST_HEAD(&wait.wait.task_list);
+	wait.state = SLOT_WAITING;
+	wait.root = root;
+	wait.index = index;
+	wait.ret = NULL;
+	for (;;) {
+		prepare_to_wait(wq, &wait.wait,
+				TASK_UNINTERRUPTIBLE);
+		if (wait.state != SLOT_WAITING)
+			break;
+
+		spin_unlock(lock);
+		schedule();
+		spin_lock(lock);
+	}
+	finish_wait(wq, &wait.wait);
+	return wait.ret;
+}
+EXPORT_SYMBOL(radix_tree_lookup_lock);
+
+void radix_tree_unlock(struct radix_tree_root *root, wait_queue_head_t *wq,
+			unsigned long index)
+{
+	void *ret, **slot;
+
+	ret = __radix_tree_lookup(root, index, NULL, &slot);
+	if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret)))
+		return;
+	if (WARN_ON_ONCE(!slot_locked(slot)))
+		return;
+	unlock_slot(slot);
+
+	if (waitqueue_active(wq)) {
+		struct wait_bit_key key = {.flags = root, .bit_nr = -2,
+					   .timeout = index};
+		__wake_up(wq, TASK_NORMAL, 1, &key);
+	}
+}
+EXPORT_SYMBOL(radix_tree_unlock);
+
+void radix_tree_delete_unlock(struct radix_tree_root *root, wait_queue_head_t *wq,
+			      unsigned long index)
+{
+	radix_tree_delete(root, index);
+	if (waitqueue_active(wq)) {
+		/* -3 here indicates deletion */
+		struct wait_bit_key key = {.flags = root, .bit_nr = -3,
+					   .timeout = index};
+		__wake_up(wq, TASK_NORMAL, 1, &key);
+	}
+}
+EXPORT_SYMBOL(radix_tree_delete_unlock);


Thread overview: 26+ messages
2016-02-28  5:09 [PATCH 0/3] RFC improvements to radix-tree related to DAX NeilBrown
2016-02-28  5:09 ` [PATCH 3/3] radix-tree: support locking of individual exception entries NeilBrown [this message]
2016-02-28  5:30   ` kbuild test robot
2016-02-28  6:27   ` NeilBrown
2016-03-03 13:10   ` Jan Kara
2016-03-03 23:51     ` NeilBrown
2016-03-04 10:14       ` NeilBrown
2016-03-04 12:31         ` Jan Kara
2016-03-09  2:13           ` NeilBrown
2016-02-28  5:09 ` [PATCH 2/3] radix-tree: make 'indirect' bit available to " NeilBrown
2016-02-29 14:41   ` Wilcox, Matthew R
2016-03-01 21:59     ` Ross Zwisler
2016-02-28  5:09 ` [PATCH 1/3] DAX: move RADIX_DAX_ definitions to dax.c NeilBrown
2016-02-29 14:28   ` Wilcox, Matthew R
2016-02-29 17:46     ` Ross Zwisler
