* [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list
@ 2017-10-04 21:20 Waiman Long
  2017-10-04 21:20 ` [PATCH v6 1/6] lib/dlock-list: Distributed and lock-protected lists Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long

v5->v6:
 - Rebased the patch to 4.14-rc3.
 - Drop the fsnotify patch as it had already been merged.
 - Add a new patch 5 with an alternative way of selecting a list by
   hashing instead of by CPU number.
 - Add a new patch 6 to provide a set of irq-safe APIs to be used in
   interrupt context.
 - Update the CPU to index mapping code.

v4->v5:
 - Rebased the patch to 4.8-rc1 (changes to fs/fs-writeback.c were
   dropped).
 - Use kcalloc() instead of percpu_alloc() to allocate the dlock list
   heads structure as suggested by Christoph Lameter.
 - Replaced patch 5 by another one that made sibling CPUs use the same
   dlock list head thus reducing the number of list heads that needed
   to be maintained.

v3->v4:
 - As suggested by Al, encapsulate the dlock list mechanism into the
   dlist_for_each_entry() and dlist_for_each_entry_safe() macros,
   which are the equivalents of list_for_each_entry() and
   list_for_each_entry_safe() for a regular linked list. That
   simplifies the changes at the call sites that perform dlock list
   iterations.
 - Add a new patch to make the percpu head structure cacheline aligned
   to prevent cacheline contention from disrupting the performance
   of nearby percpu variables.

v2->v3:
 - Remove the 2 per-subnode API patches.
 - Merge __percpu tag patch 2 into patch 1.
 - As suggested by Tejun Heo, restructure the dlock_list_head data
   structure to hide the __percpu tag and rename some of the functions
   and structures.
 - Move most of the code from dlock_list.h to dlock_list.c and export
   the symbols.

v1->v2:
 - Add a set of simple per-subnode APIs that are between percpu and
   per-node in granularity.
 - Make the dlock list use the per-subnode APIs so as to reduce the
   total number of separate linked lists that need to be managed
   and iterated.
 - There is no change in patches 1-5.

This is a follow up of the following patchset:

  [PATCH v7 0/4] vfs: Use per-cpu list for SB's s_inodes list
  https://lkml.org/lkml/2016/4/12/1009

This patchset provides new APIs for a set of distributed locked lists
(one per CPU core) to minimize lock and cacheline contention. Insertion
into and deletion from the lists are cheap and relatively contention
free. Lookup, on the other hand, may be a bit more costly as there are
multiple lists to iterate. This is not really a problem for the
replacement of the superblock's inode list by a dlock list in this
patchset, as lookup isn't needed there.

For use cases that need to do lookup, the dlock list can also be
treated as a set of hashed lists that scales with the number of CPU
cores in the system.
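
As a rough sketch of the intended usage (the foo_table/foo structures,
the table and f pointers, and the field names below are made up for
illustration; only the dlock-list calls come from this patchset):

  struct foo_table {
          struct dlock_list_heads entries;  /* one sub-list per CPU core */
  };

  struct foo {
          struct dlock_list_node link;      /* replaces struct list_head */
  };

  /* Allocate and initialize the array of per-CPU list heads. */
  if (alloc_dlock_list_heads(&table->entries))
          return -ENOMEM;

  /* Insert an object; the sub-list of the current CPU is selected. */
  init_dlock_list_node(&f->link);
  dlock_lists_add(&f->link, &table->entries);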

Patch 1 introduces the dlock list. The list heads are allocated
by kcalloc() instead of percpu_alloc(). Each list head entry is
cacheline aligned to minimize contention.

Patch 2 replaces the use of list_for_each_entry_safe() in
evict_inodes() and invalidate_inodes() by list_for_each_entry().

Patch 3 modifies the superblock and inode structures to use the dlock
list. The corresponding functions that reference those structures
are modified.

Patch 4 makes the sibling CPUs use the same dlock list head to reduce
the number of list heads that need to be iterated.

Patch 5 enables an alternative use case of the dlock list as a set of
hashed lists.

Patch 6 provides irq-safe APIs to be used in interrupt context.

Jan Kara (1):
  vfs: Remove unnecessary list_for_each_entry_safe() variants

Waiman Long (5):
  lib/dlock-list: Distributed and lock-protected lists
  vfs: Use dlock list for superblock's inode list
  lib/dlock-list: Make sibling CPUs share the same linked list
  lib/dlock-list: Enable faster lookup with hashing
  lib/dlock-list: Provide IRQ-safe APIs

 fs/block_dev.c             |   9 +-
 fs/drop_caches.c           |   9 +-
 fs/inode.c                 |  38 ++---
 fs/notify/fsnotify.c       |   9 +-
 fs/quota/dquot.c           |  14 +-
 fs/super.c                 |   7 +-
 include/linux/dlock-list.h | 297 ++++++++++++++++++++++++++++++++++++
 include/linux/fs.h         |   8 +-
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 366 +++++++++++++++++++++++++++++++++++++++++++++
 10 files changed, 705 insertions(+), 54 deletions(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v6 1/6] lib/dlock-list: Distributed and lock-protected lists
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-04 21:20 ` [PATCH v6 2/6] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long, Waiman Long

Linked lists are used everywhere in the Linux kernel. However, if many
threads are trying to add or delete entries in the same linked list,
it can become a performance bottleneck.

This patch introduces a new set of list APIs that provides a set of
distributed lists (one per CPU), each of which is protected by its own
spinlock.  To the callers, however, the set of lists acts like a single
consolidated list.  This allows list entry insertion and deletion
operations to happen in parallel instead of being serialized with a
global list and lock.

List entry insertion is strictly per CPU. List deletion, however, can
happen on a CPU other than the one that did the insertion, so we still
need a lock to protect the list. Because of that, there may still be
a small amount of contention when deletions are being done.

A new header file include/linux/dlock-list.h will be added with the
associated dlock_list_head and dlock_list_node structures. The following
functions are provided to manage the per-cpu list:

 1. int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
 2. void free_dlock_list_heads(struct dlock_list_heads *dlist)
 3. void dlock_lists_add(struct dlock_list_node *node,
		         struct dlock_list_heads *dlist)
 4. void dlock_lists_del(struct dlock_list_node *node)

Iteration of all the list entries within a dlock list array
is done by calling either the dlist_for_each_entry() or
dlist_for_each_entry_safe() macros. They correspond to the
list_for_each_entry() and list_for_each_entry_safe() macros
respectively. The iteration state is kept in a dlock_list_iter
structure that is passed to the iteration macros.
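
As a minimal iteration sketch (struct foo, its bar member and
foo_process() are hypothetical, used here only to illustrate the
macros):

  struct foo {
          struct dlock_list_node bar;
  };

  void foo_walk_all(struct dlock_list_heads *dlist)
  {
          struct foo *pos;
          DEFINE_DLOCK_LIST_ITER(iter, dlist);

          dlist_for_each_entry(pos, &iter, bar) {
                  /* The sub-list containing pos is locked here. */
                  foo_process(pos);
          }
  }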

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dlock-list.h | 223 +++++++++++++++++++++++++++++++++++++++++++
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 231 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 455 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
new file mode 100644
index 0000000..4a4bc44
--- /dev/null
+++ b/include/linux/dlock-list.h
@@ -0,0 +1,223 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ *
+ * Authors: Waiman Long <waiman.long@hpe.com>
+ */
+#ifndef __LINUX_DLOCK_LIST_H
+#define __LINUX_DLOCK_LIST_H
+
+#include <linux/spinlock.h>
+#include <linux/list.h>
+
+/*
+ * include/linux/dlock-list.h
+ *
+ * The dlock_list_head structure contains the spinlock. It is cacheline
+ * aligned to reduce contention among different CPUs. The dlock_list_node
+ * structure, on the other hand, contains a pointer to its head entry instead.
+ */
+struct dlock_list_head {
+	struct list_head list ____cacheline_aligned_in_smp;
+	spinlock_t lock;
+};
+
+struct dlock_list_heads {
+	struct dlock_list_head *heads;
+};
+
+/*
+ * dlock list node data structure
+ */
+struct dlock_list_node {
+	struct list_head list;
+	struct dlock_list_head *head;
+};
+
+/*
+ * dlock list iteration state
+ *
+ * This is an opaque data structure that may change. Users of this structure
+ * should not access the structure members directly other than using the
+ * helper functions and macros provided in this header file.
+ */
+struct dlock_list_iter {
+	int index;
+	struct dlock_list_head *head, *entry;
+};
+
+#define DLOCK_LIST_ITER_INIT(dlist)		\
+	{					\
+		.index = -1,			\
+		.head = (dlist)->heads,		\
+	}
+
+#define DEFINE_DLOCK_LIST_ITER(s, heads)	\
+	struct dlock_list_iter s = DLOCK_LIST_ITER_INIT(heads)
+
+static inline void init_dlock_list_iter(struct dlock_list_iter *iter,
+					struct dlock_list_heads *heads)
+{
+	*iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT(heads);
+}
+
+#define DLOCK_LIST_NODE_INIT(name)		\
+	{					\
+		.list = LIST_HEAD_INIT(name)	\
+	}
+
+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+	*node = (struct dlock_list_node)DLOCK_LIST_NODE_INIT(node->list);
+}
+
+/**
+ * dlock_list_unlock - unlock the spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_unlock(struct dlock_list_iter *iter)
+{
+	spin_unlock(&iter->entry->lock);
+}
+
+/**
+ * dlock_list_relock - lock the spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_relock(struct dlock_list_iter *iter)
+{
+	spin_lock(&iter->entry->lock);
+}
+
+/*
+ * Allocation and freeing of dlock list
+ */
+extern int  alloc_dlock_list_heads(struct dlock_list_heads *dlist);
+extern void free_dlock_list_heads(struct dlock_list_heads *dlist);
+
+/*
+ * Check if a dlock list is empty or not.
+ */
+extern bool dlock_lists_empty(struct dlock_list_heads *dlist);
+
+/*
+ * The dlock list addition and deletion functions here are not irq-safe.
+ * Special irq-safe variants will have to be added if we need them.
+ */
+extern void dlock_lists_add(struct dlock_list_node *node,
+			    struct dlock_list_heads *dlist);
+extern void dlock_lists_del(struct dlock_list_node *node);
+
+/*
+ * Find the first entry of the next available list.
+ */
+extern struct dlock_list_node *
+__dlock_list_next_list(struct dlock_list_iter *iter);
+
+/**
+ * __dlock_list_next_entry - Iterate to the next entry of the dlock list
+ * @curr : Pointer to the current dlock_list_node structure
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: Pointer to the next entry or NULL if all the entries are iterated
+ *
+ * The iterator has to be properly initialized before calling this function.
+ */
+static inline struct dlock_list_node *
+__dlock_list_next_entry(struct dlock_list_node *curr,
+			struct dlock_list_iter *iter)
+{
+	/*
+	 * Find next entry
+	 */
+	if (curr)
+		curr = list_next_entry(curr, list);
+
+	if (!curr || (&curr->list == &iter->entry->list)) {
+		/*
+		 * The current list has been exhausted, try the next available
+		 * list.
+		 */
+		curr = __dlock_list_next_list(iter);
+	}
+
+	return curr;	/* Continue the iteration */
+}
+
+/**
+ * dlock_list_first_entry - get the first element from a list
+ * @iter  : The dlock list iterator.
+ * @type  : The type of the struct this is embedded in.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to the next entry or NULL if all the entries are iterated.
+ */
+#define dlock_list_first_entry(iter, type, member)			\
+	({								\
+		struct dlock_list_node *_n;				\
+		_n = __dlock_list_next_entry(NULL, iter);		\
+		_n ? list_entry(_n, type, member) : NULL;		\
+	})
+
+/**
+ * dlock_list_next_entry - iterate to the next entry of the list
+ * @pos   : The type * to cursor
+ * @iter  : The dlock list iterator.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to the next entry or NULL if all the entries are iterated.
+ *
+ * Note that pos can't be NULL.
+ */
+#define dlock_list_next_entry(pos, iter, member)			\
+	({								\
+		struct dlock_list_node *_n;				\
+		_n = __dlock_list_next_entry(&(pos)->member, iter);	\
+		_n ? list_entry(_n, typeof(*(pos)), member) : NULL;	\
+	})
+
+/**
+ * dlist_for_each_entry - iterate over the dlock list
+ * @pos   : Type * to use as a loop cursor
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro isn't safe with respect to list entry removal, but
+ * it can correctly iterate newly added entries right after the current one.
+ * This iteration function is designed to be used in a while loop.
+ */
+#define dlist_for_each_entry(pos, iter, member)				\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	     pos != NULL;						\
+	     pos = dlock_list_next_entry(pos, iter, member))
+
+/**
+ * dlist_for_each_entry_safe - iterate over the dlock list & safe over removal
+ * @pos   : Type * to use as a loop cursor
+ * @n	  : Another type * to use as temporary storage
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro is safe with respect to list entry removal.
+ * However, it cannot correctly iterate newly added entries right after the
+ * current one.
+ */
+#define dlist_for_each_entry_safe(pos, n, iter, member)			\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	    ({								\
+		bool _b = (pos != NULL);				\
+		if (_b)							\
+			n = dlock_list_next_entry(pos, iter, member);	\
+		_b;							\
+	    });								\
+	    pos = n)
+
+#endif /* __LINUX_DLOCK_LIST_H */
diff --git a/lib/Makefile b/lib/Makefile
index dafa796..0536cd3 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -38,7 +38,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o usercopy.o errseq.o
+	 once.o refcount.o usercopy.o errseq.o dlock-list.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
new file mode 100644
index 0000000..2779e3e
--- /dev/null
+++ b/lib/dlock-list.c
@@ -0,0 +1,231 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ * (C) Copyright 2017 Red Hat, Inc.
+ *
+ * Authors: Waiman Long <longman@redhat.com>
+ */
+#include <linux/dlock-list.h>
+#include <linux/lockdep.h>
+#include <linux/slab.h>
+#include <linux/cpumask.h>
+
+/*
+ * The distributed and locked list is a distributed set of lists each of
+ * which is protected by its own spinlock, but acts like a single
+ * consolidated list to the callers. For scaling purpose, the number of
+ * lists used is equal to the number of possible CPUs in the system to
+ * minimize contention.
+ *
+ * However, it is possible that individual CPU numbers may be equal to
+ * or greater than the number of possible CPUs when there are holes in
+ * the CPU number list. As a result, we need to map the CPU number to a
+ * list index.
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(int, cpu2idx);
+
+/*
+ * As all the locks in the dlock list are dynamically allocated, they need
+ * to belong to their own special lock class to avoid warning and stack
+ * trace in kernel log when lockdep is enabled. Statically allocated locks
+ * don't have this problem.
+ */
+static struct lock_class_key dlock_list_key;
+
+/*
+ * Initialize cpu2idx mapping table
+ *
+ * It is possible that a dlock-list can be allocated before the cpu2idx is
+ * initialized. In this case, all the cpus are mapped to the first entry
+ * before initialization.
+ *
+ */
+static int __init cpu2idx_init(void)
+{
+	int idx, cpu;
+
+	idx = 0;
+	for_each_possible_cpu(cpu)
+		per_cpu(cpu2idx, cpu) = idx++;
+	return 0;
+}
+postcore_initcall(cpu2idx_init);
+
+/**
+ * alloc_dlock_list_heads - Initialize and allocate the list of head entries
+ * @dlist: Pointer to the dlock_list_heads structure to be initialized
+ * Return: 0 if successful, -ENOMEM if memory allocation error
+ *
+ * This function does not allocate the dlock_list_heads structure itself. The
+ * callers will have to do their own memory allocation, if necessary. However,
+ * this allows embedding the dlock_list_heads structure directly into other
+ * structures.
+ */
+int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
+{
+	int idx;
+
+	dlist->heads = kcalloc(nr_cpu_ids, sizeof(struct dlock_list_head),
+			       GFP_KERNEL);
+
+	if (!dlist->heads)
+		return -ENOMEM;
+
+	for (idx = 0; idx < nr_cpu_ids; idx++) {
+		struct dlock_list_head *head = &dlist->heads[idx];
+
+		INIT_LIST_HEAD(&head->list);
+		head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
+		lockdep_set_class(&head->lock, &dlock_list_key);
+	}
+	return 0;
+}
+
+/**
+ * free_dlock_list_heads - Free all the heads entries of the dlock list
+ * @dlist: Pointer of the dlock_list_heads structure to be freed
+ *
+ * This function doesn't free the dlock_list_heads structure itself. So
+ * the caller will have to do it, if necessary.
+ */
+void free_dlock_list_heads(struct dlock_list_heads *dlist)
+{
+	kfree(dlist->heads);
+	dlist->heads = NULL;
+}
+
+/**
+ * dlock_lists_empty - Check if all the dlock lists are empty
+ * @dlist: Pointer to the dlock_list_heads structure
+ * Return: true if list is empty, false otherwise.
+ *
+ * This can be a pretty expensive function call. If this function is required
+ * in a performance critical path, we may have to maintain a global count
+ * of the list entries in the global dlock_list_heads structure instead.
+ */
+bool dlock_lists_empty(struct dlock_list_heads *dlist)
+{
+	int idx;
+
+	for (idx = 0; idx < nr_cpu_ids; idx++)
+		if (!list_empty(&dlist->heads[idx].list))
+			return false;
+	return true;
+}
+
+/**
+ * dlock_lists_add - Adds a node to the given dlock list
+ * @node : The node to be added
+ * @dlist: The dlock list where the node is to be added
+ *
+ * List selection is based on the CPU being used when the dlock_lists_add()
+ * function is called. However, deletion may be done by a different CPU.
+ */
+void dlock_lists_add(struct dlock_list_node *node,
+		     struct dlock_list_heads *dlist)
+{
+	struct dlock_list_head *head = &dlist->heads[this_cpu_read(cpu2idx)];
+
+	/*
+	 * There is no need to disable preemption
+	 */
+	spin_lock(&head->lock);
+	node->head = head;
+	list_add(&node->list, &head->list);
+	spin_unlock(&head->lock);
+}
+
+/**
+ * dlock_lists_del - Delete a node from a dlock list
+ * @node : The node to be deleted
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent deletion of the same node. If the lock pointer changes
+ * (becomes NULL or to a different one), we assume that the deletion was done
+ * elsewhere. A warning will be printed if this happens as it is likely to be
+ * a bug.
+ */
+void dlock_lists_del(struct dlock_list_node *node)
+{
+	struct dlock_list_head *head;
+	bool retry;
+
+	do {
+		head = READ_ONCE(node->head);
+		if (WARN_ONCE(!head, "%s: node 0x%lx has no associated head\n",
+			      __func__, (unsigned long)node))
+			return;
+
+		spin_lock(&head->lock);
+		if (likely(head == node->head)) {
+			list_del_init(&node->list);
+			node->head = NULL;
+			retry = false;
+		} else {
+			/*
+			 * The lock has somehow changed. Retry again if it is
+			 * not NULL. Otherwise, just ignore the delete
+			 * operation.
+			 */
+			retry = (node->head != NULL);
+		}
+		spin_unlock(&head->lock);
+	} while (retry);
+}
+
+/**
+ * __dlock_list_next_list: Find the first entry of the next available list
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: Pointer to the first entry of the next non-empty list, or NULL
+ *	   if all the lists have been exhausted
+ *
+ * The information about the next available list will be put into the iterator.
+ */
+struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
+{
+	struct dlock_list_node *next;
+	struct dlock_list_head *head;
+
+restart:
+	if (iter->entry) {
+		spin_unlock(&iter->entry->lock);
+		iter->entry = NULL;
+	}
+
+next_list:
+	/*
+	 * Try next list
+	 */
+	if (++iter->index >= nr_cpu_ids)
+		return NULL;	/* All the entries iterated */
+
+	if (list_empty(&iter->head[iter->index].list))
+		goto next_list;
+
+	head = iter->entry = &iter->head[iter->index];
+	spin_lock(&head->lock);
+	/*
+	 * There is a slight chance that the list may become empty just
+	 * before the lock is acquired. So an additional check is
+	 * needed to make sure that a valid node will be returned.
+	 */
+	if (list_empty(&head->list))
+		goto restart;
+
+	next = list_entry(head->list.next, struct dlock_list_node,
+			  list);
+	WARN_ON_ONCE(next->head != head);
+
+	return next;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 2/6] vfs: Remove unnecessary list_for_each_entry_safe() variants
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
  2017-10-04 21:20 ` [PATCH v6 1/6] lib/dlock-list: Distributed and lock-protected lists Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-04 21:20 ` [PATCH v6 3/6] vfs: Use dlock list for superblock's inode list Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso, Jan Kara,
	Waiman Long

From: Jan Kara <jack@suse.cz>

evict_inodes() and invalidate_inodes() use list_for_each_entry_safe()
to iterate sb->s_inodes list. However, since we use i_lru list entry for
our local temporary list of inodes to destroy, the inode is guaranteed
to stay in sb->s_inodes list while we hold sb->s_inode_list_lock. So
there is no real need for safe iteration variant and we can use
list_for_each_entry() just fine.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 fs/inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index d1e35b5..e70e5fcb 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -601,12 +601,12 @@ static void dispose_list(struct list_head *head)
  */
 void evict_inodes(struct super_block *sb)
 {
-	struct inode *inode, *next;
+	struct inode *inode;
 	LIST_HEAD(dispose);
 
 again:
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		if (atomic_read(&inode->i_count))
 			continue;
 
@@ -652,11 +652,11 @@ void evict_inodes(struct super_block *sb)
 int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 {
 	int busy = 0;
-	struct inode *inode, *next;
+	struct inode *inode;
 	LIST_HEAD(dispose);
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 3/6] vfs: Use dlock list for superblock's inode list
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
  2017-10-04 21:20 ` [PATCH v6 1/6] lib/dlock-list: Distributed and lock-protected lists Waiman Long
  2017-10-04 21:20 ` [PATCH v6 2/6] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long, Waiman Long

When many threads are trying to add inodes to or delete inodes from
a superblock's s_inodes list, spinlock contention on the list can
become a performance bottleneck.

This patch changes the s_inodes field to become a dlock list, which
is a distributed set of lists with per-list spinlocks.  As a result,
the following superblock inode list (sb->s_inodes) iteration functions
in the VFS are modified as well (a sketch of the conversion pattern is
shown after the list):

 1. iterate_bdevs()
 2. drop_pagecache_sb()
 3. evict_inodes()
 4. invalidate_inodes()
 5. fsnotify_unmount_inodes()
 6. add_dquot_ref()
 7. remove_dquot_ref()
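
The change at each of these call sites follows the same basic pattern,
roughly (loop bodies elided):

  Before:
          spin_lock(&sb->s_inode_list_lock);
          list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                  ...
                  spin_unlock(&sb->s_inode_list_lock);
                  /* sleepable work on the inode */
                  spin_lock(&sb->s_inode_list_lock);
          }
          spin_unlock(&sb->s_inode_list_lock);

  After:
          DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);

          dlist_for_each_entry(inode, &iter, i_sb_list) {
                  ...
                  dlock_list_unlock(&iter);
                  /* sleepable work on the inode */
                  dlock_list_relock(&iter);
          }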

An exit microbenchmark was used that creates a large number of
threads, attaches many inodes to them and then exits. The runtimes of
that microbenchmark with 1200 threads before and after the patch on a
2-socket Intel E5-2699 v3 system (36 cores, 72 threads) were as
follows:

  Kernel            Elapsed Time    System Time
  ------            ------------    -----------
  Vanilla 4.14-rc3     58.40s         65m20s
  Patched 4.14-rc3     42.17s         45m32s

Before the patch, spinlock contention in the inode_sb_list_add()
function during the startup phase and in the evict() function during
the teardown phase accounted for about 68% and 97% of total CPU time
respectively (as measured by perf). With the patched kernel,
inode_sb_list_add() is at less than 1% and evict() is at 2.5%
respectively. There was still quite a lot of lock contention elsewhere
that limited the gain. For example, the nlru lock contributed most of
the lock contention in the teardown phase.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c       |  9 ++++-----
 fs/drop_caches.c     |  9 ++++-----
 fs/inode.c           | 34 +++++++++++++---------------------
 fs/notify/fsnotify.c |  9 ++++-----
 fs/quota/dquot.c     | 14 ++++++--------
 fs/super.c           |  7 ++++---
 include/linux/fs.h   |  8 ++++----
 7 files changed, 39 insertions(+), 51 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088f..6a885ef 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2125,9 +2125,9 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 {
 	struct inode *inode, *old_inode = NULL;
+	DEFINE_DLOCK_LIST_ITER(iter, &blockdev_superblock->s_inodes);
 
-	spin_lock(&blockdev_superblock->s_inode_list_lock);
-	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		struct address_space *mapping = inode->i_mapping;
 		struct block_device *bdev;
 
@@ -2139,7 +2139,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 		}
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
-		spin_unlock(&blockdev_superblock->s_inode_list_lock);
+		dlock_list_unlock(&iter);
 		/*
 		 * We hold a reference to 'inode' so it couldn't have been
 		 * removed from s_inodes list while we dropped the
@@ -2157,8 +2157,7 @@ void iterate_bdevs(void (*func)(struct block_device *, void *), void *arg)
 			func(bdev, arg);
 		mutex_unlock(&bdev->bd_mutex);
 
-		spin_lock(&blockdev_superblock->s_inode_list_lock);
+		dlock_list_relock(&iter);
 	}
-	spin_unlock(&blockdev_superblock->s_inode_list_lock);
 	iput(old_inode);
 }
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d72d52b..8db1374 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -16,9 +16,9 @@
 static void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
 	struct inode *inode, *toput_inode = NULL;
+	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		spin_lock(&inode->i_lock);
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
 		    (inode->i_mapping->nrpages == 0)) {
@@ -27,15 +27,14 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 		}
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
+		dlock_list_unlock(&iter);
 
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
 		iput(toput_inode);
 		toput_inode = inode;
 
-		spin_lock(&sb->s_inode_list_lock);
+		dlock_list_relock(&iter);
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 	iput(toput_inode);
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index e70e5fcb..7bfbe6b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -28,7 +28,7 @@
  *   inode->i_state, inode->i_hash, __iget()
  * Inode LRU list locks protect:
  *   inode->i_sb->s_inode_lru, inode->i_lru
- * inode->i_sb->s_inode_list_lock protects:
+ * inode->i_sb->s_inodes->head->lock protects:
  *   inode->i_sb->s_inodes, inode->i_sb_list
  * bdi->wb.list_lock protects:
  *   bdi->wb.b_{dirty,io,more_io,dirty_time}, inode->i_io_list
@@ -37,7 +37,7 @@
  *
  * Lock ordering:
  *
- * inode->i_sb->s_inode_list_lock
+ * inode->i_sb->s_inodes->head->lock
  *   inode->i_lock
  *     Inode LRU list locks
  *
@@ -45,7 +45,7 @@
  *   inode->i_lock
  *
  * inode_hash_lock
- *   inode->i_sb->s_inode_list_lock
+ *   inode->i_sb->s_inodes->head->lock
  *   inode->i_lock
  *
  * iunique_lock
@@ -434,19 +434,14 @@ static void inode_lru_list_del(struct inode *inode)
  */
 void inode_sb_list_add(struct inode *inode)
 {
-	spin_lock(&inode->i_sb->s_inode_list_lock);
-	list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
-	spin_unlock(&inode->i_sb->s_inode_list_lock);
+	dlock_lists_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
 }
 EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	if (!list_empty(&inode->i_sb_list)) {
-		spin_lock(&inode->i_sb->s_inode_list_lock);
-		list_del_init(&inode->i_sb_list);
-		spin_unlock(&inode->i_sb->s_inode_list_lock);
-	}
+	if (!list_empty(&inode->i_sb_list.list))
+		dlock_lists_del(&inode->i_sb_list);
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -602,11 +597,12 @@ static void dispose_list(struct list_head *head)
 void evict_inodes(struct super_block *sb)
 {
 	struct inode *inode;
+	struct dlock_list_iter iter;
 	LIST_HEAD(dispose);
 
 again:
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	init_dlock_list_iter(&iter, &sb->s_inodes);
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		if (atomic_read(&inode->i_count))
 			continue;
 
@@ -627,13 +623,12 @@ void evict_inodes(struct super_block *sb)
 		 * bit so we don't livelock.
 		 */
 		if (need_resched()) {
-			spin_unlock(&sb->s_inode_list_lock);
+			dlock_list_unlock(&iter);
 			cond_resched();
 			dispose_list(&dispose);
 			goto again;
 		}
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
 }
@@ -654,9 +649,9 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	int busy = 0;
 	struct inode *inode;
 	LIST_HEAD(dispose);
+	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);
@@ -678,7 +673,6 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
 
@@ -893,7 +887,7 @@ struct inode *new_inode_pseudo(struct super_block *sb)
 		spin_lock(&inode->i_lock);
 		inode->i_state = 0;
 		spin_unlock(&inode->i_lock);
-		INIT_LIST_HEAD(&inode->i_sb_list);
+		init_dlock_list_node(&inode->i_sb_list);
 	}
 	return inode;
 }
@@ -914,8 +908,6 @@ struct inode *new_inode(struct super_block *sb)
 {
 	struct inode *inode;
 
-	spin_lock_prefetch(&sb->s_inode_list_lock);
-
 	inode = new_inode_pseudo(sb);
 	if (inode)
 		inode_sb_list_add(inode);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 0c4583b..136a68a 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -51,9 +51,9 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
 void fsnotify_unmount_inodes(struct super_block *sb)
 {
 	struct inode *inode, *iput_inode = NULL;
+	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		/*
 		 * We cannot __iget() an inode in state I_FREEING,
 		 * I_WILL_FREE, or I_NEW which is fine because by that point
@@ -78,7 +78,7 @@ void fsnotify_unmount_inodes(struct super_block *sb)
 
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
+		dlock_list_unlock(&iter);
 
 		if (iput_inode)
 			iput(iput_inode);
@@ -90,9 +90,8 @@ void fsnotify_unmount_inodes(struct super_block *sb)
 
 		iput_inode = inode;
 
-		spin_lock(&sb->s_inode_list_lock);
+		dlock_list_relock(&iter);
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 
 	if (iput_inode)
 		iput(iput_inode);
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 50b0556..983c7be 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -936,12 +936,12 @@ static int dqinit_needed(struct inode *inode, int type)
 static void add_dquot_ref(struct super_block *sb, int type)
 {
 	struct inode *inode, *old_inode = NULL;
+	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 #ifdef CONFIG_QUOTA_DEBUG
 	int reserved = 0;
 #endif
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		spin_lock(&inode->i_lock);
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
 		    !atomic_read(&inode->i_writecount) ||
@@ -951,7 +951,7 @@ static void add_dquot_ref(struct super_block *sb, int type)
 		}
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
+		dlock_list_unlock(&iter);
 
 #ifdef CONFIG_QUOTA_DEBUG
 		if (unlikely(inode_get_rsv_space(inode) > 0))
@@ -969,9 +969,8 @@ static void add_dquot_ref(struct super_block *sb, int type)
 		 * later.
 		 */
 		old_inode = inode;
-		spin_lock(&sb->s_inode_list_lock);
+		dlock_list_relock(&iter);
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 	iput(old_inode);
 
 #ifdef CONFIG_QUOTA_DEBUG
@@ -1039,9 +1038,9 @@ static void remove_dquot_ref(struct super_block *sb, int type,
 {
 	struct inode *inode;
 	int reserved = 0;
+	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	dlist_for_each_entry(inode, &iter, i_sb_list) {
 		/*
 		 *  We have to scan also I_NEW inodes because they can already
 		 *  have quota pointer initialized. Luckily, we need to touch
@@ -1056,7 +1055,6 @@ static void remove_dquot_ref(struct super_block *sb, int type,
 		}
 		spin_unlock(&dq_data_lock);
 	}
-	spin_unlock(&sb->s_inode_list_lock);
 #ifdef CONFIG_QUOTA_DEBUG
 	if (reserved) {
 		printk(KERN_WARNING "VFS (%s): Writes happened after quota"
diff --git a/fs/super.c b/fs/super.c
index 166c4ee..a90a070 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -164,6 +164,7 @@ static void destroy_super(struct super_block *s)
 {
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
+	free_dlock_list_heads(&s->s_inodes);
 	security_sb_free(s);
 	WARN_ON(!list_empty(&s->s_mounts));
 	put_user_ns(s->s_user_ns);
@@ -210,11 +211,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	INIT_HLIST_NODE(&s->s_instances);
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	mutex_init(&s->s_sync_lock);
-	INIT_LIST_HEAD(&s->s_inodes);
-	spin_lock_init(&s->s_inode_list_lock);
 	INIT_LIST_HEAD(&s->s_inodes_wb);
 	spin_lock_init(&s->s_inode_wblist_lock);
 
+	if (alloc_dlock_list_heads(&s->s_inodes))
+		goto fail;
 	if (list_lru_init_memcg(&s->s_dentry_lru))
 		goto fail;
 	if (list_lru_init_memcg(&s->s_inode_lru))
@@ -434,7 +435,7 @@ void generic_shutdown_super(struct super_block *sb)
 		if (sop->put_super)
 			sop->put_super(sb);
 
-		if (!list_empty(&sb->s_inodes)) {
+		if (!dlock_lists_empty(&sb->s_inodes)) {
 			printk("VFS: Busy inodes after unmount of %s. "
 			   "Self-destruct in 5 seconds.  Have a nice day...\n",
 			   sb->s_id);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 339e737..643f4d8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -34,6 +34,7 @@
 #include <linux/delayed_call.h>
 #include <linux/uuid.h>
 #include <linux/errseq.h>
+#include <linux/dlock-list.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -632,7 +633,7 @@ struct inode {
 	u16			i_wb_frn_history;
 #endif
 	struct list_head	i_lru;		/* inode LRU list */
-	struct list_head	i_sb_list;
+	struct dlock_list_node	i_sb_list;
 	struct list_head	i_wb_list;	/* backing dev writeback list */
 	union {
 		struct hlist_head	i_dentry;
@@ -1434,9 +1435,8 @@ struct super_block {
 	 */
 	int s_stack_depth;
 
-	/* s_inode_list_lock protects s_inodes */
-	spinlock_t		s_inode_list_lock ____cacheline_aligned_in_smp;
-	struct list_head	s_inodes;	/* all inodes */
+	/* The internal per-list locks protect s_inodes */
+	struct dlock_list_heads s_inodes;	/* all inodes */
 
 	spinlock_t		s_inode_wblist_lock;
 	struct list_head	s_inodes_wb;	/* writeback inodes */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
                   ` (2 preceding siblings ...)
  2017-10-04 21:20 ` [PATCH v6 3/6] vfs: Use dlock list for superblock's inode list Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-05  8:59   ` Jan Kara
                     ` (3 more replies)
  2017-10-04 21:20 ` [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing Waiman Long
  2017-10-04 21:20 ` [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs Waiman Long
  5 siblings, 4 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long, Waiman Long

The dlock list needs one list for each of the available CPUs. However,
sibling CPUs share the L2 and probably the L1 caches too. As a result,
there is not much to gain in terms of avoiding cacheline contention,
while the cacheline footprint of the L1/L2 caches increases as separate
lists may need to be kept in the cache.

This patch makes all the sibling CPUs of a core share the same list,
thus reducing the number of lists that need to be maintained in each
dlock list without having any noticeable impact on performance. It
also improves dlock list iteration performance as fewer lists need
to be iterated.
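
For example, on a hypothetical system with 2 cores and 2 hyperthreads
per core, where CPUs 0/2 and 1/3 are sibling pairs, the resulting
cpu2idx mapping would be:

  CPU 0 -> list 0
  CPU 1 -> list 1
  CPU 2 -> list 0
  CPU 3 -> list 1

so only 2 lists, instead of 4, are allocated and iterated for each
dlock list.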

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 lib/dlock-list.c | 61 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 46 insertions(+), 15 deletions(-)

diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index 2779e3e..a8c741d 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -25,15 +25,14 @@
  * The distributed and locked list is a distributed set of lists each of
  * which is protected by its own spinlock, but acts like a single
  * consolidated list to the callers. For scaling purpose, the number of
- * lists used is equal to the number of possible CPUs in the system to
- * minimize contention.
+ * lists used is equal to the number of possible cores in the system to
+ * minimize contention. All threads of the same CPU core will share the
+ * same list.
  *
- * However, it is possible that individual CPU numbers may be equal to
- * or greater than the number of possible CPUs when there are holes in
- * the CPU number list. As a result, we need to map the CPU number to a
- * list index.
+ * We need to map each CPU number to a list index.
  */
 static DEFINE_PER_CPU_READ_MOSTLY(int, cpu2idx);
+static int nr_dlock_lists __read_mostly;
 
 /*
  * As all the locks in the dlock list are dynamically allocated, they need
@@ -44,20 +43,53 @@
 static struct lock_class_key dlock_list_key;
 
 /*
- * Initialize cpu2idx mapping table
+ * Initialize cpu2idx mapping table & nr_dlock_lists.
  *
  * It is possible that a dlock-list can be allocated before the cpu2idx is
  * initialized. In this case, all the cpus are mapped to the first entry
  * before initialization.
  *
+ * All the sibling CPUs of a sibling group will map to the same dlock list so
+ * as to reduce the number of dlock lists to be maintained while minimizing
+ * cacheline contention.
+ *
+ * As the sibling masks are set up in the core initcall phase, this function
+ * has to be done in the postcore phase to get the right data.
  */
 static int __init cpu2idx_init(void)
 {
 	int idx, cpu;
+	cpumask_var_t sibling_mask;
+	static struct cpumask mask __initdata;
 
+	cpumask_clear(&mask);
 	idx = 0;
-	for_each_possible_cpu(cpu)
-		per_cpu(cpu2idx, cpu) = idx++;
+	for_each_possible_cpu(cpu) {
+		int scpu;
+
+		if (cpumask_test_cpu(cpu, &mask))
+			continue;
+		per_cpu(cpu2idx, cpu) = idx;
+		cpumask_set_cpu(cpu, &mask);
+
+		sibling_mask = topology_sibling_cpumask(cpu);
+		if (sibling_mask) {
+			for_each_cpu(scpu, sibling_mask) {
+				per_cpu(cpu2idx, scpu) = idx;
+				cpumask_set_cpu(scpu, &mask);
+			}
+		}
+		idx++;
+	}
+
+	/*
+	 * nr_dlock_lists can only be set after cpu2idx is properly
+	 * initialized.
+	 */
+	smp_mb();
+	nr_dlock_lists = idx;
+	pr_info("dlock-list: %d head entries per dlock list.\n",
+		nr_dlock_lists);
 	return 0;
 }
 postcore_initcall(cpu2idx_init);
@@ -74,15 +106,14 @@ static int __init cpu2idx_init(void)
  */
 int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
 {
-	int idx;
+	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;
 
-	dlist->heads = kcalloc(nr_cpu_ids, sizeof(struct dlock_list_head),
-			       GFP_KERNEL);
+	dlist->heads = kcalloc(cnt, sizeof(struct dlock_list_head), GFP_KERNEL);
 
 	if (!dlist->heads)
 		return -ENOMEM;
 
-	for (idx = 0; idx < nr_cpu_ids; idx++) {
+	for (idx = 0; idx < cnt; idx++) {
 		struct dlock_list_head *head = &dlist->heads[idx];
 
 		INIT_LIST_HEAD(&head->list);
@@ -118,7 +149,7 @@ bool dlock_lists_empty(struct dlock_list_heads *dlist)
 {
 	int idx;
 
-	for (idx = 0; idx < nr_cpu_ids; idx++)
+	for (idx = 0; idx < nr_dlock_lists; idx++)
 		if (!list_empty(&dlist->heads[idx].list))
 			return false;
 	return true;
@@ -207,7 +238,7 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
 	/*
 	 * Try next list
 	 */
-	if (++iter->index >= nr_cpu_ids)
+	if (++iter->index >= nr_dlock_lists)
 		return NULL;	/* All the entries iterated */
 
 	if (list_empty(&iter->head[iter->index].list))
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
                   ` (3 preceding siblings ...)
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-05  9:14   ` Jan Kara
  2017-10-04 21:20 ` [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs Waiman Long
  5 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long

Insertion and deletion are relatively cheap and mostly contention
free for a dlock-list. Lookup, on the other hand, can be rather costly
because all the lists in a dlock-list have to be iterated.

Currently, dlock-list insertion is based on the CPU that the task is
running on, so a given object can be inserted into any one of the
lists depending on what the current CPU is.

This patch provides an alternative way of list selection. The caller
can provide an object context which will be hashed to one of the lists
in a dlock-list. The object can then be added to that particular
list. Lookup can then be done by iterating the elements of that one
list only instead of all the lists in a dlock-list.

The new APIs are:

struct dlock_list_head *dlock_list_hash(struct dlock_list_heads *, void *);
void dlock_list_add(struct dlock_list_node *, struct dlock_list_head *);
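
A possible usage sketch (the table/obj/ctx/link names are made up for
illustration; the series does not add a per-list lookup helper, so the
lookup side takes the lock of the selected list directly):

  struct dlock_list_head *head;

  /* Insertion: select the sub-list from a caller-chosen context. */
  head = dlock_list_hash(&table->entries, ctx);
  dlock_list_add(&obj->link, head);

  /* Lookup: hash the same context and search only that sub-list. */
  head = dlock_list_hash(&table->entries, ctx);
  spin_lock(&head->lock);
  list_for_each_entry(obj, &head->list, link.list) {
          if (obj->ctx == ctx)
                  break;  /* found */
  }
  spin_unlock(&head->lock);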

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/dlock-list.h |  9 +++++++++
 lib/dlock-list.c           | 49 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
index 4a4bc44..7afea8f 100644
--- a/include/linux/dlock-list.h
+++ b/include/linux/dlock-list.h
@@ -120,6 +120,15 @@ extern void dlock_lists_add(struct dlock_list_node *node,
 extern void dlock_lists_del(struct dlock_list_node *node);
 
 /*
+ * Instead of individual list mapping by CPU number, it can be based on
+ * a given context to speed up lookup performance.
+ */
+extern struct dlock_list_head *dlock_list_hash(struct dlock_list_heads *dlist,
+					       void *context);
+extern void dlock_list_add(struct dlock_list_node *node,
+			   struct dlock_list_head *head);
+
+/*
  * Find the first entry of the next available list.
  */
 extern struct dlock_list_node *
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index a8c741d..e72579f 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -20,6 +20,7 @@
 #include <linux/lockdep.h>
 #include <linux/slab.h>
 #include <linux/cpumask.h>
+#include <linux/jhash.h>
 
 /*
  * The distributed and locked list is a distributed set of lists each of
@@ -156,6 +157,46 @@ bool dlock_lists_empty(struct dlock_list_heads *dlist)
 }
 
 /**
+ * dlock_list_hash - Hash the given context to a particular list
+ * @dlist: The dlock list
+ * @ctx  : The context for hashing
+ */
+struct dlock_list_head *dlock_list_hash(struct dlock_list_heads *dlist,
+					void *ctx)
+{
+	unsigned long val = (unsigned long)ctx;
+	u32 hash;
+
+	if (unlikely(!nr_dlock_lists)) {
+		WARN_ON_ONCE(1);
+		return &dlist->heads[0];
+	}
+	if (val < nr_dlock_lists)
+		hash = val;
+	else
+		hash = jhash2((u32 *)&ctx, sizeof(ctx)/sizeof(u32), 0)
+				% nr_dlock_lists;
+	return &dlist->heads[hash];
+}
+
+/**
+ * dlock_list_add - Add a node to a particular head of dlock list
+ * @node: The node to be added
+ * @head: The dlock list head where the node is to be added
+ */
+void dlock_list_add(struct dlock_list_node *node,
+		    struct dlock_list_head *head)
+{
+	/*
+	 * There is no need to disable preemption
+	 */
+	spin_lock(&head->lock);
+	node->head = head;
+	list_add(&node->list, &head->list);
+	spin_unlock(&head->lock);
+}
+
+/**
  * dlock_lists_add - Adds a node to the given dlock list
  * @node : The node to be added
  * @dlist: The dlock list where the node is to be added
@@ -168,13 +209,7 @@ void dlock_lists_add(struct dlock_list_node *node,
 {
 	struct dlock_list_head *head = &dlist->heads[this_cpu_read(cpu2idx)];
 
-	/*
-	 * There is no need to disable preemption
-	 */
-	spin_lock(&head->lock);
-	node->head = head;
-	list_add(&node->list, &head->list);
-	spin_unlock(&head->lock);
+	dlock_list_add(node, head);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs
  2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
                   ` (4 preceding siblings ...)
  2017-10-04 21:20 ` [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing Waiman Long
@ 2017-10-04 21:20 ` Waiman Long
  2017-10-04 22:46   ` Davidlohr Bueso
  5 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-10-04 21:20 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter
  Cc: linux-fsdevel, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Andi Kleen, Dave Chinner, Boqun Feng, Davidlohr Bueso,
	Waiman Long

To enable the use of dlock-list in an interrupt handler, the following
new APIs are provided for an irqsafe dlock-list:

 - void dlock_list_unlock_irqsafe(struct dlock_list_iter *)
 - void dlock_list_relock_irqsafe(struct dlock_list_iter *)
 - void dlock_list_add_irqsafe(struct dlock_list_node *,
			       struct dlock_list_head *);
 - void dlock_lists_add_irqsafe(struct dlock_list_node *,
				struct dlock_list_heads *)
 - void dlock_lists_del_irqsafe(struct dlock_list_node *)

New macros for irqsafe dlock-list:

 - dlist_for_each_entry_irqsafe(pos, iter, member)
 - dlist_for_each_entry_safe_irqsafe(pos, n, iter, member)
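
A short usage sketch (everything outside the dlock-list calls is
hypothetical):

  /*
   * From an interrupt handler: the per-list lock is taken with
   * spin_lock_irqsave(), so this is safe in irq context.
   */
  dlock_lists_add_irqsafe(&obj->link, &table->entries);

  /*
   * From process context: iterate with the irqsafe variant so that
   * interrupts are disabled while a per-list lock is being held.
   */
  DEFINE_DLOCK_LIST_ITER(iter, &table->entries);

  dlist_for_each_entry_irqsafe(obj, &iter, link) {
          /* process obj; the lock of its sub-list is held here */
  }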

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/dlock-list.h | 105 ++++++++++++++++++++++++++++++++++++---------
 lib/dlock-list.c           |  89 +++++++++++++++++++++++++++++++++-----
 2 files changed, 164 insertions(+), 30 deletions(-)

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
index 7afea8f..00f0a29 100644
--- a/include/linux/dlock-list.h
+++ b/include/linux/dlock-list.h
@@ -54,6 +54,7 @@ struct dlock_list_node {
  */
 struct dlock_list_iter {
 	int index;
+	unsigned long flags;
 	struct dlock_list_head *head, *entry;
 };
 
@@ -100,6 +101,24 @@ static inline void dlock_list_relock(struct dlock_list_iter *iter)
 	spin_lock(&iter->entry->lock);
 }
 
+/**
+ * dlock_list_unlock_irqsafe - unlock spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_unlock_irqsafe(struct dlock_list_iter *iter)
+{
+	spin_unlock_irqrestore(&iter->entry->lock, iter->flags);
+}
+
+/**
+ * dlock_list_relock_irqsafe - lock spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_relock_irqsafe(struct dlock_list_iter *iter)
+{
+	spin_lock_irqsave(&iter->entry->lock, iter->flags);
+}
+
 /*
  * Allocation and freeing of dlock list
  */
@@ -113,12 +132,15 @@ static inline void dlock_list_relock(struct dlock_list_iter *iter)
 
 /*
  * The dlock list addition and deletion functions here are not irq-safe.
- * Special irq-safe variants will have to be added if we need them.
  */
 extern void dlock_lists_add(struct dlock_list_node *node,
 			    struct dlock_list_heads *dlist);
 extern void dlock_lists_del(struct dlock_list_node *node);
 
+extern void dlock_lists_add_irqsafe(struct dlock_list_node *node,
+				    struct dlock_list_heads *dlist);
+extern void dlock_lists_del_irqsafe(struct dlock_list_node *node);
+
 /*
  * Instead of individual list mapping by CPU number, it can be based on
  * a given context to speed up lookup performance.
@@ -127,24 +149,28 @@ extern struct dlock_list_head *dlock_list_hash(struct dlock_list_heads *dlist,
 					       void *context);
 extern void dlock_list_add(struct dlock_list_node *node,
 			   struct dlock_list_head *head);
+extern void dlock_list_add_irqsafe(struct dlock_list_node *node,
+				   struct dlock_list_head *head);
 
 /*
  * Find the first entry of the next available list.
  */
 extern struct dlock_list_node *
-__dlock_list_next_list(struct dlock_list_iter *iter);
+__dlock_list_next_list(struct dlock_list_iter *iter, bool irqsafe);
 
 /**
  * __dlock_list_next_entry - Iterate to the next entry of the dlock list
- * @curr : Pointer to the current dlock_list_node structure
- * @iter : Pointer to the dlock list iterator structure
+ * @curr   : Pointer to the current dlock_list_node structure
+ * @iter   : Pointer to the dlock list iterator structure
+ * @irqsafe: IRQ safe flag
  * Return: Pointer to the next entry or NULL if all the entries are iterated
  *
  * The iterator has to be properly initialized before calling this function.
  */
 static inline struct dlock_list_node *
 __dlock_list_next_entry(struct dlock_list_node *curr,
-			struct dlock_list_iter *iter)
+			struct dlock_list_iter *iter,
+			bool irqsafe)
 {
 	/*
 	 * Find next entry
@@ -157,7 +183,7 @@ extern void dlock_list_add(struct dlock_list_node *node,
 		 * The current list has been exhausted, try the next available
 		 * list.
 		 */
-		curr = __dlock_list_next_list(iter);
+		curr = __dlock_list_next_list(iter, irqsafe);
 	}
 
 	return curr;	/* Continue the iteration */
@@ -165,31 +191,33 @@ extern void dlock_list_add(struct dlock_list_node *node,
 
 /**
  * dlock_list_first_entry - get the first element from a list
- * @iter  : The dlock list iterator.
- * @type  : The type of the struct this is embedded in.
- * @member: The name of the dlock_list_node within the struct.
+ * @iter   : The dlock list iterator.
+ * @type   : The type of the struct this is embedded in.
+ * @member : The name of the dlock_list_node within the struct.
+ * @irqsafe: IRQ safe flag
  * Return : Pointer to the next entry or NULL if all the entries are iterated.
  */
-#define dlock_list_first_entry(iter, type, member)			\
+#define dlock_list_first_entry(iter, type, member, irqsafe)		\
 	({								\
 		struct dlock_list_node *_n;				\
-		_n = __dlock_list_next_entry(NULL, iter);		\
+		_n = __dlock_list_next_entry(NULL, iter, irqsafe);	\
 		_n ? list_entry(_n, type, member) : NULL;		\
 	})
 
 /**
  * dlock_list_next_entry - iterate to the next entry of the list
- * @pos   : The type * to cursor
- * @iter  : The dlock list iterator.
- * @member: The name of the dlock_list_node within the struct.
+ * @pos    : The type * to cursor
+ * @iter   : The dlock list iterator.
+ * @member : The name of the dlock_list_node within the struct.
+ * @irqsafe: IRQ safe flag
  * Return : Pointer to the next entry or NULL if all the entries are iterated.
  *
  * Note that pos can't be NULL.
  */
-#define dlock_list_next_entry(pos, iter, member)			\
+#define dlock_list_next_entry(pos, iter, member, irqsafe)		\
 	({								\
 		struct dlock_list_node *_n;				\
-		_n = __dlock_list_next_entry(&(pos)->member, iter);	\
+		_n = __dlock_list_next_entry(&(pos)->member, iter, irqsafe);\
 		_n ? list_entry(_n, typeof(*(pos)), member) : NULL;	\
 	})
 
@@ -204,9 +232,9 @@ extern void dlock_list_add(struct dlock_list_node *node,
  * This iteration function is designed to be used in a while loop.
  */
 #define dlist_for_each_entry(pos, iter, member)				\
-	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member, 0);\
 	     pos != NULL;						\
-	     pos = dlock_list_next_entry(pos, iter, member))
+	     pos = dlock_list_next_entry(pos, iter, member, 0))
 
 /**
  * dlist_for_each_entry_safe - iterate over the dlock list & safe over removal
@@ -220,11 +248,48 @@ extern void dlock_list_add(struct dlock_list_node *node,
  * current one.
  */
 #define dlist_for_each_entry_safe(pos, n, iter, member)			\
-	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member);\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member, 0);\
+	    ({								\
+		bool _b = (pos != NULL);				\
+		if (_b)							\
+			n = dlock_list_next_entry(pos, iter, member, 0);\
+		_b;							\
+	    });								\
+	    pos = n)
+
+/**
+ * dlist_for_each_entry_irqsafe - iterate over an irqsafe dlock list
+ * @pos   : Type * to use as a loop cursor
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro isn't safe with respect to list entry removal, but
+ * it can correctly iterate newly added entries right after the current one.
+ * This iteration function is designed to be used in a while loop.
+ */
+#define dlist_for_each_entry_irqsafe(pos, iter, member)			\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member, 1);\
+	     pos != NULL;						\
+	     pos = dlock_list_next_entry(pos, iter, member, 1))
+
+/**
+ * dlist_for_each_entry_safe_irqsafe - iterate over an irqsafe dlock list &
+ *			       safe over removal
+ * @pos   : Type * to use as a loop cursor
+ * @n	  : Another type * to use as temporary storage
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro is safe with respect to list entry removal.
+ * However, it cannot correctly iterate newly added entries right after the
+ * current one.
+ */
+#define dlist_for_each_entry_safe_irqsafe(pos, n, iter, member)		\
+	for (pos = dlock_list_first_entry(iter, typeof(*(pos)), member, 1);\
 	    ({								\
 		bool _b = (pos != NULL);				\
 		if (_b)							\
-			n = dlock_list_next_entry(pos, iter, member);	\
+			n = dlock_list_next_entry(pos, iter, member, 1);\
 		_b;							\
 	    });								\
 	    pos = n)
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index e72579f..03d4b98 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -197,6 +197,22 @@ void dlock_list_add(struct dlock_list_node *node,
 }
 
 /**
+ * dlock_list_add_irqsafe - Add node to a particular head of irqsafe dlock list
+ * @node: The node to be added
+ * @head: The dlock list head where the node is to be added
+ */
+void dlock_list_add_irqsafe(struct dlock_list_node *node,
+			    struct dlock_list_head *head)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&head->lock, flags);
+	node->head = head;
+	list_add(&node->list, &head->list);
+	spin_unlock_irqrestore(&head->lock, flags);
+}
+
+/**
  * dlock_lists_add - Adds a node to the given dlock list
  * @node : The node to be added
  * @dlist: The dlock list where the node is to be added
@@ -213,8 +229,24 @@ void dlock_lists_add(struct dlock_list_node *node,
 }
 
 /**
- * dlock_lists_del - Delete a node from a dlock list
- * @node : The node to be deleted
+ * dlock_lists_add_irqsafe - Adds a node to the given irqsafe dlock list
+ * @node : The node to be added
+ * @dlist: The dlock list where the node is to be added
+ *
+ * List selection is based on the CPU being used when the
+ * dlock_list_add_irqsafe() function is called. However, deletion may be
+ * done by a different CPU.
+ */
+void dlock_lists_add_irqsafe(struct dlock_list_node *node,
+			     struct dlock_list_heads *dlist)
+{
+	struct dlock_list_head *head = &dlist->heads[this_cpu_read(cpu2idx)];
+
+	dlock_list_add_irqsafe(node, head);
+}
+
+/*
+ * Delete a node from a dlock list
  *
  * We need to check the lock pointer again after taking the lock to guard
  * against concurrent deletion of the same node. If the lock pointer changes
@@ -222,9 +254,11 @@ void dlock_lists_add(struct dlock_list_node *node,
  * elsewhere. A warning will be printed if this happens as it is likely to be
  * a bug.
  */
-void dlock_lists_del(struct dlock_list_node *node)
+static __always_inline void __dlock_lists_del(struct dlock_list_node *node,
+					      bool irqsafe)
 {
 	struct dlock_list_head *head;
+	unsigned long flags;
 	bool retry;
 
 	do {
@@ -233,7 +267,11 @@ void dlock_lists_del(struct dlock_list_node *node)
 			      __func__, (unsigned long)node))
 			return;
 
-		spin_lock(&head->lock);
+		if (irqsafe)
+			spin_lock_irqsave(&head->lock, flags);
+		else
+			spin_lock(&head->lock);
+
 		if (likely(head == node->head)) {
 			list_del_init(&node->list);
 			node->head = NULL;
@@ -246,26 +284,53 @@ void dlock_lists_del(struct dlock_list_node *node)
 			 */
 			retry = (node->head != NULL);
 		}
-		spin_unlock(&head->lock);
+
+		if (irqsafe)
+			spin_unlock_irqrestore(&head->lock, flags);
+		else
+			spin_unlock(&head->lock);
 	} while (retry);
 }
 
 /**
+ * dlock_lists_del - Delete a node from a dlock list
+ * @node : The node to be deleted
+ */
+void dlock_lists_del(struct dlock_list_node *node)
+{
+	__dlock_lists_del(node, false);
+}
+
+/**
+ * dlock_lists_del_irqsafe - Delete a node from an irqsafe dlock list
+ * @node : The node to be deleted
+ */
+void dlock_lists_del_irqsafe(struct dlock_list_node *node)
+{
+	__dlock_lists_del(node, true);
+}
+
+/**
  * __dlock_list_next_list: Find the first entry of the next available list
- * @dlist: Pointer to the dlock_list_heads structure
- * @iter : Pointer to the dlock list iterator structure
+ * @dlist  : Pointer to the dlock_list_heads structure
+ * @iter   : Pointer to the dlock list iterator structure
+ * @irqsafe: IRQ safe flag
  * Return: true if the entry is found, false if all the lists exhausted
  *
  * The information about the next available list will be put into the iterator.
  */
-struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
+struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter,
+					       bool irqsafe)
 {
 	struct dlock_list_node *next;
 	struct dlock_list_head *head;
 
 restart:
 	if (iter->entry) {
-		spin_unlock(&iter->entry->lock);
+		if (irqsafe)
+			dlock_list_unlock_irqsafe(iter);
+		else
+			dlock_list_unlock(iter);
 		iter->entry = NULL;
 	}
 
@@ -280,7 +345,11 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
 		goto next_list;
 
 	head = iter->entry = &iter->head[iter->index];
-	spin_lock(&head->lock);
+	if (irqsafe)
+		dlock_list_relock_irqsafe(iter);
+	else
+		dlock_list_relock(iter);
+
 	/*
 	 * There is a slight chance that the list may become empty just
 	 * before the lock is acquired. So an additional check is
-- 
1.8.3.1
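
A minimal usage sketch of the irqsafe APIs added above, for illustration
only and not part of the patch. init_dlock_list_node(), init_dlock_list_iter()
and alloc_dlock_list_heads() are assumed from earlier patches in this series,
so treat those exact names as assumptions:

#include <linux/dlock_list.h>	/* header introduced earlier in the series */
#include <linux/printk.h>

struct foo {
	int val;
	struct dlock_list_node node;	/* links this foo into a dlock list */
};

static struct dlock_list_heads foo_lists;

static int __init foo_lists_init(void)
{
	return alloc_dlock_list_heads(&foo_lists);
}

/* Safe to call from an interrupt handler. */
static void foo_add(struct foo *f)
{
	init_dlock_list_node(&f->node);
	dlock_lists_add_irqsafe(&f->node, &foo_lists);
}

/* Walk all the lists; each per-list lock is taken with IRQs disabled. */
static void foo_dump(void)
{
	struct dlock_list_iter iter;
	struct foo *f;

	init_dlock_list_iter(&iter, &foo_lists);
	dlist_for_each_entry_irqsafe(f, &iter, node)
		pr_info("foo val = %d\n", f->val);
}

Because the iterator holds each per-list lock with interrupts disabled,
foo_add() can run from an IRQ handler without deadlocking against
foo_dump() on the same CPU.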

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs
  2017-10-04 21:20 ` [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs Waiman Long
@ 2017-10-04 22:46   ` Davidlohr Bueso
  2017-10-05 13:40     ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2017-10-04 22:46 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng

On Wed, 04 Oct 2017, Waiman Long wrote:

>To enable the use of dlock-list in an interrupt handler, the following
>new APIs are provided for an irqsafe dlock-list:
>
> - void dlock_list_unlock_irqsafe(struct dlock_list_iter *)
> - void dlock_list_relock_irqsafe(struct dlock_list_iter *)
> - void dlock_list_add_irqsafe(struct dlock_list_node *,
>			       struct dlock_list_head *);
> - void dlock_lists_add_irqsafe(struct dlock_list_node *,
>				struct dlock_list_heads *)
> - void dlock_lists_del_irqsafe(struct dlock_list_node *)
>
>New macros for irqsafe dlock-list:
>
> - dlist_for_each_entry_irqsafe(pos, iter, member)
> - dlist_for_each_entry_safe_irqsafe(pos, n, iter, member)

Instead of adding more calls to the api, could we not just use the
irqsave/restore as part of the regular api?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
@ 2017-10-05  8:59   ` Jan Kara
  2017-10-05 14:57     ` Waiman Long
  2017-10-05 17:40   ` Davidlohr Bueso
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2017-10-05  8:59 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Davidlohr Bueso, Waiman Long

On Wed 04-10-17 17:20:05, Waiman Long wrote:
>  int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
>  {
> -	int idx;
> +	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;

Hum, is this there for the case where alloc_dlock_list_heads() is called
before nr_dlock_lists is initialized? But how can the dlist be used later
when it has a larger number of lists and you don't know how many?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing
  2017-10-04 21:20 ` [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing Waiman Long
@ 2017-10-05  9:14   ` Jan Kara
  2017-10-05 11:06     ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2017-10-05  9:14 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Davidlohr Bueso

On Wed 04-10-17 17:20:06, Waiman Long wrote:
> Insertion and deletion are relatively cheap and mostly contention
> free for dlock-list. Lookup, on the other hand, can be rather costly
> because all the lists in a dlock-list will have to be iterated.
> 
> Currently dlock-list insertion is based on the cpu that the task is
> running on. So a given object can be inserted into any one of the
> lists depending on what the current cpu is.
> 
> This patch provides an alternative way of list selection. The caller
> can provide an object context which will be hashed to one of the lists
> in a dlock-list. The object can then be added into that particular
> list. Lookup can be done by iterating elements in the provided list
> only instead of all the lists in a dlock-list.
> 
> The new APIs are:
> 
> struct dlock_list_head *dlock_list_hash(struct dlock_list_heads *, void *);
> void dlock_list_add(struct dlock_list_node *, struct dlock_list_head *);
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

OK, this makes sense, but do you have any particular user in mind? In
particular, I'm not sure how big an advantage this API brings over the
existing one in include/linux/list_bl.h. Sure, it's a tradeoff between a
bit lock and a spinlock, but is there a user where it matters?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
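
For illustration only, a sketch of the hashed list selection the quoted
commit message describes. The dlock_list_hash() and dlock_list_add()
prototypes come from the quote; the rest is made up for the example:

struct item {
	unsigned long key;
	struct dlock_list_node node;
};

static struct dlock_list_heads item_lists;

static void item_insert(struct item *it)
{
	/*
	 * Hash on a per-object context instead of the current CPU so
	 * that a later lookup only has to walk this one list.
	 */
	struct dlock_list_head *head = dlock_list_hash(&item_lists, it);

	dlock_list_add(&it->node, head);
}

A lookup would call dlock_list_hash() with the same context and walk only
that one head under its lock, instead of iterating every list in the
dlock-list.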

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing
  2017-10-05  9:14   ` Jan Kara
@ 2017-10-05 11:06     ` Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2017-10-05 11:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Waiman Long, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields, Tejun Heo, Christoph Lameter, linux-fsdevel,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Andi Kleen,
	Dave Chinner, Boqun Feng

On Thu, 05 Oct 2017, Jan Kara wrote:

>OK, this makes sense, but do you have any particular user in mind? In
>particular, I'm not sure how big an advantage this API brings over the
>existing one in include/linux/list_bl.h. Sure, it's a tradeoff between a
>bit lock and a spinlock, but is there a user where it matters?

This was tailored with epoll nested callbacks in mind; we've been discussing
it offline. I have not looked at list_bl.h, though.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs
  2017-10-04 22:46   ` Davidlohr Bueso
@ 2017-10-05 13:40     ` Waiman Long
  0 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-05 13:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng

On 10/04/2017 06:46 PM, Davidlohr Bueso wrote:
> On Wed, 04 Oct 2017, Waiman Long wrote:
>
>> To enable the use of dlock-list in an interrupt handler, the following
>> new APIs are provided for an irqsafe dlock-list:
>>
>> - void dlock_list_unlock_irqsafe(struct dlock_list_iter *)
>> - void dlock_list_relock_irqsafe(struct dlock_list_iter *)
>> - void dlock_list_add_irqsafe(struct dlock_list_node *,
>>                    struct dlock_list_head *);
>> - void dlock_lists_add_irqsafe(struct dlock_list_node *,
>>                 struct dlock_list_heads *)
>> - void dlock_lists_del_irqsafe(struct dlock_list_node *)
>>
>> New macros for irqsafe dlock-list:
>>
>> - dlist_for_each_entry_irqsafe(pos, iter, member)
>> - dlist_for_each_entry_safe_irqsafe(pos, n, iter, member)
>
> Instead of adding more calls to the api, could we not just use the
> irqsave/restore as part of the regular api?
>
> Thanks,
> Davidlohr

The irqsave/restore spinlock calls are more expensive in terms of
performance. I think spin_unlock_irqrestore() is especially bad in a
VM, as it probably causes a VM exit. So I try to avoid them unless it
is absolutely necessary.

Another alternative is to specify the dlock-list type at allocation
time and use either regular spinlock calls or irqsave/restore calls
accordingly. That will add a bit of overhead for users that don't need
IRQ safety, but much less than using irqsave/restore for everyone. I
was using that approach originally, but opted for the current solution
for performance reasons. I can revert to my original approach and send
out an updated patch.

Cheers,
Longman
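
A rough sketch of the allocation-time alternative described above, for
illustration only. Nothing here is in the posted patches; the irqsafe
field and the conditional locking are hypothetical:

/* Hypothetical: record IRQ safety once, when the heads are allocated. */
struct dlock_list_heads {
	struct dlock_list_head *heads;
	bool irqsafe;
};

void dlock_lists_add(struct dlock_list_node *node,
		     struct dlock_list_heads *dlist)
{
	struct dlock_list_head *head = &dlist->heads[this_cpu_read(cpu2idx)];
	unsigned long flags;

	if (dlist->irqsafe)
		spin_lock_irqsave(&head->lock, flags);
	else
		spin_lock(&head->lock);

	node->head = head;
	list_add(&node->list, &head->list);

	if (dlist->irqsafe)
		spin_unlock_irqrestore(&head->lock, flags);
	else
		spin_unlock(&head->lock);
}

Every caller then pays an extra branch on each add or delete, which is the
small overhead for non-IRQ users mentioned above.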

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-05  8:59   ` Jan Kara
@ 2017-10-05 14:57     ` Waiman Long
  2017-10-09  7:26       ` Jan Kara
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2017-10-05 14:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Davidlohr Bueso, Waiman Long

On 10/05/2017 04:59 AM, Jan Kara wrote:
> On Wed 04-10-17 17:20:05, Waiman Long wrote:
>>  int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
>>  {
>> -	int idx;
>> +	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;
> Hum, is this there for the case where alloc_dlock_list_heads() is called
> before nr_dlock_lists is initialized? But how can the dlist be used later
> when it has a larger number of lists and you don't know how many?

The point is that nr_dlock_lists <= nr_cpu_ids. Before nr_dlock_lists is
initialized, the mapping table maps all CPUs to the first entry, so the
extra entries will just stay unused. At free time, they will all get
de-allocated. I will clarify that with a comment.

Cheers,
Longman
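
For context, roughly what the allocation fallback under discussion looks
like; this is a simplified sketch based on the quoted line rather than the
exact patch, so details may differ:

int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
{
	/*
	 * nr_dlock_lists is still 0 before cpu2idx_init() runs, so fall
	 * back to nr_cpu_ids entries.  At that point cpu2idx maps every
	 * CPU to index 0, so the extra heads simply stay unused until
	 * the heads array is freed.
	 */
	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;

	dlist->heads = kcalloc(cnt, sizeof(struct dlock_list_head),
			       GFP_KERNEL);
	if (!dlist->heads)
		return -ENOMEM;

	for (idx = 0; idx < cnt; idx++) {
		struct dlock_list_head *head = &dlist->heads[idx];

		INIT_LIST_HEAD(&head->list);
		spin_lock_init(&head->lock);
	}
	return 0;
}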

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
  2017-10-05  8:59   ` Jan Kara
@ 2017-10-05 17:40   ` Davidlohr Bueso
  2017-10-05 18:01     ` Waiman Long
  2017-10-07 13:44   ` kbuild test robot
  2017-10-07 14:38   ` kbuild test robot
  3 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2017-10-05 17:40 UTC (permalink / raw)
  To: Waiman Long
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Waiman Long

This is still spitting:

lib/dlock-list.c: In function ‘cpu2idx_init’:
lib/dlock-list.c:75:16: error: incompatible types when assigning to type ‘cpumask_var_t’ from type ‘struct cpumask *’
   sibling_mask = topology_sibling_cpumask(cpu);
                ^
lib/dlock-list.c:76:7: warning: the address of ‘sibling_mask’ will always evaluate as ‘true’ [-Waddress]
   if (sibling_mask) {
       ^

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-05 17:40   ` Davidlohr Bueso
@ 2017-10-05 18:01     ` Waiman Long
  0 siblings, 0 replies; 18+ messages in thread
From: Waiman Long @ 2017-10-05 18:01 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Waiman Long

On 10/05/2017 01:40 PM, Davidlohr Bueso wrote:
> This is still spitting:
>
> lib/dlock-list.c: In function ‘cpu2idx_init’:
> lib/dlock-list.c:75:16: error: incompatible types when assigning to
> type ‘cpumask_var_t’ from type ‘struct cpumask *’
>   sibling_mask = topology_sibling_cpumask(cpu);
>                ^
> lib/dlock-list.c:76:7: warning: the address of ‘sibling_mask’ will
> always evaluate as ‘true’ [-Waddress]
>   if (sibling_mask) {
>       ^


Thanks for reporting this problem. Dealing with cpumasks is always
tricky. Could you try the following patch to see if it fixes the
problem?

Thanks,
Longman

----------------------------------------------------

diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index f0df7d5..3afd76d 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -60,7 +60,7 @@
 static int __init cpu2idx_init(void)
 {
        int idx, cpu;
-       cpumask_var_t sibling_mask;
+       struct cpumask *sibling_mask;
        static struct cpumask mask __initdata;
 
        cpumask_clear(&mask);
@@ -74,11 +74,9 @@ static int __init cpu2idx_init(void)
                cpumask_set_cpu(cpu, &mask);
 
                sibling_mask = topology_sibling_cpumask(cpu);
-               if (sibling_mask) {
-                       for_each_cpu(scpu, sibling_mask) {
-                               per_cpu(cpu2idx, scpu) = idx;
-                               cpumask_set_cpu(scpu, &mask);
-                       }
+               for_each_cpu(scpu, sibling_mask) {
+                       per_cpu(cpu2idx, scpu) = idx;
+                       cpumask_set_cpu(scpu, &mask);
                }
                idx++;
        }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
  2017-10-05  8:59   ` Jan Kara
  2017-10-05 17:40   ` Davidlohr Bueso
@ 2017-10-07 13:44   ` kbuild test robot
  2017-10-07 14:38   ` kbuild test robot
  3 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-10-07 13:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields, Tejun Heo, Christoph Lameter, linux-fsdevel,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Andi Kleen,
	Dave Chinner, Boqun Feng, Davidlohr Bueso, Waiman Long,
	Waiman Long

[-- Attachment #1: Type: text/plain, Size: 2805 bytes --]

Hi Waiman,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc3]
[cannot apply to next-20170929]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/vfs-Use-dlock-list-for-SB-s-s_inodes-list/20171007-210724
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   lib/dlock-list.c: In function 'cpu2idx_init':
>> lib/dlock-list.c:75:16: error: assignment to expression with array type
      sibling_mask = topology_sibling_cpumask(cpu);
                   ^
>> lib/dlock-list.c:76:7: warning: the address of 'sibling_mask' will always evaluate as 'true' [-Waddress]
      if (sibling_mask) {
          ^~~~~~~~~~~~

vim +75 lib/dlock-list.c

    44	
    45	/*
    46	 * Initialize cpu2idx mapping table & nr_dlock_lists.
    47	 *
    48	 * It is possible that a dlock-list can be allocated before the cpu2idx is
    49	 * initialized. In this case, all the cpus are mapped to the first entry
    50	 * before initialization.
    51	 *
    52	 * All the sibling CPUs of a sibling group will map to the same dlock list so
    53	 * as to reduce the number of dlock lists to be maintained while minimizing
    54	 * cacheline contention.
    55	 *
    56	 * As the sibling masks are set up in the core initcall phase, this function
    57	 * has to be done in the postcore phase to get the right data.
    58	 */
    59	static int __init cpu2idx_init(void)
    60	{
    61		int idx, cpu;
    62		cpumask_var_t sibling_mask;
    63		static struct cpumask mask __initdata;
    64	
    65		cpumask_clear(&mask);
    66		idx = 0;
    67		for_each_possible_cpu(cpu) {
    68			int scpu;
    69	
    70			if (cpumask_test_cpu(cpu, &mask))
    71				continue;
    72			per_cpu(cpu2idx, cpu) = idx;
    73			cpumask_set_cpu(cpu, &mask);
    74	
  > 75			sibling_mask = topology_sibling_cpumask(cpu);
  > 76			if (sibling_mask) {
    77				for_each_cpu(scpu, sibling_mask) {
    78					per_cpu(cpu2idx, scpu) = idx;
    79					cpumask_set_cpu(scpu, &mask);
    80				}
    81			}
    82			idx++;
    83		}
    84	
    85		/*
    86		 * nr_dlock_lists can only be set after cpu2idx is properly
    87		 * initialized.
    88		 */
    89		smp_mb();
    90		nr_dlock_lists = idx;
    91		pr_info("dlock-list: %d head entries per dlock list.\n",
    92			nr_dlock_lists);
    93		return 0;
    94	}
    95	postcore_initcall(cpu2idx_init);
    96	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6685 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
                     ` (2 preceding siblings ...)
  2017-10-07 13:44   ` kbuild test robot
@ 2017-10-07 14:38   ` kbuild test robot
  3 siblings, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-10-07 14:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: kbuild-all, Alexander Viro, Jan Kara, Jeff Layton,
	J. Bruce Fields, Tejun Heo, Christoph Lameter, linux-fsdevel,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Andi Kleen,
	Dave Chinner, Boqun Feng, Davidlohr Bueso, Waiman Long,
	Waiman Long

[-- Attachment #1: Type: text/plain, Size: 2841 bytes --]

Hi Waiman,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc3]
[cannot apply to next-20170929]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waiman-Long/vfs-Use-dlock-list-for-SB-s-s_inodes-list/20171007-210724
config: i386-randconfig-i0-201740 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   lib/dlock-list.c: In function 'cpu2idx_init':
>> lib/dlock-list.c:75:16: error: incompatible types when assigning to type 'cpumask_var_t' from type 'const struct cpumask *'
      sibling_mask = topology_sibling_cpumask(cpu);
                   ^
   lib/dlock-list.c:76:7: warning: the address of 'sibling_mask' will always evaluate as 'true' [-Waddress]
      if (sibling_mask) {
          ^

vim +75 lib/dlock-list.c

    44	
    45	/*
    46	 * Initialize cpu2idx mapping table & nr_dlock_lists.
    47	 *
    48	 * It is possible that a dlock-list can be allocated before the cpu2idx is
    49	 * initialized. In this case, all the cpus are mapped to the first entry
    50	 * before initialization.
    51	 *
    52	 * All the sibling CPUs of a sibling group will map to the same dlock list so
    53	 * as to reduce the number of dlock lists to be maintained while minimizing
    54	 * cacheline contention.
    55	 *
    56	 * As the sibling masks are set up in the core initcall phase, this function
    57	 * has to be done in the postcore phase to get the right data.
    58	 */
    59	static int __init cpu2idx_init(void)
    60	{
    61		int idx, cpu;
    62		cpumask_var_t sibling_mask;
    63		static struct cpumask mask __initdata;
    64	
    65		cpumask_clear(&mask);
    66		idx = 0;
    67		for_each_possible_cpu(cpu) {
    68			int scpu;
    69	
    70			if (cpumask_test_cpu(cpu, &mask))
    71				continue;
    72			per_cpu(cpu2idx, cpu) = idx;
    73			cpumask_set_cpu(cpu, &mask);
    74	
  > 75			sibling_mask = topology_sibling_cpumask(cpu);
    76			if (sibling_mask) {
    77				for_each_cpu(scpu, sibling_mask) {
    78					per_cpu(cpu2idx, scpu) = idx;
    79					cpumask_set_cpu(scpu, &mask);
    80				}
    81			}
    82			idx++;
    83		}
    84	
    85		/*
    86		 * nr_dlock_lists can only be set after cpu2idx is properly
    87		 * initialized.
    88		 */
    89		smp_mb();
    90		nr_dlock_lists = idx;
    91		pr_info("dlock-list: %d head entries per dlock list.\n",
    92			nr_dlock_lists);
    93		return 0;
    94	}
    95	postcore_initcall(cpu2idx_init);
    96	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27064 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list
  2017-10-05 14:57     ` Waiman Long
@ 2017-10-09  7:26       ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2017-10-09  7:26 UTC (permalink / raw)
  To: Waiman Long
  Cc: Jan Kara, Alexander Viro, Jan Kara, Jeff Layton, J. Bruce Fields,
	Tejun Heo, Christoph Lameter, linux-fsdevel, linux-kernel,
	Ingo Molnar, Peter Zijlstra, Andi Kleen, Dave Chinner,
	Boqun Feng, Davidlohr Bueso, Waiman Long

On Thu 05-10-17 10:57:07, Waiman Long wrote:
> On 10/05/2017 04:59 AM, Jan Kara wrote:
> > On Wed 04-10-17 17:20:05, Waiman Long wrote:
> >>  int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
> >>  {
> >> -	int idx;
> >> +	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;
> > Hum, is this there for the case where alloc_dlock_list_heads() is called
> > before nr_dlock_lists is initialized? But how can the dlist be used later
> > when it has a larger number of lists and you don't know how many?
> 
> The point is that nr_dlock_lists <= nr_cpu_ids. Before nr_dlock_lists is
> initialized, the mapping table maps all CPUs to the first entry, so the
> extra entries will just stay unused. At free time, they will all get
> de-allocated. I will clarify that with a comment.

OK, nice trick.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-10-09  7:26 UTC | newest]

Thread overview: 18+ messages
2017-10-04 21:20 [PATCH v6 0/6] vfs: Use dlock list for SB's s_inodes list Waiman Long
2017-10-04 21:20 ` [PATCH v6 1/6] lib/dlock-list: Distributed and lock-protected lists Waiman Long
2017-10-04 21:20 ` [PATCH v6 2/6] vfs: Remove unnecessary list_for_each_entry_safe() variants Waiman Long
2017-10-04 21:20 ` [PATCH v6 3/6] vfs: Use dlock list for superblock's inode list Waiman Long
2017-10-04 21:20 ` [PATCH v6 4/6] lib/dlock-list: Make sibling CPUs share the same linked list Waiman Long
2017-10-05  8:59   ` Jan Kara
2017-10-05 14:57     ` Waiman Long
2017-10-09  7:26       ` Jan Kara
2017-10-05 17:40   ` Davidlohr Bueso
2017-10-05 18:01     ` Waiman Long
2017-10-07 13:44   ` kbuild test robot
2017-10-07 14:38   ` kbuild test robot
2017-10-04 21:20 ` [PATCH v6 5/6] lib/dlock-list: Enable faster lookup with hashing Waiman Long
2017-10-05  9:14   ` Jan Kara
2017-10-05 11:06     ` Davidlohr Bueso
2017-10-04 21:20 ` [PATCH v6 6/6] lib/dlock-list: Provide IRQ-safe APIs Waiman Long
2017-10-04 22:46   ` Davidlohr Bueso
2017-10-05 13:40     ` Waiman Long
