linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/13] vfs: Convert file allocation code to use the IDR
@ 2017-04-27 18:48 Sandhya Bankar
  2017-04-27 18:54 ` [PATCH 01/13] idr: Add ability to set/clear tags Matthew Wilcox
                   ` (12 more replies)
  0 siblings, 13 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 18:48 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, re.emese, adobriyan,
	keescook, riel

Currently, file descriptors are allocated using a custom allocator.
This patchset replaces the custom code with an IDR. This replacement will
result in some memory savings for processes with relatively few open files
and improved performance for workloads with very large numbers of open files.

Note: Some more performance benchmarks need to be run on this patchset.
Can anyone please help with running benchmarks for this patchset?

Matthew Wilcox (5):
  idr: Add ability to set/clear tags
  idr: Add idr_for_each_entry_tagged()
  idr, radix-tree: Add get_tag_batch function
  idr, radix-tree: Implement copy_preload
  vfs: Add init_task.h include

Sandhya Bankar (8):
  vfs: Replace array of file pointers with an IDR
  vfs: Remove next_fd from fd alloc code path.
  vfs: Remove full_fds_bits from fd allocation code path.
  vfs: Use idr_tag_get() in fd_is_open().
  vfs: Rewrite close_files()
  vfs: Replace close_on_exec bitmap with an IDR tag
  vfs: Convert select to use idr_get_tag_batch()
  vfs: Delete struct fdtable

 fs/compat.c                                 |   6 +-
 fs/exec.c                                   |   2 +-
 fs/fcntl.c                                  |   2 +-
 fs/file.c                                   | 606 ++++++----------------------
 fs/proc/array.c                             |   2 +-
 fs/proc/fd.c                                |   6 +-
 fs/select.c                                 |  21 +-
 include/linux/fdtable.h                     |  66 +--
 include/linux/file.h                        |   1 -
 include/linux/idr.h                         | 110 ++++-
 include/linux/radix-tree.h                  |   5 +
 lib/idr.c                                   |  30 +-
 lib/radix-tree.c                            | 169 +++++++-
 tools/testing/radix-tree/idr-test.c         |  85 +++-
 tools/testing/radix-tree/linux/radix-tree.h |   2 +
 tools/testing/radix-tree/main.c             |  38 ++
 tools/testing/radix-tree/test.c             |   9 +-
 tools/testing/radix-tree/test.h             |   3 +-
 18 files changed, 578 insertions(+), 585 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/13] idr: Add ability to set/clear tags
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
@ 2017-04-27 18:54 ` Matthew Wilcox
  2017-04-27 19:05 ` [PATCH 02/13] idr: Add idr_for_each_entry_tagged() Sandhya Bankar
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2017-04-27 18:54 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Now that the IDR uses the radix tree, we can expose the radix tree tags
to users of the IDR.  A few spots in the radix tree needed to be changed
to cope with the fact that the IDR can have NULL pointers with tags set.
One of the more notable changes is that IDR_FREE really is special -- an
index which is out of range of the current tree height is free.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h                 | 57 +++++++++++++++++++++++++++++++++++--
 lib/radix-tree.c                    | 31 +++++++++-----------
 tools/testing/radix-tree/idr-test.c | 27 ++++++++++++++++++
 3 files changed, 95 insertions(+), 20 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index bf70b3e..7eb4432 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -22,8 +22,8 @@ struct idr {
 };
 
 /*
- * The IDR API does not expose the tagging functionality of the radix tree
- * to users.  Use tag 0 to track whether a node has free space below it.
+ * Reserve one of the radix tree tags to track whether a node has free
+ * space below it.
  */
 #define IDR_FREE	0
 
@@ -93,6 +93,59 @@ static inline void *idr_remove(struct idr *idr, int id)
 	return radix_tree_delete_item(&idr->idr_rt, id, NULL);
 }
 
+/**
+ * idr_tag_set - Set a tag on an entry
+ * @idr: IDR pointer
+ * @id: ID of entry to tag
+ * @tag: Tag index to set
+ *
+ * If there is an entry at @id in this IDR, set a tag on it and return
+ * the address of the entry.  If @id is outside the range of the IDR,
+ * return NULL.  This API does not allow you to set IDR_FREE on an entry;
+ * use idr_remove() for that.  The implementation does not currently check
+ * that IDR_FREE is clear, so it is possible to set a tag on a free entry.
+ * This is not recommended and may change in the future.
+ */
+static inline void *idr_tag_set(struct idr *idr, int id, unsigned int tag)
+{
+	BUG_ON(tag == IDR_FREE);
+	return radix_tree_tag_set(&idr->idr_rt, id, tag);
+}
+
+/**
+ * idr_tag_clear - Clear a tag on an entry
+ * @idr: IDR pointer
+ * @id: ID of entry to tag
+ * @tag: Tag index to clear
+ *
+ * If there is an entry at @id in this IDR, clear its tag and return
+ * the address of the entry.  If @id is outside the range of the IDR,
+ * return NULL.  This API does not allow you to clear IDR_FREE on an entry;
+ * use idr_alloc() for that.  The implementation does not currently check
+ * that IDR_FREE is clear, so it is possible to clear a tag on a free entry.
+ * This is not recommended and may change in the future.
+ */
+static inline void *idr_tag_clear(struct idr *idr, int id, unsigned int tag)
+{
+	BUG_ON(tag == IDR_FREE);
+	return radix_tree_tag_clear(&idr->idr_rt, id, tag);
+}
+
+/**
+ * idr_tag_get - Return whether a particular entry has a tag set
+ * @idr: IDR pointer
+ * @id: ID of entry to check
+ * @tag: Tag index to check
+ *
 + * Returns true/false depending on whether @tag is set on this ID.  Unlike
+ * idr_tag_set() or idr_tag_clear(), you can use the IDR_FREE tag value,
+ * as it can be useful to know whether a particular ID has been allocated.
+ */
+static inline bool idr_tag_get(const struct idr *idr, int id, unsigned int tag)
+{
+	return radix_tree_tag_get(&idr->idr_rt, id, tag);
+}
+
 static inline void idr_init(struct idr *idr)
 {
 	INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 691a9ad..6723384 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -638,18 +638,17 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
 		if (!node)
 			return -ENOMEM;
 
+		/* Propagate the aggregated tag info to the new child */
+		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
+			if (root_tag_get(root, tag))
+				tag_set(node, tag, 0);
+		}
 		if (is_idr(root)) {
 			all_tag_set(node, IDR_FREE);
 			if (!root_tag_get(root, IDR_FREE)) {
 				tag_clear(node, IDR_FREE, 0);
 				root_tag_set(root, IDR_FREE);
 			}
-		} else {
-			/* Propagate the aggregated tag info to the new child */
-			for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
-				if (root_tag_get(root, tag))
-					tag_set(node, tag, 0);
-			}
 		}
 
 		BUG_ON(shift > BITS_PER_LONG);
@@ -1434,8 +1433,6 @@ void *radix_tree_tag_set(struct radix_tree_root *root,
 
 		parent = entry_to_node(node);
 		offset = radix_tree_descend(parent, &node, index);
-		BUG_ON(!node);
-
 		if (!tag_get(parent, tag, offset))
 			tag_set(parent, tag, offset);
 	}
@@ -1489,7 +1486,7 @@ static void node_tag_clear(struct radix_tree_root *root,
  *	Clear the search tag (which must be < RADIX_TREE_MAX_TAGS)
  *	corresponding to @index in the radix tree.  If this causes
  *	the leaf node to have no tags set then clear the tag in the
- *	next-to-leaf node, etc.
+ *	parent node, etc.
  *
  *	Returns the address of the tagged item on success, else NULL.  ie:
  *	has the same return value and semantics as radix_tree_lookup().
@@ -1512,8 +1509,7 @@ void *radix_tree_tag_clear(struct radix_tree_root *root,
 		offset = radix_tree_descend(parent, &node, index);
 	}
 
-	if (node)
-		node_tag_clear(root, parent, tag, offset);
+	node_tag_clear(root, parent, tag, offset);
 
 	return node;
 }
@@ -1552,11 +1548,11 @@ int radix_tree_tag_get(const struct radix_tree_root *root,
 	struct radix_tree_node *node, *parent;
 	unsigned long maxindex;
 
-	if (!root_tag_get(root, tag))
-		return 0;
-
 	radix_tree_load_root(root, &node, &maxindex);
 	if (index > maxindex)
+		return is_idr(root) && (tag == IDR_FREE);
+
+	if (!root_tag_get(root, tag))
 		return 0;
 
 	while (radix_tree_is_internal_node(node)) {
@@ -1783,7 +1779,7 @@ void __rcu **radix_tree_next_chunk(const struct radix_tree_root *root,
 			child = rcu_dereference_raw(node->slots[offset]);
 		}
 
-		if (!child)
+		if (!child && !is_idr(root))
 			goto restart;
 		if (child == RADIX_TREE_RETRY)
 			break;
@@ -1994,11 +1990,10 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
 	unsigned offset = get_slot_offset(node, slot);
 	int tag;
 
+	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
+		node_tag_clear(root, node, tag, offset);
 	if (is_idr(root))
 		node_tag_set(root, node, IDR_FREE, offset);
-	else
-		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
-			node_tag_clear(root, node, tag, offset);
 
 	replace_slot(slot, NULL, node, -1, exceptional);
 	return node && delete_node(root, node, NULL, NULL);
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 30cd0b2..fd94bee 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -135,6 +135,32 @@ void idr_null_test(void)
 	assert(idr_is_empty(&idr));
 }
 
+#define IDR_TEST	1
+
+void idr_tag_test(void)
+{
+	unsigned int i;
+	DEFINE_IDR(idr);
+
+	for (i = 0; i < 100; i++) {
+		assert(idr_alloc(&idr, NULL, 0, 0, GFP_KERNEL) == i);
+		if (i % 7 == 0)
+			idr_tag_set(&idr, i, IDR_TEST);
+	}
+
+	for (i = 0; i < 100; i += 14) {
+		assert(idr_tag_get(&idr, i, IDR_TEST));
+		idr_tag_clear(&idr, i, IDR_TEST);
+	}
+
+	for (i = 0; i < 100; i++) {
+		assert(idr_tag_get(&idr, i, IDR_TEST) == (i % 14 == 7));
+	}
+
+	idr_for_each(&idr, item_idr_free, &idr);
+	idr_destroy(&idr);
+}
+
 void idr_nowait_test(void)
 {
 	unsigned int i;
@@ -225,6 +251,7 @@ void idr_checks(void)
 	idr_replace_test();
 	idr_alloc_test();
 	idr_null_test();
+	idr_tag_test();
 	idr_nowait_test();
 	idr_get_next_test();
 }
-- 
1.8.3.1


* [PATCH 02/13] idr: Add idr_for_each_entry_tagged()
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
  2017-04-27 18:54 ` [PATCH 01/13] idr: Add ability to set/clear tags Matthew Wilcox
@ 2017-04-27 19:05 ` Sandhya Bankar
  2017-04-27 19:06 ` [PATCH 03/13] idr, radix-tree: Add get_tag_batch function Sandhya Bankar
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:05 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Add the ability to iterate over tagged entries in the IDR with
idr_get_next_tag() and idr_for_each_entry_tagged().

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h                 | 15 ++++++++++++++-
 lib/idr.c                           | 30 +++++++++++++++++++++++++++++-
 tools/testing/radix-tree/idr-test.c | 18 ++++++++++--------
 tools/testing/radix-tree/test.c     |  9 +++++++--
 tools/testing/radix-tree/test.h     |  1 +
 5 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 7eb4432..9f71e63 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -84,7 +84,8 @@ static inline void idr_set_cursor(struct idr *idr, unsigned int val)
 int idr_alloc_cyclic(struct idr *, void *entry, int start, int end, gfp_t);
 int idr_for_each(const struct idr *,
 		 int (*fn)(int id, void *p, void *data), void *data);
-void *idr_get_next(struct idr *, int *nextid);
+void *idr_get_next(const struct idr *, int *nextid);
+void *idr_get_next_tag(const struct idr *, int *nextid, unsigned int tag);
 void *idr_replace(struct idr *, void *, int id);
 void idr_destroy(struct idr *);
 
@@ -213,6 +214,18 @@ static inline void *idr_find(const struct idr *idr, int id)
 	     entry;							\
 	     ++id, (entry) = idr_get_next((idr), &(id)))
 
+/**
+ * idr_for_each_entry_tagged - iterate over IDs with a set tag
+ * @idr: IDR handle
+ * @entry: The pointer stored in @idr
+ * @id: The index of @entry in @idr
+ * @tag: tag to search for
+ */
+#define idr_for_each_entry_tagged(idr, entry, id, tag)			\
+	for (id = 0;							\
+	     ((entry) = idr_get_next_tag(idr, &(id), (tag))) != NULL;	\
+	     ++id)
+
 /*
  * IDA - IDR based id allocator, use when translation from id to
  * pointer isn't necessary.
diff --git a/lib/idr.c b/lib/idr.c
index b13682b..68e39c3 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -120,7 +120,7 @@ int idr_for_each(const struct idr *idr,
  * to the ID of the found value.  To use in a loop, the value pointed to by
  * nextid must be incremented by the user.
  */
-void *idr_get_next(struct idr *idr, int *nextid)
+void *idr_get_next(const struct idr *idr, int *nextid)
 {
 	struct radix_tree_iter iter;
 	void __rcu **slot;
@@ -135,6 +135,34 @@ void *idr_get_next(struct idr *idr, int *nextid)
 EXPORT_SYMBOL(idr_get_next);
 
 /**
+ * idr_get_next_tag - Find next tagged entry
+ * @idr: idr handle
+ * @nextid: Pointer to lowest possible ID to return
+ * @tag: tag to search for
+ *
+ * Returns the next tagged entry in the tree with an ID greater than
+ * or equal to the value pointed to by @nextid.  On exit, @nextid is updated
+ * to the ID of the found value.  To use in a loop, the value pointed to by
+ * nextid must be incremented by the user.  If a NULL entry is tagged, it
+ * will be returned.
+ */
+void *idr_get_next_tag(const struct idr *idr, int *nextid, unsigned int tag)
+{
+	struct radix_tree_iter iter;
+	void __rcu **slot;
+
+	radix_tree_iter_init(&iter, *nextid);
+	slot = radix_tree_next_chunk(&idr->idr_rt, &iter,
+					RADIX_TREE_ITER_TAGGED | tag);
+	if (!slot)
+		return NULL;
+
+	*nextid = iter.index;
+	return rcu_dereference_raw(*slot);
+}
+EXPORT_UNUSED_SYMBOL(idr_get_next_tag);
+
+/**
  * idr_replace - replace pointer for given id
  * @idr: idr handle
  * @ptr: New pointer to associate with the ID
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index fd94bee..334ce1c 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -23,19 +23,15 @@
 
 int item_idr_free(int id, void *p, void *data)
 {
-	struct item *item = p;
-	assert(item->index == id);
-	free(p);
-
+	item_free(p, id);
 	return 0;
 }
 
 void item_idr_remove(struct idr *idr, int id)
 {
 	struct item *item = idr_find(idr, id);
-	assert(item->index == id);
 	idr_remove(idr, id);
-	free(item);
+	item_free(item, id);
 }
 
 void idr_alloc_test(void)
@@ -139,11 +135,13 @@ void idr_null_test(void)
 
 void idr_tag_test(void)
 {
-	unsigned int i;
+	int i;
 	DEFINE_IDR(idr);
+	struct item *item;
 
 	for (i = 0; i < 100; i++) {
-		assert(idr_alloc(&idr, NULL, 0, 0, GFP_KERNEL) == i);
+		item = item_create(i, 0);
+		assert(idr_alloc(&idr, item, 0, 0, GFP_KERNEL) == i);
 		if (i % 7 == 0)
 			idr_tag_set(&idr, i, IDR_TEST);
 	}
@@ -157,6 +155,10 @@ void idr_tag_test(void)
 		assert(idr_tag_get(&idr, i, IDR_TEST) == (i % 14 == 7));
 	}
 
+	idr_for_each_entry_tagged(&idr, item, i, IDR_TEST) {
+		assert(item->index % 14 == 7);
+	}
+
 	idr_for_each(&idr, item_idr_free, &idr);
 	idr_destroy(&idr);
 }
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index 1a257d7..74f8e5c 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -62,13 +62,18 @@ void item_sanity(struct item *item, unsigned long index)
 	assert((item->index | mask) == (index | mask));
 }
 
+void item_free(struct item *item, unsigned long index)
+{
+	item_sanity(item, index);
+	free(item);
+}
+
 int item_delete(struct radix_tree_root *root, unsigned long index)
 {
 	struct item *item = radix_tree_delete(root, index);
 
 	if (item) {
-		item_sanity(item, index);
-		free(item);
+		item_free(item, index);
 		return 1;
 	}
 	return 0;
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index 0f8220c..cbabea1 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -13,6 +13,7 @@ struct item {
 int item_insert(struct radix_tree_root *root, unsigned long index);
 int item_insert_order(struct radix_tree_root *root, unsigned long index,
 			unsigned order);
+void item_free(struct item *item, unsigned long index);
 int item_delete(struct radix_tree_root *root, unsigned long index);
 struct item *item_lookup(struct radix_tree_root *root, unsigned long index);
 
-- 
1.8.3.1


* [PATCH 03/13] idr, radix-tree: Add get_tag_batch function
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
  2017-04-27 18:54 ` [PATCH 01/13] idr: Add ability to set/clear tags Matthew Wilcox
  2017-04-27 19:05 ` [PATCH 02/13] idr: Add idr_for_each_entry_tagged() Sandhya Bankar
@ 2017-04-27 19:06 ` Sandhya Bankar
  2017-04-27 19:07 ` [PATCH 04/13] idr, radix-tree: Implement copy_preload Sandhya Bankar
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:06 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

To implement select() on top of the IDR, we need to be able to get the
tags which represent the open files in bulk.  For this use case, it makes
sense to get a batch of BITS_PER_LONG tags at a time, and until another
user shows up who wants something different, let's enforce that instead
of coping with arbitrary offsets.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h                 |  6 ++++
 include/linux/radix-tree.h          |  2 ++
 lib/radix-tree.c                    | 55 ++++++++++++++++++++++++++++++++++++-
 tools/testing/radix-tree/idr-test.c | 20 ++++++++++++++
 tools/testing/radix-tree/test.h     |  2 +-
 5 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 9f71e63..d43cf01 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -147,6 +147,12 @@ static inline bool idr_tag_get(const struct idr *idr, int id, unsigned int tag)
 	return radix_tree_tag_get(&idr->idr_rt, id, tag);
 }
 
+static inline unsigned long idr_get_tag_batch(const struct idr *idr, int id,
+							unsigned int tag)
+{
+	return radix_tree_get_tag_batch(&idr->idr_rt, id, tag);
+}
+
 static inline void idr_init(struct idr *idr)
 {
 	INIT_RADIX_TREE(&idr->idr_rt, IDR_RT_MARKER);
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 3e57350..f701e0b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -339,6 +339,8 @@ void radix_tree_iter_tag_set(struct radix_tree_root *,
 		const struct radix_tree_iter *iter, unsigned int tag);
 void radix_tree_iter_tag_clear(struct radix_tree_root *,
 		const struct radix_tree_iter *iter, unsigned int tag);
+unsigned long radix_tree_get_tag_batch(const struct radix_tree_root *,
+				unsigned long index, unsigned int tag);
 unsigned int radix_tree_gang_lookup_tag(const struct radix_tree_root *,
 		void **results, unsigned long first_index,
 		unsigned int max_items, unsigned int tag);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6723384..855ac8e 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -181,7 +181,8 @@ static inline void root_tag_clear_all(struct radix_tree_root *root)
 	root->gfp_mask &= (1 << ROOT_TAG_SHIFT) - 1;
 }
 
-static inline int root_tag_get(const struct radix_tree_root *root, unsigned tag)
+static inline bool root_tag_get(const struct radix_tree_root *root,
+								unsigned tag)
 {
 	return (__force int)root->gfp_mask & (1 << (tag + ROOT_TAG_SHIFT));
 }
@@ -1571,6 +1572,58 @@ int radix_tree_tag_get(const struct radix_tree_root *root,
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
 
+static unsigned long
+__radix_tree_get_tag_batch(const struct radix_tree_root *root,
+				unsigned long index, unsigned int tag)
+{
+	struct radix_tree_node *node;
+	void __rcu **slot = NULL;
+	bool idr_free = is_idr(root) && (tag == IDR_FREE);
+
+	__radix_tree_lookup(root, index, &node, &slot);
+	if (!slot)
+		return idr_free ? ~0UL : 0;
+	if (!node)
+		return root_tag_get(root, tag) | (idr_free ? ~1UL : 0);
+	if (node->shift)
+		return idr_free ? ~0UL : 0;
+	return node->tags[tag][(index / BITS_PER_LONG) &
+						(RADIX_TREE_TAG_LONGS - 1)];
+}
+
+/**
+ * radix_tree_get_tag_batch() - get a batch of tags
+ * @root: radix tree root
+ * @index: start index of batch
+ * @tag: tag to get
+ *
+ * Get a batch of BITS_PER_LONG tags.  The only values of @index
+ * permitted are multiples of BITS_PER_LONG.
+ *
+ * Return: The tags for the next BITS_PER_LONG indices.
+ */
+unsigned long radix_tree_get_tag_batch(const struct radix_tree_root *root,
+				unsigned long index, unsigned int tag)
+{
+	unsigned long bits = 0;
+	unsigned shift = BITS_PER_LONG > RADIX_TREE_MAP_SIZE ? \
+						RADIX_TREE_MAP_SIZE : 0;
+
+	if (WARN_ON_ONCE(index & (BITS_PER_LONG - 1)))
+		return bits;
+
+	index += BITS_PER_LONG;
+	for (;;) {
+		index -= RADIX_TREE_MAP_SIZE;
+		bits |= __radix_tree_get_tag_batch(root, index, tag);
+		if (!(index & (BITS_PER_LONG - 1)))
+			break;
+		bits <<= shift;
+	}
+
+	return bits;
+}
+
 static inline void __set_iter_shift(struct radix_tree_iter *iter,
 					unsigned int shift)
 {
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 334ce1c..3f9f429 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -139,6 +139,8 @@ void idr_tag_test(void)
 	DEFINE_IDR(idr);
 	struct item *item;
 
+	assert(idr_get_tag_batch(&idr, 0, IDR_FREE) == ~0UL);
+
 	for (i = 0; i < 100; i++) {
 		item = item_create(i, 0);
 		assert(idr_alloc(&idr, item, 0, 0, GFP_KERNEL) == i);
@@ -146,6 +148,20 @@ void idr_tag_test(void)
 			idr_tag_set(&idr, i, IDR_TEST);
 	}
 
+	assert(idr_get_tag_batch(&idr, 0, IDR_FREE) == 0);
+#if BITS_PER_LONG == 64
+	assert(idr_get_tag_batch(&idr, 0, IDR_TEST) == 0x8102040810204081UL);
+	assert(idr_get_tag_batch(&idr, 64, IDR_TEST) == 0x408102040UL);
+#else
+	assert(idr_get_tag_batch(&idr, 0, IDR_TEST) == 0x10204081UL);
+	assert(idr_get_tag_batch(&idr, 32, IDR_TEST) == 0x81020408UL);
+	assert(idr_get_tag_batch(&idr, 64, IDR_TEST) == 0x08102040UL);
+	assert(idr_get_tag_batch(&idr, 96, IDR_TEST) == 0x4UL);
+#endif
+	assert((int)idr_get_tag_batch(&idr, 64, IDR_FREE) == 0);
+	assert(idr_get_tag_batch(&idr, 128, IDR_FREE) == ~0UL);
+	assert(idr_get_tag_batch(&idr, 128, IDR_TEST) == 0);
+
 	for (i = 0; i < 100; i += 14) {
 		assert(idr_tag_get(&idr, i, IDR_TEST));
 		idr_tag_clear(&idr, i, IDR_TEST);
@@ -159,6 +175,10 @@ void idr_tag_test(void)
 		assert(item->index % 14 == 7);
 	}
 
+	item_free(idr_remove(&idr, 7), 7);
+	assert((int)idr_get_tag_batch(&idr, 0, IDR_TEST) == 0x00200000UL);
+	assert((int)idr_get_tag_batch(&idr, 0, IDR_FREE) == 0x00000080);
+
 	idr_for_each(&idr, item_idr_free, &idr);
 	idr_destroy(&idr);
 }
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index cbabea1..09616ff 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -52,7 +52,7 @@ struct item *
 /* Normally private parts of lib/radix-tree.c */
 struct radix_tree_node *entry_to_node(void *ptr);
 void radix_tree_dump(struct radix_tree_root *root);
-int root_tag_get(struct radix_tree_root *root, unsigned int tag);
+bool root_tag_get(struct radix_tree_root *root, unsigned int tag);
 unsigned long node_maxindex(struct radix_tree_node *);
 unsigned long shift_maxindex(unsigned int shift);
 int radix_tree_cpu_dead(unsigned int cpu);
-- 
1.8.3.1


* [PATCH 04/13] idr, radix-tree: Implement copy_preload
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (2 preceding siblings ...)
  2017-04-27 19:06 ` [PATCH 03/13] idr, radix-tree: Add get_tag_batch function Sandhya Bankar
@ 2017-04-27 19:07 ` Sandhya Bankar
  2017-04-27 19:08 ` [PATCH 05/13] vfs: Replace array of file pointers with an IDR Sandhya Bankar
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:07 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

In the file descriptor table duplication code (called at fork()), we
need to duplicate an IDR.  But we have to do it under a lock (so another
thread doesn't open/close a fd in the middle), and there's no suitable
preload operation for this today.  Adding just idr_copy_preload() isn't
enough as another thread could grow the fd table between starting the
preload and acquiring the lock.  We also need idr_check_preload() to be
called after acquiring the lock.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/idr.h                         | 32 +++++++++++
 include/linux/radix-tree.h                  |  3 ++
 lib/radix-tree.c                            | 83 +++++++++++++++++++++++++++++
 tools/testing/radix-tree/idr-test.c         | 24 +++++++++
 tools/testing/radix-tree/linux/radix-tree.h |  2 +
 tools/testing/radix-tree/main.c             | 38 +++++++++++++
 6 files changed, 182 insertions(+)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index d43cf01..eed1c1a 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -166,6 +166,38 @@ static inline bool idr_is_empty(const struct idr *idr)
 }
 
 /**
+ * idr_copy_preload - preload for idr_copy()
+ * @src: IDR to be copied from
+ * @gfp: Allocation mask to use for preloading
+ *
+ * Preallocates enough memory for a call to idr_copy().  This function
+ * returns with preemption disabled.  Call idr_preload_end() once the
+ * copy has completed.
+ *
+ * Return: -ENOMEM if the memory could not be allocated.
+ */
+static inline int idr_copy_preload(const struct idr *src, gfp_t gfp)
+{
+	return radix_tree_copy_preload(&src->idr_rt, gfp);
+}
+
+/**
+ * idr_check_preload - Check the preload is still sufficient
+ * @src: IDR to be copied from
+ *
+ * Between the successful allocation of memory and acquiring the lock that
+ * protects @src, the IDR may have expanded.  If this function returns
+ * false, more memory needs to be preallocated.
+ *
+ * Return: true if enough memory remains allocated, false to retry the
+ * preallocation.
+ */
+static inline bool idr_check_preload(const struct idr *src)
+{
+	return radix_tree_check_preload(&src->idr_rt);
+}
+
+/**
  * idr_preload_end - end preload section started with idr_preload()
  *
  * Each idr_preload() should be matched with an invocation of this
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f701e0b..f53d004 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -354,6 +354,9 @@ static inline void radix_tree_preload_end(void)
 	preempt_enable();
 }
 
+int radix_tree_copy_preload(const struct radix_tree_root *, gfp_t);
+bool radix_tree_check_preload(const struct radix_tree_root *);
+
 int radix_tree_split_preload(unsigned old_order, unsigned new_order, gfp_t);
 int radix_tree_split(struct radix_tree_root *, unsigned long index,
 			unsigned new_order);
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 855ac8e..c1d75224 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -277,6 +277,55 @@ static unsigned long next_index(unsigned long index,
 	return (index & ~node_maxindex(node)) + (offset << node->shift);
 }
 
+/**
+ * radix_tree_count_nodes - Returns the number of nodes in this tree
+ * @root: radix tree root
+ *
+ * This routine does not examine every node in the tree; it assumes that
+ * all entries in the tree at level 1 are nodes.  It will overestimate
+ * the number of nodes in the tree in the presence of entries of order
+ * RADIX_TREE_MAP_SHIFT or higher.
+ */
+static unsigned long radix_tree_count_nodes(const struct radix_tree_root *root)
+{
+	struct radix_tree_node *node, *child;
+	unsigned long n = 1;
+	void *entry = rcu_dereference_raw(root->rnode);
+	unsigned int offset = 0;
+
+	if (!radix_tree_is_internal_node(entry))
+		return 0;
+
+	node = entry_to_node(entry);
+	if (!node->shift)
+		return n;
+
+	n += node->count;
+	if (node->shift == RADIX_TREE_MAP_SHIFT)
+		return n;
+
+	while (node) {
+		if (offset == RADIX_TREE_MAP_SIZE) {
+			offset = node->offset + 1;
+			node = node->parent;
+			continue;
+		}
+
+		entry = rcu_dereference_raw(node->slots[offset]);
+		offset++;
+		if (!radix_tree_is_internal_node(entry))
+			continue;
+		child = entry_to_node(entry);
+		n += child->count;
+		if (node->shift <= 2 * RADIX_TREE_MAP_SHIFT)
+			continue;
+		offset = 0;
+		node = child;
+	}
+
+	return n;
+}
+
 #ifndef __KERNEL__
 static void dump_node(struct radix_tree_node *node, unsigned long index)
 {
@@ -530,6 +579,40 @@ int radix_tree_maybe_preload(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(radix_tree_maybe_preload);
 
+/**
+ * radix_tree_copy_preload - preload for radix_tree_copy()
+ * @src: radix tree root to be copied from
+ * @gfp: Allocation mask to use for preloading
+ *
+ * Preallocates enough memory for a call to radix_tree_copy().  This function
+ * returns with preemption disabled.  Call radix_tree_preload_end() once the
+ * copy has completed.
+ *
+ * Return: -ENOMEM if the memory could not be allocated.
+ */
+int radix_tree_copy_preload(const struct radix_tree_root *src, gfp_t gfp_mask)
+{
+	return __radix_tree_preload(gfp_mask, radix_tree_count_nodes(src));
+}
+
+/**
+ * radix_tree_check_preload - Check the preload is still sufficient
+ * @src: radix tree to be copied from
+ * @cookie: Cookie returned from radix_tree_copy_preload()
+ *
+ * Between the successful allocation of memory and acquiring the lock that
+ * protects @src, the radix tree may have expanded.  Call this function
+ * to see if the preallocation needs to be expanded too.
+ *
+ * Return: true if enough memory remains allocated, false if it does not
+ * suffice.
+ */
+bool radix_tree_check_preload(const struct radix_tree_root *src)
+{
+	struct radix_tree_preload *rtp = this_cpu_ptr(&radix_tree_preloads);
+	return rtp->nr >= radix_tree_count_nodes(src);
+}
+
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 /*
  * Preload with enough objects to ensure that we can split a single entry
diff --git a/tools/testing/radix-tree/idr-test.c b/tools/testing/radix-tree/idr-test.c
index 3f9f429..e8d5386 100644
--- a/tools/testing/radix-tree/idr-test.c
+++ b/tools/testing/radix-tree/idr-test.c
@@ -225,6 +225,29 @@ void idr_get_next_test(void)
 	idr_destroy(&idr);
 }
 
+void idr_copy_test(void)
+{
+	DEFINE_IDR(src);
+	DEFINE_IDR(dst);
+	int i;
+	void *p;
+
+	for (i = 0; i < 10000; i++) {
+		struct item *item = item_create(i, 0);
+		assert(idr_alloc(&src, item, 0, 20000, GFP_KERNEL) == i);
+	}
+
+	idr_copy_preload(&src, GFP_KERNEL);
+	idr_for_each_entry(&src, p, i) {
+		assert(idr_alloc(&dst, p, i, i + 1, GFP_NOWAIT) == i);
+	}
+	idr_preload_end();
+
+	idr_for_each(&src, item_idr_free, NULL);
+	idr_destroy(&src);
+	idr_destroy(&dst);
+}
+
 void idr_checks(void)
 {
 	unsigned long i;
@@ -276,6 +299,7 @@ void idr_checks(void)
 	idr_tag_test();
 	idr_nowait_test();
 	idr_get_next_test();
+	idr_copy_test();
 }
 
 /*
diff --git a/tools/testing/radix-tree/linux/radix-tree.h b/tools/testing/radix-tree/linux/radix-tree.h
index bf1bb23..0c5885e 100644
--- a/tools/testing/radix-tree/linux/radix-tree.h
+++ b/tools/testing/radix-tree/linux/radix-tree.h
@@ -4,6 +4,8 @@
 #include "generated/map-shift.h"
 #include "../../../../include/linux/radix-tree.h"
 
+unsigned long radix_tree_count_nodes(const struct radix_tree_root *root);
+
 extern int kmalloc_verbose;
 extern int test_verbose;
 
diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
index bc9a784..a7504e2 100644
--- a/tools/testing/radix-tree/main.c
+++ b/tools/testing/radix-tree/main.c
@@ -11,6 +11,42 @@
 #include "test.h"
 #include "regression.h"
 
+static void count_node_check(void)
+{
+	unsigned long i, j, k;
+	RADIX_TREE(tree, GFP_KERNEL);
+
+	assert(radix_tree_empty(&tree));
+	assert(radix_tree_count_nodes(&tree) == 0);
+	assert(item_insert(&tree, 0) == 0);
+	assert(radix_tree_count_nodes(&tree) == 0);
+
+	for (i = 1; i < RADIX_TREE_MAP_SIZE; i++) {
+		assert(item_insert(&tree, i) == 0);
+		assert(radix_tree_count_nodes(&tree) == 1);
+	}
+
+	j = 3;
+	k = 3;
+	for (i = RADIX_TREE_MAP_SIZE; i < i * RADIX_TREE_MAP_SIZE;
+			i *= RADIX_TREE_MAP_SIZE) {
+		assert(item_insert(&tree, i) == 0);
+		assert(radix_tree_count_nodes(&tree) == j);
+		j += k;
+		k++;
+	}
+
+	assert(item_insert(&tree, i) == 0);
+	assert(radix_tree_count_nodes(&tree) == j);
+
+	item_kill_tree(&tree);
+}
+
+void basic_checks(void)
+{
+	count_node_check();
+}
+
 void __gang_check(unsigned long middle, long down, long up, int chunk, int hop)
 {
 	long idx;
@@ -362,6 +398,8 @@ int main(int argc, char **argv)
 	rcu_register_thread();
 	radix_tree_init();
 
+	basic_checks();
+
 	regression1_test();
 	regression2_test();
 	regression3_test();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 05/13] vfs: Replace array of file pointers with an IDR
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (3 preceding siblings ...)
  2017-04-27 19:07 ` [PATCH 04/13] idr, radix-tree: Implement copy_preload Sandhya Bankar
@ 2017-04-27 19:08 ` Sandhya Bankar
  2017-10-04 15:45   ` [RESEND PATCH " Mateusz Guzik
  2017-04-27 19:09 ` [PATCH 06/13] vfs: Remove next_fd from fd alloc code path Sandhya Bankar
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:08 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Instead of storing all the file pointers in a single array, use an
IDR.  It is RCU-safe, and does not need to be reallocated when the
fd array grows.  It also handles allocation of new file descriptors.
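The three IDR calls the patch leans on can be followed without the kernel sources via the toy userspace model below. The names mirror the kernel API (`idr_alloc()`, `idr_replace()`, `idr_remove()`), but the backing store here is a flat array rather than a radix tree, and the error value is hard-coded; this is an illustrative sketch only, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 64

struct toy_idr {
	void *slot[TOY_MAX];
	unsigned char used[TOY_MAX];
};

/* idr_alloc(): reserve the lowest free id in [start, end), store p there */
static int toy_idr_alloc(struct toy_idr *idr, void *p, int start, int end)
{
	int id;

	for (id = start; id < end && id < TOY_MAX; id++) {
		if (!idr->used[id]) {
			idr->used[id] = 1;
			idr->slot[id] = p;
			return id;
		}
	}
	return -28;	/* -ENOSPC: the requested range is exhausted */
}

/* idr_replace(): swap the stored pointer, return the old one */
static void *toy_idr_replace(struct toy_idr *idr, void *p, int id)
{
	void *old = idr->slot[id];

	idr->slot[id] = p;
	return old;
}

/* idr_remove(): release the id, return what was stored there */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
	void *old = idr->slot[id];

	idr->used[id] = 0;
	idr->slot[id] = NULL;
	return old;
}
```

In the patch's terms, __alloc_fd() corresponds to toy_idr_alloc() with a NULL pointer, __fd_install() to toy_idr_replace(), and __close_fd() to toy_idr_remove().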

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
[mawilcox@microsoft.com: fixes]
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c               | 180 ++++++++++++++++++++----------------------------
 include/linux/fdtable.h |  10 +--
 2 files changed, 79 insertions(+), 111 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ad6f094..1c000d8 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -47,7 +47,6 @@ static void *alloc_fdmem(size_t size)
 
 static void __free_fdtable(struct fdtable *fdt)
 {
-	kvfree(fdt->fd);
 	kvfree(fdt->open_fds);
 	kfree(fdt);
 }
@@ -89,15 +88,7 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
  */
 static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
 {
-	unsigned int cpy, set;
-
 	BUG_ON(nfdt->max_fds < ofdt->max_fds);
-
-	cpy = ofdt->max_fds * sizeof(struct file *);
-	set = (nfdt->max_fds - ofdt->max_fds) * sizeof(struct file *);
-	memcpy(nfdt->fd, ofdt->fd, cpy);
-	memset((char *)nfdt->fd + cpy, 0, set);
-
 	copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
 }
 
@@ -131,15 +122,11 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 	if (!fdt)
 		goto out;
 	fdt->max_fds = nr;
-	data = alloc_fdmem(nr * sizeof(struct file *));
-	if (!data)
-		goto out_fdt;
-	fdt->fd = data;
 
 	data = alloc_fdmem(max_t(size_t,
 				 2 * nr / BITS_PER_BYTE + BITBIT_SIZE(nr), L1_CACHE_BYTES));
 	if (!data)
-		goto out_arr;
+		goto out_fdt;
 	fdt->open_fds = data;
 	data += nr / BITS_PER_BYTE;
 	fdt->close_on_exec = data;
@@ -148,8 +135,6 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 
 	return fdt;
 
-out_arr:
-	kvfree(fdt->fd);
 out_fdt:
 	kfree(fdt);
 out:
@@ -170,6 +155,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr)
 	struct fdtable *new_fdt, *cur_fdt;
 
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 	new_fdt = alloc_fdtable(nr);
 
 	/* make sure all __fd_install() have seen resize_in_progress
@@ -178,6 +164,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr)
 	if (atomic_read(&files->count) > 1)
 		synchronize_sched();
 
+	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
 	if (!new_fdt)
 		return -ENOMEM;
@@ -228,8 +215,10 @@ static int expand_files(struct files_struct *files, unsigned int nr)
 
 	if (unlikely(files->resize_in_progress)) {
 		spin_unlock(&files->file_lock);
+		idr_preload_end();
 		expanded = 1;
 		wait_event(files->resize_wait, !files->resize_in_progress);
+		idr_preload(GFP_KERNEL);
 		spin_lock(&files->file_lock);
 		goto repeat;
 	}
@@ -290,8 +279,8 @@ static unsigned int count_open_files(struct fdtable *fdt)
 struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 {
 	struct files_struct *newf;
-	struct file **old_fds, **new_fds;
 	unsigned int open_files, i;
+	struct file *f;
 	struct fdtable *old_fdt, *new_fdt;
 
 	*errorp = -ENOMEM;
@@ -302,6 +291,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	atomic_set(&newf->count, 1);
 
 	spin_lock_init(&newf->file_lock);
+	idr_init(&newf->fd_idr);
 	newf->resize_in_progress = false;
 	init_waitqueue_head(&newf->resize_wait);
 	newf->next_fd = 0;
@@ -310,8 +300,9 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	new_fdt->close_on_exec = newf->close_on_exec_init;
 	new_fdt->open_fds = newf->open_fds_init;
 	new_fdt->full_fds_bits = newf->full_fds_bits_init;
-	new_fdt->fd = &newf->fd_array[0];
 
+restart:
+	idr_copy_preload(&oldf->fd_idr, GFP_KERNEL);
 	spin_lock(&oldf->file_lock);
 	old_fdt = files_fdtable(oldf);
 	open_files = count_open_files(old_fdt);
@@ -321,6 +312,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	 */
 	while (unlikely(open_files > new_fdt->max_fds)) {
 		spin_unlock(&oldf->file_lock);
+		idr_preload_end();
 
 		if (new_fdt != &newf->fdtab)
 			__free_fdtable(new_fdt);
@@ -343,41 +335,50 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 		 * who knows it may have a new bigger fd table. We need
 		 * the latest pointer.
 		 */
+		idr_copy_preload(&oldf->fd_idr, GFP_KERNEL);
 		spin_lock(&oldf->file_lock);
 		old_fdt = files_fdtable(oldf);
 		open_files = count_open_files(old_fdt);
 	}
 
+	if (!idr_check_preload(&oldf->fd_idr)) {
+		spin_unlock(&oldf->file_lock);
+		idr_preload_end();
+		goto restart;
+	}
+
 	copy_fd_bitmaps(new_fdt, old_fdt, open_files);
 
-	old_fds = old_fdt->fd;
-	new_fds = new_fdt->fd;
-
-	for (i = open_files; i != 0; i--) {
-		struct file *f = *old_fds++;
-		if (f) {
-			get_file(f);
-		} else {
-			/*
-			 * The fd may be claimed in the fd bitmap but not yet
-			 * instantiated in the files array if a sibling thread
-			 * is partway through open().  So make sure that this
-			 * fd is available to the new process.
-			 */
-			__clear_open_fd(open_files - i, new_fdt);
+	idr_for_each_entry(&oldf->fd_idr, f, i) {
+		int err;
+
+		get_file(f);
+		err = idr_alloc(&newf->fd_idr, f, i, i + 1, GFP_NOWAIT);
+		if (WARN(err != i, "Could not allocate %d: %d\n", i, err)) {
+			spin_unlock(&oldf->file_lock);
+			goto out;
 		}
-		rcu_assign_pointer(*new_fds++, f);
 	}
+
 	spin_unlock(&oldf->file_lock);
+	idr_preload_end();
 
-	/* clear the remainder */
-	memset(new_fds, 0, (new_fdt->max_fds - open_files) * sizeof(struct file *));
+	/*
+	 * The fd may be claimed in the fd bitmap but not yet
+	 * instantiated in the files array if a sibling thread
+	 * is partway through open().
+	 */
+	for_each_set_bit(i, new_fdt->open_fds, new_fdt->max_fds) {
+		if (!idr_find(&newf->fd_idr, i))
+			__clear_bit(i, new_fdt->open_fds);
+	}
 
 	rcu_assign_pointer(newf->fdt, new_fdt);
 
 	return newf;
 
 out_release:
+	idr_destroy(&newf->fd_idr);
 	kmem_cache_free(files_cachep, newf);
 out:
 	return NULL;
@@ -401,7 +402,8 @@ static struct fdtable *close_files(struct files_struct * files)
 		set = fdt->open_fds[j++];
 		while (set) {
 			if (set & 1) {
-				struct file * file = xchg(&fdt->fd[i], NULL);
+				struct file *file;
+				file = idr_remove(&files->fd_idr, i);
 				if (file) {
 					filp_close(file, files);
 					cond_resched_rcu_qs();
@@ -469,28 +471,14 @@ struct files_struct init_files = {
 	.fdt		= &init_files.fdtab,
 	.fdtab		= {
 		.max_fds	= NR_OPEN_DEFAULT,
-		.fd		= &init_files.fd_array[0],
 		.close_on_exec	= init_files.close_on_exec_init,
 		.open_fds	= init_files.open_fds_init,
 		.full_fds_bits	= init_files.full_fds_bits_init,
 	},
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
+	.fd_idr		= IDR_INIT,
 };
 
-static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
-{
-	unsigned int maxfd = fdt->max_fds;
-	unsigned int maxbit = maxfd / BITS_PER_LONG;
-	unsigned int bitbit = start / BITS_PER_LONG;
-
-	bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG;
-	if (bitbit > maxfd)
-		return maxfd;
-	if (bitbit > start)
-		start = bitbit;
-	return find_next_zero_bit(fdt->open_fds, maxfd, start);
-}
-
 /*
  * allocate a file descriptor, mark it busy.
  */
@@ -501,54 +489,37 @@ int __alloc_fd(struct files_struct *files,
 	int error;
 	struct fdtable *fdt;
 
+	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
-repeat:
-	fdt = files_fdtable(files);
-	fd = start;
-	if (fd < files->next_fd)
-		fd = files->next_fd;
 
-	if (fd < fdt->max_fds)
-		fd = find_next_fd(fdt, fd);
-
-	/*
-	 * N.B. For clone tasks sharing a files structure, this test
-	 * will limit the total number of files that can be opened.
-	 */
-	error = -EMFILE;
-	if (fd >= end)
+	error = idr_alloc(&files->fd_idr, NULL, start, end, GFP_NOWAIT);
+	if (error == -ENOSPC) {
+		error = -EMFILE;
 		goto out;
+	}
+	BUG_ON(error < 0);
+	fd = error;
 
 	error = expand_files(files, fd);
-	if (error < 0)
+	if (error < 0) {
+		idr_remove(&files->fd_idr, fd);
 		goto out;
-
-	/*
-	 * If we needed to expand the fs array we
-	 * might have blocked - try again.
-	 */
-	if (error)
-		goto repeat;
+	}
 
 	if (start <= files->next_fd)
 		files->next_fd = fd + 1;
 
+	fdt = files_fdtable(files);
 	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
 		__set_close_on_exec(fd, fdt);
 	else
 		__clear_close_on_exec(fd, fdt);
 	error = fd;
-#if 1
-	/* Sanity check */
-	if (rcu_access_pointer(fdt->fd[fd]) != NULL) {
-		printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd);
-		rcu_assign_pointer(fdt->fd[fd], NULL);
-	}
-#endif
 
 out:
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 	return error;
 }
 
@@ -575,6 +546,7 @@ void put_unused_fd(unsigned int fd)
 {
 	struct files_struct *files = current->files;
 	spin_lock(&files->file_lock);
+	BUG_ON(idr_remove(&files->fd_idr, fd));
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 }
@@ -604,22 +576,9 @@ void put_unused_fd(unsigned int fd)
 void __fd_install(struct files_struct *files, unsigned int fd,
 		struct file *file)
 {
-	struct fdtable *fdt;
-
-	might_sleep();
-	rcu_read_lock_sched();
-
-	while (unlikely(files->resize_in_progress)) {
-		rcu_read_unlock_sched();
-		wait_event(files->resize_wait, !files->resize_in_progress);
-		rcu_read_lock_sched();
-	}
-	/* coupled with smp_wmb() in expand_fdtable() */
-	smp_rmb();
-	fdt = rcu_dereference_sched(files->fdt);
-	BUG_ON(fdt->fd[fd] != NULL);
-	rcu_assign_pointer(fdt->fd[fd], file);
-	rcu_read_unlock_sched();
+	rcu_read_lock();
+	BUG_ON(idr_replace(&files->fd_idr, file, fd));
+	rcu_read_unlock();
 }
 
 void fd_install(unsigned int fd, struct file *file)
@@ -641,10 +600,9 @@ int __close_fd(struct files_struct *files, unsigned fd)
 	fdt = files_fdtable(files);
 	if (fd >= fdt->max_fds)
 		goto out_unlock;
-	file = fdt->fd[fd];
+	file = idr_remove(&files->fd_idr, fd);
 	if (!file)
 		goto out_unlock;
-	rcu_assign_pointer(fdt->fd[fd], NULL);
 	__clear_close_on_exec(fd, fdt);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
@@ -676,10 +634,9 @@ void do_close_on_exec(struct files_struct *files)
 			struct file *file;
 			if (!(set & 1))
 				continue;
-			file = fdt->fd[fd];
+			file = idr_remove(&files->fd_idr, fd);
 			if (!file)
 				continue;
-			rcu_assign_pointer(fdt->fd[fd], NULL);
 			__put_unused_fd(files, fd);
 			spin_unlock(&files->file_lock);
 			filp_close(file, files);
@@ -842,17 +799,27 @@ static int do_dup2(struct files_struct *files,
 	 * tables and this condition does not arise without those.
 	 */
 	fdt = files_fdtable(files);
-	tofree = fdt->fd[fd];
+	tofree = idr_find(&files->fd_idr, fd);
 	if (!tofree && fd_is_open(fd, fdt))
 		goto Ebusy;
 	get_file(file);
-	rcu_assign_pointer(fdt->fd[fd], file);
+	if (tofree) {
+		idr_replace(&files->fd_idr, file, fd);
+	} else {
+		int err = idr_alloc(&files->fd_idr, file, fd, fd + 1,
+								GFP_NOWAIT);
+		if (err != fd) {
+			WARN(1, "do_dup2 got %d for fd %d\n", err, fd);
+			goto Ebusy;
+		}
+	}
 	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
 		__set_close_on_exec(fd, fdt);
 	else
 		__clear_close_on_exec(fd, fdt);
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 
 	if (tofree)
 		filp_close(tofree, files);
@@ -861,6 +828,7 @@ static int do_dup2(struct files_struct *files,
 
 Ebusy:
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 	return -EBUSY;
 }
 
@@ -875,6 +843,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	if (fd >= rlimit(RLIMIT_NOFILE))
 		return -EBADF;
 
+	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
 	err = expand_files(files, fd);
 	if (unlikely(err < 0))
@@ -883,6 +852,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 
 out_unlock:
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 	return err;
 }
 
@@ -901,6 +871,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	if (newfd >= rlimit(RLIMIT_NOFILE))
 		return -EBADF;
 
+	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
 	err = expand_files(files, newfd);
 	file = fcheck(oldfd);
@@ -917,6 +888,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	err = -EBADF;
 out_unlock:
 	spin_unlock(&files->file_lock);
+	idr_preload_end();
 	return err;
 }
 
@@ -974,7 +946,7 @@ int iterate_fd(struct files_struct *files, unsigned n,
 	spin_lock(&files->file_lock);
 	for (fdt = files_fdtable(files); n < fdt->max_fds; n++) {
 		struct file *file;
-		file = rcu_dereference_check_fdtable(files, fdt->fd[n]);
+		file = __fcheck_files(files, n);
 		if (!file)
 			continue;
 		res = f(p, file, n);
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 6e84b2cae..4072f24 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/fs.h>
+#include <linux/idr.h>
 
 #include <linux/atomic.h>
 
@@ -23,7 +24,6 @@
 
 struct fdtable {
 	unsigned int max_fds;
-	struct file __rcu **fd;      /* current fd array */
 	unsigned long *close_on_exec;
 	unsigned long *open_fds;
 	unsigned long *full_fds_bits;
@@ -51,6 +51,7 @@ struct files_struct {
 	bool resize_in_progress;
 	wait_queue_head_t resize_wait;
 
+	struct idr fd_idr;
 	struct fdtable __rcu *fdt;
 	struct fdtable fdtab;
   /*
@@ -61,7 +62,6 @@ struct files_struct {
 	unsigned long close_on_exec_init[1];
 	unsigned long open_fds_init[1];
 	unsigned long full_fds_bits_init[1];
-	struct file __rcu * fd_array[NR_OPEN_DEFAULT];
 };
 
 struct file_operations;
@@ -79,11 +79,7 @@ struct files_struct {
  */
 static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd)
 {
-	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
-
-	if (fd < fdt->max_fds)
-		return rcu_dereference_raw(fdt->fd[fd]);
-	return NULL;
+	return idr_find(&files->fd_idr, fd);
 }
 
 static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd)
-- 
1.8.3.1


* [PATCH 06/13] vfs: Remove next_fd from fd alloc code path.
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (4 preceding siblings ...)
  2017-04-27 19:08 ` [PATCH 05/13] vfs: Replace array of file pointers with an IDR Sandhya Bankar
@ 2017-04-27 19:09 ` Sandhya Bankar
  2017-04-27 19:10 ` [PATCH 07/13] vfs: Remove full_fds_bits from fd allocation " Sandhya Bankar
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:09 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

The IDR now allocates new file descriptors and always returns the
lowest free descriptor, so next_fd is no longer needed to track the
next candidate.  Remove it from the fd allocation code path.
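A minimal sketch of why the cached hint can go: idr_alloc() guarantees the lowest free id in the requested range, so a fresh lookup finds the same descriptor a next_fd cache would have suggested, including immediately after a close. The linear scan below is a toy stand-in; the real IDR finds the lowest free id via its internal free tag without scanning every slot.

```c
#include <assert.h>

#define FD_MAX 64

/* Toy model of idr_alloc()'s lowest-free-id guarantee */
static int lowest_free_fd(const unsigned char used[FD_MAX])
{
	int fd;

	for (fd = 0; fd < FD_MAX; fd++)
		if (!used[fd])
			return fd;
	return -1;	/* range exhausted */
}
```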

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c               | 6 ------
 include/linux/fdtable.h | 1 -
 2 files changed, 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 1c000d8..da3a35b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -294,7 +294,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	idr_init(&newf->fd_idr);
 	newf->resize_in_progress = false;
 	init_waitqueue_head(&newf->resize_wait);
-	newf->next_fd = 0;
 	new_fdt = &newf->fdtab;
 	new_fdt->max_fds = NR_OPEN_DEFAULT;
 	new_fdt->close_on_exec = newf->close_on_exec_init;
@@ -506,9 +505,6 @@ int __alloc_fd(struct files_struct *files,
 		goto out;
 	}
 
-	if (start <= files->next_fd)
-		files->next_fd = fd + 1;
-
 	fdt = files_fdtable(files);
 	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
@@ -538,8 +534,6 @@ static void __put_unused_fd(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = files_fdtable(files);
 	__clear_open_fd(fd, fdt);
-	if (fd < files->next_fd)
-		files->next_fd = fd;
 }
 
 void put_unused_fd(unsigned int fd)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 4072f24..c2a53b6 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -58,7 +58,6 @@ struct files_struct {
    * written part on a separate cache line in SMP
    */
 	spinlock_t file_lock ____cacheline_aligned_in_smp;
-	unsigned int next_fd;
 	unsigned long close_on_exec_init[1];
 	unsigned long open_fds_init[1];
 	unsigned long full_fds_bits_init[1];
-- 
1.8.3.1


* [PATCH 07/13] vfs: Remove full_fds_bits from fd allocation code path.
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (5 preceding siblings ...)
  2017-04-27 19:09 ` [PATCH 06/13] vfs: Remove next_fd from fd alloc code path Sandhya Bankar
@ 2017-04-27 19:10 ` Sandhya Bankar
  2017-04-27 19:11 ` [PATCH 08/13] vfs: Use idr_tag_get() in fd_is_open() Sandhya Bankar
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:10 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

The IDR tracks free IDs internally, which removes the need for the
full_fds_bits second-level bitmap.  Remove it.
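For context, a sketch of the two-level scan full_fds_bits enabled: one summary bit per word lets the search skip words that are completely full without examining their bits. The IDR's radix tree keeps an equivalent per-node free tag internally, which is why the explicit bitmap can go. The 8-bit toy words below are an assumption for readability; the kernel bitmap uses BITS_PER_LONG-sized words.

```c
#include <assert.h>

#define NWORDS 4
#define WBITS  8	/* toy word size */

/* Find the first clear bit, skipping words flagged full in the summary */
static int first_free(const unsigned char word[NWORDS], unsigned char full)
{
	int w, b;

	for (w = 0; w < NWORDS; w++) {
		if (full & (1u << w))
			continue;	/* word is full: skip all its bits */
		for (b = 0; b < WBITS; b++)
			if (!(word[w] & (1u << b)))
				return w * WBITS + b;
	}
	return -1;
}
```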

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c               | 18 +-----------------
 include/linux/fdtable.h |  2 --
 2 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index da3a35b..e8c6ada 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -56,9 +56,6 @@ static void free_fdtable_rcu(struct rcu_head *rcu)
 	__free_fdtable(container_of(rcu, struct fdtable, rcu));
 }
 
-#define BITBIT_NR(nr)	BITS_TO_LONGS(BITS_TO_LONGS(nr))
-#define BITBIT_SIZE(nr)	(BITBIT_NR(nr) * sizeof(long))
-
 /*
  * Copy 'count' fd bits from the old table to the new table and clear the extra
  * space if any.  This does not copy the file pointers.  Called with the files
@@ -75,11 +72,6 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
 	memset((char *)nfdt->open_fds + cpy, 0, set);
 	memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
 	memset((char *)nfdt->close_on_exec + cpy, 0, set);
-
-	cpy = BITBIT_SIZE(count);
-	set = BITBIT_SIZE(nfdt->max_fds) - cpy;
-	memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy);
-	memset((char *)nfdt->full_fds_bits + cpy, 0, set);
 }
 
 /*
@@ -124,14 +116,12 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 	fdt->max_fds = nr;
 
 	data = alloc_fdmem(max_t(size_t,
-				 2 * nr / BITS_PER_BYTE + BITBIT_SIZE(nr), L1_CACHE_BYTES));
+				 2 * nr / BITS_PER_BYTE, L1_CACHE_BYTES));
 	if (!data)
 		goto out_fdt;
 	fdt->open_fds = data;
 	data += nr / BITS_PER_BYTE;
 	fdt->close_on_exec = data;
-	data += nr / BITS_PER_BYTE;
-	fdt->full_fds_bits = data;
 
 	return fdt;
 
@@ -246,15 +236,11 @@ static inline void __clear_close_on_exec(unsigned int fd, struct fdtable *fdt)
 static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__set_bit(fd, fdt->open_fds);
-	fd /= BITS_PER_LONG;
-	if (!~fdt->open_fds[fd])
-		__set_bit(fd, fdt->full_fds_bits);
 }
 
 static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
 {
 	__clear_bit(fd, fdt->open_fds);
-	__clear_bit(fd / BITS_PER_LONG, fdt->full_fds_bits);
 }
 
 static unsigned int count_open_files(struct fdtable *fdt)
@@ -298,7 +284,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	new_fdt->max_fds = NR_OPEN_DEFAULT;
 	new_fdt->close_on_exec = newf->close_on_exec_init;
 	new_fdt->open_fds = newf->open_fds_init;
-	new_fdt->full_fds_bits = newf->full_fds_bits_init;
 
 restart:
 	idr_copy_preload(&oldf->fd_idr, GFP_KERNEL);
@@ -472,7 +457,6 @@ struct files_struct init_files = {
 		.max_fds	= NR_OPEN_DEFAULT,
 		.close_on_exec	= init_files.close_on_exec_init,
 		.open_fds	= init_files.open_fds_init,
-		.full_fds_bits	= init_files.full_fds_bits_init,
 	},
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
 	.fd_idr		= IDR_INIT,
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index c2a53b6..6bece35 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -26,7 +26,6 @@ struct fdtable {
 	unsigned int max_fds;
 	unsigned long *close_on_exec;
 	unsigned long *open_fds;
-	unsigned long *full_fds_bits;
 	struct rcu_head rcu;
 };
 
@@ -60,7 +59,6 @@ struct files_struct {
 	spinlock_t file_lock ____cacheline_aligned_in_smp;
 	unsigned long close_on_exec_init[1];
 	unsigned long open_fds_init[1];
-	unsigned long full_fds_bits_init[1];
 };
 
 struct file_operations;
-- 
1.8.3.1


* [PATCH 08/13] vfs: Use idr_tag_get() in fd_is_open().
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (6 preceding siblings ...)
  2017-04-27 19:10 ` [PATCH 07/13] vfs: Remove full_fds_bits from fd allocation " Sandhya Bankar
@ 2017-04-27 19:11 ` Sandhya Bankar
  2017-04-27 19:12 ` [PATCH 09/13] vfs: Rewrite close_files() Sandhya Bankar
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:11 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Use idr_tag_get() in fd_is_open() to determine whether a given fd is
allocated.  Also move fd_is_open() into file.c and make it static,
since it is only called from one place.
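A toy rendering of the new test, assuming the IDR keeps a free tag set on every unallocated id (as the tagged-IDR patches earlier in this series add): a descriptor is open exactly when its free tag is clear. The flat tag array is hypothetical; the real tag lives in the radix-tree nodes.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX 64
#define TOY_IDR_FREE 0	/* stand-in for the kernel's IDR_FREE tag */

struct toy_idr {
	unsigned char tag_free[TOY_MAX];	/* one free-tag bit per id */
};

static bool toy_idr_tag_get(const struct toy_idr *idr, unsigned int id,
			    int tag)
{
	(void)tag;	/* only the free tag is modelled here */
	return idr->tag_free[id];
}

/* Mirrors the patch: open means "not tagged free" */
static bool fd_is_open(unsigned int fd, const struct toy_idr *idr)
{
	return !toy_idr_tag_get(idr, fd, TOY_IDR_FREE);
}
```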

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c               | 7 ++++++-
 include/linux/fdtable.h | 5 -----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index e8c6ada..8d67968 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -222,6 +222,11 @@ static int expand_files(struct files_struct *files, unsigned int nr)
 	return expanded;
 }
 
+static inline bool fd_is_open(unsigned int fd, struct files_struct *files)
+{
+	return !idr_tag_get(&files->fd_idr, fd, IDR_FREE);
+}
+
 static inline void __set_close_on_exec(unsigned int fd, struct fdtable *fdt)
 {
 	__set_bit(fd, fdt->close_on_exec);
@@ -778,7 +783,7 @@ static int do_dup2(struct files_struct *files,
 	 */
 	fdt = files_fdtable(files);
 	tofree = idr_find(&files->fd_idr, fd);
-	if (!tofree && fd_is_open(fd, fdt))
+	if (!tofree && fd_is_open(fd, files))
 		goto Ebusy;
 	get_file(file);
 	if (tofree) {
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 6bece35..67259f4 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -34,11 +34,6 @@ static inline bool close_on_exec(unsigned int fd, const struct fdtable *fdt)
 	return test_bit(fd, fdt->close_on_exec);
 }
 
-static inline bool fd_is_open(unsigned int fd, const struct fdtable *fdt)
-{
-	return test_bit(fd, fdt->open_fds);
-}
-
 /*
  * Open file table structure
  */
-- 
1.8.3.1


* [PATCH 09/13] vfs: Rewrite close_files()
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (7 preceding siblings ...)
  2017-04-27 19:11 ` [PATCH 08/13] vfs: Use idr_tag_get() in fd_is_open() Sandhya Bankar
@ 2017-04-27 19:12 ` Sandhya Bankar
  2017-04-27 19:14 ` [PATCH 10/13] vfs: Replace close_on_exec bitmap with an IDR tag Sandhya Bankar
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:12 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Use the IDR iteration functionality instead of the open_fds bitmap to
call filp_close() for each open file.  Also make close_files() return
void, because it no longer uses the fdtable.
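The shape of the new loop, sketched over a toy array-backed table (hypothetical names; the real idr_for_each_entry() walks only the populated radix-tree slots rather than testing every index):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

struct toy_idr {
	void *slot[TOY_MAX];
};

/* Visit only allocated entries, as idr_for_each_entry() does */
#define toy_idr_for_each_entry(idr, entry, id)			\
	for ((id) = 0; (id) < TOY_MAX; (id)++)			\
		if (((entry) = (idr)->slot[(id)]) != NULL)

/* close_files() shape: close every open entry, then the table is empty */
static int toy_close_files(struct toy_idr *idr)
{
	void *file;
	int fd, closed = 0;

	toy_idr_for_each_entry(idr, file, fd) {
		idr->slot[fd] = NULL;	/* stands in for filp_close() */
		closed++;
	}
	return closed;
}
```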

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 8d67968..8cd77c5 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -373,37 +373,21 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	return NULL;
 }
 
-static struct fdtable *close_files(struct files_struct * files)
+static void close_files(struct files_struct * files)
 {
 	/*
-	 * It is safe to dereference the fd table without RCU or
-	 * ->file_lock because this is the last reference to the
-	 * files structure.
+	 * No need for RCU or ->file_lock protection because
+	 * this is the last reference to the files structure.
 	 */
-	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
-	unsigned int i, j = 0;
+	struct file *file;
+	int fd;
 
-	for (;;) {
-		unsigned long set;
-		i = j * BITS_PER_LONG;
-		if (i >= fdt->max_fds)
-			break;
-		set = fdt->open_fds[j++];
-		while (set) {
-			if (set & 1) {
-				struct file *file;
-				file = idr_remove(&files->fd_idr, i);
-				if (file) {
-					filp_close(file, files);
-					cond_resched_rcu_qs();
-				}
-			}
-			i++;
-			set >>= 1;
-		}
+	idr_for_each_entry(&files->fd_idr, file, fd) {
+		filp_close(file, files);
+		cond_resched_rcu_qs();
 	}
 
-	return fdt;
+	idr_destroy(&files->fd_idr);
 }
 
 struct files_struct *get_files_struct(struct task_struct *task)
@@ -422,7 +406,8 @@ struct files_struct *get_files_struct(struct task_struct *task)
 void put_files_struct(struct files_struct *files)
 {
 	if (atomic_dec_and_test(&files->count)) {
-		struct fdtable *fdt = close_files(files);
+		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+		close_files(files);
 
 		/* free the arrays if they are not embedded */
 		if (fdt != &files->fdtab)
-- 
1.8.3.1


* [PATCH 10/13] vfs: Replace close_on_exec bitmap with an IDR tag
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (8 preceding siblings ...)
  2017-04-27 19:12 ` [PATCH 09/13] vfs: Rewrite close_files() Sandhya Bankar
@ 2017-04-27 19:14 ` Sandhya Bankar
  2017-04-27 19:17 ` [PATCH 11/13] vfs: Add init_task.h include Sandhya Bankar
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:14 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Replace the close_on_exec bitmap with idr_tag_get/set/clear().
Add a new IDR tag, FD_TAG_CLOEXEC, which is passed to these
functions to implement the close-on-exec semantics.
Also remove get_close_on_exec() and use close_on_exec() in its place.
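A toy model of the tag plumbing, with FD_TAG_CLOEXEC as one tag bit per descriptor. The flat storage is an assumption for illustration; the kernel stores tags in the radix-tree nodes so that tagged descriptors can be found and cleared efficiently in bulk.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX 64
#define FD_TAG_CLOEXEC 0

struct toy_idr {
	unsigned char cloexec[TOY_MAX];	/* one FD_TAG_CLOEXEC bit per fd */
};

static void toy_idr_tag_set(struct toy_idr *idr, unsigned int id, int tag)
{
	if (tag == FD_TAG_CLOEXEC)
		idr->cloexec[id] = 1;
}

static void toy_idr_tag_clear(struct toy_idr *idr, unsigned int id, int tag)
{
	if (tag == FD_TAG_CLOEXEC)
		idr->cloexec[id] = 0;
}

static bool toy_idr_tag_get(const struct toy_idr *idr, unsigned int id,
			    int tag)
{
	return tag == FD_TAG_CLOEXEC && idr->cloexec[id];
}

/* close_on_exec() after the patch: just a tag lookup */
static bool close_on_exec(unsigned int fd, const struct toy_idr *idr)
{
	return toy_idr_tag_get(idr, fd, FD_TAG_CLOEXEC);
}
```

O_CLOEXEC at allocation time maps to toy_idr_tag_set(); F_SETFD clearing the flag maps to toy_idr_tag_clear().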

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/exec.c               |  2 +-
 fs/fcntl.c              |  2 +-
 fs/file.c               | 85 +++++++++++++++----------------------------------
 fs/proc/fd.c            |  4 +--
 include/linux/fdtable.h | 19 +++++++----
 include/linux/file.h    |  1 -
 6 files changed, 41 insertions(+), 72 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 65145a3..2070bc6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1728,7 +1728,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 		 * inaccessible after exec. Relies on having exclusive access to
 		 * current->files (due to unshare_files above).
 		 */
-		if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
+		if (close_on_exec(fd, current->files))
 			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
 		bprm->filename = pathbuf;
 	}
diff --git a/fs/fcntl.c b/fs/fcntl.c
index be8fbe2..9c2061b 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -255,7 +255,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		err = f_dupfd(arg, filp, O_CLOEXEC);
 		break;
 	case F_GETFD:
-		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
+		err = close_on_exec(fd, current->files) ? FD_CLOEXEC : 0;
 		break;
 	case F_SETFD:
 		err = 0;
diff --git a/fs/file.c b/fs/file.c
index 8cd77c5..56c5731 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -70,8 +70,6 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
 	set = (nfdt->max_fds - count) / BITS_PER_BYTE;
 	memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
 	memset((char *)nfdt->open_fds + cpy, 0, set);
-	memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy);
-	memset((char *)nfdt->close_on_exec + cpy, 0, set);
 }
 
 /*
@@ -115,13 +113,10 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 		goto out;
 	fdt->max_fds = nr;
 
-	data = alloc_fdmem(max_t(size_t,
-				 2 * nr / BITS_PER_BYTE, L1_CACHE_BYTES));
+	data = alloc_fdmem(max_t(size_t, nr / BITS_PER_BYTE, L1_CACHE_BYTES));
 	if (!data)
 		goto out_fdt;
 	fdt->open_fds = data;
-	data += nr / BITS_PER_BYTE;
-	fdt->close_on_exec = data;
 
 	return fdt;
 
@@ -227,15 +222,16 @@ static inline bool fd_is_open(unsigned int fd, struct files_struct *files)
 	return !idr_tag_get(&files->fd_idr, fd, IDR_FREE);
 }
 
-static inline void __set_close_on_exec(unsigned int fd, struct fdtable *fdt)
+static inline void __set_close_on_exec(unsigned int fd,
+					struct files_struct *files)
 {
-	__set_bit(fd, fdt->close_on_exec);
+	idr_tag_set(&files->fd_idr, fd, FD_TAG_CLOEXEC);
 }
 
-static inline void __clear_close_on_exec(unsigned int fd, struct fdtable *fdt)
+static inline void __clear_close_on_exec(unsigned int fd,
+					 struct files_struct *files)
 {
-	if (test_bit(fd, fdt->close_on_exec))
-		__clear_bit(fd, fdt->close_on_exec);
+	idr_tag_clear(&files->fd_idr, fd, FD_TAG_CLOEXEC);
 }
 
 static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
@@ -287,7 +283,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	init_waitqueue_head(&newf->resize_wait);
 	new_fdt = &newf->fdtab;
 	new_fdt->max_fds = NR_OPEN_DEFAULT;
-	new_fdt->close_on_exec = newf->close_on_exec_init;
 	new_fdt->open_fds = newf->open_fds_init;
 
 restart:
@@ -347,6 +342,9 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 			spin_unlock(&oldf->file_lock);
 			goto out;
 		}
+
+		if (idr_tag_get(&oldf->fd_idr, i, FD_TAG_CLOEXEC))
+			idr_tag_set(&newf->fd_idr, i, FD_TAG_CLOEXEC);
 	}
 
 	spin_unlock(&oldf->file_lock);
@@ -445,7 +443,6 @@ struct files_struct init_files = {
 	.fdt		= &init_files.fdtab,
 	.fdtab		= {
 		.max_fds	= NR_OPEN_DEFAULT,
-		.close_on_exec	= init_files.close_on_exec_init,
 		.open_fds	= init_files.open_fds_init,
 	},
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
@@ -482,9 +479,9 @@ int __alloc_fd(struct files_struct *files,
 	fdt = files_fdtable(files);
 	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
-		__set_close_on_exec(fd, fdt);
+		__set_close_on_exec(fd, files);
 	else
-		__clear_close_on_exec(fd, fdt);
+		__clear_close_on_exec(fd, files);
 	error = fd;
 
 out:
@@ -571,7 +568,7 @@ int __close_fd(struct files_struct *files, unsigned fd)
 	file = idr_remove(&files->fd_idr, fd);
 	if (!file)
 		goto out_unlock;
-	__clear_close_on_exec(fd, fdt);
+	__clear_close_on_exec(fd, files);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 	return filp_close(file, files);
@@ -583,35 +580,19 @@ int __close_fd(struct files_struct *files, unsigned fd)
 
 void do_close_on_exec(struct files_struct *files)
 {
-	unsigned i;
-	struct fdtable *fdt;
+	struct file *file;
+	unsigned int fd;
 
 	/* exec unshares first */
 	spin_lock(&files->file_lock);
-	for (i = 0; ; i++) {
-		unsigned long set;
-		unsigned fd = i * BITS_PER_LONG;
-		fdt = files_fdtable(files);
-		if (fd >= fdt->max_fds)
-			break;
-		set = fdt->close_on_exec[i];
-		if (!set)
-			continue;
-		fdt->close_on_exec[i] = 0;
-		for ( ; set ; fd++, set >>= 1) {
-			struct file *file;
-			if (!(set & 1))
-				continue;
-			file = idr_remove(&files->fd_idr, fd);
-			if (!file)
-				continue;
-			__put_unused_fd(files, fd);
-			spin_unlock(&files->file_lock);
-			filp_close(file, files);
-			cond_resched();
-			spin_lock(&files->file_lock);
-		}
 
+	idr_for_each_entry_tagged(&files->fd_idr, file, fd, FD_TAG_CLOEXEC) {
+		idr_remove(&files->fd_idr, fd);
+		__put_unused_fd(files, fd);
+		spin_unlock(&files->file_lock);
+		filp_close(file, files);
+		cond_resched();
+		spin_lock(&files->file_lock);
 	}
 	spin_unlock(&files->file_lock);
 }
@@ -723,28 +704,14 @@ void __f_unlock_pos(struct file *f)
 void set_close_on_exec(unsigned int fd, int flag)
 {
 	struct files_struct *files = current->files;
-	struct fdtable *fdt;
 	spin_lock(&files->file_lock);
-	fdt = files_fdtable(files);
 	if (flag)
-		__set_close_on_exec(fd, fdt);
+		__set_close_on_exec(fd, files);
 	else
-		__clear_close_on_exec(fd, fdt);
+		__clear_close_on_exec(fd, files);
 	spin_unlock(&files->file_lock);
 }
 
-bool get_close_on_exec(unsigned int fd)
-{
-	struct files_struct *files = current->files;
-	struct fdtable *fdt;
-	bool res;
-	rcu_read_lock();
-	fdt = files_fdtable(files);
-	res = close_on_exec(fd, fdt);
-	rcu_read_unlock();
-	return res;
-}
-
 static int do_dup2(struct files_struct *files,
 	struct file *file, unsigned fd, unsigned flags)
 __releases(&files->file_lock)
@@ -783,9 +750,9 @@ static int do_dup2(struct files_struct *files,
 	}
 	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
-		__set_close_on_exec(fd, fdt);
+		__set_close_on_exec(fd, files);
 	else
-		__clear_close_on_exec(fd, fdt);
+		__clear_close_on_exec(fd, files);
 	spin_unlock(&files->file_lock);
 	idr_preload_end();
 
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index c330495..2735ccc 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -36,10 +36,8 @@ static int seq_show(struct seq_file *m, void *v)
 		spin_lock(&files->file_lock);
 		file = fcheck_files(files, fd);
 		if (file) {
-			struct fdtable *fdt = files_fdtable(files);
-
 			f_flags = file->f_flags;
-			if (close_on_exec(fd, fdt))
+			if (close_on_exec(fd, files))
 				f_flags |= O_CLOEXEC;
 
 			get_file(file);
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 67259f4..7f1ab82 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -22,18 +22,14 @@
  */
 #define NR_OPEN_DEFAULT BITS_PER_LONG
 
+#define	FD_TAG_CLOEXEC	1
+
 struct fdtable {
 	unsigned int max_fds;
-	unsigned long *close_on_exec;
 	unsigned long *open_fds;
 	struct rcu_head rcu;
 };
 
-static inline bool close_on_exec(unsigned int fd, const struct fdtable *fdt)
-{
-	return test_bit(fd, fdt->close_on_exec);
-}
-
 /*
  * Open file table structure
  */
@@ -52,7 +48,6 @@ struct files_struct {
    * written part on a separate cache line in SMP
    */
 	spinlock_t file_lock ____cacheline_aligned_in_smp;
-	unsigned long close_on_exec_init[1];
 	unsigned long open_fds_init[1];
 };
 
@@ -82,6 +77,16 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int
 	return __fcheck_files(files, fd);
 }
 
+static inline bool close_on_exec(unsigned int fd, struct files_struct *files)
+{
+	bool res;
+
+	rcu_read_lock();
+	res = idr_tag_get(&files->fd_idr, fd, FD_TAG_CLOEXEC);
+	rcu_read_unlock();
+	return res;
+}
+
 /*
  * Check whether the specified fd has an open file.
  */
diff --git a/include/linux/file.h b/include/linux/file.h
index 61eb82c..1856bbf 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -76,7 +76,6 @@ static inline void fdput_pos(struct fd f)
 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
 extern void set_close_on_exec(unsigned int fd, int flag);
-extern bool get_close_on_exec(unsigned int fd);
 extern void put_filp(struct file *);
 extern int get_unused_fd_flags(unsigned flags);
 extern void put_unused_fd(unsigned int fd);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 11/13] vfs: Add init_task.h include
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (9 preceding siblings ...)
  2017-04-27 19:14 ` [PATCH 10/13] vfs: Replace close_on_exec bitmap with an IDR tag Sandhya Bankar
@ 2017-04-27 19:17 ` Sandhya Bankar
  2017-04-27 19:18 ` [PATCH 12/13] vfs: Convert select to use idr_get_tag_batch() Sandhya Bankar
  2017-04-27 19:19 ` [PATCH 13/13] vfs: Delete struct fdtable Sandhya Bankar
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:17 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Remove a sparse warning about init_files not being declared.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/file.c b/fs/file.c
index 56c5731..23f198b 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -19,6 +19,7 @@
 #include <linux/fdtable.h>
 #include <linux/bitops.h>
 #include <linux/interrupt.h>
+#include <linux/init_task.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
 #include <linux/workqueue.h>
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 12/13] vfs: Convert select to use idr_get_tag_batch()
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (10 preceding siblings ...)
  2017-04-27 19:17 ` [PATCH 11/13] vfs: Add init_task.h include Sandhya Bankar
@ 2017-04-27 19:18 ` Sandhya Bankar
  2017-04-27 19:19 ` [PATCH 13/13] vfs: Delete struct fdtable Sandhya Bankar
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:18 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Convert select to use idr_get_tag_batch().

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/select.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index e211227..5d20a14 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -346,32 +346,33 @@ static int poll_select_copy_remaining(struct timespec64 *end_time,
 
 static int max_select_fd(unsigned long n, fd_set_bits *fds)
 {
-	unsigned long *open_fds;
+	unsigned long bad_fds;
 	unsigned long set;
 	int max;
-	struct fdtable *fdt;
+	struct idr *fd_idr = &current->files->fd_idr;
 
 	/* handle last in-complete long-word first */
 	set = ~(~0UL << (n & (BITS_PER_LONG-1)));
 	n /= BITS_PER_LONG;
-	fdt = files_fdtable(current->files);
-	open_fds = fdt->open_fds + n;
 	max = 0;
 	if (set) {
 		set &= BITS(fds, n);
 		if (set) {
-			if (!(set & ~*open_fds))
+			bad_fds = idr_get_tag_batch(fd_idr, (n * BITS_PER_LONG),
+							IDR_FREE);
+			if (!(set & bad_fds))
 				goto get_max;
 			return -EBADF;
 		}
 	}
 	while (n) {
-		open_fds--;
 		n--;
 		set = BITS(fds, n);
 		if (!set)
 			continue;
-		if (set & ~*open_fds)
+		bad_fds = idr_get_tag_batch(fd_idr, (n * BITS_PER_LONG),
+						IDR_FREE);
+		if (set & bad_fds)
 			return -EBADF;
 		if (max)
 			continue;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 13/13] vfs: Delete struct fdtable
  2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
                   ` (11 preceding siblings ...)
  2017-04-27 19:18 ` [PATCH 12/13] vfs: Convert select to use idr_get_tag_batch() Sandhya Bankar
@ 2017-04-27 19:19 ` Sandhya Bankar
  12 siblings, 0 replies; 15+ messages in thread
From: Sandhya Bankar @ 2017-04-27 19:19 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, viro, mawilcox, keescook,
	adobriyan, re.emese, riel

Complete the conversion of the file descriptor allocation code to use
the IDR.

This patch makes the following changes:

 - Move max_fds from struct fdtable to files_struct.
 - Add a fill_max_fds() routine to calculate the new value of max_fds,
   matching the old behaviour of the alloc_fdtable() code, which is
   user-visible through /proc.
 - Remove struct fdtable.
 - Remove resize_in_progress and resize_wait from files_struct.
 - Delete open_fds() and count_open_files().
 - Use READ_ONCE() instead of rcu_read_lock()/rcu_read_unlock().  RCU
   was only needed to dereference files->fdt; files->max_fds is not
   RCU-protected, so reading it with the READ_ONCE() macro is
   sufficient.

Signed-off-by: Sandhya Bankar <bankarsandhya512@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/compat.c             |   6 +-
 fs/file.c               | 354 +++++++-----------------------------------------
 fs/proc/array.c         |   2 +-
 fs/proc/fd.c            |   2 +-
 fs/select.c             |   6 +-
 include/linux/fdtable.h |  31 +----
 6 files changed, 54 insertions(+), 347 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index c61b506..7483c9c 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1153,17 +1153,13 @@ int compat_core_sys_select(int n, compat_ulong_t __user *inp,
 	fd_set_bits fds;
 	void *bits;
 	int size, max_fds, ret = -EINVAL;
-	struct fdtable *fdt;
 	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
 
 	if (n < 0)
 		goto out_nofds;
 
 	/* max_fds can increase, so grab it once to avoid race */
-	rcu_read_lock();
-	fdt = files_fdtable(current->files);
-	max_fds = fdt->max_fds;
-	rcu_read_unlock();
+	max_fds = READ_ONCE(current->files->max_fds);
 	if (n > max_fds)
 		n = max_fds;
 
diff --git a/fs/file.c b/fs/file.c
index 23f198b..3e6cf10 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -31,191 +31,36 @@
 unsigned int sysctl_nr_open_max =
 	__const_min(INT_MAX, ~(size_t)0/sizeof(void *)) & -BITS_PER_LONG;
 
-static void *alloc_fdmem(size_t size)
+static int fill_max_fds(struct files_struct *files, unsigned int nr)
 {
-	/*
-	 * Very large allocations can stress page reclaim, so fall back to
-	 * vmalloc() if the allocation size will be considered "large" by the VM.
-	 */
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *data = kmalloc(size, GFP_KERNEL_ACCOUNT |
-				     __GFP_NOWARN | __GFP_NORETRY);
-		if (data != NULL)
-			return data;
-	}
-	return __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_HIGHMEM, PAGE_KERNEL);
-}
+	unsigned int nr_open;
 
-static void __free_fdtable(struct fdtable *fdt)
-{
-	kvfree(fdt->open_fds);
-	kfree(fdt);
-}
-
-static void free_fdtable_rcu(struct rcu_head *rcu)
-{
-	__free_fdtable(container_of(rcu, struct fdtable, rcu));
-}
-
-/*
- * Copy 'count' fd bits from the old table to the new table and clear the extra
- * space if any.  This does not copy the file pointers.  Called with the files
- * spinlock held for write.
- */
-static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
-			    unsigned int count)
-{
-	unsigned int cpy, set;
-
-	cpy = count / BITS_PER_BYTE;
-	set = (nfdt->max_fds - count) / BITS_PER_BYTE;
-	memcpy(nfdt->open_fds, ofdt->open_fds, cpy);
-	memset((char *)nfdt->open_fds + cpy, 0, set);
-}
+	if (likely(nr < files->max_fds))
+		return 0;
 
-/*
- * Copy all file descriptors from the old table to the new, expanded table and
- * clear the extra space.  Called with the files spinlock held for write.
- */
-static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
-{
-	BUG_ON(nfdt->max_fds < ofdt->max_fds);
-	copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
-}
+	nr_open = READ_ONCE(sysctl_nr_open);
 
-static struct fdtable * alloc_fdtable(unsigned int nr)
-{
-	struct fdtable *fdt;
-	void *data;
+	if (nr >= nr_open)
+		return -EMFILE;
 
 	/*
-	 * Figure out how many fds we actually want to support in this fdtable.
-	 * Allocation steps are keyed to the size of the fdarray, since it
-	 * grows far faster than any of the other dynamic data. We try to fit
-	 * the fdarray into comfortable page-tuned chunks: starting at 1024B
-	 * and growing in powers of two from there on.
+	 * This calculation of the new value of max_fds matches the old
+	 * behaviour of this code which is user-visible through /proc.
+	 * nr may exceed the sysctl_nr_open value by a small amount.
+	 * This used to be necessary to keep the bitmaps aligned, and
+	 * we keep it the same even though the IDR handles the bitmaps now.
 	 */
+
 	nr /= (1024 / sizeof(struct file *));
 	nr = roundup_pow_of_two(nr + 1);
 	nr *= (1024 / sizeof(struct file *));
-	/*
-	 * Note that this can drive nr *below* what we had passed if sysctl_nr_open
-	 * had been set lower between the check in expand_files() and here.  Deal
-	 * with that in caller, it's cheaper that way.
-	 *
-	 * We make sure that nr remains a multiple of BITS_PER_LONG - otherwise
-	 * bitmaps handling below becomes unpleasant, to put it mildly...
-	 */
-	if (unlikely(nr > sysctl_nr_open))
-		nr = ((sysctl_nr_open - 1) | (BITS_PER_LONG - 1)) + 1;
 
-	fdt = kmalloc(sizeof(struct fdtable), GFP_KERNEL_ACCOUNT);
-	if (!fdt)
-		goto out;
-	fdt->max_fds = nr;
+	if (unlikely(nr > nr_open))
+		nr = ((nr_open - 1) | (BITS_PER_LONG - 1)) + 1;
 
-	data = alloc_fdmem(max_t(size_t, nr / BITS_PER_BYTE, L1_CACHE_BYTES));
-	if (!data)
-		goto out_fdt;
-	fdt->open_fds = data;
+	files->max_fds = nr;
 
-	return fdt;
-
-out_fdt:
-	kfree(fdt);
-out:
-	return NULL;
-}
-
-/*
- * Expand the file descriptor table.
- * This function will allocate a new fdtable and both fd array and fdset, of
- * the given size.
- * Return <0 error code on error; 1 on successful completion.
- * The files->file_lock should be held on entry, and will be held on exit.
- */
-static int expand_fdtable(struct files_struct *files, unsigned int nr)
-	__releases(files->file_lock)
-	__acquires(files->file_lock)
-{
-	struct fdtable *new_fdt, *cur_fdt;
-
-	spin_unlock(&files->file_lock);
-	idr_preload_end();
-	new_fdt = alloc_fdtable(nr);
-
-	/* make sure all __fd_install() have seen resize_in_progress
-	 * or have finished their rcu_read_lock_sched() section.
-	 */
-	if (atomic_read(&files->count) > 1)
-		synchronize_sched();
-
-	idr_preload(GFP_KERNEL);
-	spin_lock(&files->file_lock);
-	if (!new_fdt)
-		return -ENOMEM;
-	/*
-	 * extremely unlikely race - sysctl_nr_open decreased between the check in
-	 * caller and alloc_fdtable().  Cheaper to catch it here...
-	 */
-	if (unlikely(new_fdt->max_fds <= nr)) {
-		__free_fdtable(new_fdt);
-		return -EMFILE;
-	}
-	cur_fdt = files_fdtable(files);
-	BUG_ON(nr < cur_fdt->max_fds);
-	copy_fdtable(new_fdt, cur_fdt);
-	rcu_assign_pointer(files->fdt, new_fdt);
-	if (cur_fdt != &files->fdtab)
-		call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
-	/* coupled with smp_rmb() in __fd_install() */
-	smp_wmb();
-	return 1;
-}
-
-/*
- * Expand files.
- * This function will expand the file structures, if the requested size exceeds
- * the current capacity and there is room for expansion.
- * Return <0 error code on error; 0 when nothing done; 1 when files were
- * expanded and execution may have blocked.
- * The files->file_lock should be held on entry, and will be held on exit.
- */
-static int expand_files(struct files_struct *files, unsigned int nr)
-	__releases(files->file_lock)
-	__acquires(files->file_lock)
-{
-	struct fdtable *fdt;
-	int expanded = 0;
-
-repeat:
-	fdt = files_fdtable(files);
-
-	/* Do we need to expand? */
-	if (nr < fdt->max_fds)
-		return expanded;
-
-	/* Can we expand? */
-	if (nr >= sysctl_nr_open)
-		return -EMFILE;
-
-	if (unlikely(files->resize_in_progress)) {
-		spin_unlock(&files->file_lock);
-		idr_preload_end();
-		expanded = 1;
-		wait_event(files->resize_wait, !files->resize_in_progress);
-		idr_preload(GFP_KERNEL);
-		spin_lock(&files->file_lock);
-		goto repeat;
-	}
-
-	/* All good, so we try */
-	files->resize_in_progress = true;
-	expanded = expand_fdtable(files, nr);
-	files->resize_in_progress = false;
-
-	wake_up_all(&files->resize_wait);
-	return expanded;
+	return 0;
 }
 
 static inline bool fd_is_open(unsigned int fd, struct files_struct *files)
@@ -235,28 +80,21 @@ static inline void __clear_close_on_exec(unsigned int fd,
 	idr_tag_clear(&files->fd_idr, fd, FD_TAG_CLOEXEC);
 }
 
-static inline void __set_open_fd(unsigned int fd, struct fdtable *fdt)
-{
-	__set_bit(fd, fdt->open_fds);
-}
-
-static inline void __clear_open_fd(unsigned int fd, struct fdtable *fdt)
-{
-	__clear_bit(fd, fdt->open_fds);
-}
-
-static unsigned int count_open_files(struct fdtable *fdt)
+static void close_files(struct files_struct * files)
 {
-	unsigned int size = fdt->max_fds;
-	unsigned int i;
+	/*
+	 * No need for RCU or ->file_lock protection because
+	 * this is the last reference to the files structure.
+	 */
+	struct file *file;
+	int fd;
 
-	/* Find the last open fd */
-	for (i = size / BITS_PER_LONG; i > 0; ) {
-		if (fdt->open_fds[--i])
-			break;
+	idr_for_each_entry(&files->fd_idr, file, fd) {
+		filp_close(file, files);
+		cond_resched_rcu_qs();
 	}
-	i = (i + 1) * BITS_PER_LONG;
-	return i;
+
+	idr_destroy(&files->fd_idr);
 }
 
 /*
@@ -267,9 +105,8 @@ static unsigned int count_open_files(struct fdtable *fdt)
 struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 {
 	struct files_struct *newf;
-	unsigned int open_files, i;
+	unsigned int i;
 	struct file *f;
-	struct fdtable *old_fdt, *new_fdt;
 
 	*errorp = -ENOMEM;
 	newf = kmem_cache_alloc(files_cachep, GFP_KERNEL);
@@ -280,51 +117,11 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 
 	spin_lock_init(&newf->file_lock);
 	idr_init(&newf->fd_idr);
-	newf->resize_in_progress = false;
-	init_waitqueue_head(&newf->resize_wait);
-	new_fdt = &newf->fdtab;
-	new_fdt->max_fds = NR_OPEN_DEFAULT;
-	new_fdt->open_fds = newf->open_fds_init;
+	newf->max_fds = NR_OPEN_DEFAULT;
 
 restart:
 	idr_copy_preload(&oldf->fd_idr, GFP_KERNEL);
 	spin_lock(&oldf->file_lock);
-	old_fdt = files_fdtable(oldf);
-	open_files = count_open_files(old_fdt);
-
-	/*
-	 * Check whether we need to allocate a larger fd array and fd set.
-	 */
-	while (unlikely(open_files > new_fdt->max_fds)) {
-		spin_unlock(&oldf->file_lock);
-		idr_preload_end();
-
-		if (new_fdt != &newf->fdtab)
-			__free_fdtable(new_fdt);
-
-		new_fdt = alloc_fdtable(open_files - 1);
-		if (!new_fdt) {
-			*errorp = -ENOMEM;
-			goto out_release;
-		}
-
-		/* beyond sysctl_nr_open; nothing to do */
-		if (unlikely(new_fdt->max_fds < open_files)) {
-			__free_fdtable(new_fdt);
-			*errorp = -EMFILE;
-			goto out_release;
-		}
-
-		/*
-		 * Reacquire the oldf lock and a pointer to its fd table
-		 * who knows it may have a new bigger fd table. We need
-		 * the latest pointer.
-		 */
-		idr_copy_preload(&oldf->fd_idr, GFP_KERNEL);
-		spin_lock(&oldf->file_lock);
-		old_fdt = files_fdtable(oldf);
-		open_files = count_open_files(old_fdt);
-	}
 
 	if (!idr_check_preload(&oldf->fd_idr)) {
 		spin_unlock(&oldf->file_lock);
@@ -332,8 +129,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 		goto restart;
 	}
 
-	copy_fd_bitmaps(new_fdt, old_fdt, open_files);
-
 	idr_for_each_entry(&oldf->fd_idr, f, i) {
 		int err;
 
@@ -341,6 +136,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 		err = idr_alloc(&newf->fd_idr, f, i, i + 1, GFP_NOWAIT);
 		if (WARN(err != i, "Could not allocate %d: %d", i, err)) {
 			spin_unlock(&oldf->file_lock);
+			idr_preload_end();
 			goto out;
 		}
 
@@ -351,42 +147,18 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	spin_unlock(&oldf->file_lock);
 	idr_preload_end();
 
-	/*
-	 * The fd may be claimed in the fd bitmap but not yet
-	 * instantiated in the files array if a sibling thread
-	 * is partway through open().
-	 */
-	for_each_set_bit(i, new_fdt->open_fds, new_fdt->max_fds) {
-		if (!idr_find(&newf->fd_idr, i))
-			__clear_bit(i, new_fdt->open_fds);
+	if (unlikely(fill_max_fds(newf, i) < 0)) {
+		*errorp =  -EMFILE;
+		goto out;
 	}
 
-	rcu_assign_pointer(newf->fdt, new_fdt);
-
 	return newf;
 
-out_release:
-	idr_destroy(&newf->fd_idr);
-	kmem_cache_free(files_cachep, newf);
 out:
-	return NULL;
-}
-
-static void close_files(struct files_struct * files)
-{
-	/*
-	 * No need for RCU or ->file_lock protection because
-	 * this is the last reference to the files structure.
-	 */
-	struct file *file;
-	int fd;
-
-	idr_for_each_entry(&files->fd_idr, file, fd) {
-		filp_close(file, files);
-		cond_resched_rcu_qs();
-	}
+	close_files(newf);
+	kmem_cache_free(files_cachep, newf);
 
-	idr_destroy(&files->fd_idr);
+	return NULL;
 }
 
 struct files_struct *get_files_struct(struct task_struct *task)
@@ -405,12 +177,7 @@ struct files_struct *get_files_struct(struct task_struct *task)
 void put_files_struct(struct files_struct *files)
 {
 	if (atomic_dec_and_test(&files->count)) {
-		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		close_files(files);
-
-		/* free the arrays if they are not embedded */
-		if (fdt != &files->fdtab)
-			__free_fdtable(fdt);
 		kmem_cache_free(files_cachep, files);
 	}
 }
@@ -441,11 +208,7 @@ void exit_files(struct task_struct *tsk)
 
 struct files_struct init_files = {
 	.count		= ATOMIC_INIT(1),
-	.fdt		= &init_files.fdtab,
-	.fdtab		= {
-		.max_fds	= NR_OPEN_DEFAULT,
-		.open_fds	= init_files.open_fds_init,
-	},
+	.max_fds	= NR_OPEN_DEFAULT,
 	.file_lock	= __SPIN_LOCK_UNLOCKED(init_files.file_lock),
 	.fd_idr		= IDR_INIT,
 };
@@ -458,7 +221,6 @@ int __alloc_fd(struct files_struct *files,
 {
 	unsigned int fd;
 	int error;
-	struct fdtable *fdt;
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
@@ -471,14 +233,12 @@ int __alloc_fd(struct files_struct *files,
 	BUG_ON(error < 0);
 	fd = error;
 
-	error = expand_files(files, fd);
+	error = fill_max_fds(files, fd);
 	if (error < 0) {
 		idr_remove(&files->fd_idr, fd);
 		goto out;
 	}
 
-	fdt = files_fdtable(files);
-	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
 		__set_close_on_exec(fd, files);
 	else
@@ -502,18 +262,11 @@ int get_unused_fd_flags(unsigned flags)
 }
 EXPORT_SYMBOL(get_unused_fd_flags);
 
-static void __put_unused_fd(struct files_struct *files, unsigned int fd)
-{
-	struct fdtable *fdt = files_fdtable(files);
-	__clear_open_fd(fd, fdt);
-}
-
 void put_unused_fd(unsigned int fd)
 {
 	struct files_struct *files = current->files;
 	spin_lock(&files->file_lock);
 	BUG_ON(idr_remove(&files->fd_idr, fd));
-	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 }
 
@@ -560,17 +313,12 @@ void fd_install(unsigned int fd, struct file *file)
 int __close_fd(struct files_struct *files, unsigned fd)
 {
 	struct file *file;
-	struct fdtable *fdt;
 
 	spin_lock(&files->file_lock);
-	fdt = files_fdtable(files);
-	if (fd >= fdt->max_fds)
-		goto out_unlock;
 	file = idr_remove(&files->fd_idr, fd);
 	if (!file)
 		goto out_unlock;
 	__clear_close_on_exec(fd, files);
-	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 	return filp_close(file, files);
 
@@ -589,7 +337,6 @@ void do_close_on_exec(struct files_struct *files)
 
 	idr_for_each_entry_tagged(&files->fd_idr, file, fd, FD_TAG_CLOEXEC) {
 		idr_remove(&files->fd_idr, fd);
-		__put_unused_fd(files, fd);
 		spin_unlock(&files->file_lock);
 		filp_close(file, files);
 		cond_resched();
@@ -718,7 +465,6 @@ static int do_dup2(struct files_struct *files,
 __releases(&files->file_lock)
 {
 	struct file *tofree;
-	struct fdtable *fdt;
 
 	/*
 	 * We need to detect attempts to do dup2() over allocated but still
@@ -734,7 +480,6 @@ static int do_dup2(struct files_struct *files,
 	 * scope of POSIX or SUS, since neither considers shared descriptor
 	 * tables and this condition does not arise without those.
 	 */
-	fdt = files_fdtable(files);
 	tofree = idr_find(&files->fd_idr, fd);
 	if (!tofree && fd_is_open(fd, files))
 		goto Ebusy;
@@ -749,7 +494,6 @@ static int do_dup2(struct files_struct *files,
 			goto Ebusy;
 		}
 	}
-	__set_open_fd(fd, fdt);
 	if (flags & O_CLOEXEC)
 		__set_close_on_exec(fd, files);
 	else
@@ -781,9 +525,10 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
-	err = expand_files(files, fd);
+	err = fill_max_fds(files, fd);
 	if (unlikely(err < 0))
 		goto out_unlock;
+
 	return do_dup2(files, file, fd, flags);
 
 out_unlock:
@@ -809,20 +554,18 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&files->file_lock);
-	err = expand_files(files, newfd);
+	err = fill_max_fds(files, newfd);
+	if (unlikely(err < 0))
+		 goto Ebadf;
+
 	file = fcheck(oldfd);
 	if (unlikely(!file))
 		goto Ebadf;
-	if (unlikely(err < 0)) {
-		if (err == -EMFILE)
-			goto Ebadf;
-		goto out_unlock;
-	}
+
 	return do_dup2(files, file, newfd, flags);
 
 Ebadf:
 	err = -EBADF;
-out_unlock:
 	spin_unlock(&files->file_lock);
 	idr_preload_end();
 	return err;
@@ -875,12 +618,11 @@ int iterate_fd(struct files_struct *files, unsigned n,
 		int (*f)(const void *, struct file *, unsigned),
 		const void *p)
 {
-	struct fdtable *fdt;
 	int res = 0;
 	if (!files)
 		return 0;
 	spin_lock(&files->file_lock);
-	for (fdt = files_fdtable(files); n < fdt->max_fds; n++) {
+	for (; n < files->max_fds; n++) {
 		struct file *file;
 		file = __fcheck_files(files, n);
 		if (!file)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 88c3555..ec6fdaf 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -186,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 
 	task_lock(p);
 	if (p->files)
-		max_fds = files_fdtable(p->files)->max_fds;
+		max_fds = READ_ONCE(p->files->max_fds);
 	task_unlock(p);
 	rcu_read_unlock();
 
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 2735ccc..7e5aeca 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -231,7 +231,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx,
 
 	rcu_read_lock();
 	for (fd = ctx->pos - 2;
-	     fd < files_fdtable(files)->max_fds;
+	     fd < files->max_fds;
 	     fd++, ctx->pos++) {
 		char name[PROC_NUMBUF];
 		int len;
diff --git a/fs/select.c b/fs/select.c
index 5d20a14..14da393 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -557,7 +557,6 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 	void *bits;
 	int ret, max_fds;
 	size_t size, alloc_size;
-	struct fdtable *fdt;
 	/* Allocate small arguments on the stack to save memory and be faster */
 	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
 
@@ -566,10 +565,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
 		goto out_nofds;
 
 	/* max_fds can increase, so grab it once to avoid race */
-	rcu_read_lock();
-	fdt = files_fdtable(current->files);
-	max_fds = fdt->max_fds;
-	rcu_read_unlock();
+	max_fds = READ_ONCE(current->files->max_fds);
 	if (n > max_fds)
 		n = max_fds;
 
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 7f1ab82..ff94541 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -24,43 +24,16 @@
 
 #define	FD_TAG_CLOEXEC	1
 
-struct fdtable {
-	unsigned int max_fds;
-	unsigned long *open_fds;
-	struct rcu_head rcu;
-};
-
 /*
  * Open file table structure
  */
 struct files_struct {
-  /*
-   * read mostly part
-   */
+	spinlock_t file_lock;
 	atomic_t count;
-	bool resize_in_progress;
-	wait_queue_head_t resize_wait;
-
 	struct idr fd_idr;
-	struct fdtable __rcu *fdt;
-	struct fdtable fdtab;
-  /*
-   * written part on a separate cache line in SMP
-   */
-	spinlock_t file_lock ____cacheline_aligned_in_smp;
-	unsigned long open_fds_init[1];
+	unsigned int max_fds;
 };
 
-struct file_operations;
-struct vfsmount;
-struct dentry;
-
-#define rcu_dereference_check_fdtable(files, fdtfd) \
-	rcu_dereference_check((fdtfd), lockdep_is_held(&(files)->file_lock))
-
-#define files_fdtable(files) \
-	rcu_dereference_check_fdtable((files), (files)->fdt)
-
 /*
  * The caller must ensure that fd table isn't shared or hold rcu or file lock
  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RESEND PATCH 05/13] vfs: Replace array of file pointers with an IDR
  2017-04-27 19:08 ` [PATCH 05/13] vfs: Replace array of file pointers with an IDR Sandhya Bankar
@ 2017-10-04 15:45   ` Mateusz Guzik
  0 siblings, 0 replies; 15+ messages in thread
From: Mateusz Guzik @ 2017-10-04 15:45 UTC (permalink / raw)
  To: Sandhya Bankar
  Cc: linux-fsdevel, linux-kernel, viro, mawilcox, keescook, adobriyan,
	re.emese, riel

On Tue, Jul 11, 2017 at 06:50:52PM +0530, Sandhya Bankar wrote:
> Instead of storing all the file pointers in a single array, use an
> IDR.  It is RCU-safe, and does not need to be reallocated when the
> fd array grows.  It also handles allocation of new file descriptors.
> 
> ---
>  
[snip]
> @@ -604,22 +576,9 @@ void put_unused_fd(unsigned int fd)
>  void __fd_install(struct files_struct *files, unsigned int fd,
>  		struct file *file)
>  {
> -	struct fdtable *fdt;
> -
> -	might_sleep();
> -	rcu_read_lock_sched();
> -
> -	while (unlikely(files->resize_in_progress)) {
> -		rcu_read_unlock_sched();
> -		wait_event(files->resize_wait, !files->resize_in_progress);
> -		rcu_read_lock_sched();
> -	}
> -	/* coupled with smp_wmb() in expand_fdtable() */
> -	smp_rmb();
> -	fdt = rcu_dereference_sched(files->fdt);
> -	BUG_ON(fdt->fd[fd] != NULL);
> -	rcu_assign_pointer(fdt->fd[fd], file);
> -	rcu_read_unlock_sched();
> +	rcu_read_lock();
> +	BUG_ON(idr_replace(&files->fd_idr, file, fd));
> +	rcu_read_unlock();
>  }
>  
>  void fd_install(unsigned int fd, struct file *file)
> @@ -641,10 +600,9 @@ int __close_fd(struct files_struct *files, unsigned fd)
>  	fdt = files_fdtable(files);
>  	if (fd >= fdt->max_fds)
>  		goto out_unlock;
> -	file = fdt->fd[fd];
> +	file = idr_remove(&files->fd_idr, fd);
>  	if (!file)
>  		goto out_unlock;
> -	rcu_assign_pointer(fdt->fd[fd], NULL);
>  	__clear_close_on_exec(fd, fdt);
>  	__put_unused_fd(files, fd);
>  	spin_unlock(&files->file_lock);

I have no opinion about the switch itself, but these two places worry me.
I did not check all the changes, so perhaps I missed something.

In the current code we are safe when it comes to concurrent install and
close, in particular here:

CPU0		CPU1
alloc_fd
		__close_fd
fd_install

__close_fd will either see a NULL pointer and return -EBADF or will see
an installed pointer and proceed with the close.

Your proposed patch seems to be buggy in this regard.

You call idr_remove, which, as I understand it, frees up the slot no
matter what; the only error detection is whether the returned pointer was
non-NULL. So when __close_fd races with a reserved-but-not-yet-installed
fd, it frees the slot and returns -EBADF, and fd_install can then proceed
to play with a deallocated entry.

-- 
Mateusz Guzik

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-10-04 15:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-27 18:48 [PATCH 00/13] vfs: Convert file allocation code to use the IDR Sandhya Bankar
2017-04-27 18:54 ` [PATCH 01/13] idr: Add ability to set/clear tags Matthew Wilcox
2017-04-27 19:05 ` [PATCH 02/13] idr: Add idr_for_each_entry_tagged() Sandhya Bankar
2017-04-27 19:06 ` [PATCH 03/13] idr, radix-tree: Add get_tag_batch function Sandhya Bankar
2017-04-27 19:07 ` [PATCH 04/13] idr, radix-tree: Implement copy_preload Sandhya Bankar
2017-04-27 19:08 ` [PATCH 05/13] vfs: Replace array of file pointers with an IDR Sandhya Bankar
2017-10-04 15:45   ` [RESEND PATCH " Mateusz Guzik
2017-04-27 19:09 ` [PATCH 06/13] vfs: Remove next_fd from fd alloc code path Sandhya Bankar
2017-04-27 19:10 ` [PATCH 07/13] vfs: Remove full_fds_bits from fd allocation " Sandhya Bankar
2017-04-27 19:11 ` [PATCH 08/13] vfs: Use idr_tag_get() in fd_is_open() Sandhya Bankar
2017-04-27 19:12 ` [PATCH 09/13] vfs: Rewrite close_files() Sandhya Bankar
2017-04-27 19:14 ` [PATCH 10/13] vfs: Replace close_on_exec bitmap with an IDR tag Sandhya Bankar
2017-04-27 19:17 ` [PATCH 11/13] vfs: Add init_task.h include Sandhya Bankar
2017-04-27 19:18 ` [PATCH 12/13] vfs: Convert select to use idr_get_tag_batch() Sandhya Bankar
2017-04-27 19:19 ` [PATCH 13/13] vfs: Delete struct fdtable Sandhya Bankar
